carrow@itd.nrl.navy.mil (Steven Carrow) (01/02/91)
I am currently doing research on implementing a Rete-based system on a
hypercube architecture. I have done some searching of the literature here at
the Naval Research Laboratory, but I have not found much beyond the 1988 AAAI
article by Gupta and Tambe. Does anyone know of further research on, or
implementations of, Rete-based systems on hypercubes? I am also interested in
exploring possible applications of vectorization techniques in this field;
any information on this would be appreciated, as my search here came up dry.
Pointers on where else to look would be appreciated as well, since my
research skills are rusty; although I used the NRL library, I do not know
that I used it well.

My US Mail address is:

    Steve Carrow
    NRL, Code 5570
    4555 Overlook Ave., SW
    Washington, DC 20375-5000

Thank you for your time.

*****************************************************************************
* Steven A Carrow       * ARPA: carrow@itd.nrl.navy.mil                     *
* Code 5570             * UUCP: {backbone}!uunet!nrl-css!carrow (I think)   *
* Naval Research Lab    *                                                   *
* Washington, DC 20375  * Disclaimer: I can barely speak for                *
* (202) 767-9125        *   myself, much less the government.               *
*****************************************************************************
moskowit@paul.rutgers.edu (Len Moskowitz) (01/04/91)
Though they're not working specifically on the hypercube, Sal Stolfo at
Columbia University and Anurag Acharya at CMU have been doing much of the
work on parallelizing OPS5-type languages. We at Bendix have CMU's CParaOPS5
running here on our Suns, and with a little luck it'll soon be running on our
new 8-node Tadpole 88000 RISC multiprocessor. CParaOPS5 uses a truly parallel
Rete. We spoke recently with Stolfo; his group is doing very interesting work
on changing OPS5's semantics to allow truly parallel programming. It might
pay to contact him.

I'd like to hear what you're doing. Have you looked into approaches other
than parallel Rete, such as rule-level parallelism or multiple independent
rule systems?

========================================================================
Len Moskowitz                           moskowitz@bendix.com
Allied-Signal Aerospace                 moskowitz%TSD@atc.bendix.com
Bendix Test Systems Division            moskowit@paul.rutgers.edu
Teterboro, NJ 07608
schmolze%cs.tufts.edu@RELAY.CS.NET (01/10/91)
I am not on the parallel bulletin board, but learned of several messages
concerning the above subject, in which I have some research results. Would
you please post the message below? I think it will be of interest to those
involved in the ongoing discussion. I'm not ready to join this bulletin board
just yet, so I've included a note to respond directly to me.

Thanks -- Jim

-----

I am not a member of this bulletin board, but have seen some messages from it
and wanted to respond to requests for information. There is much ongoing
research on parallelizing OPS5-like languages for distributed-memory
machines, and the messages I've seen regarding this topic did not mention all
of it. So, here is some more of it, which includes work of mine.

1. My own work has focused on the PARS system, which runs OPS5 on a hypercube
(by NCUBE) where:

 - the WM and PM are distributed over the processors,
 - each processor runs its own basic loop (MATCH-SELECT-ACT),
 - the processors run asynchronously for the most part,
 - actions that affect the WM or PM stored in other processors are forwarded
   as needed,
 - certain communications are performed during the basic loop so as to
   guarantee:
   * serializability -- the resulting overall WM is one that could have been
     produced on a serial machine running OPS5 from the original WM and PM.
     This is my correctness criterion. Note that PARS does not guarantee any
     particular conflict resolution strategy.
   * no errors due to temporary WM inconsistencies -- while actions are being
     communicated between processors, temporary WM inconsistencies can occur
     which can lead to non-serializable results. These problems are
     prevented.
   * no deadlocks -- in order to guarantee serializability, certain rules
     must be prevented from co-executing at the same time. I do this in a way
     that prevents deadlocks.
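For readers unfamiliar with production systems, the MATCH-SELECT-ACT loop
mentioned above can be sketched as follows. This is a minimal, single-process
toy with hypothetical names -- not the actual PARS or OPS5 code, and with no
Rete indexing or message passing -- just the skeleton of the cycle each
processor runs over its local working memory (WM):

```python
# Toy sketch of one processor's MATCH-SELECT-ACT loop over its local WM.
# Rules here are dicts with a "condition" predicate and an "action"; the
# real systems compile conditions into a Rete network instead of scanning.

def match(rules, wm):
    """Return the conflict set: (rule, element) pairs whose condition
    is satisfied by an element of the local WM."""
    return [(rule, wme) for rule in rules for wme in wm
            if rule["condition"](wme)]

def select(conflict_set):
    """PARS guarantees serializability but no particular conflict
    resolution strategy; here we simply pick the first instantiation."""
    return conflict_set[0] if conflict_set else None

def act(instantiation, wm):
    """Fire the rule: apply its action to the local WM.  In a distributed
    system, changes to WM stored elsewhere would be forwarded as messages."""
    rule, wme = instantiation
    rule["action"](wm, wme)

def run(rules, wm, max_cycles=100):
    for _ in range(max_cycles):
        chosen = select(match(rules, wm))
        if chosen is None:
            break                # quiescence: no rule is satisfied
        act(chosen, wm)
    return wm

# Toy rule: replace each odd number with its double until none remain.
rules = [{
    "condition": lambda wme: wme % 2 == 1,
    "action": lambda wm, wme: (wm.remove(wme), wm.append(wme * 2)),
}]
print(run(rules, [1, 2, 3]))     # -> [2, 2, 6]
```

The serializability concern discussed above arises exactly because several
processors execute this loop concurrently on overlapping slices of WM.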
The theoretical foundation of PARS is described in:

  "A Parallel Asynchronous Distributed Production System", by Schmolze and
  Goel, Proceedings of the Eighth National Conference on Artificial
  Intelligence (AAAI-90), Boston, MA, July 1990.

Not all of the proofs are included in that paper, but the missing ones are
nearly identical to the proofs in the following paper, which builds upon
earlier work of Ishida and Stolfo and runs multiple rules in parallel in a
synchronous fashion on shared-memory machines:

  "Guaranteeing Serializable Results in Synchronous Parallel Production
  Systems", Technical Report 89-5, Dept. of Computer Science, Tufts
  University, October 1989.

(For those interested in shared-memory approaches with multiple rule firings,
I am in the midst of collecting benchmark measurements for a variety of
algorithms. This work should be completed by February 1.)

The loss of a conflict resolution strategy in PARS (and in pretty much all
similar work) is unfortunate but necessary. To make up for some of its
impact, in PARS I have extended OPS5 slightly to include a special provision
for forming rule groups. Basically, only the rules in the "current" group can
be executed. This addresses the main problem that the loss of a conflict
resolution strategy causes.

The PARS system itself is in the midst of being completed. It will run on the
NCUBE hypercube and is based on the C version of serial OPS5 written by
Miranker et al. at U. Texas, which is called OPS5C. The only dependence on
NCUBE (versus, say, Intel) is the manner of starting it up and of sending
messages, so it could be ported fairly easily.

2. Work that is similar to my PARS work but done independently is that of
Ishida and Yokoo (of NTT Labs in Japan) and Gasser (of USC). Their system
also addresses the problem of running OPS5 on a distributed-memory machine in
an asynchronous fashion.
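The rule-group extension described above amounts to filtering the conflict
set by a current-group tag before selection. A hypothetical sketch (the names
and representation are mine, not those of PARS):

```python
# Hypothetical sketch of the rule-group idea: only instantiations of
# rules tagged with the current group survive into selection.

def restrict_to_group(conflict_set, current_group):
    """Keep only instantiations whose rule belongs to the current group."""
    return [(rule, wme) for (rule, wme) in conflict_set
            if rule.get("group") == current_group]

rules = [
    {"name": "init", "group": "setup"},
    {"name": "step", "group": "run"},
]
cs = [(rules[0], "w1"), (rules[1], "w2")]
print(restrict_to_group(cs, "setup"))   # only the "init" instantiation
```

Because the restriction is a property of each rule, every processor can apply
it locally without extra communication, which is what makes it a cheap
substitute for a global conflict resolution strategy.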
The resulting system is very similar to PARS except that:

 - it does not prevent deadlocks, nor problems arising from temporary WM
   inconsistencies,
 - it offers a dynamic distribution algorithm, whereas PARS offers a static
   distribution algorithm.

The best parts of PARS and this system can easily be combined, something that
Ishida and I have discussed. As far as I know, Ishida et al. are not
currently constructing an implementation.

Contact: shochi.ishida@ntt-20.ntt.jp

3. Work on distributed-memory machines is also being pursued at U. Texas by
Miranker et al. At my last reading of their work, they were still in the
midst of designing their system. Most of their results at that time concerned
rewriting rule sets so as to take advantage of parallel hardware. (Of course,
this group has a host of other results, from TREAT to lazy serial matching
algorithms and so on.)

Contact: miranker@cs.utexas.edu

That's all for now. If you wish to contact me, please do so directly, as I am
not a regular member of this bulletin board.

Jim Schmolze
Dept. of Computer Science     Phone:    617/381-3681
Tufts University              Internet: schmolze@cs.tufts.edu
Medford, MA 02155 USA
Anurag.Acharya@dravido.soar.cs.cmu.edu (01/10/91)
Distributed-memory machines have not been popular platforms for implementing
Rete and other production system match algorithms. There have been several
efforts to place production systems on distributed-memory machines, but most
of them were either not pursued to the point of actual implementation or did
not yield encouraging results. Here is a list of some of the projects that I
know about, with a reference for each:

1. The Dado project at Columbia: Salvatore Stolfo, Dan Miranker [1]
2. The Non-Von project at Columbia: Bruce Hillyer and David Shaw [2]
3. The Pesa-1 project at the University of Kaiserslautern, Germany: Franz
   Schreiner and Gerhard Zimmermann [3]
4. The Rubic project at the University of Southern California: Dan
   Moldovan [4]
5. The Cupid project at the University of Waterloo: Michael Kelly and
   Rudolph Seviora [5]
6. Mapping Rete onto a dynamically reconfigurable tree-structured
   architecture: BT Smith and D Middleton [6]
7. Mapping Rete onto the Connection Machine using Oflazer's algorithm: Rex
   Flynn [7]
8. Compiling constant productions into Activity Flow Networks and mapping
   them onto the Connection Machine: Guy Blelloch [8]
9. Mapping Rete onto the MIT tagged-token dataflow architecture: Jean-Luc
   Gaudiot, Sukhan Lee and Andrew Sohn [9]

In addition, there was some work done by a couple of Japanese researchers,
published in a large-scale database/knowledge-base conference -- the names
and details of the work escape me at the moment, but I have a hazy
recollection that the approach resembled that taken by the Cupid project.

Most of these efforts were based on custom hardware. Milind Tambe and Anoop
Gupta published a paper in AAAI that described a mapping for fine-grained
stock hardware, i.e., hardware with no special support for production
systems [10]. Milind Tambe and I followed up on this: we modified the
original mapping, which was intended for fine-grained machines, for
medium-grained machines (that is, machines with 2-25 processors).
We performed simulations to evaluate the mapping and to study the effect of
varying the computation/communication ratio on the speedups. The simulations
assumed a crossbar interconnection network and Sparc-based processing
elements. Results indicated that it was possible to achieve reasonable
speedups (where reasonable means between 4- and 10-fold) on such machines.
This work was reported at ICPP'89 [11]; a more detailed version is soon to be
published in the IEEE Transactions on Parallel and Distributed Systems.

I had some communication with Bill Harding, a grad student at the Air Force
Institute of Technology (at Wright-Patterson Air Force Base), about placing
production systems on hypercubes. He was working with Gary Lamont, who is on
the faculty of the institute. However, I have not been able to get in touch
with him since Feb last or thereabouts.

Attractive as they might seem for parallelization, production systems have
not been effectively parallelized. To the best of my knowledge, only two
parallel implementations have been made available: ParaOPS5 and CParaOPS5.
Both of these run on shared-memory machines and are available from the School
of Computer Science, Carnegie Mellon University.
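A recurring ingredient in the mappings listed above is deciding which
processor owns which productions, and routing each WM change only to the
processors that could possibly match it. The following is a toy sketch of
that idea under assumptions of my own (round-robin static assignment,
class-based routing) -- it is not the algorithm of any particular system
named here:

```python
# Toy sketch: static partitioning of productions across processors, plus
# routing of a WM change to only the processors whose productions test
# that change's class.  Names and representation are hypothetical.

def partition(productions, num_procs):
    """Round-robin static assignment of productions to processors."""
    buckets = [[] for _ in range(num_procs)]
    for i, prod in enumerate(productions):
        buckets[i % num_procs].append(prod)
    return buckets

def route(wme, buckets):
    """Return the processor ids that must run a local match on this WME;
    the rest can ignore the change, which is the source of parallelism."""
    return [p for p, prods in enumerate(buckets)
            if any(wme["class"] in prod["classes"] for prod in prods)]

productions = [
    {"name": "p1", "classes": {"goal"}},
    {"name": "p2", "classes": {"block"}},
    {"name": "p3", "classes": {"goal", "block"}},
]
buckets = partition(productions, 2)         # proc 0: p1, p3; proc 1: p2
print(route({"class": "goal"}, buckets))    # -> [0]
print(route({"class": "block"}, buckets))   # -> [0, 1]
```

The computation/communication ratio studied in the simulations above is
essentially the cost of the local match work triggered by `route` relative to
the cost of delivering the WME to those processors.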
[1] Salvatore Stolfo and Dan Miranker, "The Dado Production System Machine",
    Journal of Parallel and Distributed Computing (JPDC), vol 3, 1986,
    pp 269-296
[2] Bruce Hillyer and David Shaw, "Execution of OPS5 Production Systems on a
    Massively Parallel Machine", JPDC, vol 3, 1986, pp 236-268
[3] Franz Schreiner and Gerhard Zimmermann, "Pesa-1 - A Parallel Architecture
    for Production Systems", ICPP'87, pp 167-169
[4] Dan Moldovan, "RUBIC: A Multiprocessor for Rule-based Systems", IEEE
    Transactions on Systems, Man and Cybernetics, vol 19 (4), Jul/Aug 89
[5] Michael Kelly and Rudolph Seviora, "Performance of OPS5 Matching on
    CUPID", Microprocessing and Microprogramming, vol 27, 1989, pp 397-404
[6] BT Smith and D Middleton, "Exploiting Fine-grained Parallelism in
    Production Systems", National Conf of the Canadian Society for the
    Computational Studies of Intelligence, 1987, pp 262-270
[7] Rex Flynn, "Placing Soar on the Connection Machine", AAAI workshop on
    "How Slow Elements can Think so Fast"
[8] Guy Blelloch, "CIS: A Massively Concurrent Rule-based System", probably
    in AAAI -- I have the paper but don't have the reference
[9] Jean-Luc Gaudiot, Sukhan Lee and Andrew Sohn, "Data-driven
    Multiprocessor Implementation of the Rete Match Algorithm", ICPP'88
[10] Anoop Gupta and Milind Tambe, "Suitability of Message Passing Computers
     for Implementing Production Systems", NCAI'88, pp 687-692
[11] Anurag Acharya and Milind Tambe, "Production Systems on Message Passing
     Computers: Simulation Results and Analysis", ICPP'89

anurag

ps: Steve, I tried to get this message to you directly, but my mail either
hung in the mail queue or bounced right back at me after a couple of days.