CSL.GERLACH@SU-SIERRA.ARPA (Sharon Gerlach) (02/10/86)
From: Sharon Gerlach <CSL.GERLACH@SU-SIERRA.ARPA>

On Friday, Feb 21, Anoop Gupta, a CSL faculty candidate from CMU, will be speaking on "Parallelism in Production Systems" in MJH 352 at 3:15.

                    Parallelism in Production Systems

                              Anoop Gupta
                     Department of Computer Science
                      Carnegie-Mellon University
                         Pittsburgh, PA 15213

Production systems (or rule-based systems) are widely used in Artificial Intelligence for modeling intelligent behavior and building expert systems. Most production-system programs, however, are extremely computation intensive and run quite slowly. The slow speed of execution has prohibited the use of production systems in domains requiring high performance and real-time response. The talk will elaborate on the role of parallelism in the high-speed execution of production systems.

On the surface, production-system programs appear capable of using large amounts of parallelism -- it is possible to perform the match for each production in a program in parallel. Our research shows, however, that in practice the speed-up obtainable from parallelism is quite limited, around 10-fold, as compared to initial expectations of 100-fold to 1000-fold. The main reasons for the limited speed-up are: (1) only a small number of productions are affected (require significant processing) as a result of a change to working memory, and (2) there is a large variation in the processing requirements of these productions.

Since the number of affected productions is not controlled by the implementor of the production-system interpreter (it is governed mainly by the author of the program and the nature of the problem), the solution to the problem of limited speed-up is to decrease the variation in the processing cost of affected productions. We propose a parallel version of the Rete algorithm which exploits parallelism at a very fine grain to reduce this variation. We further suggest that, to exploit the fine-grained parallelism, a shared-memory multiprocessor with 32-64 high-performance processors should be used. For scheduling the fine-grained tasks, each consisting of about 50-100 instructions, a hardware task scheduler is proposed.

The results presented in the talk are based on simulations of a large set of production systems exploiting different sources of parallelism. The simulation results show that using the suggested multiprocessor architecture (with individual processors performing at 2 MIPS), it is possible to obtain execution speeds of 5000-27000 working memory element changes per second. This corresponds to a speed-up of 5-fold to 27-fold over the best known sequential implementation running on a 2 MIPS processor. This performance is also higher than that obtained by other proposed parallel implementations of production systems.
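
For readers who would like a concrete picture before the talk, the short Python sketch below contrasts the two granularities of match parallelism the abstract discusses: one task per production versus many small tasks per production. It is purely illustrative -- it is not Gupta's parallel Rete algorithm, it omits the join work that Rete performs, and the rule format, names, and data are made-up assumptions for the example.

# Illustrative only: contrasts coarse-grained match parallelism (one task per
# production) with the finer-grained decomposition the abstract argues for
# (many small tasks per production). The real proposal is a parallel Rete
# algorithm on a 32-64 processor shared-memory machine with a hardware task
# scheduler; Python threads merely stand in for that here.

from concurrent.futures import ThreadPoolExecutor

# Working memory: a set of tuples (working memory elements).
working_memory = {
    ("monkey", "at", "door"),
    ("ladder", "at", "window"),
    ("banana", "at", "ceiling"),
}

# Hypothetical rule format: a production is a name plus a list of condition
# elements (patterns); "?x" denotes a wildcard. Binding consistency between
# condition elements -- the join work done by Rete's two-input nodes -- is
# ignored to keep the sketch short.
productions = [
    ("go-to-ladder", [("ladder", "at", "?place"), ("monkey", "at", "?where")]),
    ("grab-banana",  [("monkey", "on", "ladder"), ("banana", "at", "ceiling")]),
]

def pattern_matches(pattern, wme):
    """True if one condition element matches one working memory element."""
    return all(p.startswith("?") or p == w for p, w in zip(pattern, wme))

def match_production(prod, wm):
    """Coarse-grained task: the entire match for one production. Speed-up at
    this grain is limited because few productions are affected by a working
    memory change and their match costs vary widely."""
    name, patterns = prod
    satisfied = all(any(pattern_matches(p, wme) for wme in wm) for p in patterns)
    return name if satisfied else None

def coarse_match(wm):
    """One parallel task per production; collect the conflict set."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda prod: match_production(prod, wm), productions)
    return [name for name in results if name is not None]

def condition_task(pattern, wm):
    """Fine-grained task: test a single condition element. In the proposed
    architecture such ~50-100 instruction tasks would be dispatched by a
    hardware scheduler, evening out the per-production cost variation."""
    return any(pattern_matches(pattern, wme) for wme in wm)

def fine_grained_match(wm):
    """Many small parallel tasks per production; AND the per-condition
    results to form the conflict set."""
    with ThreadPoolExecutor() as pool:
        flags = {
            name: list(pool.map(lambda p: condition_task(p, wm), patterns))
            for name, patterns in productions
        }
    return [name for name, oks in flags.items() if all(oks)]

if __name__ == "__main__":
    print("coarse conflict set:", coarse_match(working_memory))
    print("fine conflict set:  ", fine_grained_match(working_memory))
    # Both print ['go-to-ladder']; only the task granularity differs.

Both routines compute the same conflict set; the point of the contrast is that the second decomposition produces many small, similarly sized tasks, which is the property the proposed parallel Rete exploits to reduce the variation in per-production match cost.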