E1AR0002@SMUVM1.BITNET (02/25/86)
Bibliography of Technical Reports
Computer Science Department
University of Wisconsin
July 1985 - December 1985

For copies, send requests to:
Technical Reports Librarian
Computer Science Department
University of Wisconsin
1210 W. Dayton St.
Madison, WI 53706

%A Honesty Cheng Young %T Evaluation of a Decoupled Computer Architecture and the Design of a Vector Extension %D July 1985 %R TR 603 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: To meet increasing demands for computing power, systems must exploit parallelism at all levels. A balance between the processing speeds of a machine's subsystems is critical for overall system performance. A high performance system must have a fast scalar mode as well as a vector mode which reduces the fraction of work that the scalar mode must process. PIPE (Parallel Instructions and Pipelined Execution) is a very high performance computer architecture intended for a heavily pipelined VLSI implementation. The primary goal of the PIPE architecture is fast execution of scalar programs. PIPE has several novel features in order to meet this goal. These features include architectural queues which reduce the effective memory accessing delay; a prepare-to-branch instruction which decreases the penalty incurred by branches; and a decoupled execution mode which increases the instruction initiation rate. In this study, a compiler (for a subset of Pascal) has been developed to evaluate the effectiveness of these architectural features. A code scheduler takes advantage of the architectural queues and the prepare-to-branch instruction. Software pipelining (which can prefetch operands across loop boundaries) also takes advantage of these features. A code generator generates two instruction streams (of sequential tasks) that run on the two processors required for PIPE's decoupled mode. A queue-based vector extension is proposed. 
This extension has the following characteristics: (1) two level control used to support out-of-order instruction initiation, (2) multiple classes of registers, (3) a very short vector start-up time (one clock period), (4) branch instructions used only for implementing high level language control structures, (5) elements of a queue may be read repeatedly, and (6) easily vectorizable property. We demonstrate that the original PIPE architecture supports fast execution of scalar programs, and that the vector extension facilitates vectorization. %A Carl deBoor %A Klaus Hollig %A Sherman Riemenschneider %T Convergence of Cardinal Series %D August 1985 %R TR 604 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: The result of this paper is a generalization of our characterization of the limits of multivariate cardinal splines. Let &~~M sub n~~& denote the &n&-fold convolution of a compactly supported function &~~M~ epsilon ~L sub 2 (R sup d )~~& and denote by .EQ S sub n~:=~"{"sum from {j epsilon Z sup d}~c(j)M sub n ( cdot ~-~j)~:~c~ epsilon ~l sub 2 (Z sup d )"}" .EN the span of the translates of &~~M sub n&. We prove that there exists a set &~~OMEGA~~& with &~~vol sub d ( OMEGA )~=~(2 pi ) sup d~~& such that for any &~~f epsilon ~L sub 2 (R sup d ),& .EQ dist(f, S sub n )~->~0~~as~~n~->~ inf , .EN if and only if the support of the Fourier transform of &f& is contained in &~~OMEGA bar&. %A Tobin J. Lehman %A Michael J. Carey %T A Study of Index Structures for Main Memory Database Management Systems %D July 1985 %R TR 605 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: One approach to achieving high performance in a database management system is to store the database in main memory rather than on disk. One can then design new data structures and algorithms oriented towards making efficient use of CPU cycles and memory space rather than minimizing disk accesses and using disk space efficiently. 
In this paper we present some results on index structures from an ongoing study of main memory database management systems. We propose a new index structure, the T-tree, and we compare it to existing index structures in a main memory database environment. Our results indicate that the T-tree provides good overall performance in main memory. %A O. L. Mangasarian %A T.-H. Shiau %T Error Bounds for Monotone Linear Complementarity Problems %D July 1985 %R TR 606 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: We give a bound on the distance between an arbitrary point and the solution set of a monotone linear complementarity problem in terms of a condition constant which depends on the problem data only and a residual function of the violations of the complementarity problem conditions by the point considered. When the point satisfies the linear inequalities of the complementarity problem, the residual consists of the complementarity condition &~~x(Mx~+~q)~~& plus its square root:&~~(x(Mx~+~q)) sup half &. This latter term is essential; without it, the error bound cannot hold. We also show that another natural residual that has been employed to bound errors for strictly monotone linear complementarity problems fails to bound errors for the monotone case considered here. %A Pudukkottai K. Subramanian %T Iterative Methods of Solution for Complementarity Problems %D June 1985 %R TR 607 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: Many problems in optimization theory such as linear programming, quadratic programming and problems arising in diverse areas such as economic equilibria, electronic circuit simulation and free boundary problems can be formulated as complementarity problems. It is well known that where the matrices involved are large sparse matrices, the usual pivoting methods are not very efficient and sometimes even fail. 
This thesis is a contribution to the ongoing research in the area of iterative methods for the solution of linear and nonlinear complementarity problems. We begin by considering complementarity problems where the operators are monotone and consider their Tihonov regularizations. We obtain bounds for the solutions of the perturbed problems and, in particular, estimates of the rate of growth of these solutions. In the case of linear complementarity problems with positive semidefinite matrices, these results reduce the solution of the LCP to the solution of a sequence of positive definite symmetric quadratic programs. We propose SOR (successive overrelaxation) algorithms to solve these subproblems. In the case of complementarity problems with nonempty interior, we show that given any preassigned feasibility tolerance our algorithm solves the problem by solving a Tihonov regularization problem for a single value of the parameter. We consider monotone complementarity problems as fixed point problems. We prove convergence of iterative algorithms which find fixed points of carefully constructed projection maps using summability methods. Since the solution of the nonlinear complementarity problem is equivalent to the solution of a system of nonlinear equations, we develop damped Gauss-Newton methods which, under appropriate hypotheses, solve this system with a local quadratic convergence rate. We present computational results which show that the use of SOR methods in conjunction with the Gauss-Newton methods is computationally effective. %A David P. Anderson %A Lawrence H. Landweber %T A Grammar Based Methodology for Protocol Specification and Implementation %D July 1985 %R TR 608 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: A new methodology for specifying and implementing communication protocols is presented. 
This methodology is based on a formalism called "Real-Time Asynchronous Grammars" (RTAG), which uses a syntax similar to that of attribute grammars to specify allowable message sequences. In addition, RTAG provides mechanisms for specifying data-dependent protocol activities, real-time constraints, and concurrent activities within a protocol entity. RTAG encourages a top-down approach to protocol design that can be of significant benefit in expressing and reasoning about highly complex protocols. As an example, an RTAG specification is given for part of the Class 4 ISO Transport Protocol (TP-4). Because RTAG allows protocols to be specified at a highly detailed level, major parts of an implementation can be automatically generated from a specification. An RTAG parser can be written which, when combined with an RTAG specification of a protocol and a set of interface and utility routines, constitutes an implementation of the protocol. To demonstrate the viability of RTAG for implementation generation, an RTAG parser has been integrated into the kernel of the 4.2 BSD UNIX operating system, and has been used in conjunction with the RTAG TP-4 specification to obtain an RTAG-based TP-4 implementation in the DoD Internet domain. %A Seymour V. Parter %T On the Distribution of the Singular Values of Toeplitz Matrices %D August 1985 %R TR 609 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: In 1920, G. Szego proved a basic result concerning the distribution of the eigenvalues &"{"lambda sub j sup (n) "}"~~& of the Toeplitz sections &~~T sub n [f]~~& where &f( THETA )~ epsilon ~L sub inf (- pi ,~ pi )~~& is a real-valued function. Simple examples show that this result cannot hold in the case where &~~f( THETA )~~& is not real valued. 
In this note, we give an extension of this theorem for the singular values of &T sub n [f]~~& when &~~f( THETA )~=~f sub 0 ( THETA ) R sub 0 ( THETA )~~& with &~~f sub 0 ( THETA )~~& real-valued and &R sub 0 ( THETA )~~& continuous, periodic (with period &2 pi )~& and &~~|R sub 0 ( THETA )|~=~1&. In addition, we apply the basic theorem of Szego to resolve a question of C. Moler. %A Seymour V. Parter %T On an Estimate for the Three-Grid MGR Multigrid Method %D August 1985 %R TR 610 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: The MGR &[ nu ]& algorithm of Ries, Trottenberg and Winter, Algorithm 2.1 of Braess and Algorithm 4.1 of Verfurth are all algorithms for the numerical solution of the discrete Poisson equation based on red-black Gauss-Seidel smoothing iterations. In this work we consider the extension of the MGR[0] method to the general diffusion equation &~~-\(gr cdot p \(gr u~=~f&. In particular, for the three grid scheme we extend an interesting and important result of Ries, Trottenberg and Winter whose results are based on Fourier analysis and hence intrinsically limited to the case where &~~OMEGA~~& is a rectangle. Let &~~OMEGA~~& be a general polygonal domain whose sides have slope &~~+- 1~,~~0~~& and &~~inf .~~& Let &~~epsilon sup 0~~& be the error before a single multigrid cycle and let &~~epsilon sup 1~~& be the error after this cycle. Then &|| epsilon sup 1 || sub L sub h ~<= ~ 1 over 2 (1+Kh)|| epsilon sup 0 || sub L sub h~~& where &||~~|| sub L sub h~~& denotes the energy or operator norm. When &~~p(x,y)~==~& constant, then &~~K~==~0&. %A Aaron Jonathan Gordon %T Ordering Errors in Distributed Programs %D August 1985 %R TR 611 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: With processors becoming small and inexpensive, many researchers attempt to decrease program runtime by combining processors into a multicomputer and running programs distributed over these processors. 
Debugging in a distributed environment is different from debugging on a uniprocessor. On a uniprocessor, the order in which a process's events occur is deterministic. In a distributed environment events occur concurrently on different processors. The order in which events occur cannot be easily determined; a program that works correctly one time may fail subsequently if the timing between processors changes. Traditional methods of debugging (such as putting in print statements and recompiling the program or recompiling the program with a debug flag on) are inadequate since they change the program and therefore change the timing. For this research, I have investigated distributed program bugs that depend on the relative order between events. These ordering errors include events which always occur in the wrong order and events whose order of occurrence is time-dependent. In this research, I characterize these timing errors and misorderings and show necessary conditions for their occurrence. Using my model of a distributed system, I prove which features can be used in combination to avoid ordering errors. I use these results to make suggestions to those writing distributed programs, developing distributed programming languages and designing distributed operating systems. I then explain drawbacks to preventing ordering errors and show ways to detect them as they occur. Finally, I describe a tool (called TAP) to aid the programmer in discovering the causes of ordering errors in running programs. I also show that TAP is useful in finding other types of distributed program bugs. %A David P. Anderson %T A Grammar-Based Methodology for Protocol Specification and Implementation %D September 1985 %R TR 612 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: A new methodology for specifying and implementing communication protocols is presented. 
This methodology is based on a formalism called "Real-Time Asynchronous Grammars" (RTAG), which uses a syntax similar to that of attribute grammars to specify allowable message sequences. In addition, RTAG provides mechanisms for specifying data-dependent protocol activities, real-time constraints, and concurrent activities within a protocol entity. RTAG encourages a top-down approach to protocol design that can be of significant benefit in expressing and reasoning about highly complex protocols. As an example, an RTAG specification is given for part of the Class 4 NBS Transport Protocol (TP-4). Because RTAG allows protocols to be specified at a highly detailed level, major parts of an implementation can be automatically generated from a specification. An RTAG parser can be written which, when combined with an RTAG specification of a protocol and a set of interface and utility routines, constitutes an implementation of the protocol. To demonstrate the viability of RTAG for implementation generation, an RTAG parser has been integrated into the kernel of the 4.2 BSD UNIX operating system, and has been used in conjunction with the RTAG TP-4 specification to obtain a working TP-4 implementation in the DoD Internet environment. %A Barton P. Miller %A Cui-qing Yang %T Measuring Distributed Programs: A Hierarchical and Interactive Approach %D September 1985 %R TR 613 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: We have designed an interactive tool, called IPS, for performance measurement and analysis of distributed programs. IPS is based on a hierarchical model of distributed computation which maps the program's behavior to different levels of abstraction. The hierarchical model unifies performance data from the whole program level down to procedure and statement level. Users are able to maneuver through the hierarchy and analyze the program measurement results at various levels of detail. 
IPS allows the programmer to interactively evaluate the performance history of a distributed program. Because of the regular structure of the hierarchy, IPS can provide information to guide the programmer to the cause of performance bottlenecks. Critical path analysis is used, in conjunction with simple performance metrics, to direct the programmer in identifying bottlenecks. %A Yaoshuang Qu %A Lawrence H. Landweber %A Miron Livny %T Performance Modeling and Optimization of the PaRing %D September 1985 %R TR 614 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: This paper introduces the PaRing, a new token ring architecture, and studies analytical approaches for modeling and optimizing its performance. The PaRing is distinguished from earlier ring architectures by its ability to support concurrent multiple communication paths on a single loop in an efficient and fair manner. A multiple server queueing network model, in which a communication path is viewed as being served by a server, is used to determine packet waiting time for symmetric traffic. The key to the modeling is to convert the multiple system into multiple independent and identical single server systems. The model is verified by simulation. The performance of the PaRing depends on the number of concurrent paths. An optimization method is developed to maximize the number of concurrent paths. %A Klaus Hollig %A Charles A. Micchelli %T Divided Differences, Hyperbolic Equations and Lifting Distributions %D September 1985 %R TR 615 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: We discuss the relationship between divided differences, fundamental functions of hyperbolic equations, multivariate interpolation and polyhedral splines. %A Yaoshuang Qu %A Lawrence H. 
Landweber %A Miron Livny %T PaRing: A Token Ring Local Area Network With Concurrency %D September 1985 %R TR 616 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: In this paper we introduce the PaRing, a token ring architecture. The PaRing is distinguished from earlier ring architectures by its ability to concurrently support multiple communication paths on a single loop in an efficient and fair manner. This is accomplished by allowing more than one station to transmit a packet at a time. The result is higher throughput and shorter response time than is the case for the standard token ring or the register insertion ring. %A Koujuch Liou %T Design of Pipelined Memory Systems for Decoupled Architectures %D October 1985 %R TR 617 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: Design of memory systems for decoupled architectures is the theme of this dissertation. The decoupled architecture uses hardware queues to architecturally decouple memory request generation from algorithmic computation. This results in an implementation that has two separate instruction streams that communicate via hardware queues. Thus, performance is improved through parallelism and efficient memory referencing. Techniques for increasing memory bandwidth and algorithms for servicing memory requests are incorporated into the memory system designs within these two constraints: (1) the operands placed in the hardware queue must be in the correct order, and (2) the needed operands are the only operands that can be placed in the hardware queue. Techniques such as pipelining, interleaving, servicing requests out of arrival order, and cache memory are investigated. Two strategies for servicing memory requests are studied: (1) to service requests according to their priorities, and (2) to minimize the total request service time. 
For the first strategy, the priority of each request type is derived from the characteristics of memory references and possible bottlenecks during decoupled computations. The second strategy results in a request scheduling policy, Free-Module-Request-First, that is proven to minimize the total request service time. A sequence control scheme must be used with the Free-Module-Request-First scheduling policy in order to deliver the memory outputs to the hardware queue in the correct order. This sequence control scheme is also used to track cache hits and misses, so that a data cache can be implemented in the memory system without difficulty. The designed data cache can not only support flexible fetch and replacement cache algorithms, but can also detect memory access hazards and short-circuit the Read-After-Write requests. Therefore, the penalty of memory access hazards can be greatly reduced. The combination of the designed data cache and the pipelined interleaved memory system using the Free-Module-Request-First scheduling policy results in a high-performance memory system that is capable of servicing memory requests with nearly no conflict delay under the particular workload defined in the trace files. %A Mary K. Vernon %A Mark A. Holliday %T Performance Analysis of Multiprocessor Cache Consistency Protocols Using Generalized Timed Petri Nets %D November 1985 %R TR 618 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: We use an exact analytical technique, based on Generalized Timed Petri Nets (GTPNs), to study the performance of shared bus cache consistency protocols for multiprocessors. We develop a general framework within which the key characteristics of the Write-Once protocol and four enhancements that have been combined in various ways in the literature can be identified and evaluated. We then quantitatively assess the performance gains for each of the four enhancements. We consider three levels of data sharing in our workload models. 
One of the enhancements substantially improves system performance in all cases. Two enhancements are shown to have negligible effect over the range of workloads analyzed. The fourth enhancement shows a small improvement for low levels of sharing, but shows more substantial improvement as sharing is increased, if we assume a "good access pattern". The effects of two architectural parameters, the blocksize and the main memory cycle time are also considered. %A Wei-Chung Hsu %T Register Allocation for VLSI Processors %D November 1985 %R TR 619 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: The performance of a single-chip CPU is often restricted by limited off-chip memory bandwidth. In this report, we study how to effectively use registers - one kind of on-chip memory - to reduce off-chip memory access. To use a limited number of registers efficiently, a good register allocation should handle spilling effectively. We study the minimization of Load/Store instructions in local register allocation. Both an optimal and an efficient heuristic algorithm are proposed for local register allocation. We also suggest the use of a trace optimization technique for global register allocation. %A Klaus Hollig %T Multivariate Splines %D December 1985 %R TR 620 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: Multivariate B-splines are defined as volume densities of convex polyhedra. Two special cases, simplex splines and box splines, give rise to natural generalizations of univariate spline functions. While simplex splines yield smooth piecewise polynomial spaces on fairly general triangular meshes, box splines correspond to regular triangulations and share many of the computationally attractive features of tensor products. In this paper, the basic properties of these new classes of spline functions are discussed as well as their application to surface approximation. 
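The Hsu report (TR 619) above studies minimizing Load/Store instructions when a limited register set forces spilling. As a rough sketch of the flavor of that problem (using a classic furthest-next-use spill heuristic, not the report's own optimal or heuristic algorithms, and with hypothetical names), a local allocator for a straight-line reference string can be written as:

```python
def allocate(refs, k):
    """Illustrative local register allocation for one basic block.

    refs: sequence of variable names in reference order.
    k: number of available registers.
    Returns (loads, stores), the spill traffic this allocation incurs.
    """
    in_regs = set()          # variables currently resident in registers
    loads = stores = 0
    for i, v in enumerate(refs):
        if v in in_regs:
            continue         # already resident: no memory traffic
        loads += 1           # must load v from memory
        if len(in_regs) == k:
            # Evict the resident variable whose next use is farthest away
            # (never-used-again variables are ideal victims).
            def next_use(u):
                for j in range(i + 1, len(refs)):
                    if refs[j] == u:
                        return j
                return len(refs)
            victim = max(in_regs, key=next_use)
            in_regs.discard(victim)
            stores += 1      # conservatively store the victim back
        in_regs.add(v)
    return loads, stores
```

For the reference string a b c a b d a with two registers, this sketch issues 5 loads and 3 stores; the count of such Load/Store instructions is the quantity that work like TR 619 seeks to minimize, e.g. by skipping stores of clean victims or choosing victims more cleverly.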
%A Noga Alon %A Amnon Barak %A Udi Manber %T On Disseminating Information Reliably Without Broadcasting %D December 1985 %R TR 621 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: A general scheme for collecting information in a multicomputer system is presented. The scheme allows data to be exchanged efficiently and quickly among many machines without broadcasting. In addition, certain operations can be performed on the data while it is being collected. Several applications to distributed computing are discussed. As a byproduct of the combinatorial part of the paper, a problem of Babai and Erdos in combinatorial group theory is solved. %A Carl de Boor %A Klaus Hollig %T B-splines Without Divided Differences %D December 1985 %R TR 622 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: This note develops the basic B-spline theory without using divided differences. Instead, the starting point is the definition of B-splines via recurrence relations. This approach yields very simple derivations of basic properties of spline functions and algorithms. %A David Kamowitz %T SOR and MGR[&nu&] Experiments on the Crystal Multicomputer %D December 1985 %R TR 623 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: This report describes implementations of the red/black SOR algorithm and of the MGR[&nu&] multigrid algorithm on the Crystal multicomputer. Rates of convergence and observed efficiencies for both algorithms are compared. %A Hongjun Lu %T Distributed Query Processing With Load Balancing in Local Networks %D December 1985 %R TR 624 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: This thesis presents a new approach to distributed query processing in locally distributed database systems, load-balanced query processing (LBQP), which integrates distributed query processing and load balancing. 
Several observations about previous research work in distributed query processing motivated this study. First, only a few query processing algorithms have been developed specifically for distributed databases based on local networks. Second, the use of multiple copies of data to improve performance in a distributed database system has not been addressed by most existing algorithms. Finally, and perhaps most importantly, existing query optimization algorithms have considered only the static characteristics of a distributed database. The algorithms reported here consider the dynamic load status of the system and the existence of multiple copies of data to provide better performance than is achievable with purely static planning techniques. The dynamic query allocation problem for a distributed database system with fully-replicated data was studied first using simulation. Two new heuristic algorithms were proposed for dynamically choosing a processing site for a newly arrived query in a local network environment. Both of these heuristics use knowledge about queries obtained during query optimization, such as their estimated CPU and I/O requirements. A simulation model was developed and used to study these heuristics and compare them to an algorithm that simply balances the number of jobs at each site. The results of this study indicate that knowledge of query processing requirements can be used effectively to improve overall system performance through dynamic query allocation. In order to obtain empirical results regarding distributed query processing in local area networks, a testbed was built using an experimental distributed system, the Crystal multicomputer, to conduct experiments on the performance of distributed join algorithms. Eight different distributed join methods were implemented using the testbed. Join queries with a variety of relation sizes, join selectivities, and join column value distributions were experimentally studied. 
The performance results obtained indicate that pipelined join methods outperform sequential methods over a wide range of join queries. It was also found that the communications costs in a local network environment are certainly not a dominant factor with respect to performance. A three-phase load-balanced query processing algorithm, Algorithm LBQP, was developed based on these experimental results and the results of the study of dynamic query allocation. This algorithm first statically generates a processing plan for a query in a locally distributed database system, ignoring the physical storage sites of the relations referenced by the query. A dynamic query unit allocation algorithm is then applied to the plan to determine the processing sites for each relation. Finally, specific processing methods for the distributed joins in the resulting plan are selected. A model of distributed database systems with partially-replicated data was used to investigate the performance of the dynamic query unit algorithm of LBQP. The results show significant improvements in performance, including improvements in both the mean waiting time for queries and the overall system throughput. %A Michael J. Carey %A Hongjun Lu %T Load Balancing in A Locally Distributed Database System %D December 1985 %R TR 625 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: Most previous work on query optimization in distributed database systems has focused on finding optimal or near-optimal processing plans based solely on static system characteristics, and few researchers have addressed the problem of copy selection when data is replicated. This paper describes a new approach to query processing for locally distributed database systems. Our approach uses load information to select the processing site(s) for a query, dynamically choosing from among those sites that have copies of relations referenced by the query. 
Query compilation is used to produce a statically-optimized logical plan for the query, and then a dynamic optimization phase converts this logical plan into an executable physical plan at run-time. This paper motivates the separation of static and dynamic optimization, presents algorithms for the various phases of the optimization process, and describes a simulation study that was undertaken to investigate the performance of this approach. Our simulation results indicate that load-balanced query processing can provide improvements in both query response times and overall system throughput as compared to schemes where execution sites are either statically or randomly selected. %A R. R. Meyer %T Trust Region Methods for Piecewise-Linear Approximation %D December 1985 %R TR 626 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: Trust region methods are analyzed for piecewise-linear approximation algorithms for linearly-constrained nonlinear optimization. Convergence to the optimal value is demonstrated for continuously differentiable convex objectives and for certain classes of nondifferentiable convex objectives. Computationally, this approach has the nice property that the approximation is generally more accurate than a linear approximation yet the subproblems to be solved at each iteration remain linear programs. The method is also well-suited to convex network optimization, since it preserves the network structure of the subproblems, thereby allowing the use of the very fast techniques available for linear network optimization. For problem classes such as traffic assignment in which the critical coupling between variables occurs in the objective function, the separability of the approximation makes possible a decomposition into independent subproblems that may be solved in parallel in a multiprocessing environment. %A W. Harry Plantinga %A Charles R. 
Dyer %T An Algorithm for Constructing the Aspect Graph %D December 1985 %R TR 627 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: In this paper we consider the size of aspect graphs, first in the convex case, and then in the general case. In particular, we give upper and lower bounds on the maximum size (including vertex labels) of &~ THETA (n sup 3 )~& and &~~ THETA (n sup 5 )~~& respectively, and algorithms for constructing the aspect graph which run in time &~~O (n sup 4 )~~& and &~~O (n sup 6 )~~&. The algorithm for the general case makes use of a new object representation called the aspect representation or asp. We also show a different way to label the aspect graph in order to save a factor of n in the asymptotic size and construction time (at the expense of retrieval time) in both the convex and general cases, and we suggest alternatives to the aspect graph which require less space and store more information. %A Sheldon Klein %T The Invention of Computationally Plausible Knowledge Systems in the Upper Paleolithic %D December 1985 %R TR 628 %I Computer Sciences Department, University of Wisconsin %C Madison, WI %X Abstract: The problem of computing human behavior by rules can become intractable with large scale knowledge systems if the human brain, like a computer, is a finite state automaton. The problem of making such computations at a pace fast enough for ordinary social interaction can be solved if appropriate constraints apply to the structure of those rules. There is evidence that systems of such constraints were invented in the Upper Paleolithic, and were of sufficient power to guarantee that the time necessary for computation of behavior would increase only linearly with increases in the size and heterogeneity of world knowledge systems. 
Fundamentally, there was just one type of computational invention, capable of unifying the full range of human sensory domains, and consisting of an analogical reasoning method in combination with a global classification scheme. The invention may have been responsible for the elaboration of language and culture structures in a process of co-evolution. The encoding of the analogical mechanism in iconic visual imagery and myth structures may have given rise to the phenomenon of Shamanism. The theory is testable, and one of its implications is that the structuralism of Levi-Strauss has an empirical foundation.