LAWS@SRI-AI.ARPA (11/26/84)
From: AIList Moderator Kenneth Laws <AIList-REQUEST@SRI-AI>

AIList Digest            Sunday, 25 Nov 1984      Volume 2 : Issue 161

Today's Topics:
  Benchmarking Reading Comprehension,
  Reviews - AI Abstracts & IEEE Computer & High Technology & Learnability,
  Humor - Brain Structure,
  Algorithms - Many-Body Problem & Macro Cacheing & Linear Programming,
  Seminar - Set Theoretic Problem Translation (CMU)
----------------------------------------------------------------------

Date: Sun 25 Nov 84 01:50:44-EST
From: Wayne McGuire <MDC.WAYNE%MIT-OZ@MIT-MC.ARPA>
Subject: Benchmarking Reading Comprehension

Does anyone know whether any objective standards or tests have been
devised for rating or benchmarking the power of natural language
understanding systems?

In the world of chess there is a standard international rating system for
players, one that can also be applied to chess-playing programs.  I think
it would be useful to devise a similar system for natural language
understanding software.  Such a benchmarking scheme would make it possible
to track the rate of progress in the most fundamental branch of
computational linguistics, and to compare the performance of competing
systems.  The National Bureau of Standards might be an appropriate
organization to oversee a project of this kind.

Perhaps such a benchmarking system could be based on the reading
comprehension sections of the SAT or GRE exams.  A GRE-style
multiple-choice test for natural language understanding would avoid
wrongly conflating the capacity to understand--to recognize propositions,
reason, and draw inferences--with a program's ability to answer questions
in well-formed discourse, a skill quite separate from pure comprehension.
It would be desirable to establish standard tests for every major language
in the world.

Is there an existing natural language understanding system that can read
even at the level of a third grader?  Probably not.  To the researcher or
research team that first designs (no doubt decades from now) a program
that consistently scores at least 700 on the reading comprehension
sections of standardized tests like the SAT or GRE, one might offer,
perhaps, a major cash prize.

------------------------------

Date: Sat 24 Nov 84 15:27:29-PST
From: Ken Laws <Laws@SRI-AI.ARPA>
Subject: AI Abstracts

Two ads for AI abstracts and indices have recently crossed my desk:

Scientific Datalink is offering a four-volume index to the AI research
reports that they offer in microfiche (one volume of authors and titles,
one of subjects, and two of abstracts).  The price is $375 now, $495 after
publication.  Individual volumes are not offered separately in this ad.
For more information, write to Ms. Chia Reinhard, Scientific Datalink,
850 Third Avenue, New York, NY 10022.  (212) 838-7200.

ECI/Intelligence is offering a new journal, Artificial Intelligence
Abstracts, at $295 for 1985.  (The mockup of the first issue is dated
October 1984, but the text consists of such gems as "Ut einim ad minim
veniam, quis nostrud exercitation laboris nisi ut aliquip ex ea commodo
consequet.")  The journal offers to keep you up to date on market
research, hardware and software developments, expert systems, financial
planning, and legislative activities.  There is a similar journal for
CAD/CAM.  The AI advisory board includes Harry Barrow, Michael Brady,
Pamela McCorduck, and David Shaw.  ECI/Intelligence also offers a
full-text document order service from their microfiche collection.
For more info, write to them at 48 West 38 Street, New York, NY 10018.

                                        -- Ken Laws

------------------------------

Date: Sat 24 Nov 84 14:51:17-PST
From: Ken Laws <Laws@SRI-AI.ARPA>
Subject: IEEE Computer Articles

AI is mentioned a few times in the November issue of IEEE Computer.  On
p. 114 there are excerpts from the keynote Compcon speech by Robert C.
Miller, a senior vice president of Data General.  He is touting expert
systems, and estimates that overall sales of AI-related products will
increase from $100-150 million this year to $2.5 billion by the end of the
decade.  P. 117 has a very short mention of NCAI and the coming IJCAI.
P. 133 has L. Elliot's review of Learning and Teaching with Computers, by
Tim O'Shea and John Self.  The book is evidently about half AI (Logo,
MYCIN, knowledge engineering, and epistemology) and half computer-assisted
learning (teaching styles, learning styles, tutoring strategies).  The
rest of the magazine is mostly about Teradata's database machines, the
PICT graphical programming language, workstations in local area networks,
and some overviews of software engineering at NEC and GTE.

                                        -- Ken Laws

------------------------------

Date: Sat 24 Nov 84 15:02:27-PST
From: Ken Laws <Laws@SRI-AI.ARPA>
Subject: High Technology Articles

The December issue of High Technology has some interesting articles for
computer folk.  On p. 9 there's a short item about Logicware's
(Hungarian-developed) MPROLOG, a "modular" PROLOG for IBM PCs and
mainframes, 68000-based micros, VAXen, and other hosts.  Other articles
review the best of current microcomputer equipment (chiefly PC-AT and
Macintosh), survey the field of visual flight and driving simulators, and
present an excellent introduction to database structures and machines
(esp. relational databases).

                                        -- Ken Laws

------------------------------

Date: Sun 25 Nov 84 15:25:50-PST
From: Ken Laws <Laws@SRI-AI.ARPA>
Subject: CACM Article on Learning

The November issue of CACM includes "A Theory of the Learnable", by
L.G. Valiant of Harvard.  I am not competent to evaluate the article,
which is based on theorems in computational complexity, but I can
characterize it as follows:

The author considers the class of concepts in propositional logic that can
be learned in a polynomial number of steps from a source of positive
examples (produced as required in accordance with a probability
distribution) and an oracle that can classify an arbitrary Boolean vector
as a positive or negative exemplar.  The classes that are found to be
learnable are 1) conjunctive normal form expressions with a bounded number
of literals in each clause (no oracle required), 2) monotone disjunctive
normal form expressions, and 3) arbitrary expressions in which each
variable occurs just once (no examples required, but the oracle must be
especially capable).  The method of learning used is such that the learned
concept may occasionally reject true exemplars but will not accept false
ones.

The closing remarks contain this interesting quote:

    An important aspect of our approach, if cast in its greatest
    generality, is that we require the recognition algorithm of the
    teacher and learner to agree on an overwhelming fraction of only the
    natural inputs.  Their behavior on unnatural inputs is irrelevant, and
    hence descriptions of all possible worlds are not necessary.

If followed to its conclusion, this idea has considerable philosophical
implications:  A learnable concept is nothing more than a short program
that distinguishes some natural inputs from some others.  If such a
concept is passed on among a population in a distributed manner,
substantial variations in meaning may arise.  More importantly, what
consensus there is will only be meaningful for natural inputs.  The
behavior of an individual's program for unnatural inputs has no relevance.
Hence thought experiments and logical arguments involving unnatural
hypothetical situations may be meaningless activities.

                                        -- Ken Laws
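To make the learning method concrete, here is a rough Python sketch of an
elimination-style learner for the bounded-clause CNF case, consistent with
the description above: start with every clause of at most k literals and
delete any clause that some positive example falsifies.  Every clause of
the target concept survives this pruning, so the hypothesis may reject
true exemplars but never accepts false ones.  (The representation and the
names below are invented for illustration; they are not taken from the
paper.)

    from itertools import combinations

    def learn_bounded_cnf(num_vars, k, positive_examples):
        # Hypothetical elimination-style learner for k-CNF, sketched from
        # the description above (not code from the paper).  Examples are
        # tuples of booleans; a clause is a set of (variable, value) literals.
        literals = [(i, v) for i in range(num_vars) for v in (True, False)]
        clauses = [frozenset(cl)
                   for size in range(1, k + 1)
                   for cl in combinations(literals, size)]

        def satisfies(example, clause):
            return any(example[i] == v for i, v in clause)

        # Delete every clause falsified by some positive example; clauses
        # of the target concept are satisfied by all positives, so they
        # survive the pruning.
        for ex in positive_examples:
            clauses = [cl for cl in clauses if satisfies(ex, cl)]

        # The hypothesis may reject unseen positives, but it accepts no
        # input the target concept rejects.
        def hypothesis(example):
            return all(satisfies(example, cl) for cl in clauses)

        return hypothesis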
------------------------------

Date: Tue 20 Nov 84 17:36:43-PST
From: Richard Treitel <TREITEL@SUMEX-AIM.ARPA>
Subject: quote {:-)

Attributed to Marvin Minsky (someone tell me if it's wrong):

    "I'll bet the human brain is a kludge."

                                        - Richard

------------------------------

Date: Sat 24 Nov 84 14:36:33-PST
From: Ken Laws <Laws@SRI-AI.ARPA>
Subject: Many-Body Problem

Need software to run 10,000-body simulations?  A VAX Pascal program is
discussed in the November CACM Programming Pearls column.  Optimization
brought the run time down from one year to one day.

                                        -- Ken Laws

------------------------------

Date: 22 Nov 84 17:20 PST
From: JonL.pa@XEROX.ARPA
Subject: Macro cacheing: Interlisp-D Interpreter as "malgorithm"?

Jay Ferguson suggests, in his contribution of 18 Nov 84 13:55:58-PST, that
the timing difference between using a FOR loop and using LDIFF in the
original "Interlisp-D malgorithm" arises because "Each time you call a FOR
statement interpretively the translation occurs."  This is not the case --
the Interlisp interpreter (in all implementations of Interlisp, I believe)
caches the results of any macro or CLISP expansion in a hash array called
CLISPARRAY; see section 16.8 of the Interlisp Reference Manual (Oct 1983).
In fact, the figures supplied by Jay show a speed difference of a factor
of 17, which would be consistent with the basic loop being compiled in
LDIFF (a "system" function) and interpreted in the FOR.

The question of "cacheing" noted above is a complicated one, and in Jay's
defense, I can say that it is not at all clearly outlined in the IRM.  For
example, the IRM lays the burden on the "in-core" structure editor of
examining CLISPARRAY on any change, to de-cache the expansions for any
code that is modified; but random modifications (caused by, say,
redefinition of a function upon which the expansion depends) don't cause
de-cacheing, and this is the source of some very obscure bugs.
Furthermore, lots of cruft may stick around forever because the garbage
collector does not reclaim items placed in the cache; for this reason, it
is advisable to occasionally do (CLRHASH CLISPARRAY).

MacLisp provides three options for macro expansion, controllable on a
macro-by-macro basis (CLISP is, unfortunately, a kludge dating to
pre-macro Interlisp days -- it could and should be implemented entirely as
a set of macros, so I will view it in that light for the rest of this
discussion): (1) do no cacheing; (2) "displace" the original list cell
containing the macro call with a form that contains both the original and
the expanded code [the compiler and interpreter use the expanded code, the
prettyprinter uses the original]; and (3) cache the expansion in a hash
array called MACROMEMO.  While all of these options can be run in any Lisp
that implements macros by giving the expansion function a pointer to "the
whole cell", Common Lisp provides the *macroexpand-hook* facility so that
the cacheing code common to all macros can be put in one place, rather
than distributed throughout the many macro bodies.

                                        -- JonL --
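The three expansion strategies can be mimicked with a toy Python model in
which a macro call is a nested list standing in for a Lisp cons cell.
This is an invented analogue for illustration only -- none of it is
Interlisp or MacLisp code -- but it shows where each strategy stores the
expansion and why a stale cache can outlive a macro redefinition:

    MACROS = {}   # macro name -> expander function (toy macro table)
    MEMO = {}     # id(call form) -> expansion; analogue of a hash-array cache

    def expand_every_time(form):
        # Option 1: no cacheing; re-expand on every interpretation.
        return MACROS[form[0]](form)

    def expand_displacing(form):
        # Option 2: overwrite the call site with a node holding both the
        # original form (for the prettyprinter) and the expansion (for eval).
        if form[0] == "DISPLACED":
            return form[2]
        expansion = MACROS[form[0]](form)
        original = list(form)
        form[:] = ["DISPLACED", original, expansion]
        return expansion

    def expand_memoized(form):
        # Option 3: cache the expansion in a table keyed on the call-site
        # cell.  Redefining the macro later does not invalidate old
        # entries, and entries are never reclaimed -- the pitfalls noted
        # above.
        key = id(form)
        if key not in MEMO:
            MEMO[key] = MACROS[form[0]](form)
        return MEMO[key]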
------------------------------

Date: 22 Nov 84 07:15:10 PST (Thu)
From: Carl Kaun <ckaun@aids-unix>
Subject: Linear Programming Algorithms

The recent discussion of the Karmarkar linear programming algorithm on
this newsgroup has stirred me to comment.  I wouldn't consider a
cubic-time linear programming algorithm to be any great deal; indeed, I
present one myself shortly.  An algorithm that solves the INTEGER linear
programming problem, however, is something else again.  My understanding
is that the Khachiyan algorithm solved the integer problem in polynomial
time.  If the Karmarkar algorithm does also, then it is truly worthy.  But
it has not been apparent from the popularized discussion I have been
reading that this is so.  Perhaps those on the net who are more in the
know can tell us whether it is or not.

Ever since I took Luenberger's course in linear and nonlinear programming
(Stanford EES department, 1973), I have wondered why people didn't apply
the powerful nonlinear tools to the linear problem.  I played around with
the idea and came up with an algorithm I call "gradient step linear
programming".  I've never done anything with it because it's so simple and
obvious that it seemed someone had to have thought of it before.  Because
the algorithm follows the gradient as best it can subject to the
constraints, from a geometric point of view it travels through the
"interior" of the polytope, much as has been described for the Karmarkar
algorithm.  Optimality is achieved in no more than N steps, each step
requiring O(N^2) numerical operations, where N is the dimension of the
space.

Mathematical notation isn't easy on a terminal.  I adopt the convention of
representing vectors by preceding them with an underline, as "_x".
Subscripts I represent using a colon, _c:j being the j-th vector _c.  The
inner product is represented by < * , * >.  I use a functional form
(i.e. f( )) to represent things like sums.  The rest should be fairly
obvious.

A statement of the linear programming problem, convenient for the
description of the algorithm, follows.  This problem statement is readily
converted into other forms of the linear programming problem.  The problem
is to maximize, with respect to the N-dimensional vector _x, the linear
functional:

        < _c , _x >

subject to the constraints:

        < _a:j , _x >  >=  b:j        for j = 1, 2, ..., M

The vector _c is often called the cost vector when a minimization problem
is being considered.  M >= N, as otherwise the solution is unbounded.

The procedure for finding an initial feasible vector _x(0) is essentially
identical to the procedure for finding an optimum vector.  For now, an
initial feasible vector _x(0) in the interior of the polytope defined by
the constraints may simply be assumed.  The initial gradient vector,
maximizing the rate of change subject to the active constraints, is
_c(0) = _c.

At each stage, the idea of the algorithm is to move along the current
gradient vector (subject to the active constraints) as far as possible,
until a previously inactive constraint is encountered.  The direction of
change is then modified by the most recent active constraint, until no
motion in the direction of the gradient is possible.  This is both the
formal and the obvious stopping condition for the algorithm.  The details
of the algorithm follow:

Step 1:  Given a current solution point _x(n), determine the step size s
(giving the next solution point _x(n+1) = _x(n) + s*_c(n)), identifying at
the same time the next active constraint.
        D:j = < _x(n) , _a:j > - b:j        ( >= 0 )

        s:j = D:j / < _c(n) , _a:j >

for j in the set of inactive constraints.  Then s = min { s:j }, and the
next active constraint has the index j(n) providing the minimum, which is
also the maximum feasible step size.

Step 2:  Apply the Gram-Schmidt procedure (i.e., projection) to remove the
component of the most recent active constraint from the gradient
direction, so that subsequent steps will not violate the active
constraint.  It is necessary first to remove all components of previous
active constraints from the newly activated constraint, to ensure that the
adjusted gradient direction will not violate any previous active
constraint.

        _a(n) = _a:j(n) - sum(k = 0 to n-1)[ _a(k) * < _a:j(n) , _a(k) >
                                                    / < _a(k) , _a(k) > ]

        _c(n+1) = _c(n) - _a(n) * < _c(n) , _a(n) > / < _a(n) , _a(n) >

Steps 1 and 2 are repeated until _c(n+1) = 0, at which point _x(n) is the
optimal solution to the linear programming problem.  Additional tests to
detect and recover from degeneracy are easily added.

A detailed proof of optimality is straightforward but somewhat tedious.
Intuitively, the algorithm is optimal because steps are always taken along
the direction maximizing the rate of change of the functional, subject to
the active constraints.  At the optimal point, there is no feasible motion
improving the functional.  Stated differently, the original cost vector
lies in the space spanned by the gradients of the constraints, and this is
the formal (Lagrange) optimization condition.  It is only necessary to add
constraints to the set of active constraints because the optimization
space is convex, and therefore changes in the functional improvement
direction (and reduction in the rate of improvement) result only from
encountering new constraints and having to turn to follow them.

Note that the number of iterations is simply the number of dimensions N of
the space, this being also the number of vectors required to span the
space.  Each iteration entails the removal of O(N) vector components from
the new constraint, and the removal of a vector component entails O(N)
multiplications and additions.  Similarly, determining the step size
requires the computation of O(N) inner products, each requiring O(N)
multiplications and additions.  Finding the initial feasible vector
requires about the same effort in general.  Thus, overall, the algorithm
presented for solving the linear programming problem requires O(N^3)
arithmetic operations.

An initial feasible point can be determined by starting from an arbitrary
point (say the origin), identifying the unsatisfied constraints, and
moving in directions that satisfy them.  It may be more direct to simply
start with a "superoptimal" point, say K*_c for suitably large K, and
iterate using essentially the previously described algorithm along the
negative constrained gradient directions to feasibility.  By duality, the
resulting feasible point will also be optimal for the original problem.

  Carl F. Kaun
  ckaun@aids-UNIX
  415/941-3912
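A direct transcription of the procedure into Python (with numpy) might
look like the sketch below.  It assumes a strictly feasible interior
starting point, handles the step-size signs explicitly, and omits the
degeneracy tests and the feasibility phase discussed above; the function
and variable names are invented for the sketch.

    import numpy as np

    def gradient_step_lp(c, A, b, x0, tol=1e-9):
        # Maximize <c, x> subject to A @ x >= b, starting from a strictly
        # feasible interior point x0.  Illustrative transcription of the
        # procedure described above; no degeneracy handling.
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        x = np.asarray(x0, dtype=float).copy()
        d = np.asarray(c, dtype=float).copy()   # constrained gradient direction
        ortho = []                              # orthogonalized active normals
        active = set()

        for _ in range(len(x)):                 # at most N iterations
            if np.linalg.norm(d) < tol:         # no improving feasible direction
                break
            # Step 1: largest step along d before an inactive constraint is hit.
            s, j_hit = np.inf, None
            for j in range(len(b)):
                rate = A[j] @ d
                if j not in active and rate < -tol:  # constraint tightens along d
                    s_j = (A[j] @ x - b[j]) / (-rate)
                    if s_j < s:
                        s, j_hit = s_j, j
            if j_hit is None:
                raise ValueError("unbounded along the current direction")
            x = x + s * d
            active.add(j_hit)
            # Step 2: Gram-Schmidt the new constraint normal against earlier
            # ones, then remove its component from the gradient direction.
            a = A[j_hit].copy()
            for q in ortho:
                a -= q * (a @ q) / (q @ q)
            if np.linalg.norm(a) > tol:
                ortho.append(a)
                d = d - a * (d @ a) / (a @ a)
        return x

For instance, with c = (1, 1), the constraints x <= 1 and y <= 1 written
as -x >= -1 and -y >= -1, and x0 = (0.1, 0.1), the sketch reaches the
corner (1, 1) in two iterations.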
------------------------------

Date: 21 November 1984  1014-EST
From: Staci Quackenbush@CMU-CS-A
Subject: Seminar - Set Theoretic Problem Translation (CMU)

[Forwarded from the CMU bboard by Laws@SRI-AI.]

Name:   Robert Paige
Date:   November 27, 1984
Time:   10:00 - 12:00
Place:  WeH 8220
Title:  "Mechanical Translation of Set Theoretic Problem Specifications
         into Efficient RAM Code"

Many computer problems can be expressed accurately and easily in the
following form: 'find the unique set s, a subset of t, satisfying
predicate k(s) and minimizing objective function f(s)'.  Although such
specifications are generally unsolvable, we can provide rather broad
sufficient conditions under which these formal problem statements can be
compiled into efficient procedural implementations with predictable time
and space complexities.  A toy version of such a compiler has been
implemented and used within the RAPTS transformational programming system.

Our methodology depends on three fundamental program transformations, two
of which resemble classical numerical techniques.  The first
transformation solves roots of set theoretic predicates by iterating to a
fixed point.  It turns an abstract functional program specification into a
lower level imperative form with emerging strategy.  The second
transformation is a generalized finite differencing technique.  It
implements program strategy efficiently by forming access paths and
incremental computations.  The third transformation is a top-down variant
of Schwartz's method of data structure selection by basings.  It replaces
sets and maps by conventional storage structures.  The method will be
illustrated using two examples -- graph reachability and digraph cycle
testing.

This is a special 2-hour lecture with a 10-minute break in the middle.

------------------------------

End of AIList Digest
********************