LAWS@SRI-AI.ARPA (11/30/84)
From: AIList Moderator Kenneth Laws <AIList-REQUEST@SRI-AI>


AIList Digest            Friday, 30 Nov 1984      Volume 2 : Issue 164

Today's Topics:
  Algorithms - Karmarkar Algorithm & Linear Programming,
  Seminars - Search Complexity & User Interfaces (IBM-SJ) &
    A Semantical Definition of Probability (CSLI Stanford) &
    Learning in Stochastic Networks (CMU)

----------------------------------------------------------------------

Date: 27 Nov 1984 17:19:38-EST (Tuesday)
From: S.Miller@wisc-rsch.arpa
Subject: Karmarkar Algorithm

[Forwarded from the SRI-AI bboard by Laws@SRI-AI.]

The Karmarkar algorithm was presented at STOC (Symposium on Theory of
Computing) on May 1, 1984 (STOC '84, p. 302), as "A New Polynomial-Time
Algorithm for Linear Programming".  The STOC proceedings are available
from the ACM if your location doesn't have them.

------------------------------

Date: Mon 26 Nov 84 17:33:08-PST
From: Walter Murray <OR.MURRAY@SU-SIERRA.ARPA>
Subject: Linear Programming Algorithms

[Forwarded from the Stanford bboard by CKaun@AIDS-UNIX.]

Some recent bboard messages have referred to linear programming.  The
algorithm by Karmarkar is almost identical to iteratively reweighted
least squares (IRLS).  The latter algorithm is used to solve
approximation problems in norms other than the l2 norm.  It can be
shown that the form of LP assumed by Karmarkar is equivalent to an
l-infinity approximation problem.  If that problem is then solved by
the IRLS algorithm, the estimates of the solution generated are
identical to those of the Karmarkar algorithm (assuming certain free
choices in the definition of the algorithms).  Perhaps it should be
added that the algorithm is not held in high regard in approximation
circles.  To solve an l-infinity problem, it is usually transformed to
an LP and solved using the simplex method.

A message from Kaun (forwarded by Malachi without a heading) described
an algorithm for LP which Kaun claimed requires O(n^3) work.
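[Murray does not spell out the IRLS iteration, but one classical
variant for the l-infinity (Chebyshev) approximation problem is
Lawson's reweighting scheme.  The sketch below is purely illustrative
-- the function name, the normalization, and the stopping test are
choices of this sketch, and it is not claimed to reproduce
Karmarkar's iterates:]

```python
import numpy as np

def lawson_irls_chebyshev(A, b, iters=200):
    """Sketch of Lawson's IRLS iteration for min_x || A @ x - b ||_inf.

    Each pass solves a weighted least-squares problem, then inflates
    the weights of the rows with the largest residuals.  Illustrative
    only: no safeguards for zero residuals or rank-deficient A.
    """
    m = A.shape[0]
    w = np.full(m, 1.0 / m)          # weights, kept normalized
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least-squares step: minimize sum_i w_i * r_i^2.
        x, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)
        r = np.abs(A @ x - b)
        if r.max() < 1e-12:          # exact fit; nothing left to balance
            break
        w = w * r                    # Lawson update: w_i <- w_i * |r_i|
        w /= w.sum()                 # ... renormalized
    return x
```

[For instance, fitting a constant to the data 0, 1, 4 drives the
weight on the middle point to zero and converges to the Chebyshev
solution 2, the midrange of the extremes.]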
It is easy to demonstrate that Kaun's algorithm may fail to converge
to the solution.  The following is a cross-section of a hole whose
sides consist of straight facets.  Water is poured into the hole from
the point x:

              x
     o                 o
       .             .
         o         o
         .         .
       .             .
     o                 o
       .             .
             o

The water hits a facet.  It continues to fall along that facet until
it reaches a second facet, at a vertex.  Unless the water is prepared
to leave the first facet it hit, it will not reach the bottom.

------------------------------

Date: 27 Nov 84 11:45:16 PST (Tue)
From: Carl Kaun <ckaun@aids-unix>
Subject: Linear Programming Algorithm

Murray is correct -- the algorithm as stated will not usually converge
to the solution.  One problem is that removing components from the
gradient does not automatically force it to zero after N steps, as I
asserted.  It looks to me as though the gradient-stepping idea can
still be used in a more complicated scheme, and that the computational
time for the algorithm will be O(M*N^2), where M is the number of
constraints.  But I want to verify the details better than I did for
my original message before saying more.

I still wonder whether finding such a solution to the continuous (as
opposed to the integer) linear programming problem has any
significance.

ckaun@aids-unix

------------------------------

Date: 27 Nov 84 22:31:30 PST (Tue)
From: Carl Kaun <ckaun@aids-unix>
Subject: Gradient Step Linear Programming (again)

Well, here we go again.  Let's see if this try stands up to scrutiny.
The claim is that the algorithm below (gradient-step linear
programming) solves the linear programming problem in at most
O(M^2*N + M*N^3) operations.  I still don't know if that has any
significance.

As before, the idea is to step, as well as one can subject to the
constraints, along the gradient.  The terminating conditions are
similar to those of the algorithm given previously.

As before, the mathematical notation represents vectors by preceding
them with an underline, as "_x".  Subscripts are represented using a
colon, _c:j being the j-th vector _c.
The inner product is represented by < * , * >.  A functional form
(i.e., f( )) is used to represent things like sums.  The rest should
be fairly obvious.

The statement of the linear programming problem is also as before:
maximize, with respect to the N-dimensional vector _x, the linear
functional

    <_c , _x>

subject to the constraints

    <_a:j , _x> >= b:j     for j = 1, 2, ..., M

with M >= N, as otherwise the solution is unbounded.

Assume for the moment an initial feasible vector _x(0) in the interior
(so that there are initially no active constraints) of the polytope
defined by the constraints.  Initially _c:0 = _c, and all constraints
are potentially active.

A.  From the current solution point _x(n), find the constraint
limiting motion in the direction _c:n, and the maximum feasible step
size s > 0 giving the next solution point:

    _x(n+1) = _x(n) + s*_c:n

For j = 1, 2, ..., M, with j not a currently active constraint,
compute

    D:j = <_x(n), _a:j> - b:j     ( >= 0 )
    s:j = - D:j / <_c:n, _a:j>

    s = min { s:j | s:j > 0 }

and the next active constraint has the index j(n) providing the
minimum.

B.  The next step is to compute a movement direction aligned with the
gradient (thus enabling improvement in the functional) that also
satisfies the active constraints.  The first active constraint was
identified in the previous step, thus:

    _c(0) = _c - _a:j(n) * <_c, _a:j(n)> / <_a:j(n), _a:j(n)>

C.  Next determine which of the constraints active in the previous
cycle are active in this step, and modify the movement direction
accordingly.  A previously active constraint _a:j is active in this
cycle if <_a:j, _c(i)> < 0; that is, if motion along the current
direction _c(i) would violate the constraint.  If the constraint is
active, then the Gram-Schmidt procedure is applied to _a:j to
orthogonalize the vectors involved and thereby determine the component
to be removed from _c(i), yielding _c(i+1):
    _a(i) = _a:j - sum (n=0 to i-1) _a(n) * [ <_a:j, _a(n)> / <_a(n), _a(n)> ]

    _c(i+1) = _c(i) - _a(i) * <_c(i), _a(i)> / <_a(i), _a(i)>

When all of the previously active constraints have been determined to
be either active or inactive for the current cycle, the next step
direction is _c:n = _c(i), for the latest i.

(It appears necessary, for each determination of a _c(i), to scan the
entire set of constraints which were active in the previous cycle (but
have not yet been determined to be active in the current cycle) before
deciding that none is active in the current cycle.  Practically, there
will be only one active constraint in most of the cycles, and the
trajectory of the algorithm passes through various of the facets of
the polytope most of the time.)

The stopping condition occurs when _c(i) = _0; that is, when the
objective gradient _c lies in the cone formed from the combination of
negatively scaled gradients of the constraints.  This is the
Kuhn-Tucker condition of optimality.  Equivalently, N (linearly
independent) constraints are found to be active.  I don't remember
whether the Kuhn-Tucker conditions are sufficient, but in any event
this is the optimal point, because there is no feasible motion
direction which improves the objective.

Unlike the previous algorithm, in this one the identification of new
constraints can result in movement away from a previously active
constraint.  When this happens, the previously active constraint can
be totally removed from further consideration, due to the convexity of
the problem (this assertion seems obvious, but has not been PROVED by
me).  The algorithm encounters a new active constraint each cycle, and
therefore converges in at most M cycles, this being the maximum number
of constraints that can be newly encountered.  In practice, again, the
trajectory of the algorithm will generally be such that convergence
occurs in many fewer cycles than M.

Steps A-C are repeated until the stopping condition occurs.
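[For concreteness, steps A-C can be sketched in code.  The following
minimal sketch is a paraphrase, not Kaun's program: the dense-matrix
representation, names, and tolerances are assumptions, and it keeps
every encountered constraint active rather than re-testing previously
active constraints each cycle, so it simplifies step C as stated:]

```python
import numpy as np

def gradient_step_lp(c, A, b, x0, tol=1e-9, max_cycles=None):
    """Gradient-step LP sketch: maximize <c, x> subject to A @ x >= b.

    x0 must be a feasible interior point.  Active-constraint normals
    are orthogonalized (Gram-Schmidt) and their components removed
    from the step direction, as in steps A-C of the text.
    """
    x = x0.astype(float).copy()
    M, N = A.shape
    max_cycles = max_cycles or M
    basis = []                      # orthogonalized active-constraint normals
    d = c.astype(float).copy()      # current movement direction _c:n
    for _ in range(max_cycles):
        if np.linalg.norm(d) < tol:             # stopping condition: _c(i) = 0
            break
        # Step A: largest feasible step s along d, and the blocking constraint.
        s, j_new = np.inf, None
        for j in range(M):
            rate = A[j] @ d
            if rate < -tol:                     # moving toward constraint j
                s_j = (b[j] - A[j] @ x) / rate  # = -D:j / <_c:n, _a:j>, D:j >= 0
                if 0 <= s_j < s:
                    s, j_new = s_j, j
        if j_new is None:                       # unbounded in direction d
            return x, False
        x = x + s * d
        # Steps B-C: orthogonalize the new active normal against the
        # earlier ones, then remove its component from the direction.
        a = A[j_new].astype(float).copy()
        for q in basis:
            a -= q * (a @ q) / (q @ q)
        if np.linalg.norm(a) > tol:
            basis.append(a)
            d = d - a * (d @ a) / (a @ a)
    return x, True
```

[On the unit box { 0 <= x <= 1, 0 <= y <= 1 } with objective x + y,
this sketch steps from an interior point to a facet, projects the
gradient along it, and stops at the corner (1, 1) after two cycles.]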
As indicated above, the algorithm converges in at most M cycles.  For
each cycle, step A requires O(N) multiplications and additions to
compute the inner product, etc., for each of O(M) constraints, for a
total of O(M*N) operations.  Step B requires O(N) operations, which
scarcely affects the overall timing.  Step C can potentially result in
the identification of N-1 active constraints.  Each such
identification requires the removal of O(N) orthogonal components, and
each such removal entails O(N) operations, for an overall count of
O(N^3) operations to remove the effects of previously identified
active constraints.  Also, O(N) constraints may have to be scanned to
determine whether they are active for each such identification, each
such determination requiring O(N) operations, resulting again in a
total of O(N^3) operations for step C.  Performing steps A and C
therefore requires O(M^2*N + M*N^3) operations.

An initial feasible point can be determined by starting from an
arbitrary point (say the origin), identifying the unsatisfied
constraints, and moving in directions that satisfy them.  It may be
more direct to simply start with a "superoptimal" point, say K*_c for
suitably large K, and iterate using essentially the previously
described algorithm along the negative constrained gradient direction
to feasibility.  The resulting feasible point will also be optimal for
the original problem.

Carl F. Kaun
ckaun@aids-UNIX
415/941-3912

------------------------------

Date: Wed, 28 Nov 84 17:12:55 PST
From: IBM San Jose Research Laboratory Calendar
      <calendar%ibm-sj.csnet@csnet-relay.arpa>
Reply-to: IBM-SJ Calendar <CALENDAR%ibm-sj.csnet@csnet-relay.arpa>
Subject: Seminars - Search Complexity & User Interfaces (IBM-SJ)

[Forwarded from the SRI bboard by Laws@SRI-AI.]

                  IBM San Jose Research Lab
                  5600 Cottle Road
                  San Jose, CA 95193

                  CALENDAR (DECEMBER 3 - 7, 1984)

  Wed., Dec. 5   Computer Science Seminar
  10:00 A.M.     HOW HARD IS NP-HARD?
                 2C-012

This talk examines the average complexity of depth-first search for
two different search models.  The first model has no cutoff at
unpromising internal nodes, but does terminate at a leaf node when the
leaf node represents a successful search outcome.  This model leads to
an average complexity that grows anywhere from linearly to
exponentially in the depth of the tree, depending on the probability
of choosing the best branch to search first at each internal node of
the tree.  Good decisions lead to linear complexity and bad decisions
lead to exponential complexity.

The second model examines tree searching with internal cutoff when
unpromising paths are discovered.  In this model, the search
terminates successfully when it reaches the first leaf.  The model is
representative of branch-and-bound algorithms that guarantee that the
first leaf reached is a successful leaf.  Roth's D-algorithm for
generating test vectors for logic circuits fits this model, and
White's efficient algorithm for solving the Traveling Salesman problem
also fits, except for the distribution of cutoff probabilities.  Our
model shows that the number of nodes visited during a depth-first
search grows at most linearly on the average, regardless of cutoff
probability.  If the cutoff probability is very high, the search fails
with very high probability, and visits an average number of nodes that
grows as O(1) as the tree depth increases.  If the cutoff probability
is very low, then the algorithm finds a successful leaf after visiting
only O(N) nodes on the average, where N is the depth of the tree.

Many NP-complete problems can be solved by depth-first searches.  If
such problems can be solved by algorithms that order the depth-first
search so as to terminate at the first leaf, then this work and the
work by Smith suggest that the average complexity might grow only
polynomially in the tree depth, rather than exponentially as the
worst-case analysis suggests.

                 H. S. Stone, IBM Yorktown Research
                 Host: B. D. Rathi

  Thurs., Dec.
6   Computer Science Seminar
  10:00 A.M.     APPLICATIONS OF COGNITIVE COMPLEXITY THEORY
                 TO THE DESIGN OF USER INTERFACES
                 2C-012

The cognitive complexity project has two major objectives.  The first
is to gain a theoretical understanding of the knowledge and thought
processes that underlie successful use of computer-based systems
(e.g., text editors).  The second is to develop a design technology
that minimizes the cognitive complexity of such systems as seen by the
user.  Cognitive complexity is defined as the amount, content, and
structure of the knowledge required to operate a system.  In this
particular work, the knowledge is described as a production system,
and the computer-based system is described as a generalized transition
network.  Quantitative predictions, derived from the production
system, are shown to account for various aspects of user performance
(e.g., training time).  The talk will include a brief presentation of
the design methodology based on the production-system formalism.

                 Prof. D. E. Kieras, University of Michigan, Ann Arbor
                 Prof. P. G. Polson, University of Colorado, Boulder
                 Host: J. L. Bennett

------------------------------

Date: Wed 28 Nov 84 17:24:47-PST
From: Dikran Karagueuzian <DIKRAN@SU-CSLI.ARPA>
Subject: Seminar - A Semantical Definition of Probability (CSLI Stanford)

[Excerpted from the CSLI Newsletter by Laws@SRI-AI.]

          SEMINAR IN LOGIC AND FOUNDATIONS OF MATHEMATICS

Speaker:  Prof. Rolando Chuaqui, Catholic University of Chile and IMSSS
Title:    A Semantical Definition of Probability
Place:    Room 381-T, 1st floor, Math. Corner
Time:     Monday, December 3, 4:15-5:30 p.m.

ABSTRACT: The analysis proposed in this lecture is an attempt to
formalize both chance and degree of support.  Chance is considered a
dispositional property of the objects plus the experimental conditions
(i.e., what is called the chance set-up).  Degree of support measures
the support that the evidence we have (i.e., what we accept as true)
gives to propositions.
Chance, in this model, is determined by the set K of possible outcomes
(or results) of the chance set-up.  Each outcome is represented by a
relational structure of a certain kind.  This set of structures
determines the algebra of events (an algebra of subsets of K) and,
through invariance under a group of symmetries, the probability
measure.  The propositions are represented by the sentences of a
formal language, and the probability of a sentence phi in K, written
P[K](phi), is the measure of the set of models of phi that are in K.
P[K](phi) represents the degree of support of phi given K.  This
definition of probability can be applied to clarify the different
methods of statistical inference and decision theory.

------------------------------

Date: 27 November 1984 1607-EST
From: David Ackley@CMU-CS-A
Subject: Seminar - Learning in Stochastic Networks (CMU)

[Forwarded from the CMU bboard by Laws@SRI-AI.]

    "Learning evaluation functions in stochastic parallel networks"

                          Thesis Proposal
         Tuesday, December 4, 1984, at 3:30pm in 5409 WeH

Although effective techniques exist for adjusting linear coefficients
of features to produce an improved heuristic evaluation of a game
position, the creation of useful features remains poorly understood.
Recent work on parallel learning with the Boltzmann Machine suggests
that the creation of useful new features and the tuning of
coefficients of existing features can be integrated into a single
learning process.  However, the perceptual learning paradigm that
underlies the Boltzmann Machine formalism is substantially different
from the reinforcement learning paradigm that underlies most
game-learning research.  The thesis work will involve the development
of a reinforcement-based parallel learning algorithm that operates on
a computational architecture similar to the Boltzmann Machine, and
that drives the creation and refinement of an evaluation function
given only win/lose/draw reinforcement information while playing a
small game such as tic-tac-toe.
The thesis work will test several novel ideas, and will have
implications for a number of issues in machine learning and knowledge
representation.

------------------------------

End of AIList Digest
********************