[net.ai] AIList Digest V2 #164

LAWS@SRI-AI.ARPA (11/30/84)

From: AIList Moderator Kenneth Laws <AIList-REQUEST@SRI-AI>


AIList Digest            Friday, 30 Nov 1984      Volume 2 : Issue 164

Today's Topics:
  Algorithms - Karmarkar Algorithm & Linear Programming,
  Seminars - Search Complexity & User Interfaces  (IBM-SJ) &
    A Semantical Definition of Probability  (CSLI Stanford) &
    Learning in Stochastic Networks  (CMU)
----------------------------------------------------------------------

Date: 27 Nov 1984 17:19:38-EST (Tuesday)
From: S.Miller@wisc-rsch.arpa
Subject: Karmarkar Algorithm

          [Forwarded from the SRI-AI bboard by Laws@SRI-AI.]

The Karmarkar algorithm was presented at STOC (Symposium on Theory of
Computing) on May 1, 1984 (STOC '84, p. 302)
"A New Polynomial time Algorithm on Linear Programming".
The STOC proceedings are available from the ACM if your
location doesn't have them.

------------------------------

Date: Mon 26 Nov 84 17:33:08-PST
From: Walter Murray <OR.MURRAY@SU-SIERRA.ARPA>
Subject: Linear Programming Algorithms.

         [Forwarded from the Stanford bboard by CKaun@AIDS-UNIX.]

Some recent bboard messages have referred to linear programming. The
algorithm by Karmarkar is almost identical to iteratively reweighted
least squares (IRLS), an algorithm used to solve approximation problems
in norms other than the l2 norm. It can be shown that the form of LP
assumed by Karmarkar is equivalent to an l-infinity approximation
problem. If that problem is then solved by the IRLS algorithm, the
estimates of the solution generated are identical to those of the
Karmarkar algorithm (assuming certain free choices in the definitions
of the algorithms). Perhaps it should be added that IRLS is not held in
high regard in approximation circles; to solve an l-infinity problem,
one usually transforms it to an LP and solves that with the simplex
method.
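
Incidentally, the classical IRLS scheme for l-infinity (Chebyshev)
approximation is Lawson's algorithm: each pass solves a weighted
least-squares problem and then reweights each row in proportion to its
residual magnitude. A minimal sketch in Python (numpy assumed; an
illustration of the scheme, not code from any of the papers, and it
omits the safeguards a real implementation needs):

    import numpy as np

    def lawson_linf(A, b, iters=50):
        # Sketch of Lawson's IRLS for min over x of ||A x - b||_inf.
        m, n = A.shape
        w = np.full(m, 1.0 / m)            # uniform starting weights
        x = np.zeros(n)
        for _ in range(iters):
            sw = np.sqrt(w)
            # Weighted least squares: scale the rows of A and b by sqrt(w).
            x, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)
            r = np.abs(A @ x - b)          # residual magnitudes
            w = w * r                      # Lawson update: w_i <- w_i * |r_i|
            if w.sum() == 0:               # exact fit; nothing left to weight
                break
            w = w / w.sum()                # renormalize to a probability vector
        return x

The weights pile up on the rows with the largest residuals, which is
what ties the iteration to the l-infinity problem; its convergence is
known to be slow, consistent with the low regard mentioned above.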

A message from Kaun (forwarded by Malachi without a heading) described
an algorithm for LP which Kaun claimed requires O(n^3) work. It is easy
to demonstrate that the algorithm may fail to converge to the solution.
The figure below is a cross-section of a hole with straight sides.
Water is poured into the hole from the point x.


                  x


                o                       o
                .                       .
                o                       o
                 .                     .
                  .                   .
                   o                 o
                      .         .
                           o

The water hits a facet and runs along it until it reaches a vertex,
where a second facet begins. Unless the water is prepared to leave the
first facet it hit, it will not reach the bottom.

------------------------------

Date: 27 Nov 84 11:45:16 PST (Tue)
From: Carl Kaun <ckaun@aids-unix>
Subject: Linear Programming Algorithm

Murray is correct -- the algorithm as stated will not usually converge
to the solution.  One problem is that removing components from the
gradient does not automatically force it to zero after N steps, as I
asserted.  It looks to me like the gradient stepping idea can still be
used in a more complicated scheme, and that the computational time for
the algorithm will be O(M*N^2), where M is the number of constraints.
But I want to verify the details better than I did for my original
message before saying more.

I still wonder if finding such a solution to the continuous (as opposed to
the integer) linear programming problem has any significance.

                                                ckaun@aids-unix

------------------------------

Date: 27 Nov 84 22:31:30 PST (Tue)
From: Carl Kaun <ckaun@aids-unix>
Subject: Gradient Step Linear Programming (again)


Well, here we go again.  Let's see if this try stands up to scrutiny.
The claim is that the algorithm below (gradient step linear
programming) solves the linear programming problem in at most
O(M^2*N + M*N^3) operations.  I still don't know whether that has any
significance.  As before, the idea is to step, as best one can subject
to the constraints, along the gradient.  The terminating conditions are
similar to those of the algorithm given previously.

As before, the mathematical notation represents vectors by preceding them
with an underline, as "_x".  Subscripts are represented using a colon, _c:j
being the j-th vector _c. The inner product is represented by < * , * >.  A
functional form (i.e. f(  )) is used to represent things like sums. The rest
should be fairly obvious.

The statement of the linear programming problem is also as before: maximize,
with respect to the N-dimensional vector _x, the linear functional
               <_c , _x>
subject to the constraints
               <_a:j , _x> >= b:j   for j = 1, 2, ..., M
with M >= N, as otherwise the solution is unbounded.

Assume for the moment an initial feasible vector _x(0) in the interior of
the polytope defined by the constraints (so that there are initially no
active constraints).  Set _c:0 = _c; all constraints are potentially active.

A.  From the current solution point _x(n), find the constraint limiting
motion in the direction _c:n and the maximum feasible step size s > 0,
giving the next solution point:  _x(n+1) = _x(n) + s*_c:n.
    For j = 1, 2, ..., M with j not a currently active constraint, compute
          D:j = <_x(n), _a:j> - b:j    ( >= 0 )
          s:j = - D:j / <_c:n, _a:j>
    Then s = min { s:j | s:j > 0 }, and the next active constraint has the
index j(n) providing the minimum.
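
In rough Python (numpy assumed; names hypothetical, degeneracies and
tolerances ignored), step A is a minimum-ratio test:

    import numpy as np

    def max_feasible_step(A, b, x, c, active):
        # Step A sketch: largest s > 0 with A @ (x + s*c) >= b, where the
        # rows of A are the _a:j.  Returns (s, j(n)); (None, None) means
        # no inactive constraint blocks motion along c.
        s, j_min = None, None
        for j in range(A.shape[0]):
            if j in active:
                continue
            denom = A[j] @ c
            if denom >= 0:                    # j never blocks this direction
                continue
            s_j = -(A[j] @ x - b[j]) / denom  # -D:j / <_c:n, _a:j>, D:j >= 0
            if s_j > 0 and (s is None or s_j < s):
                s, j_min = s_j, j
        return s, j_min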

B.  The next step is to compute a movement direction aligned with the
gradient (thus enabling improvement in the functional) that also satisfies
the active constraints.  The first active constraint was identified in the
previous step, thus:
          _c(0) = _c - _a:j(n) * <_c, _a:j(n)> / <_a:j(n), _a:j(n)>
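
Step B is a single orthogonal projection; in the same hypothetical
Python:

    import numpy as np

    def project_out(c, a):
        # Step B sketch: _c(0) = _c - _a * <_c,_a> / <_a,_a>, i.e. remove
        # from c its component along the new active constraint normal a.
        return c - a * (c @ a) / (a @ a)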

C.  Next determine which of the constraints active in the previous cycle
are active in this step, and modify the movement direction accordingly.  A
previously active constraint _a:j is active in this cycle if
          <_a:j, _c(i)> < 0,
that is, if motion along the current direction _c(i) would violate the
constraint.  If the constraint is active, then the Gram-Schmidt procedure
is applied to _a:j, orthogonalizing it against the normals already removed,
and the resulting component is removed from _c(i), yielding _c(i+1):
          _a(i) = _a:j - sum (n=0 to i-1) _a(n) * <_a:j, _a(n)> / <_a(n), _a(n)>
          _c(i+1) = _c(i) - _a(i) * <_c(i), _a(i)> / <_a(i), _a(i)>
When all of the previously active constraints have been classified as
active or inactive for the current cycle, the next step direction is
          _c:n = _c(i), for the latest i.

(It appears necessary, for each determination of a _c(i), to scan the
entire set of constraints which were active in the previous cycle (but have
not yet been found active in the current cycle) before concluding that
none is active in the current cycle.  Practically, there will be only one
active constraint in most of the cycles, and the trajectory of the
algorithm will pass through the interiors of various facets of the
polytope most of the time.)
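
A sketch of step C under the same assumptions (the rescan loop reflects
the parenthetical above, and a numerical tolerance stands in for the
exact zero tests):

    import numpy as np

    def step_c(c, prev_active, tol=1e-12):
        # Step C sketch: reactivate any previously active constraint whose
        # normal the current direction would violate, orthogonalize that
        # normal against those already removed (Gram-Schmidt), and subtract
        # the resulting component from c.  Rescan until nothing activates.
        basis = []                            # the orthogonalized _a(i)
        changed = True
        while changed:
            changed = False
            for a in prev_active:
                if a @ c < -tol:              # motion along c would violate a
                    v = a.astype(float)
                    for u in basis:           # Gram-Schmidt vs earlier normals
                        v = v - u * (v @ u) / (u @ u)
                    if v @ v > tol:
                        c = c - v * (c @ v) / (v @ v)
                        basis.append(v)
                        changed = True
        return c, basis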

The stopping condition occurs when _c(i) = _0; that is, when the objective
gradient _c lies in the cone formed from the combination of negatively
scaled gradients of the constraints.  This is the Kuhn-Tucker condition
of optimality.  Equivalently, N (linearly independent) constraints are
found to be active.  I don't remember whether the Kuhn-Tucker conditions
are sufficient, but in any event this is the optimal point, because there
is no feasible motion direction which improves the objective.

Unlike the previous algorithm, in this one the identification of new
constraints can result in movement away from a previously active
constraint.  When this happens, the previously active constraint can be
removed entirely from further consideration, due to the convexity of the
problem (this assertion seems obvious, but has not been PROVED by me).
The algorithm encounters a new active constraint each cycle, and therefore
converges in at most M cycles, this being the maximum number of
constraints that can be newly encountered.  In practice, again, the
trajectory of the algorithm will generally be such that convergence
occurs in many fewer cycles than M.

Steps A-C are repeated until the stopping condition occurs.
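
Putting the pieces together on a tiny instance (an end-to-end sketch with
hypothetical data; for simplicity it keeps every encountered constraint
active for good rather than implementing the selective reactivation of
step C, so it is faithful only to the easy cases):

    import numpy as np

    def gradient_step_lp(A, b, x, c, tol=1e-10):
        # Sketch: maximize <c, x> subject to A @ x >= b, stepping along
        # the gradient projected off the active constraint normals.
        M, N = A.shape
        basis = []                             # orthogonalized active normals
        d = c.astype(float)                    # current direction _c:n
        for _ in range(M):                     # at most M cycles
            if d @ d <= tol:                   # stopping condition: _c(i) = _0
                break
            # Step A: minimum-ratio test along d.
            s, j_new = None, None
            for j in range(M):
                denom = A[j] @ d
                if denom < 0:
                    s_j = -(A[j] @ x - b[j]) / denom
                    if s_j > 0 and (s is None or s_j < s):
                        s, j_new = s_j, j
            if s is None:
                raise ValueError("direction looks unbounded")
            x = x + s * d
            # Steps B/C (simplified): add the new normal to the basis and
            # re-project the objective gradient off the whole basis.
            v = A[j_new].astype(float)
            for u in basis:
                v = v - u * (v @ u) / (u @ u)  # Gram-Schmidt
            if v @ v > tol:
                basis.append(v)
            d = c.astype(float)
            for u in basis:
                d = d - u * (d @ u) / (u @ u)
        return x

    # Hypothetical 2-D instance: maximize x1 + x2 with x1 <= 1, x2 <= 2,
    # written in the <_a:j, _x> >= b:j form used above.
    A = np.array([[-1.0, 0.0], [0.0, -1.0]])
    b = np.array([-1.0, -2.0])
    print(gradient_step_lp(A, b, np.zeros(2), np.array([1.0, 1.0])))  # ~[1. 2.]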

As indicated above, the algorithm converges in at most M cycles.  In each
cycle, step A requires O(N) multiplications and additions (to compute the
inner product, etc.) for each of O(M) constraints, for a total of O(M*N)
operations.  Step B requires O(N) operations, which scarcely affects the
overall timing.  Step C can potentially result in the identification of N-1
active constraints.  Each such identification requires the removal of O(N)
orthogonal components, and each such removal entails O(N) operations, for
an overall count of O(N^3) operations to remove the effects of previously
identified active constraints.  Also, O(N) constraints may have to be
scanned to determine whether they are active for each such identification,
each such determination requiring O(N) operations, again giving a total of
O(N^3) operations for step C.  Over M cycles, performing steps A and C
therefore requires O(M^2*N + M*N^3) operations.

An initial feasible point can be determined by starting from an arbitrary
point (say the origin), identifying the unsatisfied constraints, and moving
in directions that satisfy them.  It may be more direct to simply start
with a "superoptimal" point, say K*_c for suitably large K, and iterate,
using essentially the algorithm described above along the negative
constrained gradient direction, until feasibility is reached.  The
resulting feasible point will also be optimal for the original problem.
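
One classical way to carry out the first suggestion is successive
projection onto violated constraints (relaxation in the style of Agmon
and of Motzkin and Schoenberg); a sketch under the same assumptions as
above:

    import numpy as np

    def make_feasible(A, b, x, max_iter=10000, tol=1e-10):
        # Phase-1 sketch: step onto the hyperplane of the most violated
        # constraint until A @ x >= b.  Converges when the system of
        # inequalities is feasible.
        for _ in range(max_iter):
            viol = b - A @ x                   # positive entries are violations
            j = int(np.argmax(viol))
            if viol[j] <= tol:
                return x                       # feasible point found
            a = A[j]
            x = x + a * (viol[j] / (a @ a))    # project onto <_a:j, _x> = b:j
        raise RuntimeError("no feasible point within the iteration budget")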

                                                Carl F. Kaun

                                                ckaun@aids-UNIX
                                                415/941-3912

------------------------------

Date: Wed, 28 Nov 84 17:12:55 PST
From: IBM San Jose Research Laboratory Calendar
      <calendar%ibm-sj.csnet@csnet-relay.arpa>
Reply-to: IBM-SJ Calendar <CALENDAR%ibm-sj.csnet@csnet-relay.arpa>
Subject: Seminars - Search Complexity & User Interfaces  (IBM-SJ)

           [Forwarded from the SRI bboard by Laws@SRI-AI.]

                      IBM San Jose Research Lab
                           5600 Cottle Road
                         San Jose, CA 95193

                             CALENDAR
                       (DECEMBER 3 - 7, 1984)

  Wed., Dec. 5 Computer Science Seminar
  10:00 A.M.  HOW HARD IS NP-HARD?
  2C-012      This talk examines the average complexity of
            depth-first search for two different search models.
            The first model has no cutoff at unpromising internal
            nodes, but does terminate at a leaf node when the
            leaf node represents a successful search outcome.
            This model leads to an average complexity that grows
            anywhere from linearly to exponentially in the depth
            of the tree depending on the probability of choosing
            the best branch to search first at each internal node
            of the tree.  Good decisions lead to linear
            complexity and bad decisions lead to exponential
            complexity.  The second model examines tree searching
            with internal cutoff when unpromising paths are
            discovered.  In this model, the search terminates
            successfully when it reaches the first leaf.  The
            model is representative of branch-and-bound algorithms
            that guarantee that the first leaf reached is a
            successful leaf.  Roth's D-algorithm for generating
            test vectors for logic circuits fits this model, and
            White's efficient algorithm for solving the Traveling
            Salesman problem also fits, except for the distribution
            of cutoff probabilities.  Our model shows
            that the number of nodes visited during a depth-first
            search grows at most linearly on the average,
            regardless of cutoff probability.  If cutoff
            probability is very high, the search fails with a
            very high probability, and visits an average number
            of nodes that grows as O(1) as the tree depth
            increases.  If cutoff probability is very low, then
            the algorithm finds a successful leaf after visiting
            only O(N) nodes on the average, where N is the depth
            of the tree.  Many NP-complete problems can be solved by
            depth-first searches.  If such problems can be solved
            by algorithms that order the depth-first search to
            terminate at the first leaf, then this work and the
            work by Smith suggest that the average complexity
            might grow only polynomially in the tree depth,
            rather than exponentially as the worst-case analysis
            suggests.
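
            The second model is easy to simulate.  A toy Monte Carlo
            in Python (binary branching and independent cutoffs
            assumed; an illustration only, not the speaker's model)
            shows the at-most-linear average growth:

              import random

              def dfs_nodes(depth, p_cut):
                  # Nodes visited by a depth-first search of a binary
                  # tree whose internal nodes are cut off independently
                  # with probability p_cut; the search stops at the
                  # first leaf it reaches (the second model above).
                  visited, stack = 0, [depth]
                  while stack:
                      d = stack.pop()
                      visited += 1
                      if d == 0:
                          return visited, True   # first leaf: success
                      if random.random() < p_cut:
                          continue               # unpromising: prune
                      stack.extend([d - 1, d - 1])
                  return visited, False          # all cut off: failure

              runs = [dfs_nodes(30, 0.3) for _ in range(10000)]
              print(sum(v for v, _ in runs) / len(runs))  # average visits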

            H. S. Stone, IBM Yorktown Research
            Host:  B. D. Rathi

  Thurs., Dec. 6 Computer Science Seminar
  10:00 A.M.  APPLICATIONS OF COGNITIVE COMPLEXITY THEORY
  2C-012      TO THE DESIGN OF USER INTERFACES
            The cognitive complexity project has two major
            objectives.  The first is to gain a theoretical
            understanding of the knowledge and thought processes
            that underlie successful use of computer-based
            systems (e.g., text editors).  The second goal is to
            develop a design technology that minimizes the
            cognitive complexity of such systems as seen by the
            user.  Cognitive complexity is defined as the amount,
            content, and structure of the knowledge required to
            operate a system.  In this particular work, the
            knowledge is described as a production system.  The
            computer-based system is described as a generalized
            transition network.  Quantitative predictions,
            derived from the production system, are shown to
            account for various aspects of user performance
            (e.g., training time).  The talk will include a brief
            presentation of the design methodology based on the
            production system formalism.

            Prof. D. E. Kieras, University of Michigan, Ann Arbor
            Prof. P. G. Polson, University of Colorado, Boulder
            Host:  J. L. Bennett

------------------------------

Date: Wed 28 Nov 84 17:24:47-PST
From: Dikran Karagueuzian <DIKRAN@SU-CSLI.ARPA>
Subject: Seminar - A Semantical Definition of Probability  (CSLI
         Stanford)

         [Excerpted from the CSLI Newsletter by Laws@SRI-AI.]


            SEMINAR IN LOGIC AND FOUNDATIONS OF MATHEMATICS

Speaker: Prof. Rolando Chuaqui, Catholic University of Chile and IMSSS
Title:   A Semantical Definition of Probability

Place:   Room 381-T, 1st floor Math. Corner
Time:    Monday, December 3, 4:15-5:30 p.m.

ABSTRACT:  The analysis proposed in this lecture is an attempt to formalize
both chance and degree of support.  Chance is considered as a dispositional
property of the objects plus the experimental conditions (i.e. what is
called the chance set-up).  Degree of support measures the support that the
evidence we have (i.e. what we accept as true) gives to propositions.
Chance, in this model, is determined by the set K of possible outcomes (or
results) of the chance set-up.  Each outcome is represented by a relational
structure of a certain kind.  This set of structures determines the algebra
of events, an algebra of subsets of K, and the probability measure through
invariance under a group of symmetries.  The propositions are represented
by the sentences of a formal language, and the probability of a sentence
phi in K, P[K](phi), is the measure of the set of models of phi that are
in K.  P[K](phi) represents the degree of support of phi given K.  This
definition of probability can be applied to clarify the different methods
of statistical inference and decision theory.
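
A finite toy case may help fix the definition: let K be the 36 outcome
structures for a roll of two dice; the symmetries permuting the faces act
transitively on K, so the invariant measure is uniform, and P[K](phi) is
just the fraction of outcomes that are models of phi.  In Python (an
illustration of the definition only, not Chuaqui's formalism):

    from fractions import Fraction
    from itertools import product

    # K: the outcome structures; face-permutation symmetry forces the
    # invariant probability measure on K to be uniform.
    K = list(product(range(1, 7), repeat=2))

    def P(phi):
        # P[K](phi): the measure of the set of models of phi inside K.
        return Fraction(sum(1 for w in K if phi(w)), len(K))

    print(P(lambda w: w[0] + w[1] == 7))   # 1/6: support for "the sum is 7"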

------------------------------

Date: 27 November 1984 1607-EST
From: David Ackley@CMU-CS-A
Subject: Seminar - Learning in Stochastic Networks  (CMU)

           [Forwarded from the CMU bboard by Laws@SRI-AI.]

     "Learning evaluation functions in stochastic parallel networks"
                           Thesis Proposal
             Tuesday, December 4, 1984, at 3:30pm in 5409 WeH.

Although effective techniques exist for adjusting linear coefficients of
features to produce an improved heuristic evaluation of a game position,
the creation of useful features remains poorly understood.  Recent work
on parallel learning with the Boltzmann Machine suggests that the
creation of useful new features and the tuning of coefficients of
existing features can be integrated into a single learning process, but
the perceptual learning paradigm that underlies the Boltzmann Machine
formalism is substantially different from the reinforcement learning
paradigm that underlies most game-learning research.  The thesis work
will involve the development of a reinforcement-based parallel learning
algorithm that operates on a computational architecture similar to the
Boltzmann Machine, and drives the creation and refinement of an
evaluation function given only win/lose/draw reinforcement information
while playing a small game such as tic-tac-toe.  The thesis work will
test several novel ideas, and will have implications for a number of
issues in machine learning and knowledge representation.

------------------------------

End of AIList Digest
********************