[comp.ai] How do games learn?

marwk@levels.sait.edu.au (04/02/91)

> Go-moku or Renju
> ----------------
> Another comment:  In 1961 I wrote a playing/learning program (based on
> Samuel's Checker-learning method) for this game.  It (almost) ran on a
> computer now known as UNIVAC I but then known as THE UNIVAC.
>
> --Dick Wexelblat  (rlw@ida.org) 703 845 6601

How can one write a program that learns?  I do not see how.

One has parameters in the heuristic evaluation functions, so how can these
change based on experience?

Thanks.
Ray
--
University of South Australia   | Plus ca change, plus c'est la meme chose.
P.O. Box 1                      | Ghing thien me how, ming thien gung me how.
Ingle Farm                      | Knobs, knobs everywhere,
South Australia                 |              just vary a knob to think!

minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky) (04/03/91)

In article <16087.27f874ec@levels.sait.edu.au>
marwk@levels.sait.edu.au writes in regard to Samuel's Checker-learning method
>
>How can one write a program that learns?  I do not see how.
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

This is explained in the classic paper: A. L. Samuel, "Some studies in
machine learning using the game of checkers," IBM J. Res. Dev., vol. 3,
pp. 211-219, July 1959.  There is a summary of this in my "Steps toward
AI" paper, which you can find reprinted in a book called Computers and
Thought, by Feigenbaum and Feldman.  Samuel uses several different
learning methods, and has an important discussion of the "credit
assignment" problem.

markh@csd4.csd.uwm.edu (Mark William Hopkins) (04/04/91)

In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au writes:
>How can one write a program that learns?  I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

("Least Squares Method")
By minimizing the square of an error function using any of a standard set of
minimization routines covered in numerical techniques texts...

It's up to you to determine how to form the error function.  The most
natural way is to use a search technique, looking several moves ahead,
to find a backed-up value for the position, and then to use that value
as the basis for correcting the evaluation function applied directly to
the current board.
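
Roughly, in Python (a sketch of the idea only -- the linear evaluation,
the feature vectors and names like static_eval are mine, not Samuel's):

    def static_eval(weights, features):
        # Linear evaluation: a weighted sum of numeric board features
        # (piece counts, mobility, whatever you choose to measure).
        return sum(w * f for w, f in zip(weights, features))

    def least_squares_step(weights, features, search_value, rate=0.01):
        # error = (value backed up from the look-ahead search) minus
        # (static evaluation of the current board).  A gradient step on
        # error**2 moves each weight by rate * error * its feature value.
        error = search_value - static_eval(weights, features)
        return [w + rate * error * f for w, f in zip(weights, features)]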

You need boundary conditions.  A terminal board is evaluated without
search or error correction: it is scored directly as a WIN, a LOSS, or
a DRAW.
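
In code that is just a short-circuit in front of the learned evaluation
(reusing static_eval from the sketch above; the +1/-1/0 scores and the
terminal_status argument are only for illustration):

    def evaluate(features, terminal_status, weights):
        # terminal_status: "WIN", "LOSS", "DRAW", or None if the game
        # is still in progress.  Terminal boards get fixed scores with
        # no search and no error correction.
        if terminal_status == "WIN":
            return 1.0
        if terminal_status == "LOSS":
            return -1.0
        if terminal_status == "DRAW":
            return 0.0
        return static_eval(weights, features)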

Reinforcement learning uses more complicated techniques to accomplish
the same goal of training a predictor.
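
One such technique (my example, not necessarily what Samuel did) is a
temporal-difference style update: each position's prediction is nudged
toward the prediction for the following position, and the final one
toward the actual outcome.  Again reusing static_eval from above:

    def td_update_game(weights, feature_seq, final_score, rate=0.01):
        # feature_seq: feature vectors for the successive positions of
        # one game; final_score: +1, -1 or 0 from the WIN/LOSS/DRAW
        # boundary condition.
        for i, f in enumerate(feature_seq):
            if i + 1 < len(feature_seq):
                target = static_eval(weights, feature_seq[i + 1])
            else:
                target = final_score
            error = target - static_eval(weights, f)
            weights = [w + rate * error * x for w, x in zip(weights, f)]
        return weights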

Also, the boundary conditions don't always prove necessary.  It is
sometimes possible to train the program to learn what constitutes a
WIN, LOSS or DRAW board from zero knowledge.

I don't think Samuel's checker-playing program used any boundary
conditions...

rlw@IDA.ORG (Richard Wexelblat) (04/05/91)

In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au writes:
>How can one write a program that learns?  I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

If you define learning as "plays better over time, based on experience"
then the trivial algorithm I used was, in short form, as follows.

Assume a set of numeric values that are combined in some way to
determine the value of a move.  Determine in a suitable way (see below)
that in a given situation Move B (which the program didn't make) would
have been better than Move A (which the computer DID make).  Bump the
values a delta to increase the likelihood of B being made next time an
A/B choice is encountered.
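
With a linear evaluation the bump is a one-liner (a sketch only; the
feature vectors and the size of delta are mine, not the 1961 program's):

    def bump_weights(weights, features_b, features_a, delta=0.05):
        # features_b / features_a: feature values of the positions
        # reached by the better move B and by the move A actually made.
        # Moving each weight in the direction of (B - A) raises the
        # score of B relative to A, so B is more likely to win the
        # comparison next time the same A/B choice comes up.
        return [w + delta * (fb - fa)
                for w, fb, fa in zip(weights, features_b, features_a)]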

How to determine a better move?  Ask an expert... or play out all variations
of an endgame from a certain point... or play a fixed vs. a randomly
modified strategy.  There are many variants based on having a program
play against itself.
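
The last variant can be sketched as a simple hill-climb; play_game here
is an assumed driver returning +1, -1 or 0 from the first player's
point of view, and the round/game counts are arbitrary:

    import random

    def self_play_improve(weights, play_game, rounds=100, games=10,
                          noise=0.05):
        # A randomly perturbed copy of the current weights challenges
        # the fixed strategy; if the challenger wins the match, its
        # weights become the new fixed strategy.
        for _ in range(rounds):
            challenger = [w + random.uniform(-noise, noise)
                          for w in weights]
            score = sum(play_game(challenger, weights)
                        for _ in range(games))
            if score > 0:
                weights = challenger
        return weights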

-- 
--Dick Wexelblat  (rlw@ida.org) 703 845 6601
  Can you accept an out of state sanity check?