[comp.ai.neural-nets] Data Complexity

aam9n@uvaee.ee.virginia.EDU (Ali Minai) (10/06/89)

I have a question which might or might not make sense. Still, since
it is of great interest to me, I shall be grateful if people share
their views, suggestions and references on this topic with me. Of
course, I shall be particularly grateful if someone points out the
absurdity of the whole issue.

The question is this:

Given deterministic data {(X,Y)} where X is in a finite interval of
n-d real space and Y is in a finite interval of m-d real space,
what *structural* measures (if any) have people suggested for
the "complexity" of the data?

This immediately raises several other issues:

1) What possible notions of "complexity" might one admit in this regard?

2) What might be the *order* of this complexity? Should it, for example,
be defined over the set of all points, all pairs, all triplets etc.?

3) Given that the data is not uniformly distributed over the domain (i.e.
it is "lumpy"), what assumptions must be made regarding the blank areas?
Should these be statistical? (probably yes).

4) How can such a "complexity" measure be made scale-invariant?

etc.

Also, what about such complexity measures for continuous functions?
I mean measures defined structurally, not according to the type
of the function (e.g. degree for polynomials).

My gut feeling is that some sort of information-based measure is
appropriate, but whatever is used must be efficiently computable.
Have other people tried to deal with this problem? Since I am sure
they have, where can I find the material? What would be the appropriate
area of mathematics to look for such references? I am currently searching
approximation theory, statistical modelling, maximum-entropy theory,
measure theory, information theory, complexity theory (both system
and computational) and non-linear dynamics literature. Obviously, I
will miss more than I will find, hence this request. A particularly
helpful thing would be names of journals which regularly carry material
relevant to this issue. Also, is the International Journal of General
Systems still published? If not, what did it turn into? Or rather, who
carries articles about "the theory of things", "the calculus of
indications" etc? I am extremely interested.

Thank you.

Ali Minai
Dept. of Electrical Engg.
Thornton Hall,
University of Virginia,
Charlottesville, VA 22903.

aam9n@uvaee.ee.virginia.edu

pa1159@sdcc13.ucsd.EDU (Matt Kennel) (10/07/89)

In article <517@uvaee.ee.virginia.EDU> aam9n@uvaee.ee.virginia.EDU (Ali Minai) writes:
>
>
>The question is this:
>
>Given deterministic data {(X,Y)} where X is in a finite interval of
>n-d real space and Y is in a finite interval of m-d real space,
>what *structural* measures (if any) have people suggested for
>the "complexity" of the data?
>

I don't know what other people have suggested, but I'll spout
out a suggestion of my own: crib ideas from the nonlinear dynamics
people.  For a rough-and-ready measure of the complexity of the
input space, why not use fractal dimension?  And for the output
space, what about the sum of the positive Lyapunov exponents of the
nonlinear mapping?
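For concreteness, a rough box-counting sketch (in Python, a language choice of
convenience; the set of scales and the assumption that the X samples are already
rescaled into the unit hypercube are mine):

    # Rough sketch: box-counting estimate of the fractal dimension of a
    # point cloud in n-d space.  Assumes points are rescaled into [0,1]^n.
    import numpy as np

    def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
        points = np.asarray(points, dtype=float)
        counts = []
        for k in scales:
            # Assign each point to a box of side 1/k and count occupied boxes.
            boxes = np.clip(np.floor(points * k).astype(int), 0, k - 1)
            counts.append(len({tuple(b) for b in boxes}))
        # Dimension ~ slope of log N(eps) against log(1/eps), with eps = 1/k.
        slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
        return slope

The fit is only meaningful over scales where the data actually resolve structure,
so in practice you would look at the whole log-log curve rather than trust a
single slope.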


>4) How can such a "complexity" measure be made scale-invariant?

Isn't that exactly what the fractal dimension does?
>
>etc.
>
>Also, what about such complexity measures for continuous functions?
>I mean measures defined structurally, not according to the type
>of the function (e.g. degree for polynomials).

I don't understand what you mean by "defined structurally."

>Ali Minai
>aam9n@uvaee.ee.virginia.edu

Matt Kennel
UCSD physics
pa1159@sdcc13.ucsd.edu

cybsys@bingvaxu.cc.binghamton.edu (CYBSYS-L Moderator) (10/10/89)

[ Cross-posted from CYBSYS-L@BINGVMB ]
Really-From: heirich%cs@ucsd.edu (Alan Heirich)

I am also interested in the question of complexity measures raised by
Ali Minai.  I think it is an important issue for anyone concerned with
self organizing systems -- i.e. how can you measure the extent to which
a system is organized?  

Some candidates I've thought of are Shannon entropy, Kolmogorov entropy,
"integrality" (a measure defined by I-don't-remember-who, discussed by
Brooks in the recent book "Information, entropy and evolution", which I
assume readers of this list are familiar with, or should be).  I would
like to hear about other possibilities.

-------------------------
Alan Heirich     Comp. Sci. & Eng., Cognitive Science
C-014 University of California, San Diego 92093

heirich@cs.ucsd.edu
aheirich@ucsd.bitnet

ib@apolling (Ivan N. Bach) (10/12/89)

Many scientists believe that you cannot accept something as a science unless you
can measure it in some way.  Scientists often prove or disprove their theories by
performing experiments in which they measure some key quantities, and then decide 
whether the results of their measurements are in accordance with the proposed theory.
You must also be able to repeat such experiments.

When it comes to expert systems and neural nets, many authors tend to make all
kinds of extravagant claims that are usually not supported by any objective measurements.
For example, they claim that a given expert system has achieved the capability of
a human expert.  This would, presumably, mean that the expert system is as complex as
those components of the human brain that are the material basis for the human
expertise, unless the expert system is designed more efficiently than the human brain.

When I worked as a consultant at AT&T Bell Laboratories, I did some research on the
measuring of the complexity of biological and artificial information systems.  I was
surprised by how much work has already been done in this area.  I came to the conclusion
that you can use entropy calculations to determine the MAXIMUM amount of information that
can be stored in a particular information system, i.e., its maximum information capacity.

For example, I calculated the maximum amount of information that can be stored in a
fused human egg cell because that egg cell contains a blueprint for a human being.
One way of measuring the complexity of any system is to find out how much information
is needed to completely describe a blueprint for that system, and, for example, transmit
it to a nearby star.  The instructions for building all the proteins in a human body
are stored in DNA molecules in the nucleus of the egg cell.  The information is encoded
by using an alphabet that consists of just four characters A (adenine), C (cytosine),
G (guanine), and T (thymine).  To calculate the maximum information capacity of a DNA
molecule, you have to assume that the distribution of bases is completely random,
i.e., that there are no restrictions imposed on the order in which bases can appear
in the DNA chain.  The distribution of bases in real cells is not random, and, therefore,
the actual information capacities of such cells are less than the potential maxima.
The maximum capacity of a fused human egg cell is several gigabits.

Once the genes of a human being are completely mapped, we will be able to calculate
exactly the information capacity of those genes.  In the meantime, we can calculate
the maximum possible capacity of human genes.  That is better than not having an
objective measurement, and it has a number of precedents in mathematics.  Mathematicians
can sometimes determine the limits within which the values of a particular function must lie,
but they may not be able to calculate the actual function values.
   
I also came to the conclusion that the information capacity of an information system
depends on the number of internal states that are actually used to store information.
For example, you can view a stone as a system with just one internal state (0 bits of
information) if you do not use its atoms to store information.  You can view an electrical
relay that can be closed or open as a system with just two internal states (1 bit of
information) if you use the open state to store a 0 and the closed state to store a 1,
etc.  You can calculate the maximum information capacity of an expert system by taking
into account the total number of characters and the number of characters in the alphabet.
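In code, such capacity estimates are just logarithms.  A minimal sketch (in Python,
purely for illustration; it uses the roughly 2.9 billion nucleotide pairs of one
human DNA strand as the example):

    # Rough sketch: maximum information capacity as the base-2 logarithm of
    # the number of distinguishable states, or equivalently the number of
    # symbols times log2 of the alphabet size.
    from math import log2

    def capacity_from_states(n_states):
        return log2(n_states)                      # bits

    def capacity_from_symbols(n_symbols, alphabet_size):
        return n_symbols * log2(alphabet_size)     # bits

    print(capacity_from_states(1))                 # stone: 0.0 bits
    print(capacity_from_states(2))                 # relay: 1.0 bit
    print(capacity_from_symbols(2.9e9, 4))         # one DNA strand: ~5.8e9 bits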

Researchers have calculated the information capacities of viruses, insects, mammals, etc.
It is interesting that the ordering of biological systems by their information capacity
corresponds very closely with the usual ordering of biological systems into lower (less
complex) and higher (more complex) species.  After you calculate the information capacity
of different artificial and biological systems, you can present your results in the form
of a diagram.


        relay           expert system 
          |                 |                                           information
          V                 V                                           capacity in bits
       0 ++--...---+--------+---------------------------------...---+-------------->
         ^         ^                                                ^ 
         |         |                                                |
       stone    virus                                         human egg cell


I came to the conclusion that an artificial system could achieve the same level of
intelligence (information processing) as a human being with a smaller number of
internal states if its design were more efficient than that of the human being.
Unfortunately, over several billion years nature has used genetic algorithms to
produce biological information systems of incredible complexity and optimization.
The level of miniaturization used in DNA is about four orders of magnitude greater than 
the level of miniaturization used in integrated circuits with a 1-micron geometry.  

British researchers could not figure out how the information for producing all the
proteins needed to construct a virus could be encoded in just a couple of thousand
bases.  The information stored in DNA is read in triplets.  They discovered that the
code for one protein started at a certain location on the DNA chain, and that the code
for another protein started on the very next base, i.e., that the codes overlapped.

I came to the conclusion that an artificial intelligence system that would be at
the same intellectual level as a human being would have to be extremely complex even
if its design is highly optimized.  We will need a very large number of internal states
to implement an artificial subsystem that would imitate the capabilities of, for
example, a human eye simply because of the large amount of information that must be
processed in real time.  I do not believe that such complex systems can be produced
by conventional coding of computer systems.  I think that we will have to use automatic,
self-organizing methods based on genetic algorithms to automatically and gradually develop 
more and more complex artificial systems until we eventually achieve and surpass the 
complexity and intellectual capability of human beings.

An artificial information system will have a different internal model of the
outside world than a human being, and a totally alien set of values, unless such a
system has the shape of a human baby and goes through the experiences of adolescence,
adulthood, and old age, or we control the information it receives from the outside world
in such a way that it gets the impression that it is a human baby and that it is going
through the experiences of a human being.  I am not saying that we should try to create
such an artificial system.  I am just saying that an immobile artificial system that
does not have the shape of a human body, and that receives information from the outside
world through some sensors that have no relationship to human senses, cannot possibly
develop an intellect similar to ours.  We are more likely to develop an alien race
of artificial beings than to develop artificial beings which cannot be distinguished from
human beings as described in Isaac Asimov's novels on robots.  

If you are interested in this topic, I can post detailed entropy calculations and 
references to relevant books and articles.

thearlin@vdsvax.crd.ge.com (Thearling Kurt H) (10/13/89)

In article <4522@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:
>
>If you are interested in this topic, I can post detailed entropy calculations and 
>references to relevant books and articles.


Please do.  If not, could you send it to me via email?

kurt

-----------------------------------------------------------------------
Kurt Thearling                                     General Electric CRD
                                                    Bldg. KW, Room C301
thearlin@vdsvax.crd.ge.com                                   P.O. Box 8
kurt@bach.csg.uiuc.edu                           Schenectady, NY  12301
-----------------------------------------------------------------------

schiebel@a.cs.wvu.wvnet.edu (Darrell Schiebel) (10/14/89)

In article <4522@imagen.UUCP>, ib@apolling (Ivan N. Bach) writes:

> If you are interested in this topic, I can post detailed entropy calculations and 
> references to relevant books and articles.

Yes, I'd very much like to see more about this topic, especially a
bibliography.

Darrell Schiebel

(schiebel@a.cs.wvu.wvnet.edu)

ib@apolling (Ivan N. Bach) (10/16/89)

How can we measure the capacity to store and transmit information?  Claude Shannon,
founder of the theory of information, suggested that the capacity of a system
to store and transmit information can be determined by calculating the degree of
randomness or disorganization in the system [1].  He decided to call this measure
"entropy," because he realized that his formula:

     H = -K x sum of (p_i x log p_i) for i = 1 to a

(where "H" is the entropy of a system with "a" possible states or events, "K" is
a constant, and "p_i" is the probability of occurrence of the i-th state or event)
was essentially the same as the formula that is used in statistical thermodynamics
to calculate the entropy or disorganization of a thermodynamic system.

It can be proved mathematically [2] that the entropy of a system will be at its 
maximum when all system states or events are equally probable and independent,
i.e., when the probability of each event in a system with "a" states or events
is equal to 1/a, and the probability of each event is independent of the probabilities
of all other events.  This maximum entropy is equal to log a.

When we set the constant "K" to 1, and use base 2 to calculate logarithms, the unit
of entropy is called a "bit."
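In code, with K = 1 and base-2 logarithms, the formula is a one-liner.  A minimal
Python sketch (anticipating the worked examples below):

    # Shannon entropy with K = 1 and base-2 logarithms, so the result is in
    # bits.  Probabilities must sum to 1; note that p*log2(1/p) = -p*log2(p).
    from math import log2

    def shannon_entropy(probabilities):
        return sum(p * log2(1 / p) for p in probabilities if p > 0)

    print(shannon_entropy([1.0]))        # one state:  0.0 bits
    print(shannon_entropy([0.5, 0.5]))   # two states: 1.0 bit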

One of the best books on the use of Shannon's entropy formula to calculate the
information capacity of biological information systems is the book "Information
Theory and The Living System" written by Lila Gatlin [3]. 

If a system has only one state, the probability of occurrence of this state is
equal to 1.  The entropy of such a system is equal to zero.

The entropy of an information system with 2 states will be at its maximum when the
probability of each event is equal to 1/2; equal probabilities, together with the basic
requirement that the probabilities of all possible events must sum to 1, give p1 = p2 = 1/2.
The entropy of such a system is:

    Hmax = -(p1 x log2 of p1 + p2 x log2 of p2) bits
         = -(1/2 x log2 of 1/2 + 1/2 x log2 of 1/2) bits
         = -(log2 of 1/2) bits
         = log2 of 2 bits    (because -log2 of 1/a = log2 of a)
         = 1 bit 

Computer programmers usually think of a single-bit variable as a variable which can
have either value 0 or value 1.  The maximum entropy or information capacity of a 
single-bit variable is 1 bit.

Lila Gatlin showed that Claude Shannon's formula could be used to calculate the
entropy of a randomly chosen base in the DNA chain of bases in the nucleus of a living
cell.  If all four types of bases in a DNA chain are equiprobable, the entropy of a
randomly chosen base in the DNA chain is equal to:

    Hmax = -log2 of 1/4 bits
         = log2 of 4 bits
         = 2 bits

Computer programmers know that they have to use at least 2 bits to encode 4 different
values.  The maximum entropy for a randomly chosen symbol from a sequence of symbols
selected from an alphabet of "a" symbols (a=4 for DNA) is equal to log2 of "a" bits.

If all bases in a DNA chain are not equiprobable, the probability of a certain type
of base occurring at a random position in a DNA chain is proportional to the number of
bases of its type in the whole chain, i.e., it depends on the composition of DNA.
All bases in the DNA of "Escherichia coli" bacteria are equiprobable.  Bases in the 
DNA of "Micrococcus lysodeikticus" appear with the following probabilities:

    p(A)=p(T)=.145 and p(C)=p(G)=.355

because adenine (A) bonds only to thymine (T), and cytosine (C) bonds only to guanine
(G).  Therefore, the entropy of a randomly chosen base in the DNA chain of "Micrococcus
lysodeikticus" is equal to:

    H = -(.145 x log2 of .145 + .145 x log2 of .145 + .355 x log2 of .355 + 
        .355 x log2 of .355) bits 
      = 1.87 bits

Notice that this value is smaller than the maximum value of 2 bits for equiprobable 
bases. 
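Both figures can be checked directly with the shannon_entropy sketch given above:

    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))      # 2.0 bits
    print(shannon_entropy([0.145, 0.145, 0.355, 0.355]))  # about 1.87 bits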

Lila Gatlin came to the conclusion that the entropy of a complex system is equal to
the sum of entropies of its components.  

The number of base pairs per sex or germ cell ranges from 10,000 for bacteria to over 
1,000,000,000 for mammals.  If we assume equiprobable bases, the maximum total entropy
of the DNA chain in a human sex cell is about 5.8 gigabits (2.9 billion nucleotide pairs
x 2 bits per nucleotide pair).  When a sperm cell fuses with an egg cell, the maximum
entropy of the DNA chain in the nucleus of the fertilized cell is equal to 11.6 gigabits.  

Notice that the DNA in a fertilized egg cell contains just a "blueprint" for a human body.  
Dancoff and Quastler [4] [5] tried to estimate very roughly the entropy of an adult human
body by calculating the amount of information required to specify the type, position, and
orientation of atoms in the body.  There are probably about 60 different types of atoms in
the human body.  Assuming 64 different atoms which appear with equal frequency, the 
maximum entropy of a randomly selected atom is equal to 6 bits.  However, not all types
of atoms occur with the same frequency.  Only three types of atoms, carbon, hydrogen and  
oxygen, are present in significant amounts.  Their approximate relative frequencies in
the human body are 0.66, 0.22 and 0.07, respectively.  The frequency of all other types
of atoms is about 0.05.  Therefore, the actual entropy of a human body is about 1.4 bits
per atom.  About 23 more bits are needed to specify proper orientation of each atom.  That
brings the total entropy per atom to about 24.4 bits.  It is not necessary to specify 
that many bits for each atom, because more than 90% of the body's weight consists of water
and other comparatively featureless substances.  Therefore, the total entropy of a
human body is:

    H = 24.4 bits x 10% of 7x10 to the power of 27 (the total number of atoms in the body)
      = 2x10 to the power of 28 bits (or 10 to the power of 24 pages of English text)

Similar astronomical figures were calculated in different ways by Linschitz [6] and
Morowitz [7].
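The arithmetic of this rough estimate is easy to reproduce (a Python sketch using
the figures quoted above; the numbers are Dancoff and Quastler's, not mine):

    # Rough reproduction of the Dancoff-Quastler estimate quoted above.
    from math import log2

    freqs = [0.66, 0.22, 0.07, 0.05]               # C, H, O, everything else
    h_type = sum(p * log2(1 / p) for p in freqs)   # ~1.4 bits per atom for type
    h_atom = h_type + 23                           # plus ~23 bits for orientation
    n_counted = 0.10 * 7e27                        # only ~10% of the atoms matter
    print(round(h_atom, 1), h_atom * n_counted)    # ~24.4 bits/atom, ~2e28 bits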

What I find fascinating is that the information needed to specify the blueprint for a 
human body is so efficiently compressed that it can be placed into the nucleus of each 
of about 50 trillion cells in that body.  Each sex cell contains only half a blueprint.  
Each non-sex cell contains all the information needed to build every type of protein in 
a human body.  Which types of proteins are actually built depends on the position of
biological switches.

Several valuable lessons relevant to the design of extremely complex neural networks
can be deduced from the studies of biological systems.

1. The storage capacity of a neural network is usually measured by the number of interconnects.
   If you can calculate the number of different states of a neural network, you can
   calculate its maximum information capacity by assuming that all states are
   equiprobable.  If the probability of occurrence of each state is either known or
   can be calculated, you can calculate the actual information capacity of your 
   network.

   Once you can calculate the capacity of different networks, you can objectively
   compare them, and determine which network represents the most efficient way of
   storing information.

   In Fig. 2.13 on page 33 of the DARPA Neural Network Study [27], you can see
   a comparison of the storage and speed of a leech, worm, fly, etc.  It is
   estimated that a human brain has the storage capacity of 10 to the power
   of 14 interconnects, and the speed of 10 to the power of 16 interconnects per
   second. 

2. Many researchers are trying to find analytical methods for designing optimal neural
   networks.  I suggest that we use genetic algorithms to design extremely complex and
   optimized neural networks automatically.  Genetic algorithms require an objective
   function or a goal.  We can specify the minimum number of neural nodes or the minimum
   number of layers in a neural network as our goal.  We can then use a random process
   to generate generations of neural nets.  We can use the objective function to
   evaluate the members of each generation, and discard those members which do not
   achieve a good enough result with respect to the objective function.  We can then
   cross the most successful members of each generation until we get an optimal network.
   (A toy sketch of this generate-evaluate-cross loop appears after this list.)

   Research has shown that relying on a priori knowledge of which areas of a complex
   objective function's domain should be explored usually leads to suboptimal solutions,
   i.e., to a local optimum.  Genetic algorithms, which are based on random processes,
   explore all areas of the objective function's domain thoroughly but not exhaustively,
   and usually find an overall or global optimum [7][8].

   I have come to the conclusion that very intelligent systems must be very
   complex.  We cannot use conventional computer programming methods to specify
   each detail of an extremely complex information system.  If we restrict ourselves
   to only analytical methods of constructing neural networks, we will not
   be able to construct networks that will be complex enough to have the information
   capacity equal to or greater than the capacity of a human brain.

3. The methods used to construct human neural networks are totally different from
   the methods which are now used to construct artificial neural networks and design
   complex VLSI circuits.  The technology of biological systems can be viewed as
   an extremely advanced and alien technology.  To not take advantage of this
   technology would be like coming across a UFO or a planet with very advanced 
   technology, and passing up the opportunity to study and apply that technology.

   The Turing machine is an abstract automaton with a head that is used to read
   instructions from an abstract tape.  Most computer scientists think that the
   Turing machine is only used to prove theorems in the theory of automata, and that
   such a mechanism need not be implemented because there is no practical use
   for it.  If you compare the diagram of a Turing machine in a textbook
   on the theory of automata with the diagram of a ribosome in a textbook on
   genetics or molecular biology, you will notice remarkable similarities.
  
   A ribosome in the cytoplasm of a cell has a head which is used to read the
   instructions for the building of proteins which are stored on a "tape"
   called messenger RNA.  Since ribosomes (the work stations for the manufacturing
   of proteins) are located outside the nucleus of a cell, and the DNA molecules
   are safely stored in the nucleus, messenger RNA molecules are needed to copy
   the instructions for building proteins from DNA molecules, and carry them to
   ribosomes.

   I would not be surprised if our colleagues who are reading the "sci.nanotech"
   newsgroup expressed an interest in such a mechanism.  There must be a number
   of other ingenious mechanisms in our cells that are just waiting to be
   discovered.

4. Instead of constructing a full-blown neural network, we could specify a
   "blueprint" for contructing that network.  I think that the amount of information
   in the blueprint can always be smaller than the amount of information in the
   network if there is any redundancy or repetition in the structure of the
   network.  If a very complex network is used to store large amounts of
   similar types of information, whole sections of that network should have
   identical or very similar structure.  There is no reason why we should use
   a different structure each time we want to store a data item of a certain
   type.  If whole areas of the network are going to have the same structure,
   we could specify in the blueprint that a certain number of subnetworks of
   a given type is needed in a particular location, and we may not care how each
   detail of each subnetwork is going to be constructed.  I suspect that Mother
   Nature had coded recursive subroutine calls into our DNA molecules a long time
   before computer programmers rediscovered such calls.

5. If you use a certain "alphabet" of components to construct your neural net,
   you should use each type of component with the same frequency if you want to
   achieve the maximum information capacity.
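As mentioned in point 2 above, here is a toy sketch of the generate-evaluate-cross
loop of a genetic algorithm (in Python, purely for illustration; the bit-string
encoding, the placeholder fitness function, and all the sizes and rates are my own
assumptions, not a recipe for encoding real networks):

    # Toy genetic algorithm: random population, selection by an objective
    # function, crossover and mutation, repeated for a number of generations.
    import random

    GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 32, 50, 100, 0.01

    def fitness(genome):
        # Placeholder objective: prefer genomes with many 1 bits.  A real
        # application would decode the genome into a network and score it.
        return sum(genome)

    def crossover(a, b):
        cut = random.randrange(1, GENOME_LEN)
        return a[:cut] + b[cut:]

    def mutate(genome):
        return [bit ^ 1 if random.random() < MUTATION_RATE else bit
                for bit in genome]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # Keep the better half, then refill by crossing random survivors.
        population.sort(key=fitness, reverse=True)
        survivors = population[:POP_SIZE // 2]
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(POP_SIZE - len(survivors))]
        population = survivors + children

    best = max(population, key=fitness)
    print(fitness(best), best)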

Main references for the artificial evolution and genetic algorithms seem to be: 

   A. Turing (1936-1952) [8]
   Turing machines, the chemical basis of morphogenesis.

   J. von Neumann (1952-53) [9]
   Reproduction and evolution of automata.

   L. Fogel, A. Owens, and M. Walsh (1966) [10]
   Artificial intelligence through simulated evolution.

   J. Bagley (1967) [11]
   Behavior of adaptive systems.

   R. Rosenberg (1967) [12]
   Relationship between genotype and phenotype.

   R. Weinberger (1970) [13]
   Computer simulation of a living cell.

   E. Goodman (1972) [14]  
   Adaptive behavior of simulated bacterial cells.
 
   N. Foo and J. Bosworth (1972) [15]
   Various aspects of genetic operators.

   D. Frantz (1972) [16]
   Non-linearities in genetic adaptive research.

   J. Holland (1962,  1975 and 1976) [17][18][19]
   Theory of adaptive systems and spontaneous emergence of self-reproducing
   automata.
 
   K. DeJong, A. Brindle, and A. Bethke (1975-81) [20]
   Genetic algorithms and function optimization.

   S. Smith (1980) [21]
   Learning systems based on genetic algorithms.

   J. Grefenstette (1983) [22]
   Optimization of genetic search algorithms.

   M. Mauldin (1984) [23]
   Maintaining diversity in genetic search.

   A. Goldberg and I. Pohl (1984) [24]
   Complexity theory and AI research.

   D. Goldberg (1985, 1989) [25][26]
   Application of genetic algorithms.

REFERENCES

1. Shannon, C. E., "A Mathematical Theory of Communication," "Bell System Technical
   Journal," Vol. 27, pp. 379-423 and 623-656.  (Reprinted in C. E. Shannon and W.
   Weaver, "The Mathematical Theory of Communication," pp. 3-91, U. of Illinois
   Press, Urbana, 1949.)

2. Khinchin, A. I., "Mathematical Foundations of Information Theory," Dover, New York,
   1957. 

3. Gatlin, Lila, "Information Theory and The Living System," Columbia University Press,
   New York, 1972.

4. Dancoff, S. M. and H. Quastler, "The Information Content and Error Rate of Living
   Things," in "Information Theory in Biology" edited by H. Quastler, U. of Illinois Press,
   Urbana, 1953.

5. Calow, Peter, "Biological Machines: A Cybernetic Approach to Life," Edward Arnold
   Publishing Co., London, England, 1976.

6. Linschitz, H., "Information Content of a Bacterial Cell" in "Information Theory in
   Biology," edited by H. Quastler, U. of Illinois Press, Urbana, 1953.

7. Morowitz, H. J., "Bull. Maths. Biophys.," Vol. 17, 1955, pp. 81-86. 

8. Turing, A. M., "The Chemical Basis of Morphogenesis," "Phil. Trans. Roy. Soc.," ser.
   B, No. 237, London, 1952.

9. Von Neumann, John, a lecture at the U. of Illinois published in: Arthur W. Burks, ed.,
   "Theory of Self-Reproducing Automata," U. of Illinois Press, Urbana, 1966.

10. Fogel, L. J., A. J. Owens, and M. J. Walsh, "Artificial Intelligence Through
    Simulated Evolution," John Wiley and Sons, Inc., New York, 1966.

11. Bagley, J. D., "The Behavior of Adaptive Systems Which Employ Genetic and
    Correlation Algorithms," Ph.D. thesis, U. of Michigan, Ann Arbor, 1967.

12. Rosenberg, R. S., "Simulation of Genetic Populations with Biochemical Properties,"
    Ph.D. thesis, U. of Michigan, Ann Arbor, 1967.

13. Weinberger, R., "Computer Simulation of a Living Cell," Ph.D. thesis, U. of 
    Michigan, Ann Arbor, 1970.

14. Goodman, E. D., "Adaptive Behavior of Simulated Bacterial Cells Subjected to 
    Nutritional Shifts," Ph.D. thesis, U. of Michigan, Ann Arbor, 1972.

15. Foo, N. Y. and J. L. Bosworth, "Algebraic, geometric, and stochastic aspects of
    genetic operators," Tech. Rep. 003120-2-T, U. of Michigan, Ann Arbor, 1972.

16. Frantz, D. R., "Non-linearities in Genetic Adaptive Search," Ph.D. thesis,
    U. of Michigan, Ann Arbor, 1972.

17. Holland, J. H., "Concerning Efficient Adaptive Systems," pp. 215-230 of "Self-
    Organizing Systems," edited by M. C. Yovits, G. T. Jacobi, and G. J. Goldstein,
    Spartan Books, Washington, D.C., 1962.

18. Holland, J. H., "Outline for a Logical Theory of Adaptive Systems," Journal
    of the Assoc. for Computing Machinery, Vol. 9, No. 3, July 1962, pp. 297-314.

19. Holland, J. H., "Studies of the Spontaneous Emergence of Self-Replicating
    Systems Using Cellular Automata and Formal Grammars," pp. 385-404 from "Automata,
    Languages, Development," edited by Aristid Lindenmayer and Grzegorz Rosenberg,
    North-Holland Publishing Company, Amsterdam, 1976. 

20. DeJong, K. A., "Analysis of the Behavior of a Class of Genetic Adaptive
    Systems," Ph.D. thesis, Dept. of Computer and Communications Sciences,
    U. of Michigan, Ann Arbor, 1975.

21. Smith, S., "A Learning System Based on Genetic Adaptive Algorithms," Ph.D.
    thesis, U. of Pittsburgh, Pa., 1980.

22. Grefenstette, J. G., "Optimization of Genetic Search Algorithms," Tech. Rep.
    CS-83-14, Vanderbilt U., 1983.

23. Mauldin, Michael L., "Maintaining Diversity in Genetic Search," Proc. of
    the 1984 National Conference on Artificial Intelligence, pp. 247-250.

24. Goldberg, Allen and Ira Pohl, "Is Complexity Theory of Use to AI?," pp.
    43-55 from "Artificial Human Intelligence," edited by A. Elithorn and R.
    Banerji, Elsevier Science Publishers, Amsterdam, 1984. 

25. Goldberg, David, "Dynamic System Control Using Rule Learning and Genetic
    Algorithms," Proc., of the 1985 National Conference on AI, pp. 588-592. 

26. Goldberg, David E., "Genetic Algorithms In Search, Optimization and Machine
    Learning," Addison-Wesley Publishing Company, Inc., Reading, 1989.

27. "DARPA Neural Network Study," AFCEA International Press, Fairfax, 1988.

ted@nmsu.edu (Ted Dunning) (10/17/89)

ivan bach is very enthusiastic, but relatively uninformed about recent
developments in determining the complexity of finite sequences.

he also has an annoying tendency to post news that is more than 80
columns.

here is a partial commentary on his posting,

In article <4542@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:

   Claude Shannon, founder of the theory of information, ...  He
   decided to call this measure "entropy," because he realized that
   his formula:

	H = -K x sum of (p_i x log p_i) for i = 1 to a

shannon actually qualified this measure extensively.  it is only valid
if successive symbols are independent, and only valid for stationary
ergodic sequences.  none of these qualifications apply in most
biological systems.

   Lila Gatlin showed that Claude Shannon's formula could be used to
   calculate the entropy of a randomly chosen base in the DNA chain of
   bases in the nucleus of a living cell.  If all four types of bases
   in a DNA chain are equiprobable, the entropy of a randomly chosen
   base in the DNA chain is equal to: ... = 2 bits

actually lila gatlin showed that you can misuse shannon's formula to
get an upper bound on the amount of information in a dna chain.

   Computer programmers know that they have to use at least 2 bits to
   encode 4 different values.

real computer programmers know that this is not true.

   If all bases in a DNA chain are not equiprobable, the probability
   of a certain type of base occurring at a random position in a DNA
   chain is proportional to the number of bases of its type in the

	...  Therefore, the entropy of a randomly chosen base in the
   DNA chain of "Micrococcus lysodeikticus" is equal to:

	 = 1.87 bits

this is entirely wrong.  see below.

   Lila Gatlin came to the conclusion that the entropy of a complex
   system is equal to the sum of entropies of its components.

lila is again mistaken, unless what actually was said was that the
entropy of a system composed of _independent_ parts is the sum of the
entropies of its components.  of course _complex_ systems generally do
not allow such naive decomposition.

	[ arithmetic maunderings about bits and blueprints deleted ]

	[ fuzzy comparison of a turing machine to protein synthesis ]


   4. Instead of constructing a full-blown neural network, we could
   specify a "blueprint" for contructing that network.  I think that
   the amount of information in the blueprint can always be smaller
   than the amount of information in the network if there is any
   redundancy or repetition in the structure of the network.

actually the information in the blueprint must be less than or equal to
the information in the network; otherwise, we would just use the network
itself as the blueprint.  see below for a reasonable definition of
information content.

   5. If you use a certain "alphabet" of components to construct your
   neural net, you should use each type of component with the same
   frequency if you want to achieve the maximum information capacity.

of course maximum information _content_ (or capacity) would also be
obtained by making all components completely independent.  this is the
same as recommending that people use RAM for their neural net.  you
_don't_ want a system with maximum content.  you want a system that
encodes and carries meaning.  these are very different things.



----------------------------------------------------------------


shannon's formulae are simple enough to be misused extensively by all
kinds of people.  such misuse does not mean that they are any more
applicable than they were in 1949.  in particular, if you compute the
entropy of english text, you over-estimate the information content by
a factor of 4 or more.  for instance, this posting has an entropy of
4.6 bits per letter, while english is generally accepted to have an
information content of 1 bit per letter or less.  the difference is
entirely due to the assumption of independence between letters.

an even more dramatic example is a sequence formed by the regular
expression {abcd}*.  if we take any 100 characters from a random
location in such a sequence, and compute the entropy for the sequence,
we will find that we apparently have 200 bits of information.  in
fact, though, we have only 2 bits of information because once we know
the first letter in the sequence, we know all the letters.

a much more sound approach to the complexity of sequences was proposed
in the 60's by various soviet mathematicians, and was popularized for
use with chaotic systems by v.i. arnold in the 70's.  this model
defines the complexity of a finite string to be the size of the
smallest computer program which can be used to recreate the sequence.
in arnold's work, the sequence was the itinerary of a finite map, but
this does not really matter if we are starting with a sequence from a
finite alphabet.

obviously, the upper bound on the complexity of a finite sequence is
the sequence size itself, since we can easily write a program for, say,
a ram machine which builds a list containing the sequence in the same
number of steps as the original sequence is long.

the complexity of one of our 100 character substrings of {abcd}* is
much less, since we can write a very simple program which produces
one of these substrings.

in practice, information content of a complex sequence can be
estimated by building relatively simple models for inter-symbol
dependencies and then measuring the entropy of the errors the model
makes in predicting the next symbol.  this is nearly equivalent to the
approach used by arnold and co, since if you can't build a predictor
at all, then the error stream becomes equivalent to the original
sequence. 
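as a toy illustration of both points (the over-estimate you get from naive
entropy, and the prediction-error trick), here is a python sketch; the
previous-symbol model is hand-specified rather than learned, which is the
degenerate case of the modelling step described above:

    # naive per-symbol entropy of a {abcd}* sequence versus the entropy of
    # the residual left by a trivial order-1 (previous-symbol) predictor.
    from math import log2
    from collections import Counter

    def entropy_bits(symbols):
        counts, n = Counter(symbols), len(symbols)
        return sum(c / n * log2(n / c) for c in counts.values())

    seq = "abcd" * 25                 # 100 characters drawn from {abcd}*
    print(entropy_bits(seq))          # naive estimate: 2.0 bits per symbol

    nxt = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'a'}
    errors = ['ok' if nxt[p] == s else s for p, s in zip(seq, seq[1:])]
    print(entropy_bits(errors))       # residual entropy: 0.0 bits per symbol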
--
ted@nmsu.edu
			Dem Dichter war so wohl daheime
			In Schildas teurem Eichenhain!
			Dort wob ich meine zarten Reime
			Aus Veilchenduft und Mondenschein

cjoslyn@bingvaxu.cc.binghamton.edu (Cliff Joslyn) (10/17/89)

I'd like to thank the contributors (I agree more with the latter).  This
discussion was originally cross-posted to alt.cyb-sys, and this
conversation continues on subjects of key interest to systems scientists
and cyberneticians.

I am taking the liberty of cross-posting these two notes again to
alt.cyb-sys and also to my mailing list, CYBSYS-L@BINGVMB.BITNET.  I'd
like to invite anyone interested here to join us there.  Blurb follows.

============================================================================

     ANNOUNCING FORMATION OF A MAILING LIST FOR SYSTEMS AND CYBERNETICS
     
     An electronic mailing list dedicated to Systems Science and Cybernetics 
     is currently in operation on the SUNY-Binghamton computer system.  The 
     list is committed to discussing a general understanding of the evolution 
     of complex, multi-level systems like organisms, minds, and societies as 
     informational entities containing possibly circular processes.  Specific 
     subjects include Complex Systems Theory, Self-Organizing Systems Theory, 
     Dynamic Systems Theory, Artificial Intelligence, Network Theory, 
     Semiotics, fractal geometry, Fuzzy Set Theory, Recursive Theory, computer 
     simulation, Information Theory, and more.
     
     The purposes of the list include: 1) facilitating discussion among those 
     working in or just interested in the general fields of Systems and 
     Cybernetics; 2) providing a means of communicating to the general 
     research community about the work that Systems Scientists and 
     Cyberneticians do; 3) housing a repository of electronic files for 
     general distribution concerning Systems and Cybernetics; and 4) providing 
     a central, public directory of working Systems Scientists and 
     Cyberneticians.  The mailing list can store or transmit notes and 
     messages, technical papers, references, calls for papers, computer 
     programs, and pictures and diagrams.
     
     The list is coordinated by members of the Systems Science department of 
     the Watson School at SUNY-Binghamton, and is affiliated with the 
     International Society for the Systems Sciences (ISSS) and the American 
     Society for Cybernetics (ASC).  The list is open to everyone, and we 
     currently have two hundred members from America, Canada, and Europe.  Our 
     subscribers are from both academia and industry, and while many are 
     active researchers, others are just "listening in".  We share in an 
     exciting, ongoing, multi-way conversation about many aspects of Systems 
     and Cybernetics.  Different levels and kinds of knowledge and experience 
     are represented.
     
     We invite all to join the discussion.  To subscribe, you need a computer 
     account with access to one of the international networks (e.g. BITNET, 
     USENET, ARPANET, INTERNET, CSNET).  Send a file containing only the line: 
     'SUB CYBSYS-L Your Full Name' to the list server at the address 
     LISTSERV@BINGVMB.BITNET. 
     
     Once subscribed, please post a message to the list itself at the address 
     CYBSYS-L@BINGVMB.BITNET.  In the message, include your name, affiliation, 
     and a brief description of your work and/or interest in the fields of 
     Systems and Cybernetics. 
     
     List moderator: CYBSYS@BINGVAXU.CC.BINGHAMTON.EDU 
     
     Author: Cliff Joslyn, CJOSLYN@BINGVAXU.CC.BINGHAMTON.EDU
     
  



-- 
O---------------------------------------------------------------------->
| Cliff Joslyn, Cybernetician at Large
| Systems Science, SUNY Binghamton, cjoslyn@bingvaxu.cc.binghamton.edu
V All the world is biscuit shaped. . .

park@usceast.UUCP (Kihong Park) (10/18/89)

In article <TED.89Oct16113856@aigyptos.nmsu.edu> you write:
>
>ivan bach is very enthusiastic, but relatively uninformed about recent
>developments in determining the complexity of finite sequences.
>
I agree with your comments regarding the outdatedness and errors of Ivan's
statements. As to describing the information content of a sequence over a
finite alphabet, the algorithmic information complexity measure developed
independently by Kolmogorov, Chaitin, and Solomonoff can be fruitfully
applied here.

Basically, they define the complexity of a sequence as the size of the
minimum program which faithfully generates the sequence when run on some
universal Turing machine. Thus, for a DNA sequence to encode maximum
information, i.e., be maximally patternless, its algorithmic complexity
must nearly equal the length of the sequence.

As to Ivan's comments about the applicability of entropy to represent the
efficiency of information coding in a network, many workers in the field
have realized that for a given task, a more or less 'adequate' network size
should be aimed for. One of the principal motivations for this is the
empirical observation that if a network is 'too large' compared to the
complexity of the task at hand, then it may generate solutions which do
not 'generalize' well, given the opportunity to do so. What 'generalization'
means is a different problem altogether, but what this observation shows us
is that one has to constrain the degrees of freedom of a network to entice it
to capture the most compact representation. This constitutes, so to speak, the
'minimal program'. Thus, the problem at hand is knowing the optimal size of
a network for a given task. I do not see an easy way in which entropies can
help us solve this correspondence problem.
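The minimal program itself is not computable, but the compressed length produced
by a general-purpose compressor gives a crude upper bound on the algorithmic
complexity. A small Python sketch (zlib here is just a stand-in for the ideal
shortest program, not anything from the literature cited above):

    # Crude upper bound on algorithmic complexity: compressed length in bits.
    # A general-purpose compressor can only over-estimate the true complexity.
    import random, zlib

    def complexity_upper_bound_bits(s):
        return 8 * len(zlib.compress(s.encode()))

    patterned = "abcd" * 2500                      # 10000 chars, highly regular
    patternless = "".join(random.choice("abcd") for _ in range(10000))
    print(complexity_upper_bound_bits(patterned))    # small: far below 20000 bits
    print(complexity_upper_bound_bits(patternless))  # roughly 2 bits per symbol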

>			Dem Dichter war so wohl daheime
>			In Schildas teurem Eichenhain!
>			Dort wob ich meine zarten Reime
>			Aus Veilchenduft und Mondenschein

It's interesting to note that some people equate our profession with that
of poets. Is this a revival of romanticism?

demers@beowulf.ucsd.edu (David E Demers) (10/19/89)

In article <TED.89Oct16113856@aigyptos.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:

>ivan bach is very enthusiastic, but relatively uninformed about recent
>developments in determining the complexity of finite sequences.

Agreed.

>he also has an annoying tendency to post news that is more than 80
>columns.

Ditto.
>here is a partial commentary on his posting,

>In article <4542@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:
[discussion of Bach's posting + critical commentary deleted]
>----------------------------------------------------------------


>shannon's formulae are simple enough to be misused extensively by all
>kinds of people.  such misuse does not mean that they are any more
>applicable than they were in 1949.  

Despite the misuse, the basic concept is very valuable.

>a much more sound approach to the complexity of sequences was proposed
>in the 60's by various soviet mathematicians, and was popularized for
>use with chaotic systems by v.i. arnold in the 70's.  this model
>defines the complexity of a finite string to be the size of the
>smallest computer program which can be used to recreate the sequence.
>in arnold's work, the sequence was the itinerary of a finite map, but
>this does not really matter if we are starting with a sequence from a
>finite alphabet.

This work goes by the name of "Kolmogorov Complexity", usually.
Solomonoff, Kolmogorov, and Chaitin did much of the pioneering work
in the early 60's, for independent reasons - examining the notion
of randomness [Kolmogorov] and inference [Solomonoff].

>obviously, the upper bound on the complexity of a finite sequence is
>the sequence size itself, since we can easily write a program for, say,
>a ram machine which builds a list containing the sequence in the same
>number of steps as the original sequence is long.
>
>the complexity of one of our 100 character substrings of {abcd}* is
>much less, since we can write a very simple program which produces
>one of these substrings.
>
>in practice, information content of a complex sequence can be
>estimated by building relatively simple models for inter-symbol
>dependencies and then measuring the entropy of the errors the model
>makes in predicting the next symbol.  this is nearly equivalent to the
>approach used by arnold and co, since if you can't build a predictor
>at all, then the error stream becomes equivalent to the original
>sequence. 

The above is a very good quick description of the concept.  It
essentially quantifies Occam's Razor in modeling.  Computational
learning theory draws on many of these ideas; Valiant et al. make use of
the same principle.

I apologize for once again posting with the 'F' key before
gathering citations.  Walt Savitch just gave a talk last
week on Kolmogorov Complexity for the Cognitive Science
seminar here at UCSD; I will scare up a summary and some
references and post soon for those interested in pursuing
the idea in more depth.

Dave DeMers				demers@cs.ucsd.edu
Dept. of Computer Science & Engineering demers@inls1.ucsd.edu
UCSD					{other non-Internet mailpaths
La Jolla, CA 92093			according to normal syntax...}

jlr20421@uxa.cso.uiuc.edu (10/20/89)

	Yes, I would like to see more information on this topic--especially
the bibliographies.

	(jlr20421@uxa.cso.uiuc.edu)

andrew@dtg.nsc.com (Lord Snooty @ The Giant Poisoned Electric Head ) (10/23/89)

I guess the entropy calcs use the p.ln(p) expression, as per Shannon.. but
as per Hofstadter, what about the complexity of the context or "frame"?
What tells you which jukebox will play your record? If you could give
da Vinci a binary dump of a set of Postscript graphics files describing
his drawings, could he work out that they were his? - or would you have
to provide him with the appropriate "jukebox" (PC + OS  + etc)?

Of course, the correct answer is the latter one. So my question is:
have you included the complexity of the decoding apparatus in your entropy
estimates?

-- 
...........................................................................
Andrew Palfreyman		and the 'q' is silent,
andrew@dtg.nsc.com		as in banana			 time sucks

russell@minster.york.ac.uk (10/23/89)

In article <9675@vdsvax.crd.ge.com> thearlin@vdsvax.crd.ge.com writes:
>In article <4522@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:
>>
>>If you are interested in this topic, I can post detailed entropy calculations and 
>>references to relevant books and articles.
>
>
>Please do.  If not, could you send it to me via email?
>
>kurt
>

Me too!

Thanks.

Russell.

____________________________________________________________
 Russell Beale, Advanced Computer Architecture Group,
 Dept. of Computer Science, University of York, Heslington,
 YORK. YO1 5DD.  UK.               Tel: [044] (0904) 432762

 russell@uk.ac.york.minster		JANET connexions
 russell%york.minster@cs.ucl.ac.uk	ARPA connexions
 ..!mcvax!ukc!minster!russell		UUCP connexions
 russell@minster.york.ac.uk		eab mail
____________________________________________________________