aam9n@uvaee.ee.virginia.EDU (Ali Minai) (10/06/89)
I have a question which might or might not make sense. Still, since it is of
great interest to me, I shall be grateful if people share their views,
suggestions and references on this topic with me. Of course, I shall be
particularly grateful if someone pointed out the absurdity of the whole issue.

The question is this:

Given deterministic data {(X,Y)} where X is in a finite interval of n-d real
space and Y is in a finite interval of m-d real space, what *structural*
measures (if any) have people suggested for the "complexity" of the data?

This immediately raises several other issues:

1) What possible notions of "complexity" might one admit in this regard?

2) What might be the *order* of this complexity? Should it, for example, be
   defined over the set of all points, all pairs, all triplets etc.?

3) Given that the data is not uniformly distributed over the domain (i.e. it
   is "lumpy"), what assumptions must be made regarding the blank areas?
   Should these be statistical? (probably yes).

4) How can such a "complexity" measure be made scale-invariant?

etc.

Also, what about such complexity measures for continuous functions? I mean
measures defined structurally, not according to the type of the function
(e.g. degree for polynomials).

My gut feeling is that some sort of information-based measure is appropriate,
but whatever is used must be efficiently computable.

Have other people tried to deal with this problem? Since I am sure they have,
where can I find the material? What would be the appropriate area of
mathematics to look for such references? I am currently searching
approximation theory, statistical modelling, maximum-entropy theory, measure
theory, information theory, complexity theory (both system and computational)
and non-linear dynamics literature. Obviously, I will miss more than I will
find, hence this request. A particularly helpful thing would be names of
journals which regularly carry material relevant to this issue.

Also, is the International Journal of General Systems still published? If
not, what did it turn into? Or rather, who carries articles about "the theory
of things", "the calculus of indications" etc? I am extremely interested.

Thank you.

Ali Minai
Dept. of Electrical Engg.
Thornton Hall, University of Virginia,
Charlottesville, VA 22903.
aam9n@uvaee.ee.virginia.edu
pa1159@sdcc13.ucsd.EDU (Matt Kennel) (10/07/89)
In article <517@uvaee.ee.virginia.EDU> aam9n@uvaee.ee.virginia.EDU (Ali Minai) writes:
>The question is this:
>
>Given deterministic data {(X,Y)} where X is in a finite interval of
>n-d real space and Y is in a finite interval of m-d real space,
>what *structural* measures (if any) have people suggested for
>the "complexity" of the data?

I don't know what other people have suggested, but I'll spout out a
suggestion of my own: crib ideas from the nonlinear dynamics people. For a
rough-and-ready measure of the complexity of the input space, why not use
fractal dimension? And for the output space, what about the sum of the
positive Lyapunov exponents of the nonlinear mapping?

>4) How can such a "complexity" measure be made scale-invariant?

That's exactly what the fractal dimension does?

>etc.
>
>Also, what about such complexity measures for continuous functions?
>I mean measures defined structurally, not according to the type
>of the function (e.g. degree for polynomials).

I don't understand what you mean by "defined structurally."

>Ali Minai
>aam9n@uvaee.ee.virginia.edu

Matt Kennel
UCSD physics
pa1159@sdcc13.ucsd.edu
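[One concrete way to put a number on the "fractal dimension" suggestion above
is the Grassberger-Procaccia correlation dimension: count the fraction C(r)
of point pairs closer than r, and read the dimension off the slope of
log C(r) against log r. The sketch below is only an added illustration, not
code from any of the posts; it assumes Python with NumPy, and the sample
curve and radius range are made up.]

import numpy as np

def correlation_dimension(points, radii):
    """Correlation-dimension estimate: slope of log C(r) vs. log r,
    where C(r) is the fraction of point pairs closer than r."""
    points = np.asarray(points)
    n = len(points)
    # all pairwise Euclidean distances, each pair counted once
    dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    pair_d = dists[np.triu_indices(n, k=1)]
    c = np.array([(pair_d < r).mean() for r in radii])
    slope, _intercept = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

# Example: 500 points on a 1-d curve embedded in 3-d space; the estimate
# should come out near 1 even though the ambient space is 3-dimensional.
t = np.sort(np.random.rand(500))
curve = np.c_[t, np.sin(3 * t), np.cos(5 * t)]
print(correlation_dimension(curve, radii=np.logspace(-1.5, -0.5, 10)))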
cybsys@bingvaxu.cc.binghamton.edu (CYBSYS-L Moderator) (10/10/89)
[ Cross-posted from CYBSYS-L@BINGVMB ]
Really-From: heirich%cs@ucsd.edu (Alan Heirich)

I am also interested in the question of complexity measures raised by Ali
Minai. I think it is an important issue for anyone concerned with self
organizing systems -- i.e. how can you measure the extent to which a system
is organized?

Some candidates I've thought of are Shannon entropy, Kolmogorov entropy, and
"integrality" (a measure defined by I-don't-remember-who, discussed by Brooks
in the recent book "Information, entropy and evolution", which I assume
readers of this list are familiar with, or should be). I would like to hear
about other possibilities.

-------------------------
Alan Heirich
Comp. Sci. & Eng., Cognitive Science C-014
University of California, San Diego 92093
heirich@cs.ucsd.edu
aheirich@ucsd.bitnet
ib@apolling (Ivan N. Bach) (10/12/89)
Many scientists believe that you cannot accept something as a science unless
you can measure it in some way. Scientists often prove or disprove their
theories by performing experiments in which they measure some key quantities,
and then decide whether the results of their measurements are in accordance
with the proposed theory. You must also be able to repeat such experiments.

When it comes to expert systems and neural nets, many authors tend to make
all kinds of extravagant claims that are usually not supported by any
objective measurements. For example, they claim that a given expert system
has achieved the capability of a human expert. This would, presumably, mean
that the expert system is as complex as those components of the human brain
that are the material basis for the human expertise, unless the expert system
is designed more efficiently than the human brain.

When I worked as a consultant at AT&T Bell Laboratories, I did some research
on measuring the complexity of biological and artificial information systems.
I was surprised by how much work has already been done in this area. I came
to the conclusion that you can use entropy calculations to determine the
MAXIMUM amount of information that can be stored in a particular information
system, i.e., its maximum information capacity. For example, I calculated the
maximum amount of information that can be stored in a fused human egg cell,
because that egg cell contains a blueprint for a human being. One way of
measuring the complexity of any system is to find out how much information is
needed to completely describe a blueprint for that system, and, for example,
transmit it to a nearby star.

The instructions for building all the proteins in a human body are stored in
DNA molecules in the nucleus of the egg cell. The information is encoded by
using an alphabet that consists of just four characters: A (adenine), C
(cytosine), G (guanine), and T (thymine). To calculate the maximum
information capacity of a DNA molecule, you have to assume that the
distribution of bases is completely random, i.e., that there are no
restrictions imposed on the order in which bases can appear in the DNA chain.
The distribution of bases in real cells is not random, and, therefore, the
actual information capacities of such cells are less than the potential
maxima. The maximum capacity of a fused human egg cell is several gigabits.

Once the genes of a human being are completely mapped, we will be able to
calculate exactly the information capacity of those genes. In the meantime,
we can calculate the maximum possible capacity of human genes. That is better
than not having an objective measurement, and it has a number of precedents
in mathematics. Mathematicians can sometimes determine the limits within
which the values of a particular function must lie, but they may not be able
to calculate the actual function values.

I also came to the conclusion that the information capacity of an information
system depends on the number of internal states that are actually used to
store information. For example, you can view a stone as a system with just
one internal state (0 bits of information) if you do not use its atoms to
store information. You can view an electrical relay that can be closed or
open as a system with just two internal states (1 bit of information) if you
use the open state to store a 0 and the closed state to store a 1, etc.
You can calculate the maximum information capacity of an expert system by
taking into account the total number of characters and the number of
characters in the alphabet. Researchers have calculated the information
capacities of viruses, insects, mammals, etc. It is interesting that the
ordering of biological systems by their information capacity corresponds very
closely with the usual ordering of biological systems into lower (less
complex) and higher (more complex) species.

After you calculate the information capacity of different artificial and
biological systems, you can present your results in the form of a diagram:

                      relay                expert system
                        |                        |
   information          V                        V
   capacity in bits  0 ++--...--+-------+---------------------...--+------->
                        ^       ^                                  ^
                        |       |                                  |
                      stone   virus                        human egg cell

I came to the conclusion that an artificial system could achieve the same
level of intelligence (information processing) as a human being with a
smaller number of internal states if its design was more optimal than the
design of the human being. Unfortunately, over several billion years nature
has used genetic algorithms to produce biological information systems of
incredible complexity and optimization. The level of miniaturization used in
DNA is about four orders of magnitude greater than the level of
miniaturization used in integrated circuits with a 1-micron geometry.

British researchers could not figure out how the information for producing
all the proteins needed to construct a virus could be encoded in just a
couple of thousand bases. The information stored in DNA is read in triplets.
They discovered that the code for one protein started at a certain location
on the DNA chain, and the code for another protein started on the next base,
i.e., the codes overlapped.

I came to the conclusion that an artificial intelligence system that would be
at the same intellectual level as a human being would have to be extremely
complex even if its design is highly optimized. We will need a very large
number of internal states to implement an artificial subsystem that would
imitate the capabilities of, for example, a human eye, simply because of the
large amount of information that must be processed in real time. I do not
believe that such complex systems can be produced by conventional coding of
computer systems. I think that we will have to use automatic, self-organizing
methods based on genetic algorithms to automatically and gradually develop
more and more complex artificial systems until we eventually achieve and
surpass the complexity and intellectual capability of human beings.

An artificial information system will have a different internal model of the
outside world than a human being, and a totally alien set of values, unless
such a system has the shape of a human baby and goes through the experiences
of adolescence, adulthood, and old age, or we control the information it
receives from the outside world in such a way that it gets the impression
that it is a human baby and that it is going through the experiences of a
human being. I am not saying that we should try to create such an artificial
system. I am just saying that an immobile artificial system that does not
have the shape of a human body, and that receives information from the
outside world through some sensors that have no relationship to human senses,
cannot possibly develop an intellect similar to ours.
We are more likely to develop an alien race of artificial beings than to
develop artificial beings which cannot be distinguished from human beings, as
described in Isaac Asimov's novels on robots.

If you are interested in this topic, I can post detailed entropy calculations
and references to relevant books and articles.
thearlin@vdsvax.crd.ge.com (Thearling Kurt H) (10/13/89)
In article <4522@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:
>
>If you are interested in this topic, I can post detailed entropy calculations
>and references to relevant books and articles.

Please do. If not, could you send it to me via email?

kurt

-----------------------------------------------------------------------
Kurt Thearling                     General Electric CRD
                                   Bldg. KW, Room C301
thearlin@vdsvax.crd.ge.com         P.O. Box 8
kurt@bach.csg.uiuc.edu             Schenectady, NY 12301
-----------------------------------------------------------------------
schiebel@a.cs.wvu.wvnet.edu (Darrell Schiebel) (10/14/89)
In article <4522@imagen.UUCP>, ib@apolling (Ivan N. Bach) writes:
> If you are interested in this topic, I can post detailed entropy calculations
> and references to relevant books and articles.

Yes, I'd very much like to see more about this topic, especially a
bibliography.

Darrell Schiebel
(schiebel@a.cs.wvu.wvnet.edu)
ib@apolling (Ivan N. Bach) (10/16/89)
How can we measure the capacity to store and transmit information? Claude
Shannon, founder of the theory of information, suggested that the capacity of
a system to store and transmit information can be determined by calculating
the degree of randomness or disorganization in the system [1]. He decided to
call this measure "entropy," because he realized that his formula:

   H = -K x sum of (pi x log pi)   for all i=1 to i=a

(where "H" is the entropy of a system with "a" possible states or events, "K"
is a constant, and "pi" is the probability of occurrence of the i-th state or
event) was essentially the same as the formula that is used in statistical
thermodynamics to calculate the entropy or disorganization of a thermodynamic
system.

It can be proved mathematically [2] that the entropy of a system will be at
its maximum when all system states or events are equally probable and
independent, i.e., when the probability of each event in a system with "a"
states or events is equal to 1/a, and the probability of an event is
independent of the probabilities of all other events. This maximum entropy is
equal to log a. When we set the constant "K" to 1, and use base 2 to
calculate logarithms, the unit of entropy is called a "bit."

One of the best books on the use of Shannon's entropy formula to calculate
the information capacity of biological information systems is "Information
Theory and The Living System" by Lila Gatlin [3].

If a system has only one state, the probability of occurrence of this state
is equal to 1, and the entropy of such a system is equal to zero. The entropy
of an information system with 2 states will be maximum when the probability
of each state is equal to 1/2. This follows from the basic requirement that
the sum of the probabilities of all possible events must always be equal
to 1. The entropy of such a system is:

   Hmax = -(p1 x log2 of p1 + p2 x log2 of p2) bits
        = -(1/2 x log2 of 1/2 + 1/2 x log2 of 1/2) bits
        = -(log2 of 1/2) bits
        = log2 of 2 bits          (because -log2 of 1/a = log2 of a)
        = 1 bit

Computer programmers usually think of a single-bit variable as a variable
which can have either the value 0 or the value 1. The maximum entropy or
information capacity of a single-bit variable is 1 bit.

Lila Gatlin showed that Claude Shannon's formula could be used to calculate
the entropy of a randomly chosen base in the DNA chain of bases in the
nucleus of a living cell. If all four types of bases in a DNA chain are
equiprobable, the entropy of a randomly chosen base in the DNA chain is
equal to:

   Hmax = -log2 of 1/4 bits = log2 of 4 bits = 2 bits

Computer programmers know that they have to use at least 2 bits to encode 4
different values. The maximum entropy for a randomly chosen symbol from a
sequence of symbols selected from an alphabet of "a" symbols (a=4 for DNA) is
equal to log2 of "a" bits.

If the bases in a DNA chain are not all equiprobable, the probability of a
certain type of base occurring at a random position in a DNA chain is
proportional to the number of bases of its type in the whole chain, i.e., it
depends on the composition of the DNA. All bases in the DNA of "Escherichia
coli" bacteria are equiprobable. Bases in the DNA of "Micrococcus
lysodeikticus" appear with the following probabilities: p(A)=p(T)=.145 and
p(C)=p(G)=.355, because adenine (A) bonds only to thymine (T), and cytosine
(C) bonds only to guanine (G).
Therefore, the entropy of a randomly chosen base in the DNA chain of
"Micrococcus lysodeikticus" is equal to:

   H = -(.145 x log2 of .145 + .145 x log2 of .145
         + .355 x log2 of .355 + .355 x log2 of .355) bits
     = 1.87 bits

Notice that this value is smaller than the maximum value of 2 bits for
equiprobable bases.

Lila Gatlin came to the conclusion that the entropy of a complex system is
equal to the sum of the entropies of its components. The number of base pairs
per sex or germ cell ranges from 10,000 for bacteria to over 1,000,000,000
for mammals. If we assume equiprobable bases, the maximum total entropy of
the DNA chain in a human sex cell is about 5.8 gigabits (2.9 billion
nucleotide pairs x 2 bits per nucleotide pair). When a sperm cell fuses with
an egg cell, the maximum entropy of the DNA chain in the nucleus of the
fertilized cell is equal to 11.6 gigabits. Notice that the DNA in a
fertilized egg cell contains just a "blueprint" for a human body.

Dancoff and Quastler [4][5] tried to estimate very roughly the entropy of an
adult human body by calculating the amount of information required to specify
the type, position, and orientation of the atoms in the body. There are
probably about 60 different types of atoms in the human body. Assuming 64
different types of atoms which appear with equal frequency, the maximum
entropy of a randomly selected atom is equal to 6 bits. However, not all
types of atoms occur with the same frequency. Only three types of atoms,
carbon, hydrogen and oxygen, are present in significant amounts. Their
approximate relative frequencies in the human body are 0.66, 0.22 and 0.07,
respectively. The frequency of all other types of atoms is about 0.05.
Therefore, the actual entropy of a human body is about 1.4 bits per atom.
About 23 more bits are needed to specify the proper orientation of each atom,
which brings the total entropy per atom to about 24.4 bits. It is not
necessary to specify that many bits for each atom, because more than 90% of
the body's weight consists of water and other comparatively featureless
substances. Therefore, the total entropy of a human body is roughly:

   H = 24.4 bits x (10% of 7x10^27, the total number of atoms in the body)
     = 2x10^28 bits (or 10^24 pages of English text)

Similar astronomical figures were calculated in different ways by Linschitz
[6] and Morowitz [7].

What I find fascinating is that the information needed to specify the
blueprint for a human body is so efficiently compressed that it can be placed
into the nucleus of each of the roughly 50 trillion cells in that body. Each
sex cell contains only half a blueprint. Each non-sex cell contains all the
information needed to build every type of protein in a human body. Which
types of proteins are actually built depends on the position of biological
switches.

Several valuable lessons relevant to the design of extremely complex neural
networks can be deduced from the study of biological systems.

1. The storage of a neural network is usually measured in the number of
interconnects. If you can calculate the number of different states of a
neural network, you can calculate its maximum information capacity by
assuming that all states are equiprobable. If the probability of occurrence
of each state is either known or can be calculated, you can calculate the
actual information capacity of your network. Once you can calculate the
capacity of different networks, you can objectively compare them, and
determine which network represents the most efficient way of storing
information.
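[The per-base arithmetic above is easy to check mechanically. The following
is an added sketch, not code from the post; it assumes plain Python and
reproduces the 2-bit, 1.87-bit and 5.8-gigabit figures quoted above.]

from math import log2

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution: -sum(p*log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Equiprobable bases (e.g. E. coli, per the post): 2 bits per base.
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))         # -> 2.0

# Micrococcus lysodeikticus composition from the post: about 1.87 bits.
print(entropy_bits([0.145, 0.145, 0.355, 0.355]))     # -> ~1.87

# Maximum capacity of a human sex cell's DNA, assuming equiprobable bases:
# 2.9e9 base pairs x 2 bits per pair = 5.8 gigabits.
print(2.9e9 * entropy_bits([0.25] * 4))               # -> 5.8e9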
In Fig. 2.13 on page 33 of the DARPA Neural Network Study [27], you can see a
comparison of the storage and speed of a leech, worm, fly, etc. It is
estimated that a human brain has a storage capacity of 10^14 interconnects,
and a speed of 10^16 interconnects per second.

2. Many researchers are trying to find analytical methods for designing
optimal neural networks. I suggest that we use genetic algorithms to design
extremely complex and optimized neural networks automatically. Genetic
algorithms require an objective function or a goal. We can specify the
minimum number of neural nodes or the minimum number of layers in a neural
network as our goal. We can then use a random process to generate generations
of neural nets. We can use the objective function to evaluate the members of
each generation, and discard those members which do not achieve a good enough
result with respect to the objective function. We can then cross the most
successful members of each generation until we get an optimal network (a
small code sketch of this loop appears after the reference list below).
Research has shown that applying a priori knowledge of which areas of the
domain of a complex objective function should be explored usually results in
suboptimal results, i.e., in the finding of a local optimum. Genetic
algorithms based on random processes thoroughly, but not exhaustively,
explore all areas of the objective function's domain, and usually result in
the finding of an overall or global optimum [7][8].

I have come to the conclusion that very intelligent systems must be very
complex. We cannot use conventional computer programming methods to specify
each detail of an extremely complex information system. If we restrict
ourselves to only analytical methods of constructing neural networks, we will
not be able to construct networks that will be complex enough to have an
information capacity equal to or greater than the capacity of a human brain.

3. The methods used to construct human neural networks are totally different
from the methods which are now used to construct artificial neural networks
and design complex VLSI circuits. The technology of biological systems can be
viewed as an extremely advanced and alien technology. Not taking advantage of
this technology would be like coming across a UFO or a planet with very
advanced technology, and passing up the opportunity to study and apply that
technology.

The Turing machine is an abstract automaton with a head that is used to read
instructions from an abstract tape. Most computer scientists think that the
Turing machine is used to prove theorems in the theory of automata, and that
such a mechanism should not be implemented because there is no practical use
for it. If you compare the diagram of a Turing machine in a textbook on the
theory of automata with the diagram of a ribosome in a textbook on genetics
or molecular biology, you will notice remarkable similarities. A ribosome in
the cytoplasm of a cell has a head which is used to read the instructions for
the building of proteins, which are stored on a "tape" called messenger RNA.
Since ribosomes (the work stations for the manufacturing of proteins) are
located outside the nucleus of a cell, and the DNA molecules are safely
stored in the nucleus, messenger RNA molecules are needed to copy the
instructions for building proteins from DNA molecules, and carry them to
ribosomes. I would not be surprised if our colleagues who are reading the
"sci.nanotech" newsgroup expressed an interest in such a mechanism.
There must be a number of other ingenious mechanisms in our cells that are
just waiting to be discovered.

4. Instead of constructing a full-blown neural network, we could specify a
"blueprint" for constructing that network. I think that the amount of
information in the blueprint can always be smaller than the amount of
information in the network if there is any redundancy or repetition in the
structure of the network. If a very complex network is used to store large
amounts of similar types of information, whole sections of that network
should have identical or very similar structure. There is no reason why we
should use a different structure each time we want to store a data item of a
certain type. If whole areas of the network are going to have the same
structure, we could specify in the blueprint that a certain number of
subnetworks of a given type is needed in a particular location, and we may
not care how each detail of each subnetwork is going to be constructed. I
suspect that Mother Nature had coded recursive subroutine calls into our DNA
molecules a long time before computer programmers rediscovered such calls.

5. If you use a certain "alphabet" of components to construct your neural
net, you should use each type of component with the same frequency if you
want to achieve the maximum information capacity.

The main references for artificial evolution and genetic algorithms seem
to be:

A. Turing (1936-1952) [8]
   Turing machines, the chemical basis of morphogenesis.
J. von Neumann (1952-53) [9]
   Reproduction and evolution of automata.
L. Fogel, A. Owens, and M. Walsh (1966) [10]
   Artificial intelligence through simulated evolution.
J. Bagley (1967) [11]
   Behavior of adaptive systems.
R. Rosenberg (1967) [12]
   Relationship between genotype and phenotype.
R. Weinberger (1970) [13]
   Computer simulation of a living cell.
E. Goodman (1972) [14]
   Adaptive behavior of simulated bacterial cells.
N. Foo and J. Bosworth (1972) [15]
   Various aspects of genetic operators.
D. Frantz (1972) [16]
   Non-linearities in genetic adaptive search.
J. Holland (1962, 1975 and 1976) [17][18][19]
   Theory of adaptive systems and spontaneous emergence of self-reproducing
   automata.
K. DeJong, A. Brindle, and A. Bethke (1975-81) [20]
   Genetic algorithms and function optimization.
S. Smith (1980) [21]
   Learning systems based on genetic algorithms.
J. Grefenstette (1983) [22]
   Optimization of genetic search algorithms.
M. Mauldin (1984) [23]
   Maintaining diversity in genetic search.
A. Goldberg and I. Pohl (1984) [24]
   Complexity theory and AI research.
D. Goldberg (1985, 1989) [25][26]
   Application of genetic algorithms.

REFERENCES

1.  Shannon, C. E., "A Mathematical Theory of Communication," Bell System
    Technical Journal, Vol. 27, 1948, pp. 379-423 and 623-656. (Reprinted in
    C. E. Shannon and W. Weaver, "The Mathematical Theory of Communication,"
    pp. 3-91, U. of Illinois Press, Urbana, 1949.)
2.  Khinchin, A. I., "Mathematical Foundations of Information Theory," Dover,
    New York, 1957.
3.  Gatlin, Lila, "Information Theory and The Living System," Columbia
    University Press, New York, 1972.
4.  Dancoff, S. M. and H. Quastler, "The Information Content and Error Rate
    of Living Things," in "Information Theory in Biology," edited by H.
    Quastler, U. of Illinois Press, Urbana, 1953.
5.  Calow, Peter, "Biological Machines: A Cybernetic Approach to Life,"
    Edward Arnold Publishing Co., London, England, 1976.
6.  Linschitz, H., "Information Content of a Bacterial Cell," in "Information
    Theory in Biology," edited by H. Quastler, U. of Illinois Press, Urbana,
    1953.
7.  Morowitz, H. J., Bull. Math. Biophys., Vol. 17, 1955, pp. 81-86.
8.  Turing, A. M., "The Chemical Basis of Morphogenesis," Phil. Trans. Roy.
    Soc., ser. B, No. 237, London, 1952.
9.  Von Neumann, John, a lecture at the U. of Illinois published in: Arthur
    W. Burks, ed., "Theory of Self-Reproducing Automata," U. of Illinois
    Press, Urbana, 1966.
10. Fogel, L., A. Owens, and M. Walsh, "Artificial Intelligence Through
    Simulated Evolution," John Wiley and Sons, Inc., New York, 1966.
11. Bagley, J. D., "The Behavior of Adaptive Systems Which Employ Genetic and
    Correlation Algorithms," Ph.D. thesis, U. of Michigan, Ann Arbor, 1967.
12. Rosenberg, R. S., "Simulation of Genetic Populations with Biochemical
    Properties," Ph.D. thesis, U. of Michigan, Ann Arbor, 1967.
13. Weinberger, R., "Computer Simulation of a Living Cell," Ph.D. thesis,
    U. of Michigan, Ann Arbor, 1970.
14. Goodman, E. D., "Adaptive Behavior of Simulated Bacterial Cells Subjected
    to Nutritional Shifts," Ph.D. thesis, U. of Michigan, Ann Arbor, 1972.
15. Foo, N. Y. and J. L. Bosworth, "Algebraic, Geometric, and Stochastic
    Aspects of Genetic Operators," Tech. Rep. 003120-2-T, U. of Michigan,
    Ann Arbor, 1972.
16. Frantz, D. R., "Non-linearities in Genetic Adaptive Search," Ph.D.
    thesis, U. of Michigan, Ann Arbor, 1972.
17. Holland, J. H., "Concerning Efficient Adaptive Systems," pp. 215-230 of
    "Self-Organizing Systems," edited by M. C. Yovits, G. T. Jacobi, and
    G. J. Goldstein, Spartan Books, Washington, D.C., 1962.
18. Holland, J. H., "Outline for a Logical Theory of Adaptive Systems,"
    Journal of the Assoc. for Computing Machinery, Vol. 9, No. 3, July 1962,
    pp. 297-314.
19. Holland, J. H., "Studies of the Spontaneous Emergence of Self-Replicating
    Systems Using Cellular Automata and Formal Grammars," pp. 385-404 of
    "Automata, Languages, Development," edited by Aristid Lindenmayer and
    Grzegorz Rozenberg, North-Holland Publishing Company, Amsterdam, 1976.
20. DeJong, K. A., "Analysis of the Behavior of a Class of Genetic Adaptive
    Systems," Ph.D. thesis, Dept. of Computer and Communications Sciences,
    U. of Michigan, Ann Arbor, 1975.
21. Smith, S., "A Learning System Based on Genetic Adaptive Algorithms,"
    Ph.D. thesis, U. of Pittsburgh, Pa., 1980.
22. Grefenstette, J. G., "Optimization of Genetic Search Algorithms," Tech.
    Rep. CS-83-14, Vanderbilt U., 1983.
23. Mauldin, Michael L., "Maintaining Diversity in Genetic Search," Proc. of
    the 1984 National Conference on Artificial Intelligence, pp. 247-250.
24. Goldberg, Allen and Ira Pohl, "Is Complexity Theory of Use to AI?," pp.
    43-55 of "Artificial and Human Intelligence," edited by A. Elithorn and
    R. Banerji, Elsevier Science Publishers, Amsterdam, 1984.
25. Goldberg, David, "Dynamic System Control Using Rule Learning and Genetic
    Algorithms," Proc. of the 1985 National Conference on AI, pp. 588-592.
26. Goldberg, David E., "Genetic Algorithms in Search, Optimization and
    Machine Learning," Addison-Wesley Publishing Company, Inc., Reading,
    1989.
27. "DARPA Neural Network Study," AFCEA International Press, Fairfax, 1988.
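[Referring back to point 2 of the post above: the following is an added
illustration only, not code from the post, of the generate/evaluate/discard/
cross loop described there. It assumes plain Python, and the "fitness"
function is a made-up stand-in that rewards reaching a required information
capacity with as few nodes as possible; a real application would score
trained-network performance instead.]

import random

REQUIRED_BITS = 200     # hypothetical capacity the network must provide
BITS_PER_NODE = 8       # hypothetical capacity contributed by each node

def fitness(layout):
    """layout is a list of layer sizes.  Infeasible layouts (too little
    capacity) score very badly; otherwise fewer total nodes is better."""
    nodes = sum(layout)
    if nodes * BITS_PER_NODE < REQUIRED_BITS:
        return -1e9
    return -nodes

def crossover(a, b):
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]

def mutate(layout, rate=0.2):
    return [max(1, size + random.choice([-1, 0, 1])) if random.random() < rate
            else size for size in layout]

def evolve(generations=50, pop_size=30, layers=3):
    # generate an initial random population of candidate layouts
    pop = [[random.randint(1, 20) for _ in range(layers)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]            # discard the weaker half
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children                 # next generation
    return max(pop, key=fitness)

print(evolve())   # e.g. a 3-layer layout with roughly 25 nodes in total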
ted@nmsu.edu (Ted Dunning) (10/17/89)
ivan bach is very enthusiastic, but relatively uninformed about recent
developments in determining the complexity of finite sequences.
he also has an annoying tendency to post news that is more than 80
columns.
here is a partial commentary on his posting,
In article <4542@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:
> Claude Shannon, founder of the theory of information, ... He
> decided to call this measure "entropy," because he realized that
> his formula:
> H = -K x sum of (pi x log pi) for all i=1 to i=a
shannon actually qualified this measure extensively. it is only valid
if successive symbols are independent, and only valid for stationary
ergodic sequences. none of these qualifications apply in most
biological systems.
> Lila Gatlin showed that Claude Shannon's formula could be used to
> calculate the entropy of a randomly chosen base in the DNA chain of
> bases in the nucleus of a living cell. If all four types of bases
> in a DNA chain are equiprobable, the entropy of a randomly chosen
> base in the DNA chain is equal to: ... = 2 bits
actually lila gatlin showed that you can misuse shannon's formula to
get an upper bound on the amount of information in a dna chain.
> Computer programmers know that they have to use at least 2 bits to
> encode 4 different values.
real computer programmers know that this is not true.
> If all bases in a DNA chain are not equiprobable, the probability
> of a certain type of base occurring at a random position in a DNA
> chain is proportional to the number of bases of its type in the
> ... Therefore, the entropy of a randomly chosen base in the
> DNA chain of "Micrococcus lysodeikticus" is equal to:
> = 1.87 bits
this is entirely wrong. see below.
> Lila Gatlin came to the conclusion that the entropy of a complex
> system is equal to the sum of entropies of its components.
lila is again mistaken, unless what actually was said was that the
entropy of a system composed of _independent_ parts is the sum of the
entropies of its components. of course _complex_ systems generally do
not allow such naive decomposition.
[ arithmetic maunderings about bits and blueprints deleted ]
[ fuzzy comparison of a turing machine to protein synthesis ]
> 4. Instead of constructing a full-blown neural network, we could
> specify a "blueprint" for constructing that network. I think that
> the amount of information in the blueprint can always be smaller
> than the amount of information in the network if there is any
> redundancy or repetition in the structure of the network.
actually the blueprint must be less than or equal to the network.
otherwise, we would just use the network as the blueprint. see below
for a reasonable definition of information content.
5. If you use a certain "alphaneural net, you should use each type
of component with the same frequency if you want achieve the
maximum information capacity.
of course maximum information _content_ (or capacity) would also be
obtained by making all components completely independent. this is the
same as recommending that people use RAM for their neural net. you
_don't_ want a system with maximum content. you want a system that
encodes and carries meaning. these are very different things.
----------------------------------------------------------------
shannon's formulae are simple enough to be misused extensively by all
kinds of people. such misuse does not mean that they are any more
applicable than they were in 1949. in particular, if you compute the
entropy of english text, you over-estimate the information content by
a factor of 4 or more. for instance, this posting has an entropy of
4.6 bits per letter, while english is generally accepted to have an
information content of 1 bit per letter or less. the difference is
entirely due to the assumption of independence between letters.
an even more dramatic example is a sequence formed by the regular
expression {abcd}*. if we take any 100 characters from a random
location in such a sequence, and compute the entropy for the sequence,
we will find that we apparently have 200 bits of information. in
fact, though, we have only 2 bits of information because once we know
the first letter in the sequence, we know all the letters.
a much more sound approach to the complexity of sequences was proposed
in the 60's by various soviet mathematicians, and was popularized for
use with chaotic systems by v.i. arnold in the 70's. this model
defines the complexity of a finite string to be the size of the
smallest computer program which can be used to recreate the sequence.
in arnold's work, the sequence was the itinerary of a finite map, but
this does not really matter if we are starting with a sequence from a
finite alphabet.
obviously, the upper bound on the complexity of a finite sequence is
the sequence size itself, since we can easily write a program for, say,
a ram machine which builds a list containing the sequence in the same
number of steps as the original sequence is long.
the complexity of one of our 100 character substrings of {abcd}* is
much less, since we can write a very simple program which produces
one of these substrings.
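[an added illustration, not part of the original posting: a crude but
computable stand-in for this program-length idea is the length of a
compressed encoding, which is an upper bound on how small such a program
could be (plus a fixed amount of format overhead). plain python; the strings
are made up. a periodic {abcd}* substring compresses to far fewer bytes than
100 random letters over the same alphabet, even though a naive per-symbol
entropy rates them the same.]

import zlib, random, string

periodic = "abcd" * 25                                    # 100 characters
random_s = "".join(random.choice(string.ascii_lowercase[:4])
                   for _ in range(100))

for name, s in [("periodic", periodic), ("random", random_s)]:
    compressed = len(zlib.compress(s.encode()))
    print(f"{name}: {len(s)} chars -> {compressed} bytes compressed")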
in practice, information content of a complex sequence can be
estimated by building relatively simple models for inter-symbol
dependencies and then measuring the entropy of the errors the model
makes in predicting the next symbol. this is nearly equivalent to the
approach used by arnold and co, since if you can't build a predictor
at all, then the error stream becomes equivalent to the original
sequence.
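[again an added sketch, not from the posting, making the last paragraph
concrete in plain python: first the naive per-symbol entropy of a 100-symbol
slice of {abcd}* (about 2 bits per symbol, i.e. the apparent 200 bits), then
the entropy left over after a trivial order-1 predictor that guesses the most
frequent successor of the previous symbol; for the periodic string the
prediction errors carry essentially no entropy.]

from collections import Counter, defaultdict
from math import log2

def entropy_per_symbol(seq):
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in counts.values())

def residual_entropy_per_symbol(seq):
    # learn, for each symbol, its most frequent successor; then measure the
    # entropy of the right/wrong outcomes of that prediction
    followers = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        followers[a][b] += 1
    predict = {a: c.most_common(1)[0][0] for a, c in followers.items()}
    errors = [int(predict[a] != b) for a, b in zip(seq, seq[1:])]
    return entropy_per_symbol(errors)

s = ("abcd" * 26)[3:103]                 # 100 symbols from an arbitrary offset
print(entropy_per_symbol(s))             # ~2.0 bits/symbol  (naive estimate)
print(residual_entropy_per_symbol(s))    # ~0.0 bits/symbol  (model-based)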
--
ted@nmsu.edu
Dem Dichter war so wohl daheime
In Schildas teurem Eichenhain!
Dort wob ich meine zarten Reime
Aus Veilchenduft und Mondenschein
cjoslyn@bingvaxu.cc.binghamton.edu (Cliff Joslyn) (10/17/89)
I'd like to thank the contributors (I agree more with the latter). This
discussion was originally cross-posted to alt.cyb-sys, and this conversation
continues on subjects of key interest to systems scientists and
cyberneticians. I am taking the liberty of cross-posting these two notes
again to alt.cyb-sys and also to my mailing list, CYBSYS-L@BINGVMB.BITNET.
I'd like to invite anyone interested here to join us there. Blurb follows.

============================================================================
    ANNOUNCING FORMATION OF A MAILING LIST FOR SYSTEMS AND CYBERNETICS

An electronic mailing list dedicated to Systems Science and Cybernetics is
currently in operation on the SUNY-Binghamton computer system. The list is
committed to discussing a general understanding of the evolution of complex,
multi-level systems like organisms, minds, and societies as informational
entities containing possibly circular processes. Specific subjects include
Complex Systems Theory, Self-Organizing Systems Theory, Dynamic Systems
Theory, Artificial Intelligence, Network Theory, Semiotics, fractal geometry,
Fuzzy Set Theory, Recursive Theory, computer simulation, Information Theory,
and more.

The purposes of the list include: 1) facilitating discussion among those
working in or just interested in the general fields of Systems and
Cybernetics; 2) providing a means of communicating to the general research
community about the work that Systems Scientists and Cyberneticians do;
3) housing a repository of electronic files for general distribution
concerning Systems and Cybernetics; and 4) providing a central, public
directory of working Systems Scientists and Cyberneticians. The mailing list
can store or transmit notes and messages, technical papers, references,
calls for papers, computer programs, and pictures and diagrams.

The list is coordinated by members of the Systems Science department of the
Watson School at SUNY-Binghamton, and is affiliated with the International
Society for the Systems Sciences (ISSS) and the American Society for
Cybernetics (ASC). The list is open to everyone, and we currently have two
hundred members from America, Canada, and Europe. Our subscribers are from
both academia and industry, and while many are active researchers, others
are just "listening in". We share in an exciting, ongoing, multi-way
conversation about many aspects of Systems and Cybernetics. Different levels
and kinds of knowledge and experience are represented. We invite all to join
the discussion.

To subscribe, you need a computer account with access to one of the
international networks (e.g. BITNET, USENET, ARPANET, INTERNET, CSNET). Send
a file containing only the line:

    SUB CYBSYS-L Your Full Name

to the list server at the address LISTSERV@BINGVMB.BITNET. Once subscribed,
please post a message to the list itself at the address
CYBSYS-L@BINGVMB.BITNET. In the message, include your name, affiliation, and
a brief description of your work and/or interest in the fields of Systems
and Cybernetics.

List moderator: CYBSYS@BINGVAXU.CC.BINGHAMTON.EDU
Author: Cliff Joslyn, CJOSLYN@BINGVAXU.CC.BINGHAMTON.EDU
--
O---------------------------------------------------------------------->
| Cliff Joslyn, Cybernetician at Large
| Systems Science, SUNY Binghamton, cjoslyn@bingvaxu.cc.binghamton.edu
V All the world is biscuit shaped. . .
park@usceast.UUCP (Kihong Park) (10/18/89)
In article <TED.89Oct16113856@aigyptos.nmsu.edu> you write:
>
>ivan bach is very enthusiastic, but relatively uninformed about recent
>developments in determining the complexity of finite sequences.
>

I agree with your comments regarding the outdatedness and errors of Ivan's
statements. As to describing the information content of a sequence over a
finite alphabet, the algorithmic information complexity measure developed
independently by Kolmogorov, Chaitin, and Solomonoff can be fruitfully
applied here. Basically, they define the complexity of a sequence as the size
of the minimum program which faithfully generates the sequence, if run on
some universal Turing machine. Thus, for a DNA sequence to encode maximum
information, i.e., be maximally patternless, its algorithmic complexity must
nearly equal the length of the sequence.

As to Ivan's comments about the applicability of entropy to represent the
efficiency of information coding in a network, many workers in the field have
realized that for a given task, a more or less 'adequate' network size should
be strived for. One of the principal motivations for this is the empirical
observation that if a network is 'too large' compared to the complexity of
the task at hand, it may generate solutions which do not 'generalize' well,
given the opportunity to do so. What 'generalization' means is a different
problem altogether, but what this observation shows us is that one has to
constrain the degrees of freedom of a network to entice it to capture the
most compact representation. This constitutes, so to say, the 'minimal
program'. Thus, the problem at hand is knowing the optimal size of a network
for a given task. I do not see an easy way in which entropies can help us
solve this correspondence problem.

> Dem Dichter war so wohl daheime
> In Schildas teurem Eichenhain!
> Dort wob ich meine zarten Reime
> Aus Veilchenduft und Mondenschein

It's interesting to notice that some people equate our profession to that of
poets. Is this a revival of romanticism?
demers@beowulf.ucsd.edu (David E Demers) (10/19/89)
In article <TED.89Oct16113856@aigyptos.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:
>ivan bach is very enthusiastic, but relatively uninformed about recent
>developments in determining the complexity of finite sequences.

Agreed.

>he also has an annoying tendency to post news that is more than 80
>columns.

Ditto.

>here is a partial commentary on his posting,
>In article <4542@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:

[discussion of Bach's posting + critical commentary deleted]

>----------------------------------------------------------------
>shannon's formulae are simple enough to be misused extensively by all
>kinds of people. such misuse does not mean that they are any more
>applicable than they were in 1949.

Despite the misuse, the basic concept is very valuable.

>a much more sound approach to the complexity of sequences was proposed
>in the 60's by various soviet mathematicians, and was popularized for
>use with chaotic systems by v.i. arnold in the 70's. this model
>defines the complexity of a finite string to be the size of the
>smallest computer program which can be used to recreate the sequence.
>in arnold's work, the sequence was the itinerary of a finite map, but
>this does not really matter if we are starting with a sequence from a
>finite alphabet.

This work usually goes by the name of "Kolmogorov complexity". Solomonoff,
Kolmogorov, and Chaitin did much of the pioneering work in the early 60's,
for independent reasons: examining the notion of randomness [Kolmogorov] and
inference [Solomonoff].

>obviously, the upper bound on the complexity of a finite sequence is
>the sequence size itself, since we can easily write a program for, say,
>a ram machine which builds a list containing the sequence in the same
>number of steps as the original sequence is long.
>
>the complexity of one of our 100 character substrings of {abcd}* is
>much less, since we can write a very simple program which produces
>one of these substrings.
>
>in practice, information content of a complex sequence can be
>estimated by building relatively simple models for inter-symbol
>dependencies and then measuring the entropy of the errors the model
>makes in predicting the next symbol. this is nearly equivalent to the
>approach used by arnold and co, since if you can't build a predictor
>at all, then the error stream becomes equivalent to the original
>sequence.

The above is a very good quick description of the concept. It essentially
quantifies Occam's Razor in modeling. Computational learning theory takes up
many of these ideas; Valiant, et al. make use of the same principle.

I apologize for once again posting with the 'F' key before gathering
citations. Walt Savitch just gave a talk last week on Kolmogorov complexity
for the Cognitive Science seminar here at UCSD; I will scare up a summary and
some references and post soon for those interested in pursuing the idea in
more depth.

Dave DeMers                                demers@cs.ucsd.edu
Dept. of Computer Science & Engineering    demers@inls1.ucsd.edu
UCSD                                       {other non-Internet mailpaths
La Jolla, CA 92093                          according to normal syntax...}
jlr20421@uxa.cso.uiuc.edu (10/20/89)
Yes, I would like to see more information on this topic--especially the
bibliographies.

(jlr20421@uxa.cso.uiuc.edu)
andrew@dtg.nsc.com (Lord Snooty @ The Giant Poisoned Electric Head ) (10/23/89)
I guess the entropy calcs use the p.ln(p) expression, as per Shannon... but
as per Hofstadter, what about the complexity of the context or "frame"? What
tells you which jukebox will play your record? If you could give da Vinci a
binary dump of a set of PostScript graphics files describing his drawings,
could he work out that they were his? Or would you have to provide him with
the appropriate "jukebox" (PC + OS + etc.)? Of course, the correct answer is
the latter one.

So my question is: have you included the complexity of the decoding apparatus
in your entropy estimates?

--
...........................................................................
Andrew Palfreyman                          and the 'q' is silent,
andrew@dtg.nsc.com                         as in banana
                                                             time sucks
russell@minster.york.ac.uk (10/23/89)
In article <9675@vdsvax.crd.ge.com> thearlin@vdsvax.crd.ge.com writes:
>In article <4522@imagen.UUCP> ib@apolling (Ivan N. Bach) writes:
>>
>>If you are interested in this topic, I can post detailed entropy calculations and
>>references to relevant books and articles.
>
>Please do. If not, could you send it to me via email?
>
>kurt

Me too! Thanks.

Russell.

____________________________________________________________
Russell Beale, Advanced Computer Architecture Group,
Dept. of Computer Science, University of York,
Heslington, YORK. YO1 5DD. UK.
Tel: [044] (0904) 432762

russell@uk.ac.york.minster              JANET connexions
russell%york.minster@cs.ucl.ac.uk       ARPA connexions
..!mcvax!ukc!minster!russell            UUCP connexions
russell@minster.york.ac.uk              eab mail
____________________________________________________________