[comp.ai.neural-nets] Infinite alphabets

bayers@kodak.UUCP (mitch bayersdorfer) (01/01/70)

In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
>
>The comparison between the European alphabetical system and the Chinese
>character system is very interesting.  Chinese CAN have an infinite
>number of characters.  What they do is to assign one character to each
>concept.  Of course, they don't have an infinite number of characters now.
>But I say it is possible to have them if they want.
>
>Hideyuki Nakashima
>CSLI and ETL
>nakashima@csli.stanford.edu (until Aug. 1988)
>nakashima%etl.jp@relay.cs.net (afterwards)

I don't mean to make a reductio ad absurdum, but can any alphabet which
a given human can perceive be truly infinite?  Given that human beings have
a finite (but very large) number of neurons in their visual and cerebral
cortex, and that any distinct alphabetic character would exceed the 
thresholds of an enumerable permutation of those neurons, mustn't there
be only a finite number of characters (and concepts?).  I am assuming that 
neural thresholds for a certain learned concept are relatively constant.  If
another concept produces the same permutation, but only to a greater 
degree, then couldn't that be reduced down to two concepts-- one a measure 
of the concept, and the other a measure of its degree?  
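
To make the counting argument concrete, here is a toy sketch of my own (a
deliberately tiny binary "retina", nothing like real cortex): if a percept
is just the pattern of cells that fire, an n-cell retina distinguishes at
most 2^n characters, an enormous but finite number.

    # Toy version of the finiteness argument: an n-cell binary "retina" can
    # be in at most 2**n distinct firing patterns, so it can distinguish at
    # most 2**n characters.  (Assumes a percept is just the on/off pattern.)
    from itertools import product

    n = 4                                      # toy retina with 4 cells
    patterns = set(product((0, 1), repeat=n))  # every possible firing pattern
    print(len(patterns))                       # 16 == 2**4, finite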

               - Mitch Bayersdorfer

...{}!rochester!kodak!bayers

"The above does not represent the views of the Eastman Kodak Company or
its management."
 

berke@CS.UCLA.EDU (10/13/87)

I thought you might find Turing's words on the subject interesting.
These are from his 1936/7 article "On Computable Numbers, with
an application to the Entscheidungsproblem."  I think they are 
important.  I also quote them in my article "That Does Not Compute"
from ICNN '87, distinguishing brains from computers from neural
networks, an expanded version of which I have submitted to Computer 
Magazine. Turing:

"If we regard a symbol as literally printed on a square... The symbol
is defined as a set of points in this square, viz. the set occupied
by the printer's ink.  If these sets are restricted to be measurable,
we can define the "distance" between two symbols as the cost of 
transforming one symbol into the other if the cost of moving unit area 
of printer's ink unit distance is unity..."

He later goes on...

"I shall also suppose that the number of symbols which may be printed
is finite.  If we were to allow an infinity of symbols, then there would
be symbols differing to an arbitrarily small extent.  The effect of this
restriction of the number of symbols is not very serious.  It is always 
possible to use sequences of symbols in the place of single symbols.
Thus an Arabic numeral such as 17 or 999999999999 is normally treated
as a single symbol.

"Similarly, in any European language words are treated as single symbols
(Chinese however, attempts to have an enumerable infinity of symbols).
The differences from our point of view between the single and compound
symbols is that the compound symbols, if they are too lengthy, cannot
be observed at one glance.  This is in accord with experience.  We
cannot tell at a glance whether 999999999999999 and 9999999999999999
are the same."
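
(An aside from me: Turing's ink-moving "distance" can actually be computed,
at least in the restricted case where the two symbols are discretized into
pixels and occupy the same ink area.  The sketch below is my own
illustration, not Turing's, and leans on scipy for the minimum-cost
matching.)

    # Turing's "distance" between two symbols, in the special case where a
    # symbol is a set of inked pixels and both symbols have the same number
    # of them: the cheapest way to move each unit of ink of one symbol onto
    # the other, at unit cost per unit distance (a minimum-cost matching).
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def symbol_distance(a, b):
        """a, b: boolean 2-D arrays with equal numbers of True (inked) pixels."""
        pa = np.argwhere(a).astype(float)      # inked pixel coordinates in a
        pb = np.argwhere(b).astype(float)      # inked pixel coordinates in b
        assert len(pa) == len(pb), "equal ink area assumed"
        cost = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)   # optimal ink transport
        return cost[rows, cols].sum()

    # Two tiny 3x3 "symbols": a vertical bar and a horizontal bar.
    bar_v = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=bool)
    bar_h = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=bool)
    print(symbol_distance(bar_v, bar_h))       # about 2.83 (2 * sqrt(2))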


These passages led me to suggest that what we want to do with neural 
networks is to mechanize perception, not simply to build faster computers.
(Though that is a desirable goal in its own right.)
Please note that it is Turing claiming Chinese "attempts" to have an
enumerable infinity of symbols.  I am ignorant of Chinese.  At some point,
with an increasing number of symbols, all members of an alphabet cannot be
recognized at-a-glance.  Or, looking at it another way, if you presume
an infinity of symbols upon which you are operating, you are not computing.

This is also the intuitive reason I claim ambiguity resolution 
is not computable.  I think of ambiguity resolution as the
general case of "object recognition" and "problem formulation." 
Imagine a vague scene or situation not already described by a finite
enumeration of the objects in it and their relationships, etc.
Such a vague scene is equivalent, in Turing's sense, to supposing
an infinite number of things in the scene, which you are going to reduce
to a finite number of options from which to choose.  Since you are
supposing an infinite alphabet (or the equivalent "vague" scene) technically,
you are not computing.  

By Turing's above remarks, I think by his definition, Chinese cannot
succeed at having an enumerable infinity of symbols.  It can only
"attempt" to have them.  Unless you "go down a level" and consider
"things that make up" Chinese symbols "the symbols."  Then, there
must be a finite number of them.  Does anyone know if this is true
about Chinese?  It would seem that even in English it does not apply to
orthography, though it apparently does to letters, 
since we use in hand-writing no fixed "alphabet" of strokes, etc.  Well, 
I meant only to introduce Turing's words, forgive me for going on.

Regards,

Peter Berke

nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) (10/13/87)

In article <8583@shemp.UCLA.EDU> berke@CS.UCLA.EDU (Peter Berke) writes:
>
>By Turing's above remarks, I think by his definition, Chinese cannot
>succeed at having an enumerable infinity of symbols.  It can only
>"attempt" to have them.  Unless you "go down a level" and consider
>"things that make up" Chinese symbols "the symbols."  Then, there
>must be a finite number of them.  Does anyone know if this is true
>about Chinese?  It would seem that even in English it does not apply to
>orthography, though it apparently does to letters, 
>since we use in hand-writing no fixed "alphabet" of strokes, etc.  Well, 
>I meant only to introduce Turing's words, forgive me for going on.
>

The comparison between the European alphabetical system and the Chinese
character system is very interesting.  Chinese CAN have an infinite
number of characters.  What they do is to assign one character to each
concept.  Of course, they don't have an infinite number of characters now.
But I say it is possible to have them if they want.

One Chinese character usually consists of smaller constituents.  For
example, a character for "maple tree" consists of two parts: one
designating "tree" and the other designating "wind".  The character
for "forest" consists of three "tree"s.

The character for "tree" comes from a pictorial representation of a
real tree.  It is like this:

                     |
                 ----+----
                    /|\
                   / | \
                  /  |  \

So are the characters for "sun", "house", "river" and so on.  In this way,
you can just invent a new character for a concept if it is a basic
one.  Otherwise you can combine several characters to define a new
one.

I think that the European way of thinking is analytic while the Eastern
one is holistic.  Having alphabets to compose words vs. having characters for
each concept is one of the examples.  I further think that the digital
computer follows the European way.  I want to come up with an Eastern
equivalent of the digital computer (not an analog one, though).
Connectionism is one of the possibilities.

-- 
Hideyuki Nakashima
CSLI and ETL
nakashima@csli.stanford.edu (until Aug. 1988)
nakashima%etl.jp@relay.cs.net (afterwards)

david@linc.cis.upenn.edu (David Feldman) (10/14/87)

In <8583@shemp.UCLA.EDU>, Peter asks:

>By Turing's above remarks, I think by his definition, Chinese cannot
>succeed at having an enumerable infinity of symbols.  It can only
>"attempt" to have them.  Unless you "go down a level" and consider
>"things that make up" Chinese symbols "the symbols."  Then, there
>must be a finite number of them.  Does anyone know if this is true
>about Chinese?  It would seem that even in English it does not apply to
>
>Peter Berke

Chinese characters are by and large composed of 'radicals' which have base
meanings.  These meanings are not necessarily related to the meanings of the
characters that are composed of the radicals, but they sometimes are.  For
instance, the question particle 'ma' has the 'ko' radical, which is associated
with the mouth.  And indeed, there is a finite number of radicals.  I have
a couple of sheets describing them.  Any component of a character that is
not a radical can be classified as a stroke, and there are specific kinds of
strokes also.

					Dave Feldman
					david@linc.cis.upenn.edu

zhu@ogcvax.UUCP (Jianhua Zhu) (10/15/87)

In article <417@russell.STANFORD.EDU>, nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) writes:
> In article <8583@shemp.UCLA.EDU> berke@CS.UCLA.EDU (Peter Berke) writes:
> >
> >By Turing's above remarks, I think by his definition, Chinese cannot
> >succeed at having an enumerable infinity of symbols.  It can only
> >"attempt" to have them.  Unless you "go down a level" and consider
> >"things that make up" Chinese symbols "the symbols."  Then, there
> >must be a finite number of them.  Does anyone know if this is true
> >about Chinese?  It would seem that even in English it does not apply to
> >orthography, though it apparently does to letters, 
> >since we use in hand-writing no fixed "alphabet" of strokes, etc.  Well, 
> >I meant only to introduce Turing's words, forgive me for going on.
> >
> 
> The comparison between the European alphabetical system and the Chinese
> character system is very interesting.  Chinese CAN have an infinite
> number of characters.  What they do is to assign one character to each
> concept.  Of course, they don't have an infinite number of characters now.
> But I say it is possible to have them if they want.
> 
 ...

> 
> each concept is one of the examples.  I further think that the digital
> computer follows the European way.  I want to come up with an Eastern
> equivalent of the digital computer (not an analog one, though).
> Connectionism is one of the possibilities.
> 

It looks like a symbol is defined as a finite set of discrete points in
a *finitely bounded* square (otherwise, the number of symbols would be
infinite).  According to this definition, I cannot imagine how Chinese can
have infinite symbols.  On the other hand, if one is allowed to use certain
operators on symbols to introduce new symbols, then the number of
symbols in any language can potentially be infinite.
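
(To make that last point concrete with a sketch of my own: a single unary
operator over a two-letter base alphabet already generates new symbols
without bound.)

    # One "prime" operator applied repeatedly to a finite base alphabet
    # yields an unbounded supply of distinct symbols: x, y, x', y', x'', ...
    import itertools

    def symbols(base=("x", "y")):
        for primes in itertools.count():       # 0, 1, 2, ... forever
            for b in base:
                yield b + "'" * primes

    print(list(itertools.islice(symbols(), 6)))  # ['x', 'y', "x'", "y'", "x''", "y''"]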

As far as written languages are concerned, the main difference between Chinese
and English (or any European language) is that English words are composed
of letters from a finite alphabet in a one-dimensional manner (namely,
concatenation), whereas Chinese words are composed of *strokes* from a finite
stroke set as two-dimensional pictures.  (YES, the stroke set is VERY finite,
in fact not much bigger than the English alphabet.)

Although we do have one more degree of freedom by which we can build words
from elements one level down, we wouldn't want to go too far in this
other direction (we still want to have neatly printed documents in our
language).  As a matter of fact, nobody is inventing new words
by piling up existing words any more; new words are added, as new
concepts come along, almost exclusively in the form of *word combinations*,
whose English analogy would be hyphenated words.

Yes, I quite agree with Mr. Nakashima that digital computers follow the
European way.  But I don't see how connectionism in particular can lead
to digital computers (or equivalents) of the Eastern way (I do know that
connectionist architectures can do extremely well on tasks such as
pattern recognition), and I would be delighted if someone could provide
further information.


-- 
 /__|__  _/-\_ |  /__|__   *   Jianhua Zhu  *** UUCP: ...!tektronix!ogcvax!zhu
"___|__,  --- || /| ,|_/  *   CSE Dept - OGC   *** CSNet: zhu@oregon-grad
  ,"|",   \|/ || ---+--- * 19600NW von Neumann Dr ***
 / \|  \ /--- \|    |   *    Beaverton, OR 97006     ***

tsmith@gryphon.CTS.COM (Tim Smith) (10/16/87)

In article <971@kodak.UUCP> bayers@kodak.UUCP (mitch bayersdorfer) writes:
+=====
| I don't mean to make a reductio ad absurdum, but can any alphabet which
| a given human can perceive be truly infinite?  Given that human beings have
| a finite (but very large) number of neurons in their visual and cerebral
| cortex, and that any distinct alphabetic character would exceed the 
| thresholds of an enumerable permutation of those neurons, mustn't there
| be only a finite number of characters (and concepts?).  I am assuming that 
| neural thresholds for a certain learned concept are relatively constant.  If
| another concept produces the same permutation, but only to a greater 
| degree, then couldn't that be reduced down to two concepts-- one a measure 
| of the concept, and the other a measure of its degree?  
|               - Mitch Bayersdorfer
+=====
You are addressing a problem that linguists have had to wrestle
with for a long time. Back in the 1950's, the linguist Noam
Chomsky did some original research on the properties of formal
grammars that might be used to generate sentences in a natural
language. Among other things, he discovered that any reasonable
grammar (or production system) to generate sentences in natural
language would have recursive properties that would, in effect,
make the set of sentences of the language infinite in size. The
corollary to this, of course, is that there is no "longest
sentence" in a natural language. Realizing that it is somewhat
silly to claim that a human can utter or comprehend a sentence
that might be a few giga-centuries in duration, Chomsky fudged
around and invented a distinction between linguistic
"competence" and linguistic "performance". Actually, it's a
reasonable fudge.
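
A quick sketch (mine, with a made-up two-rule grammar) of why one recursive
production already makes the sentence set infinite:

    # A tiny grammar fragment with one recursive rule:
    #     S  -> NP "ran"
    #     NP -> "the dog"  |  "the cat that chased" NP
    # For any candidate "longest sentence", the recursive NP rule generates
    # a grammatical sentence that is longer, so no longest sentence exists.
    def sentence(depth):
        np_ = "the dog"
        for _ in range(depth):
            np_ = "the cat that chased " + np_
        return np_ + " ran"

    for d in (0, 1, 2):
        print(len(sentence(d).split()), "words:", sentence(d))
    # 3 words: the dog ran
    # 7 words: the cat that chased the dog ran
    # 11 words: the cat that chased the cat that chased the dog ran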

A good analogy for this is simple arithmetic. If you know the
rules of, say, multiplication, there is no reason why you can't
multiply two numbers that require a few giga-light-years worth of
paper to write down (in 10-point type). You are theoretically
competent to do this. Your performance abilities (life span,
attention span, etc.) will, however, undoubtedly keep you from
achieving the end product.

Cantor discovered several kinds of infinities. Perhaps what we
need to discover is a kind of sub-infinity. The actual number of
sentences in any natural language is "infinite" in the same
sense that the number of grains of sand on a beach is infinite,
i.e. "not really".

Are alphabets like sentences? Uh, no, not at all! I have stayed out
of this discussion about "infinite alphabets", since I do not
understand the issue. I enter reluctantly. Here goes...

An alphabet is a set (decidedly finite) of glyphs (e.g., little
marks on paper). A language adopts an alphabet, and tries to map
its sound system (its phonology) onto the alphabet. Sometimes
this works well, sometimes not so well. For example, both
Italian and English use the Latin alphabet. Neither language's
phonology maps directly to the alphabet, but Italian maps better
than English. In Italian there are only a few little problems
(the letters "c", "g", "e", and "o" do not map one-to-one to
Italian phonemes). In English, there are many, many problems
(which we are all aware of).

The sound system of every language contains a finite number of
phonemes, and a finite (but much larger) number of syllables.
Therefore, any alphabet (phoneme-glyph mapping), or any
syllabary (syllable-glyph mapping) should be finite.

Ideographic writing systems, such as Japanese "kanji", are not
alphabets, and they are not syllabaries. If we assume that the
languages that use ideographic writing systems allow free
creation of new glyphs, then these writing systems have an
infinite number of glyphs, in exactly the same sense that there
is an infinite number of sentences in English, or that there is
an infinite number of art works that can be created by mankind.

-- 
Tim Smith
INTERNET:     tsmith@gryphon.CTS.COM
UUCP:         {hplabs!hp-sdd, sdcsvax, ihnp4, ....}!crash!gryphon!tsmith
UUCP:         {philabs, trwrb}!cadovax!gryphon!tsmith

rwojcik@bcsaic.UUCP (Richard Wojcik) (10/16/87)

There seems to be some confusion over the nature of writing systems for
human languages.  There are three 'ideal' types: logographic
(word-based), syllabic, and alphabetic.  Alphabetic writing systems are
based on phonemes, the basic units of sound that make up words.  Since
most languages have between 30 and 50 phonemes, it makes little sense to
talk about natural writing systems with infinite alphabets.  Logographic
writing systems, the least efficient type of writing, can have only as
many symbols as there are words (or morphemes) in the vocabulary.  If it
is necessary to carry on this discussion, why not debate the size of the
class of objects that written symbols map onto?

nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) (10/16/87)

In article <971@kodak.UUCP> bayers@kodak.UUCP (mitch bayersdorfer) writes:
>In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
>>
>> Chinese CAN have an infinite number of characters.
>
>I don't mean to make a reductio ad absurdum, but can any alphabet which
>a given human can perceive be truly infinite?

OK.  Let me change "infinite" to "unbounded".  Who cares about infinity
in real life, anyway?

What I wanted to say was, however, not that point.  What the Chinese tried
to do (or so it seems to me) was to assign a symbol to each distinct
concept.  So, as the number of concepts grows, so does the number of
symbols (potentially).  European "word"s correspond to Chinese
"character"s.  No Chinese equivalent of the European "alphabet" exists.


-- 
Hideyuki Nakashima
CSLI and ETL
nakashima@csli.stanford.edu (until Aug. 1988)
nakashima%etl.jp@relay.cs.net (afterwards)

randyg@iscuva.ISCS.COM (Randy Gordon) (10/19/87)

In article <436@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
>.............  No Chinese equivalent of the European "alphabet" exists.
>
>
>-- 
>Hideyuki Nakashima
>CSLI and ETL
>nakashima@csli.stanford.edu (until Aug. 1988)
>nakashima%etl.jp@relay.cs.net (afterwards)

sigh, I am probably gonna get my head bit off, but....

I don't know much about Chinese writing, but in terms of formal language theory,
does it really make much of a difference to the equivalence of the languages
that Chinese uses simple tokens (well, sorta) and European languages use complex
tokens?  From what I can tell, there is nothing that is impossible to translate
either way (tho it may take a number of manipulations).

RANDY GORDON(TAO KUO TSE FUN PEE)

dougf@lcuxlm.UUCP (10/22/87)

In article <436@russell.STANFORD.EDU>, nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) writes:
> >In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
> >> Chinese CAN have an infinite number of characters.

> What I wanted to say was, however, not that point.  What the Chinese tried
> to do (or so it seems to me) was to assign a symbol to each distinct
> concept.  So, as the number of concepts grows, so does the number of
> symbols (potentially).  European "word"s correspond to Chinese
> "character"s.  
Most Chinese "words" these days consist of two characters.  Some "words" are
made up of more characters.  There are fewer than 10,000 Chinese characters
in common use, but the language is not that limited.  New words are created
all the time, generally composed of characters with related meanings.
"Telephone", for example, is two characters: the first for electricity and
the second for speech.

Other "words" are formed in the way we form acronyms.  A poor example, but
to the point: I taught at He-fei Gong-ye Da-xue (the 3rd "word", "University",
is composed of the characters for "big" and a root for "study/school");
however, it was almost always referred to as He-gong-da.  This works for
things other than names also.

I am quoting the word "word" because the concept is often slippery in
Chinese.  In writing, each character is separated by the same space
whether or not it is part of the same "word".

> No Chinese equivalent of the European "alphabet" exists.

True, but as someone has mentioned, it has a very limited number of strokes.
In fact, a typewriter keyboard hooked to a microprocessor was invented in
1982 that could accept characters as input from combinations of as few as
5 types of strokes.  The keyboard also has a number of standard stroke
combinations that are part of many characters.  These save time, but the
5 strokes are all that is needed.  A key to this process is that the
strokes in a Chinese character are to be written in a specific order.  Thus,
the character dictionary consists of ordered sequences of the 5 stroke types.
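
(Here is a sketch of my own of the kind of lookup such a keyboard needs; the
stroke names and the two sample entries are only illustrative, not the
actual 1982 design.)

    # Each character is filed under the ordered sequence of its stroke types,
    # so typing the strokes in the prescribed order retrieves the character.
    STROKES = ("horizontal", "vertical", "left-falling", "right-falling", "hook")

    char_dict = {
        ("horizontal",):            "yi (one)",   # a single horizontal stroke
        ("horizontal", "vertical"): "shi (ten)",  # horizontal, then vertical
    }

    def lookup(strokes):
        return char_dict.get(tuple(strokes), "<no such character>")

    print(lookup(["horizontal", "vertical"]))     # shi (ten)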

The most complex single character has about 24 strokes.  If we take that as
a limit, then the maximum number of characters (assuming all combinations
would be acceptable, which I doubt) would be:

	i=24
	___
	\     5^i	or about 7*10^16
	/
	---
 	i=1

70 quadrillion, although rather large, is not infinite.  In fact, few people
would be able to spend the time to learn them all  :-)
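
(For what it's worth, the sum checks out; a one-liner, assuming exactly 5
stroke types and a 24-stroke ceiling:)

    # Sum of 5**i for i = 1..24: every ordered stroke sequence up to length 24.
    print(sum(5**i for i in range(1, 25)))    # 74505805969238280, about 7.45e16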

> Hideyuki Nakashima


-- 
	doug foxvog	...allegra!lcuxlj!dougf	[Please use lcuxlj not lcuxlm]
			If only Bell Labs agreed with my opinion...
NSA: names of CIA agents in NRO working on TEMPEST encrypted above.  Drug
	dealing terrorists assassinated for planned hijacking.

lee@uhccux.UUCP (Greg Lee) (10/23/87)

Alphabets are often close to phoneme systems.  They may have characters
for allophones, however.  Karl Menge in his book on Turkish grammar
mentions the Orkhon script of a Turkic dialect which had different
signs for most consonants, depending on whether they occur in front
or back harmonic words.  Judging from Turkish and following the
conventional view, these were probably allophones.
	Greg Lee, lee@uhccux.uhcc.hawaii.edu