neuron-request@HPLABS.HP.COM ("Neuron-Digest Moderator Peter Marvit") (03/08/89)
Neuron Digest   Tuesday,  7 Mar 1989   Volume 5 : Issue 12

Today's Topics:
        Talk on 10 March in Washington DC
        Post-processing of neural net output (SUMMARY)
        Job Opportunity at MITRE
        Identity Mappings
        Re: Identity Mappings
        Re: Identity Mappings
        Prove non-existence with neural net?
        WANTED: VAX/VMS neural net simulator
        noise resistance
        Re: Loan applications with neural nets
        effect of attribute set choice on pattern classifiers
        Re: principal components references?
        Re: Neuron Digest V5 #11
        Re: WANTED: neural net simulator running on MAC II
        Re: ART? What is ART?
        Re: Neuron Digest V5 #10

Send submissions, questions, address maintenance and requests for old
issues to "neuron-request@hplabs.hp.com" or
"{any backbone,uunet}!hplabs!neuron-request".  ARPANET users can get old
issues via ftp from hplpm.hpl.hp.com (15.255.16.205).

------------------------------------------------------------

Subject: Talk on 10 March in Washington DC
From:    Masud Cader <GJTAMUSC%AUVM2.BITNET@CUNYVM.CUNY.EDU>
Date:    Mon, 06 Mar 89 18:32:44 -0400

*********************************************************************

The ACM chapter of The American University sponsors

        Dr. Harold Szu
        Naval Research Laboratory

        NEURAL NETWORKS: Emerging Technology --
        Parallel & Distributed Algorithms

An introduction to artificial neural networks will be presented.  Focus
will be on NEW HILL CLIMBING acceptance criteria for a Cauchy version of
the BOLTZMANN machine.  These criteria make a parallel and distributed
search for the global minimum possible.

WHEN:  Friday, March 10, 1989
TIME:  2:20pm - 3:00pm
WHERE: Rm. 221A, Ward Circle Building, American University,
       4400 Massachusetts Ave., NW (Nebraska Ave. & Mass. Ave.),
       Washington DC 20016.
       NEAR TENLEYTOWN STOP ON METRO

INFORMATION: Masud Cader (202) 885-3306 or Carolyn McCreary (202) 885-3166
E-mail: gjtamusc@auvm2.bitnet

######################################################################

This talk will be a preview of the upcoming UCLA engineering extension
course that Dr. Szu will conduct (March 20-23, 1989).  Info: 213/825-3344.

------------------------------

Subject: Post-processing of neural net output (SUMMARY)
From:    mesard@BBN.COM
Date:    Fri, 03 Feb 89 14:18:51 -0500

About a month ago, I asked for information about post-processing of the
output activation of a (trained or semi-trained) network solving a
classification task.  My specific interest was in what additional
information can be extracted from the output vector, and what techniques
are being used to improve performance and/or adjust the classification
criteria (i.e., how the output is interpreted).

I've been thinking about how Signal Detection Theory (SDT; cf. Green and
Swets, 1966) could be applied to NN classification systems.  Three areas
I am concerned about are:

1) Interpretation of a net's classifications typically ignores the
   cost/payoff matrix associated with the classification decision.  SDT
   provides a way to take this into account.

2) A "point-5 threshold interpretation" of output vectors is in some
   sense arbitrary, given (1) and because the network may have developed
   a "bias" (predisposition) towards producing a particular response (or
   responses) as an artifact of its training.

3) The standard interpretation does not take into account the a priori
   probability (likelihood) of an input of a particular type being
   observed.

SDT may also provide an interesting way to compare two networks.
Specifically, the d' ("D-prime") measure and the ROC (receiver operating
characteristic) curves, which have been used successfully to analyze
human decision making, may be quite useful in understanding NN behavior.

---

The enclosed summary covers only responses that addressed these specific
issues.
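[The SDT quantities mentioned above are easy to compute once a
classifier's hit and false-alarm rates are tabulated.  A minimal sketch
(the numbers, function names, and payoff convention below are
illustrative assumptions, not anything from the original posting):]

```python
# d' = z(hit rate) - z(false-alarm rate), and the likelihood-ratio
# criterion (beta) that maximizes expected payoff given priors and a
# cost/payoff matrix.  All values below are made-up examples.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: separation of the signal and noise
    distributions in standard-deviation units."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def optimal_beta(p_signal, payoff):
    """Likelihood-ratio criterion maximizing expected payoff.
    payoff = (value_correct_rejection, cost_false_alarm,
              value_hit, cost_miss); costs given as positive numbers."""
    v_cr, c_fa, v_hit, c_miss = payoff
    return ((1 - p_signal) / p_signal) * ((v_cr + c_fa) / (v_hit + c_miss))

print(round(d_prime(0.84, 0.16), 2))
print(optimal_beta(0.5, (1, 1, 1, 1)))
```

[With equal priors and a symmetric payoff matrix the optimal criterion
is beta = 1, i.e. the unbiased observer; skewed priors or costs shift
the decision threshold away from the "point-5" interpretation.]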
(The 19 messages I received totaled 27.5K.  This summary is just under
8K.  I endeavored to preserve all the non-redundant information and
citations.)  Thanks to all who replied.

--
void Wayne_Mesard();   Mesard@BBN.COM   Bolt Beranek and Newman, Cambridge, MA
--

Summary of citation respondents:
------- -- -------- ------------

The following two papers discuss interpretation of multi-layer
perceptron outputs using probabilistic or entropy-like formulations:

@TECHREPORT{Bourlard88,
  AUTHOR      = "H. Bourlard and C. J. Wellekens",
  YEAR        = "1988",
  TITLE       = "Links Between {M}arkov Models and Multilayer Perceptrons",
  INSTITUTION = "Philips Research Laboratory",
  MONTH       = "October",
  NUMBER      = "Manuscript M 263",
  ADDRESS     = "Brussels, Belgium" }

@INPROCEEDINGS{Golden88,
  AUTHOR    = "R. M. Golden",
  TITLE     = "Probabilistic Characterization of Neural Model Computations",
  EDITOR    = "D. Anderson",
  BOOKTITLE = "Neural Information Processing Systems",
  PUBLISHER = "American Institute of Physics",
  YEAR      = "1988",
  ADDRESS   = "New York",
  PAGES     = "310-316" }

Geoffrey Hinton (and others) cite

  Hinton, G. E. (1987) "Connectionist Learning Procedures",
  CMU-CS-87-115 (version 2)

as a review of some post-processing techniques.  He said that this tech
report will eventually appear in the AI journal.  He also says:

  The central idea is that any gradient descent learning procedure works
  just fine if the "neural net" has a non-adaptive post-processing stage
  which is invertible -- i.e. it must be possible to back-propagate the
  difference between the desired and actual outputs through the
  post-processing.  [...]  The most sophisticated post-processing I know
  of is Herve Bourlard's use of dynamic time warping to map the output
  of a net onto a desired string of elements.  The error is
  back-propagated through the best time warp to get error derivatives
  for the detection of the individual elements in the sequence.

The paper by Kaplan and Johnson in the 1988 ICNN Proceedings addressed
the problem.
A couple of people mentioned that Michael Jordan has done interesting
work in the area of post-processing, but no citations were provided.
(His work from 2-3 years ago does discuss interpretation of output when
trained with "don't care"s in the target vector.  I don't know if this
is what they were referring to.)

"Best Guess"
---- -----

This involves looking at the set of valid output vectors, V(), and the
observed output, O, and interpreting O as V(i) where i minimizes
|V(i) - O|.  For one-unit-on-the-rest-off output vectors, this is the
same thing as taking the one with the largest activation, but when
classifying along multiple dimensions simultaneously, this technique may
be quite useful.

----

J.E. Roberts sent me a paper by A.P. Doohovskoy called "Metatemplates,"
presented at ICASSP Dallas, 1987 (no, I don't know what that is).  He
(Roberts) suggests using "a trained or semi-trained neural net to
produce one 'typical' output for each type of input class.  These
vectors would be saved as 'metatemplates'."  Then classification can be
done by comparing (via Euclidean distance or dot product) observed
output vectors with the metatemplates (where the closest metatemplate
wins).  This uses the information from the entire network output vector
for classification.

Probability Measures
----------- --------

Terry Sejnowski writes:

  The value of an output unit is highly correlated with the confidence
  of a binary categorization.  In our study of predicting protein
  secondary structure (Qian and Sejnowski, J. Molec. Biol., 202,
  865-884) we have trained a network to perform a three-way
  classification.  Recently we have found that the real value of the
  output unit is highly correlated with the probability of correct
  classification of new, testing sequences.  Thus, 25% of the sequences
  could be predicted correctly with 80% or greater probability even
  though the average performance on the training set was only 64%.
  The highest value among the output units is also highly correlated
  with the difference between the largest and second largest values.
  We are preparing a paper for publication on these results.

---

Mark Gluck writes:

  In our recent JEP: General paper (Gluck & Bower, 1988) we showed how
  the activations could be converted to choice probabilities using an
  exponential ratio function.  This leads to good quantitative fits to
  human choice performance, both at asymptote and during learning.

---

Tony Robinson states that the summed squared difference between the
actual output vector and the relevant target vector provides a measure
of the probability of belonging to each class [in a
one-bit-on-others-off output set].  [See "Best Guess" above.]

Confidence Measures
---------- --------

John Denker says:

  Yes, we've been using the activation level of the runner-up neurons to
  provide confidence information in our character recognizer for some
  time.  The work was reported at the last San Diego meeting and at the
  last Denver meeting.

---

Mike Rossen describes the speech recognition system that he and Jim
Anderson are working on.  The target vectors are real-valued, with each
phoneme represented by several units with activation on [-1, 1]:

  Our retrieval method is a discretized dynamical system in which system
  output is fed back into the system using appropriate feedback and
  decay parameters.  Our scoring method is based on an average
->activation threshold, but the number of iterations the system takes
->to reach this threshold -- the system reaction time -- serves as a
->confidence measure.

[He also reports on intra-layer connections on the outputs (otherwise,
he's using a vanilla feedforward net), which sounds like a groovy idea,
although it seems to me that this would have pros and cons in his
application.]

  After the feedforward network is trained, connections AMONG THE OUTPUT
  UNITS are trained.  This "post-processing" reduces both omission and
  confusion errors by the system.
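[The "Best Guess"/metatemplate interpretation and the runner-up-based
confidence measures summarized above can be sketched in a few lines.
The template vectors, the test output, and the use of the
winner/runner-up distance gap as a confidence score are my own
illustrative choices, not taken from any of the systems cited:]

```python
# Nearest-template ("best guess") classification of a network's output
# vector.  The gap between the winning template's distance and the
# runner-up's distance serves as a crude confidence measure.

def classify(output, templates):
    """Return (label, confidence): the label of the nearest template by
    Euclidean distance, and the runner-up-minus-winner distance gap."""
    dists = sorted(
        (sum((o - t) ** 2 for o, t in zip(output, vec)) ** 0.5, label)
        for label, vec in templates.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    return label, d2 - d1   # larger gap = more confident

# One metatemplate per class (made-up values for illustration).
templates = {"A": [1.0, 0.0, 0.0],
             "B": [0.0, 1.0, 0.0],
             "C": [0.0, 0.0, 1.0]}
print(classify([0.8, 0.3, 0.1], templates))
```

[For one-unit-on templates like these, the winner coincides with the
largest activation; with distributed metatemplates the whole output
vector contributes, as the summary notes.]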
Some preliminary results of the speech model are reported in:

  Rossen, M.L., Niles, L.T., Tajchman, G.N., Bush, M.A., & Anderson,
  J.A. (1988).  Training methods for a connectionist model of CV
  syllable recognition.  Proceedings of the Second Annual International
  Conference on Neural Networks, 239-246.

  Rossen, M.L., Niles, L.T., Tajchman, G.N., Bush, M.A., Anderson,
  J.A., & Blumstein, S.E. (1988).  A connectionist model for
  consonant-vowel syllable recognition.  ICASSP-88, 59-66.

Improving Discriminability
--------- ----------------

Ralph Linsker says:

  You may be interested in an issue related, but not identical, to the
  one you raised; namely, how can one tailor the network's response so
  that the output optimally discriminates among the set of input
  vectors, i.e. so that the output provides maximum information about
  what the input vector was?  This is addressed in: R. Linsker, Computer
  21(3) 105-117 (March 1988); and in my papers in the 1987 and 1988
  Denver NIPS conferences.  The quantity being maximized is the Shannon
  information rate (from input to output), or equivalently the average
  mutual information between input and output.

---

Dave Burr refers to

  D. J. Burr, "Experiments with a Connectionist Text Reader," Proc.
  ICNN-87, pp. IV717-IV724, San Diego, CA, June 1987,

which describes a post-processing routine that assigns a score to every
word in an English dictionary by summing log-compressed activations.

------------------------------

Subject: Job Opportunity at MITRE
From:    alexis%yummy@gateway.mitre.org
Date:    Mon, 13 Feb 89 08:43:05 -0500

The MITRE Corporation is looking for technical staff for their expanding
neural network effort.  MITRE's neural network program currently
includes both IR&D and sponsored work in areas ranging from performance
analysis and learning algorithms to pattern recognition and
simulation/implementation.  The ideal candidate would have the following
qualifications:

1. 2-4 years experience in the area of neural networks.
2.
   Strong background in traditional signal processing with an emphasis
   on detection and classification theory.

3. Experienced programmer in C/Unix.  Experience with graphics
   (X11/NeWS), scientific programming, symbolic programming, and fast
   hardware (array and parallel processors) is a plus.

4. US citizenship required.

Interested candidates should send resumes to:

  Garry Jacyna
  The MITRE Corporation
  M.S. Z406
  7525 Colshire Drive
  McLean, Va. 22102 USA

------------------------------

Subject: Identity Mappings
From:    KINSELLAJ@vax1.nihel.ie
Date:    Thu, 09 Feb 89 17:35:00 +0000

John A. Kinsella
Mathematics Dept., University of Limerick, Limerick, IRELAND
KINSELLAJ@VAX1.NIHEL.IE

The "identity mapping" strategy -- training a feedforward network to
reproduce its input -- was (to the best of my knowledge) suggested by
Geoffrey Hinton and applied in a paper by J.L. Elman & D. Zipser,
"Learning the hidden structure of speech".  It is not clear to me,
however, that this approach can do more than aid in the selection of the
salient features of the data set.  In other words, what use is a network
which has been trained as an identity mapping on (say) a vision problem?

Certainly one can "strip off" the output layer & weights and, by a
simple piece of linear algebra, determine the appropriate weights to
transform the hidden layer states into output states corresponding to
the salient features mentioned above.  It would appear, though, that
this is almost as expensive a procedure computationally as training the
network, as well as being numerically unstable with respect to the
subset of the training set selected for the purpose.
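[A minimal concrete instance of the identity-mapping idea is the classic
4-2-4 encoder: a feedforward net is trained by backpropagation to
reproduce one-hot inputs through a two-unit bottleneck, so the hidden
layer is forced to invent a compact code.  The sketch below is purely
illustrative -- the layer sizes, learning rate, and epoch count are
arbitrary choices of mine, not the Elman & Zipser model:]

```python
# 4-2-4 encoder: train a sigmoid net to map each one-hot input pattern
# back to itself through a 2-unit hidden layer, using plain gradient
# descent on squared error.
import math, random

random.seed(0)
N_IN, N_HID = 4, 2
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

# w1[i][j]: input i -> hidden j; w2[j][k]: hidden j -> output k
w1 = [[random.uniform(-1, 1) for _ in range(N_HID)] for _ in range(N_IN)]
w2 = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
b2 = [0.0] * N_IN

def forward(x):
    h = [sig(sum(x[i] * w1[i][j] for i in range(N_IN)) + b1[j])
         for j in range(N_HID)]
    o = [sig(sum(h[j] * w2[j][k] for j in range(N_HID)) + b2[k])
         for k in range(N_IN)]
    return h, o

def train_step(x, lr=2.0):
    h, o = forward(x)
    # delta terms: squared-error gradient times sigmoid derivative
    do = [(o[k] - x[k]) * o[k] * (1 - o[k]) for k in range(N_IN)]
    dh = [sum(do[k] * w2[j][k] for k in range(N_IN)) * h[j] * (1 - h[j])
          for j in range(N_HID)]
    for j in range(N_HID):
        for k in range(N_IN):
            w2[j][k] -= lr * do[k] * h[j]
    for k in range(N_IN):
        b2[k] -= lr * do[k]
    for i in range(N_IN):
        for j in range(N_HID):
            w1[i][j] -= lr * dh[j] * x[i]
    for j in range(N_HID):
        b1[j] -= lr * dh[j]

patterns = [[1.0 if i == p else 0.0 for i in range(N_IN)]
            for p in range(N_IN)]
err = lambda: sum((forward(x)[1][k] - x[k]) ** 2
                  for x in patterns for k in range(N_IN))
before = err()
for _ in range(3000):
    for x in patterns:
        train_step(x)
after = err()
print(before, "->", after)
```

[After training, the two hidden activations per pattern form the learned
code; reading them off is the "strip off the output layer" step
discussed above.]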
I would appreciate any comments on these remarks, and in particular
references to relevant published material.

John Kinsella

------------------------------

Subject: Re: Identity Mappings
From:    Geoffrey Hinton <hinton@ai.toronto.edu>
Date:    Fri, 10 Feb 89 22:49:34 -0500

The potential advantage of using "encoder" networks is that the code in
the middle can be developed without any supervision.  If the output and
hidden units are non-linear, the codes do NOT just span the same
subspace as the principal components.  The difference between a linear
approach like principal components and a non-linear approach is
especially significant if there is more than one hidden layer.

If the codes from several encoder networks are then used as the input
vector for a "higher level" network, one can get a multilayer, modular,
unsupervised learning procedure that should scale up better to really
large problems.  Ballard (AAAI proceedings, 1987) has investigated this
approach for a simple problem and has introduced the interesting idea
that, as the learning proceeds, the central code of each encoder module
should give greater weight to the error feedback coming from higher
level modules that use this code as input, and less weight to the error
feedback coming from the output of the code's own module.

However, to the best of my knowledge, nobody has yet shown that it
really works well for a hard task.  One problem, pointed out by Steve
Nowlan, is that the codes formed in a bottleneck tend to "encrypt" the
information in a compact form that is not necessarily helpful for
further processing.  It may be worth exploring encoders in nets with
many hidden layers that are given inputs from real domains, but my own
current view is that to achieve modular unsupervised learning we
probably need to optimize some other function, one which does not simply
ensure good reconstruction of the input vector.
Geoff Hinton

------------------------------

Subject: Re: Identity Mappings
From:    Zipser%cogsci@ucsd.edu
Date:    Sat, 11 Feb 89 10:49:00 -0800

Perhaps of interest is that in our work with identity mapping of speech,
the hidden layer spontaneously learned to represent vowels and
consonants in separate groups of units.  Within these groups the
individual sounds seemed quite compactly coded.  Maybe the ease with
which we are able to identify the distinct features used to recognize
whole items depends on the kind of coding they have in our hidden
layers.

David Zipser

------------------------------

Subject: Prove non-existence with neural net?
From:    sdo@linus.UUCP (Sean D. O'Neil)
Organization: The MITRE Corporation, Bedford MA
Date:    Wed, 15 Feb 89 18:29:15 +0000

In article <9775@nsc.nsc.com> andrew@nsc.nsc.com (andrew) writes:

>A recently addressed "solution" by supercomputer to a long-standing maths
>conjecture - that of the finite projective plane - now exists for planes up
>to order 10 (Science News, Dec 24 & 31, 1988), whereby 1000's of hours
>of Cray time was needed!  This looks like a nice place for a net and a Cray
>to do battle... the constraints are simply expressed:
>- for order k, construct a square matrix with k**2 + k + 1 rows/columns
>- fill the matrix with 0's and 1's, such that:
>  - every row contains exactly k + 1 1's
>  - every possible pair of rows has exactly one column in which
>    both have a '1'

Hmmm.  It sounds to me like there is some confusion going on here.
Recall that what was proved was that there is NO finite projective plane
of order 10.  This was done by showing that no 0-1 matrix of the given
type exists for order 10.  The contribution of the people involved was
to restrict the search of candidate 0-1 matrices so that it took only
thousands of hours on a Cray (as opposed to the lifetime of the
universe).  Now Andrew is proposing to do the 0-1 matrix search on a
neural net.
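[The incidence-matrix constraints quoted above can be checked
mechanically for any candidate matrix.  A small sketch, validated here
on the known order-2 plane (the Fano plane); the line list used for the
check is a standard construction I am supplying, not something from the
posting:]

```python
# Check the finite-projective-plane constraints for a candidate 0-1
# incidence matrix of order k: the matrix is n x n with n = k**2 + k + 1,
# every row contains exactly k + 1 ones, and every pair of rows shares
# exactly one common 1-column.
from itertools import combinations

def is_projective_plane(matrix, k):
    n = k * k + k + 1
    if len(matrix) != n or any(len(row) != n for row in matrix):
        return False
    if any(sum(row) != k + 1 for row in matrix):
        return False
    return all(sum(a & b for a, b in zip(r1, r2)) == 1
               for r1, r2 in combinations(matrix, 2))

# Order 2: the Fano plane -- 7 points, 7 lines, 3 points per line.
fano_lines = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
              (1, 4, 6), (2, 3, 6), (2, 4, 5)]
fano = [[1 if p in line else 0 for p in range(7)] for line in fano_lines]
print(is_projective_plane(fano, 2))
```

[A search procedure, neural or otherwise, would use a check like this as
its acceptance test; the hard part, as the article notes, is that the
constraints must be satisfied exactly, so near-misses count for
nothing.]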
However, for order 10 he's not going to find a matrix that satisfies the
constraints.  What will the neural network output be that will allow us
to say, as the Cray people were able to say, that there is NO finite
projective plane of order 10?  Is his network such that we can
*definitively* state that it will always find a solution if one exists
(thereby allowing us to interpret a negative result as a non-existence
proof)?  I suspect not.  Therefore, it's hard for me to see what
comparison, if any, can be made with the Cray 'proof' in this case.

>I have successfully solved this for low orders using the linear "schema"
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>cs (constraint satisfaction) program from "Explorations...PDP".

Exactly.  Finite projective planes exist for all orders less than 10 and
greater than 1, except for 6 (they are known to exist for all orders
equal to a prime or a power of a prime).  What did the neural net do for
order 6?

Perhaps Andrew means to search for finite projective planes of order
greater than 10 for which we do not know how to do an explicit
construction.  I believe order 12 is the next case where the result is
unknown (besides the orders for which we know how to explicitly
construct finite projective planes, there is a theorem proving that
there are no finite projective planes for another infinite set of
orders -- I believe 14 is in this set).  If one could find a finite
projective plane of order 12, that would be an impressive result.  I
wish him luck, but I am skeptical.  Remember, the constraints are hard
in the sense that they must be exactly satisfied.  Approximate solutions
don't count.

Sean

------------------------------

Subject: WANTED: VAX/VMS neural net simulator
From:    Usenet file owner <mailrus!eecae!cps3xx!usenet@TUT.CIS.OHIO-STATE.EDU>
Organization: Michigan State University, Engineering, E. Lansing
Date:    17 Feb 89 16:56:14 +0000

The title says it all!
An acquaintance who has access only to VAX/VMS would like info about the
availability of any simulator he can use.  Please mail
raja@frith.egr.msu.edu OR raja%frith.egr.msu.edu@uunet.uu.net

I understand there is something called 'Eunice' that will emulate Unix
under VAX/VMS.  Is it possible to port a simulator like RCS (written for
C/Unix) to run under Eunice for VAX/VMS?

Thanks!
Narayan Sriranga Raja.

------------------------------

Subject: noise resistance
From:    movellan@garnet.berkeley.edu
Date:    Fri, 17 Feb 89 10:16:07 -0800

I am interested in ANY information regarding NOISE RESISTANCE in BP and
other connectionist learning algorithms.  In return I will organize the
information and send it back to all the contributors.  You may include
REFERENCES (theoretical treatments, applications, ...) as well as
HANDS-ON EXPERIENCE (explain in detail the phenomena you encounter or
the procedure you use for improving noise resistance).

Please send your mail directly to: movellan@garnet.berkeley.edu
Use "noise" as the subject.

Sincerely,
- Javier R. Movellan.

------------------------------

Subject: Re: Loan applications with neural nets
From:    joe@amos.ling.ucsd.edu (Fellow Sufferer)
Organization: Univ. of Calif., San Diego
Date:    Fri, 17 Feb 89 18:35:37 +0000

In article myers@eecea.eece.ksu.edu (Steve Myers) writes:

>  I am doing a paper for an undergrad class on the use of neural
>networks in the evaluation of loan applications.  Any information on
>the hardware and software required for the implementation of neural
>networks in the business environment would be appreciated.  I am more
>interested in the practical application than the theory.
>
>Steve Myers

Hecht-Nielsen Corp of San Diego, CA is doing just such research.  It
seems they've had some success, too.  The neural net was quite good at
predicting loan reliability.
Their real problem was explaining why an applicant was refused:
evidently there is a law which requires that the institution explain
exactly what was wrong with an application.  That's not quite as easy as
it sounds.

UUCP: ucbvax!sdcsvax!sdamos!joe
INTERNET: joe@amos.ling.ucsd.edu
BITNET: joe%amos@ucsd.bitnet
ARPA & RELAY: joe%amos.ling.ucsd.edu@relay.cs.net

------------------------------

Subject: effect of attribute set choice on pattern classifiers
From:    LEWIS@cs.umass.edu
Date:    Sun, 19 Feb 89 21:43:00 -0500

I'm interested in finding out what literature exists on evaluating how
the choice of a set of features affects the performance of pattern
classification systems (linear or non-linear).  In particular, I'm
wondering if there are methods that would let one say things like this:

*Given*:
  - two alternative sets of features {a1, a2, ...} and {b1, b2, ...},
  - a set of objects D to be classified,
  - that the functional form of the classifier is F (for instance, "is
    linear", or is a polynomial of degree k, or is an arbitrary
    polynomial),
  - a set {P1, P2, ...} of typical partitions of D that we would like
    to be able to train classifiers to recognize.

*Then*:
  - any classifier with form F will have higher accuracy on
    {P1, P2, ...} (and presumably on similar partitions) if the objects
    are represented using {a1, a2, ...} than if they are represented
    using {b1, b2, ...}; OR
  - any classifier with form F will take longer to train if objects are
    represented using {a1, a2, ...} than if they are represented using
    {b1, b2, ...}; OR
  - any other ways that might have been developed for saying that one
    set of features is better than another.

Apologies if I've mangled the terminology above; I'm not familiar with
the pattern classification literature.  Any thoughts?  I would
appreciate replies directly to me, as I read this list infrequently.  I
will post a summary of replies to this list.

---Thanks, Dave

David D. Lewis                                   ph. 413-545-0728
Computer and Information Science (COINS) Dept.
University of Massachusetts, Amherst             BITNET: lewis@umass
Amherst, MA 01003 USA                            ARPA/MIL/CS/INTERnet: lewis@cs.umass.edu
UUCP: ...!uunet!cs.umass.edu!lewis@uunet.uu.net

------------------------------

Subject: Re: principal components references?
From:    bond@delta.CES.CWRU.Edu (angus bond)
Organization: CWRU Dept of Computer Engineering and Science, Cleveland, OH
Date:    Mon, 20 Feb 89 18:08:03 +0000

In regards to your query about principal components analysis, et al.,
you might try looking at:

  _Statistical_Pattern_Recognition_ by Fukunaga

I believe the publisher is Prentice-Hall, but I could be wrong (the book
is at home).  I took a one-week concentrated course in Image Processing
and Pattern Recognition at Purdue, and Fukunaga was one of the
instructors.  He knows his subject.

Angus Bond
CWRU Comp. Sci. & Engr.

------------------------------

Subject: Re: Neuron Digest V5 #11
From:    gary%cs@ucsd.edu (Gary Cottrell)
Date:    Sun, 05 Mar 89 12:48:38 -0800

Re: principal components references?

Gonzalez & Wintz, Digital Image Processing, Addison-Wesley, has a
readable section on principal components (the Hotelling transform).

gary cottrell   619-534-6640
Computer Science and Engineering C-014
UCSD, La Jolla, CA 92093
gary%cs@ucsd.edu (ARPA)
{ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET)
gcottrell@ucsd.edu (BITNET)

------------------------------

Subject: Re: WANTED: neural net simulator running on MAC II
From:    garybc@potomac.ads.com (Gary Berg-Cross)
Organization: Advanced Decision Systems, Arlington VA
Date:    Fri, 24 Feb 89 20:36:46 +0000

What plug-in boards/software exist for the Mac II to build neural nets?
(Sorry if this is an old question; I'm only an occasional reader.)  I
have heard of the early version of MacBrain but don't have details about
the latest.  Information in terms of connections and
connections-per-second processing rate would be helpful, along with the
usual info about the software in general (bugs, friendliness, support,
etc.).
****************************************************************************
A knowledge engineer's wish - a full knowledge base and a full life

Gary Berg-Cross, Ph.D.           ARPANET-INTERNET: garybc@Potomac.ADS.COM
Advanced Decision Systems        UUCP: sun!sundc!potomac!garybc
Suite 512, 1500 Wilson Blvd.     VoiceNet: 703-243-1611
Arlington, Va. 22209
****************************************************************************

------------------------------

Subject: Re: ART? What is ART?
From:    neighorn@qiclab.UUCP (Steve Neighorn)
Organization: Qic Laboratories, Portland, Oregon.
Date:    Sun, 26 Feb 89 03:20:54 +0000

In article <230@torch.UUCP> paul@torch.UUCP (Paul Andrews) writes:
>I guess the title says it all.

In a nutshell, ART stands for "adaptive resonance theory".  It is a
system that can create and organize categories for the patterns it is
exposed to.  It attempts to mimic the human capacity for
self-organization, where recognition emerges from interaction with the
environment.  ART can dynamically modify the coding it uses for the
recognition of steady states in patterns and the suppression of
potentially disruptive patterns.

The paper you should probably wade through if you really want to learn
about ART is entitled "A Massively Parallel Architecture for a
Self-Organizing Neural Pattern Recognition Machine" by Gail Carpenter
and Stephen Grossberg.  My copy is dated February 1986.

Hecht-Nielsen Neurocomputer has a product called AR/NET that implements
Carpenter and Grossberg's ART 1 and ART 2.  You might try contacting
them for more information.

Steven C. Neighorn        !tektronix!{psu-cs,nosun,ogccse}!qiclab!neighorn
Sun Microsystems, Inc.
"Where we DESIGN the Star Fighters that defend the 9900 SW Greenburg Road #240 frontier against Xur and the Ko-dan Armada" Portland, Oregon 97223 work: (503) 684-9001 / home: (503) 645-7015 ------------------------------ Subject: Re: Neuron Digest V5 #10 From: Drew <SCR596%cyber2.central.bradford.ac.uk@NSS.Cs.Ucl.AC.UK> Date: Tue, 28 Feb 89 19:45:00 +0000 Edward Fredkin of MIT has been quoted as saying that the universe is a three-dimensional cellular automaton; a crystalline lattice of interlacing logic units, each one oscillating millions of times per second. A "universal computer" determines when each bit turns on and off. Can anyone tell me where I can find the precise citation? Any clues would be appreciated. Thanks. Drew Radtke Janet: Drew@uk.ac.bradford.central.cyber2 Internet: Drew%cyber2.central.bradford.ac.uk@cunyvm.cuny.edu Earn/Bitnet: Drew%cyber2.central.bradford.ac.uk@ukacrl UUCP: Drew%cyber2.central.bradford.ac.uk@ukc.uucp Post: Science & Society, University of Bradford, Bradford, UK, BD7 1DP. Phone: +44 274 733466 x6135 Fax: +44 274 305340 Telex: 51309 UNIBFD G ------------------------------ End of Neurons Digest *********************