[comp.ai.neural-nets] PDP programs not working for large nets?

d83_sven_a@tekno.chalmers.se (Sven (Sciz) Axelsson) (02/09/89)

A colleague of mine and I are trying to learn something about neural networks
and PDP, so we wanted to use the PDP programs from 'Explorations...
(PDP Book 3)' as a starting point.
Instead of running the extremely trivial examples discussed in the books, we
wanted to go for something a little less trivial, using the PA program on a
network of 360 units with the initial settings given below.

When run (on an IBM AT with 2.5MB memory) this produces the error message

	error 2100: Floating point error: Invalid
	
after which the program exits to DOS. We assume that this is caused by
the algorithms used not handling the calculations for nets of this size very
well, or have we done something wrong? This is what the network definition
looks like:

	definitions:
	nunits 360
	ninputs 180
	noutputs 180
	end
	network:
	%r  180 180 0 180
	end

and we then use a set-up file like the one below, where we set lrate to
about 1/180 as suggested in 'Explorations...'

	get network ournet.net
	set dlevel 3
	set lflag 1
	set mode cs 1
	set nepochs 1
	set param lrate .005
	set step cycle

When we exercise ptrain or strain using a suitable training pattern, we get
the aforementioned error message after a few seconds of processing.
We would very much appreciate it if someone could explain why
this doesn't work, or at least confirm that it cannot work. Please answer
promptly, as we must have this information as soon as possible.

+-------------------------+--------------------------------+------------------+
|   Sven Axelsson         |  d83_sven_a@tekno.chalmers.se  |                  |
|   dep:t of Linguistics  |          (^^ best ^^)          |                  |
|   univ. of Gothenburg   |        dlv_sa@hum.gu.se        |                  |
|   SWEDEN                |      usdsa@seguc21.bitnet      |                  |
+-------------------------+--------------------------------+------------------+

andrew@nsc.nsc.com (andrew) (02/11/89)

I tried a skeleton of this using cs, and much less than 1MB RAM.
No problems!

	Andrew Palfreyman, MS D3969		PHONE:  408-721-4788 work
	National Semiconductor				408-247-0145 home
	2900 Semiconductor Dr.			there's many a slip
	P.O. Box 58090				'twixt cup and lip
	Santa Clara, CA  95052-8090

	DOMAIN: andrew@logic.sc.nsc.com  
	ARPA:   nsc!logic!andrew@sun.com
	USENET: ...{amdahl,decwrl,hplabs,pyramid,sun}!nsc!logic!andrew

fenimore@usceast.UUCP (Fred Fenimore) (02/23/89)

In article <539@tekno.chalmers.se> d83_sven_a@tekno.chalmers.se (Sven (Sciz) Axelsson) writes:
>A colleague of mine and I are trying to learn something about neural networks
>and PDP, and so we wanted to use the PDP programs from 'Explorations...
>(PDP Book 3)' as a starting point.
 
   ... (some stuff deleted about the fact that it resulted in an error)

>well, or have we done anything wrong? This is what the network definition
>looks like:
>
>	definitions:
>	nunits 360
>	ninputs 180
>	noutputs 180
>	end
>	network:
>	%r  180 180 0 180
>	 ^^^^^^^^^^^^^^^^

  This past fall, I was in a special topics course on Neural Nets.  As
part of the course, we were to implement some type of project using one
of the simulators available.  What we found with ours was that if you
use BP or PA, then you cannot use the block commands in the .net file.
We tried it on a Vax 11/725 and an Apollo.  Both machines gave either
a segmentation fault or an out-of-memory error. We spent some time looking
in the various files to see if we could find the error and to confirm
whether it was a real bug in the code.  The semester ended with
no results, so I gave up and coded the project in C.
With all the new classes, we have not had time lately to find out where
in the source code this error is coming up.
  Hope this helps...
     Fred Fenimore


   \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
   \ Fred Fenimore                    \
   \ Voice  : 803-777-2041            \
   \ Usenet : fenimore@usceast.UUCP   \
   \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\


-- 
___________________________________________________
|  Fred Fenimore   |   fenimore@usceast.UUCP      |
---------------------------------------------------

andrew@nsc.nsc.com (andrew) (02/23/89)

> use the block commands in the .net file.
> We tried it on a Vax 11/725 and an Apollo.  Both machines gave either
> a segmentation fault or an out-of-memory error.
> ___________________________________________________
> |  Fred Fenimore   |   fenimore@usceast.UUCP      | 
> ---------------------------------------------------

Using the <cs> program, I find I top out at about 1E6 weights, 1E3 nodes
using a Unix environment. I'm sure tweaking the source "emalloc()"s would
give relief.  Helpful? - hope so;
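
To put those figures in perspective: 1E3 fully connected nodes give 1E6 weights, and a single 180-by-180 block gives 32,400. The sketch below turns a block's dimensions into a byte count. It assumes one C float per connection and nothing else; the real programs may keep further per-connection arrays, so treat the result as a lower bound. The name weight_bytes is hypothetical and does not come from the PDP source.

```c
#include <assert.h>

/* Lower-bound storage for one block of weights, assuming exactly one
   float per connection (an assumption, not taken from the PDP source). */
unsigned long weight_bytes(unsigned long receivers, unsigned long senders)
{
    /* A block with no receivers or no senders has nothing to store. */
    assert(receivers > 0 && senders > 0);
    return receivers * senders * (unsigned long)sizeof(float);
}
```

With a 4-byte float, weight_bytes(180, 180) comes to 129,600 bytes and weight_bytes(1000, 1000) to 4,000,000 bytes.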

	Andrew Palfreyman, MS D3969		PHONE:  408-721-4788 work
	National Semiconductor				408-247-0145 home
	2900 Semiconductor Dr.			there's many a slip
	P.O. Box 58090				'twixt cup and lip
	Santa Clara, CA  95052-8090

	DOMAIN: andrew@logic.sc.nsc.com  
	ARPA:   nsc!logic!andrew@sun.com
	USENET: ...{amdahl,decwrl,hplabs,pyramid,sun}!nsc!logic!andrew

wlp@calmasd.Prime.COM (Walter L. Peterson, Jr.) (02/24/89)

In article <2730@usceast.UUCP>, fenimore@usceast.UUCP (Fred Fenimore) writes:
> 
>   [stuff deleted] ... 
> part of the course, we were to implement some type of project using one
> of the simulators available.  What we found with ours was that if you
> use BP or PA, then you cannot use the block commands in the .net file.
> We tried it on a Vax 11/725 and an Apollo.  Both machines gave either
> a segmentation fault or an out-of-memory error. We spent some time looking
> in the various files to see if we could find the error and to confirm
> whether it was a real bug in the code.  The semester ended with
> no results, so I gave up and coded the project in C.
>  ... [stuff deleted]

Since there have been several questions about porting the PDP code
lately, I'll post this rather than e-mailing it.

The obvious first question is: are you certain that you declared the
block(s) correctly?  Getting things out of order could cause the
program to attempt to allocate 0 bytes. e.g. if you take the XOR.NET 
and give it :
                %r 2 2 2 0
                %r 4 1 2 2
rather than :
                %r 2 2 0 2
                %r 4 1 2 2  

the incorrect definition of the sending level of the first block will
cause the system to attempt to allocate 0 nodes for the sending level
and you will get a run-time error such as you describe.  If your network
definitions are correct, then there are several other possibilities;
these should be checked anyway, since the PDP code *IS* sensitive to
compiler and system differences.
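
A block line can be checked before the program ever tries to allocate anything. The following is a minimal sketch, not code from the PDP distribution. It assumes the four numbers after %r mean first receiving unit, number of receiving units, first sending unit, number of sending units, which is the reading the examples in this thread suggest. The names struct block, parse_block and check_block are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One "%r" block, under the assumed field order:
   receivers [rstart, rstart+rnum) take input from
   senders   [sstart, sstart+snum). */
struct block {
    int rstart, rnum, sstart, snum;
};

/* Parse a line such as "%r 2 2 0 2".
   Returns 1 if all four numbers were read, 0 otherwise. */
int parse_block(const char *line, struct block *b)
{
    assert(line != NULL && b != NULL);
    /* "%%" in the format matches a literal '%' in the input. */
    return sscanf(line, " %%r %d %d %d %d",
                  &b->rstart, &b->rnum, &b->sstart, &b->snum) == 4;
}

/* Returns NULL if the block looks usable for a net of nunits units,
   otherwise a short description of the first problem found. */
const char *check_block(const struct block *b, int nunits)
{
    assert(b != NULL);
    if (b->rnum <= 0)
        return "zero receiving units";
    if (b->snum <= 0)
        return "zero sending units";
    if (b->rstart < 0 || b->rstart + b->rnum > nunits)
        return "receiving units out of range";
    if (b->sstart < 0 || b->sstart + b->snum > nunits)
        return "sending units out of range";
    return NULL;
}
```

Under that reading, "%r 2 2 2 0" is rejected for having zero sending units, while "%r 180 180 0 180" passes for a net declared with nunits 360.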

First off - the block network definitions DO work. The XOR.NET and
XOR2.NET files that are distributed with the PDP software use them for
BP and there are other network definition files that use them also.  I
have made networks with over 100 nodes in 4 layers (in, out, 2 hidden),
using the block notation and have found no bugs *in the code that 
reads or utilizes* this type of definition.

Note the asterisks above; the emphasis indicates that I did not find
bugs in THAT part of the code, but I *DID* find problems elsewhere.  When I
began using the PDP code I found numerous, albeit minor, problems when
I compiled, linked and ran it using TURBO-C V2.0 under MS-DOS V3.1 .

The problem which you found seems to be the same as, or close
to, one of the ones which I encountered.  My first attempt to run the
BP program after having re-compiled and relinked it under TURBO-C gave
me the "no memory" error.  As I was using the XOR.* files that
come with the code and had not yet made any mods to the code, I knew
that something was not porting correctly.  After a bit I found that
the PDP code's "shells" around calloc, malloc and realloc allowed an
input parameter of 0 to slip through; if you try to calloc 0 bytes, calloc
returns NULL, and the code *was* testing for NULL, so it reported a
memory failure.  Having fixed that I
was at least able to get started. ( Note: this error happened soon
after the copyright notice was displayed, before any display comes up
on the screen; did yours do the same ? ).
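
One way to close that hole is to make the wrapper refuse to pass a zero count through, so that a NULL return can only mean the memory really is gone. This is a sketch of the idea, not the fix that was actually applied and not the PDP code itself; safe_ecalloc is a hypothetical name, and bumping a zero count up to one is just one possible policy.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* calloc wrapper that never asks for zero elements.
   Exits with a message if the allocation genuinely fails. */
void *safe_ecalloc(size_t count, size_t size)
{
    void *p;

    /* The element size comes from sizeof, so zero here is a caller bug. */
    assert(size > 0);

    /* A zero count can come from a bad network definition; ask for
       one element rather than letting calloc see a zero-byte request. */
    if (count == 0)
        count = 1;

    p = calloc(count, size);
    if (p == NULL) {
        fprintf(stderr, "out of memory: %lu elements of %lu bytes\n",
                (unsigned long)count, (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

A stricter variant would report the zero count as an error instead of hiding it, which would point straight at a bad block definition.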

*THEN* I hit the *real* problem.  I started getting Floating Point
errors. In a program that uses floats for darn near everything, that
was real fun to track down :-).  ( I need to acknowledge some VERY
helpful hints from Walter Bright and Eric Raymond ).  The actual
problem with the PDP code when ported to compilers and systems other
than the one on which it was written ( SUN UNIX ? ) is in the casting
of floats to doubles and doubles to floats.  The culprits are at the
points where there are calls to exp(x) and pow(y, x). I don't have
the code here and I don't remember offhand in which functions these
occur, but you can use grep to find them.  The solution is relatively
straightforward.  In those functions the return value is computed in
the return statement; change that.  Add a local variable that is
declared as double, do the computations outside of the return
statement, BEING VERY CAREFUL ABOUT USING PROPER CASTING. Assign the
result to the local variable and then return the local variable.
For example:

           ...
           double foo;

            ...

           foo = exp( < some expression > );

            ...

           return(foo);

This simple expedient should solve your problems.  Also in the
functions that use the pow(y, x)  [ that is, y raised to the x ], y is
ALWAYS 10, so if your C library provides it, you might want to change
this to pow10(x).  

These casting problems can get nasty and can cause problems that are
not easy to track down; however, once you get them fixed the code runs
just fine.  I have been able to make some rather extensive
modifications to the BP code, having gone so far as converting it to
use Scott Fahlman's "Quick-Prop" ( see "Proc. of the 1988 Connectionist
Models Summer School", Morgan Kaufmann, NY, 1988 ).

If you have the time, it might also be helpful to convert the code
from the "old" K&R style to ANSI-C with function prototypes, but that
is really not necessary. If you have a LOT of time and you are using
TURBO-C or some other system which provides good screen IO routines,
you might want to get rid of the CURSES emulation stuff.  That will
eliminate some unnecessary function calls and for long runs of large
models that might help to speed things up.

Good Luck,..

-- 
Walt Peterson.  Prime - Calma San Diego R&D (Object and Data Management Group)
"The opinions expressed here are my own and do not necessarily reflect those
of Prime, Calma or anyone else."
...{ucbvax|decvax}!sdcsvax!calmasd!wlp

mesard@bbn.com (Wayne Mesard) (02/24/89)

In article <2730@usceast.UUCP> fenimore@usceast.UUCP (Fred Fenimore) writes:
> What we found with ours was that if you 
>use BP or PA, then you cannot use the block commands in the .net file.
>We tried it on a Vax 11/725 and an Apollo.  Both machines gave either
>a segmentation fault or an out-of-memory error. We spent some time looking
>in the various files to see if we could find the error and to confirm
>whether it was a real bug in the code.

Nonsense.  Here is the .net file from a net that ran for over 24 hours
using BP:

    definitions:
    nunits 174
    ninputs 43
    noutputs 2
    nepochs 5000
    ecrit 0.04
    end
    network:
    %r 43 129 0 43
    %r 172 2 43 129
    end
    biases:
    %r 43 131
    end

This has worked on a Sun3/160 running SunOS 3.5 and on a Compaq running
MesSy-DOS. 

[Disclaimer:  Yes, I did have a reason for having so many hidden units.]

-- 
unsigned *Wayne_Mesard();   "What are THEY doing singing in a major key!?"
MESARD@BBN.COM
BBN, Cambridge, MA                               -DB on the Violent Femmes