[comp.dcom.lans] Recurring Novell Problem

pcb@gator.usl.edu (Peter C. Bahrs) (02/09/90)

I used to think it was a loose connection problem that periodically
caused the Netware server to crash (IBM PS/2 Model 80).  So
I tightened all the connectors up and no problems, I thought.

Periodically the clients (IBM PS/2 Model 70's) go 'dead' and can't
locate the server any longer.  The server's screen will show
  Running MUXPRC
or something similar and it says to power down the system?!@!@
Then we have to reboot the clients.  What a pain...

(Today after I reboot the server it said:
    FAT location 1041 used
    but file does not exist. Do you want to mark...y/n
I said n??)

With the above exception, the server always comes back up with no
problems ??? I don't understand.

Has anyone seen this?  Where ,..., what have you done to rectify the
situation?


/*------------Thanks in advance...---------------------------------------+
| Peter C. Bahrs                                                         |
| The USL-NASA Project                                                   |
| Center For Advanced Computer Studies   INET  pcb@gator.cacs.sl.edu     |
| University of Southwestern Louisiana   ...!uunet!dalsqnt!gator!pcb     |
| Lafayette, LA 70504                                                    |
+-----------------------------------------------------------------------*/

mrichey@orion.oac.uci.edu (Mike Richey) (02/11/90)

In article <867@gator.usl.edu> pcb@gator.usl.edu (Peter C. Bahrs) writes:
>I used to think it was a loose connection problem that periodically
>caused the Netware server to crash (IBM PS/2 Model 80).  So
>I tightened all the connectors up and no problems, I thought.
>
>Periodically the clients (IBM PS/2 Model 70's) go 'dead' and can't
>locate the server any longer.  The server's screen will show
>  Running MUXPRC
>or something similar and it says to power down the system?!@!@

This is a somewhat important message. If you have a GPI or an NMI
error (Running MUXPRC tipped me off) then you probably have a bad or
very flaky memory chip/board in your server.

My experience has been that if the bad memory chip is located in the lower
addresses of RAM, then the errors will occur often. If the failing address
is higher in memory, this error will occur less frequently. What you need
to do is find a good set of diagnostics, such as Check-It or QAPlus, run
them on the server, and identify the failing module.

A GPI error can be caused by a number of things: the power supply, a NIC,
the system board, etc. All I can offer you is a bit of good luck. So good luck.

>    FAT location 1041 used
>    but file does not exist. Do you want to mark...y/n
>I said n??)
>
>With the above exception, the server always comes back up with no
>problems ??? I don't understand.
>
>Has anyone seen this?  Where ,..., what have you done to rectify the
>situation?
>

This could be caused by faulty RAM. Netware caches reads and writes, so
data that passes through a bad memory location can be damaged before it
reaches the disk.

There are easy ways of taking care of this, but I really think you need to
get the NMI error fixed first. (The FAT error will be taken care of by Netware.)


Michael S. Richey   mrichey@orion.oac.uci.edu
University of California, Irvine     Network Services

david@cwlim.CWRU.EDU (David Nerenberg) (02/11/90)

In article <867@gator.usl.edu> pcb@gator.usl.edu (Peter C. Bahrs) writes:
>I used to think it was a loose connection problem that periodically
>caused the Netware server to crash (IBM PS/2 Model 80).  So
>I tightened all the connectors up and no problems, I thought.
>
>Periodically the clients (IBM PS/2 Model 70's) go 'dead' and can't
>locate the server any longer.  The server's screen will show
>  Running MUXPRC
>or something similar and it says to power down the system?!@!@
>Then we have to reboot the clients.  What a pain...
>
>(Today after I reboot the server it said:
>    FAT location 1041 used
>    but file does not exist. Do you want to mark...y/n
>I said n??)
>
>With the above exception, the server always comes back up with no
>problems ??? I don't understand.
>
>Has anyone seen this?  Where ,..., what have you done to rectify the
>situation?

Your problem could have many possible causes, as the Novell error manual
will tell you, if you can find the entry.  In my experience, it is one of
two things:

  1.  Your server's memory.  Although IBM Model 80s are usually good about
      the type and quality of memory chips/SIMMs they use, one unreliable
      byte and your server goes into this process.  You could try changing
      memory.  Unfortunately, memory tests don't usually find these subtle
      errors because they are not always there (heat, flaw, who knows).
      I have found that taking the server down, making it into a DOS
      machine with a RAM disk taking up all of the memory, and then
      running disk checking programs on it has been able to find some of
      these subtle errors.  Both Norton DT and OPTUNE in non-cache mode
      have been able to find some of my problems.
  2.  The other possible cause could be a dirty AC line.  Without the use
      of an oscilloscope, I know of no easy way to tell if this is the
      problem.  If you have a UPS or a good line conditioner, this isn't
      the problem.

My first guess would have to be the memory, that is, as long as you are
not running MAC VAPs!
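
To illustrate why one pass of a memory test can miss an intermittent fault,
here is a minimal Python sketch of a repeated pattern test.  It is an
illustration only, not what Norton DT or OPTUNE actually do: the function
names, the pattern set, and the buffer size are all assumptions, and a
Python bytearray gives no control over which physical RAM is exercised.

```python
# Hypothetical sketch: repeat write/verify passes with several bit patterns.
# A fault that only shows up sometimes (heat, marginal chip) may need many
# passes before a mismatch is seen.

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits


def pattern_pass(buf, pattern):
    """Fill buf with pattern, read it back, return the offsets that differ."""
    for i in range(len(buf)):
        buf[i] = pattern
    return [i for i, value in enumerate(buf) if value != pattern]


def soak_test(size, passes):
    """Run every pattern over a buffer `passes` times.

    Returns a list of (pass_number, pattern, offset) for each mismatch.
    """
    buf = bytearray(size)
    failures = []
    for n in range(passes):
        for pattern in PATTERNS:
            for offset in pattern_pass(buf, pattern):
                failures.append((n, pattern, offset))
    return failures


if __name__ == "__main__":
    bad = soak_test(size=64 * 1024, passes=10)
    print("mismatches found:", len(bad))
```

The point of the loop over passes is the same as the RAM disk trick above:
keep the memory busy for a long time so that an error which is "not always
there" has a chance to show itself.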

The error you received was because the server locked up while it was accessing
the hard drive, so the operating system's FAT has an entry without a file being
on the disk.  The error would continue to be there on bootup of the server
until the OS either uses the HOT FIX or another file uses the space on the
drive intended for the file that the FAT thinks should be there.  Basically,
no matter what you answered to the question, no harm will come to your
server.
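
As a rough illustration of the condition described above (an allocation
entry marked used with no file claiming it), here is a minimal Python
sketch.  The data layout, the names, and every block number other than 1041
are hypothetical and do not reflect NetWare's actual on-disk structures.

```python
# Hypothetical sketch: find allocation entries that are marked "used"
# but are not referenced by any file in the directory.

def find_orphaned_entries(fat_used, files):
    """Return the sorted entries in fat_used that no file refers to.

    fat_used: iterable of entry numbers the table marks as in use
    files:    dict mapping a file name to the list of entries it occupies
    """
    referenced = set()
    for blocks in files.values():
        referenced.update(blocks)
    return sorted(set(fat_used) - referenced)


if __name__ == "__main__":
    # Made-up state after a crash in the middle of a write: entry 1041 was
    # allocated, but the directory entry for its file never reached the disk.
    fat_used = [1039, 1040, 1041, 1042]
    files = {"REPORT.TXT": [1039, 1040], "DATA.DB": [1042]}
    print(find_orphaned_entries(fat_used, files))  # prints [1041]
```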
 
					Good luck....
						Dave
david@cwlim.ins.cwru.edu
dwn@pyrite.som.cwru.edu

gribble@cica.cica.indiana.edu (gribble) (02/12/90)

I had recurring problems w/ a Model 80 server that were eventually diagnosed
as bent pins on the connector where the memory boards plug into the motherboard.
It took months of hairpulling before I noticed the bent pins w/ a flashlight.
This will give the same type of errors as the previously mentioned memory errors.
Actually, it was kind of embarrassing...

-- 
************************************************************************
* Steve Gribble                  Internet: gribble@cica.cica.indiana.edu
* Lead Computer Consultant                 swg@iumail.ucs.indiana.edu
* Dept. of Sociology, Indiana University   Bitnet:   gribble@iubacs

chapman@acf4.NYU.EDU (Gary W. Chapman) (02/13/90)

I have had HUGE problems with a Model 80 server (running NW386); we
replaced the motherboard, and all is well.  The bad server had '87 ROMs;
the working server motherboard has '88 ROMs.  I do not know whether
it was the ROMs or whether there was some other motherboard problem.