campbell@redsox.UUCP (Larry Campbell) (10/01/88)
I'm running Interactive's 386/ix on a Dell 310. On the whole it all works just fine, but there's one showstopper of a bug: using cpio to back up the disk to floppies panics the kernel with a memory parity (NMI) trap!

Now, I don't really suspect the memory since everything else works fine. Strangely, I can format and mount floppies, and read and write to them all day, without a panic. It seems to be only writing (not reading) to the raw floppy device rather than the block device that panics the kernel.

Interactive say they can't reproduce the problem, but they also haven't got a Dell 310 to try it on. Is anyone out there successfully running 386/ix on a Dell 310? Has anyone on any kind of machine ever had this problem?

Although the system works on the whole, I'm getting a bit nervous, because I can't do backups... I can feel Murphy breathing down my neck...
debra@alice.UUCP (Paul De Bra) (10/02/88)
In article <455@redsox.UUCP> campbell@sushi.UUCP (Larry Campbell) writes:
>I'm running Interactive's 386/ix on a Dell 310. On the whole it all works
>just fine, but there's one showstopper of a bug: using cpio to back up
>the disk to floppies panics the kernel with a memory parity (NMI) trap!
>
>Now, I don't really suspect the memory since everything else works fine.
>...
>I can't do backups... I can feel Murphy breathing down my neck...

You are damn right to feel Murphy breathing down your neck, because this kind of problem DOES indicate a memory problem with the Dell 310.

The problem (in general) is that not all memory accesses are equally critical. Accessing memory in some specific order can generate parity errors or worse if your memory does not safely give the memory chips enough time to respond. All these new cranked-up 286 and 386 boxes are pushing things BEYOND their limit. You can run DOS for years, or memory diagnostics for years and never find a problem, yet your unix tries just the kind of access which fails. It is not surprising that IS cannot reproduce the problem on another machine.

I have seen this problem before, many times I must add, and in general there is only one solution: either replace the memory chips by faster ones or lower the clock frequency. Your system seems not to run safely at its top speed, because the MMU sometimes does not give the memory chips enough time to respond. (Our supplier has often been able to solve our problems by replacing the memory. It REALLY works.)

I am not blaming Dell specifically here, because many companies make the same error. Since memory is expensive they just put in slower chips than the machine really needs. I hope they are listening?????????

Paul.

|--------------------------------------------------------------------------
|Paul De Bra            | I am completely surrounded by giant bugs !      |
|debra@research.att.com | There's millions of them, all over this code!   |
|uunet!research!debra   | Beam me up quickly...Please...                  |
|--------------------------------------------------------------------------
james@bigtex.uucp (James Van Artsdalen) (10/06/88)
In article <8254@alice.UUCP>, debra@alice.UUCP () wrote:

[ discussion of bizarre memory problems with floppy/hard disk/DMA ... ]

> All these new cranked-up 286 and 386 boxes are pushing things BEYOND
> their limit.

I think the 310 designer would be willing to argue that point.

> You can run DOS for years, or memory diagnostics for years and never
> find a problem, yet your unix tries just the kind of access which
> fails.

Which is precisely why unix/Xenix is part of the standard test suite, along with OS/2, Windows, and lots of other exotic stuff.

> [...] Your system seems not to run safely at its top speed, because
> the MMU sometimes does not give the memory chips enough time to
> respond.

That's a simple design flaw. There's no excuse for it.

> (Our supplier has often been able to solve our problems by replacing
> the memory. It REALLY works.)

Then your supplier has demonstrated that it is a design flaw.

I don't think the question is just related to too-slow RAM. There may be subtle design flaws in the various motherboard chipsets that don't show up unless more than one DMA channel is running. Perhaps it would be a good thing to modify the memory test to run more than one DMA channel while doing the RAM test. I'll have to consult some engineers on what the worst cases really are...

> Since memory is expensive they just put in slower chips than the
> machine really needs. I hope they are listening?????????

If you let the marketing/purchasing/finance people run wild, that probably would happen. But Systems Validation would never sign off to it. In our case (Dell), there is the additional threat of paying for on-site service (and paying for the 800 line time). There are ways to cut corners without compromising reliability, but you'll hurt raw performance - and you don't do things to hurt the prime selling point for your machine.

-- 
James R. Van Artsdalen    ...!uunet!utastro!bigtex!james    "Live Free or Die"
Home: 512-346-2444 Work: 338-8789    10926 Jollyville Rd #901 Austin TX 78759
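
[Illustration added in editing; not part of any of the posts above.]

Both replies turn on the idea that the order in which memory is accessed can matter, and that a memory test which only sweeps linearly might miss a marginal part. A minimal sketch of that idea in C follows. The function names, the stride values, and the seed pattern are all assumptions chosen for illustration; nothing here comes from Dell's or Interactive's diagnostics. It also runs in user space, behind caches and virtual memory, and does not drive any DMA channel, so it shows only the shape of a strided pattern test and could not by itself reproduce the panic described in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write an address-dependent pattern into buf,
 * visiting words in strided order (0, stride, 2*stride, ..., then
 * 1, 1+stride, ...), then read everything back in the same order.
 * Returns the number of words that did not read back as written.
 */
size_t pattern_test(volatile uint32_t *buf, size_t words,
                    size_t stride, uint32_t seed)
{
    size_t errors = 0;

    assert(stride > 0);

    /* Write pass: every word is written exactly once. */
    for (size_t start = 0; start < stride && start < words; start++)
        for (size_t i = start; i < words; i += stride)
            buf[i] = (uint32_t)i ^ seed;

    /* Verify pass: same visiting order as the write pass. */
    for (size_t start = 0; start < stride && start < words; start++)
        for (size_t i = start; i < words; i += stride)
            if (buf[i] != ((uint32_t)i ^ seed))
                errors++;

    return errors;
}

/*
 * Repeat the test with several strides, and with a seed and its
 * complement, so each bit of each word is written as both 0 and 1.
 * The stride list is an arbitrary assumption, not a known worst case.
 */
size_t run_strided_tests(volatile uint32_t *buf, size_t words)
{
    static const size_t strides[] = { 1, 2, 257, 4096 };
    size_t total = 0;

    for (size_t s = 0; s < sizeof strides / sizeof strides[0]; s++) {
        total += pattern_test(buf, words, strides[s], 0x55AA55AAu);
        total += pattern_test(buf, words, strides[s], ~0x55AA55AAu);
    }
    return total;
}
```

A caller would allocate a buffer, pass it to run_strided_tests() with its length in 32-bit words, and treat any nonzero return as a failure. Exercising memory while one or more DMA channels are active, as suggested in the last post, would need kernel or firmware-level code and is outside what this sketch attempts.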