2FHGKINGLY@kuhub.cc.ukans.edu (11/10/89)
Has anyone had any problems with your NeXT not booting up off the hard drive? What happens is the machine locks up during the boot process. You fix the problem by booting off optical and copying the sdmach file from optical to the hd. This has happened to me once and others three more times. Any suggestions, comments? Blake Hughes, undergrad, University of Kansas
eht@f.word.cs.cmu.edu (Eric Thayer) (11/10/89)
In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>Has anyone had any problems with your NeXT not booting up off the hard
>drive? What happens is the machine locks up during the boot process.
>You fix the problem by booting off optical and copying the sdmach
>file from optical to the hd. This has happened to me once and others
>three more times. Any suggestions, comments?

If you get the "waiting for SCSI to become ready ..............." message,
I've seen this before a couple of times.

>
>Blake Hughes, undergrad, University of Kansas

--
Eric H. Thayer School of Computer Science, Carnegie Mellon
(412) 268-7679 5000 Forbes Ave, Pittsburgh, PA 15213
rogerj@batcomputer.tn.cornell.edu (Roger Jagoda) (11/10/89)
In article <6906@pt.cs.cmu.edu> eht@f.word.cs.cmu.edu (Eric Thayer) writes:
>In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>>Has anyone had any problems with your NeXT not booting up off the hard
>>drive? What happens is the machine locks up during the boot process.
>>You fix the problem by booting off optical and copying the sdmach
>>file from optical to the hd. This has happened to me once and others
>>three more times. Any suggestions, comments?
>
>If you get the waiting for SCSI to become ready ............... I've seen
>this before a couple of times.
>
>>
>>Blake Hughes, undergrad, University of Kansas
>
>--
>Eric H. Thayer School of Computer Science, Carnegie Mellon
>(412) 268-7679 5000 Forbes Ave, Pittsburgh, PA 15213

Well, I've got another piece of bad news. I was talking to my NeXT sales rep yesterday. He had just gotten his 40 MB accelerator drives. Guess what, they ARE Quantum drives and he didn't know which firmware these drives had. Sigh, I think Apple just stuck it to their old buddy Steve. Maybe someone from NeXT could check with Quantum to see whether they're ready for a WHOLE bunch of returns!

Roger Jagoda
System Support
Cornell University
FQOJ@CORNELLA.CIT.CORNELL.EDU
madler@tybalt.caltech.edu (Mark Adler) (11/11/89)
Yep, I've seen just that happen a few times to some NeXT's here. It seems to be contagious since it happens to a few machines connected over ethernet at the same time (but not all of them?). It's never happened (fingers crossed) to my standalone NeXT (no net connection). I have no suggestions. Mark Adler
feldman@umd5.umd.edu (Mark Feldman) (11/11/89)
In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
...
>What happens is the machine locks up during the boot process.
>You fix the problem by booting off optical and copying the sdmach
>file from optical to the hd.
...
>Blake Hughes, undergrad, University of Kansas

I've had three systems corrupted the same way. It's a bug. Many others have suffered the same bug. As of yet, no one knows what is causing the boot file to become corrupted, let alone how to prevent it.

Mark
gerrit@nova.cc.purdue.edu (Gerrit) (11/11/89)
In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>Has anyone had any problems with your NeXT not booting up off the hard
>drive? What happens is the machine locks up during the boot process.
>You fix the problem by booting off optical and copying the sdmach
>file from optical to the hd. This has happened to me once and others
>three more times. Any suggestions, comments?

This is a real problem and NeXT is aware of it. The current theory is that some user level process is opening /sdmach carelessly for write and "accidentally" dropping garbage. The symptoms seem to be that a block of zeroes is written at the beginning of the data segment and on the next reboot, the machine hangs after printing out the memory configuration.

The fix listed above is currently the best workaround for the problem, so keep a distribution OD within your reach for a while. NeXT has a few sites running some tests hoping to isolate the faulty program. Once that is done, you should expect to see an updated version of the faulty bugger, possibly in the archives on cc.purdue.edu, possibly available via email, possibly available more directly from NeXT.

gerrit
callisto@blake.acs.washington.edu (Finn) (11/12/89)
In article <12609@cit-vax.Caltech.Edu> madler@tybalt.caltech.edu.UUCP (Mark Adler) writes:
>
>Yep, I've seen just that happen a few times to some NeXT's here. It seems
>to be contagious since it happens to a few machines connected over ethernet
>at the same time (but not all of them?). It's never happened (fingers crossed)
>to my standalone NeXT (no net connection). I have no suggestions.
>
>Mark Adler

Dare I ask... Virus????
ali@polya.Stanford.EDU (Ali T. Ozer) (11/12/89)
In article <5604@umd5.umd.edu> feldman@umd5.umd.edu (Mark Feldman) writes:
>In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>>What happens is the machine locks up during the boot process.
>>You fix the problem by booting off optical and copying the sdmach
>>file from optical to the hd.
>I've had three systems corrupted the same way. It's a bug. Many others
>have suffered the same bug. As of yet, no one knows what is causing the
>boot file to become corrupted, let alone how to prevent it.

Yes, this is a bug. NeXT is working on it.

If your system freezes up during the boot process, after announcing the amount of memory and possibly the number of buffers used, then you might be bitten by this bug. You will need to boot from a 1.0 optical to fix things; please diff your /sdmach file against the good one from the OD; if they are different, copy the one from the OD over the corrupted one.

If you can duplicate the problem, please send me mail and I'll get it to the OS engineers.

Ali
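The compare-and-restore step described above can be sketched as a small shell function. This is only an illustration, not a NeXT-supplied procedure: the function name `restore_if_different` and the `.corrupt` suffix are made up for the example, and the caller has to supply both paths, since the thread does not say where the optical disk's kernel appears once the machine is booted from the OD.

```shell
#!/bin/sh
# Sketch: overwrite a kernel image only if it differs from a known-good copy.
# restore_if_different is a hypothetical helper, not a NeXT command.
restore_if_different() {
    good=$1     # known-good copy (for example, the kernel on the optical disk)
    target=$2   # copy to check (for example, the hard disk's sdmach)

    # cmp -s prints nothing and exits 0 only when the files are byte-identical.
    if cmp -s "$good" "$target"; then
        echo "identical: $target left alone"
        return 0
    fi

    # Assumption: the damaged file is worth keeping for later inspection,
    # so save it before overwriting. -p preserves mode and timestamps.
    cp -p "$target" "$target.corrupt" &&
        cp -p "$good" "$target" &&
        echo "restored: $target"
}

# Usage, with paths chosen by whoever runs it (run as root):
#   restore_if_different /path/to/good/sdmach /path/to/suspect/sdmach
```

Comparing first means an intact kernel is never touched, and the saved `.corrupt` copy is there if anyone wants to examine what was actually written into the file.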
divine@gargoyle.uchicago.edu (Dwight Divine) (11/16/89)
Please be patient with me, as this is my first post to a net.

I seem to have suffered this kernel corruption problem, with a nasty twist. Not only does the machine fail to boot from the hard drive (locking up shortly after checking the RAM), but it will not boot from the 1.0 System floptical. When I use the Mach boot-from-floptical command, the disk begins to load in, but after setting up several of the daemons, I receive the message that the window servers cannot be accessed/opened. The boot-up locks at this point, but I *am* able to power down using the power switch. Still, I cannot successfully boot up, and thus cannot fix the kernel corruption problem as per the suggested fix posted in the various articles about this problem.

This floptical problem could be due to a variety of things which have no relation to the original kernel problem, and user error has not as yet been ruled out. However, has anyone had this happen to them? If so, I would greatly appreciate any input anyone has to offer. Thanks much for the time.

With Thanks,
Dwight Divine (div3@tank)
NeXT System Administrator
Usite, U of Chicago
ali@polya.Stanford.EDU (Ali T. Ozer) (11/16/89)
In article <12811@polya.Stanford.EDU> Ali T. Ozer writes:
>In article <5604@umd5.umd.edu> feldman@umd5.umd.edu (Mark Feldman) writes:
>>I've had three systems corrupted the same way. It's a bug. Many others
>>have suffered the same bug. As of yet, no one knows what is causing the
>>boot file to become corrupted, let alone how to prevent it.
>Yes, this is a bug. NeXT is working on it.

The bug has been discovered and there is a workaround, in fact, an incredibly simple one. Launch a Shell, become root, and remove the executable bit on your kernel:

    su
    [type password]
    chmod a-x /sdmach

The problem occurs if you try to launch an executable in the Mach preload format; depending on how the pages are laid out in the file, a part of the file might become corrupted if paging occurs after the file is "launched."

Mach preload executables are meant to be bootable images and are not meant to be executed by the demand-paged system; thus your system will not lose any functionality when you remove the executable bit. You will just be assuring that the kernel is not launched inadvertently (either from the Shell or with a double-click), which is probably what caused the problem in all cases.

There are only two preload format files in the system, the kernel and the boot file. The boot file has been shipped without the executable bit so it's fine.

Thanks to Alan Marcum and Avie for the explanation and workaround.

Ali
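For anyone who wants to confirm that the workaround took effect, the `chmod a-x` step can be wrapped in a short check. This is a minimal sketch under stated assumptions: `strip_exec` is a made-up helper name, and the only commands relied on are standard `chmod` and the shell's `-x` file test.

```shell
#!/bin/sh
# Sketch: clear every execute bit on a file and confirm the result.
# strip_exec is a hypothetical helper, not part of any NeXT release.
strip_exec() {
    file=$1

    # a-x removes execute permission for user, group, and other.
    chmod a-x "$file" || return 1

    # With no execute bit left, the -x test fails even for root.
    if [ -x "$file" ]; then
        echo "still executable: $file"
        return 1
    fi
    echo "not executable: $file"
}

# Usage on the machine described above (run as root):
#   strip_exec /sdmach
```

On a file that starts with mode 555, this leaves mode 444: still readable by everyone, no longer launchable from the Shell or by a double-click.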
feldman@umd5.umd.edu (Mark Feldman) (11/17/89)
In article <12837@polya.Stanford.EDU> ali@Polya.Stanford.EDU (Ali T. Ozer) writes:
...
>The bug has been discovered and there is a workaround, in fact, an incredibly
>simple one. Launch a Shell, become root, and remove the executable bit on
>your kernel:
>
> su
> [type password]
> chmod a-x /sdmach

Ok, I read it, I did it, but I'm not very happy about the implications.

>The problem occurs if you try to launch an executable in the Mach preload
>format; depending on how the pages are laid out in the file, a part of the
>file might become corrupted if paging occurs after the file is "launched."

The files /sdmach and /odmach (which are the same file) are owned by root and their permissions are 555 -- readable and executable by all, writable by none. How is it that the file can be written to when it is executed by a user other than root?

>Mach preload executables are meant to be bootable images and are not meant
>to be executed by the demand-paged system; thus your system will not lose
>any functionality when you remove the executable bit. You will just be
>assuring that the kernel is not launched inadvertently (either from the
>Shell or with a double-click), which is probably what caused the
>problem in all cases.

The fact that it is possible to write to a file when you don't have permission is very bad. Very, very bad. And why would the system ever try to page back to a program file? Me thought that that is what a swap file was for.

>There are only two preload format files in the system, the kernel and the boot
>file. The boot file has been shipped without the executable bit so it's fine.

Not fine. Getting an error back when trying to execute one of these files would be fine. Getting a core dump would be ok. Having the original, write protected file corrupted is not.

>Thanks to Alan Marcum and Avie for the explanation and workaround.
>
>Ali
>

Ali, if the fix will keep my kernels from being corrupted, thanks! If there's one thing that I can't stand, it's a corrupted kernel. But what am I missing?

Mark

p.s. If someone has a NeXT and does not have USENET access, how will they find out about the fix?
ali@polya.Stanford.EDU (Ali T. Ozer) (11/17/89)
In article <5631@umd5.umd.edu> feldman@umd5.umd.edu (Mark Feldman) writes:
>In article <12837@polya.Stanford.EDU> I wrote:
>>The bug has been discovered and there is a workaround ...
>>The problem occurs if you try to launch an executable in the Mach preload
>>format; depending on how the pages are laid out in the file, a part of the
>>file might become corrupted if paging occurs after the file is "launched."
>
>The files /sdmach and /odmach (which are the same file) are owned by root
>and their permissions are 555 -- readable and executable by all, writable by
>none. How is it that the file can be written to when it is executed by a
>user other than root?

This is a bug, after all --- and bugs break rules. The bug will only occur if you try to execute a preload format file, and even then only under special circumstances, which the sdmach file exhibits. This bug will not occur when executing normal demand-paged executables or trying to execute other non-executable files.

Again --- this is not a file system bug but rather a bug in the program loader trying to load a preload format file. sdmach is the only file in the system that will cause this bug to occur.

>If someone has a NeXT and does not have USENET access, how will they find
>out about the fix?

NeXT is getting the news out to customers through various other channels.

Ali
dcarpent@sjuphil.uucp (D. Carpenter) (11/17/89)
>>If someone has a NeXT and does not have USENET access, how will they find
>>out about the fix?
>
>NeXT is getting the news out to customers through various other channels.
>
>Ali

What other channels? Does NeXT have any regular means for communicating with its customers? All I know is what I read in this newsgroup or in the newspapers and trade press. Being a NeXT owner without at the same time being a NeXT support person can leave one feeling rather isolated.

--
===============================================================
David Carpenter dcarpent@sjuphil.UUCP
St. Joseph's University dcarpent%sjuphil.sju.edu@relay.cs.net
Philadelphia, PA 19131 ST_JOSEPH@HVRFORD.BITNET
jsb@panix.UUCP (J. S. B'ach) (11/17/89)
In article <12609@cit-vax.Caltech.Edu> madler@tybalt.caltech.edu.UUCP (Mark Adler) writes:
)
)Yep, I've seen just that happen a few times to some NeXT's here. It seems
)to be contagious since it happens to a few machines connected over ethernet
)at the same time (but not all of them?). It's never happened (fingers crossed)
)to my standalone NeXT (no net connection). I have no suggestions.
Well it happened to mine. And I was booting from the optical drive. The fix was
to restore from the hard drive. NeXT is aware of the bug and asked to be sent
copies of the corrupt kernel to help them debug.
--
rutgers!cmcl2!panix!jsb or (more reliably) uunet!actnyc!jsb
"There aren't enough men around. Every time there's a plane accident, it's 100 men
dead or it's a troop transport, and I literally think, 'Why couldn't some women
have been on that flight?'" - Helen Gurley Brown