duncan@vuwcomp.UUCP (Duncan McEwan) (05/21/87)
Can someone please give me some clues as to what might have happened here?

When I came in to work yesterday morning and tried to log on to our Pyramid (90x, 16 users, OSx 3.1 - not sure what PTF level), after entering my password I got the message

    No utmp entry. You must exec "login" from the lowest level "sh"

I know what this is *supposed* to mean, but not why it should come up when I first log on. Another user got the same message at his terminal, so I went to a terminal in our machine room where I had stayed logged on overnight (I know - security risk and all that!). I ran the "w" command to see if anyone was logged on. The output came back (no one logged on), but then something (the shell?) printed

    Stopped (tty output)
    % logout

and I was logged out! I tried logging back in on that terminal and got the same utmp error message as at my own.

So I took the system to single user, and found the utmp file was zero length. I recalled a similar problem with a Version 7 system I used to look after (then it was a *missing* utmp that prevented people from logging on). I thought utmp had to have an entry for each terminal (the error message seemed to suggest this as well), and decided the easiest way to recreate it was to get the copy off the last backup. Then I edited it with emacs to reflect the fact that no one was logged on by replacing the names with nulls (we do dumps in multiuser mode, but with logins disabled, and the file system being dumped quiescent - that way I can fill in time reading news :-)). I wasn't sure whether this was necessary, but did it just to be safe.

I checked to make sure /etc/utmp was correctly pointing at /etc/.ucbutmp, and then brought the system up multiuser. The first time it panicked saying it was trying to free a free inode, I think because I had forgotten to "sync" the disk after editing. So I copied my edited backup of .ucbutmp back into /etc again, and this time sync'ed (sunk?)
the disks before rebooting (I also ran a full fsck which reported no problems). But when the system came up multiuser - exactly the same utmp error. Back to single user, and sure enough - /etc/utmp was zero length again.

Anyway, to cut a long (two and a half hour!) story short, after many different ideas had all resulted in the same error, we eventually resorted to a complete restart, including reloading microcode off floppy. When the system rebooted, everything was fine! I was the first to log on, and I was surprised to see that utmp only had an entry for one terminal (the console), so perhaps the fact that it was zero length in single user mode was a red herring. Now it has an entry for each terminal, with a null user name for terminals not in use, which is what I thought it should be.

I later discovered that the problem had occurred at approx 23:00 the previous evening. 2 or 3 users had been logged on, and they had mysteriously been logged off in much the same way I had been, and then been unable to log back in. The system continued to run fine all night, since various cron jobs had executed properly (including uucp calls using tty lines).

What I really want to know is how corrupted microcode (if that is what it was) could have such an effect. I was able to use the system quite happily in single user mode (editing, copying files, etc). Perhaps the itp microcode was responsible, since the itp's are not used in single user mode (though I don't think the console goes through an itp anyway, and the error occurred when I tried logging on there). I don't really see how bad microcode could affect user level programs like init/getty/login in this way anyway.

Could someone who knows more about what goes on inside OSx please suggest something (anything!) that could have been responsible for this? Is it a problem that others have encountered? Is there a PTF to fix it?

Replies by email preferred. I will summarise (summarize?)
to this group if I get any interesting suggestions.

Duncan.

UUCP: duncan@vuwcomp.UUCP or ...!calgary!vuwcomp!duncan
ACSnet: duncan@vuwcomp.nz
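
For anyone wanting to check a utmp file for the states described above (zero length, one entry per terminal, null user names on idle lines), here is a minimal sketch. It assumes a 4.2BSD-style record of ut_line[8], ut_name[8], ut_host[16] and a 32-bit ut_time, 36 bytes in all; the real OSx layout of /etc/.ucbutmp may differ, and the big-endian byte order is also an assumption (it only affects the time field). The function names are made up for illustration.

```python
import struct

# ASSUMPTION: 4.2BSD-style utmp record, not confirmed for OSx:
#   char ut_line[8]; char ut_name[8]; char ut_host[16]; long ut_time;
# ASSUMPTION: big-endian byte order (only matters for ut_time).
RECORD = struct.Struct(">8s8s16sl")


def _text(field):
    """Cut a fixed-width field at the first null and decode it."""
    return field.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_utmp(data):
    """Return (line, name, host, time) for every complete record in data."""
    entries = []
    usable = len(data) - len(data) % RECORD.size  # ignore a trailing partial record
    for offset in range(0, usable, RECORD.size):
        line, name, host, when = RECORD.unpack_from(data, offset)
        entries.append((_text(line), _text(name), _text(host), when))
    return entries


def logged_in(entries):
    """Keep only entries with a non-null user name, i.e. lines in use."""
    return [entry for entry in entries if entry[1]]
```

To try it against a real file, read the file in binary mode and pass the bytes in, e.g. `parse_utmp(open("/etc/utmp", "rb").read())`. A zero length file gives an empty list, which is the state found in single user mode above.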