[comp.unix.sysv386] SCO OpenDesktop Crashing With Weird Disk Problems

erc@pai.UUCP (Eric Johnson) (09/06/90)

Help!

I've been having some terrible problems with SCO's OpenDesktop 1.0.
I'm not sure if these are hardware, software or both. And, I'd
appreciate any help from the net. (Please note that I really don't
blame anyone but myself and that any and all help is requested.
Thanks.)

My system:

SCO ODT 1.0, X11, Motif, DOS, TCP/IP, Software Dev.
Avex 386 mainboard 25 MHz
Adaptec 2322-16 ESDI disk controller
Paradise VGA Plus 800x600x16
Western Digital 8003EBT Ethernet (thin, and the only system on my own net)
Imprimis Wren 6 320 MB disk
8 MB RAM
Phoenix BIOS
Logitech serial Mouse (latest rev)
Relaxed security defaults


I normally run the X Window system and use the box for developing
programs and writing for my next book.  My default config is two 
large xterms and one xclock, under the Motif window manager, mwm.

1) I can't seem to run the system under "heavy" use for more
than four hours. (I'm developing Motif programs.)  During a major make
session, running the C compiler (stock cc), I'll see a message like
"Killed."
or 
"Signal received"

(I'm not typing ANYTHING at all during this time.)

Then, the X server usually freezes and the only thing I can do is
Alt-Sys Req to trash the X server (and my compile processes).  When I
get back to the console (I sure wish xterm -C worked, so I could
see console messages under X!), the screen is filled with hard disk
errors.  These errors keep getting worse, and generally I have to hit 
the hard reset button.  Now, this is a brand new system, but I 
never rule out hardware (e.g., disk) problems. These disk errors
are continuous and all the system seems to be doing is printing
these errors to the screen.

When I reboot, though, fsck seems to fix all the disk problems. So, 
the hard disk bad track errors don't seem to me to really be bad
tracks, unless fsck isn't really fixing the situation. fsck has always
been voodoo to me, but it has always seemed to do the job on the
many versions of UNIX I've used.

I'm using an Adaptec ESDI controller and an Imprimis Wren 320 MB disk.
Any ideas as to what is causing this? Is it probably hardware, or
could it be in the software, too?

2) (Related to #1, above): A Motif program I wrote, which normally works
fine (it's just a test of the Scale widget and it works fine on a number
of UNIX workstations), all of a sudden was killed, like above. The
disk then went berserk, so I did the infamous Alt-SysReq to trash the X 
server. At the console, I again saw streams of disk errors.  When the
system rebooted, and I tried to run my test program, it didn't run.
Instead, it looked like it ran dfspace (a df variant that SCO uses).
Now, whenever I start an xterm, OpenDesktop (ODT) seems to run dfspace
in the new window, so I suspect this is in the system-wide .login
or some file like that. Anyway, my executable did something it
had never done before.  Anyone ever seen anything like this?
I deleted the file, since I didn't like what happened.

3) One of the times the #1 stuff happened, the ttys data base
(part of system security) got trashed, so only the superuser (root)
could log in.  An SCO Tech support person led me through the process
of pulling in a ttys file from the distribution floppy (the ODT manual
has a great section on fixing this problem, but it assumes that at
least one ttys* file exists, which I didn't have).  Note: the SCO
Tech Support folks are great (once you actually get to talk to them).
I've called them a number of times and they've always helped out with
very good advice.  The main problem here wasn't the lost file, but:
   a) A trouble-shooting section in the manual that dealt
   with the problem but made too many assumptions to be actually
   workable.
   b) The implications of ever using a product that has so many
   weird ("weird" as in not in other versions of UNIX I've seen)
   files which are required to use the system.  This has bad 
   implications for my employer adopting this product (see below).
 

4) One of the times the above (#1) stuff happened, one of the C
compiler executables got deleted (/lib/386/p2_386).  So, to recover,
I ran the custom program to pull that file in from the distribution
floppy diskette. I had two main problems with this:
   a) Every time I try to install one file, custom brings over the
   file just fine (I think), but then custom always dies with an
   "Internal Error: 10#".  In errno.h, error 10 is related to
   calling wait on a child that doesn't exist, I think. What exactly
   is custom doing that causes it to die so ungracefully?  Can
   anyone bring over single files from the ODT distribution
   disks using custom?

   b) Once I had the infamous /lib/386/p2_386 file, I still could
   not compile anything. Why? Because the /lib/386/p2_386 program
   wasn't "serialized" (a part of SCO's copy protection scheme).
   Now, how can I "serialize" one single file?  Remember that
   custom dies for me every time I try to install single files,
   so I never get to the serialization phase from custom (like I
   did when I first installed this stuff).  I tried RTFM-ing,
   but I didn't find any mention of how to serialize one file.
   Anyone know how?  Even if my main disk problem is hardware-
   related, this is a serious issue.  I don't really mind SCO's
   copy-protection scheme (which is also very much like Interactive's),
   but, a copy protection scheme should be aimed at preventing
   unauthorized users, not AUTHORIZED users!  When copy protection
   schemes get in my way, I tend to drop the products.  

   So SCO (and Interactive, too, since you have a CP scheme as well),
   listen up:  All this (above) is for my own private system,
   but during the day I work in R&D at Boulware Technologies
   (see signature below) and BTI provides industrial automation
   systems.  We expect things like files to get trashed out in 
   the field. We also demand the ability to recover from things like
   this.   This last week, I was asked to evaluate 386 UNIXes
   for BTI.  (A 386 running UNIX is generally cheaper than 
   a full-blown UNIX workstation, especially since BTI puts together
   their own 386 clones.) I had to state that I did not think
   that ANY 386 UNIX has evolved to an acceptable level yet.  That
   is, installation is too hard and fraught with problems (it only
   took me 11 full tries to get SCO ODT installed; I've given up 
   for now on ISC 2.2), system administration is also too hard and
   especially for SCO fraught with all sorts of security-related
   issues, and I generally don't have confidence that these versions
   of UNIX will run under demanding conditions in the field (with
   users who aren't very UNIX-literate).  In other words, I feel that
   Hewlett-Packard and Sun (for example) have a much stronger
   software product than either SCO or Interactive and that I
   do not have the necessary confidence in SCO or Interactive to
   recommend their products yet.   I do not mean for this to be
   a bitch session, so please don't take it as such. And yes,
   I do understand that 386 UNIXes must support a vast array of
   not-so-compatible hardware options, so there are more problems
   to face on a 386.  I want you SCO and Interactive folks to 
   take this constructively. I'd love it if your products improved
   (and yes, I have seen them improve thus far).  I'd love to have
   the confidence in your products, because that would mean a
   substantial cost savings for my employer.  But, I just don't
   feel the products are there yet.

   I finally had to re-install the ODT basic software development
   package to be able to return to a state where I could compile
   C files.  yech-o.
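
   On the errno guess in 4a: in System V's errno.h, 10 is ECHILD
   ("No child processes"), which is what wait() reports when there is
   no child to wait for. Whether custom's "Internal Error: 10#" really
   is an errno value is only my assumption. A minimal sketch of the
   same condition from the shell (the helper name is made up for
   illustration):

```shell
#!/bin/sh
# Hypothetical helper: ask the shell to wait for a PID that is not one
# of its own children, then print the exit status of that wait.
wait_status_for_non_child() {
    # Capture the status without tripping "set -e" if it is enabled.
    wait "$1" 2>/dev/null && status=0 || status=$?
    echo "$status"
}

# PID 1 (init) is never a child of this shell.
wait_status_for_non_child 1
```

   A POSIX shell reports status 127 here, its way of saying "that is
   not a child of mine"; a C program making the equivalent wait()
   call would see ECHILD instead.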

5) How does one change to single-user mode without changing your
system forever?  I always try to run custom in single-user mode,
so instead of bringing the whole system down and then re-booting
(due to time, as I was on the phone to SCO Tech Support at the time),
I tried:
   shutdown -iS -g0 -y

That is, shutdown to run-state S (single user), right now (-g0)
and yes (-y) I want to do it.

After doing this, the system console changed from /dev/tty01 to
/dev/syscon (which meant I had to change my X start-up scripts
in .login), and the root user is always asked:

TERM = (ansi)

This never happened before I ran that one shutdown.  What has really
happened to my system and why did it change forever from one
shutdown?  I've always been used to the idea that shutting down
to single-user mode should just do that and not irretrievably
change your system when you reboot back up to multi-user run-
state. That is, if you boot to single-user run-state, this should
be the same as shutting down to single-user run-state. Single-user
run-state should be single-user run state.

6) Just about every other time I start the X Window server, I
get screen jitter mode. That is, the screen jitters vertically
very fast (basically moving every pixel up and down about 1/4 of
the height of the screen).  This, obviously, makes X totally
unusable.  Usually, I need to stop X, then log out and then
restart the X server.  Normally, everything works fine then.
Mostly, it goes bad every other time, although sometimes more
often and sometimes less often.  Any ideas?  I'd love to have
it work right every time, of course.


If anyone has any information on any of these topics, I'd appreciate
email (or a post if you're so inclined). I'll summarize the email
responses I get for the net. If you suggest RTFM, please point out
which manual and which section. I'd love to get this box where I
can spend a whole day working on my book and not wasting hours
trouble-shooting my system.



Thanks,
-Eric

erc@pai.mn.org

  
   

-- 
Eric F. Johnson               phone: +1 612 894 0313    BTI: Industrial
Boulware Technologies, Inc.   fax:   +1 612 894 0316    automation systems
415 W. Travelers Trail        email: erc@pai.mn.org     and services
Burnsville, MN 55337 USA

lerman@stpstn.UUCP (Ken Lerman) (09/07/90)

In article <1420@pai.UUCP> erc@pai.UUCP (Eric Johnson) writes:
->
[...lots of details deleted...]
->My system:
->
->SCO ODT 1.0, X11, Motif, DOS, TCP/IP, Software Dev.
->Avex 386 mainboard 25 MHz
->Adaptec 2322-16 ESDI disk controller
->Paradise VGA Plus 800x600x16
->Western Digital 8003EBT Ethernet (thin, and the only system on my own net)

I had problems when I was the only system on my net in the sense that
I just left my ethernet card (3com503) unconnected.  After running for
a few hours, I could no longer log in and it seemed to have other
problems, also.  (I forget the details, it was a long time ago.)
After I connected a tee connector with a pair of terminators, problems
went away.  (As to why I want to have a network if I'm the only one on
it, my machine is a portable, and when I'm at the office, I am not
alone.)

->Imprimis Wren 6 320 MB disk
->8 MB RAM
->Phoenix BIOS
->Logitech serial Mouse (latest rev)
->Relaxed security defaults
[...more deleted...]
->
->Thanks,
->-Eric
->
->erc@pai.mn.org
->-- 
->Eric F. Johnson               phone: +1 612 894 0313    BTI: Industrial
->Boulware Technologies, Inc.   fax:   +1 612 894 0316    automation systems
->415 W. Travelers Trail        email: erc@pai.mn.org     and services
->Burnsville, MN 55337 USA

It sounds like your problem is more complex than this, but this may be
part of it.

Ken

jbayer@ispi.COM (Jonathan Bayer) (09/07/90)

erc@pai.UUCP (Eric Johnson) writes:


}Help!

}I've been having some terrible problems with SCO's OpenDesktop 1.0.
}I'm not sure if these are hardware, software or both. And, I'd
}appreciate any help from the net. (Please note that I really don't
}blame anyone but myself and that any and all help is requested.
}Thanks.)

}My system:

}SCO ODT 1.0, X11, Motif, DOS, TCP/IP, Software Dev.
}Avex 386 mainboard 25 MHz
}Adaptec 2322-16 ESDI disk controller
}Paradise VGA Plus 800x600x16
}Western Digital 8003EBT Ethernet (thin, and the only system on my own net)
}Imprimis Wren 6 320 MB disk
}8 MB RAM
}Phoenix BIOS
}Logitech serial Mouse (latest rev)
}Relaxed security defaults


}I normally run the X Window system and use the box for developing
}programs and writing for my next book.  My default config is two 
}large xterms and one xclock, under the Motif window manager, mwm.

}1) I cannot seem to be able to run the system with "heavy" use for more
}than four hours. (I'm developing Motif programs).  During a major make
}session, running the C compiler (stock cc), I'll see a message like
}"Killed."
}or 
}"Signal received"

}(I'm not typing ANYTHING at all during this time.)

}Then, the X server usually freezes and the only thing I can do is
}Alt-Sys Req to trash the X server (and my compile processes).  When I
}get back to the console (I sure wish xterm -C worked, so I could
}see console messages under X!), the screen is filled with hard disk
}errors.  These errors keep getting worse, and generally I have to hit 
}the hard reset button.  Now, this is a brand new system, but I 
}never rule out hardware (e.g., disk) problems. These disk errors
}are continuous and all the system seems to be doing is printing
}these errors to the screen.

}When I reboot, though, fsck seems to fix all the disk problems. So, 
}the hard disk bad track errors don't seem to me to really be bad
}tracks, unless fsck isn't really fixing the situation. fsck has always
}been voodoo to me, but it has always seemed to do the job on the
}many versions of UNIX I've used.

}I'm using an Adaptec ESDI controller and an Imprimis Wren 320 MB disk.
}Any ideas as to what is causing this? Is it probably hardware, or
}could it be in the software, too?

It sounds like it could be either the controller card, or the electronics 
on the drive.  Try replacing the controller and see what happens.

What appears to be happening is that something is locking up in the
electronics, preventing the system from reading/writing to the disk.



}   b) Once I had the infamous /lib/386/p2_386 file, I still could
}   not compile anything. Why? Because the /lib/386/p2_386 program
}   wasn't "serialized" (a part of SCO's copy protection scheme).
}   Now, how can I "serialize" one single file?  Remember that
}   custom dies for me every time I try to install single files,
}   so I never get to the serialization phase from custom (like I
}   did when I first installed this stuff).  I tried RTFM-ing,
}   but I didn't find any mention of how to serialize one file.
}   Anyone know how?  Even if my main disk problem is hardware-
}   related, this is a serious issue.  I don't really mind SCO's
}   copy-protection scheme (which is also very much like Interactive's),
}   but, a copy protection scheme should be aimed at preventing
}   unauthorized users, not AUTHORIZED users!  When copy protection
}   schemes get in my way, I tend to drop the products.  

First, call SCO.  They will be able to walk you through the
serialization of a single file.

Second, instead of loading the file from the distribution disks, why not
restore it from your backups? (You _do_ do backups, don't you? :-)
If you restore from backups (and you can restore a single file), the file
will already be serialized.
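
A minimal sketch of pulling one file back, assuming the backup is a tar
archive (I don't know which backup tool you use, and restore_one is just
a name made up here). Not every tar has the -C option; if yours doesn't,
cd to the destination directory first.

```shell
#!/bin/sh
# Hypothetical helper: restore one member of a tar backup under a
# destination directory. Usage: restore_one ARCHIVE MEMBER DESTDIR
restore_one() {
    archive=$1
    member=$2
    dest=$3
    # Confirm the member is really in the archive before extracting.
    if ! tar -tf "$archive" "$member" >/dev/null 2>&1; then
        echo "restore_one: $member not found in $archive" >&2
        return 1
    fi
    # Extract only that member; everything else is left alone.
    tar -xf "$archive" -C "$dest" "$member"
}
```

For example, "restore_one /backups/root.tar lib/386/p2_386 /" (the
archive path is hypothetical) would put the one compiler pass back. The
member name has to match the way it was stored in the archive, which
often means no leading slash.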





}Thanks,
}-Eric



You're welcome.





JB
-- 
Jonathan Bayer		Intelligent Software Products, Inc.
(201) 245-5922		500 Oakwood Ave.
jbayer@ispi.COM		Roselle Park, NJ   07204    

wain@seac.UUCP (Wain Dobson) (09/08/90)

In article <1420@pai.UUCP> erc@pai.UUCP (Eric Johnson) writes:
>
>Help!
>
>I've been having some terrible problems with SCO's OpenDesktop 1.0.
>I'm not sure if these are hardware, software or both. And, I'd
>appreciate any help from the net. (Please note that I really don't
>blame anyone but myself and that any and all help is requested.
>Thanks.)
>
>My system:
>
>SCO ODT 1.0, X11, Motif, DOS, TCP/IP, Software Dev.
>Avex 386 mainboard 25 MHz
>Adaptec 2322-16 ESDI disk controller
>Paradise VGA Plus 800x600x16
>Western Digital 8003EBT Ethernet (thin, and the only system on my own net)
>Imprimis Wren 6 320 MB disk
>8 MB RAM
>Phoenix BIOS
>Logitech serial Mouse (latest rev)
>Relaxed security defaults
>
>
[A whole lot deleted, here.]

I think I would backtrack and do a new install. But before the install,
check your hardware configuration thoroughly. First, what disk parameters
are you using? Second, what is the primary address of the disk? Third,
what is the base address of the ethernet controller? Fourth, how is your
vga addressing its memory, contiguously or split? Since the primary disk
address, the ethernet address, and the way the vga sets up its
memory can all come into conflict, check these carefully. From what
you describe about the behaviour of motif, I think that you have a
conflict between the ethernet and the vga, or the vga is not configured
properly --- watch out for any 8-bit versus 16-bit configuration on
the vga (I do not know the Paradise first hand, so I can only suggest
this as a possible problem). On our client machines we considered the
vga addressing first, then the primary address of the disk controller,
and then the ethernet address, to see how things would lay out. Then we
reversed the order to ethernet, disk, vga. (Eliminated a lot of vgas
that way.) Anyway, SCO's release notes are very explicit on how to
set up the WD8003E. Follow them and don't monkey with it. Then do the
hard drive, and then the vga.
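
One way to sanity-check a planned layout before pulling cards is to
write down each board's memory window and test the pairs for overlap.
A rough sketch follows. The addresses in it are hypothetical examples,
not values from your machine or from SCO's release notes, so take the
real ones from your jumpers and manuals.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the address ranges
# [start1, start1+len1) and [start2, start2+len2) overlap.
# Usage: ranges_overlap START1 LEN1 START2 LEN2  (hex like 0xC0000 is fine)
ranges_overlap() {
    s1=$(($1)); e1=$(($1 + $2))
    s2=$(($3)); e2=$(($3 + $4))
    [ "$s1" -lt "$e2" ] && [ "$s2" -lt "$e1" ]
}

# Example values only (assumed, not from the thread): a 32K video ROM
# window at 0xC0000 and an 8K ethernet shared-memory window at 0xC4000.
if ranges_overlap 0xC0000 0x8000 0xC4000 0x2000; then
    echo "conflict: move one of the boards"
else
    echo "no overlap"
fi
```

The same test works for I/O port ranges; run it once per pair of
boards (disk/ethernet, disk/vga, ethernet/vga).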

As to the install itself, make sure that you install the upgrade
immediately after the OS. If it fails with an internal error,
move /etc/perms/dsmd to /tmp and then do the upgrade installation and
then move /tmp/dsmd back to /etc/perms/dsmd. 
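
The move-aside dance can be wrapped so the file goes back even when the
install step fails. A sketch only: with_file_aside is a made-up name,
and the command you pass it is whatever you normally run to install the
upgrade.

```shell
#!/bin/sh
# Hypothetical helper: move FILE into HOLDING_DIR, run a command,
# then move FILE back whether or not the command succeeded.
# Usage: with_file_aside FILE HOLDING_DIR COMMAND [ARGS...]
with_file_aside() {
    file=$1
    holding=$2
    shift 2
    name=$(basename "$file")
    mv "$file" "$holding/$name" || return 1
    # Run the command and remember its exit status.
    "$@" && status=0 || status=$?
    # Always put the file back before reporting the result.
    mv "$holding/$name" "$file" || return 1
    return "$status"
}
```

Used as "with_file_aside /etc/perms/dsmd /tmp your_upgrade_command",
where your_upgrade_command is a placeholder, it returns the install
step's own exit status.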
-- 
Wain Dobson, Vancouver, B.C.
	...!{uunet,ubc-cs}!van-bc!seac!wain