[comp.unix.admin] SVVS requires a panic? Was: Re: CRASH your TANDEM :

guy@auspex.auspex.com (Guy Harris) (03/17/91)

>I talked to a couple of Tandem engineers after USENIX this January.  They
>bragged that their systems failed the SVVS (requiring waivers)
>because some tests to invoke PANIC messages didn't work; Tandem
>UNIX doesn't crash under some of the conditions expected by AT&T.

That's an interesting claim, but I'm rather skeptical of it.  AT&T has
done some bogus things in the SVVS (e.g., requiring that "read()", as I
remember, actually bump the system time returned by "times()"; this
shafted Apollo, because the Domain/OS implementation of "read()" was all
in user mode, so it bumped the user time but not the system time), but
requiring that the system *panic* wasn't one of them, at least not in
the version of the SVVS I've seen.
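
For illustration only, here is a minimal sketch of the kind of check described above: call "read()" many times and see whether the system time reported by "times()" advances. This is not the actual SVVS test, which isn't reproduced in this thread. The function name, the use of /dev/zero, and the iteration count are all assumptions made for the example, and Python's os.times() stands in for the C "times()" call.

```python
import os


def system_time_after_reads(n_reads=200_000, chunk=1):
    """Return (user_delta, system_delta) in seconds across n_reads read() calls.

    Hypothetical helper: /dev/zero and the default count are illustrative
    choices, not anything taken from the SVVS.
    """
    fd = os.open("/dev/zero", os.O_RDONLY)
    try:
        before = os.times()
        for _ in range(n_reads):
            os.read(fd, chunk)  # one read() call per iteration
        after = os.times()
    finally:
        os.close(fd)
    return after.user - before.user, after.system - before.system


if __name__ == "__main__":
    user_delta, system_delta = system_time_after_reads()
    print(f"user +{user_delta:.2f}s, system +{system_delta:.2f}s")
    if system_delta > 0:
        print("system time advanced, which is what such a check would expect")
    else:
        print("system time did not advance (the outcome described for Domain/OS)")
```

On an implementation where "read()" runs entirely in user mode, only the first number would grow, and a check written this way would report a failure even though nothing is actually wrong with "read()".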

jfh@rpp386.cactus.org (John F Haugh II) (03/18/91)

In article <6685@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>That's an interesting claim, but I'm rather skeptical of it.  AT&T has
>done some bogus things in the SVVS (e.g., requiring that "read()", as I
>remember, actually bump the system time returned by "times()"; this
>shafted Apollo, because the Domain/OS implementation of "read()" was all
>in user mode, so it bumped the user time but not the system time), but
>requiring that the system *panic* wasn't one of them, at least not in
>the version of the SVVS I've seen.

I don't know about "panic", but I do recall that IBM flunked certain
parts of the SVVS because it has a virtually unlimited process table
(128K entries).  There is apparently a test which forks a whole bunch
and expects to get EAGAIN or some such back and never does.

So, I can imagine that there are certain tests which are required to
"fail", for various values of "fail".  [ And "panic" may or may not be
part of that set ... ]
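
A rough sketch of a test of that shape, assuming it simply forks until "fork()" fails with EAGAIN. The real SVVS test isn't shown here, so the function name and the cap on children are hypothetical. The cap is there so the sketch can't exhaust a real process table; with a very large table the loop ends without ever seeing EAGAIN, which is the situation described above.

```python
import errno
import os
import signal


def fork_until_eagain(max_children=32):
    """Fork up to max_children idle children; return (count_forked, saw_eagain).

    Hypothetical helper: the cap is a safety assumption for this sketch,
    not something a conformance test would necessarily have.
    """
    children = []
    saw_eagain = False
    try:
        for _ in range(max_children):
            try:
                pid = os.fork()
            except OSError as exc:
                if exc.errno == errno.EAGAIN:
                    saw_eagain = True  # process table (or per-user limit) is full
                    break
                raise
            if pid == 0:
                # Child: sit idle so it keeps occupying a process slot.
                try:
                    signal.pause()
                finally:
                    os._exit(0)
            children.append(pid)
    finally:
        # Parent: clean up every child that was created.
        for pid in children:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
    return len(children), saw_eagain


if __name__ == "__main__":
    count, saw_eagain = fork_until_eagain()
    if saw_eagain:
        print(f"fork() returned EAGAIN after {count} children")
    else:
        print(f"forked {count} children without ever seeing EAGAIN")
```

A test that treats the second outcome as a failure is penalizing the system for having room to spare.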
-- 
John F. Haugh II        | Distribution to  | UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832 | GEnie PROHIBITED :-) |  Domain: jfh@rpp386.cactus.org
"I've never written a device driver, but I have written a device driver manual"
                -- Robert Hartman, IDE Corp.

karish@mindcraft.com (Chuck Karish) (03/18/91)

In article <6685@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>AT&T has
>done some bogus things in the SVVS [ ... ] , but
>requiring that the system *panic* wasn't one of them, at least not in
>the version of the SVVS I've seen.

It sounded sort of odd to me at the time.  Perhaps I misunderstood.

The actual capabilities of Tandem's systems also sounded wildly
improbable to one who's used to less-robust implementations.

	Chuck Karish		karish@mindcraft.com
	Mindcraft, Inc.		(415) 323-9000

scott@zorch.SF-Bay.ORG (Scott Hazen Mueller) (03/19/91)

In article <669239249.25261@mindcraft.com> karish@mindcraft.com (Chuck Karish) writes:
>In article <6685@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>AT&T has done some bogus things in the SVVS [ ... ] , but requiring that the
>>system *panic* wasn't one of them, [...]

>It sounded sort of odd to me at the time.  Perhaps I misunderstood.

I've seen no claims (to date) that Tandem broke conformance by continuing
to run when it should panic.  I do have a quote that says "There are between
800 and 1,200 places in the UNIX operating system where it can decide to
'panic' and shut down.  Brad [Tandem s/w engineer] isolated nearly 100
junctures that, collectively, are responsible for more than 85 percent of
potential failures.  He then inserted recovery code to keep the system running.
He did this while still conforming to AT&T's standard UNIX System V."

I also have no information on SVVS compliance, though the same issue of our
internal (Tandem) magazine claims in several places that we have 100%
compliance with AT&T's UNIX System V.

>The actual capabilities of Tandem's systems also sounded wildly
>improbable to one who's used to less-robust implementations.

Heh.

I've pulled out one of the CPUs, one of the fans, and one of the disk drives
on our Integrity S2 system.  No problem.

Uptime is not spectacular, a mere 58 days; I've had SPARCservers stay up
that long.  However, we've got rotten power in my building, and even an S2
won't stay up forever without mains power.  It does ride over every minor power
glitch we've had, though, smooth as silk.

Disclaimer - I work for Tandem Computers.  This is my non-work account, and
I do not speak for the company.

-- 
Scott Hazen Mueller | scott@zorch.SF-Bay.ORG or (ames|pyramid|vsi1)!zorch!scott
10122 Amador Oak Ct.| +1 408 253 6767   |Mail fusion-request@zorch.SF-Bay.ORG
Cupertino, CA  95014|Love make, not more|for emailed sci.physics.fusion digests
SF-Bay Public-Access Unix 408-996-7358/61/78/86 login newuser password public

karish@mindcraft.com (Chuck Karish) (03/20/91)

In article <1991Mar18.174115.26445@zorch.SF-Bay.ORG> scott@zorch.SF-Bay.ORG
(Scott Hazen Mueller) writes:
>>In article <6685@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>>AT&T has done some bogus things in the SVVS [ ... ] , but requiring that the
>>>system *panic* wasn't one of them, [...]

>I've seen no claims (to date) that Tandem broke conformance by continuing
>to run when it should panic.

An e-mail followup from one of the people I talked to in January
clarified this:  the SVVS expected a panic when the timeout table
overflowed.  This didn't happen under Tandem's implementation.

	Chuck Karish		karish@mindcraft.com
	Mindcraft, Inc.		(415) 323-9000

guy@auspex.auspex.com (Guy Harris) (03/24/91)

>An e-mail followup from one of the people I talked to in January
>clarified this:  the SVVS expected a panic when the timeout table
>overflowed.

OK, I'm *still* skeptical that the person you talked to in January
really said that the SVVS expected a panic:

1) how does the SVVS manage to test for a panic?  It's not as if there's
   a call in UNIX that says "notify me when the system panics"....

2) how does the SVVS manage to attempt to make entries in the timeout
   table?

3) how do the SVVS authors manage to justify this, given that the
   callout mechanism in many UNIX kernels isn't specified *anywhere* in
   the SVID - it's an implementation detail?

4) what version of the SVVS were they using, given that the version I
   ran at Sun didn't do any such check?

jfw@ksr.com (John F. Woods) (03/28/91)

guy@auspex.auspex.com (Guy Harris) writes:
>>I talked to a couple of Tandem engineers after USENIX this January.  They
>>bragged that their systems failed the SVVS (requiring waivers)
>>because some tests to invoke PANIC messages didn't work; Tandem
>>UNIX doesn't crash under some of the conditions expected by AT&T.
>That's an interesting claim, but I'm rather skeptical of it.  AT&T has done
>some bogus things in the SVVS (e.g., requiring that "read()", as I remember,
>actually bump the system time returned by "times()"; this shafted Apollo,
>because the Domain/OS implementation of "read()" was all in user mode, so it
>bumped the user time but not the system time), but requiring that the system
>*panic* wasn't one of them, at least not in the version of the SVVS I've seen.

I don't know about SVVS 3, but SVVS 2 certainly didn't, and couldn't, require
a panic.  After all, it kept a journal about the results of each test, and it
could hardly expect that file to come out of a panic with a message correctly
describing the fact that the system crashed, now could it?

I could well believe that the SVVS required *error* returns from subroutines
which aren't required by the SVID, which a more careful implementation could
avoid.