[mod.computers.vax] Request for info on VAX cluster failures.

pearson%anchor.DECnet@lll-icdc ("ANCHOR::PEARSON") (07/19/86)

How often does an entire VAX cluster crash? We're trying to build a system
with sufficient redundancy to stay up practically all the time (2 VAXes
identical in hardware and software, HSC50, DEC's disk shadowing), and don't
know how paranoid to be about the possibility of the whole cluster's
going out to lunch. Any hints?
    (Please reply to me directly. I'll post a summary.)
    Thanks in advance.

pearson%anchor.decnet@lll-icdc.arpa
------

mhg@MITRE-BEDFORD.ARPA.UUCP (07/22/86)

>How often does an entire VAX cluster crash? We're trying to build a system
>with sufficient redundancy to stay up practically all the time (2 VAXes
>identical in hardware and software, HSC50, DEC's disk shadowing), and don't
>know how paranoid to be about the possibility of the whole cluster's
>going out to lunch. Any hints?

In general, clustering reduces the chance of a crash significantly
(unless, of course, your DEC-Man just powers down the HSC without any
advance notice... [It happened to us...]).  If one machine should happen
to crash, it essentially becomes an unreachable node to the rest of the
cluster.  It would take quite a bit (short of a loss of power) to bring
an entire cluster down.

I know of one installation that went from one 780 to a cluster
consisting of a 780 and two 750s.  Before they clustered, the 780
would crash at least once a week.  Since clustering, they have had
almost no crashes.

Hope this helps.

Mark H. Granoff
ARPA: mhg@mitre-bedford
 DDD: (617) 271-8438