[net.unix] VAXclusters and UN*X

dave@uwvax.UUCP (Dave Cohrs) (05/31/85)

A while back I posted the following message:

> Has any work been done to make a version of UN*X that can work with
> VAXclusters and takes advantage of the resource sharing involved?
> If I get a large enough response (i.e. anything) I'll summarize and
> post.

Well, I got a 'why would you want to do that' message and the following
replies:

----
| From: uwvax!seismo!noao!grandi (Steve Grandi)
| 
| I asked the DEC Ultrix folks at the Anaheim DECUS about this and was
| told not to hold my breath; the Mongol hordes that build VMS took
| years to make their cluster support work, so the relative few working
| on Ultrix have a lot to do!
| 
| Actually, if you have followed discussions of using dual-ported disks
| under Unix, you will see how hard a problem a cluster would be to do
| right---how do you do disk block caching for the shared disks?
| 
| Since we really wanted the cluster, we had to order our 8600 with VMS;
| a REAL pity!
----
| From: uwvax!ihnp4!houxm!hou2b!garys (Gary Seubert)
| 
| I believe Dave Leonard (DEC) and members of his group are working on a
| driver for the HSC50. They are the ones who did the System V RA81
| driver. You might try contacting them through DEC in Holmdel. We run 6
| VAXes with RA81/TU78 configuration and are anxiously (to say the least)
| awaiting word from Dave as to their progress.
----

The fact that nothing is available is about what I expected.
-- 
dave cohrs
...!{allegra,harvard,ihnp4,seismo}!uwvax!dave
dave@wisc-limburger.arpa

    (bug?  what bug?  that's a feature!)

chris@umcp-cs.UUCP (Chris Torek) (05/31/85)

I believe (why I don't know, since I have no way of knowing) that VMS
doesn't use dual-porting for clusters either.

I've been told that the way the cluster controller works is that it
speaks DECnet and does VMS QIO-level stuff with some protocol on top of
DECnet.  This makes quite a bit of sense to me . . . .  Anyway, if this
is true, then with DECnet support, it shouldn't be more than a couple
of months' effort to hack up kernel code that translates UNIX file
requests to VMS QIOs, if one had all the documentation....  (Just use
one huge VMS file to store your disk in, right?)
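
If that guess is right, the core of the shim is just arithmetic.  Here
is a sketch of the block-to-offset translation; qio_read() stands in
(stubbed out with a printf) for whatever the real remote primitive
would be, and every name below is invented:

	/*
	 * Sketch of the "one huge VMS file" idea: every UNIX disk
	 * block maps to a fixed offset in a single VMS container
	 * file, so a block read becomes one remote QIO at that
	 * offset.  qio_read() is a made-up stand-in.
	 */
	#include <stdio.h>

	#define BSIZE	512			/* UNIX block size */

	/* hypothetical remote-I/O primitive -- stubbed out here */
	static int
	qio_read(long offset, char *buf, int len)
	{
		printf("QIO read: offset %ld, %d bytes\n", offset, len);
		return len;
	}

	/* translate a UNIX block-device read into a container-file QIO */
	int
	bread_remote(long blkno, char *buf)
	{
		return qio_read(blkno * BSIZE, buf, BSIZE);
	}

	int
	main(void)
	{
		char buf[BSIZE];

		bread_remote(42L, buf);	/* reads bytes 21504..22015 */
		return 0;
	}
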
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

jensen@decwrl.UUCP (Paul Jensen) (06/05/85)

Following is a very brief tutorial on VAXclusters, and how they relate
to unix*:

A cluster is defined by a set of proprietary protocols for implementing
a loosely-coupled multi-processing system.  Two of the key protocols
are System Communication Services (SCS), software which defines and coordinates
members of the cluster; and the Distributed Lock Manager, which allows
locks to be shared between processors.
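
To make the lock manager concrete: it grants locks in one of six
modes, and two locks on the same resource can be held at once only if
their modes are compatible.  The mode names below are the real VMS
ones; the C wrapped around them is only an illustration:

	/*
	 * The six VMS lock modes, weakest to strongest, and a table
	 * saying which pairs may coexist on one resource.  The modes
	 * are real; this tiny checker is invented for illustration.
	 */
	#include <stdio.h>

	enum mode { NL, CR, CW, PR, PW, EX };	/* null, concurrent read,
						   concurrent write,
						   protected read,
						   protected write,
						   exclusive */

	static const int compat[6][6] = {
		/*        NL CR CW PR PW EX */
		/* NL */ { 1, 1, 1, 1, 1, 1 },
		/* CR */ { 1, 1, 1, 1, 1, 0 },
		/* CW */ { 1, 1, 1, 0, 0, 0 },
		/* PR */ { 1, 1, 0, 1, 0, 0 },
		/* PW */ { 1, 1, 0, 0, 0, 0 },
		/* EX */ { 1, 0, 0, 0, 0, 0 },
	};

	int
	main(void)
	{
		/* two protected-read (shared) locks coexist ... */
		printf("PR vs PR: %s\n", compat[PR][PR] ? "ok" : "wait");
		/* ... but a protected write shuts out other writers */
		printf("PW vs PW: %s\n", compat[PW][PW] ? "ok" : "wait");
		return 0;
	}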

These protocols are entirely software-based; there are no hardware
dependencies in them except at the lowest levels.  Also, the
protocols are such that control is distributed dynamically between
members of the cluster; in fact, there is no such thing as a
"cluster controller" (the HSC50 is logically a peer of the VAX
processors).

The HSC50 is a high-speed IO server.  It services requests for
logical disk blocks.  It does not know anything about file structure:
this is imposed by the VAX processors via the MSCP protocol.
The HSC50 performs various sorts of optimizations (similar
to those done by the FFS) and has a peak transfer rate of nearly
4MB/sec.
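
In other words, a request to the HSC50 names a unit and a logical
block, never a cylinder or a head.  The structure below is an invented
illustration of what such a request has to carry; it is NOT the real
MSCP packet format:

	/*
	 * Roughly what a logical-block request to an HSC-style server
	 * must carry.  Field layout and opcode values are invented.
	 */
	#include <stdio.h>

	struct blk_request {
		unsigned int	unit;		/* which disk unit */
		unsigned long	lbn;		/* logical block number */
		unsigned long	count;		/* bytes to transfer */
		unsigned int	opcode;		/* read or write */
	};

	#define OP_READ		1		/* made-up opcode values */
	#define OP_WRITE	2

	int
	main(void)
	{
		struct blk_request r = { 0, 1000L, 8192L, OP_READ };

		/* the client never mentions disk geometry; the server
		   owns the logical-to-physical mapping */
		printf("unit %u: op %u, lbn %lu, %lu bytes\n",
		    r.unit, r.opcode, r.lbn, r.count);
		return 0;
	}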

The RA-series disks are not dynamically dual-ported.  Dual-porting
was implemented in RA disks for the purpose of allowing the disk
to be accessed by a secondary controller in the event the primary
fails.  In a cluster, a typical configuration would be a disk
dual-ported between either 2 HSC50s or an HSC50 and a UDA50.
Only one path is active at a time; in the event the active HSC50
fails, access fails over dynamically to the alternate path.
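
That failover behavior in miniature (all names invented): one active
path, one standby, and a one-way switch when the active path dies.

	/*
	 * Toy model of static dual-porting: I/O goes down one path
	 * until it fails, then switches permanently to the alternate.
	 */
	#include <stdio.h>

	static int
	path_io(int path, long blkno)
	{
		if (path == 0)
			return -1;	/* pretend the primary died */
		printf("block %ld via path %d\n", blkno, path);
		return 0;
	}

	int
	main(void)
	{
		int active = 0;		/* 0 = primary, 1 = alternate */

		if (path_io(active, 7L) < 0) {
			active = 1;	/* fail over; no switching back */
			path_io(active, 7L);
		}
		return 0;
	}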

DECnet is totally unrelated to clusters.  It is possible to run
DECnet over a CI bus (using SCS), but a cluster can
run fine without a byte of DECnet code (it IS extremely useful for
system management, however).

Allowing a unix (or any other) system to participate in a cluster would
require implementing, at a minimum, SCS; the connection manager (software
which decides when to form, change, and dissolve clusters); the
distributed lock manager; and MSCP.  This is a large amount of
code, much of it embedded in VMS (and therefore subject to VMS
licensing restrictions), and porting it would be a major
undertaking.  A major re-write of the file system would be necessary,
and adopting some sort of standard for file locking would be
highly recommended.
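
On the locking point: 4.2BSD's flock(2) is one candidate interface.
It only covers processes on a single machine today, but it shows the
shape of the guarantees a cluster-wide lock service would have to
honor:

	/*
	 * Plain 4.2BSD advisory locking with flock(2) -- the sort of
	 * interface that would have to be made to work cluster-wide.
	 */
	#include <stdio.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/file.h>

	int
	main(void)
	{
		int fd = open("/tmp/shared.dat", O_RDWR | O_CREAT, 0644);

		if (fd < 0)
			return 1;
		if (flock(fd, LOCK_EX) == 0) {	/* exclusive lock */
			/* ... update the shared file ... */
			flock(fd, LOCK_UN);	/* release */
		}
		close(fd);
		return 0;
	}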

All the above work would just give you a distributed file system.
If you wanted distributed job and device queues, you would
have to implement the Distributed Job Controller as well.  Given
the VMS-ish flavor of this protocol, this task might be distasteful,
not to mention non-standard.

In conclusion, the bottom line shakes out as follows:

	o  "cluster" of homogeneous UNIX systems with distributed
	   file system only:  technically feasible but a lot of
	   work (>> 1 man-year).

	o  the above with distributed queues: more work, plus problems
	   with maintaining a standard version of unix.

Regards,

				--- Paul Jensen
				    Digital Equipment Corporation

------------------------------------------------------------------------
Disclaimer:  All information in this response is drawn from public
	     sources.  All opinions expressed are solely my own.
	     In particular, I haven't the faintest idea of the
	     future or current plans of either Ultrix or VMS
	     engineering.

*unix is a trademark of AT&T.

pc@unisoft.UUCP (n) (06/05/85)

<munch>

	I went to a DEC presentation over a year ago on how their
VAXclusters run under VMS.  It goes somewhat like this.

	The hardware connects up to 16 VAXes and HSC50s (PDP-11s
controlling lots of disks and tapes, acting as servers).  The servers
take care of logical-to-physical translation (to the clients, the
disks are a contiguous, defect-free array of blocks).  The VMS systems
take care of file system management: they run a distributed
hierarchical lock manager among themselves to control access to the
shared file systems (VMS does not buffer blocks in core in the same
way that Unix does), and it is possible to lock at both the file and
the record level.  When a system goes down, all the other systems
compare notes, see who has which locks outstanding, and put the shared
database back into a good state.  Again, the disk servers have no part
in this; they only provide blocks.

	There are several protocols involved: the one to the HSC50s
for I/O (VAX systems may also provide their own MSCP servers so that
other members can access their local disks, at some overhead of
course), a lock protocol (very fast; it needs only one reply ... at
70MHz, dual-ported), and DECnet.  Note that these protocols share a
common physical medium but are separate (nothing rides on top of
DECnet), and the HSC50 doesn't understand DECnet; DECnet is there for
remote login and system management (plus the fact that only one node
in a cluster need be connected to an internet).  One good thing about
what DEC had to do to get this running is that they did away with the
cumbersome ACPs (disk structure daemons) -- hooray!!  Most of the
synchronization is in RMS, as that is where the buffering is.  All the
above is embodied in VMS 4.0 and presumably runs . . . especially
considering the time it took them to get it out the door!!
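
To picture the file-and-record locking: each record lock hangs off its
file's lock, so the manager sees a hierarchy rather than a flat list.
A toy rendering, with all names invented:

	/*
	 * Toy rendering of hierarchical locking: a record lock is a
	 * child of its file's lock, so whole-file and per-record
	 * access can be arbitrated together.  Invented throughout.
	 */
	#include <stdio.h>

	struct lock {
		const char	*resource;	/* what is locked */
		struct lock	*parent;	/* NULL for a root lock */
	};

	int
	main(void)
	{
		struct lock file = { "DISK$A:[DATA]MASTER.DAT", NULL };
		struct lock rec  = { "record 42", &file };

		/* a writer of record 42 takes the file lock in a weak
		   (concurrent) mode and the record lock exclusively */
		printf("lock %s under %s\n",
		    rec.resource, rec.parent->resource);
		return 0;
	}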

	Anyway, that is what I know about VMS VAXclusters.  Some of
this may have changed since I talked to the DEC people about it.
Potentially it is a very cheap way to add more CPU power without
spending big.

	(I have no connection to DEC apart from being involved in
buying a VAXcluster for my previous employer ...)


		Paul Campbell		..!ucbvax!unisoft!paul
	

faustus@ucbcad.UUCP (Wayne A. Christopher) (06/06/85)

From what I understand of VAXclusters, it would seem that instead of
multi-porting the devices, it would work just as well to have one host
run each device and then do a remote mount (promised in 4.4 BSD) on
the rest of the hosts.  The high bandwidth between hosts would make
this pretty transparent.  I guess it doesn't take as full advantage of
the clustering as it could, but I'm sure that if the disks were
multi-ported and writable by all the hosts, then you would suffer a
lot in bookkeeping overhead, if you could do it at all with UNIX.

	Wayne