[comp.os.vms] LAVC Performance notes from Fall 1986 DECUS

nagy%warner.hepnet@LBL.ARPA (05/21/87)

The following are my notes from session V095 given at the Fall '86
DECUS Symposium in San Francisco.

Low End Cluster Performance (V095)		John Haliburton, DEC
--------------------------------------------------------------------
More CPU time is used due to the PE software emulator.
DEQNA does not understand virtual addresses, so large transfers take
	more elapsed time.
CPU time required for lock operations:
		780/CI		uVAX-II/NI
		------		----------
local node	2.7 msec	6.0 msec  (>2X)	for $ENQ/$DEQ pair
remote node	1.6 msec	4.8 msec  (3X)

local node	2.2 msec	5.6 msec	for up/down lock convert
remote node	1.5 msec	4.6 msec

lock operations tend to occur in pairs (ENQ/DEQ, up/down conversions).
High locking rates load "master" CPU at IPL$_SCS (8).
Potential bottleneck: many clients can swamp one CPU.
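One rough way to read the lock table is as a throughput ceiling: a CPU that did nothing but lock work could handle at most 1000 / (msec per pair) $ENQ/$DEQ pairs per second. The sketch below does that arithmetic using only the per-pair costs from the table. It assumes locking is the CPU's only load and ignores queuing and the IPL$_SCS contention noted above, so real limits will be lower. The function name is illustrative.

```python
# CPU msec per $ENQ/$DEQ pair, copied from the session's lock table.
ENQ_DEQ_PAIR_MSEC = {
    "780/CI local": 2.7,
    "780/CI remote": 1.6,
    "uVAX-II/NI local": 6.0,
    "uVAX-II/NI remote": 4.8,
}


def max_pairs_per_sec(cost_msec: float, cpu_fraction: float = 1.0) -> float:
    """Upper bound on lock pairs/sec if locking used cpu_fraction of one CPU.

    ASSUMPTION: cost per pair is constant and nothing else runs on the CPU.
    """
    return cpu_fraction * 1000.0 / cost_msec


if __name__ == "__main__":
    for config, cost in ENQ_DEQ_PAIR_MSEC.items():
        print(f"{config:18s} {max_pairs_per_sec(cost):6.0f} pairs/sec")
```

On these figures a uVAX-II tops out at roughly 170 local pairs per second, versus about 370 on a 780, which is why a few busy clients can swamp one lock-mastering CPU.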
Most disk accesses are to the boot member.  MSCP buffering is a tuning issue.
Serving requires substantial boot member CPU.
CPU times (note: RD53/RD54 performance is approximately the same as RA60):
		local	"cluster" disks
		RD53/54	client	server
1 block I/O	2.6	7.4	7.6	msec CPU time
4 blocks	2.7	9.5	9.6
16 blocks	3.5	17.9	18.6
64 blocks	6.1	48.8	52.6
Elapsed times in msec.
		local	served	stretch
		RD53/54	RD53/54	factor
1 block I/O	50	64	1.3	msec elapsed time
4 blocks	55	71	1.3
16 blocks	70	96	1.4
64 blocks	133	217	1.6
i.e., served I/Os take about 30% longer for small transfers, rising to
	about 60% longer at 64 blocks.
More CPU intensive; this is inherent in the design of the DEQNA's
	general-purpose interface: CI emulation, MOVCs of the data, extra
	Ethernet messages.
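The "stretch factor" column is simply served elapsed time divided by local elapsed time. The sketch below recomputes it from the elapsed-time table and, alongside it, the ratio of client CPU time to local CPU time from the CPU-time table. The dictionary and function names are illustrative; the numbers are the ones given in the session.

```python
# blocks per I/O -> (local RD53/54, served RD53/54) elapsed msec
ELAPSED_MSEC = {1: (50, 64), 4: (55, 71), 16: (70, 96), 64: (133, 217)}

# blocks per I/O -> (local RD53/54, cluster client) CPU msec
CPU_MSEC = {1: (2.6, 7.4), 4: (2.7, 9.5), 16: (3.5, 17.9), 64: (6.1, 48.8)}


def stretch(local_msec: float, served_msec: float) -> float:
    """How many times longer the served path takes than the local path."""
    return served_msec / local_msec


if __name__ == "__main__":
    for blocks in ELAPSED_MSEC:
        el = stretch(*ELAPSED_MSEC[blocks])
        cpu = stretch(*CPU_MSEC[blocks])
        print(f"{blocks:3d} blocks: elapsed x{el:.1f}, client CPU x{cpu:.1f}")
```

The elapsed stretch stays modest (1.3 to 1.6), while the CPU ratio grows with transfer size, which fits the note that copying (MOVCs) and extra messages scale with the data moved.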
Potential bottlenecks:
	Ethernet capacity
	Ethernet interfaces at any node
	Boot member CPU node
Ethernet capacity: 10 Mbit is a lot of capacity.  It can be saturated with an
	artificial workload.  In reality this DOES NOT occur.  Disk IOs are
	small and require lots of CPU time in I/O operations.
	More realistic to assume 6-8 Mbit for user data.
		6 Mbit for lock-intensive; 8 Mbit for IO-intensive.
	Reaching that takes 2-3 saturated serving nodes.
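The "2-3 serving nodes" figure is consistent with dividing the usable Ethernet budget by what one node's interface can push. The sketch below assumes, as an illustration, that each serving node is limited by the 3.3 Mbit DEQNA/DELUA figure given in the next note, and rounds up to whole nodes. The function name is hypothetical.

```python
import math


def nodes_to_fill(usable_mbit: float, per_node_mbit: float) -> int:
    """Whole serving nodes needed to consume usable_mbit of Ethernet.

    ASSUMPTION: each node is capped by its interface rate and runs saturated.
    """
    return math.ceil(usable_mbit / per_node_mbit)


if __name__ == "__main__":
    for usable in (6.0, 8.0):  # lock-intensive vs IO-intensive budgets
        print(f"{usable} Mbit usable -> {nodes_to_fill(usable, 3.3)} nodes")
```

With 6 Mbit usable this gives 2 nodes and with 8 Mbit it gives 3, matching the range in the notes.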
Ethernet interface: 1.2 Mbit for DEUNA -> 300 blocks/sec
	3.3 Mbit for DEQNA and DELUA -> 800 blocks/sec
	typical IO 4-8 blocks -> 100-200 IOs/sec for DEQNA
	Avoid the DEUNA if there are many users (the interface is a problem
		only on the boot member of large clusters).
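The blocks/sec figures follow from dividing the interface rate by the size of a disk block (512 bytes, i.e. 4096 bits), and the IOs/sec figure from dividing again by blocks per I/O. The sketch below reproduces that arithmetic; it ignores Ethernet framing and protocol overhead, so it slightly overstates what the interfaces deliver. Function names are illustrative.

```python
BLOCK_BITS = 512 * 8  # one VMS disk block is 512 bytes


def blocks_per_sec(mbit: float) -> float:
    """Raw 512-byte blocks/sec for an interface rate in Mbit/sec (no overhead)."""
    return mbit * 1_000_000 / BLOCK_BITS


def ios_per_sec(mbit: float, blocks_per_io: float) -> float:
    """I/O operations/sec when each I/O moves blocks_per_io blocks."""
    return blocks_per_sec(mbit) / blocks_per_io


if __name__ == "__main__":
    print(f"DEUNA  {blocks_per_sec(1.2):4.0f} blocks/sec")
    print(f"DEQNA  {blocks_per_sec(3.3):4.0f} blocks/sec")
    for size in (4, 8):
        print(f"DEQNA, {size}-block IOs: {ios_per_sec(3.3, size):4.0f} IOs/sec")
```

This gives about 293 and 806 blocks/sec, which the notes round to 300 and 800, and about 100 to 200 IOs/sec for 8- and 4-block transfers.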
Boot member CPU: this will be the bottleneck.
	Boot members are involved in both locks and IOs.
	86 users at 80% of boot member CPU.
		0.54 locks/sec/user; 0.63 IOs/sec/user; 4.00 blocks/IO
	Numbers from measuring many actual systems (outside Digital).
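The per-user rates can be multiplied out to show what the boot member was handling in the measured case. The sketch below does that, then adds a crude headroom estimate that ASSUMES boot member CPU scales linearly with user count (no queuing effects). That assumption is mine, not the session's, so treat the resulting user count as optimistic. Names are illustrative.

```python
MEASURED_USERS = 86
MEASURED_CPU_FRACTION = 0.80  # boot member CPU busy at 86 users
LOCKS_PER_USER = 0.54  # locks/sec/user
IOS_PER_USER = 0.63  # IOs/sec/user
BLOCKS_PER_IO = 4.00


def boot_member_load(users: float) -> dict:
    """Aggregate lock, I/O and block rates offered to the boot member."""
    ios = users * IOS_PER_USER
    return {
        "locks/sec": users * LOCKS_PER_USER,
        "IOs/sec": ios,
        "blocks/sec": ios * BLOCKS_PER_IO,
    }


def users_at_cpu(target_fraction: float) -> float:
    """ASSUMPTION: CPU use grows linearly with users (ignores queuing)."""
    return MEASURED_USERS * target_fraction / MEASURED_CPU_FRACTION


if __name__ == "__main__":
    for name, rate in boot_member_load(MEASURED_USERS).items():
        print(f"{name:10s} {rate:6.1f}")
    print(f"users at 100% CPU (linear guess): {users_at_cpu(1.0):.1f}")
```

At 86 users this is about 46 locks/sec and 54 IOs/sec, or roughly 217 blocks/sec, well under the ~800 blocks/sec a DEQNA can move, which is consistent with the CPU rather than the interface being the limit.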


= Frank Nagy
= Fermilab Research Division EED/Controls
= FNAL::NAGY.HEPNET or NAGY@FNAL.Bitnet