nagy%warner.hepnet@LBL.ARPA (05/21/87)
The following are my notes from session V095 given at the Fall '86 DECUS
Symposium in San Francisco.

Low End Cluster Performance (V095)
John Haliburton, DEC
--------------------------------------------------------------------

More CPU time is used because of the PE software emulator.
DEQNA does not understand virtual addresses, so large transfers take more
elapsed time.

CPU time required for lock operations:

                780/CI      uVAX-II/NI
                ------      ----------
local node      2.7 msec    6.0 msec (>2X)    for $ENQ/$DEQ pair
remote node     1.6 msec    4.8 msec (3X)
local node      2.2 msec    5.6 msec          for up/down lock convert
remote node     1.5 msec    4.6 msec

Lock operations tend to occur in pairs (ENQ/DEQ, up/down conversions).
High locking rates load the "master" CPU at IPL$_SCS (8). Potential
bottleneck: many clients can swamp one CPU.

Most disk accesses are to the boot member. MSCP buffering is a tuning issue.
Serving requires substantial boot member CPU.

CPU times in msec (note RD53/RD54 performance is approximately the same as
RA60):

                local       "cluster" disks
                RD53/54     client     server
1 block I/O     2.6         7.4        7.6
4 blocks        2.7         9.5        9.6
16 blocks       3.5         17.9       18.6
64 blocks       6.1         48.8       52.6

Elapsed times in msec:

                local       served     stretch
                RD53/54     RD53/54    factor
1 block I/O     50          64         1.3
4 blocks        55          71         1.3
16 blocks       70          96         1.4
64 blocks       133         217        1.6

i.e. about 30% longer I/O times for typical small transfers.
Served I/O is more CPU intensive => inherent in the design of the general
purpose interface of the DEQNA: CI emulation, MOVCs of the data, extra
Ethernet messages.

Potential bottlenecks:
    Ethernet capacity
    Ethernet interfaces at any node
    CPU of the boot member node

Ethernet capacity: 10 Mbit/sec is a lot of capacity. Can saturate with an
artificial workload. In reality this DOES NOT occur. Disk IOs are small and
require lots of CPU time in I/O operations. More realistic to assume
6-8 Mbit/sec for user data: 6 Mbit/sec for lock-intensive loads, 8 Mbit/sec
for IO-intensive loads. Need 2-3 saturated serving nodes to reach that.
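
The stretch factors in the elapsed-time table can be recomputed from the local and served columns, and the same division applied to the CPU-time table shows how much more CPU a client spends on a served I/O than on a local one. The following is only a sketch of that arithmetic in Python; the dictionary and function names are made up for illustration and the numbers are copied from the tables in these notes.

```python
# Elapsed times in msec from the notes: blocks -> (local RD53/54, served RD53/54)
ELAPSED_MSEC = {1: (50, 64), 4: (55, 71), 16: (70, 96), 64: (133, 217)}

# CPU times in msec from the notes: blocks -> (local, client, server)
CPU_MSEC = {
    1: (2.6, 7.4, 7.6),
    4: (2.7, 9.5, 9.6),
    16: (3.5, 17.9, 18.6),
    64: (6.1, 48.8, 52.6),
}


def stretch_table():
    """Served elapsed time divided by local elapsed time, per transfer size."""
    return {
        blocks: round(served / local, 1)
        for blocks, (local, served) in ELAPSED_MSEC.items()
    }


def client_cpu_ratio():
    """Client CPU time for a served I/O divided by CPU time for a local I/O."""
    return {
        blocks: round(client / local, 1)
        for blocks, (local, client, _server) in CPU_MSEC.items()
    }


if __name__ == "__main__":
    print("stretch factor:", stretch_table())
    print("client/local CPU ratio:", client_cpu_ratio())
```

The recomputed stretch factors agree with the table (1.3, 1.3, 1.4, 1.6). The CPU ratio grows with transfer size, from about 2.8 for one block to about 8 for 64 blocks, which is consistent with the remark that served I/O is more CPU intensive.
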
Ethernet interface:
    1.2 Mbit/sec for DEUNA            -> 300 blocks/sec
    3.3 Mbit/sec for DEQNA and DELUA  -> 800 blocks/sec
    Typical IO is 4-8 blocks          -> 100-200 IOs/sec for DEQNA
    Avoid DEUNA if many users (the interface is a problem only on the boot
    member, for large clusters).

Boot member CPU: this will be the bottleneck. Boot members are involved in
both locks and IOs.
    86 users at 80% of boot member CPU.
    0.54 locks/sec/user; 0.63 IOs/sec/user; 4.00 blocks/IO
    Numbers from measuring many actual systems (outside Digital).

= Frank Nagy
= Fermilab Research Division EED/Controls
= FNAL::NAGY.HEPNET or NAGY@FNAL.Bitnet
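
The interface and boot member figures above can be cross-checked with a little arithmetic. The sketch below is not from the session; it assumes a 512-byte disk block (the notes do not state the block size), and the function names are invented for illustration. It also assumes one reading of "2-3 saturated serving nodes": the realistic Ethernet capacity divided by the throughput of one DEQNA.

```python
import math

BLOCK_BYTES = 512  # assumption: 512-byte disk blocks, not stated in the notes


def blocks_to_mbit(blocks_per_sec):
    """Convert a block rate to Mbit/sec of user data."""
    return blocks_per_sec * BLOCK_BYTES * 8 / 1e6


def ios_per_sec(blocks_per_sec, blocks_per_io):
    """I/O operations per second at a given block rate and transfer size."""
    return blocks_per_sec / blocks_per_io


def boot_member_load(users, locks_per_user=0.54, ios_per_user=0.63,
                     blocks_per_io=4.0):
    """Aggregate (locks/sec, IOs/sec, blocks/sec) using the per-user rates."""
    ios = users * ios_per_user
    return users * locks_per_user, ios, ios * blocks_per_io


def serving_nodes_to_fill(capacity_mbit, per_node_mbit=3.3):
    """Saturated serving nodes needed to reach a given Ethernet capacity."""
    return math.ceil(capacity_mbit / per_node_mbit)


if __name__ == "__main__":
    print(blocks_to_mbit(300), blocks_to_mbit(800))    # DEUNA, DEQNA/DELUA
    print(ios_per_sec(800, 8), ios_per_sec(800, 4))    # DEQNA, 8- and 4-block IOs
    print(boot_member_load(86))
    print(serving_nodes_to_fill(6), serving_nodes_to_fill(8))
```

Under these assumptions 300 and 800 blocks/sec come to roughly 1.2 and 3.3 Mbit/sec, matching the quoted interface rates. The 86-user load works out to about 217 blocks/sec, or about 0.9 Mbit/sec, well under what a DEQNA can carry, which fits the claim that the boot member CPU saturates before its Ethernet interface does.
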