radford@calgary.UUCP (Radford Neal) (03/21/85)
RE: The discussion of whether C ought to pad structures to align data, and the subsequent discussion of how much this gets you on a VAX.

There's nothing like actual data on a question like this. I ran the following quick test program on a VAX 11/780 (Berkeley 4.2 C compiler):

    #include <stdio.h>

    main(argc,argv)
      int argc;
      char **argv;
    {
      int a[2];
      register int *p;
      int n;
      int o;
      int rw;
      register int x;

      n = atoi(*++argv);     /* number of loop iterations */
      o = atoi(*++argv);     /* byte offset from &a[0] */
      rw = **++argv;         /* first character of the mode argument */

      p = (int*)((int)&a[0]+o);   /* possibly misaligned pointer into a[] */

      if (rw=='r') {
        /* mode 'r': this loop does ten stores through p per iteration */
        while (n>0) {
          *p = 0; *p = 0; *p = 0; *p = 0; *p = 0;
          *p = 0; *p = 0; *p = 0; *p = 0; *p = 0;
          n -= 1;
        }
      }
      else {
        /* any other mode: this loop does ten loads through p per iteration */
        while (n>0) {
          x = *p; x = *p; x = *p; x = *p; x = *p;
          x = *p; x = *p; x = *p; x = *p; x = *p;
          n -= 1;
        }
      }
    }

The results are as follows:

    % time aligntime 100000 0 r
            2.1 real         1.8 user         0.0 sys
    % time aligntime 100000 1 r
            5.3 real         3.6 user         0.0 sys
    % time aligntime 100000 2 r
            5.1 real         3.6 user         0.0 sys
    % time aligntime 100000 3 r
            5.7 real         3.7 user         0.1 sys
    % time aligntime 100000 4 r
            1.9 real         1.5 user         0.0 sys
    % time aligntime 100000 0 w
            1.3 real         1.1 user         0.0 sys
    % time aligntime 100000 1 w
            3.2 real         2.7 user         0.0 sys
    % time aligntime 100000 2 w
            5.5 real         2.9 user         0.0 sys
    % time aligntime 100000 3 w
            3.0 real         2.6 user         0.0 sys
    % time aligntime 100000 4 w
            1.5 real         1.2 user         0.0 sys

Conclusion: Alignment of longwords on a mod 4 boundary gets you better than a factor of two speed-up on both reads and writes. Alignment on a mod 8 boundary makes no significant difference.

This is a bit worse than I would expect. Does the 780's microcode fetch non-aligned longwords a byte at a time from cache? Note that the 64-bit data path to main memory is not relevant for this test, only cache accesses.

    Radford Neal
    The University of Calgary
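
For readers who want to relate the offsets in the test above to longword boundaries, here is a minimal sketch of the arithmetic. It is not part of the original program, and the helper name longwords_touched is made up for this illustration. It only counts how many aligned 4-byte longwords a 4-byte access overlaps; whether the 780 actually services a misaligned access as two longword references is an assumption that the timings by themselves do not settle.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): number of aligned
   4-byte longwords overlapped by a 4-byte access that starts `offset`
   bytes past a longword-aligned address. */
unsigned longwords_touched(size_t offset)
{
    size_t first = offset / 4;        /* longword holding the first byte */
    size_t last  = (offset + 3) / 4;  /* longword holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

For the offsets used in the test, this gives 1 for offsets 0 and 4 and 2 for offsets 1, 2, and 3, which matches the grouping of fast and slow runs in the timings.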
paul@unisoft.UUCP (Paul Campbell) (03/27/85)
No .... the VAX does not (necessarily) read data from its cache 64 bits at a time. What it does is fetch 64 bits into the cache from memory when a cache miss occurs (the SBI is 64 bits wide). This is a good idea for several reasons: it helps instruction lookahead and data lookahead, and it makes the cache design a little easier, since fewer address lines have to be compared.

    Paul Campbell
    ucbvax!unisoft!paul
afb3@hou2d.UUCP (A.BALDWIN) (04/02/85)
NO-NO-NO!!!! The SBI is not 64 bits wide. SBI transactions occur as follows:

    Address (30 bits on 32-bit bus)
    data    (32 bits on 32-bit bus)
    data    (32 bits on 32-bit bus)

All other transactions are "special" (and slow things down!@#$%^&). The 11/780 has memory organized in 64-bit chunks, which can cause real problems on memory writes (the memory system has to read the data, mask in the new stuff, then write the data... sorta like core, remember??). If you read the info on the UBAs and MBAs, the SBI interaction is described in detail (hardware handbook).

The cache is a two-way set-associative (i.e., as you say, it reads 64 bits on a miss) write-through cache. The write-through feature really slows things down on non-aligned transfers, for the reason above. Also, it is not clear from the documentation (VAX hardware handbook) that the cache organization is quad-word oriented. In fact, most of the VAX CPU functions are byte oriented, and I would suspect the cache to be also.

    Al Baldwin
    AT&T-Bell Labs
    ...!ihnp4!hou2d!afb3

[These opinions are my own....Who else would want them!!!]
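
The read, mask, write sequence described above can be sketched in C. This is only a software model of the idea, assuming a 64-bit memory word with byte 0 as the least significant byte; the function name merge_partial_write and its parameters are invented for the illustration and do not describe the actual 11/780 memory controller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a partial write into a 64-bit memory word:
   `old` is the word read from memory, `data` holds the new bytes in its
   low-order end, `offset` is the starting byte (0 = least significant),
   and `nbytes` is how many bytes are being written. */
uint64_t merge_partial_write(uint64_t old, uint64_t data,
                             unsigned offset, unsigned nbytes)
{
    uint64_t mask;

    /* The write must fit inside one 64-bit word. */
    assert(nbytes >= 1 && offset + nbytes <= 8);

    /* Build a mask covering the bytes being replaced. */
    mask = (nbytes == 8) ? ~(uint64_t)0
                         : (((uint64_t)1 << (8 * nbytes)) - 1);
    mask <<= 8 * offset;

    /* Keep the untouched bytes of the old word, mask in the new ones. */
    return (old & ~mask) | ((data << (8 * offset)) & mask);
}
```

For example, writing the longword 0xAABBCCDD at byte offset 2 into a word holding 0x1111111111111111 yields 0x1111AABBCCDD1111. The extra read before every partial write is the cost being complained about.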