chris@mimsy.UUCP (03/31/87)
>In article <1531@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>[strcpy is inordinately slow on a uVax II running 4.3BSD]

In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>... the 4.3 libc ... has been carefully optimized to use the fancy
>VAX instructions for the string routines.  Unfortunately, some of
>these instructions are not implemented by the MicroVAX-II hardware.
>As it turns out, what is happening is that your tests (including
>Dhrystone) are causing kernel traps to emulate those instructions!

Exactly.  Strcpy, strcat, and strlen were all modified to use the
Vax `locc' instruction to find the ends of strings.  This instruction
is not implemented in hardware in the uVax II.  The obvious solution
is to arrange the libraries so that on a uVax, programs use a
straightforward test-byte-and-branch loop (see sample code below).

There are two ways to do this.  One could attempt to determine at
run-time whether `locc' is available; or one can simply assume that
anything compiled on a uVax will run on a uVax, and anything compiled
on a `big Vax' will run on a big Vax.  The former would be hard,
requiring a system call, but would likely be worthwhile if this
could be done at most once per program run.  The latter is easy:
just build libc.a differently on a uVax (and then watch rdist run,
and weep).

Both tricks, however, require some way for user programs to discover
which CPU is executing them.  A `getcputype' call, anyone?  (But
what about dynamic process relocation, where a program might move
from one CPU type to another?  [ECAPISTRANO, process migrated])

Here is a sample replacement for strlen (untested!), assuming there
were a getcputype system call.

/* get CPU type numbers */
#include <sys/cputype.h>

/* lenroutine is the address of the proper routine, once known */
        .lcomm  lenroutine,4

ENTRY(strlen)
        .word   0               # save no registers
        movl    lenroutine,r0   # know which routine to use?
        beql    1f              # no, go figure (and pipeline flush)
        jmp     (r0)            # go do it

/*
 * Someone should find out whether a branch to the jmp (r0) below
 * would be slower (two pipeline flushes vs. one?).  Need to test
 * all architectures!
 */

/* figure out which routine to use */
1:
        calls   $0,_getcputype
        cmpl    $UVAX2,r0       # is it a MicroVAX-II?
        beql    2f
        movab   bigvax,r0       # use big vax code (address of it)
        brb     3f
2:
        movab   chipvax,r0      # use chip vax code (address of it)
3:
        movl    r0,lenroutine   # remember which to use
        jmp     (r0)            # and go do it

/* locc version */
bigvax:
        ...                     # insert 4.3BSD code here
        ret

/* byte-at-a-time version */
chipvax:
        movl    4(ap),r0        # get string
        movl    r0,r1           # and avoid two mem refs
1:
        tstb    (r0)+           # find the \0
        bneq    1b              # loop until just past the \0
        decl    r0              # point back at \0
        subl2   r1,r0           # return r0 - r1
        ret
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:   seismo!mimsy!chris      ARPA/CSNet:     chris@mimsy.umd.edu
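
For readers who do not speak VAX assembler, the same pick-once-then-jump
idea can be sketched in C.  This is only an illustration under stated
assumptions, not the 4.3BSD code: getcputype() is the hypothetical call
proposed in the post, stubbed here so that it always reports a
MicroVAX-II; the names cpu_type, len_bigvax, len_chipvax and my_strlen
are invented for the example; and the "big Vax" routine simply calls the
C library's strlen as a stand-in for the locc version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU type numbers (the post assumes <sys/cputype.h>). */
enum cpu_type { CPU_BIGVAX, CPU_UVAX2 };

/* Counts calls so the "at most once per run" property can be checked. */
static int getcputype_calls;

/* HYPOTHETICAL stand-in for the proposed system call; always says uVax II. */
static enum cpu_type getcputype(void)
{
    getcputype_calls++;
    return CPU_UVAX2;
}

/* Placeholder for the locc-based routine: defers to the C library. */
static size_t len_bigvax(const char *s)
{
    return strlen(s);
}

/* Byte-at-a-time version, like the chipvax loop in the post. */
static size_t len_chipvax(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Address of the proper routine, once known (NULL until the first call). */
static size_t (*lenroutine)(const char *);

size_t my_strlen(const char *s)
{
    assert(s != NULL);
    if (lenroutine == NULL)     /* first call: ask once and remember */
        lenroutine = (getcputype() == CPU_UVAX2) ? len_chipvax : len_bigvax;
    return lenroutine(s);
}
```

After the first call, every later call costs one test and one indirect
call.  Like the assembler version, this does nothing about a process
that migrates to a different CPU type after the routine has been chosen.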
jack@mcvax.UUCP (03/31/87)
This brings to mind a couple of things I've been wondering about
(but never enough to do something about myself):
- Does anyone know how long the average string in C is?
- At what point does the 4.2 locc/movc3 get faster than the ordinary
  while(*s1++ = *s2++)?
- How many percent of the strings that strcpy sees will be word aligned?
  (I have the feeling that this percentage will be *very* high).
- Is there anything useful that can be done with this knowledge,
  (like copying words), without first having to look for end-of-string
  with byte accesses?

Did somebody else look into these things by chance?
-- 
        Jack Jansen, jack@cwi.nl (or jack@mcvax.uucp)
        The shell is my oyster.
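
On the last question, one well-known trick tests a whole 32-bit word
for a zero byte with a few arithmetic operations, so aligned strings
can be copied a word at a time with no separate end-of-string scan.
The sketch below is an illustration, not anything from the 4.2 or 4.3
libc.  The names has_zero_byte and word_strcpy are invented here, and
the routine ASSUMES that src is 4-byte aligned and that its buffer is
padded to a multiple of 4 bytes, because the loop reads the whole word
that holds the terminating \0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/*
 * Hypothetical word-at-a-time strcpy.  ASSUMPTIONS: src is 4-byte
 * aligned and its buffer size is a multiple of 4, so reading the word
 * that contains the \0 stays inside the buffer.
 */
static char *word_strcpy(char *dst, const char *src)
{
    char *d = dst;
    uint32_t w;

    assert((uintptr_t)src % sizeof(uint32_t) == 0);
    for (;;) {
        memcpy(&w, src, sizeof w);  /* fetch one word of the source */
        if (has_zero_byte(w))
            break;                  /* the \0 is somewhere in this word */
        memcpy(d, &w, sizeof w);    /* no \0: store all four bytes */
        src += sizeof w;
        d += sizeof w;
    }
    while ((*d++ = *src++) != '\0') /* finish the last word bytewise */
        ;
    return dst;
}
```

Whether this beats the plain byte loop depends on how long typical
strings are and how often they are aligned, which is exactly what the
first three questions ask.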
chuck@amdahl.UUCP (04/01/87)
In article <6042@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <1531@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>>[strcpy is inordinately slow on a uVax II running 4.3BSD]
>
>In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>>[MicroVAX-II doesn't have the same hardware as a VAX]
>
>Exactly.  Strcpy, strcat, and strlen were all modified to use the
>Vax `locc' instruction to find the ends of strings.  This instruction
>is not implemented in hardware in the uVax II.  The obvious solution
>is to arrange the libraries so that on a uVax, programs use a
>straightforward test-byte-and-branch loop (see sample code below).
>
>There are two ways to do this.  One could attempt to determine at
>run-time whether `locc' is available; or one can simply assume that
>anything compiled on a uVax will run on a uVax, and anything compiled
>on a `big Vax' will run on a big Vax.  The former would be hard,
>requiring a system call, but would likely be worthwhile if this
>could be done at most once per program run.  The latter is easy:
>just build libc.a differently on a uVax (and then watch rdist run,
>and weep).
>
>Both tricks, however, require some way for user programs to discover
>which CPU is executing them.  A `getcputype' call, anyone?  (But
>what about dynamic process relocation, where a program might move
>from one CPU type to another?  [ECAPISTRANO, process migrated])

Actually, there is a third method.  When using shared subroutine
libraries it can be advantageous to keep all routines in the library
bound into one large file with a jump vector at the top of the file.
When a program issues a library subroutine call, it branches to a
canonical location in the jump vector for that subroutine, and the
jump vector branches to the appropriate subroutine.

This type of implementation would even work for processes that
migrated from one CPU to a similar, but slightly different, CPU if
both CPUs implemented a shared subroutine library with jump vectors
at the same locations.

For example, on a VAX, a program would call strcpy, which would cause
a subroutine call to location 0x01FC in the shared subroutine library.
This location would then branch to code which performed a `locc' and
`movc3' (or whatever).  When the process migrated to a MicroVAX-II,
it would still call strcpy by branching to location 0x01FC in the
shared subroutine library.  But this time, this location would branch
to code which performed a simple move-bytes-until-null loop.

-- Chuck
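
A jump vector of this kind can be modelled in C as a table of function
pointers with a fixed layout.  This is a minimal sketch under stated
assumptions, not a real shared-library mechanism: struct libvec, the
two tables and copy_and_measure are names invented for the example,
and the "big Vax" entries use memcpy and the C library's strlen as
stand-ins for the locc/movc3 code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The agreed vector layout: every machine's library has the same slots. */
struct libvec {
    size_t (*length)(const char *);
    char  *(*copy)(char *, const char *);
};

/* "Big Vax" flavour: find the length first, then block-move. */
static char *copy_block(char *d, const char *s)
{
    memcpy(d, s, strlen(s) + 1);
    return d;
}

/* MicroVAX flavour: simple move-bytes-until-null loops. */
static size_t length_bytes(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

static char *copy_bytes(char *d, const char *s)
{
    char *r = d;

    while ((*d++ = *s++) != '\0')
        ;
    return r;
}

/* One table per machine type; only one would be installed on each. */
static const struct libvec bigvax_lib = { strlen, copy_block };
static const struct libvec uvax_lib   = { length_bytes, copy_bytes };

/* The program knows only the slot layout, never the routines behind it. */
static size_t copy_and_measure(const struct libvec *lib,
                               char *dst, const char *src)
{
    assert(lib != NULL);
    lib->copy(dst, src);
    return lib->length(dst);
}
```

The same copy_and_measure code gives the same answer with either
table, which is the property that lets an image move between machines
without being relinked.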
bjorn@alberta.UUCP (04/01/87)
In article <6042@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>> Unfortunately, some of
>>these instructions are not implemented by the MicroVAX-II hardware.
>>As it turns out, what is happening is that your tests (including
>>Dhrystone) are causing kernel traps to emulate those instructions!
>
>Exactly.  Strcpy, strcat, and strlen were all modified to use the
>Vax `locc' instruction to find the ends of strings.  This instruction
>is not implemented in hardware in the uVax II.  The obvious solution
>is to arrange the libraries so that on a uVax, programs use a
>straightforward test-byte-and-branch loop (see sample code below).

Concur somewhat at this point.

>There are two ways to do this. ...

There is a third and much more efficient way: shared resident
libraries.  This way all you have to do is make sure you install
the correct library on a particular machine.  Everyone except memory
and disk drive vendors benefits from shared libraries.

Assuming a vectored entry point interface to the library, you can
move your images from one type of Vax to another and your program
will run with the most efficient `str*' routines available for that
machine, i.e. the routines in that machine's resident library.  None
of this re-link everything that uses `ctime' nonsense either.

Of course some people need resident libraries more than others; a
case in point is the customers of Sun Microsystems.  Here resident
libraries, in addition to a host of other bennies previously alluded
to, will put a stop to the following:

        "Gak!!  That was a fifty line program.  It took forever
         to link and it eats up 700k of disk space???"

Since Sun is working on making their system SVID compatible the wait
shouldn't be too long now.  If I remember correctly Apollo has always
had resident libraries, but then I've never even as much as seen an
Apollo product.

                Bjorn R. Bjornsson
                alberta!bjorn
jfh@killer.UUCP (04/02/87)
In article <6042@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>
> [ lots of stuff before ]
> Both tricks, however, require some way for user programs to discover
> which CPU is executing them. [ some more stuff ]

I seem to remember from freshman days that there are registers (other
than R0 -> R15) that contain the information about your CPU type.
Maybe some of them can be accessed in USER mode...

- john.         (jfh@killer.UUCP)

No disclaimer.  Whatcha gonna do, sue me?
chris@mimsy.UUCP (04/03/87)
In article <724@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>I seem to remember from freshman days that there are registers (other than
>R0 -> R15) that contain the information about your CPU type.

Yep.

>Maybe some of them can be accessed in USER mode...

Nope.  The register in question is the `SID', System IDentification,
register.  It is read with an `mfpr', Move From Processor Register,
instruction, which is privileged.

Incidentally, a `get me the SID' call is probably a bad idea.  There
is a story behind this:

The format of this register varies with each Vax line.  In 780s, it
contains what looks like a serial number (but in fact is a plant and
manufacturing number, which is not the same).  This has led a number
of software vendors (VMS types, fortunately) to attempt to enforce
licenses by using the VMS `get me the SID' system call.  These vendors
were just a bit too clever, for now, when one's 8600 is upgraded,
such software no longer works, as the SID changes to reflect the
upgrade.  On the 8600, you see, the SID contains not a manufacturing
number, but instead, several version numbers.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:   seismo!mimsy!chris      ARPA/CSNet:     chris@mimsy.umd.edu
guy@gorodish.UUCP (04/04/87)
>I seem to remember from freshman days that there are registers (other than
>R0 -> R15) that contain the information about your CPU type.

Only one such register: the System Identification Register.  It is one
of a series of "processor registers" that can be accessed using the
Move From Processor Register or Move To Processor Register
instructions...

>Maybe some of them can be accessed in USER mode...

but not from user mode.  MFPR and MTPR are privileged instructions.