[comp.unix.wizards] String Handling and run-time libraries

chris@mimsy.UUCP (03/31/87)

>In article <1531@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>[strcpy is inordinately slow on a uVax II running 4.3BSD]

In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>... the 4.3 libc ... has been carefully optimized to use the fancy
>VAX instructions for the string routines.  Unfortunately, some of
>these instructions are not implemented by the MicroVAX-II hardware.
>As it turns out, what is happening is that your tests (including
>Dhrystone) are causing kernel traps to emulate those instructions!

Exactly.  Strcpy, strcat, and strlen were all modified to use the
Vax `locc' instruction to find the ends of strings.  This instruction
is not implemented in hardware in the uVax II.  The obvious solution
is to arrange the libraries so that on a uVax, programs use a
straightforward test-byte-and-branch loop (see sample code below).

There are two ways to do this.  One could attempt to determine at
run-time whether `locc' is available; or one can simply assume that
anything compiled on a uVax will run on a uVax, and anything compiled
on a `big Vax' will run on a big Vax.  The former would be hard,
requiring a system call, but would likely be worthwhile if this
could be done at most once per program run.  The latter is easy:
just build libc.a differently on a uVax (and then watch rdist run,
and weep).

Both tricks, however, require some way for user programs to discover
which CPU is executing them.  A `getcputype' call, anyone?  (But
what about dynamic process relocation, where a program might move
from one CPU type to another?  [ECAPISTRANO, process migrated])

Here is a sample replacement for strlen (untested!), assuming there
were a getcputype system call.

	/* get CPU type numbers */
	#include <sys/cputype.h>

	/* lenroutine is the address of the proper routine, once known */
		.lcomm	lenroutine,4

		ENTRY(strlen)
		.word	0		# save no registers

		movl	lenroutine,r0	# know which routine to use?
		beql	1f		# no, go figure (and pipeline flush)
		jmp	(r0)		# go do it
	/*
	 * Someone should find out whether a branch to the jmp (r0) below
	 * would be slower (two pipeline flushes vs. one?).  Need to test
	 * all architectures!
	 */

	/* figure out which routine to use */
	1:	calls	$0,_getcputype
		cmpl	$UVAX2,r0	# is it a MicroVAX-II?
		beql	2f
		movab	bigvax,r0	# use big vax code (address, not contents)
		brb	3f
	2:	movab	chipvax,r0	# use chip vax code (address, not contents)
	3:	movl	r0,lenroutine	# remember which to use
		jmp	(r0)		# and go do it

		/* locc version */
	bigvax:
		...			# insert 4.3BSD code here
		ret

		/* byte-at-a-time version */
	chipvax:
		movl	4(ap),r0	# get string
		movl	r0,r1		# and avoid two mem refs
	1:	tstb	(r0)+		# find the \0
		bneq	1b		# loop until just past the \0
		decl	r0		# point back at \0
		subl2	r1,r0		# return r0 - r1
		ret
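
For readers who do not speak VAX assembler, the same decide-once-then-jump
idea can be sketched in C.  This is only an illustration under stated
assumptions: `getcputype' is the hypothetical call proposed above (it does
not exist), the CPU type codes are made up, and the `big Vax' routine is a
plain loop standing in for the locc-based 4.3BSD code.

```c
#include <stddef.h>

/* Hypothetical CPU type codes; real ones would come from <sys/cputype.h>. */
enum cputype { BIGVAX = 1, UVAX2 = 2 };

/* Stand-in for the proposed getcputype system call (an assumption). */
static int getcputype_calls;	/* counts how often we had to ask */
static enum cputype getcputype(void)
{
	getcputype_calls++;
	return UVAX2;		/* pretend we are on a MicroVAX-II */
}

/* Placeholder for the locc version; a plain loop here. */
static size_t strlen_bigvax(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return n;
}

/* Byte-at-a-time version, as in the chipvax code above. */
static size_t strlen_chipvax(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}

/* Address of the proper routine, once known; NULL until then. */
static size_t (*lenroutine)(const char *);

size_t my_strlen(const char *s)
{
	if (lenroutine == NULL)		/* first call only: go figure */
		lenroutine = (getcputype() == UVAX2) ?
		    strlen_chipvax : strlen_bigvax;
	return lenroutine(s);		/* and go do it */
}
```

The CPU type is asked for at most once per program run; every later call
pays only for the test and the indirect call.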
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

jack@mcvax.UUCP (03/31/87)

This brings to mind a couple of things I've been wondering
about (but never enough to do something about myself):

- Does anyone know how long the average string in C is?
- At what point does the 4.2 locc/movc3 get faster than the
  ordinary while(*s1++ = *s2++)?
- What percentage of the strings that strcpy sees will be
  word aligned? (I have the feeling that this percentage will
  be *very* high).
- Is there anything useful that can be done with this knowledge,
  (like copying words), without first having to look for end-of-string
  with byte accesses?

Did somebody else look into these things by chance?
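
One possible shape for the last idea, sketched in C: test a whole word for
a zero byte with a little arithmetic, and copy word by word until one
turns up.  The names are made up for illustration, and the sketch assumes
that both buffers are word aligned and padded out to a multiple of four
bytes, so that reading the whole word containing the terminator is safe.
Whether this beats the byte loop on any given machine would still have to
be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in w is zero. */
static int has_zero_byte(uint32_t w)
{
	return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/*
 * Copy a string one 32-bit word at a time.
 * Assumption: src and dst are word aligned and their buffers are
 * padded to a multiple of 4 bytes.
 */
char *word_strcpy(char *dst, const char *src)
{
	char *d = dst;
	uint32_t w;

	for (;;) {
		memcpy(&w, src, sizeof w);	/* one word load */
		if (has_zero_byte(w))
			break;			/* terminator is in this word */
		memcpy(d, &w, sizeof w);	/* one word store */
		src += sizeof w;
		d += sizeof w;
	}
	while ((*d++ = *src++) != '\0')		/* finish the last word bytewise */
		;
	return dst;
}
```

Only the final word is handled with byte accesses; no separate pass to find
the end of the string is needed.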
-- 
	Jack Jansen, jack@cwi.nl (or jack@mcvax.uucp)
	The shell is my oyster.

chuck@amdahl.UUCP (04/01/87)

In article <6042@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <1531@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>>[strcpy is inordinately slow on a uVax II running 4.3BSD]
>
>In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>>[MicroVAX-II doesn't have the same hardware as a VAX]
>
>Exactly.  Strcpy, strcat, and strlen were all modified to use the
>Vax `locc' instruction to find the ends of strings.  This instruction
>is not implemented in hardware in the uVax II.  The obvious solution
>is to arrange the libraries so that on a uVax, programs use a
>straightforward test-byte-and-branch loop (see sample code below).
>
>There are two ways to do this.  One could attempt to determine at
>run-time whether `locc' is available; or one can simply assume that
>anything compiled on a uVax will run on a uVax, and anything compiled
>on a `big Vax' will run on a big Vax.  The former would be hard,
>requiring a system call, but would likely be worthwhile if this
>could be done at most once per program run.  The latter is easy:
>just build libc.a differently on a uVax (and then watch rdist run,
>and weep).
>
>Both tricks, however, require some way for user programs to discover
>which CPU is executing them.  A `getcputype' call, anyone?  (But
>what about dynamic process relocation, where a program might move
>from one CPU type to another?  [ECAPISTRANO, process migrated])

Actually, there is a third method.  When using shared subroutine libraries
it can be advantageous to keep all routines in the library bound into
one large file with a jump vector at the top of the file.  When a program
issues a library subroutine call, it branches to a canonical location
in the jump vector for that subroutine, and the jump vector branches to
the appropriate subroutine.

This type of implementation would even work for processes that migrated
from one CPU to a similar, but slightly different, CPU if both CPUs
implemented a shared subroutine library with jump vectors at the same
locations.

For example, on a VAX, a subroutine would call strcpy which would cause
a subroutine call to location 0x01FC in the shared subroutine library.
This location would then branch to code which performed a 'locc' and
'movc3' (or whatever).  When the code migrated to a MicroVAX-II, the code
would still call strcpy by branching to location 0x01FC in the shared
subroutine library.  But this time, this location would branch to
code which performed a simple move-bytes-until-null loop.
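
A rough C model of the jump vector, for illustration only.  The structure
and function names are invented here, the fixed slot order stands in for
the fixed addresses (such as 0x01FC) in a real shared library, and the
`big Vax' routine merely imitates locc followed by movc3 by calling strlen
and memcpy.

```c
#include <stddef.h>
#include <string.h>

/* The jump vector: slot order is the fixed, published interface. */
struct libvec {
	char *(*strcpy_slot)(char *, const char *);
	size_t (*strlen_slot)(const char *);
};

/* Big Vax flavour: find the end first, then one block move. */
static char *strcpy_locc(char *dst, const char *src)
{
	size_t n = strlen(src);		/* stands in for locc */

	memcpy(dst, src, n + 1);	/* stands in for movc3 */
	return dst;
}

/* MicroVAX-II flavour: simple move-bytes-until-null loop. */
static char *strcpy_bytes(char *dst, const char *src)
{
	char *d = dst;

	while ((*d++ = *src++) != '\0')
		;
	return dst;
}

static size_t strlen_bytes(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}

/* One vector per machine type; same slots, different targets. */
static const struct libvec bigvax_vec = { strcpy_locc, strlen };
static const struct libvec uvax_vec = { strcpy_bytes, strlen_bytes };

/* The library installed on the machine we are running on right now. */
static const struct libvec *libvec = &bigvax_vec;

/* Models a process migrating to a machine with a different library. */
void lib_install(const struct libvec *v)
{
	libvec = v;
}

/* What a program's call to strcpy turns into: a jump through its slot. */
char *lib_strcpy(char *dst, const char *src)
{
	return libvec->strcpy_slot(dst, src);
}
```

The calling program is never relinked; only the vector it jumps through
differs from machine to machine.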

-- Chuck

bjorn@alberta.UUCP (04/01/87)

In article <6042@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>> Unfortunately, some of
>>these instructions are not implemented by the MicroVAX-II hardware.
>>As it turns out, what is happening is that your tests (including
>>Dhrystone) are causing kernel traps to emulate those instructions!
>
>Exactly.  Strcpy, strcat, and strlen were all modified to use the
>Vax `locc' instruction to find the ends of strings.  This instruction
>is not implemented in hardware in the uVax II.  The obvious solution
>is to arrange the libraries so that on a uVax, programs use a
>straightforward test-byte-and-branch loop (see sample code below).

I concur, more or less, up to this point.

>There are two ways to do this. ...

There is a third and much more efficient way:

	Shared resident libraries.

This way all you have to do is make sure you install the correct
library on a particular machine.  Everyone except memory and
disk drive vendors benefits from shared libraries.  Assuming a
vectored entry point interface to the library, you can move your
images from one type of Vax to another and your program will
run with the most efficient `str*' routines available for that
machine, i.e., the routines in that machine's resident library.
None of this re-link everything that uses `ctime' nonsense either.

Of course some people need resident libraries more than others;
the customers of Sun Microsystems are a case in point.  There,
resident libraries, in addition to a host of other benefits
previously alluded to, will put a stop to the following:

	"Gak!!  That was a fifty line program.  It took
	forever to link and it eats up 700k of disk space???"

Since Sun is working on making their system SVID compatible,
the wait shouldn't be too long now.  If I remember correctly,
Apollo has always had resident libraries, but then I've never
so much as seen an Apollo product.

			Bjorn R. Bjornsson
			alberta!bjorn

jfh@killer.UUCP (04/02/87)

In article <6042@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>
> [ lots of stuff before ]
> Both tricks, however, require some way for user programs to discover
> which CPU is executing them.  [ some more stuff ]

I seem to remember from freshman days that there are registers (other than
R0 -> R15) that contain the information about your CPU type.  Maybe some
of them can be accessed in USER mode...

- john.		(jfh@killer.UUCP)

No disclaimer.  Whatcha gonna do, sue me?

chris@mimsy.UUCP (04/03/87)

In article <724@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>I seem to remember from freshman days that there are registers (other than
>R0 -> R15) that contain the information about your CPU type.

Yep.

>Maybe some of them can be accessed in USER mode...

Nope.

The register in question is the `SID', System IDentification,
register.  It is read with an `mfpr', Move From Processor Register,
instruction, which is privileged.

Incidentally, a `get me the SID' call is probably a bad idea.
There is a story behind this:  The format of this register varies
with each Vax line.  In 780s, it contains what looks like a serial
number (but in fact is a plant and manufacturing number, which is
not the same).  This has led a number of software vendors (VMS
types, fortunately) to attempt to enforce licenses by using the
VMS `get me the SID' system call.  These vendors were just a bit
too clever, for now, when one's 8600 is upgraded, such software no
longer works, as the SID changes to reflect the upgrade.  On the
8600, you see, the SID contains not a manufacturing number, but
instead, several version numbers.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

guy@gorodish.UUCP (04/04/87)

>I seem to remember from freshman days that there are registers (other than
>R0 -> R15) that contain the information about your CPU type.

Only one such register: the System Identification Register.  It is
one of a series of "processor registers" that can be accessed using
the Move From Processor Register or Move To Processor Register
instructions...

>Maybe some of them can be accessed in USER mode...

but not from user mode.  MFPR and MTPR are privileged instructions.