[comp.lang.c] String Handling and run-time libraries

chris@mimsy.UUCP (03/31/87)

>In article <1531@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>[strcpy is inordinately slow on a uVax II running 4.3BSD]

In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>... the 4.3 libc ... has been carefully optimized to use the fancy
>VAX instructions for the string routines.  Unfortunately, some of
>these instructions are not implemented by the MicroVAX-II hardware.
>As it turns out, what is happening is that your tests (including
>Dhrystone) are causing kernel traps to emulate those instructions!

Exactly.  Strcpy, strcat, and strlen were all modified to use the
Vax `locc' instruction to find the ends of strings.  This instruction
is not implemented in hardware in the uVax II.  The obvious solution
is to arrange the libraries so that on a uVax, programs use a
straightforward test-byte-and-branch loop (see sample code below).

There are two ways to do this.  One could attempt to determine at
run-time whether `locc' is available; or one can simply assume that
anything compiled on a uVax will run on a uVax, and anything compiled
on a `big Vax' will run on a big Vax.  The former would be hard,
requiring a system call, but would likely be worthwhile if this
could be done at most once per program run.  The latter is easy:
just build libc.a differently on a uVax (and then watch rdist run,
and weep).

Both tricks, however, require some way for user programs to discover
which CPU is executing them.  A `getcputype' call, anyone?  (But
what about dynamic process relocation, where a program might move
from one CPU type to another?  [ECAPISTRANO, process migrated])

Here is a sample replacement for strlen (untested!), assuming there
were a getcputype system call.

	/* get CPU type numbers */
	#include <sys/cputype.h>

	/* lenroutine is the address of the proper routine, once known */
		.lcomm	lenroutine,4

		ENTRY(strlen)
		.word	0		# save no registers

		movl	lenroutine,r0	# know which routine to use?
		beql	1f		# no, go figure (and pipeline flush)
		jmp	(r0)		# go do it
	/*
	 * Someone should find out whether a branch to the jmp (r0) below
	 * would be slower (two pipeline flushes vs. one?).  Need to test
	 * all architectures!
	 */

	/* figure out which routine to use */
	1:	calls	$0,_getcputype
		cmpl	$UVAX2,r0	# is it a MicroVAX-II?
		beql	2f
		movab	bigvax,r0	# use big vax code (address, not contents)
		brb	3f
	2:	movab	chipvax,r0	# use chip vax code (address, not contents)
	3:	movl	r0,lenroutine	# remember which to use
		jmp	(r0)		# and go do it

		/* locc version */
	bigvax:
		...			# insert 4.3BSD code here
		ret

		/* byte-at-a-time version */
	chipvax:
		movl	4(ap),r0	# get string
		movl	r0,r1		# and avoid two mem refs
	1:	tstb	(r0)+		# find the \0
		bneq	1b		# loop until just past the \0
		decl	r0		# point back at \0
		subl2	r1,r0		# return r0 - r1
		ret
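
The same dispatch can be sketched in C.  This is only an illustration
under stated assumptions: getcputype() remains the hypothetical system
call proposed above (faked here so the sketch compiles), the CPU_*
codes are made up, and len_locc is a placeholder, since C has no way to
ask for `locc'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU type codes; <sys/cputype.h> would supply the real ones. */
enum { CPU_BIGVAX = 0, CPU_UVAX2 = 1 };

/* Stand-in for the proposed getcputype() system call, which does not exist.
 * It counts its calls so the once-per-run property can be checked. */
static int cputype_calls;
static int fake_cputype = CPU_UVAX2;

static int getcputype(void)
{
    cputype_calls++;
    return fake_cputype;
}

/* Byte-at-a-time version, mirroring the chipvax loop. */
static size_t len_bytewise(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Placeholder for the locc version; it just reuses the portable loop. */
static size_t len_locc(const char *s)
{
    return len_bytewise(s);
}

/* Address of the proper routine, once known (the lenroutine cell). */
static size_t (*lenroutine)(const char *);

size_t my_strlen(const char *s)
{
    assert(s != NULL);
    if (lenroutine == NULL)     /* first call: figure out which to use */
        lenroutine = (getcputype() == CPU_UVAX2) ? len_bytewise : len_locc;
    return lenroutine(s);       /* every call: one indirect jump */
}
```

However many times my_strlen is called, getcputype is consulted only on
the first call; after that the cost is one test and one indirect call.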
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

bjorn@alberta.UUCP (04/01/87)

In article <6042@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>> Unfortunately, some of
>>these instructions are not implemented by the MicroVAX-II hardware.
>>As it turns out, what is happening is that your tests (including
>>Dhrystone) are causing kernel traps to emulate those instructions!
>
>Exactly.  Strcpy, strcat, and strlen were all modified to use the
>Vax `locc' instruction to find the ends of strings.  This instruction
>is not implemented in hardware in the uVax II.  The obvious solution
>is to arrange the libraries so that on a uVax, programs use a
>straightforward test-byte-and-branch loop (see sample code below).

Concur somewhat at this point.

>There are two ways to do this. ...

There is a third and much more efficient way:

	Shared resident libraries.

This way all you have to do is make sure you install the correct
library on a particular machine.  Everyone except memory and
disk drive vendors benefits from shared libraries.  Assuming a
vectored entry point interface to the library, you can move your
images from one type of Vax to another and your program will
run with the most efficient `str*' routines available for that
machine, i.e., the routines in that machine's resident library.
None of this re-link everything that uses `ctime' nonsense either.
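
A vectored entry point interface might look roughly like the C sketch
below.  Everything in it is an assumption made for illustration: the
struct layout, the names uvax_library, bigvax_library and
install_library, and the "fancy" routines, which merely stand in for
locc-based code.  The point is only that the program image calls
through fixed slots and never names a particular routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry vector: one slot per library routine, in a fixed order. */
struct entry_vector {
    size_t (*len)(const char *);
    char  *(*copy)(char *, const char *);
};

static int bytewise_calls;  /* counts calls into the byte-at-a-time library */

static size_t bytewise_len(const char *s)
{
    size_t n = 0;

    bytewise_calls++;
    while (s[n] != '\0')
        n++;
    return n;
}

static char *bytewise_copy(char *dst, const char *src)
{
    char *d = dst;

    bytewise_calls++;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Stand-ins for a locc-based library; plain C cannot express locc. */
static size_t fancy_len(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

static char *fancy_copy(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

const struct entry_vector uvax_library   = { bytewise_len, bytewise_copy };
const struct entry_vector bigvax_library = { fancy_len, fancy_copy };

/* The vector of whichever library is resident on this machine. */
static const struct entry_vector *resident = &bigvax_library;

void install_library(const struct entry_vector *vec)
{
    assert(vec != NULL);
    resident = vec;
}

/* Program images call through the vector, never a routine by address. */
size_t lib_strlen(const char *s)            { return resident->len(s); }
char  *lib_strcpy(char *dst, const char *s) { return resident->copy(dst, s); }
```

Calling install_library(&uvax_library) plays the part of moving the
image to a machine with a different resident library: lib_strlen and
lib_strcpy are unchanged, yet they now reach the byte-at-a-time code.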

Of course some people need resident libraries more than others;
a case in point is the customers of Sun Microsystems.  Here
resident libraries, in addition to a host of other bennies
previously alluded to, will put a stop to the following:

	"Gak!!  That was a fifty line program.  It took
	forever to link and it eats up 700k of disk space???"

Since Sun is working on making their system SVID compatible
the wait shouldn't be too long now.  If I remember correctly
Apollo has always had resident libraries, but then I've never
even as much as seen an Apollo product.

			Bjorn R. Bjornsson
			alberta!bjorn

guy@gorodish.UUCP (04/01/87)

>Since Sun is working on making their system SVID compatible
>the wait shouldn't be too long now.

This does not follow at all.  The SVID does not describe shared
libraries; this is the correct thing for it to do, since there may be
machines out there for which shared libraries are difficult to
implement.  If we do shared libraries, it won't be because of the
SVID.

>If I remember correctly Apollo has always had resident libraries, but then
>I've never even as much as seen an Apollo product.

They do.

blarson@castor.usc.edu.UUCP (04/02/87)

In article <287@pembina.alberta.UUCP> bjorn@alberta.UUCP (Bjorn R. Bjornsson) writes:
>Assuming a
>vectored entry point interface to the library, you can move your
>images from one type of Vax to another and your program will
>run with the most efficient `str*' routines available for that
>machine, i.e., the routines in that machine's resident library.

This isn't the only, or necessarily best, way to implement shared
libraries.  Primos implements shared libraries via faulted links.
(Call by name the first time a routine is called, which replaces the
faulting pointer with one to the actual routine.)  I think this is yet
another idea they borrowed from Multics.  (The hardware overhead is a
fault bit on pointers, but you could implement it on a vax by
reserving half of your address space. (Hmmm... does a user process
ever need to reference the kernel area of memory?))  Of course, VMS
uses entry vectors, and we all know VMS does everything right. :-)
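
The faulted-link idea can be imitated in portable C, with a resolver
stub standing in for the hardware fault.  This is a sketch only, not a
description of how Primos does it: the symbol table, the lookup routine
and all the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*len_fn)(const char *);

/* The actual routine living in the shared library. */
static size_t real_strlen(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical name table standing in for the library's directory. */
struct symbol {
    const char *name;
    len_fn      addr;
};

static const struct symbol symtab[] = {
    { "strlen", real_strlen },
};

static int lookups;     /* how many by-name searches have been done */

static len_fn lookup(const char *name)
{
    size_t i;

    lookups++;
    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return NULL;
}

static size_t strlen_unresolved(const char *s);

/* The link starts out "faulting": it points at the resolver stub. */
static len_fn strlen_link = strlen_unresolved;

/* Plays the role of the fault handler: call by name, then patch the link. */
static size_t strlen_unresolved(const char *s)
{
    len_fn f = lookup("strlen");

    assert(f != NULL);
    strlen_link = f;    /* replace the faulting pointer with the real one */
    return f(s);
}

size_t call_strlen(const char *s)
{
    return strlen_link(s);
}
```

Only the first call pays for the search by name; later calls go
straight through the patched link.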
-- 
Bob Larson
Arpa: Blarson@Usc-Eclb.Arpa
Uucp: (several backbone sites)!sdcrdcf!usc-oberon!castor.usc.edu!blarson
			seismo!cit-vax!usc-oberon!castor.usc.edu!blarson

bjorn@alberta.UUCP (04/03/87)

In article <16014@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
>>Since Sun is working on making their system SVID compatible
>>the wait shouldn't be too long now.
> 
>This does not follow at all.  The SVID does not describe shared
>libraries;

I don't know what you mean by "doesn't follow at all" (B-).

SVID specifies shared memory.  The point is that having shared
memory implies ease of shared library implementation but not
vice versa.  For systems that have neither it is probably easier
to just add shared library support than it is to go whole hog
and provide shared memory.

If I end up working at a Sun shop when I finish my degree here (about
two months from now), and Sun does not deliver shared libraries at
the time they distribute an OS with shared memory, you can be sure
that I will implement same right away.

Remember:

	+--------------------------------------------+
	| shared memory => shared resident libraries |
	|                                            |
	| The arrow does not go the other direction  |
	+--------------------------------------------+

				Bjorn R. Bjornsson
				alberta!bjorn

guy@gorodish.UUCP (04/03/87)

>SVID specifies shared memory.  The point is that having shared
>memory implies ease of shared library implementation but not
>vice versa.

Note that the shared libraries implemented in System V, Release 3 do
*!NOT!* use the SVID shared memory mechanism.  Maybe AT&T did the
wrong thing here, maybe not, but they did, at one point, have an
implementation that had shared memory but not shared libraries, and
now have an implementation with shared libraries and shared memory
but do not use the shared memory system calls to provide shared
libraries.  (Note that S5R3 shared memory can't be write-protected;
this may have had something to do with their decision not to use
it....)

>If I end up working at a Sun shop when I finish my degree here (about
>two months from now), and Sun does not deliver shared libraries at
>the time they distribute an OS with shared memory, you can be sure
>that I will implement same right away.

Sun *already* distributes an OS with SVID-compatible shared memory,
although shared memory segments are currently wired down.  This may
change in a future release.  Most of the work in implementing shared
libraries is, as somebody here put it, in the "libraries" part, not
the "shared" part.  Shared memory may be necessary, but it's far from
sufficient.

guy@gorodish.UUCP (04/03/87)

NNTP and article cancellation don't seem to work together; I don't
know if the problem is that the local "rn" and "vnews" are stupid and
don't understand how to tell when an article is yours, or the NNTP
software is stupid and doesn't understand this.  So....

>(Note that S5R3 shared memory can't be write-protected;
>this may have had something to do with their decision not to use
>it....)

Not true; it can be write-protected, so that probably wasn't what
made them decide not to use it.  There may have been other problems;
e.g., using S5 shared memory would require the C startup code to find
and map in the library, and they may not have wanted to do that, or
they may have decided it was too much work to arrange that the
library be mapped in at a particular address (which their shared
library implementation requires).

gwyn@brl-smoke.UUCP (04/06/87)

In article <291@pembina.alberta.UUCP> bjorn@alberta.UUCP (Bjorn R. Bjornsson) writes:
>SVID specifies shared memory.

Since when?

The problem is, shared memory requires adequate support from the
underlying machine architecture and memory management.  Recognizing
this, the SVID (last I looked, my copy is not at hand now) specified
what the shared memory facility must look like IF it is implemented,
but did not require its implementation.

Beyond shared memory, shared libraries require substantial support
from the link editor etc.  Compatibility with existing facilities
may well constrain Sun's ability to implement shared libraries.
Also, are shared libraries important enough to claim a substantial
chunk of their limited development resources?  There are other
important things that also need to be taken care of.

bjorn@alberta.UUCP (04/06/87)

In article <1417@castor.usc.edu>, blarson@castor.usc.edu (Bob Larson) writes:
>In article <287@pembina.alberta.UUCP> bjorn@alberta.UUCP (Bjorn R. Bjornsson) writes:
>>Assuming a
>>vectored entry point interface to the library, you can move your
>This isn't the only, or necessarily best, way to implement shared
>libraries.  Primos implements shared libraries via faulted links.

Note that I did not specify where the entry vector was to be located.
It can be in the library or in your process.  Having the vector in
your process rather than the library may give you a little bit more
flexibility at the cost of space, perhaps a significant amount
(depending on the library size and/or the number of routines you need).

>(call by name first time a routine is called, which replaces the
>faulting pointer with one to the actual routine.)

I was aware of this method and you can use it in conjunction with
the entry point vector scheme.  You initialize the vector with
distinct values that are guaranteed to cause an access violation.

>  I think this is yet
>another idea they borrowed from Multics.  (The hardware overhead is a
>fault bit on pointers, but you could implement it on a vax by
>reserving half of your address space.

As I alluded to above, all that is needed is that such pointers
are guaranteed to fault and that they be distinguishable from each
other.  There is no requirement to reserve half the address space.

I'm getting a little rusty on some of the small details of VMS, but
if memory still serves me, the VMS run-time library lives in system
space (this makes a lot of sense, by the way).

> (Hmmm... does a user process
>ever need to reference the kernel area of memory?))

Certainly there have been a lot of service programs written
by non-DEC folks that execute partially in kernel mode to obtain
whatever info they need from VMS.  This is not for the faint of
heart, though, and you do need CMKRNL (change mode to kernel) privileges.
Now whether you call such programs "user programs" or not is entirely
up to you.

>  Of course, VMS
>uses entry vectors, and we all know VMS does everything right. :-)

It costs enough so it had better do everything right B-).

			Bjorn R. Bjornsson
			alberta!bjorn

franka@mmintl.UUCP (04/07/87)

In article <1417@castor.usc.edu> blarson@castor.usc.edu.UUCP (Bob Larson) writes:
>In article <287@pembina.alberta.UUCP> bjorn@alberta.UUCP (Bjorn R. Bjornsson) writes:
>>Assuming a vectored entry point interface to the library,
>
>This isn't the only, or necessarily best, way to implement shared
>libraries.  Primos implements shared libraries via faulted links.

The major problem with this is programs which throw function pointers
around, instead of just calling the routines.  If you aren't careful, you
wind up faulting *every* call in some contexts.
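
One reading of the problem, sketched in C with a resolver stub in place
of a real fault (all names are hypothetical): the fix-up patches the
link it knows about, but a copy of the unresolved pointer that the
program made earlier is never patched, so every call through the copy
faults again.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_fn)(int);

static int faults;      /* how many times the "fault" path has run */

/* The actual library routine. */
static int real_twice(int x)
{
    return 2 * x;
}

static int twice_stub(int x);

/* The link the fault handler knows how to patch. */
static op_fn twice_link = twice_stub;

/* Stands in for the fault: patch the known link, then do the call. */
static int twice_stub(int x)
{
    faults++;
    twice_link = real_twice;
    return real_twice(x);
}

/* A copy of the pointer as a program would have saved it before any call. */
op_fn capture_unresolved(void)
{
    return twice_stub;
}

/* The link as it stands now (patched once the first fault has happened). */
op_fn current_link(void)
{
    assert(twice_link != NULL);
    return twice_link;
}

/* Typical code that is handed a function pointer and just calls it. */
int apply_n(op_fn f, int x, int n)
{
    while (n-- > 0)
        x = f(x);
    return x;
}
```

Three calls through the stale copy cost three faults, even though the
link itself was fixed by the first of them; calls through the link
proper cost nothing further.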

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

jpn@teddy.UUCP (04/08/87)

In article <16014@sun.uucp> guy@sun.UUCP (Guy Harris) writes:
>This does not follow at all.  The SVID does not describe shared
>libraries; this is the correct thing for it to do, since there may be
>machines out there for which shared libraries are difficult to
>implement.  If we do shared libraries, it won't be because of the
>SVID.

Please, do the shared libraries ANYWAY!  Pretty Please?

I'm TIRED of Megabyte sized SUN executables.  And linking several tasks
together into a single executable (as recommended in the manuals) isn't
a solution, it's a hack.

henry@utzoo.UUCP (Henry Spencer) (04/12/87)

> ... Compatibility with existing facilities
> may well constrain Sun's ability to implement shared libraries.
> Also, are shared libraries important enough to claim a substantial
> chunk of their limited development resources? ...

For Sun, yes, because their graphics libraries go into almost every
interactive program, and said libraries are *enormous*.  Sun customers
are generally of the opinion that they have better uses for their disk
space (especially on small systems) than storing those libraries once
for every a.out.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

gnu@hoptoad.UUCP (04/17/87)

I don't know anything about whether/when shared libraries might appear
in a Sun product, but there are a few interesting papers scheduled for
summer Usenix:

	Virtual Memory Architecture in SunOS
	Rob Gingell, Joe Moran, and Bill Shannon, Sun Microsystems

	Shared Libraries in SunOS
	Rob Gingell, Meng Lee, Xuong Dang, and Mary Weeks, Sun Microsystems

Sun has sold a LOT of 4MB Sun-3/50's that can't be upgraded with more
memory, so they can't just tell you to buy more memory if the 4.N and 5.N
software releases continue the balloon tradition.  Shared libraries are
probably part of what will put more tomatoes in that same itty bitty can.
-- 
Copyright 1987 John Gilmore; you can redistribute only if your recipients can.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu

henry@utzoo.UUCP (Henry Spencer) (04/19/87)

> Sun has sold a LOT of 4MB Sun-3/50's that can't be upgraded with more
> memory, so they can't just tell you to buy more memory if the 4.N and 5.N
> software releases continue the balloon tradition...

For those who haven't heard it, the half-serious rule of thumb is that
Sunnix release X requires 2^(17+X) bytes of memory for the kernel.  I hear
that Sun claims this tradition will be broken with release 4...
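
Taken at face value, the rule works out as in the sketch below
(kernel_bytes is a name made up for the illustration): release 3 comes
to 2^20 bytes, or 1MB, and release 4 would come to 2^21 bytes, or 2MB,
half of a 4MB Sun-3/50.

```c
#include <assert.h>

/* Rule of thumb: release X needs 2^(17+X) bytes of memory for the kernel. */
unsigned long kernel_bytes(unsigned release)
{
    assert(release < 15);       /* keep the shift within 32 bits */
    return 1UL << (17 + release);
}
```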
-- 
"We must choose: the stars or     Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"    {allegra,ihnp4,decvax,pyramid}!utzoo!henry