[comp.lang.c] Copying a constant number of bytes

stevesu@copper.UUCP (06/14/87)

In article <900@bloom-beacon.MIT.EDU>, newman@athena.mit.edu (Ron Newman) writes:
> I want to copy a constant number of bytes...  Which is the more
> efficient method for most compilers and machines currently in use?
> 
> Method 1)
> 
>       bcopy ((char *)a, (char *)b, NBYTES);
> 
> Method (1) has the disadvantage that it always involves a subroutine
> call and return.

How sure are you that this is a disadvantage?  I always use
bcopy, and I've never found it to be an efficiency bottleneck.
Until you've profiled your code and proved that bcopy itself or
the associated function call overhead is significant, don't fuss
with other methods.  (BTW, the structure-assignment trick does
work, under "reasonable" compilers, and produces nice, tight
code, but it certainly "looks" fishy enough that I'd be loath to
use it in bulletproof, portable code even if somebody can come up
with an argument that it's legal.)

                                           Steve Summit
                                           stevesu@copper.tek.com
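
A real profiler is the right tool for the measurement suggested above, but
a rough timing loop can give a first impression.  The sketch below is only
an illustration: NBYTES, REPS and time_copies() are hypothetical names, and
memcpy stands in for bcopy.  An optimizing compiler may shorten the loop, so
treat the number as a hint, not a result.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define NBYTES 64        /* hypothetical copy size for this sketch */
#define REPS   1000000L  /* hypothetical repetition count */

/* Return the CPU seconds spent doing REPS copies of NBYTES bytes. */
double time_copies(void)
{
    static char a[NBYTES], b[NBYTES];
    volatile char sink = 0;   /* discourages the compiler from dropping the loop */
    clock_t start, stop;
    long i;

    start = clock();
    for (i = 0; i < REPS; i++) {
        a[0] = (char)i;           /* change the source a little each pass */
        memcpy(b, a, NBYTES);
        sink = b[0];
    }
    stop = clock();
    (void)sink;

    assert(memcmp(a, b, NBYTES) == 0);   /* sanity check on the last copy */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Compare the figure against the total run time of the program before
deciding that the copy deserves any attention.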

gwyn@brl-smoke.UUCP (06/14/87)

In article <900@bloom-beacon.MIT.EDU> newman@athena.mit.edu (Ron Newman) writes:
>I want to copy a constant number of bytes NBYTES from address "a"
>to address "b", where "a" and "b" are pointers of unspecified type.
>      bcopy ((char *)a, (char *)b, NBYTES);

Always works (use memcpy rather than bcopy on ANSI-compatible systems).
Does not necessarily involve a subroutine call (depends on implementation),
and the calling overhead is not normally significant anyway.

>      struct nbytes {char s[NBYTES];};
>      *(struct nbytes *)b = *(struct nbytes *)a;

Not guaranteed to work.

The reason for the existence of the str*() and mem*() routines in the
first place is to provide implementation-independent access to
implementation-specific "best" methods of doing these things.  Use them.
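
A minimal sketch of the memcpy form, assuming a hypothetical NBYTES of 16
and a hypothetical wrapper name copy_nbytes().  Note the argument order:
bcopy takes the source first, memcpy takes the destination first.

```c
#include <assert.h>
#include <string.h>

#define NBYTES 16   /* hypothetical size, chosen only for this sketch */

/* Copy exactly NBYTES bytes from a to b.  The regions must not overlap. */
void copy_nbytes(void *b, const void *a)
{
    assert(a != NULL && b != NULL);
    memcpy(b, a, NBYTES);       /* same effect as bcopy(a, b, NBYTES) */
}

/* Return 1 if a copy reproduces the source bytes, 0 otherwise. */
int copy_nbytes_selfcheck(void)
{
    unsigned char src[NBYTES], dst[NBYTES];
    size_t i;

    for (i = 0; i < NBYTES; i++) {
        src[i] = (unsigned char)i;
        dst[i] = 0xFF;
    }
    copy_nbytes(dst, src);
    return memcmp(dst, src, NBYTES) == 0;
}
```

Because the pointers are passed as void *, no casts are needed at the call
site under an ANSI compiler.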

chris@mimsy.UUCP (06/15/87)

In article <900@bloom-beacon.MIT.EDU> newman@athena.mit.edu (Ron Newman)
writes:
>I want to copy a constant number of bytes NBYTES from address "a"
>to address "b", where "a" and "b" are pointers of unspecified type.
>Which is the more efficient method for most compilers and machines
>currently in use?
>1)	bcopy ((char *)a, (char *)b, NBYTES);
>2)	struct nbytes {char s[NBYTES];};
>	*(struct nbytes *)b = *(struct nbytes *)a;
>Method (1) has the disadvantage that it always involves a subroutine
>call and return.

Not necessarily: in the dpANS, some functions may be `built in' to
the compiler; in 4.3BSD, you can use the `inline' program, found in
/sys/vax/inline/inline, to expand certain calls (including bcopy)
in line.

>But can I count on a compiler generating efficient code for method (2)?

Obviously this is compiler dependent.  Some existing compilers do
not even support structure assignment.  I hope that any compiler
that does support it does the copy at least as fast as via a bcopy
subroutine call.

>Is the answer any different if I know that "a" and "b" are 32-bit
>aligned, or that NBYTES is a multiple of 4?

Not if you want real portability.

If this particular copy *must* be fast, I would suggest defining
a macro to do it, and using whichever method is fastest on your
machine.  The macro can be put in a file that is clearly marked
`system dependencies'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris
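
The macro suggestion above might be sketched as follows.  Everything named
here is an assumption made for illustration: the macro COPY_NBYTES, the
switch HAVE_FAST_STRUCT_COPY, and the value of NBYTES.  The struct variant
is the non-portable method (2) from the original question, so it is only
selected when someone has checked it on the machine in question.

```c
#include <assert.h>
#include <string.h>

/* ---- system dependencies (hypothetical header contents) ---- */
#define NBYTES 32   /* hypothetical size */

#ifdef HAVE_FAST_STRUCT_COPY   /* hypothetical switch, off by default */
/* Not guaranteed to work: depends on padding and alignment. */
struct nbytes { char s[NBYTES]; };
#define COPY_NBYTES(dst, src) \
    ((void)(*(struct nbytes *)(dst) = *(struct nbytes *)(src)))
#else
/* Portable default. */
#define COPY_NBYTES(dst, src) ((void)memcpy((dst), (src), NBYTES))
#endif
/* ---- end of system dependencies ---- */

/* Return 1 if COPY_NBYTES reproduces the source bytes, 0 otherwise. */
int copy_macro_check(void)
{
    char src[NBYTES], dst[NBYTES];
    int i;

    for (i = 0; i < NBYTES; i++) {
        src[i] = (char)(i + 1);
        dst[i] = 0;
    }
    COPY_NBYTES(dst, src);
    assert(dst[0] == src[0]);
    return memcmp(dst, src, NBYTES) == 0;
}
```

Callers write COPY_NBYTES(b, a) everywhere, and only the marked section
changes from one machine to the next.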

jss@hector.UUCP (06/15/87)

In article <900@bloom-beacon.MIT.EDU> newman@athena.mit.edu (Ron Newman) writes:
>
>Method 2)
>
>      struct nbytes {char s[NBYTES];};
>      *(struct nbytes *)b = *(struct nbytes *)a;
>
>

This may work on some machines, but it is not portable. At least
NBYTES bytes will be copied, but since the compiler is normally allowed
to add padding at the end of "struct nbytes", it is also allowed to copy
the padding.
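
The padding concern can be checked directly with sizeof.  This is a small
sketch with a hypothetical NBYTES of 5 and a hypothetical helper name.
Many compilers report no padding for a struct holding only a char array,
but nothing requires that.

```c
#include <assert.h>
#include <stddef.h>

#define NBYTES 5   /* hypothetical, deliberately not a multiple of 4 */

struct nbytes { char s[NBYTES]; };

/* Bytes beyond NBYTES that a structure assignment is allowed to copy. */
size_t nbytes_padding(void)
{
    /* A struct is never smaller than its only member. */
    assert(sizeof(struct nbytes) >= NBYTES);
    return sizeof(struct nbytes) - NBYTES;
}
```

If nbytes_padding() is nonzero on some compiler, the structure assignment
there may read and write that many bytes past the intended region.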

karl@haddock.UUCP (Karl Heuer) (06/16/87)

In article <900@bloom-beacon.MIT.EDU> newman@athena.mit.edu (Ron Newman) writes:
>[To copy a constant number of bytes one can use bcopy/memcpy, or]
>      struct nbytes {char s[NBYTES];};
>      *(struct nbytes *)b = *(struct nbytes *)a;
>... But can I count on a compiler generating efficient code for method (2)?

Well, you can't count on a compiler generating efficient code for ANYTHING,
but the compiler will almost certainly use something at least as efficient
(timewise) as the bcopy/memcpy subroutine.  (Usually it's better, since the
compiler knows the exact size, whereas the subroutine has to handle the
general case.)

Unfortunately, being efficient isn't as important as being right.  The major
problem is that the pointers a and b, since they are not really of this struct
type, may not have the correct alignment.  In particular, I seem to recall the
SVR2 compiler for the 3B2 would force a stricter-than-necessary alignment (and
hence size) on all structs.  Thus, you might even get more bytes than you
asked for.

>Is the answer any different if I know that "a" and "b" are 32-bit aligned, or
>that NBYTES is a multiple of 4?

The answer to your original question is "No", unless the compiler knows it
too.  (It does know the value of NBYTES, but probably doesn't know the
alignment of the pointers.)  The answer I gave still stands (the struct
assignment is not strictly portable), but if the size and alignment are
sufficiently large, you happen to get the right answer on the machines I'm
familiar with.

My recommendation is to use memcpy.  If you choose not to, I suggest that the
resulting code should be well-commented.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
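
The alignment point can be illustrated with a source address that is
deliberately one byte off the start of a buffer.  This is a sketch with
hypothetical names and sizes.  memcpy places no alignment requirement on
its arguments, so this copy is well defined, while a cast of raw + 1 to a
struct pointer would carry no such guarantee.

```c
#include <assert.h>
#include <string.h>

#define NBYTES 8   /* hypothetical size */

/* Copy NBYTES bytes starting at an odd offset; return 1 on success. */
int copy_from_odd_offset(void)
{
    unsigned char raw[NBYTES + 1];
    unsigned char out[NBYTES];
    size_t i;

    for (i = 0; i < sizeof raw; i++)
        raw[i] = (unsigned char)(i + 1);

    /* raw + 1 need not be aligned for anything wider than char. */
    memcpy(out, raw + 1, NBYTES);

    assert(out[0] == raw[1]);
    return memcmp(out, raw + 1, NBYTES) == 0;
}
```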

keesan@cc5.bbn.com.UUCP (06/16/87)

In article <900@bloom-beacon.MIT.EDU> newman@athena.mit.edu (Ron Newman) writes:
>I want to copy a constant number of bytes NBYTES from address "a"
>to address "b", where "a" and "b" are pointers of unspecified type.
. . .
>Method 1)
>      bcopy ((char *)a, (char *)b, NBYTES);
>
>Method 2)
>      struct nbytes {char s[NBYTES];};
>      *(struct nbytes *)b = *(struct nbytes *)a;

I will repeat the advice of others, which is to use bcopy.  Portability and
questionable casts aside, the structure assignment may be less efficient.  The
compiler I'm most familiar with does structure assignments by generating a
function call to the function "strasg".  What does strasg do?  It calls bcopy,
but because bcopy doesn't return a value and assignments do [the value of an
assignment expression is the value of the lhs after the assignment], it returns
its first argument.  So instead of getting a simple call to bcopy (which is
in MICROCODE on this machine), you end up taking the address of the two
structures and getting an extra subroutine call, not to mention the overhead
of pushing a return value on the stack and then dereferencing it to get the
structure-valued return.  Even if the optimizer undoes some of this, you've
still got the overhead of the extra function call.
-- 
Morris M. Keesan
keesan@bbn.com
{harvard,decvax,ihnp4,etc.}!bbn!keesan
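
The extra layer described above might look roughly like this.  The real
"strasg" helper and its signature are not shown in the post, so the name
strasg_sketch, its parameters, and the use of memcpy in place of bcopy are
all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a compiler runtime helper that performs a
 * structure assignment: copy n bytes, then hand back the destination so
 * the assignment expression has a value.
 */
void *strasg_sketch(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);    /* the post says the real helper calls bcopy */
    return dst;             /* the extra work a plain bcopy call avoids */
}
```

A direct call to the copy routine skips both the wrapper call and the
returned pointer, which is the overhead the post describes.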