[comp.std.c] memcpy

tim@maths.tcd.ie (Timothy Murphy) (09/19/90)

Recently, while debugging the Unix version of unzip.c,
I found a surprising discrepancy between 'memcpy' on various machines.

In unzip.c it is assumed that the effect of
	buf[0] = c;
	memcpy(buf+1, buf, 20);
is to set
	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

This is indeed the effect with most compilers I tried --
rather surprisingly, to me --
but not with cc on the Sun SparcStation (sun4).

Is this a bug on the Sparc,
or is memcpy not fully specified?

-- 

Timothy Murphy  

e-mail: tim@maths.tcd.ie

karl@haddock.ima.isc.com (Karl Heuer) (09/21/90)

In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes:
>In [unix] unzip.c it is assumed that the effect of
>	buf[0] = c;
>	memcpy(buf+1, buf, 20);
>is to set
>	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

This is incorrect.  The proper way to achieve that effect is
	memset(buf, c, 21);

>Is this a bug on the Sparc [which doesn't do this],
>or is memcpy not fully specified?

The latter.  The effect of memcpy() on overlapping arrays is undefined.
(There is a new function memmove() which does have predictable behavior.)

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/21/90)

In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes:
>Is this a bug on the Sparc,
>or is memcpy not fully specified?

Neither one, really.  The implementation of memcpy() is not specified,
but its functional behavior is.  The objects fed to it as source and
destination are required to not overlap.  If they do overlap, what
happens will depend on details of the memcpy() implementation.  In fact
the naive implementation would work for overlap in one direction but
not for overlap in the other direction.  Some memcpy() implementations
have attempted to "do the right thing" (meaning: follow the sequential
pick-the-entire-block-up-into-a-temporary-register, then-put-the-
temporary-register-contents-back-down-at-the-new-location model) for
cases of overlaps, while others have not.  Sometimes the available
architectural support is a factor in determining the "preferred"
implementation, since the hardware may support fast block moves that
don't work according to the simple model.

X3J11's solution was to require conforming implementations to provide
both memcpy(), which is not required to conform to the simple model,
and memmove(), which IS required to conform to the simple model (and
therefore is the function that should be used when overlap is possible).
In many implementations, memcpy() will be implemented merely as an
alternate entry point for memmove(), but in others memcpy() will have
its own separate (presumably somewhat faster) implementation.

Since memmove() is a recent invention, many existing C libraries do not
yet provide it.  I think there have been public-domain implementations,
but I doubt that a fully portable one would be possible due to the lack
of a portable way to determine whether or not the two objects overlap.

blodgett@apollo.HP.COM (Bruce Blodgett) (09/21/90)

In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie
(Timothy Murphy) wrote:
> In unzip.c it is assumed that the effect of
> 	buf[0] = c;
> 	memcpy(buf+1, buf, 20);
> is to set
> 	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.
...
> Is this a bug on the Sparc,
> or is memcpy not fully specified?

According to the ANS C standard, section 4.11.2.1, "If copying takes
place between objects that overlap, the behavior is undefined."  Thus
whatever a C compiler wants to do is "correct" in this case.

By the way, why don't you use the much safer standard function:
    memset(buf, c, 21);
which sets buf[0] = buf[1] = buf[2] = ... = buf[20] = c.
If you want buf[21] == c (which is NOT what your original code does),
make the 3rd argument 22 rather than 21.
Bruce Blodgett
blodgett@apollo.hp.com
(508) 256-0176 x4037

cpcahil@virtech.uucp (Conor P. Cahill) (09/21/90)

In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes:
>In unzip.c it is assumed that the effect of
>	buf[0] = c;
>	memcpy(buf+1, buf, 20);
>is to set
>	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

The effect of memcpy on overlapping memory areas is undefined.  unzip.c 
should not depend upon any particular behavior.

>Is this a bug on the Sparc,

No.  it is a bug in unzip.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

henry@zoo.toronto.edu (Henry Spencer) (09/21/90)

In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes:
>In unzip.c it is assumed that the effect of
>	buf[0] = c;
>	memcpy(buf+1, buf, 20);
>is to set
>	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

The effects of invoking memcpy with overlapping operands have never been
officially promised, that I recall.  ANSI C explicitly says that the
implementation is allowed to assume that they do not overlap, and it
is the programmer's responsibility to assure this.

This sounds like rank-amateur code.  If he wants to set all characters
in a buffer to the same, memset() will do that efficiently and cleanly.
Even if an overlapping memcpy() works, it is not going to be particularly
fast.
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

scjones@thor.UUCP (Larry Jones) (09/21/90)

In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes:
> Recently, while debugging the Unix version of unzip.c,
> I found a surprising discrepancy between 'memcpy' on various machines.
> 
> In unzip.c it is assumed that the effect of
> 	buf[0] = c;
> 	memcpy(buf+1, buf, 20);
> is to set
> 	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

This is a bug in unzip which is why almost every system required using
the version of memcpy that came with unzip instead of the one in the
standard library.  It has been fixed.  Note that unzip is inusual in
that it wants the DESCTRUCTIVE effect when the areas overlap where most
code wants the NON-DESCTRUCTIVE effect.
----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
Life's a lot more fun when you're not responsible for your actions. -- Calvin

burley@world.std.com (James C Burley) (09/21/90)

In article <13914@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:

   ...  Some memcpy() implementations
   have attempted to "do the right thing" (meaning: follow the sequential
   pick-the-entire-block-up-into-a-temporary-register, then-put-the-
   temporary-register-contents-back-down-at-the-new-location model) for
   cases of overlaps, while others have not.

First off, it strikes me (THUNK!  :-) that my expectation of what memmove
would do is exactly the opposite of what is desired.  If one wants to use
a memory move to spew a given set of values (in the case discussed here,
just one) across memory, and indeed the more-than-one case is not covered
to my knowledge my memset or any other ANSI function, it seems that one does
not want to use a memory move that is "smart enough" to do an overlapping
move the "right" way.  If I have

char array[5] = [ 1, 2, 3, 4, 5 ];

and I memmove 4 bytes from array[0] to array[1], personally I would expect
the more natural result to be:

[ 1, 1, 2, 3, 4 ]

But the original posting seems to be asking for

[ 1, 1, 1, 1, 1 ]

Similarly, taking the original array and memmoving (sigh) 4 bytes from
array[1] to array[0], I'd think one should get:

[ 2, 3, 4, 5, 5 ];

But again someone else might want:

[ 5, 5, 5, 5, 5 ];

So I read the statement "memmove handles overlapping copies correctly" to
mean what is stated above in the reference posting I've included: the copy
happens AS IF the source array were copied into a temporary, then copied
into the destination.  What the original posting appears to ask for is a
copy that does the "wrong" thing.

Second, I don't think it is necessary for memmove to actually copy into a
temporary.  I think one can do an implementation similar to the following
(I've probably made a couple of mistakes, but you get the idea):

void *memmove(void *source,void *dest,size_t n)
{
void *save;

/* Let's not bother with (char *) casts, ok? (-:  Actually the copies can
   be done using casts to int * or long * if the proper modulo checking is
   done, for efficiency. */

if ((source == dest) || (n == 0))
    return source;  /* Nothing to do. */

save = source;

if ((source > dest) || (source + n <= dest))
    {  /* Normal case: source follows destination, or no overlap at all. */
    for (; n != 0; n--)
      *dest++ = *source++;
    return save;
    }

/* The source array precedes and overlaps with the destination array, so
   copy the array backwards. */

source += n;
dest += n;

for (; n != 0; n--)
  *--dest = *--source;

return save;
}

This should get the idea across, but probably contains silly mistakes besides
the omission of casts!

James Craig Burley, Software Craftsperson    burley@world.std.com

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/22/90)

In article <BURLEY.90Sep21111221@world.std.com> burley@world.std.com (James C Burley) writes:
>In article <13914@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>   ...  Some memcpy() implementations
>   have attempted to "do the right thing" (meaning: follow the sequential
>   pick-the-entire-block-up-into-a-temporary-register, then-put-the-
>   temporary-register-contents-back-down-at-the-new-location model) for
>   cases of overlaps, while others have not.
>First off, it strikes me (THUNK!  :-) that my expectation of what memmove
>would do is exactly the opposite of what is desired.  If one wants to use
>a memory move to spew a given set of values (in the case discussed here,
>just one) across memory, and indeed the more-than-one case is not covered
>to my knowledge my memset or any other ANSI function, it seems that one does
>not want to use a memory move that is "smart enough" to do an overlapping
>move the "right" way.

It's true that the original application was attempting a "smear" rather
than a "blit" operation, and my previous comments failed to address that.
Previous arguments about memcpy() have focussed on the block-move model
rather than a smearing model, perhaps because very few people think that
the latter is very sensible.  (Note that there are two different smearing
directions, and that applications for smearing are rather rare.)

To smear a single byte, as in the original application, memset() is
recommended.

>Second, I don't think it is necessary for memmove to actually copy into a
>temporary.

No, that was the block-transfer MODEL, not the implementation.

>I think one can do an implementation similar to the following
>...
>if ((source > dest) || (source + n <= dest))
* Warning * Non-portable pointer comparisons in above line.

You can get away with that in a flat virtual address space, but not in
general.  This is essentially the reason for my doubt that a portable
implementation would be possible.  Since then, I have received e-mail
that points out that a portable implementation would be possible by
using pointer equality tests within a loop over object bytes.  This
would, however, be excruciatingly slow, so it still appears that there
is no PRACTICAL portable implementation method for memmove().

tim@maths.tcd.ie (Timothy Murphy) (09/22/90)

In <18083@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:

>In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes:
>>In [unix] unzip.c it is assumed that the effect of
>>	buf[0] = c;
>>	memcpy(buf+1, buf, 20);
>>is to set
>>	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

>This is incorrect.  The proper way to achieve that effect is
>	memset(buf, c, 21);

I'm afraid I oversimplified my example.
The intention in unix unzip.c was to repeat *any* seqence,
not just a single character, eg
	buf[0] = a;
	buf[1] = b;
	memcpy(buf+2, buf, 20);
is intended to set
	buf[0] = buf[2] = buf[4] = ... = a;
	buf[1] = buf[3] = buf[5] = ... = b;
Incidentally, it's been pointed out to me
that this problem has been fixed in more recent versions
of unix unzip.c .

-- 

Timothy Murphy  

e-mail: tim@maths.tcd.ie

stephen@estragon.uchicago.edu (Stephen P Spackman) (09/23/90)

In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes:
> Recently, while debugging the Unix version of unzip.c,
> I found a surprising discrepancy between 'memcpy' on various machines.
> 
> In unzip.c it is assumed that the effect of
> 	buf[0] = c;
> 	memcpy(buf+1, buf, 20);
> is to set
> 	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

[other people then comment about how this bug (in zip, not in Unix)
arises]

Actually, if you know how the compression algorithm used by Zip works,
you'll see that the "stupid" memcpy() does EXACTLY what is required.
The compression scheme itself relies on overwriting behaviour because
it works by copying forward stuff that is already "behind" the current
point in the buffer, but improves performance for CYCLIC data (of
which the bytewise uniform data a la memset() is only a special case)
by allowing the length to exceed the absolute value of the relative
source offset.

As to why this code works at all, it turns out that on most machines
the appropriate stupid implementation IS the fastest; in fact on most
of the CISC micros there's an instruction that does exactly that, and
does it very fast indeed (being an instruction, not a loop).

Furthermore, since it doesn't contain any transfers of control if it
arrives as an instruction, many compilers will inline it.

So what the situation amounts to is an assumption on the part of the
programmer that having been given the freedom to implement memcpy()
however you like in this case, that any "sane" implementor would do it
the "easy" way - which is precisely what the algorithm needs. Where
this falls down, of course, is that (a) a VERY CISCy machine may
provide memmove() semantics in microcode; and that (b) a very fast
machine (or a hand-coded routine for a RISC) might do all of its
string moves in bus-width chunks and without cache interlocks, and
produce very interesting gibberish indeed.

stephen p spackman  stephen@estragon.uchicago.edu  312.702.3982

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/24/90)

In article <STEPHEN.90Sep22181237@estragon.uchicago.edu>, stephen@estragon.uchicago.edu (Stephen P Spackman) writes:
> As to why this code works at all, it turns out that on most machines
> the appropriate stupid implementation IS the fastest; in fact on most
> of the CISC micros there's an instruction that does exactly that, and
> does it very fast indeed (being an instruction, not a loop).

But "some of the world _IS_ a VAX".  The VAX instructions for this
(MOVC3, MOVC5) do it the memmove() way.  On the NS32x32, there's a MOVS
instruction, but you have to set up the registers for it, and it has a
"backwards" option, so the entire overhead for a naive memmove() compared
with a naive memcpy() would be one comparison and one conditional
branch.  By the time you add the code to check for an opportunity to
move 16-bit (MOVSW) or 32-bit (MOVSD) chunks instead of 8-bit (MOVSB)
ones, the overhead of memmove() has disappeared in the noise.

> Furthermore, since it doesn't contain any transfers of control if it
> arrives as an instruction, many compilers will inline it.

Transfers of control needn't stop a compiler inlining it _anyway_.

> So what the situation amounts to is an assumption on the part of the
> programmer that having been given the freedom to implement memcpy()
> however you like in this case, that any "sane" implementor would do it
> the "easy" way

You are making quite unwarranted assumptions about what is "easy".
"however you like" means HOWEVER the implementor likes.
To quote the Goon Show:
	Man from the Geographical Society:
		"There are idiots in the world, you know."
	Neddie Seagoon:
		"Have you met them?"
	Man from the Geographical Society:
		"Met them?  I listen to you every week!"

There are a lot of tiny little dark spots in ANSI C where anyone
designing a language would have spelled out what was to be done.
These little dark spots are almost always there because there are
existing not-quite-hopelessly-broken implementations that did things
different ways.  Think of the dark spots as the warning "rattle" of
these implementations poised to bite you.

-- 
Heuer's Law:  Any feature is a bug unless it can be turned off.