tim@maths.tcd.ie (Timothy Murphy) (09/19/90)
Recently, while debugging the Unix version of unzip.c, I found a surprising discrepancy between 'memcpy' on various machines. In unzip.c it is assumed that the effect of buf[0] = c; memcpy(buf+1, buf, 20); is to set buf[0] = buf[1] = buf[2] = ... = buf[21] = c. This is indeed the effect with most compilers I tried -- rather surprisingly, to me -- but not with cc on the Sun SparcStation (sun4). Is this a bug on the Sparc, or is memcpy not fully specified? -- Timothy Murphy e-mail: tim@maths.tcd.ie
karl@haddock.ima.isc.com (Karl Heuer) (09/21/90)
In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes: >In [unix] unzip.c it is assumed that the effect of > buf[0] = c; > memcpy(buf+1, buf, 20); >is to set > buf[0] = buf[1] = buf[2] = ... = buf[21] = c. This is incorrect. The proper way to achieve that effect is memset(buf, c, 21); >Is this a bug on the Sparc [which doesn't do this], >or is memcpy not fully specified? The latter. The effect of memcpy() on overlapping arrays is undefined. (There is a new function memmove() which does have predictable behavior.) Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/21/90)
In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes: >Is this a bug on the Sparc, >or is memcpy not fully specified? Neither one, really. The implementation of memcpy() is not specified, but its functional behavior is. The objects fed to it as source and destination are required to not overlap. If they do overlap, what happens will depend on details of the memcpy() implementation. In fact the naive implementation would work for overlap in one direction but not for overlap in the other direction. Some memcpy() implementations have attempted to "do the right thing" (meaning: follow the sequential pick-the-entire-block-up-into-a-temporary-register, then-put-the- temporary-register-contents-back-down-at-the-new-location model) for cases of overlaps, while others have not. Sometimes the available architectural support is a factor in determining the "preferred" implementation, since the hardware may support fast block moves that don't work according to the simple model. X3J11's solution was to require conforming implementations to provide both memcpy(), which is not required to conform to the simple model, and memmove(), which IS required to conform to the simple model (and therefore is the function that should be used when overlap is possible). In many implementations, memcpy() will be implemented merely as an alternate entry point for memmove(), but in others memcpy() will have its own separate (presumably somewhat faster) implementation. Since memmove() is a recent invention, many existing C libraries do not yet provide it. I think there have been public-domain implementations, but I doubt that a fully portable one would be possible due to the lack of a portable way to determine whether or not the two objects overlap.
blodgett@apollo.HP.COM (Bruce Blodgett) (09/21/90)
In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) wrote: > In unzip.c it is assumed that the effect of > buf[0] = c; > memcpy(buf+1, buf, 20); > is to set > buf[0] = buf[1] = buf[2] = ... = buf[21] = c. ... > Is this a bug on the Sparc, > or is memcpy not fully specified? According to the ANS C standard, section 4.11.2.1, "If copying takes place between objects that overlap, the behavior is undefined." Thus whatever a C compiler wants to do is "correct" in this case. By the way, why don't you use the much safer standard function: memset(buf, c, 21); which sets buf[0] = buf[1] = buf[2] = ... = buf[20] = c. If you want buf[21] == c (which is NOT what your original code does), make the 3rd argument 22 rather than 21. Bruce Blodgett blodgett@apollo.hp.com (508) 256-0176 x4037
cpcahil@virtech.uucp (Conor P. Cahill) (09/21/90)
In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes: >In unzip.c it is assumed that the effect of > buf[0] = c; > memcpy(buf+1, buf, 20); >is to set > buf[0] = buf[1] = buf[2] = ... = buf[21] = c. The effect of memcpy on overlapping memory areas is undefined. unzip.c should not depend upon any particular behavior. >Is this a bug on the Sparc, No. it is a bug in unzip. -- Conor P. Cahill (703)430-9247 Virtual Technologies, Inc., uunet!virtech!cpcahil 46030 Manekin Plaza, Suite 160 Sterling, VA 22170
henry@zoo.toronto.edu (Henry Spencer) (09/21/90)
In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes: >In unzip.c it is assumed that the effect of > buf[0] = c; > memcpy(buf+1, buf, 20); >is to set > buf[0] = buf[1] = buf[2] = ... = buf[21] = c. The effects of invoking memcpy with overlapping operands have never been officially promised, that I recall. ANSI C explicitly says that the implementation is allowed to assume that they do not overlap, and it is the programmer's responsibility to assure this. This sounds like rank-amateur code. If he wants to set all characters in a buffer to the same, memset() will do that efficiently and cleanly. Even if an overlapping memcpy() works, it is not going to be particularly fast. -- TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology OSI: handling yesterday's loads someday| henry@zoo.toronto.edu utzoo!henry
scjones@thor.UUCP (Larry Jones) (09/21/90)
In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes: > Recently, while debugging the Unix version of unzip.c, > I found a surprising discrepancy between 'memcpy' on various machines. > > In unzip.c it is assumed that the effect of > buf[0] = c; > memcpy(buf+1, buf, 20); > is to set > buf[0] = buf[1] = buf[2] = ... = buf[21] = c. This is a bug in unzip which is why almost every system required using the version of memcpy that came with unzip instead of the one in the standard library. It has been fixed. Note that unzip is inusual in that it wants the DESCTRUCTIVE effect when the areas overlap where most code wants the NON-DESCTRUCTIVE effect. ---- Larry Jones UUCP: uunet!sdrc!thor!scjones SDRC scjones@thor.UUCP 2000 Eastman Dr. BIX: ltl Milford, OH 45150-2789 AT&T: (513) 576-2070 Life's a lot more fun when you're not responsible for your actions. -- Calvin
burley@world.std.com (James C Burley) (09/21/90)
In article <13914@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes: ... Some memcpy() implementations have attempted to "do the right thing" (meaning: follow the sequential pick-the-entire-block-up-into-a-temporary-register, then-put-the- temporary-register-contents-back-down-at-the-new-location model) for cases of overlaps, while others have not. First off, it strikes me (THUNK! :-) that my expectation of what memmove would do is exactly the opposite of what is desired. If one wants to use a memory move to spew a given set of values (in the case discussed here, just one) across memory, and indeed the more-than-one case is not covered to my knowledge my memset or any other ANSI function, it seems that one does not want to use a memory move that is "smart enough" to do an overlapping move the "right" way. If I have char array[5] = [ 1, 2, 3, 4, 5 ]; and I memmove 4 bytes from array[0] to array[1], personally I would expect the more natural result to be: [ 1, 1, 2, 3, 4 ] But the original posting seems to be asking for [ 1, 1, 1, 1, 1 ] Similarly, taking the original array and memmoving (sigh) 4 bytes from array[1] to array[0], I'd think one should get: [ 2, 3, 4, 5, 5 ]; But again someone else might want: [ 5, 5, 5, 5, 5 ]; So I read the statement "memmove handles overlapping copies correctly" to mean what is stated above in the reference posting I've included: the copy happens AS IF the source array were copied into a temporary, then copied into the destination. What the original posting appears to ask for is a copy that does the "wrong" thing. Second, I don't think it is necessary for memmove to actually copy into a temporary. I think one can do an implementation similar to the following (I've probably made a couple of mistakes, but you get the idea): void *memmove(void *source,void *dest,size_t n) { void *save; /* Let's not bother with (char *) casts, ok? (-: Actually the copies can be done using casts to int * or long * if the proper modulo checking is done, for efficiency. */ if ((source == dest) || (n == 0)) return source; /* Nothing to do. */ save = source; if ((source > dest) || (source + n <= dest)) { /* Normal case: source follows destination, or no overlap at all. */ for (; n != 0; n--) *dest++ = *source++; return save; } /* The source array precedes and overlaps with the destination array, so copy the array backwards. */ source += n; dest += n; for (; n != 0; n--) *--dest = *--source; return save; } This should get the idea across, but probably contains silly mistakes besides the omission of casts! James Craig Burley, Software Craftsperson burley@world.std.com
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/22/90)
In article <BURLEY.90Sep21111221@world.std.com> burley@world.std.com (James C Burley) writes: >In article <13914@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes: > ... Some memcpy() implementations > have attempted to "do the right thing" (meaning: follow the sequential > pick-the-entire-block-up-into-a-temporary-register, then-put-the- > temporary-register-contents-back-down-at-the-new-location model) for > cases of overlaps, while others have not. >First off, it strikes me (THUNK! :-) that my expectation of what memmove >would do is exactly the opposite of what is desired. If one wants to use >a memory move to spew a given set of values (in the case discussed here, >just one) across memory, and indeed the more-than-one case is not covered >to my knowledge my memset or any other ANSI function, it seems that one does >not want to use a memory move that is "smart enough" to do an overlapping >move the "right" way. It's true that the original application was attempting a "smear" rather than a "blit" operation, and my previous comments failed to address that. Previous arguments about memcpy() have focussed on the block-move model rather than a smearing model, perhaps because very few people think that the latter is very sensible. (Note that there are two different smearing directions, and that applications for smearing are rather rare.) To smear a single byte, as in the original application, memset() is recommended. >Second, I don't think it is necessary for memmove to actually copy into a >temporary. No, that was the block-transfer MODEL, not the implementation. >I think one can do an implementation similar to the following >... >if ((source > dest) || (source + n <= dest)) * Warning * Non-portable pointer comparisons in above line. You can get away with that in a flat virtual address space, but not in general. This is essentially the reason for my doubt that a portable implementation would be possible. Since then, I have received e-mail that points out that a portable implementation would be possible by using pointer equality tests within a loop over object bytes. This would, however, be excruciatingly slow, so it still appears that there is no PRACTICAL portable implementation method for memmove().
tim@maths.tcd.ie (Timothy Murphy) (09/22/90)
In <18083@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes: >In article <1990Sep19.021418.11574@maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes: >>In [unix] unzip.c it is assumed that the effect of >> buf[0] = c; >> memcpy(buf+1, buf, 20); >>is to set >> buf[0] = buf[1] = buf[2] = ... = buf[21] = c. >This is incorrect. The proper way to achieve that effect is > memset(buf, c, 21); I'm afraid I oversimplified my example. The intention in unix unzip.c was to repeat *any* seqence, not just a single character, eg buf[0] = a; buf[1] = b; memcpy(buf+2, buf, 20); is intended to set buf[0] = buf[2] = buf[4] = ... = a; buf[1] = buf[3] = buf[5] = ... = b; Incidentally, it's been pointed out to me that this problem has been fixed in more recent versions of unix unzip.c . -- Timothy Murphy e-mail: tim@maths.tcd.ie
stephen@estragon.uchicago.edu (Stephen P Spackman) (09/23/90)
In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes: > Recently, while debugging the Unix version of unzip.c, > I found a surprising discrepancy between 'memcpy' on various machines. > > In unzip.c it is assumed that the effect of > buf[0] = c; > memcpy(buf+1, buf, 20); > is to set > buf[0] = buf[1] = buf[2] = ... = buf[21] = c. [other people then comment about how this bug (in zip, not in Unix) arises] Actually, if you know how the compression algorithm used by Zip works, you'll see that the "stupid" memcpy() does EXACTLY what is required. The compression scheme itself relies on overwriting behaviour because it works by copying forward stuff that is already "behind" the current point in the buffer, but improves performance for CYCLIC data (of which the bytewise uniform data a la memset() is only a special case) by allowing the length to exceed the absolute value of the relative source offset. As to why this code works at all, it turns out that on most machines the appropriate stupid implementation IS the fastest; in fact on most of the CISC micros there's an instruction that does exactly that, and does it very fast indeed (being an instruction, not a loop). Furthermore, since it doesn't contain any transfers of control if it arrives as an instruction, many compilers will inline it. So what the situation amounts to is an assumption on the part of the programmer that having been given the freedom to implement memcpy() however you like in this case, that any "sane" implementor would do it the "easy" way - which is precisely what the algorithm needs. Where this falls down, of course, is that (a) a VERY CISCy machine may provide memmove() semantics in microcode; and that (b) a very fast machine (or a hand-coded routine for a RISC) might do all of its string moves in bus-width chunks and without cache interlocks, and produce very interesting gibberish indeed. stephen p spackman stephen@estragon.uchicago.edu 312.702.3982
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/24/90)
In article <STEPHEN.90Sep22181237@estragon.uchicago.edu>, stephen@estragon.uchicago.edu (Stephen P Spackman) writes: > As to why this code works at all, it turns out that on most machines > the appropriate stupid implementation IS the fastest; in fact on most > of the CISC micros there's an instruction that does exactly that, and > does it very fast indeed (being an instruction, not a loop). But "some of the world _IS_ a VAX". The VAX instructions for this (MOVC3, MOVC5) do it the memmove() way. On the NS32x32, there's a MOVS instruction, but you have to set up the registers for it, and it has a "backwards" option, so the entire overhead for a naive memmove() compared with a naive memcpy() would be one comparison and one conditional branch. By the time you add the code to check for an opportunity to move 16-bit (MOVSW) or 32-bit (MOVSD) chunks instead of 8-bit (MOVSB) ones, the overhead of memmove() has disappeared in the noise. > Furthermore, since it doesn't contain any transfers of control if it > arrives as an instruction, many compilers will inline it. Transfers of control needn't stop a compiler inlining it _anyway_. > So what the situation amounts to is an assumption on the part of the > programmer that having been given the freedom to implement memcpy() > however you like in this case, that any "sane" implementor would do it > the "easy" way You are making quite unwarranted assumptions about what is "easy". "however you like" means HOWEVER the implementor likes. To quote the Goon Show: Man from the Geographical Society: "There are idiots in the world, you know." Neddie Seagoon: "Have you met them?" Man from the Geographical Society: "Met them? I listen to you every week!" There are a lot of tiny little dark spots in ANSI C where anyone designing a language would have spelled out what was to be done. These little dark spots are almost always there because there are existing not-quite-hopelessly-broken implementations that did things different ways. Think of the dark spots as the warning "rattle" of these implementations poised to bite you. -- Heuer's Law: Any feature is a bug unless it can be turned off.