gnu@hoptoad.uucp (John Gilmore) (03/27/88)
In article <10731@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
> (void) strcpy(buf, buf + n);

gwyn@brl-smoke.ARPA (Doug Gwyn) wrote:
> This usage was never a good idea, because a valid implementation of
> strcpy() would be to copy right-to-left rather than left-to-right
> through the source string...

I have seen plenty of constructs in traditional Unix and other C code that assume strcpy() can slide a string down over itself.

While we are picking nits with the wording of various Unix man pages and standards, let me point out that none of them makes it perfectly clear that no bytes past the NUL are modified. If you can assume that "it copies the NUL and then stops" doesn't indicate that the NUL is copied last, as several posters have done, you might as well assume that it copies three or four more bytes beyond the NUL and then stops, too. It seems to make exactly as much sense to me, that is, no sense at all.

I propose that strcpy, strncpy, strcat, and strncat be defined to perform either:

    * left-to-right or
    * non-destructively copying in the case of overlap

at the implementor's choice (each function can choose independently).

I think effectively 100% of the applications and 100% of the implementations will require no change with this rule. Simple implementations will just do left-to-right, while more complicated implementations like on the 29000 or MIPS can do fancy stuff 4 bytes at a time, or even copy right-to-left, as long as they avoid destructive copying. Today's fancy implementations should already be checking for overlap, since so much existing code depends on it.
--
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu gnu@toad.com
"Watch me change my world..." -- Liquid Theatre
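The second option in the proposal above, non-destructive copying when the strings overlap, can be sketched in a few lines. This is only an illustration, not any vendor's implementation: the name strcpy_overlap_safe is made up here, and the sketch assumes a library that provides memmove(), which is defined for overlapping regions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name; not part of any standard library.
   Copies the string at src to dst, tolerating overlap. */
char *strcpy_overlap_safe(char *dst, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    /* Measure first, so the terminating NUL is located before any
       byte of the source can be overwritten. */
    n = strlen(src) + 1;
    /* memmove() copies as if through a temporary buffer, so the
       result does not depend on the direction of the copy. */
    memmove(dst, src, n);
    return dst;
}
```

The cost is the extra pass made by strlen(), which is the kind of overhead the "fancy" implementations mentioned above would presumably try to avoid.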
david@dhw68k.cts.com (David H. Wolfskill) (03/29/88)
As many have pointed out, there is an expectation that strcpy() will copy characters from left to right, terminating the copy when the terminating NUL is copied. Recalling that the dpANS specifies "... the behavior of an abstract machine in which the issues of optimization are irrelevant," it would seem to make some sense to modify the specification to be similar to the above.

The current dpANS also specifies "If copying takes place between objects that overlap, the behavior is undefined." I would feel rather more comfortable with changing that to read "... implementation defined."

This may arguably be a "quality of implementation" issue; I prefer to think of it as a "quality of standard" issue.

(Oh: I do know of one machine in which fields in main storage are addressed on the right -- for almost all instructions -- but I don't know of a C compiler for it, and I consider its architecture (for this reason, as well as others) to be sufficiently pathological that it's not worth considering important. I just wish my employer hadn't purchased so many of the brain-damaged things!)

david
--
David H. Wolfskill
uucp: ...{trwrb,hplabs}!felix!dhw68k!david InterNet: david@dhw68k.cts.com
nevin1@ihlpf.ATT.COM (00704a-Liber) (03/31/88)
In article <6286@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>The current dpANS also specifies "If copying takes place between objects
>that overlap, the behavior is undefined." I would feel rather more
>comfortable with changing that to read "... implementation defined."

I would not! This would imply that a program which calls strcpy() with overlapping strings is 'correct', and this is simply not true. Remember, implementation-defined behavior means (quoted from the draft section 1.6--Definitions of Terms):

    "behavior, for a correct program construct and correct data, that depends on the characteristics of the implementation and that each implementation shall document."

If you have overlapping strings you have incorrect data.

If this were to change (something which I am against), all programs that use strcpy() would be suspect every time a new version of the compiler comes out (especially since many compilers use inline assembly instead of doing a function call for strcpy()). This is not something which should depend on the implementation.
--
 _ __ NEVIN J. LIBER ..!ihnp4!ihlpf!nevin1 (312) 510-6194
' ) ) "The secret compartment of my ring I fill
/ / _ , __o ____ with an Underdog super-energy pill."
/ (_</_\/ <__/ / <_ These are solely MY opinions, not AT&T's, blah blah blah
karl@haddock.ISC.COM (Karl Heuer) (04/01/88)
In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>In article <6286@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>>The current dpANS also specifies "If copying takes place between objects
>>that overlap, the behavior is undefined." I would feel rather more
>>comfortable with changing that to read "... implementation defined."
>
>I would not! This would imply that a program which calls strcpy() with
>overlapping strings is 'correct', and this is simply not true.

But it would be true, if the standard were to explicitly allow it.

>If this were to change, all programs that use strcpy() would be suspect every
>time a new version of the compiler comes out

Only those programs that use strcpy on overlapping strings. And if the "implementation-defined" part is properly phrased, strcpy(s,s+1) would be guaranteed to be safe.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/02/88)
In article <3267@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>>If this were to change, all programs that use strcpy() would be suspect every
>>time a new version of the compiler comes out
>
>Only those programs that use strcpy on overlapping strings. And if the
>"implementation-defined" part is properly phrased, strcpy(s,s+1) would be
>guaranteed to be safe.

First off, just by looking at a program how can I tell whether or not it uses overlapping strings (under your proposal)?? There is no way for me to tell the difference between a program that is using strcpy() in an implementation-DEPENDENT way and a program which can portably use strcpy() (at least not by just looking at it). From a maintenance point of view, this is very undesirable!!

Secondly, I do not like the change that would have to be made to the prototype for strcpy. The prototype would change from:

    char *strcpy(noalias char *s1, const noalias char *s2)

to

    char *strcpy(char *s1, char *s2)

since, as you pointed out, both s1 and s2 are possibly aliased and the string pointed to by s2 is no longer guaranteed to be constant (see below).

    char *foo, *bar;
    ...
    /*assume that foo points to string "stuff" in read/write memory*/
    bar = foo + 1;
    strcpy(foo, bar);

Under your proposal, this would *legally* change the value of what bar points to (unless you are going to put in some wording about only being able to copy the right half or less of overlapping strings, but this wording is VERY messy)!! I'm sorry, but I like knowing that the source string should not be changed by strcpy() in a conforming program!!

In article <3266@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>I don't see that such a compiler would have to depend on the implementation;
>just on the functional specification (which has now been standardized).

MAKE UP YOUR MIND!! You either want to have programs which are dependent on the implementation of the libraries or you don't. I don't really care which of these two views that you take, JUST BE CONSISTENT!!
--
 _ __ NEVIN J. LIBER ..!ihnp4!ihlpf!nevin1 (312) 510-6194
' ) ) "The secret compartment of my ring I fill
/ / _ , __o ____ with an Underdog super-energy pill."
/ (_</_\/ <__/ / <_ These are solely MY opinions, not AT&T's, blah blah blah
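The foo/bar fragment in this post can be made concrete. What follows is a minimal sketch, assuming the slide is done with memmove() so that the behavior is defined regardless of how a given strcpy() treats overlap; slide_left_by_one is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: slide the string at s down by one character,
   a defined-behavior stand-in for strcpy(s, s + 1). */
char *slide_left_by_one(char *s)
{
    assert(s != NULL && s[0] != '\0');
    /* strlen(s + 1) + 1 bytes: the tail plus its terminating NUL. */
    memmove(s, s + 1, strlen(s + 1) + 1);
    return s;
}
```

With a buffer holding "stuff", foo pointing at it and bar = foo + 1, bar reads "tuff" before the call and "uff" afterwards: the bytes behind the source pointer have changed, which is the effect the post objects to.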
john@frog.UUCP (John Woods, Software) (04/02/88)
In article <3267@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
>In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>>In article <6286@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>>>The current dpANS also specifies "If copying takes place between objects
>>>that overlap, the behavior is undefined." I would feel rather more
>>>comfortable with changing that to read "... implementation defined."
>>I would not! This would imply that a program which calls strcpy() with
>>overlapping strings is 'correct', and this is simply not true.
> But it would be true, if the standard were to explicitly allow it....if the
> "implementation-defined" part is properly phrased, strcpy(s,s+1) would be
> guaranteed to be safe.

Still no. The problem with "implementation-defined" is that there are no constraints upon what the implementation may define the behavior to be. If you port your program to an implementation where, in 3-point italic type in a margin somewhere, they mention that strcpy(s,s+1) causes the CPU chip to be launched upward with a velocity of 16 km/s, they will be _right._

From the August 3, 1987 draft (and I assume this hasn't changed):

    "1.7 COMPLIANCE
    A _strictly conforming program_ shall use only those features of the language and library specified in this standard. It shall not produce output dependent on any unspecified, undefined, or ---> implementation-defined <--- behavior..."

( ---> Emphasis <--- added).

If you know that your implementation does what you want with strcpy(s,s+1), then you are free to use it. Your program won't be "strictly conforming", but you may not care about that. Just don't complain when you hear that "chuffBANG!" of the CPU chip being launched when you buy that shiny new Mark IV Datablaster...
--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw@eddie.mit.edu
FUN: THE FINAL FRONTIER
Zippy the Pinhead in '88!
jv0l+@andrew.cmu.edu (Justin Chris Vallon) (04/03/88)
>"... implementation defined."
Means NON-PORTABLE! If implementation X does it one way [ie strcpy(s, s+1)
works], and implementation Y does it another way [ie strcpy(s, s+1) does not
work], my program will behave very differently on different systems, but both
strcpy() functions adhere to the ANSI specs.
I cannot expect an ANSI standard which isn't a standard. Isn't non-portable
code something that ANSI is trying to prevent, not endorse?
-Justin
henry@utzoo.uucp (Henry Spencer) (04/03/88)
> ... The prototype would change ... to
> char *strcpy(char *s1, char *s2)
> since, as you pointed out, both s1 and s2 are possibly aliased and the
> string pointed to by s2 is no longer guaranteed to be constant...

The latter is quite irrelevant; const on a pointer does not mean that the thing pointed to is constant, just that attempts to modify it through that pointer are illegal. (If this double meaning of const strikes you as less than ideal, you're in good company.)
--
"Noalias must go. This is | Henry Spencer @ U of Toronto Zoology
non-negotiable." --DMR | {allegra,ihnp4,decvax,utai}!utzoo!henry
david@dhw68k.cts.com (David H. Wolfskill) (04/03/88)
In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>In article <6286@dhw68k.cts.com> I wrote:
>>The current dpANS also specifies "If copying takes place between objects
>>that overlap, the behavior is undefined." I would feel rather more
>>comfortable with changing that to read "... implementation defined."

>I would not! This would imply that a program which calls strcpy() with
>overlapping strings is 'correct', and this is simply not true.

[He then quotes the definition of "implementation-defined," as used in the dpANS.]

>If you have overlapping strings you have incorrect data.

Well, thank you for your opinion; however, I respectfully disagree. Given an order in which the copying shall be done, the operation of copying data from one string to another (when the two strings have a known degree of overlap) can be a well-defined one.

It is quite possible -- and to me, reasonable -- to define an algorithm in such a way that it uses the implementation-defined behavior of such an operation. Suppose, for example, that a given implementation defines that such a copy would be done from the beginning of the source string to its terminating NUL, character by character. Then (assuming suitable definitions of the variables in question), an algorithm to clear a given string (str1) to a given value (other than NUL) could be coded:

    *str1 = ch;
    for (c1 = str1; *++c1 != '\0'; *c1 = *(c1 -1));

or (remembering the characteristics of the implementation):

    *str1 = ch;
    strcpy(str1+1, str1);

but I think the latter is easier to comprehend. I have used the technique -- although in assembler, rather than C -- and am quite willing to grant that its effects are properly defined by the characteristics of the implementation.

>If this were to change (something which I am against), all programs that
>use strcpy() would be suspect every time a new version of the compiler
>comes out (especially since many compilers use inline assembly instead of
>doing a function call for strcpy()). This is not something which should
>depend on the implementation.

Hmmmm.... It is my understanding that if the behavior were "implementation-defined," at least the vendor would be under an obligation to warn you of any change in the implementation's behavior when faced with such a construct; whether or not you chose to do anything about it is (of course) another issue altogether.

On the other hand, if the behavior is "undefined," the vendor would be under no obligation to indicate in any way any changes in the implementation's behavior when faced with such a construct. It is not clear to me that you (or anyone else) would be well-served by such a position. That is really the main point of my earlier posting.

Of course, it would only be an issue for you to the extent that you need to work with (or in spite of!) these constructs that you seem disinclined to use anyway. (Also, if you are sufficiently fortunate to use a compiler that has a mode in which it flags all constructs whose behavior is "implementation-defined," you can have that much more warning about such concerns.)

Onward....

david
--
David H. Wolfskill
uucp: ...{trwrb,hplabs}!felix!dhw68k!david InterNet: david@dhw68k.cts.com
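The fill idiom in this post can be written without leaning on any particular strcpy(). The sketch below follows the explicit-loop form, under a made-up name (fill_string). One caution, offered as a reading of the code rather than as the poster's intent: a strictly character-by-character strcpy(str1+1, str1) would overwrite the terminating NUL before ever reading it, so the two forms do not appear to be equivalent, and the loop is the one that stops at the end of the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Overwrite every character of s, up to but not
   including its terminating NUL, with ch. An empty string is left
   untouched. */
char *fill_string(char *s, char ch)
{
    char *p;

    assert(s != NULL && ch != '\0');
    for (p = s; *p != '\0'; p++)
        *p = ch;             /* the NUL is tested before each store */
    return s;
}
```

Because each position is tested before it is stored to, the result does not depend on any copy order.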
doug@feedme.UUCP (Doug Salot) (04/03/88)
There seems to be a point here with which both posters agree, but which I find absurd. For background:

nevin says:
> >If this were to change (something which I am against), all programs that
> >use strcpy() would be suspect every time a new version of the compiler
> >comes out (especially since many compilers use inline assembly instead of
> >doing a function call for strcpy()). This is not something which should
> >depend on the implementation.

and david says:
> Of course, it would only be an issue for you to the extent that you need
> to work with (or in spite of!) these constructs that you seem
> disinclined to use anyway. (Also, if you are sufficiently fortunate to
> use a compiler that has a mode in which it flags all constructs whose
> behavior is "implementation-defined," you can have that much more
> warning about such concerns.)

Both of these passages seem to imply that C compilers "know" about the semantics of certain (all?) function calls. While someone earlier pointed out that it is possible to design a language in which some semantics can be described, C does not have this facility and seems to be philosophically antagonistic to such a facility. I would indeed be surprised if a C compiler produced inline code for strcpy (unless you are talking about a macro, in which case the behavior of the code should be clear from reading the define), and the idea of compile-time warnings about function behavior seems equally out of place (maybe link-time would be appropriate).

As long as I'm here, I must say that I disagree with david. If the behavior of a function is *undefined* rather than *implementation defined* for singular cases, one would be inclined not to use the function for the singular cases, thereby ensuring (used loosely) portability.

- Doug
--
Doug Salot || doug@feedme.UUCP || {trwrb,hplabs}!felix!dhw68k!feedme!doug
Feedme Microsystems:Inventors of the Snarf->Grok->Munge Development Cycle
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/07/88)
In article <6476@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>>In article <6286@dhw68k.cts.com> David Wolfskill wrote:
>>>The current dpANS also specifies "If copying takes place between objects
>>>that overlap, the behavior is undefined." I would feel rather more
>>>comfortable with changing that to read "... implementation defined."
>
>>If you have overlapping strings you have incorrect data.
>
>Well, thank you for your opinion; however, I respectfully disagree.
>Given an order in which the copying shall be done, the operation of
>copying data from one string to another (when the two strings have a
>known degree of overlap) can be a well-defined one.

Oh, so you want the copying of two strings to be WELL-DEFINED, not implementation-defined or undefined. Why did you beat around the bush for so long??

I do agree that if you know the algorithm, all the side effects are well-defined. I just do not agree that you, the non-kernel applications programmer, should have to write code that is dependent on the *implementation* of a system call. This only leads to nightmares for code maintenance people (which is part of my job).

For example: some of the people right now who are arguing for strcpy() to be *defined* as left-to-right string copy are bringing up the point that code currently being used is dependent on this implementation of strcpy(). They are claiming that it is hard to maintain since it is implementation-dependent. We should be going away from code like this, not towards it. This is one of the reasons I like C++, because it forces programmers to code without knowing the implementations of their objects/classes. If the implementation of a class is changed, the rest of the code doesn't break.

>It is quite possible -- and to me, reasonable -- to define an algorithm
>in such a way that it uses the implementation-defined behavior of such
>an operation.

You are right in one sense: it is quite possible to define an algorithm in such a way that it uses the *side effects* (aka, implementation-defined behavior) of such an operation. If you are writing code like this then you are becoming very dependent on your particular version/implementation of C.

Good luck in three years, when your implementation is outdated!
--
 _ __ NEVIN J. LIBER ..!ihnp4!ihlpf!nevin1 (312) 510-6194
' ) ) "The secret compartment of my ring I fill
/ / _ , __o ____ with an Underdog super-energy pill."
/ (_</_\/ <__/ / <_ These are solely MY opinions, not AT&T's, blah blah blah
chris@mimsy.UUCP (Chris Torek) (04/07/88)
In article <4309@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
>Oh, so you want the copying of two [overlapping] strings to be
>WELL-DEFINED, not implementation-defined or undefined.

Right.

>I just do not agree that you, the non-kernel applications programmer,
>should have to write code that is dependent on the *implementation*
>of a system call.

(Aside: strcpy is not a system call, it is a library routine.)

It is not dependent upon the implementation. It is dependent upon the specification. The specification for strcpy was that it copies string `src' to string `dst' such that strcpy(s+n, s) moves `s+n' `down' n characters, while strcpy(s, s+n) `duplicates' characters from s+1 through s+strlen(s).

That may not be what *you* read in the specification, but it *is* what *others* read in it. Perhaps the specification was sloppy. You have probably seen sloppy specifications before. The usual answer is to tighten the spec, and if the tightened spec invalidates a few routines, so it goes; but if, on the other hand, the tightened spec breaks hundreds of working programs, the design team might instead change the spec to explicitly grant those features/bugs that everyone else interpreted it to grant.

If the semantics for strcpy() specified the action produced by copying overlapping strings, code that copied overlapping strings would not be dependent upon the implementation after all, would it? The claim is simply that the description in string(3) (the `specification') did specify this, at least to enough people that perhaps it would be best not to make it ill-defined.

(Begin another aside)

>... one of the reasons I like C++ ... it forces programmers to code
>without knowing the implementations of their objects/classes. If
>the implementation of a class is changed, the rest of the code doesn't
>break.

Want to bet? I can[*] write code that depends on all sorts of things that may not be true in the future, even if I do not know for certain that they are true now. Not knowing (or caring) about the implementation of a subclass just (1) discourages such dependencies and (2) tends to make them more obvious, and hence easier to squash. It does not prevent them.

-----
*I try not to.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
quiroz@cs.rochester.edu (Cesar Quiroz) (04/08/88)
This article was suggested by reading <4309@ihlpf.ATT.COM>, by
(nevin1@ihlpf.UUCP (00704a-Liber,N.J.)). It is not a direct
response, though, but rather a side note prompted because one of his
arguments doesn't carry much force, but has been repeated many times
before. Liber (and others before and, no doubt, after) says:
:... [I]t is quite possible to define an algorithm in such a way
:that it uses the *side effects* (aka, implementation-defined
:behavior) of such an operation. If you are writing code like this
:then you are becoming very dependent on your particular
:version/implementation of C.
:
:Good luck in three years, when your implementation is outdated!
Before we get too excited about the purity of functional behaviors,
let's remember that strcpy is used (in the overwhelming majority of
cases) *to perform side-effects*. It is perfectly legitimate to
want to clarify exactly *what* side-effects are guaranteed. So,
when someone asks that the standard guarantee a certain order of
copying, it is a perfectly sensible thing to discuss whether that
side-effect is useful and reasonable to demand.
It is not as if you called a function from the math library and then
depended on the way your implementation leaves trash behind in your
fp registers...
If you are truly interested in defining an ADT String whose
implementation can be hidden totally, consider something along these
lines: (UNTESTED, CATCH THE IDEA, NOT THE CODE)
    typedef char *string;
    ...
    string
    fstrcpy (old)        /* Functional STRing CoPY */
    string old;
    {
        /* Copy into fresh storage; hand back NULL if malloc fails. */
        string new = (string)malloc((unsigned)strlen(old)+1);
        return new ? strcpy(new, old) : new;
    }
So you could use the side-effecting string package to write
a library that does not need to guarantee any side-effects, but can
be considerably more costly.
--
Cesar Augusto Quiroz Gonzalez
Department of Computer Science ...allegra!rochester!quiroz
University of Rochester or
Rochester, NY 14627 quiroz@cs.rochester.edu
djones@megatest.UUCP (Dave Jones) (04/08/88)
I said I was not going to say any more about the strcpy() thing. But I would like to make a few comments about *programming* *philosophy*. I DON'T LIKE programming philosophies. But I guess I have to admit that I may have one. Just a little one, mind you.

(BTW. Unless otherwise indicated, my postings contain implicit smilies on every line. :-) :-) :-) )

in article <4309@ihlpf.ATT.COM>, nevin1@ihlpf.ATT.COM (00704a-Liber) says:
> ...
> ... some of the people right now who are arguing for strcpy() to
> be *defined* as left-to-right string copy are bringing up the point that
> code currently being used is dependent on this implementation of strcpy().

Correct! Do anything reasonable to prevent breaking code. Even code which you consider to be "bad". As a systems engineer, my job is to keep 'em flyin'. Nothing is more important than that. When the programs fail catastrophically, the customers don't care that the failure is caused by a morally correct change of semantics.

Here's a case history: Recently we had to retract a major software release from the field. The problem was that some ten year old Berkeley code used a statement similar to the following to skip zero or more leading white-space characters:

    if (scanf("%[ \t\n]") == 0) {
        report_error();
    }

Somebody, probably at the company that makes our new workstations, had decided that the string format should have to match at least one character in order to succeed. They duly documented said behavior in the man pages. We do extensive QA, but somehow that statement didn't get executed in the QA suites. BOOM!

Now, whether or not scanf %[xyz] has to match at least one character is, taken by itself, just as silly a consideration as whether strcpy() should scan from left to right. Far too silly to have caused such an expensive incident. The semantics of scanf should never have been changed. And there was no need to change them. If you really just HAVE to have a new scanf, give it a new name. It's easy to make up new names for functions which are similar to old ones. I once told a fellow programmer, "As long as we can make up new names, we can never be defeated." I felt real profound about that one.

> They are claiming that it is hard to maintain since it is
> implementation-dependent. We should be going away from code like this, not
> towards it.

Again I agree. Completely. You're right on the mark.

> This is one of the reasons I like C++, because it forces
> programmers to code without knowing the implementations of their
> objects/classes. If the implementation of a class is changed, the rest of
> the code doesn't break.
>

But, now you're beginning to lose the thread. I also like C++, but the notion that the exalted gurus can invent programming languages to "force" the ignorant masses to produce good code has been discredited again and again. (Don't forget the implicit smilies. :-) )

> ... If you are writing code like this then you
> are becoming very dependent on your particular version/implementation of C.
>

I'm not writing the stuff. There's plenty of it already written.

>
> Good luck in three years, when your implementation is outdated!
>

Hey!! Like, DON'T OUTDATE MY IMPLEMENTATION, dude!

Did I detect just a hint of an anticipated "told-you-so"? "The transgressors will be punished!" "Infamy to the implementation-dependent rascals!" No? I guess it's just me. I'll respond anyway. Just pretend for a second that that's what you meant.

In the first place, people who write shaky code are often nice people. They don't mean to. They could use some help. I have no desire to punish them. They don't need us to pull the rug out from under their code.

In the second place, it is not only the vile perpetrators who suffer when code breaks. Often they are long since gone. But if they are not, then you're still on the same team!

A while back I had to fix some code that my supervisor's supervisor wrote about five years ago. He's never claimed to be a great programmer. I thought his comment was very funny. His pronouncement was, "My past has come back to haunt you."
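For readers who want the "zero or more white space" behavior from the case history without depending on how %[ treats an empty match: in the standard scanf family, a white-space character in the format matches any amount of white space in the input, including none, while a %[ conversion fails if it matches nothing. The sketch below uses sscanf() so it can be run on a string; first_nonspace is a hypothetical name, not taken from the code in question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: store the first non-white-space character of
   text in *out. Returns 1 on success, 0 if there is none. */
int first_nonspace(const char *text, char *out)
{
    assert(text != NULL && out != NULL);
    /* The leading blank in the format skips zero or more white-space
       characters; %c itself does not skip anything. */
    return sscanf(text, " %c", out) == 1;
}
```

The same format works whether or not the input begins with blanks, so it does not hinge on the point that changed between the two scanf implementations described above.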
ok@quintus.UUCP (Richard A. O'Keefe) (04/08/88)
In article <17@feedme.UUCP>, doug@feedme.UUCP (Doug Salot) writes:
> Both of these passages seem to imply that C compilers "know" about
> the semantics of certain (all?) function calls. While someone earlier
> pointed out that it is possible to design a language in which some semantics
> can be described, C does not have this facility and seems to be
> philosophically antagonistic to such a facility. I would indeed be
> surprised if a C compiler produced inline code for strcpy ...

strcpy() is defined as part of the dpANS. X3J11 has gone to a great deal of trouble to ensure that a useful chunk of the C library will be present. One way of looking at it is to think of things like strcpy() as built-in operations which merely happen to use functional notation. [This isn't 100% accurate; but it is the basic idea.] So an ANSI C compiler *will* be entitled to "know" about the semantics of "functions" which are defined in the standard, just as a Pascal compiler is entitled to "know" about sqrt(). Indeed, it simply is not possible for a user to define a function called strcpy() in a standard-conforming way, because names beginning with "str" are reserved.

The argument is over what strcpy(dst, src) should mean when src <= dst <= src+strlen(src)+1. There are two basic positions:

(A) Strict left-to-right copy is easy to understand and is often useful. It is also how many C books (including K&R) have explained the operation, so many C programmers expect this behaviour.

(B) The implementor should be given as much freedom as possible in order to make this operation supremely fast, and if this means leaving the operation as unspecified as possible, so be it.

Note that if you have been using a system where the implementor had already taken attitude (B), and you move a working program to a system whose implementor took attitude (A), your code will continue to work, but if you move code in the other direction your code is likely to break. Requiring strict left-to-right copying in the standard will therefore improve future portability and maintainability, at the expense of prohibiting certain machine-specific optimisations in this particular operation.

May I respectfully suggest that the emphasis on the implementor's freedom to optimise strcpy() may, just possibly, be misguided? I have done text manipulation in a wide variety of programming languages, and have found C to be easily the nicest of them, because user-defined operations (compare this padded string to this NUL-terminated string, for example) are not an order of magnitude slower than built-ins (such as strcmp()). If compiler-writers have N man-months to spend improving their compiler, I would much rather they spent them on optimisations that will affect code that I write rather than builtins which seldom do exactly what I want. {If they are going to optimise a builtin, let it be sprintf(), please.} It is especially misguided to spend those N man-months on optimising an operation which, in order to permit such optimisation, has been left so vaguely defined that I can't trust it!

As a matter of interest, just how important is the speed of strcpy() in practice, anyway? If the cost of strcpy() were reduced to zero, would your programs go 1% faster, 2% faster, or what? In an attempt to get some sort of feeling for this, I used the Sun C compiler's ".il" (inline) facility to compare the existing library routines with my own C code and with in-lined unrolled hand-tuned assembly code.

              Library    My C code    In-line assembler (unrolled)
    strlen    1.0        1.22         0.26  (~ 4 times faster)
    strcpy    1.0        1.06         0.87  (~ 13% faster)
    strcmp    1.0        1.00         0.82  (~ 18% faster)

To be fair to Sun, it should be noted that I was using an old compiler and library; the 4.0 compiler is supposed to be rather better. But this means that if I had used a newer Sun compiler, the C code would have looked better, and the assembler code would not have changed. Note that the in-lined versions eliminated the procedure calling overhead entirely.

It is also worth noting that it costs nearly as much to find the length of a string as to move it: on this particular machine, given the choice of calling an optimised strlen() and an optimised memmove() {==bcopy()} or calling your own C code, you would be a fool not to use your own C code. What price "optimisation"?

Could someone give us some figures for the 4.3 strcpy() using locc {I can't do this, because our microVAX hasn't got a locc instruction} and movc3, comparing them with similarly tuned code not using locc and with code produced by a good C compiler?

A lot of C programmers use 80*86s. What about them? Well, strict left-to-right copying has the advantage of not having to fiddle with the direction flag...

So I guess the question is whether the importance of strcpy is

(A) as a standard operation you thoroughly understand, or

(B) as a vaguely defined operation which the vendor was allowed to tune to make his Dhrystone results look good.
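Position (A) above, the strict left-to-right copy, is essentially the loop that many C textbooks show. A sketch under a made-up name (strcpy_ltr), written as ordinary user code so that its behavior on overlapping arguments follows from the loop itself rather than from whatever the library does:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Copies one character at a time, lowest address
   first, and stops right after the terminating NUL is stored. */
char *strcpy_ltr(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != '\0')
        ;                    /* empty body: the work is in the test */
    return dst;
}
```

When dst lies below src in the same buffer (sliding a string down), every character is read before its slot is overwritten, so the result is the expected one. When dst lies inside the source string above its start, the loop overwrites the terminator before reaching it and does not stop, which is why the overlap range given above matters.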
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/09/88)
In article <17@feedme.UUCP> doug@feedme.UUCP (Doug Salot) writes:
>[...] While someone earlier
>pointed out that it is possible to design a language in which some semantics
>can be described, C does not have this facility and seems to be
>philosophically antagonistic to such a facility. I would indeed be
>surprised if a C compiler produced inline code for strcpy (unless you
>are talking about a macro, in which case the behavior of the code should
>be clear from reading the define), and the idea of compile-time
>warnings about function behavior seems equally out of place (maybe
>link-time would be appropriate).

I agree with you that C, the language, should not be designed in such a way as to depend upon the semantics of a function being called. However, when it comes to optimization, it shouldn't matter whether or not the compiler 'knows' about the semantics of a function being called and optimizes based on that fact (such as inlining instead of doing a function call). The user is supposed to be programming independent of the implementation of the language (assuming it is implemented correctly), so whatever the compiler wants to do in terms of correct optimization should not be restricted.

I feel that, for C, it is important to distinguish between defining the language and defining its implementation.
--
 _ __           NEVIN J. LIBER  ..!ihnp4!ihlpf!nevin1  (312) 510-6194
' )  )          "The secret compartment of my ring I fill
 /  / _ , __o  ____   with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_  These are solely MY opinions, not AT&T's, blah blah blah
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/09/88)
In article <10962@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <4309@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
>>I just do not agree that you, the non-kernel applications programmer,
>>should have to write code that is dependent on the *implementation*
>>of a system call.

>It is not dependent upon the implementation. It is dependent upon
>the specification. The specification for strcpy was that it copies
>string `src' to string `dst' such that strcpy(s+n, s) moves `s+n'
>`down' n characters, while strcpy(s, s+n) `duplicates' characters
>from s+1 through s+strlen(s).

It still IS dependent on the implementation; you just want ANSI to put the implementation in the specification. You are not defining *what* the function does (ie, you are not making an abstract *description* of the function); you are defining *how* the function does a strcpy (ie, how it is supposed to be *implemented*). If I give you (for a small licensing fee :-)) all the lines of assembler for Unix and call that the specification of Unix, you will never be able to say that there is a bug in Unix (after all, it's doing everything exactly as written in the assembler code).

There is no 'such that' part in the specification of strcpy(). Strcpy(), according to the man page, INCLUDING THE WARNING (something a heck of a lot of posters neglected to read), says:

"Strcpy copies string s2 to s1, stopping after the null character has been copied. [...] [Strcpy] returns s1. [...] WARNING [...] Character movement is performed differently in different implementations. Thus overlapping moves may yield surprises."

>That may not be what *you* read in the specification, but it *is*
>what *others* read in it.

You are saying that overlapping does *not* yield surprises, which is a direct contradiction with the specification.
>If the semantics for strcpy() specified the action produced by copying
>overlapping strings, code that copied overlapping strings would not be
>dependent upon the implementation after all, would they? The claim is
>simply that the description in string(3) (the `specification') did
>specify this, at least to enough people that perhaps it would be best
>not to make it ill-defined.

If I specify the source code for the compiler, then nothing about the language can be ill-defined. But, as you have already shown, not everybody bothers to read the entire specification, anyway.

>(Begin another aside)
>>... one of the reasons I like C++ ... it forces programmers to code
>>without knowing the implementations of their objects/classes. If
>>the implementation of a class is changed, the rest of the code doesn't
>>break.

>Want to bet? I can[*] write code that depends on all sorts of things that
>may not be true in the future, even if I do not know for certain that
>they are true now. Not knowing (or caring) about the implementation
>of a subclass just (1) discourages such dependencies and (2) tends to
>make them more obvious, and hence easier to squash. It does not
>prevent them.
>*I try not to.

Agreed. I should not have said that it 'forces' programmers to code well, but that it makes it easier to code well.
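Whichever reading of the man page one prefers, code that needs to slide a string over itself can sidestep the argument by using memmove(), which is specified to handle overlapping objects. The following is a minimal sketch; `slide_down` and `slide_up` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete the first n characters of s in place.
   memmove() is defined for overlapping objects, so the result does not
   depend on the direction in which the library happens to copy. */
char *slide_down(char *s, size_t n)
{
    size_t len = strlen(s);

    assert(n <= len);
    memmove(s, s + n, len - n + 1);   /* +1 moves the NUL as well */
    return s;
}

/* Hypothetical helper: move the whole string n bytes up, leaving the
   first n bytes as they were. The caller must guarantee that the buffer
   has room for strlen(s) + n + 1 bytes. */
char *slide_up(char *s, size_t n)
{
    memmove(s + n, s, strlen(s) + 1);
    return s;
}
```

The length is measured before any byte is moved, so neither helper can run past the original terminator, which is the failure a left-to-right strcpy() shows when the destination lies above the source.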
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/09/88)
In article <8410@sol.ARPA> quiroz@cs.rochester.edu (Cesar Quiroz) writes:
>Before we get too excited about the purity of functional behaviors,
>let's remember that strcpy is used (in the overwhelming majority of
>cases) *to perform side-effects*.

Only if you are using the LISP definition of 'side effect' and not the definition I presented. In LISP, the 'purpose' of calling a function is to return a value, and the 'side effects' are what actions it did to get that value. In C, however, these definitions are reversed. Functions in C are called to perform an action. (This is simply a mindset.)
meissner@xyzzy.UUCP (Michael Meissner) (04/09/88)
In article <17@feedme.UUCP> doug@feedme.UUCP (Doug Salot) writes:
| Both of these passages seem to imply that C compilers "know" about
| the semantics of certain (all?) function calls. While someone earlier
| pointed out that it is possible to design a language in which some semantics
| can be described, C does not have this facility and seems to be
| philosophically antagonistic to such a facility. I would indeed be
| surprised if a C compiler produced inline code for strcpy (unless you
| are talking about a macro, in which case the behavior of the code should
| be clear from reading the define), and the idea of compile-time
| warnings about function behavior seems equally out of place (maybe
| link-time would be appropriate).

The DG C compiler for one will generate inline code for strcpy if the second argument is a string literal, providing you include the standard header <string.h> (and now <strings.h> as well). I believe the Microsoft 5.0 C compiler does similar things (possibly the DEC compiler too). Just because nobody upgrades the typical UNIX compiler, that doesn't mean the same is true of all C compilers.

The way it is implemented, there is a keyword ($builtin) that the standard header files use where appropriate. Users, in general, don't have to change their code to get inline behavior; they just use the standard header files. Also, for every builtin there is a module in the library, and taking the address of a builtin takes the address of the library routine.
--
Michael Meissner, Data General.
Uucp: ...!mcnc!rti!xyzzy!meissner
Arpa/Csnet: meissner@dg-rtp.DG.COM
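The two properties described here can be illustrated in portable C, with the caveat that this is only a sketch and says nothing about how the DG compiler or its $builtin keyword actually works. `COPY_LITERAL`, `copy_fn_t` and `library_strcpy` are hypothetical names. The macro shows why a string-literal source is easy to inline: its length is known at compile time. The function shows that naming strcpy without a following parenthesis yields the real library routine, even where a header also defines strcpy as a function-like macro.

```c
#include <string.h>

/* Hypothetical macro: copy a string literal whose size (including the NUL)
   is known at compile time. The "" lit "" concatenation makes a non-literal
   argument a compile-time error. */
#define COPY_LITERAL(dst, lit) \
    ((char *)memcpy((dst), "" lit "", sizeof(lit)))

/* Hypothetical typedef for a pointer to a strcpy-shaped function. */
typedef char *(*copy_fn_t)(char *, const char *);

/* A function-like macro is only expanded when its name is followed by '(',
   so this always refers to the library function itself. */
copy_fn_t library_strcpy(void)
{
    return strcpy;
}
```

A caller could write `COPY_LITERAL(buf, "abc")` where it would otherwise write `strcpy(buf, "abc")`, provided buf has room for the literal and its NUL.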
david@dhw68k.cts.com (David H. Wolfskill) (04/10/88)
[I had suggested that strcpy() on overlapping objects ought to be "implementation-defined," rather than "undefined," behavior. Liber then wrote "If you have overlapping strings you have incorrect data." I made the mistake of suggesting that the operation need not involve incorrect data at all, but could be well-defined. The act of making that suggestion was a mistake, in that it had little (if anything) to do with the discussion at hand.] In article <4309@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes: >Oh, so you want the copying of two strings to be WELL-DEFINED, not >implementation-defined or undefined. Why did you beat around the bush for >so long?? No. I am *not* asking that it be (necessarily) well-defined. I am suggesting that the Standard ought to require a conforming implementation to *document* that implementation's behavior, so that someone reading the implementation's documentation of strcpy() can make a determination about the suitability of the implementation for the purposes the individual has in mind -- and so the implementor will also be required to inform the users of the implementation if that implementation changes. It is quite possible to have such an implementation that documents that its behavior, when faced with such a construct, is that "Unpredictable results may occur." Another (competing) implementation may make a guarantee about the results of such an operation. It is even possible that (for the purpos(es) at hand), it makes no difference; at this point, the individuals responsible for acquiring a given implementation have a point of comparison -- we now have a "quality of implementation" issue. I sent my comments in to X3J11; we wouldn't want the committee to suffer from a lack of opinion.... :-) Cheers, david -- David H. Wolfskill uucp: ...{trwrb,hplabs}!felix!dhw68k!david InterNet: david@dhw68k.cts.com
mouse@mcgill-vision.UUCP (der Mouse) (04/12/88)
In article <6476@dhw68k.cts.com>, david@dhw68k.cts.com (David H. Wolfskill) writes:
> Suppose, for example, that a given implementation defines that such a
> copy would be done from the beginning of the source string to its
> terminating NUL, character by character. Then (assuming suitable
> definitions of the variables in question), an algorithm to clear a
> given string (str1) to a given value (other than NUL) could be coded:

>	*str1 = ch;
>	for (c1 = str1; *++c1 != '\0'; *c1 = *(c1 -1));

This will work even when ch *is* '\0'. But it's subtly different from what one would expect out of strcpy: this is equivalent to using a strcpy that loops until it finds a null in the *destination* string, not the *source* string. (The way the same variable is used to refer to both strings helps hide this fact.)

> or (remembering the characteristics of the implementation):

>	*str1 = ch;
>	strcpy(str1+1, str1)

This may well be an infinite loop, or rather, a loop-until-memory-error. For example, the canonical

	strcpy(s1,s2)	/* yes, I know this doesn't return anything */
	register char *s1;
	register char *s2;
	{
		while (*s1++ = *s2++) ;
	}

will scream off into higher and higher memory until it finds something it can't read (or can't write, if that happens first) - or if it doesn't find any such, as on a machine with a full complement of memory, it will keep going forever.

Goodness, if even the advocates of completely-defined strcpy semantics get confused about what it does, how can they expect anyone else to keep it straight?

der Mouse

uucp: mouse@mcgill-vision.uucp
arpa: mouse@larry.mcrcim.mcgill.edu
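The runaway behaviour can be observed safely by giving the left-to-right loop a byte budget so that it stays inside one buffer. This is a minimal sketch; `bounded_copy` and `runaway_demo` are hypothetical names, and the bound is an addition that the real strcpy() does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the canonical left-to-right loop, but giving up
   after `limit` bytes. Returns 1 if a NUL was copied, 0 if the limit ran
   out first. */
int bounded_copy(char *dst, const char *src, size_t limit)
{
    size_t i;

    for (i = 0; i < limit; i++) {
        if ((dst[i] = src[i]) == '\0')
            return 1;
    }
    return 0;
}

/* Hypothetical demo of the strcpy(str1+1, str1) pattern, confined to
   buf[0..size-1]. Each source byte is overwritten just before it is read,
   so the source's NUL is destroyed before the loop can reach it. */
int runaway_demo(char *buf, size_t size)
{
    assert(size >= 6);
    memcpy(buf, "hello", 6);   /* str1 = "hello", NUL at index 5 */
    buf[0] = 'x';              /* *str1 = ch; */
    return bounded_copy(buf + 1, buf, size - 1);
}
```

With a 16-byte buffer, runaway_demo() returns 0 and leaves all 16 bytes set to 'x', with no terminator anywhere in the buffer. An unbounded loop would simply have carried on past the end.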
jfh@killer.UUCP (The Beach Bum) (04/13/88)
In article <6683@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes: >[I had suggested that strcpy() on overlapping objects ought to be >"implementation-defined," rather than "undefined," behavior. Liber >then wrote "If you have overlapping strings you have incorrect data." this is exactly what we don't need. the purpose of creating a standard is for all implementations of the standard to function identically. if each different implementation has a different behavior, all of which are being relied on quite heavily, then software will cease to be portable. "undefined" is "defined". stating that the behavior is unknown will force the user to not rely on questionable behavior, or to write the code herself to perform the copy in the correct fashion. - john. -- John F. Haugh II SNAIL: HECI Exploration Co. Inc. UUCP: ...!ihnp4!killer!jfh 11910 Greenville Ave, Suite 600 "You can't threaten us, we're Dallas, TX. 75243 the Oil Company!" (214) 231-0993 Ext 260
wes@obie.UUCP (Barnacle Wes) (04/13/88)
In article <858@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes: > A lot of C programmers use 80*86s. What about them? Well, strict > left-to-right copying has the advantage of not having to fiddle with > the direction flag... Actually, a left-to-right strcpy is pretty easy to do on the 286 and 386: you find the end of the string with REPNZ SCASB, subtract to find the length which you put in CX, and do the move with REP MOVSB. On the 286, that limits you to 64K strings, and will puke going over segment boundaries, but then you have lots of other limitations with the 286 anyhow. I'll still bet this is faster than the code generated by MOST 286 compilers; I *know* it's faster than what is in MicroPork's C library :-). -- /\ - "Against Stupidity, - {backbones}! /\/\ . /\ - The Gods Themselves - utah-cs!utah-gr! / \/ \/\/ \ - Contend in Vain." - uplherc!sp7040! / U i n T e c h \ - Schiller - obie!wes
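In C terms, the two-pass scheme described above (scan for the NUL, then block-move the whole string) looks roughly like the sketch below. `two_pass_strcpy` is a hypothetical name. One difference is worth labelling: REP MOVSB with the direction flag clear copies left-to-right, whereas memcpy() makes no promise about direction and is undefined for overlapping objects, so this sketch assumes the two strings do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-pass copy; assumes dst and src do NOT overlap. */
char *two_pass_strcpy(char *dst, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);           /* pass 1: find the NUL (the REPNZ SCASB step) */
    memcpy(dst, src, n + 1);   /* pass 2: block move, NUL included (REP MOVSB) */
    return dst;
}
```

Because the length is fixed before any byte moves, a version that used memmove() in place of memcpy() would also give a predictable result for overlapping strings, at the cost of reading the source twice.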