[comp.lang.c] strcpy wars, jeez! A proposed resolution.

gnu@hoptoad.uucp (John Gilmore) (03/27/88)

In article <10731@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>		(void) strcpy(buf, buf + n);

gwyn@brl-smoke.ARPA (Doug Gwyn) wrote:
> This usage was never a good idea, because a valid implementation of
> strcpy() would be to copy right-to-left rather than left-to-right
> through the source string...

I have seen plenty of constructs in traditional Unix and other C code
that assume strcpy() can slide a string down over itself.

While we are picking nits with the wording of various Unix man pages
and standards, let me point out that none of them makes it perfectly
clear that no bytes past the NUL are modified.  If you can assume that
"it copies the NUL and then stops" doesn't indicate that the NUL is
copied last, as several posters have done, you might as well assume
that it copies three or four more bytes beyond the NUL and then stops,
too.  It seems to make exactly as much sense to me, that is, no sense
at all.

I propose that strcpy, strncpy, strcat, and strncat be defined to perform
either:

  * left-to-right
or
  * non-destructively copying in the case of overlap

at the implementor's choice (each function can choose independently).
I think effectively 100% of the applications and 100% of the
implementations will require no change with this rule.  Simple
implementations will just do left-to-right, while more complicated
implementations like on the 29000 or MIPS can do fancy stuff 4 bytes at
a time, or even copy right-to-left, as long as they avoid destructive
copying.  Today's fancy implementations should already be checking for
overlap, since so much existing code depends on it.
-- 
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu			  gnu@toad.com
		"Watch me change my world..." -- Liquid Theatre

david@dhw68k.cts.com (David H. Wolfskill) (03/29/88)

As many have pointed out, there is an expectation that strcpy() will
copy characters from left to right, terminating the copy when the
terminating NUL is copied.

Recalling that the dpANS specifies "... the behavior of an abstract
machine in which the issues of optimization are irrelevant," it would
seem to make some sense to modify the specification to be similar to the
above.

The current dpANS also specifies "If copying takes place between objects
that overlap, the behavior is undefined."  I would feel rather more
comfortable with changing that to read "... implementation defined."

This may arguably be a "quality of implementation" issue; I prefer to
think of it as a "quality of standard" issue.

(Oh: I do know of one machine in which fields in main storage are
addressed on the right -- for almost all instructions -- but I don't
know of a C compiler for it, and I consider its architecture (for this
reason, as well as others) to be sufficiently pathological that it's not
worth considering important.  I just wish my employer hadn't purchased
so many of the brain-damaged things!)

david
-- 
David H. Wolfskill
uucp: ...{trwrb,hplabs}!felix!dhw68k!david	InterNet: david@dhw68k.cts.com

nevin1@ihlpf.ATT.COM (00704a-Liber) (03/31/88)

In article <6286@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>The current dpANS also specifies "If copying takes place between objects
>that overlap, the behavior is undefined."  I would feel rather more
>comfortable with changing that to read "... implementation defined."

I would not!  This would imply that a program which calls strcpy() with
overlapping strings is 'correct', and this is simply not true.

Remember, implementation-defined behavior means (quoted from the draft
section 1.6--Definitions of Terms):

"behavior, for a correct program construct and correct data, that depends
on the characteristics of the implementation and that each implementation
shall document."

If you have overlapping strings you have incorrect data.

If this were to change (something which I am against), all programs that
use strcpy() would be suspect every time a new version of the compiler
comes out (especially since many compilers use inline assembly instead of
doing a function call for strcpy()).  This is not something which should
depend on the implementation.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

karl@haddock.ISC.COM (Karl Heuer) (04/01/88)

In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>In article <6286@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>>The current dpANS also specifies "If copying takes place between objects
>>that overlap, the behavior is undefined."  I would feel rather more
>>comfortable with changing that to read "... implementation defined."
>
>I would not!  This would imply that a program which calls strcpy() with
>overlapping strings is 'correct', and this is simply not true.

But it would be true, if the standard were to explicitly allow it.

>If this were to change, all programs that use strcpy() would be suspect every
>time a new version of the compiler comes out

Only those programs that use strcpy on overlapping strings.  And if the
"implementation-defined" part is properly phrased, strcpy(s,s+1) would be
guaranteed to be safe.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/02/88)

In article <3267@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>>If this were to change, all programs that use strcpy() would be suspect every
>>time a new version of the compiler comes out
>
>Only those programs that use strcpy on overlapping strings.  And if the
>"implementation-defined" part is properly phrased, strcpy(s,s+1) would be
>guaranteed to be safe.

First off, just by looking at a program how can I tell whether or not it
uses overlapping strings (under your proposal)??  There is no way for me to
tell the difference between a program that is using strcpy() in an
implementation-DEPENDENT way and a program which can portably use
strcpy() (at least not by just looking at it).  From a maintenance point of
view, this is very undesirable!!

Secondly, I do not like the change that would have to be made to the
prototype for strcpy.  The prototype would change from:

char *strcpy(noalias char *s1, const noalias char *s2)

to

char *strcpy(char *s1, char *s2)

since, as you pointed out, both s1 and s2 are possibly aliased and the
string pointed to by s2 is no longer guaranteed to be constant (see below).

char *foo, *bar;
...
/*assume that foo points to string "stuff" in read/write memory*/
bar = foo + 1;
strcpy(foo, bar);

Under your proposal, this would *legally* change the value of what bar points
to (unless you are going to put in some wording about only being able to
copy the right half or less of overlapping strings, but this wording is
VERY messy)!!  I'm sorry, but I like knowing that the source string should
not be changed by strcpy() in a conforming program!!
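Liber's example can be reproduced without undefined behaviour by doing the copy with memmove(). In this layout the destination comes before the source, so a left-to-right strcpy would store the same bytes. The sketch therefore shows what "legally changing" the source string would mean in practice. The helper name slide_down_one is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: performs the overlapping copy strcpy(foo, bar)
   with bar == foo + 1, using memmove() so that the result is defined.
   Returns bar so the caller can inspect what it now points to. */
const char *slide_down_one(char *foo)
{
    const char *bar = foo + 1;

    memmove(foo, bar, strlen(bar) + 1);   /* bar's characters plus its NUL */
    return bar;
}
```

Starting from "stuff", foo becomes "tuff". bar still points at foo + 1, so it now sees "uff".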

In article <3266@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>I don't see that such a compiler would have to depend on the implementation;
>just on the functional specification (which has now been standardized).

MAKE UP YOUR MIND!!  You either want to have programs which are dependent
on the implementation of the libraries or you don't.  I don't really care
which of these two views that you take, JUST BE CONSISTENT!!

john@frog.UUCP (John Woods, Software) (04/02/88)

In article <3267@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
>In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>>In article <6286@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>>>The current dpANS also specifies "If copying takes place between objects
>>>that overlap, the behavior is undefined."  I would feel rather more
>>>comfortable with changing that to read "... implementation defined."
>>I would not!  This would imply that a program which calls strcpy() with
>>overlapping strings is 'correct', and this is simply not true.
> But it would be true, if the standard were to explicitly allow it....if the
> "implementation-defined" part is properly phrased, strcpy(s,s+1) would be
> guaranteed to be safe.

Still no.  The problem with "implementation-defined" is that there are no
constraints upon what the implementation may define the behavior to be.
If you port your program to an implementation where, in 3-point italic type
in a margin somewhere, they mention that strcpy(s,s+1) causes the CPU chip to
be launched upward with a velocity of 16 km/s, they will be _right._

From the August 3, 1987 draft (and I assume this hasn't changed):
"1.7 COMPLIANCE
   A _strictly conforming program_ shall use only those features of the
language and library specified in this standard.  It shall not produce output
dependent on any unspecified, undefined, or ---> implementation-defined <---
behavior..."	( ---> Emphasis <--- added).

If you know that your implementation does what you want with strcpy(s,s+1),
then you are free to use it.  Your program won't be "strictly conforming",
but you may not care about that.  Just don't complain when you hear that
"chuffBANG!" of the CPU chip being launched when you buy that shiny new
Mark IV Datablaster...

--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw@eddie.mit.edu

FUN:  THE FINAL FRONTIER
Zippy the Pinhead in '88!

jv0l+@andrew.cmu.edu (Justin Chris Vallon) (04/03/88)

>"... implementation defined."

Means NON-PORTABLE!  If implementation X does it one way [ie strcpy(s, s+1)
works], and implementation Y does it another way [ie strcpy(s, s+1) does not
work], my program will behave very differently on different systems, but both
strcpy() functions adhere to the ANSI specs.

I cannot expect an ANSI standard which isn't a standard.  Isn't non-portable
code something that ANSI is trying to prevent, not endorse?

-Justin

henry@utzoo.uucp (Henry Spencer) (04/03/88)

> ... The prototype would change ... to
> char *strcpy(char *s1, char *s2)
> since, as you pointed out, both s1 and s2 are possibly aliased and the
> string pointed to by s2 is no longer guaranteed to be constant...

The latter is quite irrelevant; const on a pointer does not mean that the
thing pointed to is constant, just that attempts to modify it through that
pointer are illegal.  (If this double meaning of const strikes you as less
than ideal, you're in good company.)
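Spencer's point about const can be seen in a few lines of C. A pointer to const restricts writes made through that pointer. It does not restrict writes made through some other pointer to the same bytes. first_after_write is a hypothetical name used only for illustration.

```c
/* Hypothetical illustration: view is a pointer to const char, yet the
   bytes it points to can still change through the unqualified alias buf. */
char first_after_write(char *buf)
{
    const char *view = buf;   /* read-only access path to the same bytes */

    /* view[0] = 'X';  would be rejected: assignment through const */
    buf[0] = 'X';             /* allowed: write through the non-const alias */
    return view[0];           /* sees the new value, 'X' */
}
```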
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {allegra,ihnp4,decvax,utai}!utzoo!henry

david@dhw68k.cts.com (David H. Wolfskill) (04/03/88)

In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>In article <6286@dhw68k.cts.com> I wrote:
>>The current dpANS also specifies "If copying takes place between objects
>>that overlap, the behavior is undefined."  I would feel rather more
>>comfortable with changing that to read "... implementation defined."

>I would not!  This would imply that a program which calls strcpy() with
>overlapping strings is 'correct', and this is simply not true.

[He then quotes the definition of "implementation-defined," as used in
the dpANS.]

>If you have overlapping strings you have incorrect data.

Well, thank you for your opinion; however, I respectfully disagree.
Given an order in which the copying shall be done, the operation of
copying data from one string to another (when the two strings have a
known degree of overlap) can be a well-defined one.

It is quite possible -- and to me, reasonable -- to define an algorithm
in such a way that it uses the implementation-defined behavior of such
an operation.

Suppose, for example, that a given implementation defines that such a
copy would be done from the beginning of the source string to its
terminating NUL, character by character.  Then (assuming suitable
definitions of the variables in question), an algorithm to clear a given
string (str1) to a given value (other than NUL) could be coded:

	*str1 = ch;
	for (c1 = str1; *++c1 != '\0'; *c1 = *(c1 -1));

or (remembering the characteristics of the implementation):

	*str1 = ch;
	strcpy(str1+1, str1);

but I think the latter is easier to comprehend.
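Below are two well-defined ways to do Wolfskill's fill. The names fill_loop and fill_memset are hypothetical. The explicit loop terminates because it tests each byte for NUL before overwriting it. A strictly byte-at-a-time, left-to-right strcpy(str1+1, str1) would read each source byte just before storing it. That overwrites the original NUL before reaching it and runs off the end of the string. Such an implementation would therefore also have to locate the terminator first. The empty-string guard in fill_loop is an addition: without it, *str1 = ch replaces the only NUL.

```c
#include <string.h>

/* Hypothetical: Wolfskill's loop, filling s with ch up to its NUL. */
void fill_loop(char *s, char ch)
{
    char *c1;

    if (*s == '\0')           /* added guard: nothing to fill */
        return;
    *s = ch;
    for (c1 = s; *++c1 != '\0'; *c1 = *(c1 - 1))
        ;                     /* each byte is tested before it is overwritten */
}

/* Hypothetical: the same effect with memset(), measuring the length first. */
void fill_memset(char *s, char ch)
{
    memset(s, ch, strlen(s));
}
```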

I have used the technique -- although in assembler, rather than C -- and
am quite willing to grant that its effects are properly defined by the
characteristics of the implementation.

>If this were to change (something which I am against), all programs that
>use strcpy() would be suspect every time a new version of the compiler
>comes out (especially since many compilers use inline assembly instead of
>doing a function call for strcpy()).  This is not something which should
>depend on the implementation.

Hmmmm....  It is my understanding that if the behavior were
"implementation-defined," at least the vendor would be under an
obligation to warn you of any change in the implementation's behavior
when faced with such a construct; whether or not you chose to do
anything about it is (of course) another issue altogether.

On the other hand, if the behavior is "undefined," the vendor would be
under no obligation to indicate in any way any changes in the
implementation's behavior when faced with such a construct.  It is not
clear to me that you (or anyone else) would be well-served by such a
position.

That is really the main point of my earlier posting.

Of course, it would only be an issue for you to the extent that you need
to work with (or in spite of!) these constructs that you seem
disinclined to use anyway.  (Also, if you are sufficiently fortunate to
use a compiler that has a mode in which it flags all constructs whose
behavior is "implementation-defined," you can have that much more
warning about such concerns.)

Onward....

david

doug@feedme.UUCP (Doug Salot) (04/03/88)

There seems to be a point here on which both posters agree, but I find
absurd.  For background:

nevin says:
> >If this were to change (something which I am against), all programs that
> >use strcpy() would be suspect every time a new version of the compiler
> >comes out (especially since many compilers use inline assembly instead of
> >doing a function call for strcpy()).  This is not something which should
> >depend on the implementation.

and david says:
> Of course, it would only be an issue for you to the extent that you need
> to work with (or in spite of!) these constructs that you seem
> disinclined to use anyway.  (Also, if you are sufficiently fortunate to
> use a compiler that has a mode in which it flags all constructs whose
> behavior is "implementation-defined," you can have that much more
> warning about such concerns.)

Both of these passages seem to imply that C compilers "know" about
the semantics of certain (all?) function calls.  While someone earlier
pointed out that it is possible to design a language in which some semantics 
can be described, C does not have this facility and seems to be
philosophically antagonistic to such a facility.  I would indeed be
surprised if a C compiler produced inline code for strcpy (unless you
are talking about a macro, in which case the behavior of the code should
be clear from reading the define), and the idea of compile-time
warnings about function behavior seems equally out of place (maybe
link-time would be appropriate).

As long as I'm here, I must say that I disagree with david.  If the
behavior of a function is *undefined* rather than *implementation
defined* for singular cases, one would be inclined not to use the
function for the singular cases, thereby insuring (used loosely)
portability.

- Doug
-- 
Doug Salot || doug@feedme.UUCP || {trwrb,hplabs}!felix!dhw68k!feedme!doug
Feedme Microsystems:Inventors of the Snarf->Grok->Munge Development Cycle

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/07/88)

In article <6476@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>In article <4215@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>>In article <6286@dhw68k.cts.com> David Wolfskill wrote:
>>>The current dpANS also specifies "If copying takes place between objects
>>>that overlap, the behavior is undefined."  I would feel rather more
>>>comfortable with changing that to read "... implementation defined."
>
>>If you have overlapping strings you have incorrect data.
>
>Well, thank you for your opinion; however, I respectfully disagree.
>Given an order in which the copying shall be done, the operation of
>copying data from one string to another (when the two strings have a
>known degree of overlap) can be a well-defined one.

Oh, so you want the copying of two strings to be WELL-DEFINED, not
implementation-defined or undefined.  Why did you beat around the bush for
so long??  I do agree that if you know the algorithm, all the side effects
are well-defined.  I just do not agree that you, the non-kernel applications
programmer, should have to write code that is dependent on the
*implementation* of a system call.  This only leads to nightmares for code
maintenance people (which is part of my job).

For example:  some of the people right now who are arguing for strcpy() to
be *defined* as left-to-right string copy are bringing up the point that
code currently being used is dependent on this implementation of strcpy().
They are claiming that it is hard to maintain since it is
implementation-dependent.  We should be going away from code like this, not
towards it.  This is one of the reasons I like C++, because it forces
programmers to code without knowing the implementations of their
objects/classes.  If the implementation of a class is changed, the rest of
the code doesn't break.

>It is quite possible -- and to me, reasonable -- to define an algorithm
>in such a way that it uses the implementation-defined behavior of such
>an operation.

You are right in one sense:  it is quite possible to define an algorithm in
such a way that it uses the *side effects* (aka, implementation-defined
behavior) of such an operation.  If you are writing code like this then you
are becoming very dependent on your particular version/implementation of C.

Good luck in three years, when your implementation is outdated!

chris@mimsy.UUCP (Chris Torek) (04/07/88)

In article <4309@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
>Oh, so you want the copying of two [overlapping] strings to be
>WELL-DEFINED, not implementation-defined or undefined.

Right.

>I just do not agree that you, the non-kernel applications programmer,
>should have to write code that is dependent on the *implementation*
>of a system call.

(Aside: strcpy is not a system call, it is a library routine.)

It is not dependent upon the implementation.  It is dependent upon
the specification.  The specification for strcpy was that it copies
string `src' to string `dst' such that strcpy(s, s+n) moves `s+n'
`down' n characters, while strcpy(s+n, s) `duplicates' characters
from s+1 through s+strlen(s).

That may not be what *you* read in the specification, but it *is*
what *others* read in it.  Perhaps the specification was sloppy.
You have probably seen sloppy specifications before.  The usual
answer is to tighten the spec, and if the tightened spec invalidates
a few routines, so it goes; but if, on the other hand, the tightened
spec breaks hundreds of working programs, the design team might
instead change the spec to explicitly grant those features/bugs
that everyone else interpreted it to grant.

If the semantics for strcpy() specified the action produced by copying
overlapping strings, code that copied overlapping strings would not be
dependent upon the implementation after all, would it?  The claim is
simply that the description in string(3) (the `specification') did
specify this, at least to enough people that perhaps it would be best
not to make it ill-defined.

(Begin another aside)
>... one of the reasons I like C++ ... it forces programmers to code
>without knowing the implementations of their objects/classes.  If
>the implementation of a class is changed, the rest of the code doesn't
>break.

Want to bet?  I can[*] write code that depends on all sorts of things that
may not be true in the future, even if I do not know for certain that
they are true now.  Not knowing (or caring) about the implementation
of a subclass just (1) discourages such dependencies and (2) tends to
make them more obvious, and hence easier to squash.  It does not
prevent them.

-----
*I try not to.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

quiroz@cs.rochester.edu (Cesar Quiroz) (04/08/88)

This article was suggested by reading <4309@ihlpf.ATT.COM>, by
(nevin1@ihlpf.UUCP (00704a-Liber,N.J.)).  It is not a direct
response, though, but rather a side note prompted because one of his
arguments doesn't carry much force, but has been repeated many times
before.  Liber (and others before and, no doubt, after) says:

:... [I]t is quite possible to define an algorithm in such a way
:that it uses the *side effects* (aka, implementation-defined
:behavior) of such an operation.  If you are writing code like this
:then you are becoming very dependent on your particular
:version/implementation of C.
:
:Good luck in three years, when your implementation is outdated!

Before we get too excited about the purity of functional behaviors,
let's remember that strcpy is used (in the overwhelming majority of
cases) *to perform side-effects*.  It is perfectly legitimate to
want to clarify exactly *what* side-effects are guaranteed.  So,
when someone asks that the standard guarantee a certain order of
copying, it is a perfectly sensible thing to discuss whether that
side-effect is useful and reasonable to demand.

It is not as if you called a function from the math library and then
depended on the way your implementation leaves trash behind in your
fp registers...

If you are truly interested in defining an ADT String whose
implementation can be hidden totally, consider something along these
lines: (UNTESTED, CATCH THE IDEA, NOT THE CODE)

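    #include <stdlib.h>
    #include <string.h>

    typedef char *string;

    string
    fstrcpy(const char *old)            /* Functional STRing CoPY */
    {
        string copy = malloc(strlen(old) + 1);    /* +1 for the NUL */

        return copy != NULL ? strcpy(copy, old) : NULL;  /* NULL if no memory */
    }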

So you could use the side-effecting string package to write
a library that does not need to guarantee any side-effects, but can
be considerably more costly.

-- 
Cesar Augusto  Quiroz Gonzalez
Department of Computer Science     ...allegra!rochester!quiroz
University of Rochester            or
Rochester,  NY 14627               quiroz@cs.rochester.edu

djones@megatest.UUCP (Dave Jones) (04/08/88)

I said I was not going to say any more about the strcpy() thing.

But I would like to make a few comments about *programming*
*philosophy*. 

I DON'T LIKE programming philosophies.  But I guess I have
to admit that I may have one.  Just a little one, mind you.

(BTW. Unless otherwise indicated, my postings contain implicit smilies
 on every line. :-)  :-) :-)  )


in article <4309@ihlpf.ATT.COM>, nevin1@ihlpf.ATT.COM (00704a-Liber) says:
> 
...

> ...  some of the people right now who are arguing for strcpy() to
> be *defined* as left-to-right string copy are bringing up the point that
> code currently being used is dependent on this implementation of strcpy().

  Correct!  Do anything reasonable to prevent breaking code.  Even code
  which you consider to be "bad".  As a systems engineer, my job is to
  keep 'em flyin'.  Nothing is more important than that.  When the
  programs fail catastrophically, the customers don't care that the failure is
  caused by a morally correct change of semantics.

  Here's a case history:  Recently we had to retract a major software
  release from the field.  The problem was that some ten year old
  Berkeley code used a statement similar to the following to skip zero or
  more leading white-space characters:

	if (scanf("%[ \t\n]", buf) == 0) { report_error(); }
  
  Somebody, probably at the company that makes our new workstations,
  had decided that the string format should have to match
  at least one character in order to succeed.  They duly documented
  said behavior in the man pages.  

  We do extensive QA, but somehow that statement didn't get executed in 
  the QA suites.  BOOM!

  Now, whether or not scanf %[xyz] has to match at least one character
  is, taken by itself, just as silly a consideration as whether strcpy()
  should scan from left to right.  Far too silly to have caused such
  an expensive incident.  The semantics of scanf should never have been
  changed.

  And there was no need to change them.  If you really just HAVE to have
  a new scanf, give it a new name.

  It's easy to make up new names for functions which are similar to old 
  ones.  I once told a fellow programmer, "As long as we can make up new 
  names, we can never be defeated."  I felt real profound about that one.
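For comparison, the ANSI C standard describes %[ as matching a nonempty sequence of characters, so matching zero characters is a matching failure. A whitespace skip that works under either reading can use a white-space directive instead. That directive consumes zero or more white-space characters and cannot fail. Adding %n counts what was consumed. This is a sketch: the helper names are hypothetical, and sscanf() stands in for scanf() so that no real input is needed.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many leading white-space characters
   a " " directive consumes from s. The white-space directive matches
   zero or more characters and never causes a matching failure. */
int count_leading_space(const char *s)
{
    int n = 0;

    /* %n stores the number of characters read so far and does not
       count toward sscanf's return value. */
    sscanf(s, " %n", &n);
    return n;
}

/* Hypothetical helper: sscanf's result for a %[ conversion. 0 means the
   set matched nothing (a matching failure), 1 means at least one
   character matched. */
int bracket_matches(const char *s)
{
    char buf[64];

    return sscanf(s, "%63[ \t\n]", buf);
}
```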

> They are claiming that it is hard to maintain since it is
> implementation-dependent.  We should be going away from code like this, not
> towards it. 

  Again I agree. Completely.  You're right on the mark.


> This is one of the reasons I like C++, because it forces
> programmers to code without knowing the implementations of their
> objects/classes.  If the implementation of a class is changed, the rest of
> the code doesn't break.
> 
  But now you're beginning to lose the thread.

  I also like C++, but the notion that the exalted gurus can invent
  programming languages to "force" the ignorant masses to produce good 
  code has been discredited again and again.

  (Don't forget the implicit smilies. :-) )

> ...  If you are writing code like this then you
> are becoming very dependent on your particular version/implementation of C.
> 

  I'm not writing the stuff.  There's plenty of it already written.

>
> Good luck in three years, when your implementation is outdated!
>

  Hey!!  Like, DON'T OUTDATE MY IMPLEMENTATION, dude!

  Did I detect just a hint of an anticipated "told-you-so"?  
  "The transgressors will be punished!"  "Infamy to the implementation-
  dependent rascals!"  

  No?  I guess it's just me.

  I'll respond anyway.  Just pretend for a second that that's what 
  you meant.

  In the first place, people who write shaky code
  are often nice people.  They don't mean to.  They could use some 
  help.  I have no desire to punish them.   They don't need us to pull 
  the rug out from under their code.

  In the second place, it is not only the vile perpetrators who suffer 
  when code breaks.  Often they are long since gone.  But if they are
  not, then you're still on the same team!

  A while back I had to fix some code that my supervisor's supervisor 
  wrote about five years ago.  He's never claimed to be a great programmer.  
  I thought his comment was very funny.   His pronouncement 
  was, "My past has come back to haunt you."

ok@quintus.UUCP (Richard A. O'Keefe) (04/08/88)

In article <17@feedme.UUCP>, doug@feedme.UUCP (Doug Salot) writes:
> Both of these passages seem to imply that C compilers "know" about
> the semantics of certain (all?) function calls.  While someone earlier
> pointed out that it is possible to design a language in which some semantics 
> can be described, C does not have this facility and seems to be
> philosophically antagonistic to such a facility.  I would indeed be
> surprised if a C compiler produced inline code for strcpy ...

strcpy() is defined as part of the dpANS.  X3J11 has gone to a great
deal of trouble to ensure that a useful chunk of the C library will be
present.  One way of looking at it is to think of things like strcpy()
as built-in operations which merely happen to use functional notation.
[This isn't 100% accurate; but it is the basic idea.]  So an ANSI C
compiler *will* be entitled to "know" about the semantics of "functions"
which are defined in the standard, just as a Pascal compiler is entitled
to "know" about sqrt(). Indeed, it simply is not possible for a user to
define a function called strcpy() in a standard-conforming way, because
external names beginning with "str" and a lowercase letter are reserved.

The argument is over what strcpy(dst, src) should mean when
	src <= dst <= src+strlen(src)+1.

There are two basic positions:
(A)	Strict left-to-right copy is easy to understand and is often
	useful.  It is also how many C books (including K&R) have
	explained the operation, so many C programmers expect this
	behaviour.

(B)	The implementor should be given as much freedom as possible
	in order to make this operation supremely fast, and if this
	means leaving the operation as unspecified as possible, so be it.

Note that if you have been using a system where the implementor had
already taken attitude (B), and you move a working program to a system
whose implementor took attitude (A), your code will continue to work,
but if you move code in the other direction your code is likely to break.

Requiring strict left-to-right copying in the standard will therefore
improve future portability and maintainability, at the expense of
prohibiting certain machine-specific optimisations in this particular
operation.

May I respectfully suggest that the emphasis on the implementor's
freedom to optimise strcpy() may, just possibly, be misguided?  I have
done text manipulation in a wide variety of programming languages, and
have found C to be easily the nicest of them, because user-defined
operations (compare this padded string to this NUL-terminated string,
for example) are not an order of magnitude slower than built-ins (such
as strcmp()).  If compiler-writers have N man-months to spend improving
their compiler, I would much rather they spent them on optimisations
that will affect code that I write rather than builtins which seldom do
exactly what I want.  {If they are going to optimise a builtin, let it
be sprintf(), please.}  It is especially misguided to spend those N
man-months on optimising an operation which, in order to permit such
optimisation, has been left so vaguely defined that I can't trust it!

As a matter of interest, just how important is the speed of strcpy()
in practice, anyway?  If the cost of strcpy() were reduced to zero,
would your programs go 1% faster, 2% faster, or what?  In an attempt
to get some sort of feeling for this, I used the Sun C compiler's
".il" (inline) facility to compare the existing library routines with
my own C code and with in-lined unrolled hand-tuned assembly code.

Relative run time (library version = 1.0):

	Library		My C code	In-line assembler (unrolled)
strlen	1.0		1.22		0.26	(~ 4 times faster)
strcpy	1.0		1.06		0.87	(~ 13% faster)
strcmp	1.0		1.00		0.82	(~ 18% faster)

To be fair to Sun, it should be noted that I was using an old compiler
and library; the 4.0 compiler is supposed to be rather better.  But
this means that if I had used a newer Sun compiler, the C code would
have looked better, and the assembler code would not have changed.
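In the table, each entry is a run time relative to the library routine. So "~ 13% faster" for strcpy means about 13% less time (1 - 0.87), and the corresponding speed-up factor is 1/t. Below is a small sketch of that arithmetic applied to O'Keefe's in-line assembler column. The helper names are hypothetical.

```c
/* In-line assembler column of the table, as run time relative to the
   library routine (library = 1.0): strlen, strcpy, strcmp. */
static const double inline_asm[] = { 0.26, 0.87, 0.82 };

/* Fraction of the library routine's run time that is saved. */
double time_saved(double rel_time)
{
    return 1.0 - rel_time;
}

/* How many times faster than the library routine. */
double speedup(double rel_time)
{
    return 1.0 / rel_time;
}
```

For strcpy this gives about 0.13 of the run time saved and a speed-up of roughly 1.15. For strlen, 1/0.26 is about 3.8, which is the "~ 4 times faster" in the table.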

Note that the in-lined versions eliminated the procedure calling
overhead entirely.  It is also worth noting that it costs nearly as
much to find the length of a string as to move it:  on this particular
machine, given the choice between calling an optimised strlen() followed
by an optimised memmove() {==bcopy()} and calling your own C code, you
would be a fool not to use your own C code.  What price "optimisation"?
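For reference, the "optimised strlen() plus optimised memmove()" route amounts to something like the following sketch (two_pass_strcpy is a hypothetical name, and no timings are implied). It walks the source twice, once to find the NUL and once to move the bytes, which is why a strlen() that costs nearly as much as the move can eat most of the benefit of a fast memmove().

```c
#include <assert.h>
#include <string.h>

/* strcpy built from two library calls: strlen() scans the source once to
 * find the NUL, then memmove() walks the same bytes again to move them.
 * Because memmove() handles overlap, this version is also overlap-safe. */
char *two_pass_strcpy(char *dst, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src) + 1;            /* pass 1: length, counting the NUL */
    memmove(dst, src, n);           /* pass 2: the copy itself */
    return dst;
}
```

A one-pass loop such as `while ((*d++ = *s++) != '\0');` tests each byte as it copies it, so it never pays for the separate length scan.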

Could someone give us some figures for the 4.3 strcpy() using locc
{I can't do this, because our microVAX hasn't got a locc instruction}
and movc3, comparing them with similarly tuned code not using locc
and with code produced by a good C compiler?

A lot of C programmers use 80*86s.  What about them?  Well, strict
left-to-right copying has the advantage of not having to fiddle with
the direction flag...

So I guess the question is whether the importance of strcpy is
(A)	as a standard operation you thoroughly understand, or

(B)	as a vaguely defined operation which the vendor was allowed
	to tune to make his Dhrystone results look good.

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/09/88)

In article <17@feedme.UUCP> doug@feedme.UUCP (Doug Salot) writes:
>[...]  While someone earlier
>pointed out that it is possible to design a language in which some semantics 
>can be described, C does not have this facility and seems to be
>philosophically antagonistic to such a facility.  I would indeed be
>surprised if a C compiler produced inline code for strcpy (unless you
>are talking about a macro, in which case the behavior of the code should
>be clear from reading the define), and the idea of compile-time
>warnings about function behavior seems equally out of place (maybe
>link-time would be appropriate).

I agree with you that C, the language, should not be designed in such a way
as to depend upon the semantics of a function being called.  However, when
it comes to optimization, it shouldn't matter whether or not the compiler
'knows' about the semantics of a function being called and optimizes based
on that fact (such as inlining instead of doing a function call).  The user
is supposed to be programming independent of the implementation of the
language (assuming it is implemented correctly), so whatever the compiler
wants to do in terms of correct optimization should not be restricted.

I feel that, for C, it is important to distinguish between defining the
language and defining its implementation.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/09/88)

In article <10962@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <4309@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
>>I just do not agree that you, the non-kernel applications programmer,
>>should have to write code that is dependent on the *implementation*
>>of a system call.

>It is not dependent upon the implementation.  It is dependent upon
>the specification.  The specification for strcpy was that it copies
>string `src' to string `dst' such that strcpy(s+n, s) moves `s+n'
>`down' n characters, while strcpy(s, s+n) `duplicates' characters
>from s+1 through s+strlen(s).

It still IS dependent on the implementation; you just want ANSI to put the
implementation in the specification.

You are not defining *what* the function does (ie, you are not making an
abstract *description* of the function); you are defining *how* the
function does a strcpy (ie, how it is supposed to be *implemented*).  If I
give you (for a small licensing fee :-)) all the lines of assembler for
Unix and call that the specification of Unix, you will never be able to say
that there is a bug in Unix (after all, it's doing everything exactly as
written in the assembler code).

There is no 'such that' part in the specification of strcpy().  Strcpy(),
according to the man page, INCLUDING THE WARNING (something a heck of a lot
of posters neglected to read), says:

"Strcpy copies string s2 to s1, stopping after the null character has been
copied. [...] [Strcpy] returns s1.
[...]
WARNING
[...]
Character movement is performed differently in different implementations.
Thus overlapping moves may yield surprises."

>That may not be what *you* read in the specification, but it *is*
>what *others* read in it.

You are saying that overlapping does *not* yield surprises, which directly
contradicts the specification.

>If the semantics for strcpy() specified the action produced by copying
>overlapping strings, code that copied overlapping strings would not be
>dependent upon the implementation after all, would they?  The claim is
>simply that the description in string(3) (the `specification') did
>specify this, at least to enough people that perhaps it would be best
>not to make it ill-defined.

If I specify the source code for the compiler, then nothing about the
language can be ill-defined.  But, as you have already shown, not
everybody bothers to read the entire specification, anyway.

>(Begin another aside)
>>... one of the reasons I like C++ ... it forces programmers to code
>>without knowing the implementations of their objects/classes.  If
>>the implementation of a class is changed, the rest of the code doesn't
>>break.

>Want to bet?  I can[*] write code that depends on all sorts of things that
>may not be true in the future, even if I do not know for certain that
>they are true now.  Not knowing (or caring) about the implementation
>of a subclass just (1) discourages such dependencies and (2) tends to
>make them more obvious, and hence easier to squash.  It does not
>prevent them.

>*I try not to.

Agreed.  I should not have said that it 'forces' programmers to code well, but
that it makes it easier to code well.

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/09/88)

In article <8410@sol.ARPA> quiroz@cs.rochester.edu (Cesar Quiroz) writes:

>Before we get too excited about the purity of functional behaviors,
>let's remember that strcpy is used (in the overwhelming majority of
>cases) *to perform side-effects*.

Only if you are using the LISP definition of 'side effect' and not the
definition I presented.  In LISP, the 'purpose' of calling a function is to
return a value, and the 'side effects' are what actions it did to get that
value.  In C, however, these definitions are reversed.  Functions in C are
called to perform an action.  (This is simply a mindset.)

meissner@xyzzy.UUCP (Michael Meissner) (04/09/88)

In article <17@feedme.UUCP> doug@feedme.UUCP (Doug Salot) writes:
| Both of these passages seem to imply that C compilers "know" about
| the semantics of certain (all?) function calls.  While someone earlier
| pointed out that it is possible to design a language in which some semantics 
| can be described, C does not have this facility and seems to be
| philosophically antagonistic to such a facility.  I would indeed be
| surprised if a C compiler produced inline code for strcpy (unless you
| are talking about a macro, in which case the behavior of the code should
| be clear from reading the define), and the idea of compile-time
| warnings about function behavior seems equally out of place (maybe
| link-time would be appropriate).

The DG C compiler, for one, will generate inline code for strcpy if the
second argument is a string literal, provided you include the standard
header <string.h> (and now <strings.h> as well).  I believe the
Microsoft 5.0 C compiler does similar things (possibly the DEC compiler
too).  Just because nobody upgrades the typical UNIX compiler doesn't
mean the same holds for all C compilers.  The way it is
implemented, there is a keyword ($builtin) that the standard header
files use where appropriate.  Users, in general, don't have to change the
code to get inline behavior; they just use the standard header files.
Also, for every builtin there is a module in the library, and taking the
address of a builtin takes the address of the library routine.
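A rough idea of what such an inline expansion can reduce to when the second argument is a string literal: the length is known at compile time, so there is no scan for the NUL, and the copy becomes a fixed-size block move. This is a hypothetical sketch (COPY_LITERAL and set_greeting are made-up names), not Data General's $builtin mechanism or the code it generates.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: copy a string literal with a compile-time length.
 * The "" prefix makes the macro reject anything but a string literal,
 * and sizeof counts the terminating NUL. */
#define COPY_LITERAL(dst, lit) \
    ((char *)memcpy((dst), "" lit, sizeof "" lit))

/* Same effect as strcpy(dst, "hello") for a buffer of at least 6 bytes,
 * but the byte count is fixed when the program is compiled. */
char *set_greeting(char *dst)
{
    assert(dst != NULL);
    return COPY_LITERAL(dst, "hello");
}
```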
-- 
Michael Meissner, Data General.		Uucp: ...!mcnc!rti!xyzzy!meissner
					Arpa/Csnet:  meissner@dg-rtp.DG.COM

david@dhw68k.cts.com (David H. Wolfskill) (04/10/88)

[I had suggested that strcpy() on overlapping objects ought to be
"implementation-defined," rather than "undefined," behavior.  Liber
then wrote "If you have overlapping strings you have incorrect data."
I made the mistake of suggesting that the operation need not involve
incorrect data at all, but could be well-defined.  The act of making
that suggestion was a mistake, in that it had little (if anything) to
do with the discussion at hand.]

In article <4309@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>Oh, so you want the copying of two strings to be WELL-DEFINED, not
>implementation-defined or undefined.  Why did you beat around the bush for
>so long??

No.  I am *not* asking that it be (necessarily) well-defined.  I am
suggesting that the Standard ought to require a conforming
implementation to *document* that implementation's behavior, so that
someone reading the implementation's documentation of strcpy() can make
a determination about the suitability of the implementation for the
purposes the individual has in mind -- and so the implementor will also
be required to inform the users of the implementation if that
implementation changes.

It is quite possible for one implementation to document that, when
faced with such a construct, "Unpredictable results may occur."  Another
(competing) implementation may make a
guarantee about the results of such an operation.  It is even possible
that (for the purpos(es) at hand), it makes no difference; at this
point, the individuals responsible for acquiring a given implementation
have a point of comparison -- we now have a "quality of implementation"
issue.

I sent my comments in to X3J11; we wouldn't want the committee to suffer
from a lack of opinion....  :-)

Cheers,
david
-- 
David H. Wolfskill
uucp: ...{trwrb,hplabs}!felix!dhw68k!david	InterNet: david@dhw68k.cts.com

mouse@mcgill-vision.UUCP (der Mouse) (04/12/88)

In article <6476@dhw68k.cts.com>, david@dhw68k.cts.com (David H. Wolfskill) writes:
> Suppose, for example, that a given implementation defines that such a
> copy would be done from the beginning of the source string to its
> terminating NUL, character by character.  Then (assuming suitable
> definitions of the variables in question), an algorithm to clear a
> given string (str1) to a given value (other than NUL) could be coded:
> 	*str1 = ch;
> 	for (c1 = str1; *++c1 != '\0'; *c1 = *(c1 -1));

This will work even when ch *is* '\0'.  But it's subtly different from
what one would expect out of strcpy: this is equivalent to using a
strcpy that loops until it finds a null in the *destination* string,
not the *source* string.  (The way the same variable is used to refer
to both strings helps hide this fact.)

> or (remembering the characteristics of the implementation):
> 	*str1 = ch;
> 	strcpy(str1+1, str1)

This may well be an infinite loop, or rather, a
loop-until-memory-error.  For example, the canonical

strcpy(s1,s2) /* yes, I know this doesn't return anything */
register char *s1;
register char *s2;
{
 while (*s1++ = *s2++) ;
}

will scream off into higher and higher memory until it finds something
it can't read (or can't write, if that happens first) - or if it
doesn't find any such, as on a machine with a full complement of
memory, it will keep going forever.
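If the goal is simply to set every character of a string to ch, one way that does not depend on the direction of any overlapping copy is to measure the string first and then use memset(). A minimal sketch, with fill_string as a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: set every character of str1 (up to, but not
 * including, its terminating NUL) to ch.  The length is taken before
 * anything is written, so the NUL itself is left untouched. */
char *fill_string(char *str1, char ch)
{
    assert(str1 != NULL);
    memset(str1, (unsigned char)ch, strlen(str1));
    return str1;
}
```

Because nothing here hunts for a NUL after writing has begun, it cannot run off the end of the buffer; with ch equal to '\0' the string simply becomes empty.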

Goodness, if even the advocates of completely-defined strcpy semantics
get confused about what it does, how can they expect anyone else to
keep it straight?

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

jfh@killer.UUCP (The Beach Bum) (04/13/88)

In article <6683@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>[I had suggested that strcpy() on overlapping objects ought to be
>"implementation-defined," rather than "undefined," behavior.  Liber
>then wrote "If you have overlapping strings you have incorrect data."

this is exactly what we don't need.  the purpose of creating a standard
is for all implementations of the standard to function identically.
if each different implementation has a different behavior, all of which
are being relied on quite heavily, then software will cease to be
portable.

"undefined" is "defined".  stating that the behavior is unknown will
force the user to not rely on questionable behavior, or to write the
code herself to perform the copy in the correct fashion.
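Writing the copy yourself need not be hard when both strings live in the same buffer. A minimal sketch, assuming a hypothetical helper str_move_within() that moves the string at buf + from so it starts at buf + to: comparing offsets, rather than pointers into possibly unrelated objects, picks a copy direction that never overwrites a source byte before reading it. The standard memmove() could do the moving step instead; the loops are spelled out here to show the direction choice.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: move the string starting at buf + from so that it
 * starts at buf + to, within the same buffer.  The caller must ensure
 * the buffer has room for the string at its new position. */
char *str_move_within(char *buf, size_t to, size_t from)
{
    size_t n, i;

    assert(buf != NULL);
    n = strlen(buf + from) + 1;     /* bytes to move, including the NUL */
    if (to < from) {                /* sliding down: copy left to right */
        for (i = 0; i < n; i++)
            buf[to + i] = buf[from + i];
    } else {                        /* sliding up (or no move): right to left */
        for (i = n; i-- > 0; )
            buf[to + i] = buf[from + i];
    }
    return buf + to;
}
```

For example, with char a[16] = "xxhello", str_move_within(a, 0, 2) leaves "hello" at a; with char b[16] = "hello", str_move_within(b, 2, 0) returns b + 2, which points at "hello".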

- john.
-- 
John F. Haugh II                  SNAIL:  HECI Exploration Co. Inc.
UUCP: ...!ihnp4!killer!jfh                11910 Greenville Ave, Suite 600
"You can't threaten us, we're             Dallas, TX. 75243
  the Oil Company!"                       (214) 231-0993 Ext 260

wes@obie.UUCP (Barnacle Wes) (04/13/88)

In article <858@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> A lot of C programmers use 80*86s.  What about them?  Well, strict
> left-to-right copying has the advantage of not having to fiddle with
> the direction flag...

Actually, a left-to-right strcpy is pretty easy to do on the 286 and
386: you find the end of the string with REPNZ SCASB, subtract to find
the length which you put in CX, and do the move with REP MOVSB.  On
the 286, that limits you to 64K strings, and will puke going over
segment boundaries, but then you have lots of other limitations with
the 286 anyhow.  I'll still bet this is faster than the code generated
by MOST 286 compilers; I *know* it's faster than what is in
MicroPork's C library :-).
-- 
    /\              -  "Against Stupidity,  -    {backbones}!
   /\/\  .    /\    -  The Gods Themselves  -  utah-cs!utah-gr!
  /    \/ \/\/  \   -   Contend in Vain."   -  uplherc!sp7040!
 / U i n T e c h \  -       Schiller        -     obie!wes