[comp.lang.c] strcpy specifications

chris@mimsy.UUCP (Chris Torek) (04/09/88)

In article <4343@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
>You are not defining *what* the function does (ie, you are not making an
>abstract *description* of the function); you are defining *how* the
function does a strcpy (ie, how it is supposed to be *implemented*).

Nope.  I can define what strcpy should do without saying how it should
do it:


    char *strcpy(char *dst, char *src);

0.  copies string `src' to `dst'.  `src' and `dst' shall not overlap.

or

1.  copies string `src' to `dst'; if src and dst overlap, the result
    is implementation-defined.

or

2.  copies string `src' to `dst' nondestructively.

or

3.  copies string `src' to `dst' such that the copy is nondestructive
    when src and dst are distinct or when src < dst.

or

4.  copies string `src' to `dst'.  By the time strcpy returns, the
    result is as if the copy were done using the following code:

	while ((*dst++ = *src++) != 0) /*void*/;

>There is no 'such that' part in the specification of strcpy().

In *whose* specification?

>Strcpy(), according to the man page, INCLUDING THE WARNING

What warning?  There is no warning in my string(3).  Maybe the warning
in yours is a bug :-) .

>You are saying that overlapping does *not* yield surprises, which is a direct
>contradiction with the specification.

WHAT specification?

The dpANS uses something like number 1 above; I have been saying that
it may be best for it to use any of 2, 3, or 4.  V7 Unix uses none of
the above.

As general design principles, let me offer these statements:
  - provide as few primitives as you can get away with;
  - make them as general as possible.

Moving a string within a single buffer is a reasonable thing to want to
do; if it is cheap enough to do that with the same primitive that moves
strings from one buffer to another, I would say it should be done.  I
think having separate `memcpy' and `memmove' routines is a mistake,
just as I think having multiple kinds of files (blocked, unblocked,
random, sequential, ...) in an O/S is a mistake.  If you must add a
feature, or a restriction, or a new routine, make sure it carries its
weight (as dmr put it).  I think allowing overlapping strings in
strcpy carries its weight better than does asking people to use
memmove(dst, src, strlen(src) + 1).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/12/88)

In article <10987@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <4343@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
>>You are not defining *what* the function does (ie, you are not making an
>>abstract *description* of the function); you are defining *how* the
>>function does a strcpy (ie, how it is supposed to be *implemented*).

>Nope.  I can define what strcpy should do without saying how it should
>do it:

>    char *strcpy(char *dst, char *src);

>0.  copies string `src' to `dst'.  `src' and `dst' shall not overlap.

As my man page states.

>1.  copies string `src' to `dst'; if src and dst overlap, the result
>    is implementation-defined.

As a few people on the net want it stated.

>2.  copies string `src' to `dst' nondestructively.

This one can never be right, since some types of overlap are destructive.

>3.  copies string `src' to `dst' such that the copy is nondestructive
>    when src and dst are distinct or when src < dst.

By 'src < dst' do you mean 'strlen(src) < sizeof(dst) / sizeof(char)', or
do you mean that the addresses should just be subtracted??  (Assuming you
are talking about the length(dst) instead of address dst, I'm not sure what
you mean by length(dst).  It can't be strlen(dst), since this is
meaningless for a newly malloc()ed block.)
Since you are (theoretically, anyway) trying to define a standard, please be
more precise with your terms.  That's what got us into this trouble in the
first place! :-)

>4.  copies string `src' to `dst'.  By the time strcpy returns, the
>    result is as if the copy were done using the following code:
>
>	while ((*dst++ = *src++) != 0) /*void*/;

Sorry, but this IS defining it in terms of an implementation!  If you were
to define it in terms of what the properties of your 'while' statement are,
then I would be satisfied that your definition is implementation-free.


Just what are the properties of implementing strcpy() as the while loop you
stated above?  Here is the list I came up with:

Case I:		strlen(src) < abs(src - dst)

This is the non-destructive strcpy() that we all know and love.   :-)

Case II:	0 < dst - src <= strlen(src)

This is an infinite loop which trashes memory starting at location dst.

Case III:	src == dst

Nothing happens except for a few lost CPU cycles.

Case IV:	0 < src - dst <= strlen(src)

This is the DESTRUCTIVE strcpy().  When done, the array src[] is the string
which was formerly pointed to by 2 * src - dst.
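
The three terminating cases can be checked directly.  What follows is
only a sketch: the names loop_copy and check_terminating_cases are made
up for illustration, the buffers are arbitrary, and Case II is left out
because it does not terminate.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the forward byte loop quoted above, wrapped in
   a function so it can be called on overlapping arguments. */
char *loop_copy(char *dst, const char *src)
{
    char *ret = dst;

    while ((*dst++ = *src++) != 0)
        /* void */;
    return ret;
}

/* Exercises Cases I, III and IV; Case II is deliberately omitted. */
void check_terminating_cases(void)
{
    char far_src[] = "hello";
    char far_dst[16];
    char same[] = "hello";
    char buf[] = "/usr/lib";

    /* Case I: no overlap, so the source survives the copy. */
    loop_copy(far_dst, far_src);
    assert(strcmp(far_dst, "hello") == 0);
    assert(strcmp(far_src, "hello") == 0);

    /* Case III: src == dst, every byte is rewritten with itself. */
    loop_copy(same, same);
    assert(strcmp(same, "hello") == 0);

    /* Case IV with src - dst == 1: dst gets the old src string ... */
    loop_copy(buf, buf + 1);
    assert(strcmp(buf, "usr/lib") == 0);
    /* ... and src now holds what used to be at 2 * src - dst,
       i.e. the old contents of buf + 2. */
    assert(strcmp(buf + 1, "sr/lib") == 0);
}
```

With src - dst == 1 the string is simply shifted left by one byte, which
is the "degenerate" use; the change to the src string is the side effect
Case IV describes.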


Now, ask yourself a question.  Wouldn't it be nice to tell the difference
between someone using strcpy() for a non-destructive use instead of using
strcpy() for this VERY SPECIALIZED destructive use (as given in Case IV)??
Personally, there are very few times that I need or want to change the src
string AND the dst string in this manner at the same time (none that I can
think of).  And those few times that I need to do both of these at the same
time I would rather call a different function, for the sake of readability
and maintainability.  But, if this is added to the Standard, then strcpy()
should always be used when Case IV destructive copies are needed as well as
when Case I non-destructive copies are needed.

From what I understand, a degenerate of Case IV is currently being relied
upon (ie, destructive copies where the programmer doesn't care about what
happens to the src string).  If this were added to the Standard, then the
whole of Case IV would start being relied upon, and this would just lead to
horrible programming styles!!

>As general design principles, let me offer these statements:
>  - provide as few primitives as you can get away with;
>  - make them as general as possible.

I agree.  However, it is worth adding a not-so-general primitive if it will
be used a lot and/or its efficiency can be significantly improved (such as
having printf() as well as the more general fprintf()).

>I think allowing overlapping strings in
>strcpy carries its weight better than does asking people to use
>memmove(dst, src, strlen(src) + 1).

But you don't allow all types of overlapping strings with your primitive;
only a very special subset of overlapping strings (where src >= dst).  And
by adding this to the Standard, you also allow an abuse of strcpy() when it
is used specifically to modify the src string.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

chris@mimsy.UUCP (Chris Torek) (04/12/88)

I am not going to respond to the whole thing, for I am getting quite
sick of this debate.

`>>' below is mine, `>' is Nevin:
In article <4383@ihlpf.ATT.COM> nevin1@ihlpf.ATT.COM (00704a-Liber) writes:

>>[optional def. 2]  copies string `src' to `dst' nondestructively.

>This one can never be right, since some types of overlap are destructive.

It is `right' in the same sense that it is `right' to describe
memmove(dst, src, len) as `nondestructive'.

>>3.  copies string `src' to `dst' such that the copy is nondestructive
>>    when src and dst are distinct or when src < dst.

>By 'src < dst' do you mean 'strlen(src) < sizeof(dst) / sizeof(char)', or
>do you mean that the addresses should just be subtracted??

If `src' and `dst' point to different objects, they cannot overlap,
and there is no question as to interference.  If `src' and `dst' *do*
point to places within the same object---e.g.,

	char buf[1000]; src = &buf[0]; dst = &buf[500];

---then the two pointers can be meaningfully subtracted, so the
condition is simply `src < dst'.  In other words, yes, the pointers
should simply be subtracted, as long as it is meaningful to do so.
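
As a rough illustration of that comparison (a sketch only: the name
copy_overlaps is hypothetical, and the function assumes both pointers
point into the same array, since relational comparison of pointers to
unrelated objects is undefined in C):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  Assumes dst and src point into the SAME array.
   Returns nonzero if copying the string at src to dst would read or
   write some byte of the array more than once. */
int copy_overlaps(const char *dst, const char *src)
{
    size_t n = strlen(src) + 1;     /* bytes moved, including the NUL */

    if (src < dst)
        return (size_t)(dst - src) < n;
    return (size_t)(src - dst) < n;
}
```

For the buf[1000] example above, with a short string at &buf[0], the
distance of 500 is far larger than the string, so the function reports
no overlap; for strcpy(buf, buf + 1) it reports one.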

>Since you are (theoretically, anyway) trying to define a standard, please be
>more precise with your terms.

I might if I thought anything might come of this.

>>4.  copies string `src' to `dst'.  By the time strcpy returns, the
>>    result is as if the copy were done using the following code:
>>
>>	while ((*dst++ = *src++) != 0) /*void*/;

>Sorry, but this IS defining it in terms of an implementation!

Yes.  You may have noticed that I was working from least to most
concrete [possibly with definitions 0 & 1 reversed].  The absolute most
concrete definition is to say `this library function shall be
implemented by the following C code'---that pins the semantics of the
routine down as firmly as they may ever be pinned (providing, of
course, that you have already defined the actions of the various
statements).

I was merely trying to show (with the `overlap' definitions 2, 3)
that one can be less concrete and still make claims about overlapping
copies.

>If you were to define it in terms of what the properties of your
>'while' statement are, then I would be satisfied that your definition
>is implementation-free.

The properties of the `while' statement were defined back in section
three.  Why should I repeat them?  But this *is* an implementational
specification, although there is an escape clause (the `as if' rule).

>From what I understand, a degenerate of Case IV is currently being relied
>upon (ie, destructive copies where the programmer doesn't care about what
>happens to the src string).

Yes.

>If this were added to the Standard, then the whole of Case IV would
>start being relied upon,

Quite possibly.

>and this would just lead to horrible programming styles!!

That remains to be demonstrated.

>>I think allowing overlapping strings in
>>strcpy carries its weight better than does asking people to use
>>memmove(dst, src, strlen(src) + 1).

>But you don't allow all types of overlapping strings with your primitive;
[which definition? I think he means 3]
>only a very special subset of overlapping strings (where src >= dst).

>And by adding this to the Standard, you also allow an abuse of strcpy()
>when it is used specifically to modify the src string.

[now he probably means 4, but more generally:]
What makes this an abuse, beyond the fact that right now *your*
manual entry (but not mine!) says so?  Why is this *inherently* wrong?

The facts:

  - There exists code now that looks like this:

	char buf[SIZE];
	...
	if (buf[0] == '/')	/* remove the leading slash */
		(void) strcpy(buf, buf + 1);

  - According to the current draft, this operation is
    `implementation-defined'.

  - There are no known implementations in which this does anything
    other than what the comment in the above code suggests.

  - Hence, making this particular action well-defined would affect
    no known implementations, but would make the code above portable.

Opinions:

  - The intended action of that code is a reasonable thing to
    want.  (`char *bp; bp = buf[0]=='/' ? buf+1 : buf' will usually
    be faster and cleaner, but may be contraindicated for some
    other reason.)

  - Defining strcpy such that this operation is well-defined is
    a reasonable thing to do.
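
For comparison, a sketch of the same operation written so that it does
not depend on how strcpy treats overlap.  The function name
strip_leading_slash is made up for illustration; memmove is the draft's
routine for overlapping moves, and the "+ 1" is there so the
terminating NUL is moved too.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove one leading '/' from buf, in place.
   memmove is specified to work when source and destination overlap. */
char *strip_leading_slash(char *buf)
{
    if (buf[0] == '/')
        memmove(buf, buf + 1, strlen(buf + 1) + 1);  /* +1 moves the NUL */
    return buf;
}
```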

ok@quintus.UUCP (Richard A. O'Keefe) (04/13/88)

In article <4383@ihlpf.ATT.COM>, nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
> As my man page states.
> >1.  copies string `src' to `dst'; if src and dst overlap, the result
> >    is implementation-defined.

I find that the 2nd edition of the System V Interface Definition says that
	Character movement is performed differently in different
	implementations.  Thus overlapping moves may yield surprises.

Since I generally take the SVID to be the official definition of things,
it follows that I was wrong to rely on strcpy(s, s+1) and should now use
my own C code in such cases.  Sigh.  It would be nice to have a definition
of what precisely IS an "overlapping move":  I have encountered machines
where the critical area was the range of _words_ including a sequence, not
the bytes alone.

Nevin Liber objected to Chris Torek's attempt to define strcpy() by
exhibiting C code for it.  It may be naive of me, but since the rest of
the standard is supposed to define the constructs Chris Torek used,
giving a definition by means of C code seems to me to be the ideal way
of defining such operations.  I have no reason to expect the dpANS
drafters to be any better at writing English definitions of things than
Chris Torek is at writing C code, and you can at least _test_ the C code
to see if it does what you intended.  From a user's point of view, having
something defined so clearly and unambiguously seems like a good idea.
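
A minimal sketch of what such a test might look like, restricted to the
non-overlapping case where nobody disputes the result.  The names
loop_strcpy and agrees_with_library, and the 64-byte buffers, are
assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* The loop from definition 4, wrapped in a function (hypothetical name). */
char *loop_strcpy(char *dst, const char *src)
{
    char *ret = dst;

    while ((*dst++ = *src++) != 0)
        /* void */;
    return ret;
}

/* Returns nonzero if the loop and the library strcpy leave identical
   buffers for a non-overlapping copy of s.  Whole buffers are compared
   so that a stray write past the terminator would also be noticed. */
int agrees_with_library(const char *s)
{
    char a[64], b[64];

    if (strlen(s) >= sizeof a)
        return 0;               /* too long for this sketch's buffers */
    memset(a, 'x', sizeof a);
    memset(b, 'x', sizeof b);
    loop_strcpy(a, s);
    strcpy(b, s);
    return memcmp(a, b, sizeof a) == 0;
}
```

A test like this says nothing about overlapping arguments, which is
exactly where an English definition and a C definition may differ.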

C code is appropriate for defining some things (notably the "string"
operations) and not appropriate for others (notably the floating-point
library functions).  I think there are two reasons for this:
(a) The "string" operations only need primitive operations which the
    rest of the standard is supposed to define thoroughly, but cos() and
    so on depend on floating-point arithmetic which the standard leaves
    rather vague and (necessarily) rather implementation-dependent.
(b) The "string" operations belong to C, so the C community can define
    them however they please, but cos() and so on already have other
    definitions, so can't be bent to suit C's convenience.