[net.lang.c] strncpy

jeff@rlgvax.UUCP (Jeffrey Kegler) (08/19/83)

As most of you know, the UNIX philosophy is to have everything have
explicit terminators, instead of being terminated by counts.

In strcpy() this has raised the complaint that a slight error in an
argument to strcpy() can result in destruction of a considerable amount
of data.  This could happen when the string to be copied is not
properly terminated, or is a bad pointer, or is just overlong.  One can
say that it is really up to the programmer to check for these things.
Another part of the UNIX philosophy is, wherever something might be
seen as either a protection or a restriction of the programmer,
depending on point of view, to assume it is a restriction.

For those of us who do desire to be protected from our own lapses,
especially when these lapses may lead to a core with its stack
destroyed, strncpy() exists.  It takes a third argument which gives an
explicit maximum length to the string.  However, for some strange
reason, it still does not guarantee its result will be null terminated
if it is overlong.  Instead it puts characters right up into the last
permitted location.  Always writing

		strncpy(string1, string2, n);
		string1[n] = '\0';

solves the problem (provided string1 has room for n+1 characters), but
whenever I see this in code, I wonder why UNIX's string copy of
explicitly n characters does not always result in a string of n
characters.

Further, it always copies a string of the length specified, adding
nulls once the source string has been terminated.  I can think of
situations where this is useful, but one has never occurred to me, and
far more often I wind up copying a string whose maximum length is 5000
and whose average length is 10.  So as a byproduct of this feature, I
would get an average of 4990 extra trips through the loop.
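Both behaviors described so far can be observed with a few lines of C. The sketch below assumes a standard library strncpy(); the helper name nuls_after_strncpy is made up for this illustration and is not part of any library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill dst[0..n-1] with a marker byte, run the
 * library strncpy(), and report how many of the n bytes are '\0'.
 */
size_t nuls_after_strncpy(char *dst, const char *src, size_t n)
{
    size_t i, nuls = 0;

    memset(dst, '#', n);        /* marker, so untouched bytes would show */
    strncpy(dst, src, n);
    for (i = 0; i < n; i++)
        if (dst[i] == '\0')
            nuls++;
    return nuls;
}
```

With an 8-byte buffer, copying "abc" leaves five nulls (the terminator plus four bytes of padding), while copying any source of eight or more characters leaves none, so the result is not a legal C string. Scaled up to a 5000-byte field and a 10-character source, the same padding accounts for the 4990 extra stores mentioned above.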

Finally, strncpy() is far less efficient than it could be.  Its main
loop maintains and updates both a string pointer and a count, and each
character copied requires a comparison to n, which is not in a
register.  I would get more into its behavior, but I do not know if the
code is public domain, unlike strcpy(), 4 versions of which are in K&R
on pages 100-101.
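To make the cost concrete without quoting the library, here is a count-based copy written from strncpy()'s documented behavior. It is a sketch, not the UNIX source; the name counted_copy is invented for this illustration.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for strncpy(), written from its documented
 * behavior.  This is NOT the UNIX library source.
 */
char *counted_copy(char *dst, const char *src, size_t n)
{
    char *d = dst;
    size_t i;

    /* Every pass tests the count, tests for the terminator, and
     * advances two pointers as well as the counter. */
    for (i = 0; i < n && *src != '\0'; i++)
        *d++ = *src++;

    /* Pad the rest of the field with nulls.  If the source filled
     * the field, this loop does not run and no null is written. */
    for (; i < n; i++)
        *d++ = '\0';
    return dst;
}
```

Whether the counter lives in a register depends on the compiler; the point is only that a loop of this shape has more per-character work than one that compares a single pointer against a precomputed end address.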

Below is the code for my rewrite of strncpy(), which never accesses
anything except a register in its main loop (if your compiler supports
this sort of thing), has a main loop one instruction shorter on the VAX
(all claims are for code generated by the C optimizer), guarantees the
copied string will be a legal string, even where the original was not,
and does no copying beyond the end-of-string.  The main efficiency
improvement does not depend on my changes from the behavior of
strncpy(), and scpy() could be rewritten to simulate strncpy()
exactly.

===========================

/*
 * Copy a string in a fail-safe manner.
 */
void
scpy(s1,s2,n)
register char *s1,*s2;
{
	register char *eos = s1+n-1;

	while(*s1++ = *s2++)
		if (s1 >= eos) {
			*eos = '\0';
			return;
		}
}

==============================

One could argue that the above is not as fail-safe as it should be, because
if n is zero or negative, the byte before s1 is zeroed, and a check for
such an n could be placed before the main loop.
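A version with that check might look like the following. It is a sketch in ANSI-style C; the name scpy_checked is invented here, and the behavior for n <= 0 (do nothing) is one possible choice rather than anything specified in the post.

```c
/*
 * Hypothetical variant of scpy() with a guard for n <= 0.
 * Copies at most n-1 characters and always leaves a legal string
 * when n >= 1.  For n <= 0 nothing at all is written.
 */
void scpy_checked(char *s1, const char *s2, int n)
{
    char *eos;

    if (n <= 0)
        return;                 /* no room even for the terminator */

    eos = s1 + n - 1;           /* last byte we are allowed to write */
    while ((*s1++ = *s2++) != '\0')
        if (s1 >= eos) {
            *eos = '\0';        /* out of room: force termination */
            return;
        }
}
```

The extra test runs once per call, outside the main loop, so the per-character cost is unchanged.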

tll@druxu.UUCP (08/20/83)

One place where strncpy's behavior is desirable is in dealing with UNIX*
directory entries.  A directory entry contains a 14-character array for
the filename, which is padded with nulls.  It is only null-terminated
if the filename is less than 14 characters long.

			Tom Laidig

* UNIX is a trademark of some subsidiary of AT&T, which may have almost
  any name or logo, depending on the date and Judge Greene's mood.

mash@whuxlb.UUCP (John Mashey) (08/20/83)

strcpyn() (before it was renamed strncpy()) was originally
written specifically to deal with directories, accounting records,
and various other places in UNIX that expect fixed-length fields
containing variable-length strings padded with nulls.
There used to be numerous instances of slightly different code to
do the strncpy/strncmp functions in-line.
-mashey
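The fixed-length-field use can be sketched as follows. The 14-byte width comes from the posts above; the structure layout and the names NAMELEN, dir_entry, set_name and get_name are assumptions made for this illustration, not the actual UNIX declarations.

```c
#include <string.h>

#define NAMELEN 14      /* field width quoted in the thread */

/* Hypothetical entry layout, for illustration only. */
struct dir_entry {
    unsigned short d_ino;
    char d_name[NAMELEN];   /* null-padded, not always null-terminated */
};

/* Store a name.  strncpy() pads a short name with nulls and leaves
 * a name of NAMELEN or more characters unterminated (and truncated). */
void set_name(struct dir_entry *e, const char *name)
{
    strncpy(e->d_name, name, NAMELEN);
}

/* Recover an ordinary C string.  The caller's buffer must hold
 * NAMELEN + 1 bytes, since the field itself may have no terminator. */
void get_name(const struct dir_entry *e, char *out)
{
    memcpy(out, e->d_name, NAMELEN);
    out[NAMELEN] = '\0';
}
```

Here the missing terminator is the intended result: a 14-character name uses every byte of the field, and the reader supplies the terminator when converting back to a string.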

stevesu@bronze.UUCP (Steve Summit) (08/31/83)

(Tom Laidig pointed out that the sometimes peculiar behavior of
strncpy(), i.e. not always appending a '\0', is useful in strings
that need not be null terminated, like filenames in directories.)

By the way, that reminds me of a tidbit I learned while building
directories by hand (I was salvaging a trashed filesystem, and it
was PAINFUL): 

Those filenames that are shorter than DIRSIZE (14) characters
should be fully null padded.  The kernel must not use the strncmp()
I'm used to, because if there are characters other than '\0'
after the first '\0', it won't match.

					Steve Summit

guy@rlgvax.UUCP (Guy Harris) (09/01/83)

The kernel does not use "strncmp" at all for comparing directory entries.
Given this, there is a bug in "fsck" (V7, S3, 4.1BSD) where it will NOT zero
out the full d_name portion of a directory entry when it is reattaching an
orphan file to "lost+found".  If the slot being used for that had a pathname
longer than the (6 or 7, depending on which UNIX) characters used by the
reattachment name, you will end up with a totally inaccessible file with a
null character in the middle of its name.

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
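The difference between the two kinds of comparison discussed in the last two posts can be shown in a few lines. This is only a sketch: the kernel's actual comparison code does not appear in the thread, and the names NAMELEN, same_name_strncmp and same_name_fullfield are invented for the illustration.

```c
#include <string.h>

#define NAMELEN 14      /* field width quoted in the thread */

/* Compare as strings: strncmp() stops at the first null, so bytes
 * after it are never examined. */
int same_name_strncmp(const char *a, const char *b)
{
    return strncmp(a, b, NAMELEN) == 0;
}

/* Compare every byte of the field, padding included.  This is an
 * assumed model of a full-width comparison, not the kernel's code. */
int same_name_fullfield(const char *a, const char *b)
{
    return memcmp(a, b, NAMELEN) == 0;
}
```

A field holding "lost" followed by ten nulls and a field holding "lost", a null, and then leftover characters compare equal under the first function and unequal under the second. That matches the symptom described above: an entry whose padding was not zeroed can never be matched by a properly padded name.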

mrm@datagen.UUCP (09/01/83)

Strncpy() is spec'ed to completely null pad the buffer if given a shorter
string, and not to append any nulls if given a same size or larger string.

	Michael Meissner	Data General Corporation
	...(decvax!ittvax, allegra)!datagen!mrm

hal@cornell.UUCP (Hal Perkins) (09/04/83)

Why are so many articles about C also posted to unix-wizards?  Is there
any need for this?

The reason I complain is that there is a lot of traffic in both news
groups.  I read these groups in my spare time (when waiting for troff
or TeX to do something, for instance), and I rarely have time to read
through both of these groups in one sitting.  Readnews appears to be
unable to filter articles posted to several groups unless you read all
of the groups at the same time.

Of course, the real solution would be to fix the news programs so they
would remember what articles have been read, regardless of when.  This
is a long standing bug, and it is pretty annoying that it hasn't been
fixed sometime in the last few releases of readnews.

But even if the news stuff is fixed, why can't all the C articles be
posted to net.lang.c, all the micro stuff to net.micro, and just the
Unix operating system items to net.unix-wizards?  It would be easier
to filter through the tremendous amount of stuff in unix-wizards if
as much of it as possible were posted to more relevant specialized
groups.

Arrrrgh


Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET:  hal@crnlcs