[net.lang.c] Need strnlen

stephen@dcl-cs.UUCP (Stephen J. Muir) (11/08/85)

I think that there should be a function called "strnlen" as follows:

int strnlen (string, size)
	char	*string;
	int	size;

where "size" is the maximum number of bytes in "string".

The reason is that, if a character array is passed as argument, and it is not
terminated with a null byte, then "strlen" will keep going, possibly hitting an
unallocated piece of memory.
-- 
UUCP:	...!seismo!mcvax!ukc!dcl-cs!stephen
DARPA:	stephen%comp.lancs.ac.uk@ucl-cs	| Post: University of Lancaster,
JANET:	stephen@uk.ac.lancs.comp	|	Department of Computing,
Phone:	+44 524 65201 Ext. 4599		|	Bailrigg, Lancaster, UK.
Project:Alvey ECLIPSE Distribution	|	LA1 4YR

gwyn@BRL.ARPA (VLD/VMB) (11/11/85)

Re: proposed strnlen()

All the str*() functions work with NUL-terminated strings.
Probably the best thing that could happen if you fed them
something else would be for the implementation to access
unallocated memory.  At least then you would be able to
find the bug.

Your suggestion would make more sense for the mem*()
functions, which work with arbitrary byte arrays.  A good
generalization would be "find the first occurrence of a
given byte, not going beyond a certain distance".  If you
look up memchr(), you will see that that's what it does.

It is possible that your system is not System V compatible,
in which case the memchr() function may not be in your C
library.  But it is very easy to implement and there is no
need to invent a different-shaped wheel.
	char *memchr( const char *s, int c, int n );
returns a pointer to the first occurrence of `c' in the
first `n' bytes of memory area (whose lowest address is)
`s', or NULL if not found.  Calling this function with
`c' set to '\0' is equivalent to your proposed strnlen().
(You would have to handle the exception case, and use
whatever memchr() returns minus `s' in the regular case
for the length of the string.)
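
A minimal sketch of the memchr() approach described above, written in
ANSI C rather than the pre-ANSI style used in this thread.  The name
bounded_len is hypothetical (picked so it does not clash with any
strnlen() a library may already supply), and returning `n' in the
exception case is an assumption; the post leaves that choice open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* bounded_len is a hypothetical name, not something from the post. */
size_t bounded_len(const char *s, size_t n)
{
    const char *nul;

    assert(s != NULL);
    nul = memchr(s, '\0', n);   /* examines at most the first n bytes */
    if (nul == NULL)
        return n;               /* exception case: no NUL within n bytes (assumed result) */
    return (size_t)(nul - s);   /* regular case: what memchr() returned, minus s */
}
```

With an 8-byte field that is completely full, bounded_len(field, 8)
gives 8 without reading a ninth byte.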

dlc@a.sei.cmu.edu (Daryl Clevenger) (11/13/85)

One should never allow a character array to not have a null terminating byte.
Doing things like this should be automatic.  Besides, when the program
dumps core, one can usually find the problem readily.  Also, if I am not
mistaken, having character arrays that do not have terminating null bytes
will cause problems with many other functions, e.g. printf().  printf() (or
maybe _doprint(), I'm not sure which) will keep printing characters until it
hits that null byte, but it probably won't find it where it should be.
Unless I'm wrong, making sure strings are null terminated should be as
automatic as making sure that you aren't trying to use NULL pointers or
that malloc() returns a valid pointer.

vishniac@wanginst.UUCP (Ephraim Vishniac) (11/14/85)

> I think that there should be a function called "strnlen" as follows:
> 
> int strnlen (string, size)
> 	char	*string;
> 	int	size;
> 
> where "size" is the maximum number of bytes in "string".
> 
I agree that such a facility is needed, but I don't think it will ever
be provided as "standard" C.  The basic problem (or the C problem, if you
prefer :-) is that C defines a representation for strings, but leaves
the user to implement the operations.  This allows one to cook up all sorts
of invalid strings (such as the unterminated ones the original poster is
worried about).

To my mind, a string consists of three things:
	1.  The characters of the string;
	2.  The length of the string, either as such or encoded by marking
		the string;
	3.  The storage block where the string is located, which has its
		own attributes (alignment and size, to name two).
C has no problem with the first (the characters are easy to access); some
problems with the second (null termination is good for some purposes,
rotten for others); and completely ignores the third.

But: since this is C, you don't have to use the standard representation and
functions.  Just as I did when sufficiently burned, you can use your own
representation and macros.  Then the only problem is that nobody will use
your modules, because they're "non-standard".
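
One way the three-part view above might look as a home-grown
representation.  This is only a sketch: the struct, its field names,
and both functions are hypothetical and are not the macros the post
refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type; none of these names come from the post. */
struct cstr {
    char  *data;   /* 1. the characters of the string */
    size_t len;    /* 2. the length, kept explicitly instead of by a terminator */
    size_t cap;    /* 3. the size of the storage block holding the characters */
};

/* Wrap an existing block of cap bytes as an empty string. */
void cstr_init(struct cstr *s, char *block, size_t cap)
{
    assert(s != NULL && block != NULL);
    s->data = block;
    s->len = 0;
    s->cap = cap;
}

/* Append n bytes; returns 0 on success, -1 if they would not fit. */
int cstr_append(struct cstr *s, const char *bytes, size_t n)
{
    assert(s != NULL && bytes != NULL);
    if (n > s->cap - s->len)
        return -1;                      /* refuse rather than overrun the block */
    memcpy(s->data + s->len, bytes, n);
    s->len += n;
    return 0;
}
```

Because the block size travels with the string, an append that would
not fit can be refused instead of running off the end.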

-- 
Ephraim Vishniac
  [apollo, bbncca, cadmus, decvax, harvard, linus, masscomp]!wanginst!vishniac
  vishniac%Wang-Inst@Csnet-Relay

mikes@3comvax.UUCP (Mike Shannon) (11/14/85)

Stephen Muir in the cited article:
> I think that there should be a function called "strnlen" as follows:
> 
> int strnlen (string, size)
> 	char	*string;
> 	int	size;
> 
> where "size" is the maximum number of bytes in "string".
> 
> The reason is that, if a character array is passed as argument, and it is not
> terminated with a null byte, then "strlen" will keep going, possibly hitting
> an unallocated piece of memory.

	Then write one!  Geez!!
-- 
			Michael Shannon {ihnp4,hplabs}!oliveb!3comvax!mikes

mek@rruxd.UUCP (M Kaufman) (11/16/85)

>I think that there should be a function called "strnlen" as follows:

>int strnlen (string, size)
>	char	*string;
>	int	size;
>
>where "size" is the maximum number of bytes in "string".

	I THINK WE SHOULD PLACE A TAX ON ALL FOREIGNERS LIVING ABROAD!

				With Apologies to R. P. Gumby, Esq.

jack@boring.UUCP (11/16/85)

In article <207@a.sei.cmu.edu> dlc@a.sei.cmu.edu (Daryl Clevenger) writes:
>One should never allow a character array to not have a null terminating byte.
That's right, one *should* never allow it. On the other hand,
if you look at the format of /etc/utmp (v7, at least, dunno about
4.n or S5), there's 8 bytes available for loginnames, so if
you've got an 8 char loginname, it won't be zero terminated.

Or, look at the V7 directory entry for a 14-char filename.

I *know* these should be avoided, but I would still prefer to see
code like
	len = strnlen(buffer,BUFSIZ);
above
	if( (len=strlen(buffer))>BUFSIZ) len=BUFSIZ;
which will fail at some unknown point in the future, long
after everyone who knew the code has passed away.......
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/16/85)

In article <6691@boring.UUCP> jack@boring.UUCP (Jack Jansen) writes:
>I *know* these should be avoided, but I would still prefer to see
>code like
>	len = strnlen(buffer,BUFSIZ);
>above
>	if( (len=strlen(buffer))>BUFSIZ) len=BUFSIZ;
>which will fail at some unknown point in the future, long
>after everyone who knew the code has passed away.......

That code should have failed immediately!  If it ever got to a code
review, it should have been shot dead immediately.  You  NEVER
assume that there's something beyond where you put it, unless ...
you put it there.	;-)

#define NUL	'\0'	/* Now, class, tell me why this is not NULL. */

	char buffer[BUFLEN+1];
		...
	fread(buffer, 1, BUFLEN, infile);
	buffer[BUFLEN] = NUL;
	len = strlen(buffer);	/* if you must */
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

steiny@scc.UUCP (Don Steiny) (11/18/85)

**

	Yikes! Too much talk, I have 25 seconds to spare, here: 

	strnlen(str,len)
	char *str;
	{
		register *s;

		for(s=str;*s && (int) s-str<len;s++)
			;
		return((int) s-str);
	}
	

-- 
scc!steiny
Don Steiny @ Don Steiny Software 
109 Torrey Pine Terrace
Santa Cruz, Calif. 95060
(408) 425-0382

stephen@dcl-cs.UUCP (Stephen J. Muir) (11/19/85)

In article <207@a.sei.cmu.edu> dlc@a.sei.cmu.edu (Daryl Clevenger) writes:
>One should never allow a character array to not have a null terminating byte.

This is absolute rubbish.  If I want character arrays without a terminating
null byte then I'm quite entitled to do that.  In fact, I *have* to do that as
I'm writing interface routines for ADA.  "strnlen" is to "strlen" as "strncmp"
is to "strcmp".  I've written the routine myself now, but I just think that it
should be part of the standard library, that's all.
-- 
UUCP:	...!seismo!mcvax!ukc!dcl-cs!stephen
DARPA:	stephen%comp.lancs.ac.uk@ucl-cs	| Post: University of Lancaster,
JANET:	stephen@uk.ac.lancs.comp	|	Department of Computing,
Phone:	+44 524 65201 Ext. 4599		|	Bailrigg, Lancaster, UK.
Project:Alvey ECLIPSE Distribution	|	LA1 4YR

levy@ttrdc.UUCP (Daniel R. Levy) (11/19/85)

In article <562@scc.UUCP>, steiny@scc.UUCP (Don Steiny) writes:
>**
>
>	Yikes! Too much talk, I have 25 seconds to spare, here:
>

Yikes! Needs a brushup, I have 25 seconds to spare, here: [:-)]

>	int strnlen(str,len)
	^^^
>	char *str;
	
	int len;
	^^^^^^^^

>	{
>		register char *s;
			 ^^^^
>
>		for(s=str;*s && s-str<len;s++) /* Diff of pointers is
						automatically cast to int */
>			;
>		return(s-str);
>	}
>	

	/* Did I get it right, Guy?  Huh? Huh??? :-) */
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!ihnp4!ttrdc!levy

mash@mips.UUCP (John Mashey) (11/19/85)

> One should never allow a character array to not have a null terminating byte.
> Doing things like this should be automatic. 
Yes, except that for (at the time, and perhaps still) good reasons,
certain structures had strings that might not be null-terminated.
Such include directories (except in 4.2), accounting file command names, 
utmp/wtmp.  Those things are why the strn* routines were written in
the first place.

I generally agree with the first advice, but there are at least a few
occasions where one extra byte is fairly handy.
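
A sketch of how a strn* routine is typically used with such a field.
The record layout, the NAMESZ width, and the function name are
assumptions made for illustration; they are not the real utmp or
directory declarations.

```c
#include <assert.h>
#include <string.h>

#define NAMESZ 8   /* assumed field width, like the 8-byte login name mentioned earlier */

/* Hypothetical record with a fixed-width, possibly unterminated name field. */
struct rec {
    char name[NAMESZ];
};

/* Copy the field into out, which must hold NAMESZ + 1 bytes, and terminate it. */
void rec_name(const struct rec *r, char out[NAMESZ + 1])
{
    assert(r != NULL && out != NULL);
    strncpy(out, r->name, NAMESZ);  /* reads at most NAMESZ bytes from the field */
    out[NAMESZ] = '\0';             /* strncpy() adds no NUL when the field is full */
}
```

The copy in out is one byte larger than the field, so it can be
handed to strlen() or printf("%s") safely.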
-- 
-john mashey
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash
DDD:  	415-960-1200
USPS: 	MIPS Computer Systems, 1330 Charleston Rd, Mtn View, CA 94043

roger@dedalus.UUCP (Roger L. Cordes Jr.) (11/19/85)

	What is the big deal here? How about:

extern	int	strlen();
static	int	_nstrnlen_;	/* to avoid two calls to strlen() */
#define	strnlen(S,N)	( (_nstrnlen_=strlen(S)) > (N) ? (N) : _nstrnlen_ )

	or:

int	strnlen(s,n)
char	*s;
int	n;
{
	char	*c;
	int	len = 0;

	for ( c=s; *c && (len<n); c++, len++ )
		;
	return(len);
}

	Was it a joke? Did I miss something?

	Roger L. Cordes, Jr.		William G. Daniel & Associates
	...!mcnc!ikonas!dedalus!roger	8000 Regency Parkway, Suite 140
	(919) 467-9708			Cary, N.C.  27511

bc@cyb-eng.UUCP (Bill Crews) (11/20/85)

> One should never allow a character array to not have a null terminating byte.
> Doing things like this should be automatic.  Besides, when the program
> dumps core, one can usually find the problem readily.  Also, if I am not
> mistaken, having character arrays that do not have terminating null bytes
> will cause problems with many other functions, e.g. printf().  printf() (or
> maybe _doprint(), I'm not sure which) will keep printing characters until it
> hits that null byte, but it probably won't find it where it should be.
> Unless I'm wrong, making sure strings are null terminated should be as
> automatic as making sure that you aren't trying to use NULL pointers or
> that malloc() returns a valid pointer.

It should be obvious that one is sometimes governed by external
constraints, such as the format of a directory entry.  If a file or
directory entry format contains a fixed-length field which contains
textual data that can fill the entire field, one must deal with it
somehow.
-- 
	- bc -

..!{seismo,topaz,gatech,nbires,ihnp4}!ut-sally!cyb-eng!bc  (512) 835-2266

gwyn@BRL.ARPA (VLD/VMB) (11/20/85)

Maybe you should have taken more than 25 seconds.
Your strnlen() code has a bug (repeated twice).
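
The post does not name the bug.  One likely candidate, assumed here,
is the order of the tests in the loop condition: `*s' is examined
before the bound, so for an unterminated array of `len' bytes the
loop reads the byte at str[len].  A sketch with the tests swapped
(the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; same idea as the posted routine, bound checked first. */
int strnlen_checked(const char *str, int len)
{
    const char *s;

    assert(str != NULL);
    /* s - str < len is evaluated before *s, so str[len] is never read */
    for (s = str; s - str < len && *s != '\0'; s++)
        ;
    return (int)(s - str);
}
```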

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/20/85)

OK.  Let's all stop arguing over whether this routine should exist.

#define NUL	'\0'

/*
** This routine, for whatever reason, wants to calculate the length
** of string 's' but be sure that it stops before element 'n'.
*/
int strnlen(s, n)
  register char *s;
  register int n;
{
	register int i;

	for (i = 0; i < n; i++) {
		if (*s++ == NUL)
			break;
	}
	return(i);
}

There.  It exists.  If you must use it, do so.  I will try not to,
but may have to some day.  Fair 'nuff?
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

peter@graffiti.UUCP (Peter da Silva) (11/21/85)

> #define NUL	'\0'	/* Now, class, tell me why this is not NULL. */
> 
> 	char buffer[BUFLEN+1];
> 		...
> 	fread(buffer, 1, BUFLEN, infile);
> 	buffer[BUFLEN] = NUL;
> 	len = strlen(buffer);	/* if you must */

Uh...

	len = fread(buffer, 1, BUFLEN, infile);	/* item size 1, so len counts bytes */
	if(len==0)
		aha(endoffile);
	if(len<0)
		ohmygod(panic);
	else
		buffer[len] = NUL;

...would be much better code. Don't assume fread succeeds (it fails at least
once on almost any file :-)).
-- 
Name: Peter da Silva
Graphic: `-_-'
UUCP: ...!shell!{graffiti,baylor}!peter
IAEF: ...!kitty!baylor!peter

preece@ccvaxa.UUCP (11/21/85)

> I think that there should be a function called "strnlen" as follows:
> 
> int strnlen (string, size)
> 	char	*string;
> 	int	size;
> 
> where "size" is the maximum number of bytes in "string".
> /* Written  4:14 pm  Nov  7, 1985 by stephen@dcl-cs.UUCP
>  in ccvaxa:net.lang.c */
----------
This seems perfectly reasonable and useful.  Most of the previously
posted responses seem pretty obnoxious.  On the other hand, where
you should have put this is mod.std.c.  This is something that
belongs in the standard for the string library routines.

You might make a proposal for what value you'd like returned
if no null character is found before the given limit (-1 and size
are reasonable alternatives).
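
The two return conventions mentioned above, side by side.  Both
function names are hypothetical; this is a sketch of the choice, not
a proposed standard interface.

```c
#include <assert.h>
#include <stddef.h>

/* Convention A (hypothetical name): return size when no null is found. */
int len_or_size(const char *string, int size)
{
    int i;

    assert(string != NULL);
    for (i = 0; i < size; i++)
        if (string[i] == '\0')
            return i;
    return size;    /* caller cannot tell whether a terminator follows */
}

/* Convention B (hypothetical name): return -1 when no null is found. */
int len_or_minus1(const char *string, int size)
{
    int i;

    assert(string != NULL);
    for (i = 0; i < size; i++)
        if (string[i] == '\0')
            return i;
    return -1;      /* distinct from every valid length, but must be checked */
}
```

Returning size lets the result be used directly as a byte count;
returning -1 makes the unterminated case detectable at the cost of
an extra test at every call.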

-- 
scott preece
gould/csd - urbana
ihnp4!uiucdcs!ccvaxa!preece

steiny@scc.UUCP (Don Steiny) (11/22/85)

>
> >**
> 
> >	int strnlen(str,len)
> 	^^^
> >	char *str;
> 	
> 	int len;
> 	^^^^^^^^

	Why bother to do that?  It is implicitly int anyway.

-- 
scc!steiny
Don Steiny @ Don Steiny Software 
109 Torrey Pine Terrace
Santa Cruz, Calif. 95060
(408) 425-0382

gwyn@BRL.ARPA (VLD/VMB) (11/25/85)

I disagree that strnlen() should be proposed for the X3J11
standard.  The draft standard already includes memchr(),
which offers similar functionality.  The standard should
not serve as a repository for everybody's personal favorite
private function library.

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/27/85)

In article <715@dedalus.UUCP> roger@dedalus.UUCP (Roger L. Cordes Jr.) writes:
>	What is the big deal here? How about:
>extern	int	strlen();
>static	int	_nstrnlen_;	/* to avoid two calls to strlen() */
>#define	strnlen(S,N)	( (_nstrnlen_=strlen(S)) > (N) ? (N) : _nstrnlen_ )

Once again:  doing a strlen() on something not guaranteed to be
NUL-terminated is guaranteed to net you a core dump sooner or later.
You can declare str to be char str[MAXSIZE+1] and always do a two-step:
str[MAXSIZE] = NUL; n = strlen(str); or you can use the deplorable
function I put on the net to try and end this line of discussion; or
you can give the people who use your code fun trying to figure out
"Oh no!  How did I break the computer?  Where did the core dump, and
can I sweep it under the rug before morning?"	;-)

Incidentally, Doug's suggestion to use memchr() is great, once we
all have ANSI compilers.	;-}
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

preece@ccvaxa.UUCP (11/28/85)

If someone would post either a draft of x3j11 or a pointer to
somewhere whence it can be FTPed, more of us might know what
was in it...

-- 
scott preece
gould/csd - urbana
ihnp4!uiucdcs!ccvaxa!preece

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/30/85)

> If someone would post either a draft of x3j11 or a pointer to
> somewhere whence it can be FTPed, more of us might know what
> was in it...

I asked its editor about this, and he noted that
	(a) It is too big for net-copying around
	(b) To cover costs, ANSI needs to charge a nominal
		fee for membership or observership
I think you can get a copy from the X3 Secretariat for on
the order of $20.  Perhaps someone has more exact information
on this..

peter@baylor.UUCP (Peter da Silva) (01/18/86)

> One should never allow a character array to not have a null terminating byte.

Except that in lots of places in UNIX you find character arrays that may or
may not be null-terminated. Examples: directory entries and entries in
/etc/utmp. The following program fragment lists all files in the current
directory separated by commas:

	if(!(fp = fopen(".", "r"))) {
		perror(".");
		return(ERROR);
	}
	commastring = "";
	while(fread(dirp, sizeof(*dirp), 1, fp))	/* whole entry, not just the 14-byte name */
		if(dirp->d_ino) {
			printf("%s%.14s", commastring, dirp->d_name);
			commastring = ", ";
		}
	if(commastring[0])
		putchar('\n');

dirp->d_name may or may not be null terminated. Printf doesn't get bent out
of shape over it, now does it? (please, no flames from 4.2 people who want
me to use their routines. I know all about them. (1) I'm on a 4.2 system
right now. (2) I posted a generic UNIX implementation of them to net.sources
a few months ago (and had to deal with non-null-terminated strings then)).
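
A small sketch of why the "%.14s" above is safe: with a precision,
the printf family reads no more than that many bytes from the
argument, so the array need not be terminated.  The function name
and NAMELEN are hypothetical, and snprintf() (a later addition to C)
is used here only so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NAMELEN 14   /* assumed width of the name field, as in the fragment above */

/* Format a possibly unterminated NAMELEN-byte name into out. */
int format_name(char *out, size_t outsz, const char name[NAMELEN])
{
    assert(out != NULL && name != NULL);
    /* the precision caps how many bytes are read from name */
    return snprintf(out, outsz, "%.*s", NAMELEN, name);
}
```

A name that fills all 14 bytes comes out whole, and a shorter name
stops at its NUL as usual.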
-- 
-- Peter da Silva
-- UUCP: ...!shell!{baylor,graffiti}!peter; MCI: PDASILVA; CIS: 70216,1076