diamond@jit345.swstokyo.dec.com (Norman Diamond) (01/24/91)
In article <22855@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
> #define MAXNAMES 1000
> static char users[MAXNAMES][UT_NAMESIZE+1];
> (void) strncpy( users[nusers], u.ut_name, UT_NAMESIZE );
> users[nusers][UT_NAMESIZE] = '\0';
>And yes, this will fail if more than 1000 users are logged in at
>the same time.  Imagine how concerned I am.

Uh, maybe equally concerned as people who knew that their operating system would never last 10 years, or 28 years, or whatever?  Equally concerned as people who knew that the spacecraft would not last a year, or when it did, they knew it wouldn't last another 4 years?

You should know to set a better example than this.

Followups (if any are necessary) are directed to comp.lang.c.
--
Norman Diamond       diamond@tkov50.enet.dec.com
If this were the company's opinion, I wouldn't be allowed to post it.
jef@well.sf.ca.us (Jef Poskanzer) (01/25/91)
In the referenced message, diamond@jit345.enet@tkou02.enet.dec.com (Norman Diamond) wrote:
}In article <22855@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
}> #define MAXNAMES 1000
}> static char users[MAXNAMES][UT_NAMESIZE+1];
}> (void) strncpy( users[nusers], u.ut_name, UT_NAMESIZE );
}> users[nusers][UT_NAMESIZE] = '\0';
}>And yes, this will fail if more than 1000 users are logged in at
}>the same time.  Imagine how concerned I am.
}
}Uh, maybe equally concerned as people who knew that their operating system
}would never last 10 years, or 28 years, or whatever?
}Equally concerned as people who knew that the spacecraft would not last a
}year, or when it did, they knew it wouldn't last another 4 years?

Gosh, in ten years, if every trend in computer usage magically reverses itself, I'll get a message telling me to change the number from 1000 to 10000.  Yes, it does check for overflow.

}You should know to set a better example than this.

I think this is an *excellent* example of appropriate programming technology.  Dan Bernstein's hack of reading utmp twice and allocating 50 extra slots in case more users log in between the two is, when you come down to it, *no better*.  Just more complicated.  Worse, in fact, since he *doesn't* check for overflow.  He complained about a hard limit of 200 users and then went and programmed a different hard limit of 50 new users in an unknowable time period.  Foo on that.

If you must handle an arbitrary number of users, do the doubling-realloc trick.  But don't invest the effort until you get at least one report of someone overflowing the fixed-size array, since any malloc hacking that anyone does has a good chance of being buggy.

End of sermon.
---
Jef

             Jef Poskanzer  jef@well.sf.ca.us  {apple, ucbvax, hplabs}!well!jef
                           "Why me, John Bigboote?"
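[The ``doubling-realloc trick'' Jef mentions can be sketched roughly as follows.  This is an illustrative ANSI C fragment, not code from any of the posted programs; the `struct table` and `table_add` names are invented.  Each time the array fills, its capacity doubles, so there is no compiled-in MAXNAMES and a failed allocation leaves the old table intact.]

```c
#include <stdlib.h>
#include <string.h>

/* A growable table of name strings.  table_add returns 0 on success,
   -1 if an allocation fails; on failure everything already stored in
   the table is still valid and can be freed by the caller. */
struct table { char **names; size_t n, cap; };

int table_add(struct table *t, const char *name)
{
    if (t->n == t->cap) {
        /* double the capacity (realloc(NULL, ...) acts like malloc) */
        size_t ncap = t->cap ? t->cap * 2 : 16;
        char **p = realloc(t->names, ncap * sizeof *p);
        if (p == NULL)
            return -1;          /* old array untouched */
        t->names = p;
        t->cap = ncap;
    }
    t->names[t->n] = malloc(strlen(name) + 1);
    if (t->names[t->n] == NULL)
        return -1;
    strcpy(t->names[t->n], name);
    t->n++;
    return 0;
}
```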
diamond@jit345.swstokyo.dec.com (Norman Diamond) (01/25/91)
In article <22870@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
>In the referenced message, diamond@jit345.enet@tkou02.enet.dec.com (Norman Diamond) wrote:
>}In article <22855@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
>}> #define MAXNAMES 1000
>}> static char users[MAXNAMES][UT_NAMESIZE+1];
>}> (void) strncpy( users[nusers], u.ut_name, UT_NAMESIZE );
>}> users[nusers][UT_NAMESIZE] = '\0';
>}>And yes, this will fail if more than 1000 users are logged in at
>}>the same time.  Imagine how concerned I am.
>}
>}Uh, maybe equally concerned as ...
>
>Gosh, in ten years, if every trend in computer usage magically reverses
>itself, I'll get a message telling me to change the number from 1000 to
>10000.

Suppose someone starts logging NFS clients?  Or the clients of some other service?  1000 would already be a bit small for that.

>Yes, it does check for overflow.

Uh, you mean that it doesn't abort on overflow, but only gives inaccurate answers.  OK, so your example does about 1/4 of what a good example would do.

>Dan Bernstein's hack of reading utmp twice and allocating
>50 extra slots in case more users log in between the two is, when you
>come down to it, *no better*.  Just more complicated.  Worse, in fact,
>since he *doesn't* check for overflow.

If I had seen that posting, and if Mr. Bernstein had made some claim about adequacy, and if I had the time, I would have criticized that too.  In fact, if I had seen the posting, and given the hypocrisy that you attributed to him (which I deleted, sorry), then it wouldn't matter if I had the time; I'd've flamed him ;-) .  But I didn't see it, sorry.
--
Norman Diamond       diamond@tkov50.enet.dec.com
If this were the company's opinion, I wouldn't be allowed to post it.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/25/91)
Ah, yes, Jef takes his place next to Chris on my list of gurus I've caught in a mistake.  Two mistakes, in fact.  Read on...

In article <22870@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
> I think this is an *excellent* example of appropriate programming
> technology.  Dan Bernstein's hack of reading utmp twice and allocating
> 50 extra slots in case more users log in between the two is, when you
> come down to it, *no better*.

I address this below.  You aren't thinking things through.  My program would be objectively better than yours even if it allocated *zero* extra slots for users who log in after the first read.

> Just more complicated.  Worse, in fact,
> since he *doesn't* check for overflow.

I do check for overflow.  See that test for i < lines + 50?

Jef, I expect an apology.  I appreciate criticism of my code, particularly when it gives me better insight into what people are looking for.  But I don't appreciate someone trying to excuse his programming mistakes by saying ``Dan's code fucks up too'' when, in fact, my code works exactly as it's supposed to.  Before this, I thought you were the type who would give constructive criticism---things like ``You should've cast back and forth to char * or void * at the qsort() interface.''  Not false accusations that show you hardly even pay attention to what you're talking about.

> He complained about a hard
> limit of 200 users and then

Actually, my main project for last May was writing pty 3.0 from scratch, including the PD utilities (like u.c, who.c, etc.) that come with the package.  So don't think I'm complaining about a problem without already having tried to fix it.

> went and programmed a different hard limit
> of 50 new users in an unknowable time period.

You are wrong.  You're correctly reporting the limit I coded, but you said above that this behavior is no better than your fixed limit.  In that statement you are wrong.

I won't go into a long treatise about the principles of taking snapshots of a dynamic system.  But here are the two most important properties of ``foo'', a utmp scanner:

1. foo reports a user only if he is logged on at some point between when foo is invoked and when it finishes.

2. There is some time interval during which foo is running, such that foo reports any user who is logged on throughout that interval.

(The reason readdir() isn't safe for some applications is that it doesn't obey #2 in most implementations.)

Guess what?  My version of users satisfies these properties.  Your version fails #2.

Now you can talk all you want about reallocating memory (btw, there's no safe way to use realloc(), but you knew that) to read in as many users as possible.  I'll skip the comments about a quadratic time requirement, and about people who simply *talk* about code instead of *writing* code, and cut to the heart of the issue: You won't be able to identify a single functional requirement that your reallocating version satisfies and that my users program doesn't.

You see, users has to exit at some point, and before that point there must be a window when users doesn't detect new logins.  No external requirement can tell how big that window is.  So there's no way to tell the difference between a program that cuts things off when people log on too fast and a program that is cut off by the scheduler.  The best you can do is #2 above.  (This explanation isn't particularly lucid, but if you try to say what advantage a reallocating version will have, you'll realize that there is none.)

---Dan
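[Dan's two-pass scheme, as described in the thread (count the records first, then re-read with 50 slots of slack), might be sketched like this over a plain text file of one name per line.  The function name, the record format, and RECLEN are invented for illustration; only the `+ 50` overflow margin comes from his description.  Anyone present in the file during both passes is reported, which is property #2 above.]

```c
#include <stdio.h>
#include <stdlib.h>

#define RECLEN 16  /* one name per line in this sketch, max 15 chars */
#define EXTRA 50   /* slack for records added between the two passes */

/* Two-pass snapshot: pass 1 counts the records, pass 2 re-reads at
   most count+EXTRA of them into a freshly sized buffer.  Returns the
   number of records read, or -1 if the allocation fails. */
long read_names(FILE *fp, char **out)
{
    long count = 0, i = 0;
    int c;

    while ((c = getc(fp)) != EOF)   /* pass 1: count lines */
        if (c == '\n')
            count++;
    rewind(fp);

    *out = malloc((count + EXTRA) * RECLEN);
    if (*out == NULL)
        return -1;

    /* pass 2: the i < count + EXTRA test is the overflow check */
    while (i < count + EXTRA && fgets(*out + i * RECLEN, RECLEN, fp) != NULL)
        i++;
    return i;
}
```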
jef@well.sf.ca.us (Jef Poskanzer) (01/25/91)
In the referenced message, diamond@jit345.enet@tkou02.enet.dec.com (Norman Diamond) wrote:
}>Gosh, in ten years, if every trend in computer usage magically reverses
}>itself, I'll get a message telling me to change the number from 1000 to
}>10000.
}
}Suppose someone starts logging NFS clients?  Or the clients of some other
}service?  1000 would already be a bit small for that.

Huh?  In the users command?  What are you talking about?  Stick to the given problem domain.

}>Yes, it does check for overflow.
}
}Uh, you mean that it doesn't abort on overflow, but only gives inaccurate
}answers.  OK, so your example does about 1/4 of what a good example would do.

No, of course that's not what I mean.  It checks for overflow, tells you that it needs to be recompiled on overflow, and aborts on overflow.  Why is that so hard to understand?  Complete source is appended, so that we will have no more creative misunderstandings.  Note that it does a few more things than the usual users command.

Anyway, if you don't like the fixed-size array answer or the doubling-realloc answer or the read it twice answer, then let's see what you *do* like.  Time to sling some code, dude.
---
Jef

             Jef Poskanzer  jef@well.sf.ca.us  {apple, ucbvax, hplabs}!well!jef
                              INSPECTED BY #6

/*
** users - show users, with those on a list highlighted
**
** version of 10oct90
**
** Copyright (C) 1990 by Jef Poskanzer.
**
** Permission to use, copy, modify, and distribute this software and its
** documentation for any purpose and without fee is hereby granted, provided
** that the above copyright notice appear in all copies and that both that
** copyright notice and this permission notice appear in supporting
** documentation.  This software is provided "as is" without express or
** implied warranty.
*/

#include <stdio.h>
#include <strings.h>
#include <sys/types.h>
#include <utmp.h>

#ifndef UT_NAMESIZE
#define UT_NAMESIZE 8
#endif
#ifndef _PATH_UTMP
#define _PATH_UTMP "/etc/utmp"
#endif

#define TBUFSIZE 1024
#define MAXNAMES 1000
#define LINEWIDTH 79

extern char* getenv();
extern char* tgetstr();

int cmp();
int inlist();
void putch();

main( argc, argv )
    int argc;
    char* argv[];
    {
    char* term;
    char* strptr;
    char* soptr;
    char* septr;
    int smart;
    char buf[TBUFSIZE];
    static char strbuf[TBUFSIZE];
    struct utmp u;
    FILE* fp;
    static char friends[MAXNAMES][UT_NAMESIZE+1];
    static char users[MAXNAMES][UT_NAMESIZE+1];
    int i, nfriends, nusers;
    int wid;

    /* Check args. */
    if ( argc == 1 )
        ;
    else if ( argc == 3 && strcmp( argv[1], "-h" ) == 0 )
        {
        /* Read the friends list. */
        fp = fopen( argv[2], "r" );
        if ( fp == NULL )
            {
            perror( argv[2] );
            exit( 1 );
            }
        nfriends = 0;
        while ( fgets( buf, sizeof(buf), fp ) != NULL )
            {
            if ( buf[strlen(buf)-1] == '\n' )
                buf[strlen(buf)-1] = '\0';
            if ( nfriends >= MAXNAMES )
                {
                (void) fprintf( stderr,
                    "Oops, too many names in the friends file.  Gotta increase MAXNAMES.\n" );
                exit( 1 );
                }
            (void) strncpy( friends[nfriends], buf, UT_NAMESIZE );
            friends[nfriends][UT_NAMESIZE] = '\0';
            ++nfriends;
            }
        (void) fclose( fp );
        /* qsort( friends, nfriends, sizeof(friends[0]), cmp ); */
        }
    else
        {
        (void) fprintf( stderr, "usage: %s [-h highlightlist]\n", argv[0] );
        exit( 1 );
        }

    /* Initialize termcap stuff. */
    if ( isatty( fileno( stdout ) ) == 0 )
        smart = 0;
    else
        {
        term = getenv( "TERM" );
        if ( term == 0 )
            smart = 0;
        else if ( tgetent( buf, term ) <= 0 )
            smart = 0;
        else
            {
            strptr = strbuf;
            soptr = tgetstr( "so", &strptr );
            septr = tgetstr( "se", &strptr );
            if ( soptr == NULL || septr == NULL )
                smart = 0;
            else
                smart = 1;
            }
        }

    /* Open utmp and read the users. */
    fp = fopen( _PATH_UTMP, "r" );
    if ( fp == NULL )
        {
        perror( "utmp" );
        exit( 1 );
        }
    nusers = 0;
    while ( fread( (char*) &u, sizeof(u), 1, fp ) == 1 )
        {
        if ( u.ut_name[0] != '\0' )
            {
            if ( nusers >= MAXNAMES )
                {
                (void) fprintf( stderr,
                    "Oops, too many users logged in.  Gotta increase MAXNAMES.\n" );
                exit( 1 );
                }
            (void) strncpy( users[nusers], u.ut_name, UT_NAMESIZE );
            users[nusers][UT_NAMESIZE] = '\0';
            ++nusers;
            }
        }
    (void) fclose( fp );
    qsort( users, nusers, sizeof(users[0]), cmp );

    /* Show the users. */
    wid = 0;
    for ( i = 0; i < nusers; ++i )
        {
        if ( wid + strlen( users[i] ) + 3 > LINEWIDTH )
            {
            putchar( '\n' );
            wid = 0;
            }
        if ( wid > 0 )
            {
            putchar( ' ' );
            ++wid;
            }
        if ( inlist( users[i], friends, nfriends ) )
            {
            if ( smart )
                tputs( soptr, 1, putch );
            else
                putchar( '<' );
            fputs( users[i], stdout );
            if ( smart )
                tputs( septr, 1, putch );
            else
                putchar( '>' );
            if ( ! smart )
                wid += 2;
            }
        else
            fputs( users[i], stdout );
        wid += strlen( users[i] );
        }
    putchar( '\n' );

    exit( 0 );
    }

int
cmp( a, b )
    char* a;
    char* b;
    {
    return strcmp( a, b );
    }

int
inlist( str, list, nlist )
    char* str;
    char list[MAXNAMES][UT_NAMESIZE+1];
    int nlist;
    {
    int i;

    /* (This could be made into a binary search.) */
    for ( i = 0; i < nlist; ++i )
        if ( strcmp( str, list[i] ) == 0 )
            return 1;
    return 0;
    }

void
putch( ch )
    char ch;
    {
    putchar( ch );
    }
jef@well.sf.ca.us (Jef Poskanzer) (01/25/91)
In the referenced message, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) wrote:
}Ah, yes, Jef takes his place next to Chris on my list of gurus I've
}caught in a mistake.

You're right, I didn't notice the i < lines + 50 test.  I grovel at your feet O Master.

}Now you can talk all you want about reallocating memory (btw, there's no
}safe way to use realloc(), but you knew that)

Actually, I didn't.  Say more.

}I'll skip the comments about a quadratic time requirement,

Please do.

}and about people who simply *talk* about code instead of *writing* code,

Please get stuffed.

}You won't be able to identify a
}single functional requirement that your reallocating version

You must have mis-read my message.  I don't have any version which uses realloc.

}This explanation isn't particularly lucid,

You're right, but I understood it anyway.  As long as you've got that overflow check in there, fine, it works.  But after correctness you have to consider simplicity, and the fixed-size (but large and checked) array wins there.  I realize they tell you in Computer Science School that you're not supposed to do things like this.  I'm telling you now that it can be appropriate.
---
Jef

             Jef Poskanzer  jef@well.sf.ca.us  {apple, ucbvax, hplabs}!well!jef
                      Published simultaneously in Canada.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/26/91)
In article <22879@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
> }Now you can talk all you want about reallocating memory (btw, there's no
> }safe way to use realloc(), but you knew that)
> Actually, I didn't.  Say more.

Some versions of realloc() return the original pointer rather than 0 if they run out of memory.  So you have to code the malloc()/bcopy()/free() sequence yourself if you want error checking.

> }and about people who simply *talk* about code instead of *writing* code,
> Please get stuffed.

Hey, bud, you started.  My code can't defend itself against your insults, so someone has to do the job...  :-)

> }You won't be able to identify a
> }single functional requirement that your reallocating version
> You must have mis-read my message.  I don't have any version which uses
> realloc.

This was in the hypothetical case that you do write a reallocating version.

> As long as you've got that
> overflow check in there, fine, it works.  But after correctness you
> have to consider simplicity, and the fixed-size (but large and checked)
> array wins there.

It depends on whether you consider the fixed-size array to be correct.  Anyway, it's so simple to allow any number of users that you might as well make the change.

> I realize they tell you in Computer Science School
> that you're not supposed to do things like this.

Hey, bud, don't accuse me of being a computer scientist, or I'll have to start flaming you again.  (Last I heard, programming wasn't even part of the computer science curriculum.)

> I'm telling you now
> that it can be appropriate.

Be serious.  We're talking about a trivial piece of code.  Why is it ``appropriate'' to use an arbitrary limit when it's so easy to get rid of the limit?

---Dan
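[The defensive malloc()/bcopy()/free() sequence Dan describes, for pre-ANSI libraries whose realloc() could not be trusted, would look something like this sketch.  The name `xgrow` is invented, and ANSI memcpy stands in for BSD bcopy:]

```c
#include <stdlib.h>
#include <string.h>

/* Grow a buffer without trusting realloc(): allocate fresh storage,
   copy the old contents over, then free the old block.  Returns the
   new block, or NULL (with the old block left untouched) on failure. */
void *xgrow(void *old, size_t oldsize, size_t newsize)
{
    void *new = malloc(newsize);
    if (new == NULL)
        return NULL;
    if (old != NULL) {
        memcpy(new, old, oldsize < newsize ? oldsize : newsize);
        free(old);
    }
    return new;
}
```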
gwyn@smoke.brl.mil (Doug Gwyn) (01/26/91)
In article <22311:Jan2502:34:1191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>... there's no safe way to use realloc() ...

In Standard C realloc() is required to be safe.  Of course it may return NULL even if you're attempting to shrink the allocation, although it is unlikely that an implementation would be so deficient.  The relevant point is that one should be prepared to deal with realloc() failure, not blindly assume it will always work.
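[The standard idiom Gwyn is pointing at: assign realloc()'s result to a temporary, so a NULL return doesn't clobber (and leak) the original pointer.  The wrapper name `grow_buf` is invented for illustration:]

```c
#include <stdlib.h>

/* Portable realloc() usage per the C standard: keep the old pointer
   until the new one is known good.  Returns 0 on success, -1 on
   failure, in which case *buf is still valid. */
int grow_buf(char **buf, size_t newsize)
{
    char *tmp = realloc(*buf, newsize);
    if (tmp == NULL)
        return -1;      /* *buf unchanged; caller can recover or free */
    *buf = tmp;
    return 0;
}
```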
smryan@garth.UUCP (Steven Ryan) (01/26/91)
>No, of course that's not what I mean.  It checks for overflow, tells
>you that it needs to be recompiled on overflow, and aborts on
>overflow.  Why is that so hard to understand?  Complete source is

Recompile what?  Is the source always available?  Is the build process properly documented and all build files available?  Is the routine coded so that Joe Average can fix, recompile, and continue in five minutes?  Do you know what Joe Average is going to think of you afterwards?  Do you think he'll be eager to run anything else with your name on it?

Why is it difficult for so-called programmers to avoid arbitrary limits?
--
...!uunet!ingr!apd!smryan                                       Steven Ryan
...!{apple|pyramid}!garth!smryan                  2400 Geng Road, Palo Alto, CA
manson@iguana.cis.ohio-state.edu (Bob Manson) (01/27/91)
In article <60@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>Recompile what?  Is the source always available?  Is the build process
>properly documented and all build files available?  Is the routine
>coded so that Joe Average can fix, recompile, and continue in five
>minutes?  Do you know what Joe Average is going to think of you afterwards?

I know what I thought of the "person" that hard-coded a limit on the # of /etc/magic entries in AT&Ts file program...and it wasn't kind.  No, I didn't have source.  No, I couldn't recompile.  The solution was to write a replacement that didn't have any such stupid limit coded in it.

>Why is it difficult for so-called programmers to avoid arbitrary limits?

Because they don't care.  I've met several people who call themselves "programmers" that think writing portable, reasonably limit-free code is a joke.  They've just got a job to get done, a hacky piece of code to be written, and they don't care what it looks like or if it'll work a year from now.

I tend to write any program as if I were going to show it to someone else, someone who could appreciate it and say "That's a really sharp implementation" as opposed to "Who wrote this piece of shit?"  I tend to do this simply because I've had to port a wide range of software to various machines, and I can't say that I was pleased to have worked on most of it.  I really don't want someone calling me some of the names I've been calling others.

You think 1000 users is a large number in a users program?  Suppose I decide to start recording all users over a large network in my utmp file?  (Wouldn't that be nice...how I hate rwho.)  I'll bet that in a few years, 1000 will be far too small....and I won't be able to recompile your program, because let's face it, 99.9% of all Unix distributors don't give source.

So get a grip, take the time to create data structures that don't involve fixed-sized arrays, and a lot of people will be much happier with you.  I know it's hard to think that not everyone has two machines & 10 users, but it's true.

>...!uunet!ingr!apd!smryan                                       Steven Ryan

						Bob
						manson@cis.ohio-state.edu
rsalz@bbn.com (Rich Salz) (01/29/91)
In <23975:Jan2516:36:5891@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Some versions of realloc() return the original pointer rather than 0 if
>they run out of memory.

Then such versions are seriously broken, since returning the original pointer is a valid thing to do if there already was enough space allocated.  I would not code for such systems.
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
jef@well.sf.ca.us (Jef Poskanzer) (01/29/91)
In the referenced message, Bob Manson <manson@cis.ohio-state.edu> wrote:
}You think 1000 users is a large number in a users program?  Suppose I
}decide to start recording all users over a large network in my utmp
}file?  (Wouldn't that be nice...how I hate rwho.)

Yes, that might be nice... but if you did that, why would you want to run "users"?  Three screenfuls of usernames is not particularly useful.  And as for piping it to another program, there's the small problem that most "users" programs don't bother to write out any newlines.  When you have fixed the far more serious problem of most Unix programs dumping core on such input (not even a "recompile me" message, how rude), then maybe I'll consider it worthwhile to add the malloc gunk.

In general, sure, handling arbitrary input is great.  In specific cases where you can make a confident estimate of the maximum input size, I have no problem at all with using checked fixed size arrays of ten times that size.  The benefit is N fewer lines to get wrong, and the cost, if your estimate is good, is non-existent.

}I'll bet that in a few years, 1000 will be far too small....

What is the precise meaning of "far too small"?  At least one system where 1000 is too small?  We probably have that already.  But if you mean that such systems will be common, sure, I'll take that bet.  How much?

}and I won't be able to
}recompile your program, because let's face it, 99.9% of all Unix
}distributors don't give source.

I give source.  In fact, one reason I like code which prints messages like "change XYZ and recompile me please" is to discourage bozos from doing any god damned binary-only distributions of *my* source.
---
Jef

             Jef Poskanzer  jef@well.sf.ca.us  {apple, ucbvax, hplabs}!well!jef
                          "So young, so bad, so what."
manson@python.cis.ohio-state.edu (Bob Manson) (01/29/91)
In article <22921@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
>In the referenced message, Bob Manson <manson@cis.ohio-state.edu> wrote:
>}You think 1000 users is a large number in a users program?  Suppose I
>}decide to start recording all users over a large network in my utmp
>}file?  (Wouldn't that be nice...how I hate rwho.)
>
>Yes, that might be nice... but if you did that, why would you want to
>run "users"?

Well, I probably wouldn't want to _look_ at it, per se.  But...

>Unix programs dumping core on such input (not even a "recompile me"
>message, how rude), then maybe I'll consider it worthwhile to add the
>malloc gunk.

The argument of "everything else is busted, so I'll leave my program broken too" isn't a real good one, but I do see a point there.  Hmmm...Let's see what dies on a 13K input line.  (Sun SLC+ running SunOS 4.1.)  Well, tr was happy to convert the spaces into newlines, and I don't see much reason to go further, as the output of that could be postprocessed as I wished.  (Yes, most unix utilities puke badly on input lines > 2K.  On this Sun, grep and egrep deal with it OK, producing correct output, but sed silently truncates the output to 4001 bytes.  The behavior of grep & egrep is atypical, but I'll bet tr will work in any case.)

>In general, sure, handling arbitrary input is great.  In specific cases
>where you can make a confident estimate of the maximum input size, I have
>no problem at all with using checked fixed size arrays of ten times

I've had to deal with one too many utilities where someone makes a "confident estimate of the maximum input size" only to find that it's too small.  Assuming that someone would never have more than 2048 password entries, for example.  OK, I question strongly whether most unix sites have 2500 entries in their password files.  Ours did (when I worked for the CIS dept. here), and I didn't have source to the programs.  I was hosed.  Seeing messages from programs like "recompile program with larger NENTS" is useless in these cases, as all I can do is call {insert your workstation maker here} and say "I need program X recompiled with a larger NENTS" and they laugh.  And not everyone who does sysadmin is even capable of recompiling programs; the people I'm currently working with couldn't if their life depended on it.

>What is the precise meaning of "far too small"?  At least one system
>where 1000 is too small?  We probably have that already.  But if you
>mean that such systems will be common, sure, I'll take that bet.  How
>much?

What does "common" have to do with anything?  If your utility won't work at my site, what good is it?

>I give source.  In fact, one reason I like code which prints messages
>like "change XYZ and recompile me please" is to discourage bozos from
>doing any god damned binary-only distributions of *my* source.

Hasn't stopped HP or AT&T from distributing code with similar limits.  Won't stop anyone else either.

I know what you're trying to say.  It's a useless waste of time to write extra code to make a program limit-independent when we can make a good estimate of the maximum numbers & provide source for recompilation.  My argument is, it really doesn't cost that much more to design the program properly to function without limits.  The cost in making utilities with fixed limits in them is unhappy customers & time spent rewriting programs, since I seriously doubt source policies will change anytime soon.  Your point about utilities dying on too long input lines is an excellent example; really, there is no such thing as a "too long input line".  Whoever wrote sed decided that lines would never be longer than 4000 characters, and they were quite wrong...

>             Jef Poskanzer  jef@well.sf.ca.us  {apple, ucbvax, hplabs}!well!jef

						Bob
						manson@cis.ohio-state.edu
barmar@think.com (Barry Margolin) (01/30/91)
In article <87774@tut.cis.ohio-state.edu> Bob Manson <manson@cis.ohio-state.edu> writes:
>I know what you're trying to say.  It's a useless waste of time to
>write extra code to make a program limit-independent when we can make
>a good estimate of the maximum numbers & provide source for
>recompilation.  My argument is, it really doesn't cost that much more
>to design the program properly to function without limits.  The cost in
>making utilities with fixed limits in them is unhappy customers & time
>spent rewriting programs, since I seriously doubt source policies will
>change anytime soon.  Your point about utilities dying on too long
>input lines is an excellent example; really, there is no such thing as
>a "too long input line".  Whoever wrote sed decided that lines would
>never be longer than 4000 characters, and they were quite wrong...

I agree with this most emphatically.  The kind of software design Mr. Manson is complaining about is rampant in the industry, and pervades Unix.  Most programmers learn software design by example.  Sometimes this is good, when a good programming style (e.g. programs that filter stdin to stdout) is mimicked, but it also propagates poor programming practices.  When I talk about the "brokenness" of Unix, it's this kind of stuff I'm thinking of.

These kinds of problems aren't just in utility programs, but in just about every level of the system.  For instance, file descriptors are indexes into a per-process table in the kernel; in many of the older Unix versions I don't even think the size of this table was configurable, but I may be mistaken.  Of course, there are often good reasons to put some limits on per-process and per-user resources, to keep a single user or buggy program from hogging a system, but why aren't they runtime options?  Why should I have to rebuild a kernel because I need more ptys?

Yes, I admit that it is easier to program with fixed-size tables and buffers, but who ever said good programming was supposed to be easy?  Of course, I'm biased, because I do much of my programming in Lisp, which makes it easy to write programs with few arbitrary limits.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
schwartz@groucho.cs.psu.edu (Scott Schwartz) (01/30/91)
barmar@think.com (Barry Margolin) writes:
| I agree with this most emphatically.  The kind of software design Mr. Manson
| is complaining about is rampant in the industry, and pervades Unix.  Most
| programmers learn software design by example.  Sometimes this is good, when
| a good programming style (e.g. programs that filter stdin to stdout) is
| mimicked, but it also propagates poor programming practices.  When I talk
| about the "brokenness" of Unix, it's this kind of stuff I'm thinking of.

Part of the problem is that the standard libraries most systems supply are flawed in various ways.  In stdio, ``gets'' leaps to mind.  Moreover, ``fgets'' imposes an upper bound on input length, so lots of programs inherit that flaw.  In V10 the fast io library imposes a fixed length (not even user selectable) on lines that ``Frdline'' will return.

Happily, Chris Torek's new 4.4BSD stdio provides a way to read lines of any length using ``fgetline''.  The only problem with that is that there is no general mechanism to read arbitrarily long tokens -- fgetline should either take a user supplied delimiter, or there should be a separate routine (fgettoken?) to do the job.  Now's the time to fix this, before 4.4BSD really hits the streets.

| I'm biased, because I do much of my programming in Lisp, which
| makes it easy to write programs with few arbitrary limits.

I'd kill for a scheme compiler that was suitable for writing systems programs.
sef@kithrup.COM (Sean Eric Fagan) (01/30/91)
In article <87774@tut.cis.ohio-state.edu> Bob Manson <manson@cis.ohio-state.edu> writes:
>Hmmm...Let's see what dies on a 13K input line.  (Sun SLC+ running
>SunOS 4.1.)  Well, tr was happy to convert the spaces into newlines,
>and I don't see much reason to go further, as the output of that could
>be postprocessed as I wished.  (Yes, most unix utilities puke badly on
>input lines > 2K.

On kithrup (an SCO SysVr3.2v2 system), tr, grep, and wc all dealt nicely with a 120k line.  (/etc/termcap is useful for such things 8-).)  I was actually quite impressed.

>>I give source.  In fact, one reason I like code which prints messages
>>like "change XYZ and recompile me please" is to discourage bozos from
>>doing any god damned binary-only distributions of *my* source.
>Hasn't stopped HP or AT&T from distributing code with similar limits.
>Won't stop anyone else either.

Just a note here.  SCO's version of yacc has some semi-fixed limits.  If it runs out of space for some of the tables, it complains, and says to rerun it with a different option.  The actual message is something like:

	Out of <whatever> space.  Run with -Sm# option (current setting 5000).

(That's how I got perl working.)  Although that's not the best solution (for example, it could be argued that yacc should realloc() up the space itself), it *does* manage what I consider a decent compromise between static space and run-time limitations.

Anyway, just my two cents...
--
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
chip@tct.uucp (Chip Salzenberg) (01/30/91)
According to schwartz@groucho.cs.psu.edu (Scott Schwartz):
>Happily, Chris Torek's new 4.4BSD stdio provides a way to
>read lines of any length using ``fgetline''.

BSD isn't the world; fixing 4.4BSD won't help me.  Each site (or programmer) needs to write fgetline() or its moral equivalent using getc(), malloc() and realloc(), and use it every time gets() or fgets() would have been used.
--
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
 "I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz
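[A minimal "moral equivalent" of the kind Chip describes, built from getc(), malloc() and realloc().  The caller-supplied delimiter follows Scott Schwartz's fgettoken suggestion upthread (pass '\n' to get fgetline behavior); the body is a sketch, not Torek's 4.4BSD code:]

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one token of any length, up to (and consuming) the delimiter
   or EOF.  Returns a malloc'd NUL-terminated string the caller must
   free, or NULL at EOF with nothing read (or on allocation failure). */
char *fgettoken(FILE *fp, int delim)
{
    size_t len = 0, cap = 16;
    char *buf = malloc(cap), *tmp;
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != delim) {
        if (len + 1 == cap) {           /* keep room for the NUL */
            tmp = realloc(buf, cap *= 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
        }
        buf[len++] = c;
    }
    if (c == EOF && len == 0) { free(buf); return NULL; }
    buf[len] = '\0';
    return buf;
}
```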
martin@mwtech.UUCP (Martin Weitzel) (01/31/91)
In article <87774@tut.cis.ohio-state.edu> Bob Manson <manson@cis.ohio-state.edu> writes:
>My argument is, it really doesn't cost that much more
>to design the program properly to function without limits.  The cost in
>making utilities with fixed limits in them is unhappy customers & time
>spent rewriting programs, since I seriously doubt source policies will
>change anytime soon.

Well, ...

science fiction ON

I could see a time when by law

a) software manufacturers must name all fixed limits of their products

b) the customer can assume that all unnamed limits are in fact not fixed to some arbitrary value

c) the customer has the right to request the sources and whatever else is needed (e.g. a special compiler) from the manufacturer for no additional fee if any limit is hit which is below the promises of a) and b)

science fiction OFF

Of course, somewhat more realistic is that we'll have PD-versions of all the useful programs some day so that no manufacturer can sell something of less quality ...
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
byron@archone.tamu.edu (Byron Rakitzis) (01/31/91)
In article <22921@well.sf.ca.us> Jef Poskanzer <jef@well.sf.ca.us> writes:
>In the referenced message, Bob Manson <manson@cis.ohio-state.edu> wrote:
>}You think 1000 users is a large number in a users program? Suppose I
>}decide to start recording all users over a large network in my utmp
>}file? (Wouldn't that be nice...how I hate rwho.)
>
>Yes, that might be nice... but if you did that, why would you want to
>run "users"? Three screenfuls of usernames is not particularly
>useful. And as for piping it to another program, there's the small
>problem that most "users" programs don't bother to write out any
>newlines. When you have fixed the far more serious problem of most
>Unix programs dumping core on such input (not even a "recompile me"
>message, how rude), then maybe I'll consider it worthwhile to add the
>malloc gunk.
>
>In general, sure, handling arbitrary input is great. In specific cases
>where you can make a confident estimate of the maximum input size, I have
>no problem at all with using checked fixed size arrays of ten times
>that size. The benefit is N fewer lines to get wrong, and the cost, if
>your estimate is good, is non-existent.

I think the point made here is that there *are* utilities written with
bad a priori limits in their data structures. The most flagrant
examples I can think of are vi and sh. Under certain circumstances, if
you declare too many (== over 30 or so, not really that many!!)
environment variables, vi and sh will dump core on my sun 4/280
running StunOS 4.0.3. It remains to be seen whether Sun addressed this
bug in 4.1, but in the meantime I will agree wholeheartedly with the
opinion that hard limits in code must be avoided.

I've finished writing a small sh-like shell whose only hard limit
(which I'm thinking of taking out) is the number of commands that can
be entered in a single pipeline. Currently the value is 512, more than
the maximum number of processes allowed on any unix machine I've seen,
so I consider myself safe. But at least I am aware of this as a
shortcoming.

Byron.
-- 
Byron Rakitzis
byron@archone.tamu.edu
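Byron's 512-command pipeline limit is exactly the kind of array the
doubling-realloc trick mentioned earlier in the thread removes. Here is a
minimal sketch in C; it assumes nothing about his shell's actual data
structures, and the names grow_cmds and cap are mine, not his:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Grow the cmds array to hold at least `need` entries, doubling the
 * capacity (tracked in *cap) each time it runs out.  Returns the new
 * array, or NULL on allocation failure (freeing the old block so the
 * caller cannot leak it). */
char **grow_cmds(char **cmds, size_t *cap, size_t need)
{
	char **tmp;

	if (*cap >= need)
		return cmds;
	if (*cap == 0)
		*cap = 16;		/* arbitrary small starting size */
	while (*cap < need)
		*cap *= 2;
	tmp = realloc(cmds, *cap * sizeof *cmds);
	if (tmp == NULL) {
		free(cmds);		/* don't leak the old block */
		return NULL;
	}
	return tmp;
}
```

The cost of this over a fixed array is a handful of lines and one extra
pointer of indirection; amortized over the life of the array, the number
of realloc calls is only logarithmic in the final size.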
tchrist@convex.COM (Tom Christiansen) (02/04/91)
From the keyboard of chip@tct.uucp (Chip Salzenberg):
:According to schwartz@groucho.cs.psu.edu (Scott Schwartz):
:>Happily, Chris Torek's new 4.4BSD stdio provides a way to
:>read lines of any length using ``fgetline''.
:
:BSD isn't the world; fixing 4.4BSD won't help me.

It's not the world, but it's a start. Do you have a scheme for fixing
everything everywhere simultaneously? It's a hard problem. (I often
wish RTM's Internet worm had gone around fixing broken code: the
ultimate update engine. :-)

:Each site (or programmer) needs to write fgetline() or its moral
:equivalent using getc(), malloc() and realloc(), and use it every time
:gets() or fgets() would have been used.

Ug. If it's written once, published, and made available for use free
of charge *and* without viral strings attached, each site or programmer
won't have to re-invent the wheel. Of course, sites without source are
still largely at the mercy of vendors.

--tom
-- 
Still waiting to read alt.fan.dan-bernstein using DBWM, Dan's own AI window
manager, which argues with you 10 weeks before resizing your window.
Tom Christiansen   tchrist@convex.com   convex!tchrist
darcy@druid.uucp (D'Arcy J.M. Cain) (02/07/91)
In article <1991Feb03.181937.9090@convex.com> Tom Christiansen writes:
>From the keyboard of chip@tct.uucp (Chip Salzenberg):
>:Each site (or programmer) needs to write fgetline() or its moral
>:equivalent using getc(), malloc() and realloc(), and use it every time
>:gets() or fgets() would have been used.
>
>Ug. If it's written once, published, and made available for use free
>of charge *and* without viral strings attached, each site or programmer
>won't have to re-invent the wheel. Of course, sites without source are
>still largely at the mercy of vendors.

OK, I have made a stab at it. Of course the first thing to do is
define it. I have whipped up a man page for the way I think this
function should work and it is included here. While I am at it the
code to implement it is also included. (Yah, source but it's so
small.) Anybody want to use this as a starting point? I have made it
completely free so that no one has to worry about licensing
restrictions. Besides, it's so trivial who couldn't duplicate it in 20
minutes anyway?

----------------------------- cut here --------------------------------
/*
NAME
	fgetline

SYNOPSIS
	char *fgetline(FILE *fp, int exclusive);

DESCRIPTION
	Reads a line from the stream given by fp and returns a pointer
	to the string. There is no length restriction on the returned
	string. Space is dynamically allocated for the string as
	needed. If the exclusive flag is set then the space won't be
	reused on the next call to fgetline.

RETURNS
	A pointer to the string without the terminating EOL is
	returned if successful or NULL if there was an error.

AUTHOR
	D'Arcy J.M. Cain (darcy@druid.UUCP)

CAVEATS
	This function is in the public domain.
*/

#include <stdio.h>
#include <malloc.h>

/* I originally was going to use 80 here as the most common case but */
/* decided that a few extra bytes to save a malloc from time to time */
/* would be a better choice. Comments welcome. */
#define CHUNK	128

static char *buf = NULL;

char *fgetline(FILE *fp, int exclusive)
{
	size_t sz = CHUNK;  /* this keeps track of the current size of buffer */
	size_t i = 0;       /* index into string tracking current position */
	char *ptr;          /* since we may set buf to NULL before returning */
	int c;              /* to store getc() return */

	/* set buf to 128 bytes */
	if (buf == NULL)
		buf = malloc(sz);
	else
		buf = realloc(buf, sz);

	/* check for memory problem */
	if (buf == NULL)
		return(NULL);

	/* get characters from stream until EOF */
	while ((c = getc(fp)) != EOF)
	{
		/* check for end of line */
		if (c == '\n')
			goto finished;	/* cringe */

		buf[i++] = c;

		/* check for buffer overflow */
		if (i >= sz)
			if ((buf = realloc(buf, (sz += CHUNK))) == NULL)
				return(NULL);
	}

	/* see if anything read in before EOF */
	/* perhaps some code to preserve errno over free() call needed? */
	if (!i)
	{
		free(buf);
		buf = NULL;
		return(NULL);
	}

finished:
	buf[i++] = 0;

	/* the realloc may be overkill here in most cases - perhaps it */
	/* should be moved to the 'if (exclusive)' block */
	ptr = buf = realloc(buf, i);

	/* prevent reuse if necessary */
	if (exclusive)
		buf = NULL;

	return(ptr);
}
---------------------------------------------------------------------------
-- 
D'Arcy J.M. Cain (darcy@druid)  |  D'Arcy Cain Consulting  |  There's no government
West Hill, Ontario, Canada      |                          |  like no government!
+1 416 281 6094                 |
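One nit worth flagging in the code above: the `buf = realloc(buf, ...)`
idiom overwrites the only pointer to the old block, so when realloc
fails the buffer leaks. A small wrapper in the same public-domain
spirit avoids that; this is just a sketch, and the name
xrealloc_or_free is mine, not part of the posted code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Like realloc(), but frees the old block when realloc() fails, so
 * the caller cannot leak it by overwriting its only pointer with the
 * NULL return value. */
void *xrealloc_or_free(void *p, size_t n)
{
	void *q = realloc(p, n);

	if (q == NULL)
		free(p);
	return q;
}
```

With this, the overflow check in fgetline could read
`if ((buf = xrealloc_or_free(buf, sz += CHUNK)) == NULL) return NULL;`
and no storage would be lost on a memory error.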
sms@lonex.radc.af.mil (Steven M. Schultz) (02/07/91)
In article <1991Feb6.170055.2081@druid.uucp> darcy@druid.uucp (D'Arcy J.M. Cain) writes:
>In article <1991Feb03.181937.9090@convex.com> Tom Christiansen writes:
>>From the keyboard of chip@tct.uucp (Chip Salzenberg):
>>:Each site (or programmer) needs to write fgetline() or its moral

This whole thread 1) is inappropriate for this group (it has about as
much to do with 4BSD as the price of tea in China) and 2) has devolved
into a religious issue best resolved off in a place like
alt.computers.religion.

Please move this discussion elsewhere.

	Steven
mcdaniel@adi.com (Tim McDaniel) (02/08/91)
torek@h2opolo.ee.lbl.gov (who CALLS himself Chris Torek) writes:
int
fgetline(FILE *stream, int *len);
...
while ((p = fgetline(inf)) != NULL)
Notice the "int" return code and the argument count problem. Chris
Torek making TWO errors in a SINGLE posting (about a routine He designed
and wrote) is patently impossible. torek@h2opolo.ee.lbl.gov is
therefore NOT The REAL Chris Torek, but a shameless forger.
It's obvious what happened. Someone saw Chris Torek's announcement
that He was off the net temporarily and didn't know His future e-mail
address. This person waited a plausible amount of time, and then
posted this crude forgery. The "oops, bug in the man page" followup
didn't fool me one little bit.
I'm writing to the system administrators at ee.lbl.gov to get this
dastardly imposter fired immediately, and (if possible) brought up on
criminal charges.
--
Tim McDaniel Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300 Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com UUCP: {uunet,sharkey}!amara!mcdaniel
torek@h2opolo.ee.lbl.gov (Chris Torek) (02/08/91)
I posted two articles early this morning (<9644@dog.ee.lbl.gov> and
<9653@dog.ee.lbl.gov>) with the second being a correction to the
first. Now that article cancellation is fixed, here is a corrected
version; I have cancelled the previous two articles.

Before things get out of hand, here is the fgetline man page from
4.3-and-two-thirds-or-whatever-you-call-it:

FGETLINE(3)         UNIX Programmer's Manual          FGETLINE(3)

NAME
     fgetline - get a line from a stream

SYNOPSIS
     #include <stdio.h>

     char *
     fgetline(FILE *stream, int *len);

DESCRIPTION
     Fgetline returns a pointer to the next line from the stream
     pointed to by stream. The newline character at the end of
     the line is replaced by a '\0' character. If len is
     non-NULL, the length of the line, not counting the
     terminating NUL, is stored in the memory location it
     references.

SEE ALSO
     ferror(3), fgets(3), fopen(3), putc(3)

RETURN VALUE
     Upon successful completion a pointer is returned; this
     pointer becomes invalid after the next I/O operation on
     stream (whether successful or not) or as soon as the stream
     is closed. Otherwise, NULL is returned. Fgetline does not
     distinguish between end-of-file and error, and callers must
     use feof and ferror to determine which occurred. If an
     error occurs, the global variable errno is set to indicate
     the error.

     The end-of-file condition is remembered, even on a
     terminal, and all subsequent attempts to read will return
     NULL until the condition is cleared with clearerr.

     It is not possible to tell whether the final line of an
     input file was terminated with a newline.

ERRORS
     [EBADF]   Stream is not a stream open for reading.

     Fgetline may also fail and set errno for any of the errors
     specified for the routines fflush(3), malloc(3), read(2),
     stat(2), or realloc(3).

(the underlining and boldface have vanished, but the above should
still be comprehensible).

Note that fgetline makes no promises about the pointer it returns.
If you want a copy of the line, you must copy it yourself.
This is so that fgetline can return pointers within the original stdio
buffers; in particular, the sequence:

	/* add quote widgets */
	while ((p = fgetline(inf, (int *)NULL)) != NULL)
		if (fprintf(outf, ">%s\n", p) < 0)	/* error */
			break;

does not require an intermediate buffer into which lines are copied.
They go directly from the input file's buffer to the output file's
buffer. (Thus, there is one memory-to-memory copy in the above loop.)

It is unfortunate that there is no formal mechanism to avoid read
copies for other operations. In particular, copying an input file to
an output file could be done with no (user) memory-to-memory copies
with a loop of the form:

	while (there is more in the input buffer)
		write the input buffer to the output file;

whenever the block sizes match, since the input buffer can be written
to the output file with a direct write() system call. As it is, you
must use fread to obtain data, with at least one copy.

[Thanks to: Arnold Robbins, Cesar A Quiroz, Jef Poskanzer, Henry
Spencer, and Ray Butterworth for the fixes included here. (These are
in alphabetical order, by first name.)]
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain: torek@ee.lbl.gov
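Since the pointer fgetline returns is only valid until the next I/O
operation on the stream, a caller that wants to keep a line around has
to copy it first. A minimal sketch of such a copy follows; the helper
name copyline is my own, not part of the 4.4BSD interface, and it
assumes len is the value fgetline stored (the length not counting the
terminating NUL):

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Copy a NUL-terminated line of length len (not counting the NUL)
 * into freshly malloc'ed storage.  Returns NULL on allocation
 * failure; the caller frees the result. */
char *copyline(const char *p, int len)
{
	char *q = malloc((size_t)len + 1);

	if (q != NULL)
		memcpy(q, p, (size_t)len + 1);	/* include the NUL */
	return q;
}
```

A caller would invoke it right after a successful fgetline, e.g.
`saved = copyline(p, len);`, before doing any further I/O on the
stream.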
darcy@druid.uucp (D'Arcy J.M. Cain) (02/08/91)
In article <9644@dog.ee.lbl.gov> torek@ee.lbl.gov (Chris Torek) writes:
>In article <1991Feb6.170055.2081@druid.uucp> darcy@druid.uucp
>(D'Arcy J.M. Cain) writes:
>>NAME
>>	fgetline
>>
>>SYNOPSIS
>>	char *fgetline(FILE *fp, int exclusive);
>
>Before things get out of hand, here is the fgetline man page from
>4.3-and-two-thirds-or-whatever-you-call-it:
>
>     NAME
>          fgetline - get a line from a stream
>
>     SYNOPSIS
>          #include <stdio.h>
>
>          int
>          fgetline(FILE *stream, int *len);

Oops, don't have such a beast on my SVR3.2. I thought from the
discussion that people were proposing such a function. I still like
the exclusive use flag though.
-- 
D'Arcy J.M. Cain (darcy@druid)  |  D'Arcy Cain Consulting  |  There's no government
West Hill, Ontario, Canada      |                          |  like no government!
+1 416 281 6094                 |