[net.sources.bugs] getopt

ian@utcs.uucp (Ian F. Darwin) (10/12/85)

In article <346@uw-june> schwartz@uw-june includes (from
.\" @(#)getopt.3c 1.1 83/08/30 SMI; from UCB 4.2):
>This routine is included for compatibility with UNIX system-III.
>It is of marginal value, and should not be used in new programs.

I disagree. First, it is for compatibility with USG UNIX (all
of them, not just System III (note spelling), but also System V,
System V Release 2, etc.).

Second, it is very useful and should be used in all new programs.
People should not write their own version of argument parsing
in every new program, and get it wrong, when a standard argument
parser is available.

Third, since you did not write the code (the version you posted
was written at the University of Toronto), you should give credit
to the person who did. A difference of opinion is no excuse
for a lack of courtesy; taking somebody's name off code 
that you don't like but haven't modified is certainly discourteous.

Sounds like something that might have happened to the code
in or around Berkeley; they are famous for trying to take
the credit for others' work by taking names off contributions.

Ian F. Darwin
Toronto, Canada

peter@graffiti.UUCP (Peter da Silva) (10/15/85)

Disclaimer: the following text should be ignored by 90% of the readers of
mod.std.c, since they've already gone through this.

> Second, it is very useful and should be used in all new programs.
> People should not write their own version of argument parsing
> in every new program, and get it wrong, when a standard argument
> parser is available.

Not when (as has been pointed out by many people) the standard argument parser
does the wrong thing. It can't even handle the arguments that sort(1) (V7)
uses, to wit:

	sort -mubdfincrtx

Where the final 'tx' means 'tab character <x>'. The rest of sort's arguments
are even less parsable by getopt. There is no reason for getopt's
insistence on lots of whitespace, nor for its ignoring argument order, nor
for its inability to handle '+' and '-' type command flags...

And finally it's too big. If your program takes the following arguments:

	foo [-someflags] [file]...

Which is the usual case, what's wrong with:

	char *prog;

	main(ac, av)
	int ac;
	char **av;
	{
		int flag = 0;

		prog = *av;
		while(*++av)	/* stop at argv's terminating null pointer */
			if(**av=='-')
				while(*++*av)
					switch(**av) {
						case 's': /* -s */
							sflag++;
							break;
						...
						case 'g': /* -g<s> */
							if(av[0][1])
								gchar = *++*av;
							else if(av[1])
								gchar = **++av;
							else
								usage(*av);
							break;
						default:
							usage(*av);
					}
			else {
				FILE *fp = fopen(*av, "r");
				if(fp) {
					do_something_with(fp, *av);
					fclose(fp);
				}
				flag = 1;
			}
		if(flag==0) /* no files processed */
			do_something_with(stdin, "standard input");
	}

which is not much more complex than the main you have to write with getopt to
do the same thing, allows more flexibility (foo -s -g:; foo -s -g :; foo -sg:;
foo -sg :), and produces a program that needs less core. If you think that's
a minor consideration, remember why vi doesn't use stdio on a PDP-11.

keith@seismo.CSS.GOV (Keith Bostic) (10/17/85)

In article <306@graffiti.UUCP>, peter@graffiti.UUCP (Peter da Silva) writes:

> Disclaimer: the following text should be ignored by 90% of the readers of
> mod.std.c, since they've already gone through this.

Disclaimer: the following text should be read by 90% of the readers of
mod.std.c, 'cause they're purely wrong.

> Not when (as has been pointed out by many people) the standard argument parser
> does the wrong thing. It can't even handle the arguments that sort(1) (V7)
> uses, to wit:
> 
> 	sort -mubdfincrtx
> 
> Where the final 'tx' means 'tab character <x>'.

Wrong.  What you're trying to do is assign the character 'x' to a char
variable, correct?  Code can be written to use getopt that does this quite
nicely.  Important code fragment:

		case 't':	/* tab char */
			tabch = *optarg;
			break;

> The rest of sort's arguments are even less parsable by getopt.

Wrong again.  The *only* arguments that sort has that getopt can't handle
are the +/- flags.  No, I take that back.  The V7 sort also allowed you
"-mutxbd" where you could insert the argument into the flag string, and
the program realized the length of the argument as a single character and
simply picked up the next character and continued on.  I think that "feature"
can wander on out of our lives, don't you?

> There is no reason for getopt's insistence on lots of whitespace,

Wrong.  It doesn't insist on lots of whitespace, any more than any other
command interface.  You can group flags together, e.g. "sort -efghi",
until you enter a flag that requires an argument.  Then, you have to have
whitespace, otherwise there's no way to know when the argument terminates.
That's nothing new.

> nor for its ignoring argument order,

Wrong again.  Why should getopt pay any attention whatsoever to argument
order?  It's easy enough to implement if you really care about it:

	short	sflag = 0;

		case 's':	/* sflag */
			++sflag;
			break;
		case 't':	/* tflag */
			if (sflag) {
				puts("no.");
				exit(-1);
			}

but that has nothing to do with getopt.  All getopt is supposed to do is
provide an interface to the user's command line.  *Not* decide that the
flags are incorrectly ordered.  Besides, there's a very valid reason for
programs ignoring argument order in general; it complicates the user interface
unnecessarily.

> nor for its inability to handle '+' and '-' type command flags...

Here, you may have a point.  Getopt requires that all flags be preceded
by a '-', and that "--" denote the end of the options.  Now, you can
certainly have "sort -+3.5"

	while ((ch = getopt(argc,argv,"t+:")) != EOF)
		switch((char)ch) {
			case '+':
				printf("got +: arg was <%s>\n",optarg);
				break;

but not "sort --3.5".  Now... how many programs really use '+' and '-'?  And
just how much heartbreak is it going to cause you to enter "sort -mubd -s3.5
-e3.5" as opposed to the current "sort -mubd +3.5 -3.5"?  There's a difference
of exactly two characters.  I think this is a minor price to pay for a
consistent user interface.

> And finally it's too big. If your program takes the following arguments:
> 
> 	foo [-someflags] [file]...

> Which is the usual case, what's wrong with:

... insert large example ...

> which is not much more complex than the main you have to write with getopt to
> do the same thing, allows more flexibility (foo -s -g:; foo -s -g :; foo -sg:;
> foo -sg :), and produces a program that needs less core. If you think that's
> a minor consideration, remember why vi doesn't use stdio on a PDP-11.

First off, the code to parse a command list sanely is fairly complex.  Argv
is not that easy a variable to handle, especially for novice programmers.
Getopt offers a clean, simple interface to command lines.  Secondly, your
code is no more flexible than getopt.  The following code fragment will
handle all of your examples.

	while ((ch = getopt(argc,argv,"sg:")) != EOF)
		switch((char)ch) {
			case 's':
				puts("got s");
				break;
			case 'g':
				printf("got g: arg was <%s>\n",optarg);
				break;
			default:
				puts("got nothing");
				exit(ERR);
		}

Secondly, the size differences are negligible.  On a PDP or anywhere else.
Getopt doesn't use stdio, therefore your code isn't going to improve it
a lot.

Getopt is a good idea, folks.
	-- it provides consistent syntax error messages
	-- most programmers don't handle bizarre flag/argument combinations;
		getopt takes care of that problem.
	-- simplifies the effort of writing a command interface to the
		copying of a while loop from your last program and editing
		a couple of lines.

Keith Bostic
	keith@seismo.CSS.GOV

rcj@burl.UUCP (Curtis Jackson) (10/17/85)

In article <306@graffiti.UUCP> peter@graffiti.UUCP (Peter da Silva) writes:
>[getopt] does the wrong thing. It can't even handle the arguments that
>sort(1) (V7) uses, to wit:
>
>	sort -mubdfincrtx
>
>Where the final 'tx' means 'tab character <x>'. The rest of sort's arguments
>are even less parsable by getopt. There is no reason for getopt's
>insistence on lots of whitespace, nor for its ignoring argument order, nor
>for its inability to handle '+' and '-' type command flags...

All this is based on getopt from AT&T Unix Sys III and up:

Agreed that getopt cannot handle '+' type command flags, BUT -- it can indeed
handle the trailing tx mentioned above, it ignores whitespace between switches
that do not require arguments, and it does NOT ignore argument order.  Sounds
to me like you have an inferior, 'non-standard' getopt.

>do the same thing, allows more flexibility (foo -s -g:; foo -s -g :; foo -sg:;
>foo -sg :), and produces a program that needs less core. If you think that's
>a minor consideration, remember why vi doesn't use stdio on a PDP-11.

Again, 'real' getopt will accept all of the above combinations of -s and -g
above.  Agreed, it does add somewhat to the size of your program -- but since
I write microassemblers and compilers that generally have 15-20 command-line
switches I don't really mind -- it buys me a lot of clarity.  Also, I am on
a Vax 11/780 with 10 meg main memory   :-)
-- 

The MAD Programmer -- 919-228-3313 (Cornet 291)
alias: Curtis Jackson	...![ ihnp4 ulysses cbosgd mgnetp ]!burl!rcj
			...![ ihnp4 cbosgd akgua masscomp ]!clyde!rcj

peter@graffiti.UUCP (Peter da Silva) (10/20/85)

> In article <306@graffiti.UUCP>, peter@graffiti.UUCP (Peter da Silva) writes:
> 
> > Disclaimer: the following text should be ignored by 90% of the readers of
> > mod.std.c, since they've already gone through this.
> 
> Disclaimer: the following text should be read by 90% of the readers of
> mod.std.c, 'cause they're purely wrong.

Actually, I agree with you here. They're wrong. Most of them agree with you.

> > 	sort -mubdfincrtx
> > 
> > Where the final 'tx' means 'tab character <x>'.
> 
> Wrong.  What you're trying to do is assign the character 'x' to a char
> variable, correct?  Code can be written to use getopt that does this quite
> nicely.  Important code fragment:

But according to the docs & both versions of getopt that have shown up on the
net that won't do the same thing. According to them, you need:

	sort -mubdfincr -tx

Now then: you may have an improved version of getopt, or the versions posted
to the net may be incomplete or inaccurate. In either case you still can't use
*AVAILABLE* versions of getopt to parse those args.

> simply picked up the next character and continued on.  I think that "feature"
> can wander on out of our lives, don't you?

Why? It's an unambiguous parse, and doesn't break anything to leave it in.
I can see a situation where you have 2 flags like that: -tx -sx. Someone's
going to type 'foo -s:t:' and get hit with an un-necessary error message.

> > There is no reason for getopt's insistence on lots of whitespace,
> 
> Wrong.  It doesn't insist on lots of whitespace, any more than any other
> command interface.  You can group flags together, e.g. "sort -efghi",
> until you enter a flag that requires an argument.  Then, you have to have
> whitespace, otherwise there's no way to know when the argument terminates.
> That's nothing new.

Not according to what I've seen. Getopt requires that flags with arguments
stand alone.

> > nor for its ignoring argument order,
> 
> Wrong again.  Why should getopt pay any attention whatsoever to argument
> order?  It's easy enough to implement if you really care about it:
> 
> ... code segment to demonstrate getopt doesn't care about argument order.
> 
> but that has nothing to do with getopt.  All getopt is supposed to do is
> provide an interface to the user's command line.  *Not* decide that the
> flags are incorrectly ordered.

A counterexample to show you what I'm talking about:

	connect: a UNIX modem program that I wrote. It allows a series of
phone numbers on the command line & keeps trying them until it gets one that
works. Handy for calling bbs-es:
	usage: connect -s<baud> -l<line> number...
		Note: direct is considered a number for compatibility with cu.

	connect -s 1200 4445555 4446666 -s300 5556666 6667777 -l tty1 direct

How would you deal with that using getopt, which seems to require that all
options be before all arguments?

> Besides, there's a very valid reason for
> programs ignoring argument order in general; it complicates the user interface
> unnecessarily.

But sometimes it's necessary. Like the above example. Or like any reasonable
permutation of "find".

> > nor for its inability to handle '+' and '-' type command flags...
> 
> but not "sort --3.5".  Now... how many programs really use '+' and '-'?  And
> just how much heartbreak is it going to cause you to enter "sort -mubd -s3.5
> -e3.5" as opposed to the current "sort -mubd +3.5 -3.5"?  There's a difference
> of exactly two characters.  I think this is a minor price to pay for a
> consistent user interface.

The "tail" on the Tek development system I've been using has exactly that
change, and it causes much heartbreak & swearing every time I forget and
type "tail -60" instead of "tail -e 60".

> > And finally it's too big....
> 
> First off, the code to parse a command list sanely is fairly complex.  Argv
> is not an that easy a variable to handle, especially for novice programmers.

The above code parses any command list getopt can deal with and a whole bunch
more. It's not that complex.

> Getopt offers a clean, simple
			    incomplete
>				interface to command lines.  Secondly, your
> code is no more flexible than getopt.  The following code fragment will
> handle all of your examples.

Will it handle 'foo -g: file1 -g% file2 -sothg: file3'?

> Secondly, the size differences are negligible.  On a PDP or anywhere else.
> Getopt doesn't use stdio, therefore your code isn't going to improve it
> a lot.

I never said it did use stdio. All I said was that it's not of negligible size.

> Getopt is a good idea, folks.
> 	-- it provides consistent syntax error messages
> 	-- most programmers don't handle bizarre flag/argument combinations;
> 		getopt takes care of that problem.
> 	-- simplifies the effort of writing a command interface to the
> 		copying of a while loop from your last program and editing
> 		a couple of lines.

	Well, the program I provided does all these things too, and allows you
to handle multiple sets of options, variant option flags, and so on.

> Keith Bostic

Peter da Silva

keith@seismo.CSS.GOV (Keith Bostic) (10/22/85)

References: <910@utcs.uucp> <306@graffiti.UUCP> <444@seismo.CSS.GOV> <324@graffiti.UUCP>

> But according to the docs & both versions of getopt that have shown up on the
> net that won't do the same thing. According to them, you need:
> 
> 	sort -mubdfincr -tx

> Now then: you may have an improved version of getopt, or the versions posted
> to the net may be incomplete or innacurate. In either case you still can't use
> *AVAILABLE* versions of getopt to parse those args.

There have been several versions of getopt(3) running around the public domain.
The one I'm talking about here I have posted to the net at least 3 times, once
to net.bugs, once to mod.sources, and once somewhere else.  It is fully S5
compatible and handles the above case.

> Why? It's an unabiguous parse, and doesn't break anything to leave it in.
> I can see a situation where you have 2 flags like that: -tx -sx. Someone's
> going to type 'foo -s:t:' and get hit with an un-necessary error message.

This is a special case that just doesn't occur.  You're stipulating that a
program takes two arguments of one character apiece, no more, no less.  That's
the *only* way the above example becomes relevant.  Since I can't think of a
single program with such an interface, I'm forced to conclude that its
sacrifice is a small price to pay for command line consistency.

> Not according to what I've seen. Getopt requires that flags with arguments
> stand alone.

No, it requires flags with arguments to be *followed* by whitespace.  This is
standard in most command interfaces, since it can only be avoided by exact
knowledge of argument length.

> A counterexample to show you what I'm talking about:
> 
> 	connect: a UNIX modem program that I wrote. It allows a series of
> phone numbers on the command line & keeps trying them until it gets one that
> works. Handy for calling bbs-es:
> 	usage: connect -s<baud> -l<line> number...
> 		Note: direct is considered a number for compatibility with cu.
> 
> 	connect -s 1200 4445555 4446666 -s300 5556666 6667777 -l tty1 direct
> 
> How would you deal with that using getopt, which seems to require that all
> options be before all arguments?

The key is your usage statement.  Why doesn't ls allow "ls foo bar -l"?  What's
wrong with expecting "connect -s<baud> -l<line> number..."?  Answer: Nothing,
and it's easier.  After all, that's what your usage statement says.  Yes, we
could rewrite the UNIX application software universe so that programs parsed
their entire argv array *before* handling any of their arguments, but think how
much slower "ls /sys/sys/* -l" is going to be.  Besides, the only real value
would accrue to programs that want to allow flags *per* argument, e.g. "nm -n
/vmunix -p /old_vmunix".  And that too, has hidden problems; note in the
example I just gave, the flags 'p' and 'n' are contradictory -- how are you
going to handle that?  Exactly what relationship are the flags going to have?
Do they apply to the entire command string, the command string after they
appear, or the command string until the next flag shows up?  It's just not
worth the effort, especially since the problem can be solved without any
further effort by separating the commands, e.g. "nm -n /vmunix; nm -p
/old_vmunix".  It should also be noted that the latter approach is much simpler
for Joe User to cope with.

> But sometimes it's necessary. Like the above example. Or like any reasonable
> permutation of "find".

No, not true.  In either case.  For connect it's no more necessary than it's
necessary for ls.  And, on the basis of the 30 seconds of thought I've just
devoted to the problem, find doesn't need it either.

> The "tail" on the Tek development system I've been using has exactly that
> change, and it causes much heartbreak & swearing every time I forget and
> type "tail -60" instead of "tail -e 60".

A problem.  For some reason UNIX decided early on that numbers didn't need
flags, while other arguments did, and people are used to that.  Perhaps an
alias would be a nice solution here.  I suspect that after a little practice
you'd become comfortable entering "tail -e60"; after all, you aren't surprised
when "mt /dev/rmt0 off" fails, are you?  Why should tail be any different, just
because its argument is numeric?  It's the price you pay for not having to
list arguments in a specific order.

> I never said it did use stdio. All I said was that it's not of negligable
> size.

OK, I'll rephrase my answer.  It's not significantly bigger than the code
you're going to have to write to parse the same arguments.  And it's going
to be consistent, and it's going to be bug free, blah, blah, blah, ad nauseam.

> 	Well, the program I provided does all these things too, and allows you
> to handle multiple sets of options, variant option flags, and so on.

No, your program handled a special case.  And I'll have to rewrite it each
time, twitching it just a little, to fit each new special case.  I'm not
saying that you're never going to have to write such a beast.  getopt just
makes those joyful occasions a rarity.

Keith Bostic

mike@whuxl.UUCP (BALDWIN) (10/24/85)

Not this again!

> But according to the docs & both versions of getopt that have shown up on the
> net that won't do the same thing. According to them, you need:
> 
> 	sort -mubdfincr -tx

> Now then: you may have an improved version of getopt, or the versions posted
> to the net may be incomplete or innacurate. In either case you still can't use
> *AVAILABLE* versions of getopt to parse those args.

The most standard version I can think of is the one with System V.  IT CAN
PARSE "sort -mubdfincrtx" JUST FINE.  And it certainly is not only AVAILABLE,
it is in the public domain.

> Not according to what I've seen. Getopt requires that flags with arguments
> stand alone.

You are confusing the "Proposed Syntax Standard for UNIX System Commands"
with getopt(3C).  Getopt only enforces SOME of those rules.  In particular,
it does NOT enforce Rule 6: "The first option-argument following an option
must be preceded by white space" or Rule 5: "Options with no arguments may
be grouped behind one delimiter."  That is, it allows options with arguments
to be grouped with other options.  The getopt man page doesn't say much at
all about whitespace, except for this: "if a letter is followed by a colon,
the option is expected to have an argument that may or may not be separated
from it by white space."  That's what you want, RIGHT?

> A counterexample to show you what I'm talking about:
> 
> 	connect: a UNIX modem program that I wrote. It allows a series of
> phone numbers on the command line & keeps trying them until it gets one that
> works. Handy for calling bbs-es:
> 	usage: connect -s<baud> -l<line> number...
> 		Note: direct is considered a number for compatibility with cu.
> 
> 	connect -s 1200 4445555 4446666 -s300 5556666 6667777 -l tty1 direct
> 
> How would you deal with that using getopt, which seems to require that all
> options be before all arguments?

This is not a problem.  Use the documented optind external variable:

	while (optind < argc)
		switch (getopt(argc, argv, "s:l:")) {
		case 's':
			speed = atoi(optarg);
			break;
		case 'l':
			strcpy(line, optarg);
			break;
		default:
			call(argv[optind++], speed, line);
			break;
		}

In fact, this is how getopt is used for System V cc.  Also, you said
something about getopt ignoring the order of arguments.  Again, you're
confusing the Proposed Syntax with getopt!  Getopt just returns you the
options in the order they were given and you can do whatever you want
with them!!

> I never said it did use stdio. All I said was that it's not of negligable
> size.

But it's not anything to worry about.  It doesn't use stdio, and it is
smaller than, e.g., atof, qsort, malloc, crypt, and ctime.

I really wish you would read things more carefully and not get all
worked up over situations that don't exist.  Nearly everything you've
said about getopt has been just plain WRONG.
-- 
						Michael Baldwin
						{at&t}!whuxl!mike

levy@ttrdc.UUCP (Daniel R. Levy) (10/24/85)

In article <324@graffiti.UUCP>, peter@graffiti.UUCP (Peter da Silva) writes:
>	connect: a UNIX modem program that I wrote. It allows a series of
>phone numbers on the command line & keeps trying them until it gets one that
>works. Handy for calling bbs-es:
>	usage: connect -s<baud> -l<line> number...
>		Note: direct is considered a number for compatibility with cu.
>
>	connect -s 1200 4445555 4446666 -s300 5556666 6667777 -l tty1 direct
>
>How would you deal with that using getopt, which seems to require that all
>options be before all arguments?
>
>Peter da Silva

Maybe with a bit of change in the command line syntax, it would be amenable
to getopt.  Remember that there is nothing keeping the same flag from being
used more than once:

   connect -s 1200 -n 4445555,4446666 -s300 -n 5556666,6667777 -l tty1 direct

If you MUST keep the original syntax (mixing flags with nonflag arguments)
you can still use getopt with a little bit of shimming.  Just increment
optind (presuming it is still smaller than argc) after getopt has returned
EOF, check that the first character of the corresponding argument is a '-'
(i.e., another flag, else handle the argument specially) then jump back
into the loop calling getopt.  It's still cleaner looking inside the program
than a brute force parse.

Of course someone is going to ask what if the argument was supposed to begin
with '-' and it is not a flag.  Oh well, life ain't easy....
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!ihnp4!ttrdc!levy

bc@cyb-eng.UUCP (Bill Crews) (10/26/85)

> > Getopt is a good idea, folks.
> > 	-- it provides consistent syntax error messages
> > 	-- most programmers don't handle bizarre flag/argument combinations;
> > 		getopt takes care of that problem.
> > 	-- simplifies the effort of writing a command interface to the
> > 		copying of a while loop from your last program and editing
> > 		a couple of lines.
> 
> 	Well, the program I provided does all these things too, and allows you
> to handle multiple sets of options, variant option flags, and so on.
> 
> > Keith Bostic
> 
> Peter da Silva

If you can get your getopt replacement approved by the ANSI Unix standards
committee, fine.  If it becomes popular and widely offered and used, fine.
Otherwise, all you are doing is providing yet another clever program whose
user interface is different from others in a fundamental way.  Any standard
function by definition limits one, but the existence of a standard has value
too, which must be weighed against the value of the proliferation of
cleverness.  I am not studied enough to have an opinion as to whether getopt
is currently comprehensive or flexible enough.  If it can be made more flexible
without leaving a user who hasn't used a given command before totally in the
dark as to how it might work, then let's do it, but let's do it soon and
then batten down the hatches, so we can have some consistency.  And more than
anything else, PLEASE support whatever standard the committee adopts by USING
whatever form of getopt is blessed!
-- 
	- bc -

..!{seismo,topaz,gatech,nbires,ihnp4}!ut-sally!cyb-eng!bc  (512) 835-2266

peter@graffiti.UUCP (Peter da Silva) (10/30/85)

> Not this again! ... you're
> confusing the Proposed Syntax with getopt!  Getopt just returns you the
> options in the order they were given and you can do whatever you want
> with them!!

OK. You win. Can I go back to flaming the proposed standard instead?
-- 
Name: Peter da Silva
Graphic: `-_-'
UUCP: ...!shell!{graffiti,baylor}!peter
IAEF: ...!kitty!baylor!peter