[comp.unix.programmer] Recognizing pathnames with lex/yacc

scs@adam.mit.edu (Steve Summit) (04/28/91)

In article <1991Apr28.015706.23505@agate.berkeley.edu> ilan343@violet.berkeley.edu (Geraldo Veiga) writes:
> What kind of lex rule would describe a general UNIX path
> name?  Should I just accept any string and pass it to yacc?

Absolutely.  Actually, you should pass it directly on to fopen,
without attempting to interpret it at either the lex or the yacc
level.

Only the operating system kernel should interpret filenames.  Any
time an application program attempts to do so, it invariably gets
some nuance of it wrong, or doesn't accommodate some later change
to the kernel's rules (perhaps some special syntax for network
files).

(As an example, a colleague uses a graphics package which wants
him to specify some directory where all his graphs will be
stored.  He didn't want to store all his graphs in one directory --
he wanted to store them in several project directories, along
with their data.  The vendor said this couldn't be done.  He hit
upon the idea of specifying "." as the graphics directory, which
worked for a while, except that it began to fail occasionally,
with charming results such as losing data that had taken half an
hour to generate.  The vendor said it was his fault for not
specifying a proper directory.)

Sometimes you have to play with filename syntax, usually to
separate a pathname into directory and filename components, or
to splice a directory and a filename back together to form a
pathname.  It's best to relegate these to subroutines ("parsepath"
and "splicepath" or the like) in the operating-system-specific
portion of the code.  Implement them simply and straightforwardly,
and don't try to get clever or fancy.  (In my colleague's case,
the graphics package apparently couldn't splice the directory "."
and a filename together correctly.)
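
To make that concrete, here is a rough sketch of what such a pair
of routines might look like on Unix.  The names come from the
paragraph above, but the signatures, the return conventions, and
the decision to treat an empty directory as "no directory" are
only my assumptions for illustration; adjust to taste.

```c
#include <stddef.h>
#include <string.h>

/*
 * splicepath (hypothetical signature): join dir and name into buf.
 * Returns 0 on success, -1 if the result would not fit in bufsize.
 * An empty dir is taken to mean "no directory" (an assumption).
 */
int splicepath(char *buf, size_t bufsize, const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    /* add a '/' only if dir is nonempty and doesn't already end in one */
    const char *sep = (dlen > 0 && dir[dlen - 1] != '/') ? "/" : "";

    if (dlen + strlen(sep) + strlen(name) + 1 > bufsize)
        return -1;
    strcpy(buf, dir);
    strcat(buf, sep);
    strcat(buf, name);
    return 0;
}

/*
 * parsepath (hypothetical signature): split path at its last '/'.
 * Returns a pointer to the final component inside path, and stores
 * in *dirlen the length of the directory part, including that '/'
 * (0 if path contains no '/').
 */
const char *parsepath(const char *path, size_t *dirlen)
{
    const char *slash = strrchr(path, '/');

    if (slash == NULL) {
        *dirlen = 0;
        return path;
    }
    *dirlen = (size_t)(slash - path) + 1;
    return slash + 1;
}
```

Note that neither routine looks at the characters of the name
beyond finding '/'; with this version, splicing "." and "graph1"
simply yields "./graph1", and the kernel does the rest.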

In the case of Unix, the definition of a legal filename really is
any string.  The only special characters are / and \0, but their
special-casedness doesn't make any strings "illegal."  The only
things you could ever want to check are:

     1.	Under some versions of Unix, the length of any single
	pathname component is limited, typically to 14
	characters.  You should only check this if you'll be
	appending some extension to the filename, and need to
	make sure there is "room" (perhaps to ensure that the
	name+extension is distinct, in 14 characters, from the
	name alone).  Don't check it just to generate an error
	message, if you would have passed it along to the kernel
	unchanged -- let the kernel generate that error, if it
	wants to.  (Usually, it truncates silently, so your
	application should reflect that behavior.)

     2.	Under some circumstances, you might want to issue a
	warning, or request confirmation, if the name is of a
	file to be created and if it contains spaces, or maybe
	also if it contains characters which are typically
	"special" to the shells or other file utilities: *, ?, -,
	', ", \, &, etc.  If you're too fussy, though, you'll
	aggravate expert users.
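
If you do decide you need either check, each is only a few lines.
The following is a sketch, not a recommendation: the function
names are made up, the 14-character limit is hard-wired purely
because that is the figure mentioned above (it is not universal),
and the character set is just the list from item 2, which you
would want to tune.

```c
#include <string.h>

#define MAXCOMP 14      /* assumed per-component limit; varies by system */

/* Nonzero if the last component of path, plus ext, fits in MAXCOMP. */
int room_for_ext(const char *path, const char *ext)
{
    const char *base = strrchr(path, '/');

    base = (base != NULL) ? base + 1 : path;
    return strlen(base) + strlen(ext) <= MAXCOMP;
}

/* Nonzero if name contains a space or one of the listed characters. */
int looks_risky(const char *name)
{
    return strpbrk(name, " *?-'\"\\&") != NULL;
}
```

The second routine should only ever lead to a warning or a
request for confirmation, never to a refusal; the name is still
legal as far as the kernel is concerned.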

Before you do anything, though, ask yourself what would happen if
you passed the filename on to fopen unchanged.  If, when there is
anything "wrong" with the filename, fopen would return NULL and
perror would print a meaningful message, just let them take care
of it, since you have to check for fopen errors and diagnose them
correctly anyway.
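
In code, "letting them take care of it" amounts to something like
the following sketch.  The wrapper name is made up for
illustration, and it assumes (as POSIX specifies, though ISO C
alone does not promise it) that a failed fopen sets errno.

```c
#include <stdio.h>

/* Open path exactly as given; on failure, let perror explain why. */
FILE *open_as_given(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        perror(path);   /* message is the path, then the system's reason */
    return fp;
}
```

The caller still has to check for a NULL return, but no filename
syntax has been examined anywhere in the application.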

Followups should probably go to comp.unix.programmer .

                                            Steve Summit
                                            scs@adam.mit.edu