scs@adam.mit.edu (Steve Summit) (04/28/91)
In article <1991Apr28.015706.23505@agate.berkeley.edu> ilan343@violet.berkeley.edu (Geraldo Veiga) writes:
> What kind of lex rule would describe a general UNIX path
> name? Should I just accept any string and pass it to yacc?

Absolutely. Actually, you should pass it directly on to fopen,
without attempting to interpret it at either the lex or the yacc
level.

Only the operating system kernel should interpret filenames. Any
time an application program attempts to do so, it invariably gets
some nuance of it wrong, or doesn't accommodate some later change
to the kernel's rules (perhaps some special syntax for network
files).

(As an example, a colleague uses a graphics package which wants
him to specify some directory where all his graphs will be
stored. He didn't want to store all his graphs in one directory
-- he wanted to store them in several project directories, along
with their data. The vendor said this couldn't be done. He hit
upon the idea of specifying "." as the graphics directory, which
worked for a while, except that it began to fail occasionally,
with charming results such as losing data that had taken half an
hour to generate. The vendor said it was his fault for not
specifying a proper directory.)

Sometimes you have to play with filename syntax, usually to
separate a pathname into directory and filename components, or to
splice a directory and a filename back together to form a
pathname. It's best to relegate these operations to subroutines
("parsepath" and "splicepath" or the like) in the
operating-system-specific portion of the code. Implement them
simply and straightforwardly, and don't try to get clever or
fancy. (In my colleague's case, the graphics package apparently
couldn't splice the directory "." and a filename together
correctly.)

In the case of Unix, the definition of a legal filename really is
any string. The only special characters are / and \0, but their
special-casedness doesn't make any strings "illegal."
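To make the two pieces of advice above concrete, here is a minimal sketch, not a definitive implementation. The name splicepath comes from the text above, but its signature and return convention are my own assumptions, as is the helper name open_unchanged. It also assumes a C library that provides snprintf (C99 or later).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (signature is an assumption): join a directory
 * and a filename into buf, which holds bufsize bytes.
 * Returns 0 on success, -1 if the result would not fit.
 * It inserts a '/' only when the directory is non-empty and does not
 * already end in one; it makes no other attempt to interpret the name.
 */
int splicepath(char *buf, size_t bufsize, const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    const char *sep = (dlen == 0 || dir[dlen - 1] == '/') ? "" : "/";
    int n = snprintf(buf, bufsize, "%s%s%s", dir, sep, name);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}

/*
 * Hypothetical helper: hand the name to fopen exactly as given, and
 * let perror report whatever the system objected to.
 */
FILE *open_unchanged(const char *filename, const char *mode)
{
    FILE *fp = fopen(filename, mode);

    if (fp == NULL)
        perror(filename);   /* message text comes from the system, not from us */
    return fp;
}
```

With these, splicepath(buf, sizeof buf, ".", "graph.dat") leaves "./graph.dat" in buf, which is the case the graphics package above apparently got wrong. The caller of open_unchanged still has to check for a NULL return, but it needs no filename validation of its own.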
The only things you could ever want to check are:

1. Under some versions of Unix, the length of any single pathname
   component is limited, typically to 14 characters. You should
   only check this if you'll be appending some extension to the
   filename, and need to make sure there is "room" (perhaps to
   ensure that the name+extension is distinct, in 14 characters,
   from the name alone). Don't check it just to generate an error
   message, if you would have passed it along to the kernel
   unchanged -- let the kernel generate that error, if it wants
   to. (Usually, it truncates silently, so your application
   should reflect that behavior.)

2. Under some circumstances, you might want to issue a warning,
   or request confirmation, if the name is of a file to be
   created and if it contains spaces, or maybe also if it
   contains characters which are typically "special" to the
   shells or other file utilities: *, ?, -, ', ", \, &, etc.
   If you're too fussy, though, you'll aggravate expert users.

Before you do anything, though, ask yourself what would happen if
you passed the filename on to fopen unchanged. If, when there is
anything "wrong" with the filename, fopen would return NULL and
perror would print a meaningful message, just let them take care
of it, since you have to check for fopen errors and diagnose them
correctly anyway.

Followups should probably go to comp.unix.programmer.

                                        Steve Summit
                                        scs@adam.mit.edu