cfv@packet.UUCP (07/25/83)
*sigh* It's time for another in an unending series of flames on the
wonderfulness of Un*x and the way you can put lots of funny characters in
its filenames. Last week I had a program that I wrote a few months ago die
terribly and take the system with it. 7 hours and half a case of console
logs later, it turns out that the wonderful Un*x (TM B*ll L*bs) system did
it to me again...
Background: I have a program that generates a map of the Unix filesystem
and then passes part of that map along to another program. For various
reasons I did this by generating a string of the form '<cmd> <filename>
...' and giving it to system(). I learned very early in the process all
about control characters and white space (ingres is REAL good at putting
spaces in filenames... *sigh*) and to quote out those names, but last week
someone really pulled a winner and put a file named 'foo;init;bar' onto the
system (actually, it had been there but the program finally went after it
for the first time). The system proceeded to parse this as '<cmd>
<filename> ... foo ; init ; bar <filename> ...' and since the program runs
as root, it proceeded to start a second init, run /etc/rc, and all that
neat stuff.
Foreground: the fix on this specific problem is simple. I expanded the
quoting mechanism for control characters and things to all files. This
means that it takes more system calls to do the same work, but it is much
safer. It doesn't solve the problem, however. I really believe that there
either needs to be a way to run the shell without any parsing or Un*x needs
to keep some of its more dangerous characters (such as control
characters, spaces, and the set [*;./{}]) from being used in file
names on the system. How many times have you had to help someone access a
file that had a weird character in it? From what I have seen, they create
many more problems than they solve.....
--
From the dungeons of the Warlock:
Chuck Von Rospach
ucbvax!amd70!packet!cfv
(chuqui@mit-mc) <- obsolete!
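To make the failure concrete, here is a small sketch (in Python rather than C, purely for brevity) that does what system() does: glue a command and a filename into one string and hand it to /bin/sh. The filename and the harmless `echo`/`true` commands are stand-ins for the real `<cmd>` and init, chosen only for illustration, and a POSIX `sh` is assumed to be on the PATH.

```python
import subprocess

# Hypothetical filename modelled on 'foo;init;bar', with harmless commands.
name = "foo;echo injected;true"

# Like system(): build one command string and let /bin/sh parse it.
cmd = "echo " + name
result = subprocess.run(["sh", "-c", cmd], capture_output=True, text=True)

# The shell split the name at each ';' and ran a second command.
print(result.stdout, end="")  # foo, then injected, on separate lines
```

The shell sees three commands (`echo foo`, `echo injected`, `true`), which is exactly the parse described above with init in the middle.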
jbray@bbn-unix@sri-unix.UUCP (07/27/83)
From: James Bray <jbray@bbn-unix> [Special characters in filenames] sure as heck do cause problems. I was just talking to someone who was having what appeared to be filesystem problems: after a lot of other weirdness, it turned out that there were apparently two distinct live entries in a directory with the same name. I od'd the thing and the mystery became clear: one of them had a misspelling and a couple of backspaces in it. The victim's .profile had gotten lost somehow, and they were set up with the default terminal special characters (so backspace was not the erase character and went straight into the name). I myself would be quite happy if filenames were as restricted as C identifiers, altho' a lot of stuff would have to be rewritten... In addition to alphanumerics and underbar, it might be a good idea to allow a few things like ".+-", and perhaps even "~". But I would like to see them much more restricted than they are now. --Jim Bray
jfw@mit-ccc@mit-mc@sri-unix.UUCP (07/28/83)
Gee, why don't we just go back to radix-50(8) filenames? The problem is not excessive generality (gee, why don't we refer to files by their filesystem number/inode number), but the poor state of tools which accidentally create file names that other poorly designed tools cannot handle or mishandle. File names need not be confusing; Berkeley's `ls' can optionally show ``bizarre'' characters in octal escape form (perhaps, in dumping to tty's, it should normally do so?); perhaps the shell should accept octal escapes. (Let us defer the human engineering aspects of /bin/sh for a later date.) Restricting the operating system to the point where mistakes are impossible is, I feel, the wrong attitude (ask Kurt Goedel about it, too). Rather, let us endeavor to ensure that programs utilize the generality well. We'll never eliminate bugs, no matter how few characters are in the space of file-name characters...
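A rough sketch of the display idea: show every byte outside printable ASCII as a backslash and three octal digits, approximately what Berkeley's `ls -b` does. The function name `octal_escape` is hypothetical, and the real `ls -b` may differ in detail (for instance, this sketch does not escape the backslash itself, so its output is not strictly reversible).

```python
def octal_escape(name: bytes) -> str:
    """Show non-printing bytes of a filename as \\ooo octal escapes."""
    out = []
    for b in name:
        if 0x20 <= b < 0x7F:          # printable ASCII, space included
            out.append(chr(b))
        else:                         # control chars, DEL, 8th-bit bytes
            out.append("\\%03o" % b)
    return "".join(out)

# The name with two embedded backspaces from later in this thread:
print(octal_escape(b"brao\b\boacast"))  # brao\010\010oacast
```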
joe@cvl.UUCP (Joseph I. Pallas) (07/28/83)
Why, pray tell, are you using system() to do what you're doing? It seems grossly unfair to blame the UNIX (tm) system for your misuse of one of its facilities. Clearly, if you'd already run into problems with quoting filenames, you should have been using fork/execv. If that's not good enough, use execvp, which gives all the power that system() does for finding your command, WITHOUT doing what you didn't want it to do. Furthermore, it's more efficient. System() does exactly what the manual says: "pass a command to the shell." On the other side, I've seen lots of problems with naive users making files they can't seem to remove, because they start with spaces, or contain (this was the most common) '^[[D' (the left arrow key on most of our terminals). I've heard about a no-funny-file-names mode that some people have implemented, which wouldn't put space, control chars, or meta chars (8th bit set) into file names. I'm sure that the 'bad' characters you mention could be included. The idea goes against my grain, but then, I'm not a neophyte. My solution to our problem was simply to turn on ctlecho in the tty driver in everyone's .login (4.1bsd). Usually, someone has to work hard to create filenames that contain a semicolon, a slash, or a bracket. When he does, he generally has to ask for help to get rid of them. That seems all right to me, because he'd probably have to ask for help anyway to get an explanation of the funny error he would get in novice mode. One other thing...why does expanding your quoting mechanism to quote all files instead of certain ones "take more system calls to do the same work," when it seems to me that it should use the same number of system calls in the shell, and be more efficient in your program (no tests for special characters)? Joseph Pallas rlgvax!cvl!joe
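The fork/exec route can be sketched as follows. Python's `subprocess.run` with a list of arguments (and the default `shell=False`) forks and execs the program directly, so each list element arrives in the child as one argv entry and no shell ever parses it; in C the same effect comes from `fork()` followed by `execv()` or `execvp()`. The child here is just the Python interpreter printing its first argument, an assumed stand-in for the real `<cmd>`.

```python
import subprocess
import sys

name = "foo;init;bar"  # the troublesome filename from the original article

# argv as a list: no shell, no word splitting, no special meaning for ';'.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", name]
result = subprocess.run(child, capture_output=True, text=True, check=True)

print(result.stdout, end="")  # foo;init;bar
```

The name reaches the child intact, semicolons and all, which is the point of bypassing the shell.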
ian@utcsstat.UUCP (07/29/83)
> Foreground: the fix on this specific problem is simple. I expanded the quoting mechanism for control characters and things to all files. [...] It doesn't solve the problem, however.

The fix is more powerful than you can imagine. Just quote every file name that you pass to the shell with the single quote character. The shell will not expand characters which are quoted thusly. For example, rm '*' (remove quote star quote) will remove a file whose name consists of an asterisk, rather than all the files in your directory. I just tried it on a fairly standard V7 system. I think it's unfair to say that UNIX did it to you again. I think you did it to yourself this time.
Ian F. Darwin, Toronto utcsstat!ian
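Ian's `rm '*'` claim is easy to check in a scratch directory. A sketch, assuming a POSIX `sh` and `rm` on the PATH; the file names are made up for the test.

```python
import os
import subprocess
import tempfile

def rm_quoted_star():
    """Run rm '*' in a scratch directory and return the names left behind."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("*", "a.c", "b.c"):
            open(os.path.join(d, name), "w").close()
        # Inside single quotes sh does no globbing, so rm sees a literal '*'.
        subprocess.run(["sh", "-c", "rm '*'"], cwd=d, check=True)
        return sorted(os.listdir(d))

print(rm_quoted_star())  # ['a.c', 'b.c']
```

Only the file literally named `*` goes away; the others survive. Names that themselves contain a single quote need extra care.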
akmal%nosc@syte.UUCP (07/29/83)
The solution to your problem is simple: use execv() instead of system()!
guy@rlgvax.UUCP (Guy Harris) (07/29/83)
Well, the argument could also be made that unused generality that confuses users who haven't learned the ropes isn't worth the cost. The only subsets of 0x00-0xFF I could see are:
1) 0x00-0x7F - unless you're doing something *really* obscure you don't need the parity bit. I've heard that some systems used it to supply file version numbers a la TENEX (a wonderful way to fill a disk without even trying; I remember seeing a VMS system where somebody had *43* or so versions of one source file. If your OS supports them it should support auto-purge as well, so you can have only the last N versions around. VMS, I'm told, supports it, but I don't know if they make it easy for the user to get at it.), but that the long filenames in 4.2BSD obviated the need for that. Note that even 4.1BSD won't allow you to reference or create a file with ('/'|0200) in it, and 4.1cBSD won't allow you to create a file with any character with the parity bit on in it.
2) printable characters, including space, only - see previous argument. Not quite as nasty as characters with the 0200 bit on, because you can at least type them into the shell between quotes.
3) alphanumerics plus underscore only (i.e., like C identifiers) - well, you certainly won't get nailed by metacharacters. However, you really can't leave out ".", for obvious reasons, and RCS users like us will scream bloody murder if you eliminate ",", and.... I suspect this might be *too* restrictive.
Restricting only *some* of the metacharacters merely protects against being unable to delete files from *existing* shells; if you've protected against the Bourne shell metacharacters, what about the C shell metacharacters? (Note that I consider space a shell metacharacter in this sense; also note that "ed" does NOT consider space a legal character in a filename).
I think that 1) and 2) aren't too controversial - I don't see a compelling reason to allow those characters, and I don't know of any systems that would be seriously inconvenienced by those restrictions. (If anybody does, speak up.) 3) is a different matter. A lot of systems find various special characters useful in file names, so I think a case can be made that all printable ASCII characters plus space should remain legal. Guy Harris {seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
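Guy's three candidate subsets nest inside one another, so one way to compare them is to ask which is the narrowest that still accepts a given name. A sketch; the function name is hypothetical, and counting "." and "," as part of subset 3 follows his remarks about extensions and RCS rather than any standard.

```python
import string

# Subset 3: C-identifier characters, plus '.' (extensions) and ',' (RCS).
IDENT_PLUS = set(string.ascii_letters + string.digits + "_.,")

def narrowest_subset(name: bytes) -> int:
    """Return 3, 2 or 1 for the most restrictive of Guy's subsets that
    admits every byte of name, or 0 if none of them does."""
    if all(chr(b) in IDENT_PLUS for b in name):
        return 3
    if all(0x20 <= b <= 0x7E for b in name):   # printable, space included
        return 2
    if all(b <= 0x7F for b in name):           # 7-bit, parity bit clear
        return 1
    return 0

print(narrowest_subset(b"foo.c,v"), narrowest_subset(b"\x1b[D"))  # 3 1
```

The left-arrow name `^[[D` mentioned earlier in the thread passes only the loosest rule, which is why subset 2 alone would already have caught it.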
guy@rlgvax.UUCP (Guy Harris) (07/30/83)
I just noticed that the original article included "." in the set of anathematized characters. Almost EVERY operating system in the world allows "." in a file name, for obvious reasons; it's the character that has become the conventional separator between file name and extension! (Yes, I know about systems like CTSS, ITS, etc. that do it with spaces.) "/" was also included, but depending on your point of view that is already absolutely forbidden in file names (because it separates file names within a path name) or it must be allowed in path names (because it separates...). Any PROGRAM which forbade the use of "/" and "." in file names would become very unpopular very quickly; if you couldn't give "/etc/passwd" to an editor it wouldn't become a very popular editor.... It's probably not wise to put restrictions on printable ASCII characters in file names, because SOMEBODY out there might have to use that character in a file name, or want to use it. Hell, in our office automation system (where the end-user interacts by filling in forms to provide command arguments, so we don't need token separator characters) I frequently create files with blanks in their name just because it's a more logical separator than "_" or "-". Most of the cases I've seen where a filename truly caused problems were cases where a character had the eighth bit on or was a control character; only the first can't be solved using the shell's quoting mechanism (ironically, it's because that very quoting mechanism uses the eighth bit internally - in all the major UNIX shells, I believe - to indicate quoted characters). Files with other characters don't cause problems for UNIX per se, just for users using a shell that gives that character significance. Just saying "if you have to deal with a file which contains these characters in their name, remember to put the name in quotes in the shell" should suffice for most of these cases. 
If the user isn't even using a standard UNIX shell the problem may not even occur. For programs, the problem either won't occur because you aren't using something like "system" or, if you *really* need to use "system", the problem can be prevented by using the same quote characters as you would use when typing the command yourself. Guy Harris {seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
nrh@inmet.UUCP (07/30/83)
#R:packet:-31700:inmet:10300005:000:137 inmet!nrh Jul 29 14:39:00 1983 Didn't I hear something about "The operating system should make as few real choices as possible...". Oh? that's "old hat"? Oh well....
jbray@bbn-unix@sri-unix.UUCP (08/01/83)
From: James Bray <jbray@bbn-unix> I was discussing this with my colleague Ralph Muha, who suggested simply disallowing anything <= ' ' or >= \177 (<delete>). This is delightfully straightforward, and I don't think there could be any legitimate complaints about such a restriction. Even if it should inconvenience someone, I would still be more concerned with people like the poor user who called me up with what certainly looked like a filesystem problem until we saw that what had happened was that her tty was set to default special characters and she had one file called "broadcast" and another called "brao\b\boacast" (in the same directory) which looked the same until I od'd it. Now we can say that next time we will od the directory the very first thing when something looks like this, and I am sure I would, but not all users are hackers... I think I'll do it: stick the test in wdir() or some such place. --Jim
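Ralph Muha's rule amounts to a one-line byte test. The sketch below is a user-space illustration in Python, not the kernel code; a real check in wdir() would be written in C against the directory-entry name, and the function name `muha_ok` is hypothetical.

```python
def muha_ok(name: bytes) -> bool:
    """Accept a filename only if every byte lies strictly between ' ' and DEL.

    Rejects space, all control characters, DEL (0177) and any byte with
    the eighth bit set; an empty name is rejected as well."""
    return len(name) > 0 and all(0x20 < b < 0x7F for b in name)

print(muha_ok(b"broadcast"), muha_ok(b"brao\b\boacast"))  # True False
```

Note that this rule also rejects a name such as `' protect'`, the space-prefixed name Steve Woods defends in his reply.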
scw@ucla-locus@cepu.UUCP (08/01/83)
From: Steve Woods <cepu!scw@ucla-locus>
> I was discussing this with my colleague Ralph Muha, who suggested . . . stick the test in wdir() or some such place. --Jim

I would agree with you except that ' ' is a valid char in a file name (as someone said earlier, ingres generates almost all of its names with spaces in them). Also, there are other things running around that need/want filenames to have spaces in them, such as protection files to keep *DUMB* people/prolllllers from 'rm *': a file in that directory named ' protect' with mode a-w will make rm pause (hopefully ' protect' will be the first file that is found).
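Why `' protect'` tends to be found first: if the shell sorts glob matches by plain character codes, as is generally said of the V7-era shells, a leading space (040) sorts ahead of every digit and letter. A sketch of that ordering with made-up names; modern shells may sort by locale collation instead, which can ignore the leading space, so the trick should not be relied on today.

```python
names = ["foo.c", "Makefile", " protect", "a.out", "README"]

# Plain code-point order, as a strcmp()-style comparison gives.
print(sorted(names))  # [' protect', 'Makefile', 'README', 'a.out', 'foo.c']
```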
phil.rice@rand-relay@sri-unix.UUCP (08/02/83)
From: Bill.LeFebvre <phil.rice@rand-relay> This is in response to all the articles that have been suggesting alternate sets of legal characters for file names, and specifically in response to James Bray @ bbn-unix. 1) Restricting the characters to the range of printable ASCII characters (>= ' ' and <= \177) does not solve the original problem that started this whole discussion. If you recall, that discussion dealt with giving filenames that had shell meta-characters to a `system'. 2) The UCB `ls' (as has been pointed out before) has an option (`-b' if you're interested) so that an `od' of a directory is not necessary to see unusual file names. I have no idea what system you normally use. 3) There are THREE illegal characters in 4.1BSD filenames. Two have been mentioned previously: '\0' and '/'. '\257' is also illegal (this is a slash with the eighth bit on). 4) I have always been disgusted with operating systems that restrict file names in unnecessary manners. Unix is the ONLY operating system that I have found that places no arbitrary restrictions on file names. The three restricted characters are forbidden for very obvious reasons. Anything else would be unnecessary. I may agree with disallowing characters with the eighth bit on, but all the other restrictions seem totally arbitrary and unnecessary. Unix has never taken the "protect the moron" attitude before, let's not start it now! 5) I'm getting sick and tired of seeing messages about file names. We have beaten the topic to death. Shall we go on to something a little more interesting? William LeFebvre ARPANet: phil.rice@Rand-Relay CSNet: phil@rice USENet: ...!lbl-csam!rice!phil
jbray@bbn-unix@sri-unix.UUCP (08/03/83)
From: James Bray <jbray@bbn-unix>
> 1) Restricting the characters to the range of printable ASCII characters (>= ' ' and <= \177) does not solve the original problem that started this whole discussion. If you recall, that discussion dealt with giving filenames that had shell meta-characters to a `system'.

I could care less about the origin of the discussion. I got involved because a user, specifically a user at the Network Operations Center, had this problem of having --unknown to her-- her .profile blown away, and having gotten a few '\b's into a filename before she realized this. This user is not a hacker, but she is no "moron". I am sick and tired of elitist hackers who think that everyone who is not one of them is a "moron". And I think we can all agree that we have a certain vested interest in the proper operation of the Network Operations Center...

> 2) The UCB `ls' (as has been pointed out before) has an option (`-b' if you're interested) so that an `od' of a directory is not necessary to see unusual file names. I have no idea what system you normally use.

We support extended USG. In any case post-mortems do little for the patient.

> 4) I have always been disgusted with operating systems that restrict file names in unnecessary manners. Unix is the ONLY operating system that I have found that places no arbitrary restrictions on file names. The three restricted characters are forbidden for very obvious reasons. Anything else would be unnecessary. I may agree with disallowing characters with the eighth bit on, but all the other restrictions seem totally arbitrary and unnecessary. Unix has never taken the "protect the moron" attitude before, let's not start it now!

Unix is getting taken seriously as an operating system, and not just a hacker's toy. Some changes may well be necessary to accommodate the real people who use it. So far we have heard no one claim to actually want to put control characters in filenames. A capability that no one wants to use but that can cause trouble if anyone inadvertently takes advantage of it is a decided misfeature. I would be most interested to hear from the Founding Fathers if they allowed control characters by design, by omission, or by pdp11 address-space saving... I would also be most interested to hear from anyone who thinks there is a legitimate use for control characters in filenames, because if there is I will not disallow them. But we sell our Unix, we don't just play with it, and I feel a certain responsibility to those users out there who are much less experienced than our NOC controller, and who don't have a handy hacker just four digits away...

> 5) I'm getting sick and tired of seeing messages about file names. We have beaten the topic to death. Shall we go on to something a little more interesting?
guyton@rand-unix@sri-unix.UUCP (08/03/83)
James Bray asks: But please tell me, anyone, if you can think of a good use for unprintables in filenames. I have three examples that come to mind: 1) USC-ISI used control characters to fake version numbers in their implementation of Interlisp. 2) T[w]enex often puts control characters in filenames to make them hard to delete (the archive directory used to be kept in the user's directory with a bizarre filename). 3) Our IBM folks put a blank and a backspace in the name of our Wylbur keyword file (to try and make it harder for hackers to try and crack it). One point is clear from the above, Unix isn't the only OS that allows bizarre characters in filenames. -- Jim
ron@brl-bmd@sri-unix.UUCP (08/04/83)
From: Ron Natalie <ron@brl-bmd> I remember back in the early computer days when we got our first HP timesharing system at school (I had gotten pretty good at repairing card punches and wiring up 403 program boards). Originally it was possible to use all kinds of characters in the file names. I always thought that "MAGIC[]" was a marvelous name for a magic square program. I was very disheartened when a new release came out forbidding anything other than alphanumerics in the file names. -Ron
smh@mit-eddie.UUCP (Steven M. Haflich) (08/05/83)
All this discussion about allowing/disallowing funny chars in filenames has missed the point that sometimes (rarely) one really wants a funny char in a filename: for security. For example, system administrators at academic sites with limited disk space can reduce the number of private copies of games by protecting sources (and even executables) behind `untypable' pathnames. It doesn't work against real gurus, but it can make a big difference. Depending on the uniformity and baudrate of the local terminals, a filename ending in a CURSOR_UP character can be essentially *invisible* to many forms of the ls command (alas, it was so on V7, but not so on 4.1, etc.). The above may seem somewhat frivolous, but I could also imagine hiding a cron-invoked security demon (you know, one of those things that searches for strange setuid files, etc.) the same way. Why make it easy? No, I don't use this method myself, but I have seen it used elsewhere. Steve Haflich, ...!genrad!mit-eddie!smh
ka@spanky.UUCP (08/06/83)
As far as I know, ed(1) is the only standard UN*X program which relies on funny file names. (It uses bel characters to generate unique file names.) Anybody out there still use ed? Kenneth Almquist
thomas@utah-gr.UUCP (Spencer W. Thomas) (08/06/83)
"Twenex used to put funny characters in file names ..." This is true, but in order to put a funny character into a filename, it MUST be preceded by a ^V, the monitor will not accept it otherwise. Not quite the same as the Unix situation. 'Course, you've got to quote all sorts of characters over there, right off the top of my head, the list includes []<>.:,@ as well as more esoteric characters. (Actually, one dot per filename may be left unquoted.) =Spencer
greep@su-dsn@sri-unix.UUCP (08/06/83)
Re: your suggestion to enclose file names in single quotes, that works as long as the file name has no single quote characters in it. If it does, they can be quoted with the double quote character. For example, a file named "' (i.e. a double quote character followed by a single quote character) could be deleted with the command: rm '"'"'"
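The trick generalizes to any name: single-quote each run of ordinary text and put each literal single quote inside double quotes. A sketch with a hypothetical function name `sh_quote`; Python's standard `shlex.quote` uses the same `'"'"'` idea, though its output can differ slightly (it does not drop empty quoted pieces).

```python
def sh_quote(name: str) -> str:
    """Quote name for /bin/sh: single-quote ordinary text, and wrap each
    literal single quote in double quotes."""
    if name == "":
        return "''"
    pieces = name.split("'")
    quoted = ["'" + p + "'" if p else "" for p in pieces]
    return "\"'\"".join(quoted)

print(sh_quote("\"'"))           # '"'"'"   (the example above)
print(sh_quote("foo;init;bar"))  # 'foo;init;bar'
```

Round-tripping through `sh -c "printf '%s' ..."` is a quick way to confirm the shell hands back exactly the original name.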
hamilton@uiucuxc.UUCP (08/07/83)
#R:sri-arpa:-352400:uiucuxc:5500065:000:126 uiucuxc!hamilton Aug 5 21:31:00 1983 control characters are useful too; in my bin directory, i keep clones of "clear" and "uptime" in files named ^L and ^T, resp.
ka@spanky.UUCP (08/10/83)
Re: Where ed(1) uses bell characters Try looking at around line 2125 in the routine mkfunny: *p2 = '\007'; /* add unprintable char to make funny a unique name */ Kenneth Almquist
gwyn@brl-vld@sri-unix.UUCP (08/10/83)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> Gee, you're right. Starting with System III (apparently) there is this kludge to handle files with 1 link differently from files with multiple links when the file is written out. In the process they invent a temporary file in the same directory as the real file, and a BEL character is used to ensure that the temp name does not conflict with any possible real filenames in the directory. This does reinforce the argument that "normal" user filenames do not have control characters. I had thought you were talking about the /tmp files; my apologies.
emrath@uiuccsb.UUCP (08/16/83)
#R:mit-eddi:-54600:uiuccsb:14900002:000:695 uiuccsb!emrath Aug 16 00:03:00 1983 I remember dec's DOS opsys for the pdp-11. The open system call took 3 words of "rad50" (48 bits) as the file name. However, all bit patterns except all 0s were valid as far as the filing system calls were concerned. All 0s meant the directory entry was free. Sure, U could write your own program to create weird file names, but there also were system calls used for parsing file specifiers. All the system programs, and most of the user stuff we wrote, used the same parsing routine, so everybody stayed happy, most of the time. The restrictions on file names were imposed by essentially a library routine (though this happened to be part of the "kernel", god forbid). Sounds all too familiar.
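For readers who never met it, Radix-50 packs three characters into one 16-bit word using a 40-symbol alphabet (50 octal), so three words hold nine characters. A sketch assuming the usual PDP-11 character order (space, A-Z, $, ., an unassigned code, 0-9; `%` stands in for the unassigned slot here); how DOS split the nine characters into name and extension is not modelled, and the function names are hypothetical.

```python
# Index of each character is its Radix-50 code (0..39).
RAD50 = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def rad50_word(three):
    """Pack three characters into one word: c1*40*40 + c2*40 + c3."""
    a, b, c = (RAD50.index(ch) for ch in three)
    return (a * 40 + b) * 40 + c

def rad50_unword(word):
    """Inverse of rad50_word; words of 64000 or more decode to nothing."""
    if not 0 <= word < 40 ** 3:
        raise ValueError("no three Radix-50 characters encode %d" % word)
    return RAD50[word // 1600] + RAD50[word // 40 % 40] + RAD50[word % 40]

def rad50_name(name):
    """Pack a name of up to nine characters (space padded) into 3 words."""
    padded = name.upper().ljust(9)
    if len(padded) != 9:
        raise ValueError("at most nine characters fit in three words")
    return [rad50_word(padded[i:i + 3]) for i in range(0, 9, 3)]

print(rad50_word("ABC"), rad50_unword(1683))  # 1683 ABC
```

Since 40 cubed is 64000, word values from 64000 to 65535 correspond to no characters at all: a raw system call accepting any non-zero 48 bits could create names that no parser could ever produce.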