[net.unix-wizards] funny characters in filenames

cfv@packet.UUCP (07/25/83)

*sigh* It's time for  another  in  an  unending  series  of  flames  on  the
wonderfulness  of  Un*x and the way you can put lots of funny characters in
its filenames.  Last week I had a program that I wrote a few months ago die
terribly  and  take  the system with it. 7 hours and half a case of console
logs later, it turns out that the wonderful Un*x (TM B*ll L*bs) system  did
it to me again...

Background:  I have a program that generates a map of the  Unix  filesystem
and  then  passes  part  of that map along to another program.  For various
reasons I did this by generating a string of  the  form  '<cmd>  <filename>
...'  and  giving  it to system().  I learned very early in the process all
about control characters and white space (ingres is REAL  good  at  putting
spaces  in filenames... *sigh*) and to quote out those names, but last week
someone really pulled a winner and put a file named 'foo;init;bar' onto the
system  (actually,  it had been there but the program finally went after it
for the  first  time).  The  system  proceeded  to  parse  this  as  '<cmd>
<filename>  ... foo ; init ; bar <filename> ...' and since the program runs
as root, it proceeded to start a second init, run  /etc/rc,  and  all  that
neat stuff.

Foreground: the fix on this specific problem  is  simple.  I  expanded  the
quoting  mechanism  for  control  characters and things to all files.  This
means that it takes more system calls to do the same work, but it  is  much
safer.  It doesn't solve the problem, however.  I really believe that there
either needs to be a way to run the shell without any parsing or Un*x needs
to  restrict  the  use  of  some  of its more dangerous characters (such as
control characters, spaces, and the set [*;./{}]) from being used as a file
name  on  the system.  How many times have you had to help someone access a
file that had a weird character in it?  From what I have seen, they  create
many more problems than they solve.....
-- 
From the dungeons of the Warlock:
					      Chuck Von Rospach
					      ucbvax!amd70!packet!cfv
					      (chuqui@mit-mc)  <- obsolete!

jbray@bbn-unix@sri-unix.UUCP (07/27/83)

From:  James Bray <jbray@bbn-unix>

[Special characters in filenames] sure as heck do cause problems. I was just
talking to someone who was having what appeared to be filesystem problems:
it ended up after a lot of other weirdness that there were apparently two
distinct live entries in a directory with the same name. I od'd the thing
and the mystery became clear: one of them had a misspelling and a couple of
backspaces in it. The victim's .profile had gotten lost somehow, and they
were set up with default characters. I myself would be quite happy if filenames
were as restricted as C identifiers, altho' a lot of stuff would have to be
rewritten... In addition to alphanumerics and underbar, it might be a good
idea to allow a few things like ".+-", and perhaps even "~". But I would
like to see them much more restricted than they are now.
--Jim Bray

jfw@mit-ccc@mit-mc@sri-unix.UUCP (07/28/83)

Gee, why don't we just go back to radix-50(8) filenames?  The problem is
not excessive generality (gee, why don't we refer to files by their
filesystem number/inode number), but the poor state of tools which accidentally
create file names that other poorly designed tools cannot handle or mishandle.
File names need not be confusing; Berkeley's `ls' can optionally show
``bizarre'' characters in octal escape form (perhaps, in dumping to tty's,
it should normally do so?); perhaps the shell should accept octal escapes.
(Let us defer the human engineering aspects of /bin/sh for a later date.)
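
A minimal sketch, in C, of the octal-escape display suggested above.  The
helper name escape_name is hypothetical, and the decision to escape space
and backslash along with control and eighth-bit characters is an assumption
made for illustration; this is not the code behind Berkeley's `ls'.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy `name` into `out`, replacing every byte outside the range
 * 0x21-0x7E, and the backslash itself, with a \ooo octal escape.
 * Space is escaped too, so leading or trailing blanks become visible.
 * Returns 0 on success, -1 if `out` is too small.
 * escape_name is a hypothetical helper, not part of any real ls.
 */
int escape_name(const char *name, char *out, size_t outsize)
{
    size_t used = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p > ' ' && *p < 0x7f && *p != '\\') {
            if (used + 1 >= outsize)    /* one byte plus the final NUL */
                return -1;
            out[used++] = (char)*p;
        } else {
            if (used + 4 >= outsize)    /* four bytes plus the final NUL */
                return -1;
            snprintf(out + used, 5, "\\%03o", (unsigned)*p);
            used += 4;
        }
    }
    if (used >= outsize)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With this helper, a name containing two backspaces such as "brao\b\boacast"
comes out as brao\010\010oacast, so it can no longer be mistaken for a
neighbour that merely looks the same on the screen.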

Restricting the operating system to the point where mistakes are impossible
is, I feel, the wrong attitude (ask Kurt Goedel about it, too).  Rather,
let us endeavor to ensure that programs utilize the generality well.  We'll
never eliminate bugs, no matter how few characters are in the space of
file-name characters...

joe@cvl.UUCP (Joseph I. Pallas) (07/28/83)

Why, pray tell, are you using system() to do what you're doing?  It
seems grossly unfair to blame the UNIX (tm) system for your misusage of
one of its facilities.  Clearly, if you'd already run into problems
with quoting filenames, you should have been using fork/execv.  If that's
not good enough, then execvp, which gives all the power that system()
does for finding your command, WITHOUT doing what you didn't want it to
do.  Furthermore, it's more efficient.  System() does exactly what the
manual says:  "pass a command to the shell."
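
A minimal sketch of the fork/execvp approach, assuming a POSIX-style
system.  The function name run_command is hypothetical, and the choice to
report exit status 127 when the exec fails is an assumption borrowed from
shell convention, not something the original program is known to do.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given argument vector and wait for it to finish.
 * No shell is involved, so a name such as "foo;init;bar" reaches the
 * command as a single argument.  Returns the child's exit status, or -1
 * if fork or waitpid failed or the child did not exit normally.
 * run_command is a hypothetical name used for illustration.
 */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* searches PATH, as system() would */
        _exit(127);             /* reached only if execvp failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)     /* retry only when interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller builds the vector itself, for example { "ls", "-l", "foo;init;bar",
NULL }, and the semicolons arrive at the command untouched because nothing
ever parses them.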

On the other side, I've seen lots of problems with naive users making
files they can't seem to remove, because they start with spaces, or
contain (this was the most common) '^[[D' (the left arrow key on most
of our terminals).  I've heard about a no-funny-file-names mode that
some people have implemented, which wouldn't put space, control chars,
or meta chars (8th bit set) into file names.  I'm sure that the 'bad'
characters you mention could be included.  The idea goes against my
grain, but then, I'm not a neophyte.  My solution to our problem was
simply to turn on ctlecho in the tty driver in everyone's .login
(4.1bsd).  Usually, someone has to work hard to create filenames that
contain a semicolon, a slash, or a bracket.  When he does, he
generally has to ask for help to get rid of them.  That seems all right
to me, because he'd probably have to ask for help anyway for an
explanation of the funny error he would get in novice-mode.

One other thing...why does expanding your quoting mechanism to quote
all files instead of certain ones "take more system calls to do the
same work,"  when it seems to me that it should use the same number of
system calls in the shell, and be more efficient in your program (no
tests for special characters)?


Joseph Pallas
rlgvax!cvl!joe

ian@utcsstat.UUCP (07/29/83)

	Background:  I have a program that generates a map of the
	Unix  filesystem and  then  passes  part  of that map along to
	another program.  For various reasons I did this by generating
	a string of  the  form  '<cmd>  <filename>
	...'  and  giving  it to system().  I learned very early in the
	process all
	about control characters and white space (ingres is REAL  good
	at  putting spaces  in filenames... *sigh*) and to quote out
	those names, but last week someone really pulled a winner and
	put a file named 'foo;init;bar' onto the system  (actually,  it
	had been there but the program finally went after it for the
	first  time).  The  system  proceeded  to  parse  this  as
	'<cmd> <filename>  ... foo ; init ; bar <filename> ...' and
	since the program runs as root, it proceeded to start a second
	init, run  /etc/rc,  and  all  that neat stuff.

	Foreground: the fix on this specific problem  is  simple.  I
	expanded  the quoting  mechanism  for  control  characters and
	things to all files.  This means that it takes more system
	calls to do the same work, but it  is  much safer.  It doesn't
	solve the problem, however.  I really believe that there either
	needs to be a way to run the shell without any parsing or Un*x
	needs to  restrict  the  use  of  some  of its more dangerous
	characters (such as control characters, spaces, and the set
	[*;./{}]) from being used as a  file name  on  the system.  How
	many times have you had to help someone access a file that had
	a weird character in it?  From what I have seen, they
	create many more problems than they solve.....

The fix is more powerful than you can imagine.  Just quote every file name
that you pass to the shell, with the single quote character. The shell
will not expand characters which are quoted thusly. For example,
	rm '*'  (remove quote star quote)
will remove a file whose name consists of an asterisk, rather than all
the files in your directory. I just tried it on a fairly standard V7 system.

I think it's unfair to say that UNIX did it to you again. I think you did it
to yourself this time.

Ian F. Darwin, Toronto
utcsstat!ian

akmal%nosc@syte.UUCP (07/29/83)

The solution to your problem is simple: use execv() instead of system() !

guy@rlgvax.UUCP (Guy Harris) (07/29/83)

Well, the argument could also be made that unused generality that confuses
users who haven't learned the ropes isn't worth the cost.  The only subsets
of 0x00-0xFF I could see are:

1) 0x00-0x7F - unless you're doing something *really* obscure you don't need
the parity bit.  I've heard that some systems used it to supply file version
numbers ala TENEX (a wonderful way to fill a disk without even trying; I
remember seeing a VMS system where somebody had *43* or so versions of one
source file.  If your OS supports them it should support auto-purge as well,
so you can have only the last N versions around.  VMS, I'm told, supports it,
but I don't know if they make it easy for the user to get at it.), but that
the long filenames in 4.2BSD obviated the need for that.  Note that even
4.1BSD won't allow you to reference or create a file with ('/'|0200) in it,
and 4.1cBSD won't allow you to create a file with any character with the
parity bit on in it.

2) printable characters, including space, only - see previous argument.
Not quite as nasty as characters with the 200 bit on, because you can at
least type them into the shell between quotes.

3) alphanumerics plus underscore only (i.e., like C identifiers) - well,
you certainly won't get nailed by metacharacters.  However, you really
can't leave out ".", for obvious reasons, and RCS users like us will scream
bloody murder if you eliminate ",", and....  I suspect this might be *too*
restrictive.  Restricting only *some* of the metacharacters merely protects
against being unable to delete files from *existing* shells; if you've
protected against the Bourne shell metacharacters, what about the C shell
metacharacters?  (Note that I consider space a shell metacharacter in this
sense; also note that "ed" does NOT consider space a legal character in a
filename).

I think that 1) and 2) aren't too controversial - I don't see a compelling
reason to allow those characters, and I don't know of any systems that
would be seriously inconvenienced by those restrictions.  (If anybody does,
speak up.)  3) is a different matter.  A lot of systems find various special
characters useful in file names, so I think a case can be made that all
printable ASCII characters plus space should remain legal.
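
The three subsets above can be sketched as a classifier in C.  The names
classify_name and enum name_class are hypothetical, and the function looks
at a single path component only ('/' and NUL are left to the kernel); it
illustrates the categories and is not a proposed kernel change.

```c
/* Narrowest of the subsets discussed above that a name fits into. */
enum name_class {
    NAME_ANY,        /* some byte has the eighth bit set */
    NAME_SEVEN_BIT,  /* subset 1: every byte is in 0x00-0x7F */
    NAME_PRINTABLE,  /* subset 2: printable ASCII, space included */
    NAME_IDENTIFIER  /* subset 3: letters, digits and underscore only */
};

/*
 * Return the most restrictive subset that still admits `name`.
 * An empty name is reported as NAME_IDENTIFIER, since it contains
 * no offending byte.  classify_name is a hypothetical helper.
 */
enum name_class classify_name(const char *name)
{
    enum name_class result = NAME_IDENTIFIER;
    const unsigned char *p;

    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        enum name_class c;

        if (*p >= 0x80)
            c = NAME_ANY;
        else if (*p < 0x20 || *p == 0x7f)
            c = NAME_SEVEN_BIT;         /* control character or DEL */
        else if ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z') ||
                 (*p >= '0' && *p <= '9') || *p == '_')
            c = NAME_IDENTIFIER;
        else
            c = NAME_PRINTABLE;         /* space and punctuation */

        if (c < result)                 /* keep the loosest class seen */
            result = c;
    }
    return result;
}
```

Under this sketch an RCS file such as "foo.c,v" lands in NAME_PRINTABLE
rather than NAME_IDENTIFIER, which is the practical objection to subset 3.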

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

guy@rlgvax.UUCP (Guy Harris) (07/30/83)

I just noticed that the original article included "." in the set of
anathematized characters.  Almost EVERY operating system in the world allows
"." in a file name, for obvious reasons; it's the character that has become
the conventional separator between file name and extension!  (Yes, I know about
systems like CTSS, ITS, etc. that do it with spaces.)  "/" was also included,
but depending on your point of view that is already absolutely forbidden in
file names (because it separates file names within a path name) or it must
be allowed in path names (because it separates...).  Any PROGRAM which forbade
the use of "/" and "." in file names would become very unpopular very quickly;
if you couldn't give "/etc/passwd" to an editor it wouldn't become a very
popular editor....  It's probably not wise to put restrictions on printable
ASCII characters in file names, because SOMEBODY out there might have to use
that character in a file name, or want to use it.  Hell, in our office
automation system (where the end-user interacts by filling in forms to provide
command arguments, so we don't need token separator characters) I frequently
create files with blanks in their name just because it's a more logical
separator than "_" or "-".

Most of the cases I've seen where a filename truly caused problems were cases
where a character had the eighth bit on or was a control character; only the
first can't be solved using the shell's quoting mechanism (ironically, it's
because that very quoting mechanism uses the eighth bit internally - in all
the major UNIX shells, I believe - to indicate quoted characters).  Files
with other characters don't cause problems for UNIX per se, just for users
using a shell that gives that character significance.  Just saying "if you
have to deal with a file which contains these characters in its name, remember
to put the name in quotes in the shell" should suffice for most of these
cases.  If the user isn't even using a standard UNIX shell the problem may
not even occur.  For programs, the problem either won't occur because you
aren't using something like "system" or, if you *really* need to use "system",
the problem can be prevented by using the same quote characters as you would
use when typing the command yourself.

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

nrh@inmet.UUCP (07/30/83)

inmet!nrh    Jul 29 14:39:00 1983

Didn't I hear something about "The operating system should make
as few real choices as possible...".  Oh? that's "old hat"?  Oh well....

jbray@bbn-unix@sri-unix.UUCP (08/01/83)

From:  James Bray <jbray@bbn-unix>

I was discussing this with my colleague Ralph Muha, who suggested simply
disallowing anything <= ' ' or >= \177 (<delete>). This is delightfully
straightforward, and I don't think there could be any legitimate complaints
about such restriction. Even if it should inconvenience someone, I would still
be more concerned with the people like the poor user who called me up with
what certainly looked like a filesystem problem until we saw that what had
happened was that her tty was set to default special characters and she
had one file called "broadcast" and another called "brao\b\boacast" (in
the same directory) which looked the same until I od'd it. Now we can say
that next time we will od the directory the very first thing something looks
like this, and I am sure I would, but not all users are hackers... I think I'll
do it, stick the test in wdir() or some such place.
--Jim

scw@ucla-locus@cepu.UUCP (08/01/83)

From:  Steve Woods <cepu!scw@ucla-locus>

    I was discussing this with my colleague Ralph Muha, who suggested
    .
    .
    .
    stick the test in wdir() or some such place.  --Jim

I would agree with you except that ' ' is a valid char in a file name
(as someone said earlier, ingres generates almost all of its names with
spaces in them). Also there are other things running around that
need/want filenames to have spaces in them (such as protection files to
keep *DUMB* people/prolllllers from 'rm *' (a file in that directory
named ' protect' with mode a-w will make rm pause (hopefully ' protect'
will be the first file that is found))).

phil.rice@rand-relay@sri-unix.UUCP (08/02/83)

From:  Bill.LeFebvre <phil.rice@rand-relay>

This is in response to all the articles that have been suggesting
alternate sets of legal characters for file names, and specifically
in response to James Bray @ bbn-unix.

1)  Restricting the characters to the range of printable ASCII
characters (>= ' ' and <= \177) does not solve the original problem
that started this whole discussion.  If you recall, that discussion
dealt with giving filenames that had shell meta-characters to a
`system'.

2)  The UCB `ls' (as has been pointed out before) has an option (`-b'
if you're interested) so that an `od' of a directory is not necessary
to see unusual file names.  I have no idea what system you normally use.

3)  There are THREE illegal characters in 4.1BSD filenames.  Two have
been mentioned previously: '\0' and '/'.  '\257' is also illegal (this
is a slash with the eighth bit on).

4)  I have always been disgusted with operating systems that restrict
file names in unnecessary manners.  Unix is the ONLY operating system
that I have found that places no arbitrary restrictions on file names.
The three restricted characters are forbidden for very obvious reasons.
Anything else would be unnecessary.  I may agree with disallowing
characters with the eighth bit on, but all the other restrictions seem
totally arbitrary and unnecessary.  Unix has never taken the "protect
the moron" attitude before, let's not start it now!

5)  I'm getting sick and tired of seeing messages about file names.  We
have beaten the topic to death.  Shall we go on to something a little
more interesting?

				William LeFebvre
				ARPANet: phil.rice@Rand-Relay
				CSNet:   phil@rice
				USENet:  ...!lbl-csam!rice!phil

jbray@bbn-unix@sri-unix.UUCP (08/03/83)

From:  James Bray <jbray@bbn-unix>


    
    1)  Restricting the characters to the range of printable ASCII
    characters (>= ' ' and <= \177) does not solve the original problem
    that started this whole discussion.  If you recall, that discussion
    dealt with giving filenames that had shell meta-characters to a
    `system'.

I could care less about the origin of the discussion. I got involved because  a
user, specifically a user at the Network Operations Center, had this problem of
having --unknown to her-- her .profile blown away,  and  having  gotten  a  few
'\b's  into a filename before she realized this. This user is not a hacker, but
she is no "moron". I am sick and  tired  of  elitist  hackers  who  think  that
everyone  who  is  not  one of them is a "moron".  And I think we can all agree
that we have a certain vested interest in the proper operation of  the  Network
Operations Center...

    2)  The UCB `ls' (as has been pointed out before) has an option (`-b'
    if you're interested) so that an `od' of a directory is not necessary
    to see unusual file names.  I have no idea what system you normally use.
 
We support extended USG. In any case post-mortems do little for the patient.
    
    4)  I have always been disgusted with operating systems that restrict
    file names in unnecessary manners.  Unix is the ONLY operating system
    that I have found that places no arbitrary restrictions on file names.
    The three restricted characters are forbidden for very obvious reasons.
    Anything else would be unnecessary.  I may agree with disallowing
    characters with the eighth bit on, but all the other restrictions seem
    totally arbitrary and unnecessary.  Unix has never taken the "protect
    the moron" attitude before, let's not start it now!
    
Unix is getting taken seriously as an operating system, and not just a hacker's
toy.  Some changes may well be necessary to accommodate the real people who use
it. So far we have heard no one claim to actually want to put control
characters in filenames. A capability that no one wants to use but that can
cause trouble if anyone inadvertently takes advantage of it is a decided
misfeature. I would be most interested to hear from the Founding Fathers if
they allowed control characters by design, by omission, or by pdp11
address-space saving... I would also be most interested to hear from anyone
who thinks there is a legitimate use for control characters in filenames,
because if there is I will not disallow them. But we sell our Unix, we don't
just play with it, and I feel a certain responsibility to those users out there
who are much less experienced than our NOC controller, and who don't have a
handy hacker just four digits away...

    5)  I'm getting sick and tired of seeing messages about file names.  We
    have beaten the topic to death.  Shall we go on to something a little
    more interesting?

guyton@rand-unix@sri-unix.UUCP (08/03/83)

James Bray asks:

        But please tell me, anyone, if you can think of
        a good use for unprintables in filenames.

I have three examples that come to mind:

	1) USC-ISI used control characters to fake version numbers
	   in their implementation of Interlisp.

	2) T[w]enex often puts control characters in filenames to make
	   them hard to delete (the archive directory used to be kept
	   in the user's directory with a bizarre filename).

	3) Our IBM folks put a blank and a backspace in the name of our
	   Wylbur keyword file (to try and make it harder for hackers
	   to try and crack it).

One point is clear from the above, Unix isn't the only OS that allows
bizarre characters in filenames.

-- Jim

ron@brl-bmd@sri-unix.UUCP (08/04/83)

From:      Ron Natalie <ron@brl-bmd>

I remember back in the early computer days when we got
our first HP timesharing system at school (I had gotten pretty
good at repairing card punches and wiring up 403 program boards).
Originally it was possible to use all kinds of characters in the
file names.  I always thought that "MAGIC[]" was a marvelous name
for a magic square program.  I was very disheartened when a new
release came out forbidding anything other than alphanumerics in
the file names.

-Ron

smh@mit-eddie.UUCP (Steven M. Haflich) (08/05/83)

     All this discussion about allowing/disallowing funny chars in
filenames has missed that sometimes (rarely) one really wants
a funny char in a filename:  for security.  For example,
system administrators at academic sites with limited disk
space can reduce the number of private copies of games by
protecting sources (and even executables) behind `untypable'
pathnames.  It doesn't work against real gurus, but it can
make a big difference.  Depending on the uniformity and
baudrate of the local terminals, a filename with a CURSOR_UP
at the end can be essentially *invisible* to
many forms of the ls command (alas, it was so on V7, but not
so on 4.1, etc.).
     The above may seem somewhat frivolous, but I could also imagine
similarly protecting a cron-invoked security demon (you know, one
of those things that searches for strange setuid files, etc)  behind
such protection.  Why make it easy?
     No, I don't use this method myself, but I have seen it used elsewhere.
				Steve Haflich, ...!genrad!mit-eddie!smh

ka@spanky.UUCP (08/06/83)

As far as I know, ed(1) is the only standard UN*X program which relies
on funny file names.  (It uses bel characters to generate unique file
names.)  Anybody out there still use ed?
				Kenneth Almquist

thomas@utah-gr.UUCP (Spencer W. Thomas) (08/06/83)

	"Twenex used to put funny characters in file names ..."
This is true, but in order to put a funny character into a filename,
it MUST be preceded by a ^V, the monitor will not accept it otherwise.
Not quite the same as the Unix situation.  'Course, you've got to quote
all sorts of characters over there, right off the top of my head, the
list includes []<>.:,@ as well as more esoteric characters.  (Actually,
one dot per filename may be left unquoted.)

=Spencer

greep@su-dsn@sri-unix.UUCP (08/06/83)

Re: your suggestion to enclose file names in single quotes, that works
as long as the file name has no single quote characters in it.  If it
does, they can be quoted with the double quote character.  For example,
a file named "' (i.e. a double quote character followed by a single
quote character) could be deleted with the command:
  rm '"'"'"
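
The same rule can be applied mechanically by a program that must build a
command string for system().  What follows is a minimal sketch in C; the
helper name shell_quote is hypothetical, and it assumes a Bourne-style
shell in which nothing is special between single quotes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write `name` into `out` as one shell word: the whole name goes inside
 * single quotes, and each embedded single quote is emitted as '"'"'
 * (close the quotes, a double-quoted ', reopen the quotes).
 * Returns 0 on success, -1 if `out` is too small.
 * shell_quote is a hypothetical helper used for illustration.
 */
int shell_quote(const char *name, char *out, size_t outsize)
{
    size_t used = 0;
    const char *p;

    if (outsize < 3)                    /* opening ', closing ', NUL */
        return -1;
    out[used++] = '\'';
    for (p = name; *p != '\0'; p++) {
        size_t n = (*p == '\'') ? 5 : 1;

        if (used + n + 2 > outsize)     /* keep room for closing ' and NUL */
            return -1;
        if (*p == '\'') {
            memcpy(out + used, "'\"'\"'", 5);
            used += 5;
        } else {
            out[used++] = *p;
        }
    }
    out[used++] = '\'';
    out[used] = '\0';
    return 0;
}
```

For the two-character name in the example above this produces '"'"'"''
which ends in an empty pair of single quotes; the shell drops the empty
pair, so the word it sees is the same as with the hand-written form.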

hamilton@uiucuxc.UUCP (08/07/83)

uiucuxc!hamilton    Aug  5 21:31:00 1983

control characters are useful too; in my bin directory, i keep
clones of "clear" and "uptime" in files named ^L and ^T, resp.

ka@spanky.UUCP (08/10/83)

Re: Where ed(1) uses bell characters

Try looking at around line 2125 in the routine mkfunny:
	*p2 = '\007';	/* add unprintable char to make funny a unique name */

Kenneth Almquist

gwyn@brl-vld@sri-unix.UUCP (08/10/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

Gee, you're right.  Starting with System III (apparently) there is this
kludge to handle files with 1 link differently from files with multiple
links when the file is written out.  In the process they invent a
temporary file in the same directory as the real file, and a BEL
character is used to ensure that the temp name does not conflict with
any possible real filenames in the directory.  This does reinforce the
argument that "normal" user filenames do not have control characters.

I had thought you were talking about the /tmp files; my apologies.

emrath@uiuccsb.UUCP (08/16/83)

uiuccsb!emrath    Aug 16 00:03:00 1983

I remember dec's DOS opsys for the pdp-11.
The open system call took 3 words of "rad50" (48 bits) as the file
name. However, all bit patterns except all 0s were valid as far as
the filing system calls were concerned. All 0s meant the directory
entry was free. Sure, you could write your own program to create weird
file names, but there also were system calls used for parsing file
specifiers. All the system programs, and most of the user stuff we
wrote, used the same parsing routine, so everybody stayed happy, most
of the time. The restrictions on file names were imposed by essentially
a library routine (though this happened to be part of the "kernel", god
forbid). Sounds all too familiar.