[net.unix-wizards] filename hoopla

mo%lbl-csam@sri-unix.UUCP (07/31/83)

From:  Mike O'Dell [system] <mo@lbl-csam>

(1) Dennis Ritchie has blessed the notion that meta-bit (parity bit)
characters in path components are evil and 4.1a and later systems
produce error returns from creat() when given them.  This solves
the usual problem of giving garbage to creat() and getting
a piece of core image treated as a filename.

(2) The only magic characters in filenames are NUL (ASCII 0) and
/ (ASCII 057).  NUL terminates strings, and / separates components.
I know of no other system with such a civilized and unobtrusive
notion of filename.

(3) Suggestions to return to the days of RAD-50 filenames (even
RAD-50 with lower case (RAD-76?)) should earn the suggestor instant,
public evisceration.  Even VMS is about to take that step into the
20th century (however, the list of magic characters is MUCH larger).

(4) Sharp tools insist upon craftsmen, so read the instructions
before using that entrenching tool.
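
As a rough illustration of points (1) and (2), the sketch below checks one
path component against the two magic characters and the meta bit.  The name
component_problems is made up for this example; the real check lives in the
kernel, and this is only a user-level approximation of it.

```python
def component_problems(name: bytes) -> list:
    """List the reasons `name` cannot (or should not) be one path component."""
    problems = []
    if b"\0" in name:
        problems.append("contains NUL, which terminates the string")
    if b"/" in name:
        problems.append("contains /, which separates components")
    if any(byte & 0o200 for byte in name):
        problems.append("has a byte with the 0200 (meta) bit set")
    return problems


print(component_problems(b"Hi!\007\b\b\bBye!"))   # [] : control characters pass
print(component_problems(b"usr/bin"))
print(component_problems(b"core\xff\xfe"))
```

Note that control characters are deliberately not flagged; only the three
conditions named above are.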

	-Mike

guy@rlgvax.UUCP (08/02/83)

Just out of curiosity, what was considered the difference between characters
with the eighth bit on and characters which weren't printable ASCII (i.e.,
SP to '~')?  Anything more restrictive than that is more trouble than it's
worth, but is there great utility (other than amusement value) of files with
names like "Hi!\007\b\b\bBye!"?  Idiot-proof tools are only usable by the
people they're designed to be proof against, but most sharp tools have blade
guards, just to make sure....  It is, admittedly, harder to deal with files
with 8-bit characters in their names from the various UNIX shells than to
deal with files with control characters (the latter can be quoted, but not
the former), but should the way the various shells are implemented internally
govern a decision as to what file names are considered legal?  Also, the
problem of people hitting the left-arrow key on their terminal (assuming they
don't have a terminal which transmits BS for the left-arrow key) instead of
backspace can be dealt with via the "ctlecho" mode, although suggesting that
the TTY driver should echo control characters as "^<whatever>" would probably
provoke a holy war in itself...

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

eric@cit-vax@sri-unix.UUCP (08/04/83)

You can always get rid of a filename  with junk in it by doing "rm -i *"
and answering 'y' at the right moment. Some  programs  (our  version  of
Zimmerman's emacs was one) set high order bits in file names exactly for
the reason that they don't want these files casually deleted.  One  user
level scheme for handling version numbers a la TOPS-20 or VMS used  high
order bits. So there are reasons for keeping the  capability.  I see  no
reason to impose unnecessary restrictions on filename characters, or  on
anything else. So what if once in a blue moon a novice user comes to you
and can't delete a file.

	       * Eric Holstege  Caltech, Pasadena, CA.
	       * eric@cit-vax   ...!{ucbvax!cithep,research}!citcsv!eric

phil.rice%rand-relay@sri-unix.UUCP (08/04/83)

From:  Bill.LeFebvre <phil.rice@rand-relay>

I know I said I didn't want to see more messages about filename
characters, but ...

     You can always get rid of a filename  with junk in it by doing "rm -i *"
     and answering 'y' at the right moment....

Unfortunately, this is not completely true.  With the c-shell and the
Bourne shell, the eighth bit of an argument always gets masked out
(unless there is some bizarre syntax that I am unaware of) so that the
command "rm -i *" will not work with files that have eight bit
characters.  When `rm' gets the argument, the eighth bit will be 0 and
it won't find the file.  But this trick will work for a file that has
control characters in the name (in fact, I have used it upon occasion).
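
To make the failure concrete, here is a small sketch of what masking does to
a name.  The filename is hypothetical, and strip_eighth_bit only imitates the
effect described above; it is not taken from either shell's source.

```python
def strip_eighth_bit(arg: bytes) -> bytes:
    """Clear the 0200 bit of every byte, as the shells are said to do."""
    return bytes(byte & 0o177 for byte in arg)


on_disk = b"draft\xae"                 # last byte is '.' with the 0200 bit set
seen_by_rm = strip_eighth_bit(on_disk)

print(seen_by_rm)                      # b'draft.'
print(seen_by_rm == on_disk)           # False: rm looks for a different name
```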

                                William LeFebvre
                                ARPANet: phil.rice@Rand-Relay
                                CSNet:   phil@rice
                                USENet:  ...!lbl-csam!rice!phil

gwyn%brl-vld@sri-unix.UUCP (08/04/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

Ok, so type
	rm -ri .
instead of
	rm -i *
if you need to catch chars with bit 7 set.
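
The reason this helps is that "rm -ri ." reads the directory entries itself,
so the names never pass through the shell.  A minimal sketch of the same
idea, assuming a hypothetical helper remove_interactively and a
caller-supplied confirm callback standing in for rm's y/n prompt:

```python
import os


def remove_interactively(directory: bytes, confirm) -> list:
    """Unlink each non-directory entry for which confirm(name) is true."""
    removed = []
    # A bytes argument makes os.listdir return raw bytes names, untouched
    # by any shell or text decoding.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path) and not os.path.islink(path):
            continue  # this sketch does not recurse into subdirectories
        if confirm(name):
            os.unlink(path)
            removed.append(name)
    return removed
```

Something like remove_interactively(b".", lambda n: input(repr(n) + "? ") == "y")
would then prompt for every entry in the current directory.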

JPAYNE@BBNG.ARPA (08/04/83)

Beware of creating files with the 200 bit set in v7 UNIX, and then changing
over to 2.81BSD.  You can't unlink the file!!!!  (Of course you can adb the
device to write a normal character on the 200 bit character ... which we did).

mark@laidbak.UUCP (08/05/83)

The Bourne shell mangles names with eight-bit characters by stripping
high-order bits. This is why "rm -i *" doesn't always work on garbage
filenames. Perhaps this (or the fact that ASCII is a seven-bit code)
is why USG disallows filenames containing eight-bit characters.

Assuming that only printable characters were allowed within filenames,
what about those characters which are special to the shell? For example,
I once created a file named "*". Imagine the consequences of a naive user's
attempts at removing this from a directory full of important files. What
about a dash ("-") at the beginning of a name? Just try to manipulate (or
remove) that with a program which recognizes options!

Also, let's not forget that the shell is just another user-level process
(as witnessed by users of csh, vsh, ...), and that each new shell defines
several additional special characters (such as "~" in csh).

Numerics are another special case. There are already several programs
(mkfs, icheck, dcheck and ncheck come to mind immediately) which
distinguish arguments from filenames by recognizing numeric characters.

Even such accepted punctuation as "_" and "." would have to go, since a
future shell might easily adopt them as special characters (don't laugh;
the designers of uucp's "...!..." syntax and the "C" shell's history
mechanism obviously weren't thinking of each other).

Personally, I dislike the idea of protecting users from themselves. The
only way to "cover all of the bases" would be to restrict qualifiers to
beginning with an alphabetic character and containing nothing but
alphanumerics. Many people would certainly find this solution unacceptable.

A proper solution probably lies in the tty handler, which now prevents
users from seeing things as they are. We must be wary of programs which
send escape sequences to terminals, as these frequently contain unprintable
characters (beyond the escape itself). At least here we are dealing with
a single problem, rather than the untold number associated with filename
restrictions.

				Mark Brukhartz
				..!{ihnp4,allegra,trsvax}!laidbak!mark

sjh@csnet-purdue@sri-unix.UUCP (08/05/83)

From:  Steven J Holmes <sjh@csnet-purdue>

You cannot always use rm -i *. I have accidentally created a file named '-mab'
which causes an unknown option message from rm when rm -i * is tried.

Steve Holmes
sjh@purdue

SJOBRG.ANDY%MIT-OZ@mit-ml@sri-unix.UUCP (08/06/83)

golly, remember dsw(1)? well, a friend of mine has re-written it so that you
can say something like "dsw -f [ dirname ]" and it will only ask you about
the files in that directory that have "f"unny characters in them (control and
meta-bitted chars).
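
The filter such a tool needs is small.  What follows is a sketch, not the
rewritten dsw (whose source does not appear in this thread); is_funny and
funny_names are names invented here, and "funny" is taken to mean any byte
outside printable ASCII, SP through '~'.

```python
import os


def is_funny(name: bytes) -> bool:
    """True if any byte lies outside printable ASCII (SP through '~')."""
    return any(byte < 0o40 or byte > 0o176 for byte in name)


def funny_names(directory: bytes = b".") -> list:
    """Names in `directory` containing control or meta-bitted bytes."""
    return sorted(name for name in os.listdir(directory) if is_funny(name))
```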

Anyone want a copy?

wartik@trwspp.UUCP (08/07/83)

In response to Guy Harris' article about the difference between
non-printables and non-ASCII's: Guy, it isn't correct to call
the non-ASCII's "characters", because they aren't part of the
standard character set, and therefore the shells aren't obligated
to treat them as such (this in turn leads to commands such as "dsw"...).
My personal view is that I would like to see Unix(tm) not accept
file names that contain non-ASCII's (most likely just by stripping
the eighth bit off -- the alternative is to add a new error number,
and I'd hate to have to change all existing software).  However,
I agree with you that there are uses for file names with control
characters and spaces.

				-- Steve Wartik
				decvax!trw-unix!trwspp!wartik

guy@rlgvax.UUCP (Guy Harris) (08/07/83)

The Berkeley version of "rm" (and of "mv") has an option "-" which causes the
scan for options to be terminated.  This way,

	rm -i - *

would cause the argument scanner in "rm" to stop before it gets to the arguments
that came from the expansion of the "*", so they wouldn't be checked to see
if they began with "-".  This should be put into other releases of UNIX.
(Are you listening, USG?)
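
For readers who have not seen the Berkeley behavior, here is an illustrative
scanner with the same stopping rule.  It is a sketch written for this note,
not the Berkeley source, and scan_rm_args is a hypothetical name.

```python
def scan_rm_args(argv: list) -> tuple:
    """Split argv into (set of flag letters, list of file operands)."""
    flags = set()
    i = 0
    while i < len(argv) and argv[i].startswith("-"):
        arg = argv[i]
        i += 1
        if arg == "-":
            break            # lone dash: everything after it is a file
        flags.update(arg[1:])
    return flags, argv[i:]


print(scan_rm_args(["-i", "-mab"]))        # "-mab" is swallowed as flags
print(scan_rm_args(["-i", "-", "-mab"]))   # "-mab" survives as an operand
```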

While we're on the subject, how about putting the code in from Berkeley's
"mv" to permit directories to be moved around, and not just renamed?  Requiring
you to be super-user (so you can use the "/etc/mvdir" shell file) is sort of
a nuisance; the Berkeley "mv" does check to make sure the move is legal.
We're running the System III "rm" and "cp"/"mv"/"ln" here, with all the bug
fixes and enhancements from the 4.1BSD version folded in.  (The original
motivation for this was that the error messages from "ln" in certain cases were
less than informative; this has been corrected too.)

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

gwyn@brl-vld@sri-unix.UUCP (08/07/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

Any program that uses getopt() to scan its command-line arguments will
be able to handle "--" as an "end of options" marker.

I really don't like the use of "-" for this, as that is conventionally
an indication that stdin is to be used where a file would normally be
named.  Things are confusing enough without doubling up on meanings.
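
Both conventions can be seen in Python's getopt module, which is modeled on
the C getopt().  This is only an illustration of the convention, not of the
C library itself, and the option string "fir" is an assumption for an
rm-like command.

```python
import getopt

# "fir" is an assumed option string for an rm-like command.
opts, operands = getopt.getopt(["-i", "--", "-mab"], "fir")
print(opts)       # [('-i', '')]
print(operands)   # ['-mab']

# A lone "-" also stops the option scan, but it stays in the operand list,
# so the program can still treat it as "read stdin".
opts_dash, operands_dash = getopt.getopt(["-i", "-", "notes"], "fir")
print(operands_dash)   # ['-', 'notes']
```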

israel@umcp-cs.UUCP (08/09/83)

'rm -i *' may not work for files beginning with '-', but
'rm -ri .' should work in those situations.
-- 

~~~ Bruce
...!seismo!umcp-cs!israel (Usenet)
israel.umcp-cs@Udel-Relay (Arpanet)

jdd@allegra.UUCP (08/09/83)

To get rid of file "-foo", one can always "rm ./-foo".
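
The same trick generalizes to any name handed to a command that scans for
options.  A small sketch; as_operand is a hypothetical helper:

```python
def as_operand(name: str) -> str:
    """Prefix ./ so that a relative name can never start with a dash."""
    if name.startswith("-"):
        return "./" + name
    return name


print(as_operand("-foo"))    # ./-foo
print(as_operand("notes"))   # notes
```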

Cheers,
John ("D.") DeTreville
Bell Labs, Murray Hill

lund@ucla-ats@sri-unix.UUCP (08/09/83)

From:            Laurence G. Lundblade <lund@ucla-ats>

If rm fails to get rid of a file regardless of the options, you
can always find the inode number with ls -i and then
do a "find . -inum xxx -exec rm {} \;".
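
The lookup by inode number can also be done with no command line in the
path at all.  A sketch under stated assumptions: names_with_inode is a
hypothetical helper, and unlike find it looks in one directory only.

```python
import os


def names_with_inode(directory: bytes, inode: int) -> list:
    """Names in `directory` (not recursive) whose inode number is `inode`."""
    matches = []
    for name in os.listdir(directory):
        # lstat, so a symlink is judged by its own inode, not its target's
        if os.lstat(os.path.join(directory, name)).st_ino == inode:
            matches.append(name)
    return sorted(matches)
```

Each returned name is raw bytes, so passing os.path.join(directory, name) to
os.unlink removes it without any quoting or masking along the way.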

		.....Larry

SJOBRG.ANDY%MIT-OZ@mit-ml@sri-unix.UUCP (08/12/83)

This message is empty.

pdl@root44.UUCP (08/12/83)

But what's wrong with "rm -ri ." ???

		From the dungeons of the overworked keyboard of the warlock
		(or somesuch gibberish)

		Dave Lukes (...!vax135!ukc!root44!pdl)

SJOBRG.ANDY%MIT-OZ@mit-mc@sri-unix.UUCP (08/12/83)

but find (v7) uses sh to execute its command, so you luse
the 8th bit once more!
sorry...
	-andy
p.s. your find will also remove any files linked to the one you
	really want to remove...
-------
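
The p.s. follows from the fact that an inode number identifies a file, not
one of its names.  A short demonstration in a scratch directory; the file
names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    original = os.path.join(scratch, "wanted")
    alias = os.path.join(scratch, "alias")
    open(original, "w").close()
    os.link(original, alias)          # second name for the same inode

    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    link_count = os.stat(original).st_nlink
    print(same_inode, link_count)     # True 2
```

A search by inode number would match both names.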

wartik@trwspp.UUCP (08/14/83)

"rm" (on 4.1BSD) has a special option, "-", to allow removing file names
beginning with a dash.  I.e.,

	rm - -mab

					-- Steve