[net.unix] Funny filenames: in the nick of time!

jwp@uwmacc.UUCP (jeffrey w percival) (06/17/85)

Thanks for all the replies; I'll summarize here and add a wrinkle.
Basically, if you have file names with the eighth bit set, then nothing
except a clri followed by a fsck will get rid of it.  In 2.8BSD, none
of the various versions of "rm -i", "rm -f", "rm *", "rm -r", etc. work.

I learned this just in time, though.  My original posting was prompted
by the existence of 2 or 3 funny files that had been hanging around for
months.  Well, no sooner did I learn how to get rid of them than a user
decided she wanted to run "split" on her mail box, and she wanted it
split into 2 pieces, so she typed "split -2 mbox".  Her mbox was 6000
lines long.  split started out with "xaa", went through to "xzz", then
proceeded with x{a, x|a, x}a, x~a, and so on off the ASCII sequence, into
8-bit integers, finally stopping after creating over 2 *thousand* files
with the 8th bit set in file names.  Shell script time.  The only
problem with "ls -i > foo" to generate a command script to do a lot
of clri's is that on a '70, ls runs out of memory at around 470 files.
So we made a shell script to do 470 clri's, then a fsck -y, reboot,
new shell script, fsck -y, reboot, and so on, and so on, and so on...
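
The sequence above (xzz followed by x{a, x|a, ...) is what you would get if
split built its two-letter suffix from a running counter with no bounds
check.  The following is only a sketch under that assumption, not the real
split source; split_name, checked_split_name and LEGAL_NAMES are names made
up for the illustration.

```python
LEGAL_NAMES = 26 * 26  # xaa .. xzz

def split_name(n, prefix=b"x"):
    """Unchecked suffix for the n-th output file (assumed formula)."""
    first = ord("a") + n // 26   # walks past 'z' once n reaches 26 * 26
    second = ord("a") + n % 26   # always stays within 'a'..'z'
    return prefix + bytes([first & 0xFF, second])

def has_high_bit(name):
    """True if any byte of the name has the eighth bit set."""
    return any(b & 0x80 for b in name)

def checked_split_name(n, prefix=b"x"):
    """Same formula, but refuse to go past the last legal name."""
    if n >= LEGAL_NAMES:
        raise ValueError("out of legal file names")
    return split_name(n, prefix)

# A 6000-line mbox split into 2-line pieces needs 3000 output files.
pieces = 6000 // 2
names = [split_name(n) for n in range(pieces)]
bad = [name for name in names if has_high_bit(name)]
```

Under these assumptions the first name with the eighth bit set is number
806, and 2194 of the 3000 names end up in `bad`, which fits "over 2
*thousand*" above.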

Needless to say, we fixed split to stop when it ran out of legal file
names.  My only remaining question:  why in the world does UNIX allow
file names to be created that cannot be referred to?  Some of unlink's
fancy protection should be copied over to whatever creates directory
entries.
-- 
	Jeff Percival ...!uwvax!uwmacc!jwp

guy@sun.uucp (Guy Harris) (06/20/85)

> Thanks for all the replies; I'll summarize here and add a wrinkle.
> Basically, if you have file names with the eighth bit set, then nothing
> except a clri followed by a fsck will get rid of it.  In 2.8BSD, none
> of the various versions of "rm -i", "rm -f", "rm *", "rm -r", etc. work.
> ... My only remaining question:  why in the world does UNIX allow
> file names to be created that cannot be referred to?  Some of unlink's
> fancy protection should be copied over to whatever creates directory
> entries.

The UNIX kernel doesn't allow file names to be created that cannot be
referred to.  Except in 4.2BSD, however, it *does* allow file names to be
created that cannot be conveniently referred to using any of the current
UNIX shells.  Both the Bourne and C shells (and, I think, the V6 shell; I
don't know about the Korn shell) use the eighth bit internally as a quoting
indicator, and strip it off before passing arguments to commands.  As such,
the only way to access those files is to get the pathname into a program
using something other than the shell.
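
As a rough illustration of "something other than the shell", here is a
small sketch that reads the names straight out of the directory and hands
them to unlink unchanged.  It assumes a system where unlink accepts such
names; remove_high_bit_names is a made-up helper, not an existing utility.

```python
import os

def remove_high_bit_names(directory):
    """Unlink every entry in `directory` whose name has a byte >= 0x80.

    The names are handled as raw bytes from start to finish, so nothing
    strips or reinterprets the eighth bit on the way to unlink().
    """
    removed = []
    dir_bytes = os.fsencode(directory)
    for name in os.listdir(dir_bytes):      # bytes in, bytes out
        if any(b & 0x80 for b in name):
            os.unlink(os.path.join(dir_bytes, name))
            removed.append(name)
    return removed
```

Note that on a system that stores non-ASCII names on purpose, this would
remove those too; it is meant only to show that the pathname never has to
pass through a shell.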

There may be a problem in 2.9BSD that permits you to create files with
characters in their names with their eighth bit on, but not to delete them.
V7, 4.1BSD, System III, and System V permit you to create them and delete
them.

If an 8-bit character set is chosen for UNIX's use in international
environments, somebody's going to have to fix the shells (and rip the code
out of 4.2BSD that disallows characters with their eighth bit on in file
names).

	Guy Harris

alan@drivax.UUCP (Alan Fargusson) (06/25/85)

>         My only remaining question:  why in the world does UNIX allow
> file names to be created that cannot be referred to?  Some of unlink's
> fancy protection should be copied over to whatever creates directory
> entries.
> -- 
> 	Jeff Percival ...!uwvax!uwmacc!jwp

System V release 2 version 2 (or something like that) will not let you
create files that have the most significant bit set in any character
within the name.
-- 

Alan Fargusson.

{ ihnp4, amdahl, mot }!drivax!alan

mrl@drutx.UUCP (LongoMR) (06/28/85)

>> Thanks for all the replies; I'll summarize here and add a wrinkle.
>> Basically, if you have file names with the eighth bit set, then nothing
>> except a clri followed by a fsck will get rid of it. 

	Not true. I originally replied with this one by mail, but wasn't 
too specific. You can edit the directory entry with fsdb, if you have it.
Here's the procedure:
	+ as root, invoke fsdb with the block device name of the file system
	  as an argument
	+ print the directory inode containing the bad file name
	  (ex. if the file name is /usr/bin/<garbage> and the inode number
	  of directory /usr/bin is 500, type "500i" to see the inode)
	+ look at the directory entries 1 at a time until you find
	  the bad file.
	  "f0d" will list the 0th block in directory format
	  "f1d" will list the next block, etc. 
	  Each entry will be preceded by a d[number].
	+ change the name of the file by typing
	     d[number].name=new_file_name
		(ex. d3.name=XXX1)
	+ exit fsdb with a ^D or quit
	+ remove /usr/bin/new_file_name
	     (in the above example use "rm /usr/bin/XXX1")

I know this works on AT&T Unix since I have been using it for years. I am
sorry that I can't say I am as certain that it works on Berkeley Unix. I
would be curious to know whether or not it does.
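
For anyone wondering what the rename step amounts to, here is a sketch of
the same edit done on an in-memory copy of a directory block.  It assumes
the old fixed-size directory format (16-byte entries: a 2-byte inode
number, low byte first, then a 14-byte NUL-padded name); 4.2BSD directories
are laid out differently.  list_entries and rename_entry are made-up names,
and `slot` only plays the role of the d[number] above.

```python
import struct

# Assumed layout: 2-byte inode number, then 14 bytes of NUL-padded name.
ENTRY = struct.Struct("<H14s")

def list_entries(block):
    """Return (slot, inode, name) for every in-use entry in the block."""
    entries = []
    for slot in range(len(block) // ENTRY.size):
        ino, raw = ENTRY.unpack_from(block, slot * ENTRY.size)
        if ino != 0:                          # inode 0 marks an empty slot
            entries.append((slot, ino, raw.split(b"\0", 1)[0]))
    return entries

def rename_entry(block, slot, new_name):
    """Overwrite the name field of one entry, leaving its inode alone."""
    if not 0 < len(new_name) <= 14 or b"/" in new_name or b"\0" in new_name:
        raise ValueError("bad name")
    offset = slot * ENTRY.size
    ino, _ = ENTRY.unpack_from(block, offset)
    ENTRY.pack_into(block, offset, ino, new_name)   # pads with NULs

# Hypothetical directory block with one bad name in slot 3.
block = bytearray(
    ENTRY.pack(500, b".") + ENTRY.pack(2, b"..") +
    ENTRY.pack(0, b"gone") + ENTRY.pack(835, b"\xc7p")
)
rename_entry(block, 3, b"XXX1")
```

After the rename the entry still points at the same inode, so an ordinary
rm on the new name can remove it.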

BTW - It is possible to create file names which cannot be
referred to by the shell in AT&T Unix. It is usually accomplished
by accident when garbage is passed as the file name
within a C program.  If the eighth bit is set
and you use any command to the shell, the shell will expand the
name to a string containing only 7-bit ASCII characters. When
this name is fed to the command, the kernel will recognize the eighth
bit and come up with a non-match. The usual sequence is
  # od -c .
  xxxxxxxx  ?  > 307   p  \0  \0  \0  \0  \0...
  xxxxxxxx  3  5   f   i   l   e   1  \0  \0...
			.
			.	
			.
  # rm -i *
  p: not found		(the file name is not "p", but that is what
  file1:		 the shell passed to the command. The garbage
  file2:      		 isn't printed)
    .
    .
    .
  filen:
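
The mismatch can be sketched in a few lines.  This only models the
behaviour described above (the shell clears the eighth bit, the kernel
compares all eight); shell_strip and lookup are made-up names, and the file
names are hypothetical.

```python
def shell_strip(word):
    """What the shell hands to the command: every byte cut to 7 bits."""
    return bytes(b & 0x7F for b in word)

def lookup(directory_names, arg):
    """Kernel-style lookup: an exact comparison on all eight bits."""
    return arg in directory_names

# Hypothetical directory: one name with the eighth bit set in its last byte.
on_disk = {b"file1", b"fil\xe5"}

passed = shell_strip(b"fil\xe5")        # the shell passes b"file"
found = lookup(on_disk, passed)         # no entry is named b"file"
still_ok = lookup(on_disk, shell_strip(b"file1"))
```

Here `found` is false even though the file exists, while the plain 7-bit
name goes through untouched.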

		Mark Longo	AT&T ISL DENVER
		(..!ihnp4!drutx!mrl)

<clever saying follows>
-----------------
	Don't worry whether I'm right or wrong. Someone will
	correct me either way!

guy@sun.uucp (Guy Harris) (07/02/85)

> System V release 2 version 2 (or something like that) will not let you
> create files that have the most significant bit set in any character
> within the name.

Hmmm... AT&T picks up one of the more debatable 4.2BSD features.  At some
point, presumably, UNIX will support an 8-bit character set (in order to
include characters in non-English-language alphabets without stealing the
positions that rightfully belong to things like '{').  When that happens,
*somebody's* going to have to fix "sh" (and "csh", and...) not to use the
MSB as a quoting bit, and the shell will finally allow you to talk about
those files.  Somebody's also going to have to remove that restriction from
the kernel...

	Guy Harris