[comp.unix.wizards] Kernel Hacks & Weird Filenames

FRAZIER%AFGLSC.SPAN@star.stanford.edu (04/20/88)

From Scott Preece:
-------------------------------------------------
>> I'm not saying I support the idea of prohibiting filenames with embedded
>> special characters (my immediate question is "Special to whom?" -- what
>> if those "non-printable" characters are codes for icons or Kanji or
>> something we just haven't thought about yet), but if you DO ban them it
>> sounds to me like exactly the kind of thing you DO want the kernel to do:
>> it's a policy decision that you want to be able to guarantee is
>> enforced everywhere.  Return EINVAL.

>> scott preece
>> gould/csd - urbana
>> uucp:	ihnp4!uiucdcs!ccvaxa!preece
>> arpa:	preece@Gould.com
-------------------------------------------------

      This is a very valid point.  To hack the kernel to prohibit
special characters in filenames would create more headaches than
it's worth, especially with the new wave of Unix workstations that
are quickly becoming application machines.  I think it's reasonable
to assume that this is, by and large, not so great a problem.  Even
if a person creates a file starting with a "-",  root can purge
the file.  (Admittedly it's somewhat of a bother, but the process could be
hacked into either a script or a program.)
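
A minimal sketch of the "program" route, in Python for brevity.  It relies
on the fact that unlink(2) takes the name as given and does no option
parsing, so a leading "-" only troubles commands that parse their
arguments.  The helper name remove_awkward is hypothetical.

```python
import os

def remove_awkward(directory, name):
    """Remove `name` from `directory`, even if it starts with '-'.

    os.unlink() hands the path straight to unlink(2); no option
    parsing is involved, so a leading hyphen is just another byte.
    """
    os.unlink(os.path.join(directory, name))
```

From the shell, "rm ./-foo" gets the same effect, since the argument no
longer begins with a hyphen.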
      Another point.  If someone decides that they want the special characters,
they have to go hack the kernel to remove the prohibition.  This may not be
a problem for some of the more experienced Unix people,  but a great many
others would not attempt such a task and would run the risk of screwing up
their kernel if they did.  For this reason,  I would suggest hacking sh, csh,
and ksh to reflect this change.  (The shell is much easier to hack,  wouldn't
you agree?)  There isn't really any need to increase kernel overhead as it is.
The kernel is suffering from Over-Hacking, or at least appears to be.  Every
time someone wants to make a system-wide change,  it's "Hey,  let's hack the
kernel so we won't have to change all those programs!"   ;-)

-Frazier
Frazier%afgl.span@star.stanford.edu

guy@gorodish.Sun.COM (Guy Harris) (04/21/88)

> For this reason,  I would suggest hacking sh, csh, and ksh to reflect this
> change.

How would you suggest doing this?  Except in I/O redirection, none of the
aforementioned shells have any idea what is or isn't a file name, except for
arguments using wildcards, ~ expansion, etc.  Are you going to forbid
non-printable characters in *all* arguments?  OK, now how do I write an "echo"
command that sends a BEL (^G) character?

Furthermore, what happens if a program goes bonkers and creates a filename
containing such characters?  If the shell won't let you type an "rm" command
with such a filename as argument, how can you get rid of such a file?

More and more UNIX systems running as application machines won't have typical
users running *any* of the above shells; they'll be in some full-screen or
graphical user interface.  In *those* user interfaces, typing ^A in a file-name
field is more likely to send you to the beginning of that field than to insert
a ^A into that field.

les@chinet.UUCP (Leslie Mikesell) (04/21/88)

In article <13041@brl-adm.ARPA> FRAZIER%AFGLSC.SPAN@star.stanford.edu writes:
>From Scott Preece:
>-------------------------------------------------
>>> I'm not saying I support the idea of prohibiting filenames with embedded
>>> special characters (my immediate question is "Special to whom?" -- what
>
>      This is a very valid point.  To hack the kernel to prohibit
>special characters in filenames would create more headaches than
>...
>For this reason,  I would suggest hacking sh, csh,
>and ksh to reflect this change. 


Great.. Applications can then create files that the standard tools can't
touch.  Did you ever have a filename with an imbedded null?  Oh, is null
a special character...??

The most common problem that I have seen is with applications that use
function keys to respond to menu choices to get to the place where you
would type in a filename to create.  If the response is a bit slow, the
user hits the key again so the function key output becomes part of
the name (ESC-something-something). If they are paying attention, they
might backspace and correct the visible characters, but the ESC is likely
to remain, and unix happily creates the file.  Then they wonder (a) why
they can't access that file again, (b) why the columns don't line up
in a directory listing, and sometimes (c) why some of their other files
don't show up in a directory listing.  I realize that this kind of
behaviour guarantees that I will have a job for a while, but otherwise
it is pretty silly to allow non-printable characters in a filename.
Keeping the restriction out of the kernel would mean that every application
would have to duplicate the code, and if one didn't, nothing else would
touch the file.  Actually, the restriction should only be against creating
files with odd characters; there would be no reason to do a special check
when trying to open an existing file.
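
A user-space sketch of that create-only rule, in Python (this is not
kernel code).  Which characters count as "odd" is an assumption here:
ASCII control bytes and DEL.  The names has_control_chars and
checked_create are hypothetical.

```python
import errno
import os

def has_control_chars(name):
    """True if `name` (bytes) contains an ASCII control byte or DEL."""
    return any(b < 0x20 or b == 0x7F for b in name)

def checked_create(path):
    """Create `path` (bytes), refusing names with control characters.

    Only creation is checked.  Existing files would still be opened
    with a plain open(), so an oddly named file stays reachable.
    """
    if has_control_chars(os.path.basename(path)):
        # EINVAL, as suggested earlier in the thread.
        raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
```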

  Les Mikesell

chris@mimsy.UUCP (Chris Torek) (04/21/88)

In article <4895@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes:
>it is pretty silly to allow non-printable characters in a filename.

This statement can be true, but note that it makes two assumptions:
first, that the file names are to be printed; second, that there are
non-printable characters.  Are these assumptions true?  Let us take
them in reverse.

First, `non-printable characters'.  Well, there are certainly numerous
characters that cannot be printed on the terminal I am using at the
moment (namely my H19).  But this is not precisely the same set as are
non-printable on other displays.  One notable exception is a display
that implements ISO Latin 1; another is a Japanese terminal that
displays Kanji.  One could make the set of allowed characters
terminal-dependent.  Somehow that sounds like an IBM solution.

Second: are file names to be printed?  Certainly most are.  But there
are some that are not---for instance, the lock files used by this very
network news system are formed by putting `L' in front of the message
ID of each article; to lock the quoted article inews creates the file
`/tmp/L<4895@chinet.UUCP>', which is by no means a convenient handle.
A database system might lock records by creating temporary files formed
by converting the record index to a radix-254 name ('/' and '\0' are
taken) (use radix-126 on 4.2/4.3BSD, unless you remove the kernel
restriction on valid ISO Latin 1 characters first; be sure to prefix
or affix some character to avoid clashes with `.' and `..').
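
A sketch of the radix-254 naming, in Python.  The digit alphabet is every
byte value except NUL and '/'; the "L" prefix and the function name
record_lock_name are assumptions chosen for illustration.

```python
# The 254 byte values usable in a name: everything except NUL and '/'.
ALPHABET = bytes(b for b in range(1, 256) if b != ord("/"))

def record_lock_name(index, prefix=b"L"):
    """Encode a non-negative record index as a radix-254 file name.

    The prefix keeps the result from ever being "." or "..".
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    digits = bytearray()
    while True:
        index, rem = divmod(index, len(ALPHABET))
        digits.append(ALPHABET[rem])
        if index == 0:
            break
    digits.reverse()
    return prefix + bytes(digits)
```

Every index below 254 costs one byte after the prefix, and every index
below 254*254 costs two.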

So should the Unix kernel make the (relatively) irrevocable decision
to disallow locally-non-printing characters?  Maybe---but I doubt it.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

decot@hpisod2.HP.COM (Dave Decot) (04/22/88)

Several of you have expressed the objection "What if I REALLY wanted to
create filenames with funny characters?  The system won't let me do what
I want!  CENSORSHIP!!"  Well, maybe not quite like that... :-)  More like...

> So should the Unix kernel make the (relatively) irrevocable decision
> to disallow locally-non-printing characters?  Maybe---but I doubt it.

However, note that I originally suggested that it be configurable
(possibly per-filesystem) and that the default should be "no-funnies":

> ->Why don't we bite the bullet and change our kernels to refuse
> ->to create files whose names begin with a hyphen or contain
> ->non-printing characters (unless special arrangements are made
> ->by the user to permit it)?

> First, `non-printable characters'.  Well, there are certainly numerous
> characters that cannot be printed on the terminal I am using at the
> moment (namely my H19).  But this is not precisely the same set as are
> non-printable on other displays.  One notable exception is a display
> that implements ISO Latin 1; another is a Japanese terminal that
> displays Kanji.  One could make the set of allowed characters
> terminal-dependent.  Somehow that sounds like an IBM solution.

There are already multibyte character sets (e.g., Taiwanese) for which the
representations of some characters contain an ASCII '/' as the second
byte.  Some sort of kernel hack configurable for different languages
is necessary already.  Different terminals is another matter; I don't
think it should depend on the terminals, only the information about
the character set being used (such as what characters are letters,
numeric, punctuation, printable, etc.).

I would think that most UNIX users would want their filenames to be portable.

> Second: are file names to be printed?  Certainly most are.  But there
> are some that are not---for instance, the lock files used by this very
> network news system are formed by putting `L' in front of the message
> ID of each article; to lock the quoted article inews creates the file
> `/tmp/L<4895@chinet.UUCP>', which is by no means a convenient handle.

Certainly isn't.  There are now much better ways to do file locking
than with those blasted lock files.

> A database system might lock records by creating temporary files formed
> by converting the record index to a radix-254 name ('/' and '\0' are
> taken) (use radix-126 on 4.2/4.3BSD, unless you remove the kernel
> restriction on valid ISO Latin 1 characters first; be sure to prefix
> or affix some character to avoid clashes with `.' and `..').

And with `-' !  This seems like a somewhat strange way to name temporary
files, but if you want to do that, what's wrong with radix 93
(i.e., only printable characters without - and / and probably space)?
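
As an arithmetic check, in Python: ASCII has 95 printing characters
(space through '~').  Dropping '-' and '/' leaves 93; dropping space as
well leaves 92.  The helper name alphabet is hypothetical.

```python
# The 95 printable ASCII characters, space through '~'.
PRINTABLE = [chr(c) for c in range(0x20, 0x7F)]

def alphabet(excluded):
    """Printable ASCII characters minus those listed in `excluded`."""
    return [c for c in PRINTABLE if c not in excluded]

radix_without_hyphen_slash = len(alphabet("-/"))
radix_without_space_too = len(alphabet("-/ "))
```

Note that '.' is still in either alphabet, so a prefix is still needed
to stay clear of `.' and `..'.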

Dave Decot
hpda!decot

les@chinet.UUCP (Leslie Mikesell) (04/22/88)

In article <11153@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>it is pretty silly to allow non-printable characters in a filename.
>
>This statement can be true, but note that it makes two assumptions:
>first, that the file names are to be printed; second, that there are
>non-printable characters.  Are these assumptions true?  Let us take
>them in reverse.
>
>First, `non-printable characters'.  Well, there are certainly numerous
>characters that cannot be printed on the terminal I am using at the
>moment (namely my H19).  But this is not precisely the same set as are
>non-printable on other displays.  One notable exception is a display

How many terminals do anything reasonable with ESC in random places? 

>
>Second: are file names to be printed?  Certainly most are.  But there
>are some that are not---for instance, the lock files used by this very
>network news system are formed by putting `L' in front of the message
>ID of each article; to lock the quoted article inews creates the file

Would you enjoy debugging the operation of said locks if displaying
the filenames performed random cursor motions or cleared the screen
before you could see them?  How about when you print a listing of the
files on a backup set and have a few hundred page-ejects imbedded?
Or is this "feature" to be used as a form of security?

>So should the Unix kernel make the (relatively) irrevocable decision
>to disallow locally-non-printing characters?  Maybe---but I doubt it.

It just adds another reason to place some sort of silly user agent
between the user and the system.  Besides, this is something that
should be standardized to whatever extent possible to avoid trouble
with networked files systems.

  Les Mikesell 

guy@gorodish.Sun.COM (Guy Harris) (04/23/88)

> There are already multibyte character sets (e.g., Taiwanese) for which the
> representations of some characters contain an ASCII '/' as the second
> byte.  Some sort of kernel hack configurable for different languages
> is necessary already.

Assuming you use those character sets.  Are there no plans for an EUC character
set for Chinese?  (I have heard that AT&T's EUC scheme is conformant to some
sort of ISO standard.)  Such a character set would use only bytes with the 8th
bit set as bytes in such a two-character sequence.
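
A small Python sketch of why that property matters.  EUC-JP stands in for
the Chinese EUC set asked about above, only because Python ships a codec
under that name; the function names are hypothetical.

```python
def split_encoded_path(text, codec="euc_jp"):
    """Encode `text`, then split on the '/' byte as a kernel would."""
    return text.encode(codec).split(b"/")

def all_high_bit(component):
    """True if every byte of `component` has the 8th bit set."""
    return all(b & 0x80 for b in component)
```

Because no byte of a two-byte character can equal ASCII '/', a byte-wise
scan for '/' finds only real path separators.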

bzs@bu-cs.BU.EDU (Barry Shein) (04/23/88)

Many moons ago I wrote a program called "rmf" that went through a
directory and looked for files with "funny" names and prompted the
user to either change the name, remove the file or leave it as is.

The criteria I used were just a bunch of heuristics, on which I would
accept suggestions from the user community: funny chars, blanks,
very long names (tolerance was settable from the command line or defaulted
to something like 32 chars), stuff like that.

Anyhow, it's a simple exercise, and it might be a good answer to what
seems to be motivating this conversation (I can't find a copy of the
program right now.) No, you're not going to write a set of criteria to
universally satisfy everyone, and it's also version-dependent, but you can
come quite close to perfect for your local systems without too much
effort.
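
Since the original program is not at hand, here is only a guess at the
checking half of such a tool, in Python.  The exact tests and the name
funny_reasons are assumptions; the prompting (rename, remove, or leave)
is left out.

```python
def funny_reasons(name, max_len=32):
    """Return the reasons why `name` (bytes) looks "funny", if any."""
    reasons = []
    if any(b < 0x20 or b >= 0x7F for b in name):
        reasons.append("non-printing or non-ASCII byte")
    if b" " in name:
        reasons.append("blank")
    if len(name) > max_len:
        reasons.append("very long")
    return reasons
```

A driver could run this over os.listdir(b".") and prompt for each name
that returns a non-empty list.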

	-Barry Shein, Boston University

wcs@skep2.ATT.COM (Bill.Stewart.<ho95c>) (04/25/88)

:>From Scott Preece:
:>-------------------------------------------------
:>>> I'm not saying I support the idea of prohibiting filenames with embedded
:>>> special characters (my immediate question is "Special to whom?" -- what
:>      This is a very valid point.  To hack the kernel to prohibit
:>special characters in filenames would create more headaches than
:>...
:>For this reason,  I would suggest hacking sh, csh,
:>and ksh to reflect this change. 

	Arrgh!  Korn et al. went to a lot of work to get *rid* of silly
	restrictions on filenames in ksh (8th-bit trashing in particular)
	specifically so you could do foreign characters in filenames.
	In the process, this gets rid of most objections to them - the
	main reason you couldn't rm \366\322\333 was because the shell
	trashed your command to  rm \166\122\133 , which of course rm
	couldn't find.  There are still a few insidious names left
	(such as names with nulls in them), but a lot fewer.
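
The 8th-bit trashing described above amounts to masking each byte with
0177.  A short Python illustration (the function name is hypothetical):

```python
def strip_high_bit(arg):
    """Clear the 8th bit of every byte in `arg`."""
    return bytes(b & 0x7F for b in arg)

# The name from the example above, before and after the mask.
original = b"\366\322\333"
trashed = strip_high_bit(original)
```

Here trashed is b"\166\122\133", which is a different name, so rm looks
for a file that does not exist.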
-- 
#				Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# skep2 is a local machine I'm trying to turn into a server.  Please send
# mail to ho95c or ho95e instead.  Thanks.

chris@mimsy.UUCP (Chris Torek) (04/25/88)

>In article <11153@mimsy.UUCP> I wrote:
>>First, `non-printable characters'.  Well, there are certainly numerous
>>characters that cannot be printed on the terminal I am using at the
>>moment (namely my H19).  But this is not precisely the same set as are
>>non-printable on other displays.

In article <4911@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes:
>How many terminals do anything reasonable with ESC in random places? 

Some of them print `EC' (in one character space, raised E, lowered C).
On those, ESC is a printing character, and if you want to use it in a
file name, that is fine with me.

>>Second: are file names to be printed?  Certainly most are.  But there
>>are some that are not---for instance, the lock files used by ....

>Would you enjoy debugging the operation of said locks if displaying
>the filenames performed random cursor motions or cleared the screen
>before you could see them?

But it does not.  (`ls' prints `?' for control characters; `ls|cat -v'
expands them; other programs have other means of displaying them.)

>It just adds another reason to place some sort of silly user agent
>between the user and the system.

There is *always* a user agent (often more than one) between the
user and the system.  I do not know what you mean here.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

les@chinet.UUCP (Leslie Mikesell) (04/26/88)

In article <11204@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <11153@mimsy.UUCP> I wrote:
>>>First, `non-printable characters'.  Well, there are certainly numerous
>
>But it does not.  (`ls' prints `?' for control characters; `ls|cat -v'
>expands them; other programs have other means of displaying them.)

What? How can ls|cat -v display control characters if ls changes them?
Perhaps someone has fixed your utilities and that is the reason this
problem doesn't bother you.  On out-of-the-box SysVr3, if I create
a file named zESC[2J, ls will happily clear a vt100-ish screen before
you can read any of the other filenames.  I added some other ESC sequences
and a few form-feeds and sent the listing to a couple of different printers
with the expected bizarre results.  A cpio -it listing does likewise (as
it must if the names are to be of any use).  I didn't feel up to dealing
with the problems that imbedding a ctl-S in a name would cause (left as
an exercise for the reader...)

I did know about ls|cat -v (or od -x if you believe that form-feeds are
control characters).  Probably everyone who has used unix more than a day or
two knows about this. My point is that there are more productive things to do.
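
For reference, a Python sketch of the kind of rendering `cat -v' does:
control bytes become ^X, DEL becomes ^?, and bytes with the high bit set
get an M- prefix.  Unlike cat -v, this sketch also renders tab and
newline, which seems appropriate for a single filename.  The function
name visible is hypothetical.

```python
def visible(name):
    """Render `name` (bytes) with control and high-bit bytes made visible."""
    out = []
    for b in name:
        if b >= 0x80:
            out.append("M-")
            b -= 0x80
        if b < 0x20:
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:
            out.append("^?")
        else:
            out.append(chr(b))
    return "".join(out)
```

With this, the zESC[2J name above comes out as z^[[2J.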

>
>>It just adds another reason to place some sort of silly user agent
>>between the user and the system.
>There is *always* a user agent (often more than one) between the
>user and the system.  I do not know what you mean here.

Have you ever seen a 3B1?  I mean the sort of thing where the user agent
"knows" when you are typing a filename, generally by associating the
file with the application that uses it. And I call it silly because it
precludes the concept of software tools.

lrj@batcomputer.tn.cornell.edu (Lewis R. Jansen) (04/26/88)

In article <4965@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes:
}In article <11204@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
}}}In article <11153@mimsy.UUCP> (Chris Torek) wrote:
}}}}First, `non-printable characters'.  Well, there are certainly numerous
}}
}}But it does not.  (`ls' prints `?' for control characters; `ls|cat -v'
}}expands them; other programs have other means of displaying them.)
}
}What? How can ls|cat -v display control characters if ls changes them?

	  The ls command checks to see if its output is going to
	a terminal or to some other place.  If it's writing to a
	terminal, ls changes the various unprintable characters
	to be ``?'', otherwise it leaves them alone.

	  This is on SunOS 3.4, and probably evolved from 4BSD.
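
	  A rough Python sketch of that behaviour.  Treating only ASCII
	0x20 through 0x7E as printable is an assumption, and the function
	names are hypothetical.

```python
import sys

def display_name(name, to_terminal):
    """Replace non-printing bytes with '?' only when writing to a terminal."""
    if not to_terminal:
        return name
    return bytes(b if 0x20 <= b < 0x7F else ord("?") for b in name)

def name_for_stdout(name):
    """Mask `name` only if standard output is a terminal, like isatty(1)."""
    return display_name(name, sys.stdout.isatty())
```

	  When the output is a pipe, isatty() is false and the bytes pass
	through untouched, which is why ls|cat -v still sees them.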

-- 
				-- Lewis R. Jansen, LASSP Systems Grunt
					lrj@helios.tn.cornell.edu
					  ...!cornell!lassp!lrj
	    "You can't fight in here, this is the War Room!"

chris@mimsy.UUCP (Chris Torek) (04/27/88)

>In article <11204@mimsy.UUCP> I wrote:
>>(`ls' prints `?' for control characters; `ls|cat -v'
>>expands them; other programs have other means of displaying them.)

In article <4965@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes:
>What? How can ls|cat -v display control characters if ls changes them?
>Perhaps someone has fixed your utilities and that is the reason this
>problem doesn't bother you.  On out-of-the-box SysVr3 ....

Well, there is the problem!  You are using the Other Leading Brand! :-)
4BSD ls, whatever its faults, does this magic translation, but only
if isatty(1), which is sometimes bizarre, and is why ls|... works.

>>There is *always* a user agent (often more than one) between the
>>user and the system.  I do not know what you mean here.

>I mean the sort of thing where the user agent "knows" when you are
>typing a filename, generally by associating the file with the application
>that uses it.  And I call it silly because it precludes the concept of
>software tools.

That sounds like what is usually called a `naive user interface'.
If you have naive users, you give them one of these interfaces, and
they happily sit inside their naive little sub-world doing restricted
operations.  As the user wants to do more, you open the interface
wider and let them see all the horrors :-) of the real implementation
---in this case, including unusual characters in file names.

As a general rule, the closer you get to the real implementation,
the more you should be able to do.  I think that includes funny
characters in file names.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

les@chinet.UUCP (Leslie Mikesell) (04/29/88)

In article <11238@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>>(`ls' prints `?' for control characters; `ls|cat -v'
>
>Well, there is the problem!  You are using the Other Leading Brand! :-)
>4BSD ls, whatever its faults, does this magic translation, but only

Ok, I'll take that to mean that you agree that there is a problem
with displaying possibly random control characters that might be in
filenames, and that it needs to be fixed somewhere.  That leaves the
question of where...
 a) all user programs under all conditions.
 b) some user programs under all conditions.
 c) some user programs under some conditions.
 d) the kernel.
It is obviously too late for (a) unless open() and creat() become library
routines and everything is re-compiled, (b) would create a situation where
some programs could not access some files. (c) apparently "works for
you" but I don't agree that it is a good solution.  What if my "terminal"
is actually another computer that is logged on and executing a script?
If I want filenames that I can actually use to access files, I would
have to execute ls|cat.  Isn't that more than a little obtuse?  This is
much more likely for me than a situation where I would want to draw
smiley-faces with a certain terminal's character set in my filenames.

I am not talking about high-bit characters here, just the ones that have
an ascii-defined meaning of something other than a printable character.
As a practical matter, I avoid trouble with high-bits by setting everything
to ignore parity.  There is no such simple solution to control characters
since they do have defined and useful purposes.

 Les Mikesell 

jc@minya.UUCP (John Chambers) (05/01/88)

> 				...  I realize that this kind of
> behaviour guarantees that I will have a job for a while, but otherwise
> it is pretty silly to allow non-printable characters in a filename.

The problem with this argument is:  Just what is a printable character?

You might have a clear idea while sitting at your ANSI terminal, but try
imagining me sitting at my terminal with lots of special characters for
doing all those funny diacriticals they use over in Europe, and maybe
also a Greek or Katakana character set.  These are added to the usual
ASCII by using the 128 unused codes starting at 0x80.  Or maybe I
have one of the new NLS terminals that put out 2-byte codes for some
characters. 

While you are debating decreasing the usable character set from 7 bits
to 6.7 bits, others (at such insignificant companies as IBM, ATT, etc.)
are working with "8-bit clean" versions of Unix that allow free use of
any 8-bit characters in file names (with '/' and null being special, but
NO others).

You may be part of the "English-only" crowd, but there are lots of us
who aren't, and we badly need those extra character codes.  The fact
that you can't type them on your silly ANSI terminal is of no concern
to us.

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

jc@minya.UUCP (John Chambers) (05/01/88)

| it is pretty silly to allow non-printable characters in a filename.

Silly, perhaps, but also sometimes necessary.

I am reminded of a project a couple years back, that consisted of 
writing code to be downloaded onto a single-board computer (which
shall mercifully remain nameless).  The board came with a tiny
ROM monitor of a common sort:  It had two serial ports, into which
you were supposed to plug cables leading to a terminal and to a
computer.  You could enter the usual sort of debugging commands
from the terminal, to display or alter memory, set breakpoints,
start running from an address, and so on.

There was also a download command.  You typed something like
	DL <command>
and the <command> would be sent out the other port to the computer,
which would do something interesting with it.  If it was a Unix
system, you would type something like:
	DL cat foo.obj
where foo.obj was a file full of the appropriately-encoded data
records.

The fun thing with the DL command was that, before sending the
<command>, it sent a ^X (CTRL-X) character.  So if you typed the
above 'cat' command, what Unix saw was:
	^Xcat foo.obj

Ignore the silliness of this strange behavior; we just had to live
with it.  How we handled it was very straightforward.  We had a
download shell script whose name was "^X".  We could then just
type
	DL  foo.obj
and the script would be downloaded.  (Note the TWO spaces; you can
probably imagine how often we typed only one :-).

Without the ability to include a ^X in the filename, we would have
had to write a special program to read the input and strip out the
garbage character.  Not a big deal, perhaps, but then, every little
barrier put in your way just makes the job harder.  We appreciated
the fact that Unix wasn't fazed by a funny character in the file
name.

The world has lots of systems that can't handle simple things like
filenames with weird characters.  One of Unix's strengths is that
it generally doesn't impose silly restrictions.  Let's keep it that
way.

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

les@chinet.UUCP (Leslie Mikesell) (05/02/88)

In article <574@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>> it is pretty silly to allow non-printable characters in a filename.
>
>The problem with this argument is:  Just what is a printable character?
>....These are added to the usual
>ASCII by using the 128 unused codes starting at 0x80.

I should have been more specific.  Characters above the ASCII defined
range don't concern me.  I can avoid ever accidentally putting one into
a filename by setting all my equipment to ignore parity.  What does
bother me is allowing characters that have an ASCII-defined meaning
other than a printable character to be in filenames.  This means the
values below a space, which include all sorts of device controls that
I would prefer not to happen accidentally.

> Or maybe I
>have one of the new NLS terminals that put out 2-byte codes for some
>characters. 

Great.. Do you expect the kernel (and everything else that has fixed
length buffers) to magically accommodate the extra characters
transparently?

>You may be part of the "English-only" crowd, but there are lots of us
>who aren't, and we badly need those extra character codes.  The fact
>that you can't type them on your silly ANSI terminal is of no concern
>to us.
>
Do your devices not require device control characters (carriage-return,
line-feed, form-feed, flow control, pad control, and the like)?  Do you
like having them in filenames?

  Les Mikesell

naim@eecs.nwu.edu (Naim Abdullah) (05/02/88)

As the person who started this unprintable filenames discussion
by asking why BSD disallows certain file names, let me say
what my *fundamental* objection to this business of "protecting"
the user is.

I think the most wonderful thing about UNIX is its generality
and elegance. Sadly, people who don't appreciate that, add
gratuitous options to cat(1) and ls(1), and now (horrors!)
make proposals that unprintable filenames be disallowed.

To understand the UNIX philosophy, we must go back to the original
source. The following is from the paper, "UNIX Implementation"
by Ken Thompson:

>The kernel is the only UNIX code that cannot be substituted
>by a user to his own liking. For this reason, the kernel
>should make as few real decisions as possible. This does not 
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Important point!
>mean to allow the user a million options to do the same thing.
>Rather, it means to allow only one way to do one thing, but have
>that way be the least-common divisor of all the options that
>might have been provided.
>
>What is or is not implemented in the kernel represents both
>a great responsibility and a great power. It is a soap-box
>platform on "the way things should be done." Even so, if
>"the way" is too radical, no one will follow it. Every
>important decision was weighed carefully. Throughout, simplicity
>has been substituted for efficiency. Complex algorithms are used
>only if their complexity can be localized.
>

I think the proposal that unprintable filenames be disallowed by the
kernel follows from a VMS/IBM mentality. Sorry to be so harsh, but
it totally goes against the whole spirit of UNIX.

If you want to protect naive users from unprintable filenames, then
by all means write a special shell or relink your utilities with
a special version of the C library, but don't put that decision
in the kernel.

The kernel should allow very general things. It is all the crap
above it, that should "protect" the user. This is because, you
and I differ whether this "protection" is needed or not. You
can have all the "protection" you want, but don't force it
on me!

This is why I am disappointed that 4.3BSD disallows filenames
with the high bit set. I am glad that SunOS 4.0 lifts this
restriction. Guy mentioned that the reason for this is to support
other languages. I think the reason should have been simply because
of generality/elegance or because it is The Right Thing.

Does anybody know whether V9 or 4.3-tahoe allow arbitrary bytes
in filenames (excepting, of course, '\0' and '/')?

		      Naim Abdullah
		      Dept. of EECS,
		      Northwestern University

		      Internet: naim@eecs.nwu.edu
		      Uucp: {ihnp4, chinet, gargoyle}!nucsrl!naim

les@chinet.UUCP (Leslie Mikesell) (05/02/88)

In article <575@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>| it is pretty silly to allow non-printable characters in a filename.
>Silly, perhaps, but also sometimes necessary.

Not this time..
>...
>There was also a download command.  You typed something like
>	DL <command>
>and the <command> would be sent out the other port to the computer,
>....
>The fun thing with the DL command was that, before sending the
><command>, it sent a ^X (CTRL-X) character.  So if you typed the
>above 'cat' command, what Unix saw was:
>	^Xcat foo.obj
>
>Ignore the silliness of this strange behavior; we just had to live
>with it.

Given that the ASCII definition for control-X is CAN (as in cancel-line)
the purpose of the character was most likely to erase any garbage that
might have previously appeared on the port.  It is unusual to have such
a character forced into a stream, but in this context it is not unreasonable
or entirely silly.  If your .profile | .login contained "stty kill '^x'"
(as mine does), the CAN would have served its (probably) intended purpose.

>How we handled it was very straightforward.  We had a
>download shell script whose name was "^X".  We could then just
>type
>	DL  foo.obj
>and the script would be downloaded.  (Note the TWO spaces; you can
>probably imagine how often we typed only one :-).
>

Confusing device controls with filenames does not sound straightforward
to me.
>Without the ability to include a ^X in the filename, we would have
>had to write a special program to read the input and strip out the
>garbage character.  Not a big deal, perhaps, but then, every little
>barrier put in your way just makes the job harder.

But suppose that the job was passed to someone who uses CAN for its
defined purpose.  How long will it take him to figure out that the
^X file exists and how to invoke it?

Even if you don't want to let the tty drivers do their job, it is not all
that difficult to deal with an unwanted character.  If the error message
doesn't bother you or you can send stderr to /dev/null, you could type
DL ;cat foo.obj (no harder to remember than 2 spaces).  Or, just
execute:
tr -d '\030' |sh
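
The filter itself is tiny.  A Python equivalent of the tr step, for
illustration only (the name strip_can is hypothetical):

```python
CAN = b"\x18"  # octal 030, ASCII CAN (control-X)

def strip_can(data):
    """Delete every CAN byte from `data`, as the tr command above does."""
    return data.replace(CAN, b"")
```

Feeding it the bytes the board sends, ^X followed by "cat foo.obj",
leaves just the command for sh to run.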

>The world has lots of systems that can't handle simple things like
>filenames with weird characters.  One of Unix's strengths is that
>it generally doesn't impose silly restrictions.  Let's keep it that
>way.

As long as people are more interested in magic than in standards, it
will stay that way, and it will require a wizard to manipulate it.
Perhaps ASCII is a bit too provincial a standard.  What else is there?

  Les Mikesell

decot@hpisod2.HP.COM (Dave Decot) (05/04/88)

> I think the most wonderful thing about UNIX is its generality
> and elegance. Sadly, people who don't appreciate that, add
> gratuitous options to cat(1) and ls(1), and now (horrors!)
> make proposals that unprintable filenames be disallowed.

I suppose I am the one who first proposed this here (at least recently).  
If you read it, you saw that what I proposed is that it be POSSIBLE TO ASK
THE KERNEL to prevent a user-specified set of characters from being allowed,
*not* that the kernel be hardwired to make this decision once and for all.

> To understand the UNIX philosophy, we must go back to the original
> source. The following is from the paper, "UNIX Implementation"
> by Ken Thompson:

I am familiar with this paper and agree substantially with what it says.

> >The kernel is the only UNIX code that cannot be substituted
> >by a user to his own liking. For this reason, the kernel
> >should make as few real decisions as possible. This does not 
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Important point!

I agree that this is important.  Currently, the kernel has decided
that any old random garbage may be used for file names.  This allows for
the needs of all users, certainly.  It also GETS IN PEOPLE'S WAY more
often than it HELPS ANYONE.  My proposal was to be able to ask the kernel
to be more restrictive.  Many people have pointed out that hacking up 
all the shells and libraries and application programs to prevent
garbage file names is the wrong way to do this, since it spreads the
enforcement all over the system instead of localizing the policy.

> >Even so, if "the way" is too radical, no one will follow it. Every
> >important decision was weighed carefully. Throughout, simplicity
> >has been substituted for efficiency. Complex algorithms are used
> >only if their complexity can be localized.

Note that my suggestion was not for a "complex algorithm"; once installed,
it requires one additional array access and one bit-masking per character
to test the basename of files being created for the first time.

This is localization of complexity.
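
To make the cost concrete, here is roughly what such a test might look
like.  It is a user-space sketch, not kernel code: the names banned,
ban_byte and basename_ok are invented for illustration, and how the set
would actually be handed to the kernel is left open.

```c
#include <limits.h>
#include <string.h>

/* One bit per possible byte value; a set bit means "refuse this byte". */
static unsigned char banned[(UCHAR_MAX + 1) / CHAR_BIT];

/* Add one byte value to the prohibited set. */
void ban_byte(unsigned char c)
{
    banned[c / CHAR_BIT] |= (unsigned char)(1u << (c % CHAR_BIT));
}

/* Returns 1 if the last component of path contains no banned byte,
 * 0 otherwise.  Directory components are deliberately not examined. */
int basename_ok(const char *path)
{
    const char *base = strrchr(path, '/');

    base = base ? base + 1 : path;
    for (; *base != '\0'; base++) {
        unsigned char c = (unsigned char)*base;

        /* one array access and one mask per character */
        if (banned[c / CHAR_BIT] & (1u << (c % CHAR_BIT)))
            return 0;
    }
    return 1;
}
```

A create path that failed this test would presumably return EINVAL, as
suggested earlier in the thread.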

> If you want to protect naive users from unprintable filenames, then
> by all means write a special shell or relink your utilities with
> a special version of the C library, but don't put that decision
> in the kernel.

Putting enforcement of usable file names everywhere else in the system
does NOT localize complexity.  The only reasonable place to prevent
the existence of garbage file names, if this is desired, is to enforce
it where all new file names get added to the system.

> I think the proposal that unprintable filenames be disallowed by the
> kernel follows from a VMS/IBM mentality. Sorry to be so harsh, but
> it totally goes against the whole spirit of UNIX.

I have never used a VMS or IBM system, and am therefore unfamiliar with
this mentality to which you refer.  I have used UNIX and designed, written,
and tested system software for UNIX for nine years.  I am thus familiar
with the "spirit of UNIX", and know that it means slightly different
things to different people.  We agree that there are certain
principles as you have quoted from Ken Thompson that should be preserved,
and, as Ken says, "weighed carefully".  We apparently do not agree on the
relative weights of purity of design against usability of the system.

> The kernel should allow very general things. It is all the crap
> above it

You seem to take a rather dim view of user-space software in general.

> that should "protect" the user. This is because, you
> and I differ whether this "protection" is needed or not. You
> can have all the "protection" you want, but don't force it
> on me!

I don't think anyone intends to force it on anyone.  We just think it
should be configurable so that those who desire to use UNIX can do so,
and those who would rather preserve its moral purity can also do so.

> 		      Naim Abdullah
> 		      Dept. of EECS,
> 		      Northwestern University

Dave Decot
Hewlett-Packard Company
hpda!decot

morrell@hpsal2.HP.COM (Michael Morrell) (05/05/88)

/ hpsal2:comp.unix.wizards / jc@minya.UUCP (John Chambers) /  5:55 pm  Apr 30, 1988 /

The problem with this argument is:  Just what is a printable character?

John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
----------

HP-UX has the routine isprint (most likely all other Un*xes have it too).
So it is not too hard to determine what a printable character is (HP-UX's
implementation includes NLS as well).

As to the whole topic of what belongs in a valid filename, it seems to me
that if you could truly have ANY character in a filename, then things would
be ok, but that isn't the case.  First of all, as others have pointed out,
you have to exclude '\0' and '/'.  In addition, most (all?) shells use some
characters as metacharacters.  To handle filenames which might contain these
characters, the shells had to come up with a host of weird quoting rules.
These rules tend to be imprecisely explained in man pages, are very hard for
new users to grasp (even many experienced users!), and are sufficiently complex
that the implementation of these rules is invariably full of bugs.

In short, I see no gain and many drawbacks to allowing arbitrary characters in
filenames.

   Michael Morrell

guy@gorodish.Sun.COM (Guy Harris) (05/06/88)

> HP-UX has the routine isprint (most likely all other Un*xes have it too).

One should hope so; HP certainly didn't invent it, the people at BTL Research
did.

> So it is not too hard to determine what a printable character is (HP-UX's
> implementation includes NLS as well).

Given that it includes NLS, there is no single answer to that question.  The
answer depends on the character set you select.

This brings up another question: should the answer depend on the type of
terminal you're currently logged in on?  I.e., if you're on a VT100, should the
upper half of ISO Latin #1 be excluded, while if you're on a VT220 it's
included?

Another question: what does "isprint" do about "wide" character sets such as
various Kanji character sets?

> As to the whole topic of what belongs in a valid filename, it seems to me
> that if you could truly have ANY character in a filename, then things would
> be ok, but that isn't the case.  First of all, as others have pointed out,
> you have to exclude '\0' and '/'.  In addition, most (all?) shells use some
> characters as metacharacters.

Most, not all.  The major conventional UNIX command-line shells do; however,
you could have a "fill in the form" shell, or a "desktop metaphor" shell, that
doesn't.

> In short, I see no gain and many drawbacks to allowing arbitrary characters
> in filenames.

OK, what does "allowing" mean here?  There *might* be some merit to disallowing
the creation of path names containing certain bytes (note, as per the previous
mention of Kanji character sets, that a "character" is not necessarily a single
byte).  Disallowing *all* pathnames containing these bytes would be wrong,
however, as it would prohibit you from referring to some of those files if your
session weren't configured to allow all characters in file names.  (No, you
can't say "you're on a terminal that doesn't support 8-bit characters, you
wouldn't be able to refer to them anyway"; consider a user logged in on a 7-bit
terminal doing an "rm -rf" on a directory containing files with 8-bit
characters in their names - or just with blanks in their names, if you choose
to disallow them.)

And, once again, I bring up the question of character sets such as various
Kanji sets.  If not all 16-bit combinations are valid Kanji, how can you
disallow "invalid" characters if each of the two bytes in such a character is
valid in some other character?

Sure, it sounds nice to say "make life easier for the user by preventing
hard-to-reference filenames from being used".  It's not clear that it's really
that easy.  Obviously, the kernel should not provide any policy here; I'm not
sure you can even provide a reasonable policy-free mechanism atop which the
desired policies can be implemented.

BTW, note that Draft 12 of POSIX says:

	filename
	     Names consisting of 1 to {NAME_MAX} bytes may be used to name a
	     file.  The characters composing the name may be selected from the
	     set of all characters excluding the slash character and those
	     containing the null byte (octal zero).

From this, I infer that no POSIX-conformant system will prohibit me from using
^A or '\353' in a file name; there may well be application writers who, for
whatever reason (bad or good), decide to do so.  Turning on filename
restrictions might conceivably break these applications; before you add such
restrictions, make sure either that this won't break any important applications
or that you can live with them not working.
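
The quoted definition is small enough to write down as a check.  What
follows is a sketch of that rule only; posix_filename_ok is an invented
name, and the fallback value for NAME_MAX is an assumption for systems
whose <limits.h> does not define it.

```c
#include <limits.h>
#include <stddef.h>

#ifndef NAME_MAX
#define NAME_MAX 255   /* assumption: fallback when <limits.h> lacks it */
#endif

/* Returns 1 if the len bytes at name are acceptable under the quoted
 * definition: 1 to NAME_MAX bytes, none of them '/' or the null byte. */
int posix_filename_ok(const char *name, size_t len)
{
    if (len < 1 || len > NAME_MAX)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (name[i] == '/' || name[i] == '\0')
            return 0;
    }
    return 1;
}
```

Note that ^A and '\353' both pass, which is exactly the inference drawn
above.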

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/07/88)

In article <2630003@hpsal2.HP.COM> morrell@hpsal2.HP.COM (Michael Morrell) writes:
>/ hpsal2:comp.unix.wizards / jc@minya.UUCP (John Chambers) /  5:55 pm  Apr 30, 1988 /
>The problem with this argument is:  Just what is a printable character?
>HP-UX has the routine isprint (most likely all other Un*xes have it too).
>So it is not too hard to determine what a printable character is (HP-UX's
>implementation includes NLS as well).

What constitutes a "printable character" is inherently locale-dependent.
Since the OS kernel has to support multiple concurrent locales, it is not
in a position to make a correct determination about character printability.

People who argue that the kernel should "help" the user seem to have
missed out on what is REALLY helpful and to have instead settled on an
overly restricted model of what users need to do.  The UNIX designers
got this exactly right.  The "value added resellers" on the other hand
seem to let marketing yoyos make their technical decisions for them.

jc@minya.UUCP (John Chambers) (05/07/88)

> >You may be part of the "English-only" crowd, but there are lots of us
> >who aren't, and we badly need those extra character codes.  The fact
> >that you can't type them on your silly ANSI terminal is of no concern
> >to us.
> >
> Do your devices not require device control characters (carriage-return,
> line-feed, form-feed, flow control, pad control, and the like)?  Do you
> like having them in filenames?

Not particularly, but I like the alternatives even less.

One of the nice things about Unix is that I can explain to a novice 
that the kernel (i.e., open() and exec()) only know about '/' and
null, and ALL other characters are equally acceptable.  This is a
very simple rule, and eliminates any further questions.

"Ah, but why can't I type some names?" I hear you asking.  Well, 
that's simply not a Unix (i.e., kernel) question.  Again, it's
easy to explain to a novice.  A program (such as a shell) may
well implement any rules its author desires.  But since it's
outside the kernel, you don't have to live with rules you don't
like.  If you don't like a file's name, well, a trivial C program
can change it.
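
Such a program might look something like the sketch below.  The name
rename_printable and the policy of turning each control byte into '_'
are assumptions made for the example; pick whatever mapping suits you.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Rename a file so that every control byte in its name becomes '_'.
 * Returns 0 on success (or if nothing needed changing), -1 on failure.
 * Caution: rename() will replace an existing file of the new name. */
int rename_printable(const char *name)
{
    char plain[1024];
    size_t len = strlen(name);

    if (len >= sizeof plain)
        return -1;
    for (size_t i = 0; i <= len; i++) {      /* <= copies the terminator */
        unsigned char c = (unsigned char)name[i];

        plain[i] = (c != '\0' && iscntrl(c)) ? '_' : (char)c;
    }
    if (strcmp(name, plain) == 0)
        return 0;
    return rename(name, plain);
}
```

Because the old name is passed as raw bytes, no shell quoting rules are
involved at any point.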

If the kernel imposed complicated rules on file names, none of
this would be true, and every programmer's job would be made that
much harder.  

One of Unix's virtues is that there are great simplicities like
this.  Let's try to keep it that way.  Our job is too complicated
as it is, without having to discover the funny naming rules that
each vendor's release has imposed on us.

[Now if there were something I could do about that file whose
name, due to a disk error, has a null or a '/' in its name. ;-]

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

You can't make a turtle come out.
	-- Malvina Reynolds

idall@augean.OZ (Ian Dall) (05/09/88)

"Unprintable" characters in file names would be less of a problem if they
could readily be seen. It seems to me that the best solution is to
allow file names to be as general as possible but to modify either the
terminal driver or ls to display unprintable characters in some unambiguous
form.

If done in the terminal driver, new stty options could be added to say
what characters are to be considered unprintable. This could break
some things which like to setup terminals by using "cat foo" where foo
is a file full of special characters but this already has potential
problems if foo has ^M, ^J or ^I in it (perhaps we need a r(aw)cat).
Yes I know the terminal stuff is already a mess but at least this
concentrates the mess in one place :-). This scheme handles European,
Japanese or whatever character sets fairly well.

I think modifying ls to display filenames with unprintable character
sets is a bit less tidy. Perhaps it could pick up a printable character
set from an environment variable. One certainly wouldn't want umpteen new
options to specify which character set your terminal can print.

-- 
 Ian Dall           "In any argument there will be people on your
                     side who you wish were on the other side."
idall@augean.oz

paul@csnz.nz (Paul Gillingwater) (05/10/88)

In article <582@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>One of the nice things about Unix is that I can explain to a novice 
>that the kernel (i.e., open() and exec()) only know about '/' and
>null, and ALL other characters are equally acceptable. 
===========^^^
>-- 
>John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

Does that mean that a naive user can make a file with a <SPACE>
in the name?   e.g.   "John Doe" or "Job Cost" or other equally
"intuitively correct" but WRONG names...:-)
-- 
Paul Gillingwater, Senior Consultant   Call my private BBS - Magic Tower,
Computer Sciences of New Zealand Ltd   NZ +64 4 753561 V21/V23 8N1 24hrs
P.O.Box 929, Wellington, NEW ZEALAND   Soon: V22/V22bis/Bell 103/Bell 212A
Vox: +64 4 846194, Fax: +64 4 843924  "Scott me up, Beamie!"-Lounge Suit Larry

guy@gorodish.Sun.COM (Guy Harris) (05/11/88)

> If done in the terminal driver, new stty options could be added to say
> what characters are to be considered unprintable. This could break
> some things which like to setup terminals by using "cat foo" where foo
> is a file full of special characters

It also breaks just about any program that uses "termcap", "terminfo", or
"curses", unless all such programs turn off all that special processing.

> I think modifying ls to display filenames with unprintable character
> sets is a bit less tidy.

Perhaps, although it's already been done both in the 4BSD "ls" and the S5R2
"ls".

> Perhaps it could pick up a printable character set from an environment
> variable.

This functionality is specified by the current ANSI C drafts; you do

	setlocale(LC_CTYPE, "");

or

	setlocale(LC_ALL, "");

and the <ctype.h> macros switch to using the appropriate locale's rules.
According to the last POSIX draft, the appropriate locale can be specified by
setting the LANG, LC_ALL, or LC_CTYPE environment variables.
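
Put together with isprint, that might be used as in the sketch below.
The function name printable_in is invented for the example, and falling
back to the "C" locale when the requested one is unknown is an
assumption, not something the drafts require.

```c
#include <ctype.h>
#include <locale.h>

/* Returns 1 if byte c is printable under the named locale, else 0.
 * An empty name selects whatever the environment variables request. */
int printable_in(const char *locale_name, unsigned char c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        setlocale(LC_CTYPE, "C");   /* assumed fallback for unknown names */
    return isprint(c) != 0;
}
```

The same byte can give different answers under different locale names,
which is why there is no single answer to "is it printable?".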

guy@gorodish.Sun.COM (Guy Harris) (05/11/88)

> Does that mean that a naive user can make a file with a <SPACE>
> in the name?

Yup.

> e.g.   "John Doe" or "Job Cost" or other equally "intuitively correct" but
> WRONG names...:-)

Yup.  As I presume you mean to imply, those names would be considered quite
valid by some users.  Many such users won't necessarily be using the
traditional UNIX shells, so it might not be at all inconvenient to use names
containing spaces.

ccs@lazlo.UUCP (Clifford C. Skolnick) (05/11/88)

In article <326@augean.OZ> idall@augean.OZ (Ian Dall) writes:
...
:If done in the terminal driver, new stty options could be added to say
:what characters are to be considered unprintable. This could break
:some things which like to setup terminals by using "cat foo" where foo
:is a file full of special characters but this already has potential
:problems if foo has ^M, ^J or ^I in it (perhaps we need a r(aw)cat).
:Yes I know the terminal stuff is already a mess but at least this
:concentrates the mess in one place :-). This scheme handles European,
:Japanese or whatever character sets fairly well.

This solution will also handle any control strings for the terminal.  I
wonder what "vi" would look like on his terminal :-).  Many things
would break if you put the stuff in the kernel tty driver, let's leave
it in the "ls" or "cat" command.  By the way, isn't there a Berkeleyish
type command "see" which does expand these things?  I seem to remember
"ls | see" from somewhere, maybe it was Xenix.

:-- 
: Ian Dall           "In any argument there will be people on your
:                     side who you wish were on the other side."
:idall@augean.oz


-- 
Clifford C. Skolnick    | - Never insult 7 men while carrying a six shooter -
Phone: (716) 427-8046   |                       /!kodak!pcid!gizzmo! \
PACKET: N1DPH@WB2VPH    |  ...!rutgers!rochester                      lazlo!ccs
BITNET: CCS6277@RITVAX  |                       \!ritcv!ritcsh!sabin!/

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/11/88)

In article <326@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>"Unprintable" characters in file names would be less of a problem if they
>could readily be seen. It seems to me that the best solution is to
>allow file names to be as general as possible but to modify either the
>terminal driver or ls to display unprintable characters in some unambiguous
>form.

Geez.  Haven't you heard of "pipes and filters"?  Pipe the output of "ls"
into a filter that converts whatever your notion of unprintable characters
may be into whatever you think the corresponding printable equivalent
should be.  Don't try to muck around with the basic system kernel and
utilities!

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/11/88)

In article <24@csnz.nz> paul@csnz.UUCP (Paul Gillingwater) writes:
>In article <582@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>>One of the nice things about Unix is that I can explain to a novice 
>>that the kernel (i.e., open() and exec()) only know about '/' and
>>null, and ALL other characters are equally acceptable. 
>Does that mean that a naive user can make a file with a <SPACE>
>in the name?   e.g.   "John Doe" or "Job Cost" or other equally
>"intuitively correct" but WRONG names...:-)

Of COURSE "ALL other" means "ALL other"!  Since a space character
is not '/' and is not a null character, then it is covered by the
phrase.  How hard IS this to figure out?

Not only are such strings as "John Doe" "intuitively correct",
they are useful file names.  Some Bourne, Korn, or C-shell operations
are a bit more difficult to express correctly if some of your file
names contain special characters such as spaces, but that's all.
You shouldn't turn "naive users" loose in such an environment anyway.
Menu-driven applications could well allow these as "natural" file names.

I've found good use for this capability on several occasions.
Your notion of "WRONG" obviously does not fit my needs.

vandys@hpindda.HP.COM (Andy Valencia) (05/11/88)

/ hpindda:comp.unix.wizards / paul@csnz.nz (Paul Gillingwater) /  9:36 am  May 10, 1988 /
>Does that mean that a naive user can make a file with a <SPACE>
>in the name?   e.g.   "John Doe" or "Job Cost" or other equally
>"intuitively correct" but WRONG names...:-)
    Yes, it does:

Script started on Wed May 11 08:34:56 1988
$ cat > "John Doe"
This is stuff on J.D.
$ cat "John Doe"
This is stuff on J.D.
$ rm "John Doe"
$ exit

script done on Wed May 11 08:35:26 1988

    You DO need the quotes, but saying it's WRONG is misleading;
I've used such file names in many applications to good effect.

				Andy Valencia
				vandys%hpindda.UUCP@hplabs.hp.com

roy@phri.UUCP (Roy Smith) (05/12/88)

gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
> Geez.  Haven't you heard of "pipes and filters"?  Pipe the output of "ls"
> into a filter [...]

	In this case, it's a little bit complicated since the filter would
have to be syntax-sensitive.  Just doing "ls | cat -v" is no good because you
want to escape \n in file names but not at the end of lines.  You probably
want to escape spaces in file names but no place else, etc.  I'm sure it's
possible to write some sort of sed command which takes:

-rw-r--r--  1 roy           491 Apr 27 14:01 calendar
-rw-r--r--  1 roy           817 May 11 13:15  foo bar

and correctly figures out that the second file name is " foo bar" and only
escapes those two spaces, but it would be ugly and difficult.  Try and make
that filter general enough to deal with the variant formats of "ls", "ls -l",
"ls -ls", "ls -lsi", and "ls -lsig" and it sure starts to look like building
control-character escapes into ls isn't such a bad idea after all.
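
Done inside the listing program, the escaping can be applied to the name
alone, before any other columns are added.  A rough sketch follows;
escape_name is an invented helper, and the choice of octal \ddd for
spaces, backslashes and non-graphic bytes is an assumption about what
one would want to see.

```c
#include <ctype.h>
#include <stdio.h>

/* Write name into out with every non-graphic byte, space and backslash
 * shown as \ddd octal.  Returns 0 on success, -1 if out is too small. */
int escape_name(const char *name, char *out, size_t outsz)
{
    size_t used = 0;

    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;

        if (isgraph(c) && c != '\\') {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = (char)c;
        } else {
            if (used + 4 >= outsz)
                return -1;
            snprintf(out + used, 5, "\\%03o", (unsigned)c);
            used += 4;
        }
    }
    if (outsz == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

A listing program would call this on each d_name it gets from
readdir() and print the result in place of the raw name.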
-- 
Roy Smith, System Administrator
Public Health Research Institute
455 First Avenue, New York, NY 10016
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

les@chinet.UUCP (Leslie Mikesell) (05/12/88)

In article <52768@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> Does that mean that a naive user can make a file with a <SPACE>
>> in the name?
>Yup.  As I presume you mean to imply, those names would be considered quite
>valid by some users.  Many such users won't necessarily be using the
>traditional UNIX shells, so it might not be at all inconvenient to use names
>containing spaces.

Users with a "point-and-click" interface wouldn't be bothered at all by
that.  Never mind that it breaks a lot of the administrator's file management
shell scripts...
Off the subject, but an interesting side effect I have noticed of the
point-and-click type interface is that users tend to use longer filenames
and more mixed-case names.  Being harder to type is not a real disadvantage
since the name is only typed when the file is created.  Very interesting
things happen when you try to put directories full of long mixed-case
names into an MSDOS network-server environment.
  
  Les Mikesell

rbj@icst-cmr.arpa (Root Boy Jim) (05/12/88)

   "Unprintable" characters in file names ...

This solution is not strict enuf. In addition to the kernel suppressing
unprintable filenames, it should also disallow *unpronounceable* file names!

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
I hope you millionaires are having fun!  I just invested half
 your life savings in yeast!!

jc@minya.UUCP (John Chambers) (05/13/88)

In article <24@csnz.nz>, paul@csnz.nz (Paul Gillingwater) writes:
> In article <582@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
> >One of the nice things about Unix is that I can explain to a novice 
> >that the kernel (i.e., open() and exec()) only know about '/' and
> >null, and ALL other characters are equally acceptable. 
> ===========^^^
> >-- 
> >John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
> 
> Does that mean that a naive user can make a file with a <SPACE>
> in the name?   e.g.   "John Doe" or "Job Cost" or other equally
> "intuitively correct" but WRONG names...:-)

Yes, of course.  A call of open("John Doe",...) wouldn't faze any Unix that
I've worked with.  And frankly, I don't think it is the kernel's business
interfering with such.
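
For the doubtful, a sketch of such a call.  The helper name create_named
and the 0644 mode are choices made for the example; the point is only
that the name reaches the kernel as plain bytes, space included.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create (or truncate) the named file and write len bytes of text.
 * Returns 0 on success, -1 on any failure. */
int create_named(const char *name, const char *text, size_t len)
{
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    ssize_t n = write(fd, text, len);
    if (close(fd) < 0 || n != (ssize_t)len)
        return -1;
    return 0;
}
```

Calling create_named("John Doe", ...) involves no quoting at all; the
quoting only exists in the shell.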

The Bourne/C/Korn shells aren't the only Unix user interfaces in the world
any more.  Window/menu-driven shells are becoming quite common.  With such
shells, it is quite reasonable to let users "fill in the blanks" in a menu,
and use what is typed as a file name.  In such cases, there is no good
reason to ban such file names.  On the contrary, they would be intuitively
obvious to a great many users.

Forcing users to use names like "JohnDoe" or "JOHN_DOE" is a computer
silliness that is just that to many non-computer people.  It's about
time we started building software that can accept the names that real
live humans like to use.  "John Doe" is a perfectly good name to most
Americans, and it should be a perfectly good name to a sensibly-designed
computer.  Similarly, "T'^Here`^Hse" (where '^H' is a backspace) is a 
perfectly good name to most Frenchmen, and it should be acceptable to 
a computer.

The fact is that Unix accepts such names without complaint.  It's only
the stupid shells that cause problems.  Rather than forcing your naming
rules on the users, why not build shells that accept names that the users
like?  We already have a kernel that does so.

(Ooops; I forgot to say "FLAME ON" a couple of paragraphs back.  Oh, well;
"FLAME OFF". :-)

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

You can't make a turtle come out.
	-- Malvina Reynolds

idall@augean.OZ (Ian Dall) (05/14/88)

In article <56@lazlo.UUCP> ccs@lazlo.UUCP (Clifford C. Skolnick) writes:
>In article <326@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>...
>:If done in the terminal driver, new stty options could be added to say
>:what characters are to be considered unprintable. This could break
>:some things which like to setup terminals by using "cat foo" where foo
>:is a file full of special characters but this already has potential
>:problems if foo has ^M, ^J or ^I in it (perhaps we need a r(aw)cat).
>:Yes I know the terminal stuff is already a mess but at least this
>:concentrates the mess in one place :-). This scheme handles European,
>:Japanese or whatever character sets fairly well.
>
>This solution will also handle any control strings for the terminal.  I
>wonder what "vi" would look like on his terminal :-).

I always did think it was a kludge the way the termcap routines
carefully avoid outputting certain characters. I really think that
programs which want to send terminal escape sequences should use an
appropriate terminal mode, because it is conceivable that someone could
design a terminal whose set of control codes made it impossible to
avoid sending ^J, ^M or ^I.

"vi" on my terminal? The only thing I use vi for is to install emacs :-)

However I acknowledge that many programs would break if the new
terminal modes I proposed were set (note though that you only need to
set these modes if you have funny file names).

> Many things
> would break if you put the stuff in the kernel tty driver, let's leave
> it in the "ls" or "cat" command.  By the way, isn't there a Berkeleyish
> type command "see" which does expand these things?  I seem to remember
> "ls | see" from somewhere, maybe it was Xenix. 

Another advantage of putting this in "ls" is that the terminal driver
has no way of knowing it has been asked to output a file name. If a
file name has ^J in it "ls" could display it appropriately whereas the
driver would have to just assume it was supposed to start a new line
or whatever.

Someone else did say that both the Berkeley and the SysV ls already
display "unprintable" filenames. I wasn't aware of this. Must be
because I don't make a habit of creating file names with "odd"
characters in them :-).

It seems like we have agreement here that nothing actually needs to
be done!
-- 
 Ian Dall           "In any argument there will be people on your
                     side who you wish were on the other side."
idall@augean.oz

ok@quintus.UUCP (Richard A. O'Keefe) (05/14/88)

In article <24@csnz.nz>, paul@csnz.nz (Paul Gillingwater) writes:
> Does that mean that a naive user can make a file with a <SPACE>
> in the name?   e.g.   "John Doe" or "Job Cost" or other equally
> "intuitively correct" but WRONG names...:-)

Yes you can make such file names (I often do), and what's wrong with them?
	cat >"Job Cost" <<EOF
	...
	EOF
	ed "Job Cost"
	...
	lpr Job\ Cost
Works fine.  Why should a shell's lexical conventions dictate what sort of
file names I can use in window-based tools?  (The program I use most often
on a Mac has a space in its name...)

gordon@sneaky.UUCP (05/15/88)

To all the proponents of hacking the kernel and/or the shell to protect the 
user from weird file names:

Please do the world a favor and limit your "file-name sanity protection"
to creation of files.  Do not prohibit opening files with strange characters
in their names.  And especially, do not prohibit deleting them.  And 
yes, I mean permit this kind of access regardless of what the
user says his terminal is capable of.  (By the way, what kind of terminal
does cron use?)  

Yes, I know this messes up the obvious place to put the check in namei().
If doing it right is too complicated, maybe checking it at all is too
complicated?

Most of the problem is caused by the fact that the characters in the filename 
are treated specially.  Don't mess things up worse by making all "funny" 
characters as bad as embedded nulls or '/' characters in a directory entry.

Right now, there are several types of special characters that cause 
problems because they are special, the worst ones first:

'\0' and '/':  If you get these embedded in a filename, you have to go
	in through the disk device.  How did these ever get into
	a file name?  Ever have disk errors?  Overheated memory?
	Buggy programs that access disk devices?  Administrators
	that run disk-patch programs and goof?  Boot up after a crash
	with a scrambled free list and forget to/decide not to fsck?

characters with the high bit on:  on those systems where the shell strips
	high bit from filenames, use "rm -ri ." (not "rm -ri *"), to avoid 
	letting the shell get its hands on the name.  On other systems, 
	these characters are merely unreadable and/or untypable.

unprintable characters you can't read:  'od' the directory (easiest on SysV 
	systems), or use ls | cat -v to figure out what the name is.
	Also, if the characters are just invisible (as opposed to 'clear
	screen' or something), "rm -ri ." or rm -i with shell metacharacters
	filling in for the unprintable ones can work.

untypable characters:  use "rm -ri .", or rm -i with shell metacharacters
	to match the offending filename.

shell metacharacters:  use backslashes to quote the metacharacters and
	backslashes.  (This may vary with what shell you are using.  Also,
	using single quotes may be easier.)

'-': if it's the first character, prefix ./ to the filename.
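
A program that builds command lines for others can apply the last trick
mechanically.  The sketch below is one way to do it; protect_dash is an
invented name, and it covers only the leading '-' case, not the others
in the list.

```c
#include <stdio.h>

/* If name starts with '-', write "./name" into out so that an option
 * parser will not take it for a flag; otherwise copy it unchanged.
 * Returns 0 on success, -1 if out is too small. */
int protect_dash(const char *name, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s%s",
                     name[0] == '-' ? "./" : "", name);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A program that calls unlink() itself does not need this, since unlink()
does no option parsing; the prefix matters only when the name is handed
to a command such as rm.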

By the way, I know there are lots of programs around like "rmfile" that
may make the job easier.  "rm" is pretty standard on UN*X/Xen*x systems
and "rmfile", etc. aren't.


					Gordon Burditt
					...!ihnp4!sys1!sneaky!gordon

P.S. How long before /bin/shell is an illegal filename for English-speaking
users only, because of the embedded cussword?

bzs@bu-cs.BU.EDU (Barry Shein) (05/16/88)

The problem is that people are making an unnecessary distinction
between the data contained in a file name and the data contained
within the file it indicates.

A couple of years ago, in response to someone claiming that the 255
character file names of BSD are not useful I proposed a data base
design where all the data is kept in the FILE NAMES (after all, 255
bytes/record is reasonably generous under most data bases.)

The file contents would only hold other info like accessibility,
journaling, audits etc.

The argument was that this provides a lot of integrity via
write-through and all sorts of data base tools for dealing with this
CODASYL (basically, hierarchical/tree) data base. LS becomes a way to
dump a view, 'find' is a powerful search operator, mv, rm, touch etc
are data base fundamentals, wild-cards are very useful also. The
primary data recovery tool is fsck. The inode info (stat info)
provides accessibility and ownership discipline on every record,
sounds pretty good to me.

Hey, use your imagination.

	-Barry Shein, Boston University

root@conexch.UUCP (Larry Dighera) (05/16/88)

In article <3267@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>> Geez.  Haven't you heard of "pipes and filters"?  Pipe the output of "ls"
>> into a filter [...]
>
>	In this case, it's a little bit complicated since the filter would
>have to be syntax-sensitive.  Just doing "ls | cat -v" is no good because you
>want to escape \n in file names but not at the end of lines.  You probably
>want to escape spaces in file names but no place else, etc.  I'm sure it's
>possible to write some sort of sed command which takes:
>
>-rw-r--r--  1 roy           491 Apr 27 14:01 calendar
>-rw-r--r--  1 roy           817 May 11 13:15  foo bar
>
>and correctly figures out that the second file name is " foo bar" and only
>escapes those two spaces, but it would be ugly and difficult.  Try and make
>that filter general enough to deal with the variant formats of "ls", "ls -l",
>"ls -ls", "ls -lsi", and "ls -lsig" and it sure starts to look like building
>control-character escapes into ls isn't such a bad idea after all.
>-- 
>Roy Smith, System Administrator
>Public Health Research Institute
>455 First Avenue, New York, NY 10016
>{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

I must not understand the problem here, because this all looks very simple
to me.  

First, SCO Xenix's ls command supports the -b option which forces printing of
non-graphic characters in file names to be in the octal \ddd notation.

Secondly, ls * | od -c essentially does the same.

Thirdly, here's a filter that displays all characters, printable or not:


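/* see.c    04/30/1986  20:14:14    Steve Kirby  */
/* Copy files (or stdin) to stdout, showing control characters as ^X. */

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

static void see(FILE *iFP);

int main(int argc, char **argv)
{
  char *prog = argv[0];
  FILE *iFP;

  if (++argv, --argc)              /* file arguments: show each in turn */
    for ( ; argc; ++argv, --argc)
    {
      if ( ( iFP = fopen(*argv,"r") ) == NULL )
      {
        fprintf(stderr, "%s: can't open %s\n", prog, *argv);
        exit(1);
      }
      see(iFP);
      fclose(iFP);
    }
  else                             /* no arguments: act as a filter */
    see(stdin);

  return 0;
}

/* Print control characters as ^X; a newline shows as ^J and still ends
   the line.  XOR with 0100 (not + '@') so that DEL comes out as ^?. */
static void see(FILE *iFP)
{
  int c;

  while ( ( c = getc(iFP) ) != EOF )
    if ( iscntrl(c) )
      printf("^%c%s", c ^ 0100, c=='\n' ? "\n" : "" );
    else
      putchar(c);
}

/* eof see.c  */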
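
For comparison, the octal \ddd notation mentioned for ls -b can be sketched as a C function. This is not SCO's code: escape_name is a hypothetical name, and escaping the space and the backslash itself (so that the output is unambiguous) is an assumption made for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy `name` into `out`, writing every byte that
 * is not a graphic character as a backslash and three octal digits.
 * Space and backslash are escaped too (an assumption of this sketch).
 * Returns 0 on success, -1 if `out` is too small.
 */
int escape_name(const char *name, char *out, size_t outsz)
{
    size_t n = 0;
    const unsigned char *p;

    /* worst case: every byte becomes 4 characters, plus the NUL */
    if (outsz < 4 * strlen(name) + 1)
        return -1;

    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (isgraph(*p) && *p != '\\') {
            out[n++] = (char)*p;
        } else {
            sprintf(out + n, "\\%03o", (unsigned)*p);
            n += 4;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Because the function works on one name at a time, it does not have to guess where the name starts inside a line of "ls -l" output, which is the difficulty Roy Smith describes for a generic filter.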
 
Specifically what are you trying to do?  Can you give an example of a
command that you are trying to execute that prompted you to post your article?

Best Regards,
Larry Dighera


-- 
USPS: The Consultants' Exchange, PO Box 12100, Santa Ana, CA  92712
TELE: (714) 842-6348: BBS (N81); (714) 842-5851: Xenix guest account (E71)
UUCP: conexch Any ACU 2400 17148425851 ogin:-""-ogin:-""-ogin: nuucp
UUCP: ...!ucbvax!ucivax!icnvax!conexch!root || ...!trwrb!ucla-an!conexch!root

paul@csnz.nz (Paul Gillingwater) (05/16/88)

In article <972@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <24@csnz.nz>, paul@csnz.nz (Paul Gillingwater) writes:
>> Does that mean that a naive user can make a file with a <SPACE>
>> in the name?   e.g.   "John Doe" or "Job Cost" or other equally
>> "intuitively correct" but WRONG names...:-)
>
>Yes you can make such file names (I often do), and what's wrong with them?

OK guys, flame taken!  I have known since 1979 that *nix has had the
ability to store nearly any characters in file names, and I think it's
a good idea - depending upon the shell you're using.  Most of us out
here are _not_ using windows yet, and must live with [sh,csh,ksh].

The wrongness comes in how the shell treats requests from naive users
who don't know about quoted strings or the fact that John Doe is quite
different from john doe.  What I'd like to see is a more forgiving,
or perhaps a 'cleverer' filename parser, that can decide that John Doe
(without quotes) is similar to john doe (in .) - and ask the user to confirm
(unless they're running in a script) that that's what they meant - 
"Do what I mean, not what I say...".  SCO Xenix seems to go some of
the way towards this, but not enough - and the place for it is in
the SHELL, not the kernel.
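
One piece of such a parser, the case-insensitive lookup, might look like the following C sketch. Everything here is hypothetical (the name find_similar and the MATCH_* codes are invented), and it covers only the case-folding half of the idea: rejoining the unquoted words John and Doe into one name would have to happen before this step.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

enum { MATCH_ERROR = -1, MATCH_NONE, MATCH_EXACT, MATCH_SIMILAR };

/*
 * Hypothetical helper: look in `dir` for an entry named `typed`.
 * MATCH_EXACT   - the name exists as typed (`found` is left untouched)
 * MATCH_SIMILAR - no exact match, but an entry differs only in case;
 *                 the first one seen is copied to `found`, and the
 *                 caller should ask the user to confirm it
 * MATCH_NONE    - nothing matches, even ignoring case
 */
int find_similar(const char *dir, const char *typed,
                 char *found, size_t foundsz)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int result = MATCH_NONE;

    if (d == NULL)
        return MATCH_ERROR;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, typed) == 0) {
            result = MATCH_EXACT;          /* typed correctly after all */
            break;
        }
        if (result == MATCH_NONE
            && strcasecmp(e->d_name, typed) == 0
            && strlen(e->d_name) < foundsz) {
            strcpy(found, e->d_name);      /* remember first candidate */
            result = MATCH_SIMILAR;
        }
    }
    closedir(d);
    return result;
}
```

An interactive shell would prompt on MATCH_SIMILAR; in a script it would presumably treat that result the same as MATCH_NONE.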

-- 
Paul Gillingwater, Senior Consultant   Call my private BBS - Magic Tower,
Computer Sciences of New Zealand Ltd   NZ +64 4 753561 V21/V23 8N1 24hrs
P.O.Box 929, Wellington, NEW ZEALAND   Soon: V22/V22bis/Bell 103/Bell 212A
Vox: +64 4 846194, Fax: +64 4 843924  "Scott me up, Beamie!"-Lounge Suit Larry

karish@denali.stanford.edu (Chuck Karish) (05/17/88)

In article <339@conexch.UUCP> root@conexch.UUCP (Larry Dighera) writes:
>I must not understand the problem here, because this all looks very simple
>to me.  
>
>First, SCO Xenix's ls command supports the -b option which forces printing of
>non-graphic characters in file names to be in the octal \ddd notation.
>
>Secondly, ls * | od -c essentially does the same.

I use `od -c .'  The entries in my System V directories (AIX) are 16 bytes
wide, and just fit in one line of `od -c' output.
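
The 16 bytes are a 2-byte inode number followed by a 14-byte name, which is the classic System V directory layout. A sketch in C of decoding such entries from a raw directory image follows; the struct and function names are illustrative, the inode number is assumed to be in host byte order, and the code works on a buffer because many current systems refuse to read() a directory at all.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SYSV_DIRSIZ 14            /* DIRSIZ in the historical headers */

struct sysv_dirent {              /* illustrative name; 2 + 14 = 16 bytes */
    uint16_t d_ino;               /* inode number; 0 marks a free slot */
    char     d_name[SYSV_DIRSIZ]; /* NUL-padded, no NUL if 14 chars long */
};

/*
 * Hypothetical helper: decode entry number `i` of a raw directory image.
 * `name` must have room for SYSV_DIRSIZ + 1 bytes.
 * Returns the inode number of the entry.
 */
unsigned decode_entry(const unsigned char *raw, size_t i, char *name)
{
    struct sysv_dirent ent;

    memcpy(&ent, raw + i * sizeof ent, sizeof ent);
    memcpy(name, ent.d_name, SYSV_DIRSIZ);
    name[SYSV_DIRSIZ] = '\0';     /* a 14-character name has no terminator */
    return ent.d_ino;
}
```

Each line of `od -c' output then corresponds to one such entry, with the two inode bytes first and any odd characters in the name shown as escapes.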


Chuck Karish		ARPA:	karish@denali.stanford.edu
			BITNET:	karish%denali@forsythe.stanford.edu
			UUCP:	{decvax,hplabs!hpda}!mindcrf!karish
			paper:	1825 California St. #5

rbj@icst-cmr.arpa (Root Boy Jim) (05/18/88)

   From: Barry Shein <bzs@bu-cs.bu.edu>

   [Using the file system as a database]

   The primary data recovery tool is fsck.

Fsck would rather clear zero-length entries than relink them. Better
put *some* data in it.

   Hey, use your imagination.

I tried, but I couldn't find the manpage for it :-)

	   -Barry Shein, Boston University

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

peter@ficc.UUCP (Peter da Silva) (05/19/88)

In article <56@lazlo.UUCP>, ccs@lazlo.UUCP writes:
> In article <326@augean.OZ> idall@augean.OZ (Ian Dall) writes:
> :If done in the terminal driver, new stty options could be added to say
> :what characters are to be considered unprintable.

You would also have to provide a reverse mapping (a-la lcase) that let
you do this:

% ls foo*
foo^G^?bar
% rm foo^G^?bar

and have it work right.
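
A sketch in C of what that reverse mapping would have to do, wherever it ends up living. The function name uncaret is hypothetical, and the sketch ignores the obvious ambiguity: a file whose name really contains a '^' followed by a letter could no longer be typed without some further escape.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn caret notation back into raw bytes, so that
 * "foo^G^?bar" becomes f, o, o, BEL, DEL, b, a, r.
 * "^A" through "^_" map to the control codes 1 through 31 and "^?" to
 * DEL; a '^' followed by anything else is copied unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
int uncaret(const char *in, char *out, size_t outsz)
{
    if (strlen(in) + 1 > outsz)    /* output is never longer than input */
        return -1;

    while (*in != '\0') {
        if (in[0] == '^' && in[1] == '?') {
            *out++ = 0177;                  /* DEL */
            in += 2;
        } else if (in[0] == '^' && in[1] >= 'A' && in[1] <= '_') {
            *out++ = (char)(in[1] - '@');   /* ^A..^_ become 1..31 */
            in += 2;
        } else {
            *out++ = *in++;                 /* ordinary character */
        }
    }
    *out = '\0';
    return 0;
}
```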

> Many things
> would break if you put the stuff in the kernel tty driver, let's leave
> it in the "ls" or "cat" command...

The easy solution is to have this stuff active on line discipline 1 or
something (if your system is already using 1). The best way to do this
in general would be to have a couple of mapping tables (yes, in the tty
driver!) that allowed you to map arbitrary strings. I believe Microport
already does this...

How's this for a fantasy...

% stty -map lcase # clear lcase/ucase default map
% stty map cntrl # set control character map
% stty mapi '^[[K' '^U' # map CEOL to kill on input
% stty mapo '^U' '^M^[[K' # map kill character to CR plus erase line
% stty mapo '^H' '^H ^H' # map erase character to BS-space-BS

I'm sure we can beat the TOPS-20 terminal driver if we REALLY try :->.
-- 
-- Peter da Silva, Ferranti International Controls Corporation.
-- Phone: 713-274-5180. Remote UUCP: uunet!nuchat!sugar!peter.