[comp.unix.wizards] FileNames with the high bit set.

naim@eecs.nwu.edu (Naim Abdullah) (04/09/88)

On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
supplied to them has a character with the high order bit set.

Why is this ? Has this behaviour been added by Berkeley Unix or has
it "always" been there in Unix ? Is it because sh(1) uses the parity
bit for its own purposes and the kernel does not want to create
files that the shell might not be able to handle in this manner ?
(or is it, that sh(1) knows about this kernel idiosyncrasy and exploits
this behaviour for its own advantage..). In other words, is the
kernel behaviour driven by the shell implementation or is the
shell implementation driven by the kernel behaviour? (is this
a chicken and egg question ?)

In any case, this seems like an arbitrary restriction. I can
imagine applications which might want to create files that have
names with arbitrary bytes in them (if you used a hashing function
on some key to come up with a filename, you can get an "invalid"
pathname).

		      Naim Abdullah
		      Dept. of EECS,
		      Northwestern University

		      Internet: naim@eecs.nwu.edu
		      Uucp: {ihnp4, chinet, gargoyle}!nucsrl!naim
  

gwyn@brl-smoke.ARPA (Doug Gwyn) (04/11/88)

In article <8120010@eecs.nwu.edu> naim@eecs.nwu.edu (Naim Abdullah) writes:
>... open(2) and creat(2) return EINVAL if the pathname
>supplied to them has a character with the high order bit set.

I don't recall exactly which release of 4BSD introduced this "yet
another better idea", somewhere around 4.1cBSD I think.  Yes, it
is a bogus feature.  Note that the latest Bourne shells from AT&T
no longer mess around with the high bits of characters in names.
(The latest releases of the Korn shell also have this fixed.)  I
think vendors have finally realized that 7-bit ASCII is parochial.

mike@turing.UNM.EDU (Michael I. Bushnell) (04/11/88)

Disallowing the high order bit in filenames was done in 4BSD.  I think
the reason had something to do with printability--a desire to limit
the filesystem namespace to ASCII codes.

                N u m q u a m   G l o r i a   D e o 

			Michael I. Bushnell
			HASA - "A" division
14308 Skyline Rd NE				Computer Science Dept.
Albuquerque, NM  87123		OR		Farris Engineering Ctr.
	OR					University of New Mexico
mike@turing.unm.edu				Albuquerque, NM  87131
{ucbvax,gatech}!unmvax!turing.unm.edu!mike

guy@gorodish.Sun.COM (Guy Harris) (04/11/88)

> On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
> running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
> supplied to them has a character with the high order bit set.
> 
> Why is this ? Has this behaviour been added by Berkeley Unix or has
> it "always" been there in Unix ?

It was added in 4.2BSD.

> Is it because sh(1) uses the parity bit for its own purposes and the
> kernel does not want to create files that the shell might not be able
> to handle in this manner ?

In addition to pre-S5R3 "sh", the C shell also uses the parity bit for this.
The 8th bit stuff was probably thrown in for precisely the reason you list.
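
A minimal sketch of why a shell that borrows the parity bit cannot name 8-bit files. It assumes the shell marks quoted characters by setting the high bit internally and clears that bit before handing the word to the kernel; QUOTE_BIT, mark_quoted and strip_quotes are illustrative names, not taken from any actual shell source.

```c
#include <assert.h>
#include <stddef.h>

#define QUOTE_BIT 0x80  /* assumed marker: high bit flags a quoted character */

/* Mark every byte of a word as quoted by setting the high bit in place. */
void mark_quoted(unsigned char *s, size_t n)
{
    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        s[i] |= QUOTE_BIT;
}

/* Clear the marker again before the word is used as a pathname.
 * Any byte that legitimately had the high bit set is damaged here. */
void strip_quotes(unsigned char *s, size_t n)
{
    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        s[i] &= 0x7F;
}
```

Under this scheme a name byte of 0xAE comes out of strip_quotes as 0x2E ('.'), so the shell would end up asking the kernel about a different name than the one on disk.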

> In any case, this seems like an arbitrary restriction.

It is.

> I can imagine applications which might want to create files that have
> names with arbitrary bytes in them (if you used a hashing function
> on some key to come up with a filename, you can get an "invalid"
> pathname).

Hell, I have a symbolic link to "/vmunix" on my machine named "/UNIX(r)", where
"(r)" refers to the ISO Latin #1 "registered trademark" character, which has
the hexadecimal code 0xAE.  SunOS 4.0 removed the restriction in question; it
uses the S5R3 Bourne shell as its Bourne shell, and that shell doesn't have
problems with file names containing 8-bit characters, so if you have files like
that lying around "rm -i *" (or "rm -i .*" if the file name begins with ".")
can clean them up from the Bourne shell.  The 4.0 C shell still can't handle
filenames such as that; this is a restriction we currently plan to lift in a
future release.

Creating file names containing arbitrary character codes is probably not a good
idea; if you have an OS and file system that allow you to create very long file
names, you should use that capability.  The reason we removed the restriction
was not so that you could create files with binary names; it was as a first
step towards supporting larger character sets than ASCII, such as the ISO 8859
character sets and the various EUC-derived Asian character sets, in file
names.
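
One way to follow the "use long file names instead" advice for the hashed-key case is to spell the hash out in hexadecimal, so that every byte of the name is a plain ASCII digit or letter. The sketch below assumes the hash is available as a byte array; hex_name is a hypothetical helper, not an existing library routine.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n raw bytes as 2*n lowercase hex digits plus a terminating NUL.
 * Returns 0 on success, -1 if the output buffer is too small. */
int hex_name(const unsigned char *raw, size_t n, char *out, size_t outsz)
{
    static const char digits[] = "0123456789abcdef";

    assert(raw != NULL && out != NULL);
    if (outsz < 2 * n + 1)
        return -1;
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[raw[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[raw[i] & 0x0F];  /* low nibble */
    }
    out[2 * n] = '\0';
    return 0;
}
```

The price is a name twice as long as the hash, which is only practical on a file system that allows long names.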

(BTW, you *can't* create files that have names with truly arbitrary bytes in
them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
names in a *path* name, and '\0' terminates a path name.)
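
The rule in the parenthetical above can be sketched as a check on a single file name (one component of a path). This is an illustration only, not kernel code; valid_component and its seven_bit_only flag are hypothetical, the flag standing in for the 4.2BSD-style restriction under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the len bytes at name could serve as one file name
 * (a single path component), 0 otherwise.
 * If seven_bit_only is nonzero, bytes with the high bit set are
 * also rejected, imitating the restriction discussed in this thread. */
int valid_component(const char *name, size_t len, int seven_bit_only)
{
    assert(name != NULL);
    if (len == 0)
        return 0;                       /* empty names are not usable */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c == '/' || c == '\0')
            return 0;                   /* separator and terminator */
        if (seven_bit_only && (c & 0x80))
            return 0;                   /* high-order bit set */
    }
    return 1;
}
```

With seven_bit_only set to 0 a name such as "UNIX" followed by the byte 0xAE passes; with it set to 1 the same name is refused.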

bzs@bu-cs.BU.EDU (Barry Shein) (04/11/88)

From Doug Gwyn...
>In article <8120010@eecs.nwu.edu> naim@eecs.nwu.edu (Naim Abdullah) writes:
>>... open(2) and creat(2) return EINVAL if the pathname
>>supplied to them has a character with the high order bit set.
>
>I don't recall exactly which release of 4BSD introduced this "yet
>another better idea", somewhere around 4.1cBSD I think.  Yes, it
>is a bogus feature.  Note that the latest Bourne shells from AT&T
>no longer mess around with the high bits of characters in names.
>(The latest releases of the Korn shell also have this fixed.)  I
>think vendors have finally realized that 7-bit ASCII is parochial.

Yes, I agree it's bogus and interferes severely with some
internationalization schemes. I believe vendors are starting to remove
it from their 4BSD based systems.

Didn't that start because you couldn't rm or otherwise name 8-bit
files from the shells which eventually proved a nuisance? I believe
that some earlier versions of Emacs used this to create backup files
for just this reason (I remember groan comments in CCA Emacs about
this going away near some ifdef's.)

	-Barry Shein, Boston University

daveb@geac.UUCP (David Collier-Brown) (04/11/88)

In article <8120010@eecs.nwu.edu> naim@eecs.nwu.edu (Naim Abdullah) writes:
| On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
| running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
| supplied to them has a character with the high order bit set.
| 
| Why is this ? Has this behaviour been added by Berkeley Unix or has
| it "always" been there in Unix ? Is it because sh(1) uses the parity
| bit for its own purposes and the kernel does not want to create
| files that the shell might not be able to handle in this manner ?
| (or is it, that sh(1) knows about this kernel idiosyncrasy and exploits
| this behaviour for its own advantage..).

  I suspect it's an accident, and know it can be removed: we're using
an "8-bit clean" environment here, with the exception of vi. Neither
the shell(s) nor the kernel cares any more.  Various programs have
problems, though...
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

guy@gorodish.Sun.COM (Guy Harris) (04/11/88)

> Disallowing the high order bit in filenames was done in 4BSD.  I think
> the reason had something to do with printability--a desire to limit
> the filesystem namespace to ASCII codes.

"printable" != "ASCII".  ^A is ASCII (it's the SOH control character), but it's
not printable; most terminals just ignore it.  0xC4 is printable on some
terminals (e.g. DEC VT200 series, and workstations with a character in that
position in their fonts), being "capital-A-with-a-diaeresis" in ISO Latin #1,
but it's not ASCII.

Limiting the filesystem namespace to ASCII codes doesn't guarantee that all
file names will be printable, and guarantees that some names that are printable
on some machines are disallowed.
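
The distinction can be made concrete with a small classifier. This is a sketch that looks only at the code value and deliberately ignores locale; the enum and function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum char_class { ASCII_PRINTABLE, ASCII_CONTROL, NON_ASCII };

/* Classify one byte by its code value alone. */
enum char_class classify(unsigned char c)
{
    if (c >= 0x80)
        return NON_ASCII;        /* e.g. 0xC4: outside 7-bit ASCII */
    if (c < 0x20 || c == 0x7F)
        return ASCII_CONTROL;    /* e.g. 0x01 (SOH, ^A) and DEL */
    return ASCII_PRINTABLE;      /* space through tilde */
}

/* Count the bytes of a name that fall into the class want. */
size_t count_class(const unsigned char *s, size_t n, enum char_class want)
{
    size_t count = 0;

    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        if (classify(s[i]) == want)
            count++;
    return count;
}
```

A 7-bit-only rule rejects the NON_ASCII class but lets ASCII_CONTROL through, which is the mismatch described above.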

wesommer@athena.mit.edu (William E. Sommerfeld) (04/12/88)

In article <48993@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>(BTW, you *can't* create files that have names with truly arbitrary bytes in
>them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
>names in a *path* name, and '\0' terminates a path name.)

Yes, but...

If you're running NFS, the NFS _server_ (at least the one we're
running here) will let you put `/' in filenames, since it works at the
inode & filename level, not the pathname level.

To get it to do this, you have to write a user-level program which
sends RPC requests directly to the NFS server.

Of course, you then have to write another one to get rid of it, or
resort to using clri.

				- Bill Sommerfeld
				wesommer@athena.mit.edu

guy@gorodish.Sun.COM (Guy Harris) (04/12/88)

> >(BTW, you *can't* create files that have names with truly arbitrary bytes in
> >them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
> >names in a *path* name, and '\0' terminates a path name.)
> 
> Yes, but...
> 
> If you're running NFS, the NFS _server_ (at least the one we're
> running here) will let you put `/' in filenames, since it works at the
> inode & filename level, not the pathname level.
> 
> To get it to do this, you have to write a user-level program which
> sends RPC requests directly to the NFS server.
> 
> Of course, you then have to write another one to get rid of it, or
> resort to using clri.

That's obviously a bug, not a feature.  You can't create files containing "/"
by using the official UNIX mechanisms for creating files.

david@elroy.Jpl.Nasa.Gov (David Robinson) (04/12/88)

In article <49108@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
< > >(BTW, you *can't* create files that have names with truly arbitrary bytes in
< > >them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
< > >names in a *path* name, and '\0' terminates a path name.)
< > 
< > If you're running NFS, the NFS _server_ (at least the one we're
< > running here) will let you put `/' in filenames, since it works at the
< > inode & filename level, not the pathname level.
< > 
< That's obviously a bug, not a feature.  You can't create files containing "/"
< by using the official UNIX mechanisms for creating files.

What if the NFS server is not a *Unix* machine?  What if the client
is not a Unix machine?  There is no NFS error to indicate an illegal
file name character!

-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov	  ARPA
				{cit-vax,ames}!elroy!david	  UUCP
Disclaimer: No one listens to me anyway!

guy@gorodish.Sun.COM (Guy Harris) (04/13/88)

> < That's obviously a bug, not a feature.  You can't create files containing "/"
> < by using the official UNIX mechanisms for creating files.
> 
> What if the NFS server is not a *Unix* machine?

Then if the native file system supports "/" in file names, the server should
allow them.  UNIX clients will obviously not be able to get at such files,
unless the client code does some sort of file-name mapping, just as MS-DOS
clients have to do some sort of mapping to handle file names such as
"FoObAr_and_a_bunch_of_other_stuff.4.65.13", and VMS clients would presumably
have to do some sort of mapping to handle file names such as "[[[[]]]]..dir",
etc.

> What if the client is not a Unix machine?

If the client is not a UNIX machine, and the server is, the client just has to
lose or do file-name mapping if it wants to handle file names containing
slashes.  If the client is not a UNIX machine, and the server isn't, and the
server's native file system can handle "/" in file names, you win.  If it's not
a UNIX system, but it can't handle "/" in file names, you lose.

> There is no NFS error to indicate an illegal file name character!

Well, presumably the NFS servers written for VMS have stolen some other error
code to use to complain about attempts to e.g. create files with names
containing characters not considered kosher in VMS (I don't remember whether
ODS-2 directories contain file names in ASCII or RADIX-50; if the latter, there
are characters that are not only non-kosher but not representable).  Also,
if 4.3BSD servers reject file names containing characters with the 8th bit set,
they also have to choose some error code, since the error that the file system
code returns for this is EINVAL, which has no direct NFS equivalent.

It is not ideal that a server has to steal another error code for this.  Future
versions of the NFS protocol should probably include such an error code.

rbj@icst-cmr.arpa (Root Boy Jim) (04/15/88)

   From: Barry Shein <bzs@bu-cs.BU.EDU>

   > I think vendors have finally realized that 7-bit ASCII is parochial.

   Yes, I agree it's bogus and interferes severely with some
   internationalization schemes. I believe vendors are starting to remove
   it from their 4BSD based systems.

Well, I'm probably gonna really get flamed for this, but here goes...

Um, don't you guys realize that if you implement international
character sets people are gonna start USING them? That's right, the
real threat to American security is not the commies, not Japanese
technology, not cheap Korean or Yugoslavian automobiles, it's programs
in another language. You wanna try hacking hack in Dutch? I say NO!
English is already the second language in the world. After all, they're
used to learning foreign languages and using them, we're not. We gave
away the hydrogen bomb, let's not give away the whole store.

	   -Barry Shein, Boston University

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	Uh-oh!!  I forgot to submit to COMPULSORY URINALYSIS!

mouse@mcgill-vision.UUCP (der Mouse) (04/23/88)

In article <4540@bloom-beacon.MIT.EDU>, wesommer@athena.mit.edu (William E. Sommerfeld) writes:
> In article <48993@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> (BTW, you *can't* create files that have names with truly arbitrary
>> bytes in them; '/' and '\0' are not valid in UNIX file names [...].)

> Yes, but...

> If you're running NFS, the NFS _server_ (at least the one we're
> running here) will let you put `/' in filenames, since it works at
> the inode & filename level, not the pathname level.

> To get it to do this, you have to write a user-level program which
> sends RPC requests directly to the NFS server.

...and on a non-NFS system you can write a program which scribbles on
the raw disk and creates directory entries with slashes in them.  It's
fairly closely analogous.

And about equally useful (or, rather, equally useless).

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mangler@cit-vax.Caltech.Edu (Don Speck) (04/24/88)

One of the beautiful things about the filename syntax of older
Unixes is that there was no such thing as an illegal filename.
Any string had the potential to be a filename, because namei
did something more-or-less sensible with any pattern of slashes
even when there were 0 or >14 characters between them.  Quite
a welcome relief from OSes with strict punctuation rules, e.g.
    foovax::[000000.mydir.subdir]file.ext;32767
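
A rough sketch of the forgiving behaviour described above, assuming that runs of slashes are treated as a single separator and that characters beyond the 14-character limit are silently dropped. This is not the actual namei code; next_component is a hypothetical helper written for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DIRSIZ 14   /* name length limit of the old file system */

/* Copy the next component of *pathp into comp, skipping any leading
 * slashes and silently dropping characters past DIRSIZ.
 * Advances *pathp past the component.
 * Returns the stored length, or 0 when the path is exhausted. */
size_t next_component(const char **pathp, char comp[DIRSIZ + 1])
{
    const char *p;
    size_t n = 0;

    assert(pathp != NULL && *pathp != NULL);
    p = *pathp;
    while (*p == '/')                   /* "0 characters between them" */
        p++;
    while (*p != '\0' && *p != '/') {
        if (n < DIRSIZ)                 /* ">14 characters": truncate */
            comp[n++] = *p;
        p++;
    }
    comp[n] = '\0';
    *pathp = p;
    return n;
}
```

Called repeatedly on "//usr///a_very_long_file_name/x", this yields "usr", then "a_very_long_fi", then "x", and never reports a syntax error.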

Alas, this changed in 4.2 BSD, and some filenames are now illegal.
Now some propose to add even more restrictions.  It's contagious
and pretty soon we'll be back to all those punctuation rules.

As the TCP people say, "be liberal in what input you accept".

Don Speck   speck@vlsi.caltech.edu  {amdahl,ames!elroy}!cit-vax!speck

daveb@geac.UUCP (David Collier-Brown) (04/25/88)

In article <6258@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>One of the beautiful things about the filename syntax of older
>Unixes is that there was no such thing as an illegal filename.
>
>As the TCP people say, "be liberal in what input you accept".

  The "filenames with the high bit set" problem was seen and dealt
with, once upon a time, by both Multics and Unix, by permitting the
mv-equivalent command to accept **any** string as a "from" name, but
only a "legal" string as a "to" name.  This tends to decrease the
difficulty of switching to an 8- (or 9-)bit character set, as the
charset-sensitive code is rather centralized.

  (In Unix the problem was both easier and harder: almost any
character is legal in a "to" name, and the shell interferes in
typing some characters directly.  I confess I do not remember what
happens if the "from" name contains a slash or null.  My reading of
the kernel implementation implies that it REALLY "can't happen": you
get an invalid path to the file.)

--dave (but what if a filesystem does it "wrong"?) c-b
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.