naim@eecs.nwu.edu (Naim Abdullah) (04/09/88)
On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60 running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname supplied to them has a character with the high order bit set.

Why is this? Has this behaviour been added by Berkeley Unix or has it "always" been there in Unix? Is it because sh(1) uses the parity bit for its own purposes and the kernel does not want to create files that the shell might not be able to handle in this manner? (Or is it that sh(1) knows about this kernel idiosyncrasy and exploits this behaviour for its own advantage?) In other words, is the kernel behaviour driven by the shell implementation or is the shell implementation driven by the kernel behaviour? (Is this a chicken and egg question?)

In any case, this seems like an arbitrary restriction. I can imagine applications which might want to create files that have names with arbitrary bytes in them (if you used a hashing function on some key to come up with a filename, you can get an "invalid" pathname).

Naim Abdullah
Dept. of EECS, Northwestern University
Internet: naim@eecs.nwu.edu
Uucp: {ihnp4, chinet, gargoyle}!nucsrl!naim
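For illustration, the behaviour described in this post can be modelled roughly as below. This is only a sketch in Python, not the 4.xBSD kernel code; the function name `check_pathname_7bit` is hypothetical, and the only thing taken from the post is that a pathname byte with the high-order bit set yields EINVAL.

```python
import errno


def check_pathname_7bit(pathname: bytes) -> int:
    """Return 0 if every byte is 7-bit, otherwise errno.EINVAL.

    Hypothetical model of the restriction described in the post;
    it is not taken from any kernel source.
    """
    for byte in pathname:
        if byte & 0x80:  # high-order (8th) bit is set
            return errno.EINVAL
    return 0


# A plain ASCII pathname passes; one containing byte 0xE9 is rejected.
print(check_pathname_7bit(b"/tmp/plain"))
print(check_pathname_7bit(b"/tmp/caf\xe9") == errno.EINVAL)
```

A name produced by a raw hash digest would trip this check whenever any digest byte is 0x80 or above, which is the situation the post is worried about.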
gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/11/88)
In article <8120010@eecs.nwu.edu> naim@eecs.nwu.edu (Naim Abdullah) writes:
>... open(2) and creat(2) return EINVAL if the pathname
>supplied to them has a character with the high order bit set.

I don't recall exactly which release of 4BSD introduced this "yet another better idea", somewhere around 4.1cBSD I think. Yes, it is a bogus feature.

Note that the latest Bourne shells from AT&T no longer mess around with the high bits of characters in names. (The latest releases of the Korn shell also have this fixed.) I think vendors have finally realized that 7-bit ASCII is parochial.
mike@turing.UNM.EDU (Michael I. Bushnell) (04/11/88)
Disallowing the high order bit in filenames was done in 4BSD. I think the reason had something to do with printability--a desire to limit the filesystem namespace to ASCII codes.

N u m q u a m  G l o r i a  D e o
Michael I. Bushnell
HASA - "A" division
14308 Skyline Rd NE, Albuquerque, NM 87123
OR
Computer Science Dept., Farris Engineering Ctr., University of New Mexico, Albuquerque, NM 87131
OR
mike@turing.unm.edu
{ucbvax,gatech}!unmvax!turing.unm.edu!mike
guy@gorodish.Sun.COM (Guy Harris) (04/11/88)
> On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
> running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
> supplied to them has a character with the high order bit set.
>
> Why is this ? Has this behaviour been added by Berkeley Unix or has
> it "always" been there in Unix ?

It was added in 4.2BSD.

> Is it because sh(1) uses the parity bit for its own purposes and the
> kernel does not want to create files that the shell might not be able
> to handle in this manner ?

In addition to pre-S5R3 "sh", the C shell also uses the parity bit for this. The 8th bit stuff was probably thrown in for precisely the reason you list.

> In any case, this seems like an arbitrary restriction.

It is.

> I can imagine applications which might want to create files that have
> names with arbitrary bytes in them (if you used a hashing function
> on some key to come up with a filename, you can get an "invalid"
> pathname).

Hell, I have a symbolic link to "/vmunix" on my machine named "/UNIX(r)", where "(r)" refers to the ISO Latin #1 "registered trademark" character, which has the hexadecimal code 0xAE. SunOS 4.0 removed the restriction in question; it uses the S5R3 Bourne shell as its Bourne shell, and that shell doesn't have problems with file names containing 8-bit characters, so if you have files like that lying around "rm -i *" (or "rm -i .*" if the file name begins with ".") can clean them up from the Bourne shell. The 4.0 C shell still can't handle filenames such as that; this is a restriction we currently plan to lift in a future release.

Creating file names containing arbitrary character codes is probably not a good idea; if you have an OS and file system that allow you to create very long file names, you should use that capability. The reason we removed the restriction was not so that you could create files with binary names; it was as a first step towards supporting larger character sets than ASCII, such as the ISO 8859 character sets and the various EUC-derived Asian character sets, in file names.

(BTW, you *can't* create files that have names with truly arbitrary bytes in them; '/' and '\0' are not valid in UNIX file names - '/' separates *file* names in a *path* name, and '\0' terminates a path name.)
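As a rough illustration of the two points above (the '/' and '\0' rule, and using a longer name instead of raw binary bytes), here is a small Python sketch. The helper names `is_valid_component` and `hashed_file_name` are hypothetical, and SHA-256 is only a stand-in for whatever hashing function an application might use; nothing here comes from a particular system's source.

```python
import hashlib


def is_valid_component(name: bytes) -> bool:
    """True if `name` could be a single file name: non-empty, no '/' and no NUL."""
    return len(name) > 0 and b"/" not in name and b"\0" not in name


def hashed_file_name(key: bytes) -> str:
    """Derive a file name from a key as hex text rather than raw digest bytes.

    The hex form is twice as long as the raw digest but uses only
    the characters 0-9 and a-f, so it is 7-bit clean and contains
    neither '/' nor NUL.
    """
    return hashlib.sha256(key).hexdigest()


name = hashed_file_name(b"some key")
print(len(name), is_valid_component(name.encode("ascii")))
print(is_valid_component(b"UNIX\xae"), is_valid_component(b"a/b"))
```

The trade-off is the one suggested in the post: spend name length to avoid arbitrary character codes, which only works where long file names are allowed.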
bzs@bu-cs.BU.EDU (Barry Shein) (04/11/88)
From Doug Gwyn...
>In article <8120010@eecs.nwu.edu> naim@eecs.nwu.edu (Naim Abdullah) writes:
>>... open(2) and creat(2) return EINVAL if the pathname
>>supplied to them has a character with the high order bit set.
>
>I don't recall exactly which release of 4BSD introduced this "yet
>another better idea", somewhere around 4.1cBSD I think. Yes, it
>is a bogus feature. Note that the latest Bourne shells from AT&T
>no longer mess around with the high bits of characters in names.
>(The latest releases of the Korn shell also have this fixed.) I
>think vendors have finally realized that 7-bit ASCII is parochial.

Yes, I agree it's bogus and interferes severely with some internationalization schemes. I believe vendors are starting to remove it from their 4BSD based systems.

Didn't that start because you couldn't rm or otherwise name 8-bit files from the shells which eventually proved a nuisance?

I believe that some earlier versions of Emacs used this to create backup files for just this reason (I remember groan comments in CCA Emacs about this going away near some ifdef's.)

-Barry Shein, Boston University
daveb@geac.UUCP (David Collier-Brown) (04/11/88)
In article <8120010@eecs.nwu.edu> naim@eecs.nwu.edu (Naim Abdullah) writes:
| On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
| running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
| supplied to them has a character with the high order bit set.
|
| Why is this ? Has this behaviour been added by Berkeley Unix or has
| it "always" been there in Unix ? Is it because sh(1) uses the parity
| bit for its own purposes and the kernel does not want to create
| files that the shell might not be able to handle in this manner ?
| (or is it, that sh(1) knows about this kernel idiosyncrasy and exploits
| this behaviour for its own advantage..).

I suspect it's an accident, and know it can be removed: we're using an "8-bit clean" environment here, with the exception of vi. Neither the shell(s) nor the kernel cares any more. Various programs have problems, though...

--
David Collier-Brown. {mnetor yunexus utgpu}!geac!daveb
Geac Computers International Inc., | Computer Science loses its
350 Steelcase Road,Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
guy@gorodish.Sun.COM (Guy Harris) (04/11/88)
> Disallowing the high order bit in filenames was done in 4BSD. I think
> the reason had something to do with printability--a desire to limit
> the filesystem namespace to ASCII codes.

"printable" != "ASCII". ^A is ASCII (it's the SOH control character), but it's not printable; most terminals just ignore it. 0xC4 is printable on some terminals (e.g. DEC VT200 series, and workstations with a character in that position in their fonts), being "capital-A-with-a-diaeresis" in ISO Latin #1, but it's not ASCII.

Limiting the filesystem namespace to ASCII codes doesn't guarantee that all file names will be printable, and guarantees that some names that are printable on some machines are disallowed.
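The distinction can be checked mechanically. The sketch below is in Python and uses hypothetical helper names; note that it relies on Python's Unicode notion of "printable" applied to ISO Latin #1, which is an assumption standing in for what any particular terminal can actually display.

```python
def is_ascii(byte: int) -> bool:
    """True if the byte value lies in the 7-bit ASCII range."""
    return 0 <= byte < 0x80


def is_printable_latin1(byte: int) -> bool:
    """True if the byte, read as ISO Latin #1, is a printable character.

    Uses str.isprintable(), i.e. Unicode character categories,
    not the capabilities of a real terminal.
    """
    return bytes([byte]).decode("latin-1").isprintable()


for value in (0x01, 0x41, 0xC4):
    print(hex(value), is_ascii(value), is_printable_latin1(value))
```

Here 0x01 (^A, SOH) is ASCII but not printable, while 0xC4 is printable in Latin #1 but not ASCII, so neither property implies the other.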
wesommer@athena.mit.edu (William E. Sommerfeld) (04/12/88)
In article <48993@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>(BTW, you *can't* create files that have names with truly arbitrary bytes in
>them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
>names in a *path* name, and '\0' terminates a path name.)

Yes, but...

If you're running NFS, the NFS _server_ (at least the one we're running here) will let you put `/' in filenames, since it works at the inode & filename level, not the pathname level.

To get it to do this, you have to write a user-level program which sends RPC requests directly to the NFS server.

Of course, you then have to write another one to get rid of it, or resort to using clri.

- Bill Sommerfeld
wesommer@athena.mit.edu
guy@gorodish.Sun.COM (Guy Harris) (04/12/88)
> >(BTW, you *can't* create files that have names with truly arbitrary bytes in
> >them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
> >names in a *path* name, and '\0' terminates a path name.)
>
> Yes, but...
>
> If you're running NFS, the NFS _server_ (at least the one we're
> running here) will let you put `/' in filenames, since it works at the
> inode & filename level, not the pathname level.
>
> To get it to do this, you have to write a user-level program which
> sends RPC requests directly to the NFS server.
>
> Of course, you then have to write another one to get rid of it, or
> resort to using clri.

That's obviously a bug, not a feature. You can't create files containing "/" by using the official UNIX mechanisms for creating files.
david@elroy.Jpl.Nasa.Gov (David Robinson) (04/12/88)
In article <49108@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
< > >(BTW, you *can't* create files that have names with truly arbitrary bytes in
< > >them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
< > >names in a *path* name, and '\0' terminates a path name.)
< >
< > If you're running NFS, the NFS _server_ (at least the one we're
< > running here) will let you put `/' in filenames, since it works at the
< > inode & filename level, not the pathname level.
< >
< That's obviously a bug, not a feature. You can't create files containing "/"
< by using the official UNIX mechanisms for creating files.
What if the NFS server is not a *Unix* machine? What if the client
is not a Unix machine? There is no NFS error to indicate an illegal
file name character!
--
David Robinson elroy!david@csvax.caltech.edu ARPA
david@elroy.jpl.nasa.gov ARPA
{cit-vax,ames}!elroy!david UUCP
Disclaimer: No one listens to me anyway!
guy@gorodish.Sun.COM (Guy Harris) (04/13/88)
> < That's obviously a bug, not a feature. You can't create files containing "/"
> < by using the official UNIX mechanisms for creating files.
>
> What if the NFS server is not a *Unix* machine?

Then if the native file system supports "/" in file names, the server should allow them. UNIX clients will obviously not be able to get at such files, unless the client code does some sort of file-name mapping, just as MS-DOS clients have to do some sort of mapping to handle file names such as "FoObAr_and_a_bunch_of_other_stuff.4.65.13", and VMS clients would presumably have to do some sort of mapping to handle file names such as "[[[[]]]]..dir", etc.

> What if the client is not a Unix machine?

If the client is not a UNIX machine, and the server is, the client just has to lose or do file-name mapping if it wants to handle file names containing slashes. If the client is not a UNIX machine, and the server isn't, and the server's native file system can handle "/" in file names, you win. If the server is not a UNIX system, but it can't handle "/" in file names, you lose.

> There is no NFS error to indicate an illegal file name character!

Well, presumably the NFS servers written for VMS have stolen some other error code to use to complain about attempts to e.g. create files with names containing characters not considered kosher in VMS (I don't remember whether ODS-2 directories contain file names in ASCII or RADIX-50; if the latter, there are characters that are not only non-kosher but not representable). Also, if 4.3BSD servers reject file names containing characters with the 8th bit set, they also have to choose some error code, since the error that the file system code returns for this is EINVAL, which has no direct NFS equivalent.

It is not ideal that a server has to steal another error code for this. Future versions of the NFS protocol should probably include such an error code.
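The "steal another error code" problem can be sketched as a lookup with a fallback. The Python below is illustrative only: the status numbers are those of NFS version 2 as later documented in RFC 1094 (a subset), the names `ERRNO_TO_NFS` and `nfs_status_for` are hypothetical, and the choice of NFSERR_IO as the borrowed code is purely an assumption, since the thread does not say which code any real server uses.

```python
import errno

# Subset of NFS version 2 status codes (values as listed in RFC 1094).
NFS_OK = 0
NFSERR_PERM = 1
NFSERR_NOENT = 2
NFSERR_IO = 5
NFSERR_ACCES = 13
NFSERR_EXIST = 17
NFSERR_NAMETOOLONG = 63

# Local errors that have a direct NFS equivalent.
ERRNO_TO_NFS = {
    0: NFS_OK,
    errno.EPERM: NFSERR_PERM,
    errno.ENOENT: NFSERR_NOENT,
    errno.EIO: NFSERR_IO,
    errno.EACCES: NFSERR_ACCES,
    errno.EEXIST: NFSERR_EXIST,
    errno.ENAMETOOLONG: NFSERR_NAMETOOLONG,
}


def nfs_status_for(local_errno: int, fallback: int = NFSERR_IO) -> int:
    """Map a local errno to an NFS status, borrowing `fallback` if unmapped.

    EINVAL has no entry in the table, so it gets the borrowed code.
    The default fallback is an assumption made for this sketch.
    """
    return ERRNO_TO_NFS.get(local_errno, fallback)


print(nfs_status_for(errno.ENOENT), nfs_status_for(errno.EINVAL))
```

With a borrowed code the client cannot tell "illegal file name" apart from the error the code really belongs to, which is the argument for adding a dedicated code to the protocol.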
rbj@icst-cmr.arpa (Root Boy Jim) (04/15/88)
From: Barry Shein <bzs@bu-cs.BU.EDU>
> > I think vendors have finally realized that 7-bit ASCII is parochial.
> Yes, I agree it's bogus and interferes severely with some
> internationalization schemes. I believe vendors are starting to remove
> it from their 4BSD based systems.
Well, I'm probably gonna really get flamed for this, but here goes...
Um, don't you guys realize that if you implement international
character sets people are gonna start USING them? That's right, the
real threat to American security is not the commies, not Japanese
technology, not cheap Korean or Yugoslavian automobiles, it's programs
in another language. You wanna try hacking hack in Dutch? I say NO!
English is already the second language in the world. After all, they're
used to learning foreign languages and using them, we're not. We gave
away the hydrogen bomb, let's not give away the whole store.
> -Barry Shein, Boston University
(Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
National Bureau of Standards
Flamer's Hotline: (301) 975-5688
The opinions expressed are solely my own
and do not reflect NBS policy or agreement
Uh-oh!! I forgot to submit to COMPULSORY URINALYSIS!
mouse@mcgill-vision.UUCP (der Mouse) (04/23/88)
In article <4540@bloom-beacon.MIT.EDU>, wesommer@athena.mit.edu (William E. Sommerfeld) writes:
> In article <48993@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> (BTW, you *can't* create files that have names with truly arbitrary
>> bytes in them; '/' and '\0' are not valid in UNIX file names [...].)
> Yes, but...
> If you're running NFS, the NFS _server_ (at least the one we're
> running here) will let you put `/' in filenames, since it works at
> the inode & filename level, not the pathname level.
> To get it to do this, you have to write a user-level program which
> sends RPC requests directly to the NFS server.

...and on a non-NFS system you can write a program which scribbles on the raw disk and creates directory entries with slashes in them. It's fairly closely analogous. And about equally useful (or, rather, equally useless).

der Mouse
uucp: mouse@mcgill-vision.uucp
arpa: mouse@larry.mcrcim.mcgill.edu
mangler@cit-vax.Caltech.Edu (Don Speck) (04/24/88)
One of the beautiful things about the filename syntax of older Unixes is that there was no such thing as an illegal filename. Any string had the potential to be a filename, because namei did something more-or-less sensible with any pattern of slashes even when there were 0 or >14 characters between them. Quite a welcome relief from O.S's with strict punctuation rules, e.g.

foovax::[000000.mydir.subdir]file.ext;32767

Alas, this changed in 4.2 BSD, and some filenames are now illegal. Now some propose to add even more restrictions. It's contagious and pretty soon we'll be back to all those punctuation rules.

As the TCP people say, "be liberal in what input you accept".

Don Speck speck@vlsi.caltech.edu {amdahl,ames!elroy}!cit-vax!speck
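A loose model of that "accept anything" parsing, written in Python, might look like the sketch below. It assumes that empty components (runs of slashes) are simply skipped and that components longer than 14 bytes are silently truncated; the function name `old_style_components` is hypothetical, and this is not the actual namei routine.

```python
from typing import List

DIRSIZ = 14  # maximum file name length in the older directory format


def old_style_components(path: bytes) -> List[bytes]:
    """Split a path the forgiving way: never reject, just normalise.

    Assumptions of this sketch: zero-length components are skipped
    and over-long components are cut to DIRSIZ bytes.
    """
    return [part[:DIRSIZ] for part in path.split(b"/") if part]


print(old_style_components(b"//usr///averyveryverylongname/x"))
print(old_style_components(b"caf\xe9/\x01"))
```

Under this model every byte string maps to some sequence of directory lookups, so no input is "illegal"; a lookup can only fail because the name does not exist.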
daveb@geac.UUCP (David Collier-Brown) (04/25/88)
In article <6258@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>One of the beautiful things about the filename syntax of older
>Unixes is that there was no such thing as an illegal filename.
>
>As the TCP people say, "be liberal in what input you accept".

The "filenames with the high bit set" problem was seen and dealt with, once upon a time, by both Multics and Unix, by permitting the mv-equivalent command to accept **any** string as a "from" name, but only a "legal" string as a "to" name. This tends to decrease the difficulty of switching to an 8- (or 9-)bit character set, as the charset-sensitive code is rather centralized.

(In Unix the problem was both easier and harder: almost any character is legal in a "to" name, and the shell interferes in typing some characters directly. I confess I do not remember what happens if the "from" name contains a slash or null. My reading of the kernel implementation implies that it REALLY "can't happen": you get an invalid path to the file.)

--dave (but what if a filesystem does it "wrong"?) c-b
--
David Collier-Brown. {mnetor yunexus utgpu}!geac!daveb
Geac Computers International Inc., | Computer Science loses its
350 Steelcase Road,Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
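The asymmetric rule described above (any "from" name, only a legal "to" name) can be sketched in Python as follows. The names `is_seven_bit` and `rename_allowed` are hypothetical, and the 7-bit legality test is just an assumed example of a charset policy; the point is that the policy is consulted in one place only.

```python
from typing import Callable


def is_seven_bit(name: bytes) -> bool:
    """Example legality policy (an assumption): non-empty, 7-bit bytes only."""
    return bool(name) and all(b < 0x80 for b in name)


def rename_allowed(old: bytes, new: bytes,
                   is_legal: Callable[[bytes], bool] = is_seven_bit) -> bool:
    """Accept any non-empty source name; apply the policy to the target only."""
    return bool(old) and is_legal(new)


# An existing "illegal" name can always be renamed to a legal one...
print(rename_allowed(b"caf\xe9", b"cafe"))
# ...but a new illegal name cannot be introduced this way.
print(rename_allowed(b"cafe", b"caf\xe9"))
```

Widening the character set then means replacing the `is_legal` policy, while names created under the old or a foreign policy remain reachable as sources.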