std-unix@ut-sally.UUCP (10/17/86)
From: <@SUMEX-AIM.ARPA:MRC@PANDA> (Mark Crispin)
Date: Thu 16 Oct 86 23:13:06-PDT
Postal-Address: 1802 Hackett Ave.; Mountain View, CA 94043-4431
Phone: +1 (415) 968-1052

I was hoping that the moderator would stay neutral in this. I encourage his subsequent neutrality.

[ The moderator is neutral. A statement on editorial policy is forthcoming. If you want to discuss that issue, let's do it in that channel and not this. -mod ]

It seems that the two sides in this issue boil down to this:

 . "gee, since we're defining a standard portable operating system that isn't necessarily the present de facto Unix, let's fix this case sensitivity cretinism"
 . "case sensitivity is what makes Unix better than any other operating system, and only a cretin can't understand why this is wonderful"

Neither side is being very scientific. It's reminiscent of the "how many angels can dance on the head of a pin" debates.

Let's start by discarding the arguments which are bogus. The most glaring of these has got to be the international compatibility argument. The only advocates of this argument seem to be pro-case-sensitivity Americans who have seized upon it to shore up their position without really thinking the issue over carefully.

[ Perhaps you could elaborate on why X/OPEN (a group of European computer manufacturers), for instance, should not be concerned with international compatibility? -mod ]

Unix does not allow arbitrary strings in filenames. Any number of "funny" characters must be within a quoted string. I can't say

    rm foo.bar;1

I have to say

    rm "foo.bar;1"

Guess what. A number of foreign keyboards use those "funny" characters for non-English glyphs.

[ The *shell* interprets certain characters, causing them to have to be quoted if they are used in file names. The file system is perfectly happy to put just about anything but the slash and null characters in filenames.
-mod ]

I have yet to hear of any organization in Japan using kanji or hiragana or katakana in filenames. There are good reasons for this! One is that there isn't a single way of representing written Japanese. In older terminals, the high-order bit when set indicated katakana (much as DEC VT220s use the high-order bit for their "international characters"). Modern Japanese terminals use the JIS (Japanese Industrial Standard) system of ESCAPE followed by two bytes to define a 14-bit character. There's a minor portability problem with all those escape characters (which, of course, must be displayed in image form).

[ Perhaps someone from Japan could reply? -mod ]

Some German keyboards use various 7-bit glyphs (I believe "@" is umlaut-a) for their umlauts and eszett. Or, there's the VT220 system. I just tried creating a file called Goethestrasse (using umlaut-o for "oe" and eszett for "ss") on my local Unix system using my VT220 clone. It made "GVthestra_e", the 7-bit form. Dare I mention that in German, only nouns (and the first word in a sentence) are capitalized?

[ What is the "it" that made it? How did you create it? What were the character codes you tried to use for the characters? Don't forget the capitalization of sharp s problem: it's one character in lower case but two (SS) in upper case. -mod ]

The point is that Unix does *not* support international character sets in filenames. It supports 7-bit USASCII. So let's lay that issue to rest.

[ There is strong interest on the part of a number of UNIX vendors and users in making UNIX support international character sets in a number of areas. Anything that ties the filesystem or anything else in the system to USASCII is not a step in a direction they want. The file system at the moment supports uninterpreted strings of bytes as file names, with the exception of /, the null character, names consisting solely of . and .., and those systems that too helpfully zero the high-order bit (4.3BSD, if I'm not mistaken).
-mod ]

I haven't yet heard of any serious use of full 8-bit bytes for filenames on any other operating system, which, if you are serious about supporting international character sets, you must do. There's this small problem of getting 8-bit (as opposed to 7-bit) ASCII through various pieces of hardware and networks which think that the high-order bit is parity... Can we now lay that particular argument to rest?

[ The Japanese seem to manage the eight-bit problem on UNIX, as modified both there and by AT&T to support the Japanese language. Most anyone who uses EMACS has the same problem, and many of them seem to manage. -mod ]

Nobody has really answered the criticism that case sensitivity is poor human engineering.

[ See Barry Shein's article, <6011@ut-sally.UUCP>. -mod ]

Some people may disagree. The same people may also feel that single-character switches are good human engineering. Well, a lot of us who haven't been Unix junkies for the past 15 years seem to feel otherwise. The fact that there is a controversy over the human engineering aspects of a facility should suffice to indicate that there is a problem! Let's discuss these classes of issues.

[ If you have a survey that shows that most people find case insensitivity easier to deal with, please present it. If not, please refrain from proof by character assassination. -mod ]

-- Mark --
-------
Volume-Number: Volume 7, Number 66
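The moderator's distinction between the shell and the file system, and the effect of a 7-bit path on Mark's German file name, can be illustrated with a short modern Python sketch. It assumes a POSIX-like system, and the helpers `create_odd_name` and `strip_high_bit` are hypothetical names introduced only for this illustration. The file `foo.bar;1` is created directly through `os.open`, with no shell involved; `shlex.quote` shows the quoting that only a shell command line needs. The 7-bit squeeze assumes ISO 8859-1 codes for ö (0xF6) and ß (0xDF), which DEC's Multinational Character Set shares for these letters.

```python
import os
import shlex
import tempfile


def create_odd_name(directory, name):
    """Create an empty file called `name` inside `directory` and return
    the sorted directory listing. No shell is involved, so characters
    such as ';' reach the kernel uninterpreted."""
    fd = os.open(os.path.join(directory, name), os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    return sorted(os.listdir(directory))


def strip_high_bit(name):
    """Zero bit 7 of every byte, as a 7-bit line or a system that
    treats the top bit as parity would."""
    return bytes(b & 0x7F for b in name)


with tempfile.TemporaryDirectory() as d:
    # The file system accepts the name as-is.
    print(create_odd_name(d, "foo.bar;1"))   # ['foo.bar;1']

# Only a shell command line needs the quoting.
print("rm " + shlex.quote("foo.bar;1"))      # rm 'foo.bar;1'

# "Göthestraße" in ISO 8859-1 (assumed code set), squeezed to 7 bits.
name = "Göthestraße".encode("latin-1")
print(strip_high_bit(name))                  # b'Gvthestra_e'
```

Under those assumed codes a lower-case ö comes out as `v` and ß as `_`; an upper-case Ö (0xD6) would come out as `V`.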
std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/26/86)
From: seismo!bellcore!jgs (Jeff Smits)
Date: Tue, 21 Oct 86 16:32:57 edt

One of the nice things about the UNIX system is that the operating system doesn't try to define precise semantics for the use of its facilities.

    "A path-name is a null-terminated character string starting with an optional slash (/), followed by zero or more directory names separated by slashes, optionally followed by a file-name." (From the System III reference manual)

That is all the semantics attached to the concept of a path-name. Keeping the semantics simple makes it easy to support file/path-names with international characters in them.

UNIX Pacific offers a source product called JAE 1.0 (Japanese Application Environment) based on System V, Release 2.1.0. Included in its features is support for Japanese file-names as documented in the Future Directions section of SVID Issue 2. The basic concept is that the US ASCII code-set is always present, contained in the code-set range 001-0177 (octal). The range above 0177 (0200-0377 on an eight-bit machine) is reserved for international characters. No changes were needed to the core operating system to support this. Many of the utilities function correctly with these code-sets. This is all because there were no additional semantics attached to the meaning of a character in a path-name. It is the terminal driver's responsibility to convert the data received from an international terminal into this internal representation.

The important point is that the operating system has no knowledge of the code-set the path-names are written in. The only assumptions made about a path-name are that the '/' character separates components in a path, and the NULL character terminates the path. If the standard changed to support case translation, it would be building a code-set bias into an operating system implementation. It would be difficult to support the variety of code-sets with that type of conversion being done in the operating system.
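The EUC-style layout described above can be sketched with Python's `euc_jp` codec, under the assumption that it is representative of the code-set JAE used; the path and the helper `split_path` are hypothetical. Because every byte of a Japanese character is 0200 or above, splitting on the single byte '/' never cuts a Japanese name apart. For contrast, the sketch also uses Python's `iso2022_jp` codec, the escape-sequence form of JIS mentioned in the first message: its two-byte codes range over 0x21-0x7E, which includes 0x2F, so a character can contain a '/' byte.

```python
def split_path(path):
    """Split a byte-string path on '/', the only byte (besides NUL) the
    kernel treats specially. Empty components, from a leading or
    doubled slash, are dropped."""
    return [part for part in path.split(b"/") if part]


# Hypothetical path with Japanese directory and file names, in EUC-JP.
euc_path = "/usr/日本/ファイル.txt".encode("euc_jp")
print([part.decode("euc_jp") for part in split_path(euc_path)])
# ['usr', '日本', 'ファイル.txt']

# In EUC every byte of a Japanese character is 0200 (0x80) or above,
# so none can be mistaken for '/' (0x2F) or NUL.
print(all(b >= 0x80 for b in "日本ファイル".encode("euc_jp")))  # True

# The escape-sequence (ISO-2022-JP) form keeps every byte below 0x80.
# The JIS X 0208 character coded 0x30 0x2F therefore contains a '/'
# byte, and splitting on '/' cuts its encoded name in half.
jis_char = b"\x1b$B0/\x1b(B".decode("iso2022_jp")
jis_name = jis_char.encode("iso2022_jp")
print(len(split_path(jis_name)))  # 2
```

This is why the EUC approach needs no kernel changes: the kernel's one assumption about '/' stays true for every EUC-encoded name.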
Due to international considerations and the fact that current practice (both System V and BSD) supports case-sensitive file-names, I think the current P1003 draft is correct with respect to case sensitivity.

Jeff Smits
AT&T Information Systems
..!attunix!jgs
(201)-522-6263
190 River Rd.
Summit, NJ 07901

Volume-Number: Volume 7, Number 86
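The claim that case translation would build a code-set bias into the system can be illustrated with a hypothetical sketch; nothing here comes from P1003. `latin1_toupper` stands in for an imagined kernel that upper-cases names with an ISO 8859-1 table. Applied to an EUC-JP name, that table silently changes the characters, because bytes that are lower-case letters in ISO 8859-1 are merely halves of kanji in EUC-JP. The last lines show the moderator's sharp s problem: even a code-set-aware comparison has to map one character to two.

```python
def latin1_toupper_byte(b):
    """Upper-case one byte using an ISO 8859-1 table: a-z and
    0xE0-0xFE (except 0xF7, the division sign) sit exactly 0x20 above
    their upper-case forms. 0xDF (sharp s) and 0xFF (y diaeresis) have
    no single-byte upper case in ISO 8859-1 and are left alone."""
    if 0x61 <= b <= 0x7A or (0xE0 <= b <= 0xFE and b != 0xF7):
        return b - 0x20
    return b


def latin1_toupper(name):
    """Hypothetical kernel-side case translation of a whole name."""
    return bytes(latin1_toupper_byte(b) for b in name)


# An EUC-JP name: 0xC6 0xFC 0xCB 0xDC. In ISO 8859-1, 0xFC is u-umlaut.
euc_name = "日本".encode("euc_jp")
folded = latin1_toupper(euc_name)
print(folded == euc_name)                 # False: 0xFC became 0xDC
print(folded.decode("euc_jp") == "日本")  # False: now a different name

# Sharp s: one character in lower case, two in upper case.
print("ß".upper())                        # SS
lower = "Straße".encode("latin-1")
upper = "STRASSE".encode("latin-1")
# bytes.lower() folds ASCII only, so the names do not match...
print(lower.lower() == upper.lower())     # False
# ...while Unicode full case folding, which must know the code-set, does.
print(lower.decode("latin-1").casefold() == upper.decode("latin-1").casefold())  # True
```

The byte-table approach is what a code-set-neutral kernel would be tempted to use; the sketch suggests it is either wrong for EUC or insufficient for German.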