[mod.std.unix] case sensitivity

std-unix@ut-sally.UUCP (10/17/86)

From: <@SUMEX-AIM.ARPA:MRC@PANDA> (Mark Crispin)
Date: Thu 16 Oct 86 23:13:06-PDT
Postal-Address: 1802 Hackett Ave.; Mountain View, CA  94043-4431
Phone: +1 (415) 968-1052

     I was hoping that the moderator would stay neutral in this.
I encourage his subsequent neutrality.

[ The moderator is neutral.  A statement on editorial policy is
forthcoming.  If you want to discuss that issue, let's do it in
that channel and not this.  -mod ]

     It seems that the two sides in this issue boil down to this:
. "gee, since we're defining a standard portable operating system
  that isn't necessarily the present de facto Unix, let's fix
  this case sensitivity cretinism"
. "case sensitivity is what makes Unix better than any other
  operating system, and only a cretin can't understand why this
  is wonderful"

     Neither side is being very scientific.  It's reminiscent of
the "how many angels can dance on the head of a pin" debates.

     Let's start by discarding the arguments which are bogus.
The most glaring of these has got to be the international
compatibility argument.  The only advocates of this argument seem
to be pro case sensitivity Americans who have seized upon this as
an argument to shore up their position without really thinking
over the issue carefully.

[ Perhaps you could elaborate on why X/OPEN (a group of European
computer manufacturers), for instance, should not be concerned
with international compatibility?  -mod ]

     Unix does not allow arbitrary strings in filenames.  Any
number of "funny" characters must be within a quoted string.  I
can't say
	rm foo.bar;1
I have to say
	rm "foo.bar;1"
Guess what.  A number of foreign keyboards use those "funny"
characters to be non-English glyphs.

[ The *shell* interprets certain characters, causing
them to have to be quoted if they are used in file names.
The file system is perfectly happy to put just about anything
but the slash and null characters in filenames.  -mod ]
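
The shell/file-system distinction above can be checked without a shell.
What follows is only a minimal sketch in Python, not anything from the
original discussion; the helper name create_and_list is made up for
illustration.  It asks the system to create "foo.bar;1" directly, so no
shell ever parses the semicolon.

```python
import os
import tempfile

def create_and_list(name: bytes) -> list:
    """Create an empty file called `name` in a fresh temporary directory
    and return that directory's listing as raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(os.fsencode(tmp), name)
        # os.open goes to the system's open call; no shell sees the name,
        # so the semicolon needs no quoting here.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
        return os.listdir(os.fsencode(tmp))

print(create_and_list(b"foo.bar;1"))
```

Quoting is needed only when the name passes through a command
interpreter on its way to the system.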

     I have yet to hear of any organization in Japan using kanzi
or hiragana or katakana in filenames.  There are good reasons for
this!  One is that there isn't a single way of representing
written Japanese.  In older terminals, the high order bit when
set indicated katakana (much as DEC VT220's use the high order
bit for their "international characters").  Modern Japanese
terminals use the JIS (Japanese Industrial Standard) system of
ESCAPE followed by two bytes to define a 14 bit character.
There's a minor portability problem with all those escape
characters (which, of course, must be displayed in image form).

[ Perhaps someone from Japan could reply?  -mod ]

     Some German keyboards use various 7-bit glyphs (I believe
"@" is umlaut-a) for their umlauts and ess-zet.  Or, there's the
VT220 system.  I just tried creating a file called Goethestrasse
(using umlaut-o for "oe" and ess-zet for "ss") on my local Unix
system using my VT220 clone.  It made "GVthestra_e", the 7-bit
form.  Dare I mention that in German, only nouns (and the first
word in a sentence) are capitalized?

[ What is the "it" that made it?  How did you create it?
What were the character codes you tried to use for the characters?
Don't forget the capitalization of sharp s problem:  it's
one character in lower case but two (SS) in upper case.  -mod ]
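
One plausible reading of the "GVthestra_e" result, offered as an
assumption rather than a diagnosis of that particular system: if the
terminal sent 8-bit codes and something along the way cleared the high
order bit, the name would come out exactly that way.  In both DEC
Multinational and ISO 8859-1, capital umlaut-O is 0xD6 and sharp s is
0xDF.  A minimal Python sketch (strip_high_bit is a made-up name):

```python
def strip_high_bit(name: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only path would."""
    return bytes(b & 0x7F for b in name)

# "G", capital umlaut-O (0xD6), "thestra", sharp s (0xDF), "e"
typed = b"G\xd6thestra\xdfe"
print(strip_high_bit(typed))
```

0xD6 with the top bit cleared is 0x56 ("V") and 0xDF becomes 0x5F
("_"), which matches the reported name.  A lower case umlaut-o (0xF6)
would have produced "v" instead.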

     The point is that Unix does *not* support international
character sets in filenames.  It supports 7-bit USASCII.  So
let's leave that issue to rest.

[ There is strong interest on the part of a number of UNIX vendors
and users to make UNIX support international character sets in
a number of areas.  Anything that ties the filesystem or anything
else in the system to USASCII is not a step in a direction they want.
The file system at the moment supports uninterpreted strings of
bytes as file names, with the exception of /, null character,
names consisting solely of . and .., and those systems that
too helpfully zero the high order bit (4.3BSD, if I'm not
mistaken).  -mod ]

     I haven't yet heard of any serious use of full 8-bit bytes
for filenames on any other operating system, which, if you are
serious about supporting international character sets, you must
do.  There's this small problem of getting 8-bit (as opposed to
7-bit) ASCII through various pieces of hardware and networks
which think that the high order bit is parity...

     Can we now leave that particular argument to rest?

[ The Japanese seem to manage the eight bit problem on UNIX,
as modified both there and by AT&T to support the Japanese
language.  Most anyone who uses EMACS has the same problem,
and many of them seem to manage.  -mod ]

     Nobody has really answered the criticism that case
sensitivity is poor human engineering.

[ See Barry Shein's article, <6011@ut-sally.UUCP>.  -mod ]

  Some people may disagree.
The same people may also feel that single character switches are
good human engineering.  Well, a lot of us who haven't been Unix
junkies for the past 15 years seem to feel otherwise.  The fact
that there is a controversy over the human engineering aspects of
a facility should suffice to indicate that there is a problem!

     Let's discuss these classes of issues.

[ If you have a survey that shows that most people find case insensitivity
easier to deal with, please present it.  If not, please refrain from
proof by character assassination.  -mod ]

-- Mark --
-------

Volume-Number: Volume 7, Number 66

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/26/86)

From: seismo!bellcore!jgs (Jeff Smits)
Date: Tue, 21 Oct 86 16:32:57 edt

One of the nice things about the UNIX system is that the operating system
doesn't try to define precise semantics for the use of its facilities.
"A path-name is a null-terminated character string starting with an optional
slash (/), followed by zero or more directory names separated by slashes,
optionally followed by a file-name." (From the System III reference manual)
That is all the semantics attached to the concept of a path-name.

By leaving the semantics simple, it makes it easy to support file/path-names
with international characters in them.  UNIX Pacific offers a source product
called JAE 1.0 (Japanese Application Environment) based on System V, Release
2.1.0.  Included in its features is support for Japanese file-names as
documented in the Future Directions section of SVID Issue 2.
The basic concept is that the US ASCII code-set is always present, contained
in the code-set range 001-0177 (octal).  The range above 0177 (0200-0377 octal
on an eight-bit machine) is reserved for international characters.  No changes
were needed to the core operating system to support this.  Many of the utilities
function correctly with these code-sets.  This is all because there were no
additional semantics attached to the meaning of a character in a path-name.

It is the terminal driver's responsibility to convert the data received from
an international terminal into this internal representation.

The important point is that the operating system has no knowledge of the
code-set the path-names are written in.  The only assumption made about a
path-name is that the '/' character separates components in a path, and
the null character terminates the path.
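
To make the "only two special characters" rule concrete, here is a
minimal sketch in Python.  It is an illustration of the rule as stated
above, not the kernel's actual lookup code, and split_path is a made-up
name.  Every byte other than '/' and the null byte is passed through
untouched, whatever code-set it belongs to.

```python
def split_path(path: bytes) -> list:
    """Split a path-name into components using only the two rules the
    operating system is described as applying."""
    # The null byte terminates the path; anything after it is ignored.
    path = path.split(b"\0", 1)[0]
    # '/' separates components; empty pieces come from leading,
    # trailing, or doubled slashes and are dropped.
    return [component for component in path.split(b"/") if component]

# The middle component is four arbitrary bytes above 0177.
print(split_path(b"/usr/\xc6\xfc\xcb\xdc/Makefile\0junk"))
```

Nothing in the function needs to know how the bytes above 0177 are to
be displayed; that is left to the terminal driver and the utilities.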

If the standard changed to support case translation, it would be building a
code-set bias into an operating system implementation.  It would be difficult
to support the variety of code-sets with that type of conversion being done
in the operating system.
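
A small sketch of what such a bias could look like, under assumptions
that are mine and not part of the draft: suppose a hypothetical system
folded names to lower case as if every name were ISO 8859-1 text.  The
same byte means something else in another 8-bit code-set; KOI8-R
(Cyrillic) is used here purely as an example, and fold_as_latin1 is a
made-up name.

```python
def fold_as_latin1(name: bytes) -> bytes:
    """Lower-case a file name the way a hypothetical system might if it
    assumed that every name was ISO 8859-1 text."""
    return name.decode("latin-1").lower().encode("latin-1")

name = b"\xc4"                  # a single byte above 0177
folded = fold_as_latin1(name)   # becomes b"\xe4"

# Read as ISO 8859-1, the byte went from upper case to lower case.
print(name.decode("latin-1").isupper(), folded.decode("latin-1").islower())
# Read as KOI8-R, the very same bytes went from lower case to upper case.
print(name.decode("koi8_r").islower(), folded.decode("koi8_r").isupper())
```

The folding rule that is right for one code-set does the opposite of
what is wanted for the other, and the operating system has no way to
tell which one a given name was written in.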

Due to international considerations and the fact that current practice
(both System V and BSD) supports case sensitive file-names, I think
the current P1003 draft is correct with respect to case sensitivity.


						Jeff Smits
						AT&T Information Systems
						..!attunix!jgs
						(201)-522-6263
						190 River Rd.
						Summit, NJ 07901

Volume-Number: Volume 7, Number 86