[mod.std.unix] case sensitive filenames

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/02/86)

From cbosgd!cbosgd.ATT.COM!mark@seismo.CSS.GOV Wed Oct  1 16:55:45 1986
Date: Mon, 29 Sep 86 12:33:36 edt
From: mark@cbosgd.att.com (Mark Horton)
Message-Id: <8609291633.AA10479@cbosgd.ATT.COM>
Newsgroups: mod.std.unix
Subject: Case sensitive file names

OK, here's a new topic.  File names.

I note that the committee recently decided that all file names
in conforming systems must be case sensitive, for example,
makefile and Makefile must be different files.  (I've forgotten
where I read this, it was probably Communixations.)

I think this is a mistake.  UNIX is the only major operating system
that treats things like file names, logins, host names, and commands
as case sensitive.  The net effect of this is that users get
confused, since they have to get the capitalization right every time.
To avoid confusion, everybody always just uses lower case.  So
there are few, if any, benefits from a two-case system, and any time
anyone tries to do something that isn't pure lower case, it causes
confusion for somebody and often breaks some program.

Another problem is that emulations on other operating systems,
such as VMS or MS DOS, will become impossible without drastic
changes to their file systems.  Given the problems in the above
paragraph, plus politics as usual, I think it is unlikely that
other systems will be changed to have case sensitive file systems.
After all, it's not like it was easiest to make the VMS filesystem
case insensitive - that took extra effort on their part.

I think it's a mistake to move in the direction of requiring other
operating systems to become case sensitive.  If anything, motion in
the other direction might be of more benefit.

Note: I am NOT suggesting that UNIX should have a case insensitive
filesystem that maps everything to UPPER CASE like MS DOS.  There is
nothing wrong with mapping everything to lower case, for example.
It's also reasonable to leave the case alone, but ignore case in
comparisons.  There is also probably a good argument for keeping
it case sensitive (after all, there are probably 5 or 6 people out
there who really need both makefile and Makefile, or both mail and
Mail, for some reason that escapes me at the moment.)  But I think
it would be a mistake to require other systems to change if they
are to support a POSIX emulation on top of them.  (On the other hand,
it may be reasonable to expect other operating systems to support
more general file name lengths and character sets, rather than things
like the MS DOS 8+3 convention.  But in practice, this may be too
painful to fix.)

	Mark Horton

Volume-Number: Volume 7, Number 11

std-unix@ut-sally.UUCP (10/02/86)

From @SUMEX-AIM.ARPA:MRC@PANDA Thu Oct  2 05:09:39 1986
Date: Thu 2 Oct 86 01:59:26-PDT
From: Mark Crispin <MRC%PANDA@SUMEX-AIM.Stanford.EDU>
Subject: Re: Case sensitive file names
To: std-unix%ut-sally.UUCP@SALLY.UTEXAS.EDU
In-Reply-To: <5860@ut-sally.UUCP>
Postal-Address: 1802 Hackett Ave.; Mountain View, CA  94043-4431
Phone: +1 (415) 968-1052
Message-Id: <12243533720.7.MRC@PANDA>

I would like to add a loud "Bravo!" to Mark Horton's message!  The present
case sensitivity of the Unix filesystem is a real drag, and something that
has regularly and reliably caused me problems when working in a heterogenous
environment.  As far as I can tell, the only individuals who actually *like*
case sensitivity in Unix are the high-schoolish hackers who think it's really
cute to write programs with separate -1, -l, -I, and -L switches.

I think that the most reasonable proposal is to do a free case match on input,
so that "more foobar" is the same as "More FooBar", etc.  On output, you first
do a free case match to see if there is an extant file and if so preserve the
case of that file.  In other words, if I overwrite FooBar but specify foobar
or FOOBAR, the file is still called FooBar.  Otherwise, use whatever case the
user specifies.  Renaming would always use the case the user specifies, so the
user can rename foobar to FooBar, etc.
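
The matching rules above can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions: the class and method
names are hypothetical, the "directory" is an in-memory table rather than
a real filesystem, and folding with lower() assumes plain ASCII names.

```python
class CasePreservingDirectory:
    """Toy directory: names match regardless of case but keep their stored case."""

    def __init__(self):
        # Maps the folded (lower-cased) name to a (stored name, contents) pair.
        self._entries = {}

    def _key(self, name):
        # ASCII-style folding; an assumption of this sketch.
        return name.lower()

    def read(self, name):
        # "Free case match on input": any spelling finds the file.
        _stored, data = self._entries[self._key(name)]
        return data

    def write(self, name, data):
        key = self._key(name)
        if key in self._entries:
            # An extant file keeps the case it already has.
            stored, _old = self._entries[key]
            self._entries[key] = (stored, data)
        else:
            # A new file takes whatever case the user typed.
            self._entries[key] = (name, data)

    def rename(self, old, new):
        # Renaming always adopts the case the user specifies.
        _stored, data = self._entries.pop(self._key(old))
        self._entries[self._key(new)] = (new, data)

    def listing(self):
        return sorted(stored for stored, _data in self._entries.values())
```

With this sketch, writing "foobar" over an existing "FooBar" leaves a
single entry still listed as "FooBar", while rename("foobar", "fooBAR")
changes the listed name to "fooBAR".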

Now, if I can convince you guys to do this for usernames, I will take back at
least 50% of the nasty things I've ever said about Unix.  Golly gee, it would
be nice to be MRC or Crispin, not "mrc" or "crispin"...

Another way of doing it is how TOPS-20 does it.  TOPS-20's filesystem isn't
*really* case independent.  All lowercase characters are coerced into upper
case, so if I say foobar.txt it becomes FOOBAR.TXT in the actual filename.
This is both from the user interface and from the filename lookup system call.
It is, however, possible for any of the 128 ASCII characters to be in a filename,
provided that the "oddball" characters are quoted using CTRL/V.  In other words,
a FooBar.Txt file is possible on TOPS-20, but only by F<^V>o<^V>oB<^V>a<^V>r.T<^V>x<^V>t.
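
A rough model of that coercion, written in Python purely for illustration:
the function name is hypothetical, and the sketch assumes the CTRL/V
(ASCII 22) is consumed as a quote and is not itself stored in the name.

```python
CTRL_V = "\x16"  # ASCII 22, the CTRL/V quoting character

def coerce_name(typed):
    """Upper-case every character except one that follows CTRL/V."""
    out = []
    quoted = False
    for ch in typed:
        if quoted:
            out.append(ch)          # quoted character is taken literally
            quoted = False
        elif ch == CTRL_V:
            quoted = True           # the quote itself is dropped (assumption)
        else:
            out.append(ch.upper())  # everything else is coerced to upper case
    return "".join(out)

assert coerce_name("foobar.txt") == "FOOBAR.TXT"
assert coerce_name("F\x16o\x16oB\x16a\x16r.T\x16x\x16t") == "FooBar.Txt"
```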

For once, I don't favor the TOPS-20 way of doing things.  TOPS-20's scheme is
alright if you started with case independence to begin with, but I don't think
it would fit in well into Unix, and certainly not without a major flag day.  I
hope that my suggestion above could fit in with only minimal inconvenience.

I found on TOPS-20 that no serious user used case-sensitive filenames.  Everybody
appreciated the case-insensitivity of the interface, even though it took the form
of coercing to upper case.  My experience also suggests that case sensitivity is
a pain in the a**; I tried writing a major utility in Interlisp using mixed case
function and variable names and eventually gave up when most of my errors turned
out to be case errors.  It's *so* much easier to keep the shift lock key down...

-- Mark --
-------

Volume-Number: Volume 7, Number 12

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/03/86)

From im4u!dan@prophet.bbn.com Fri Oct  3 04:42:00 1986
Message-Id: <8610030928.AA14794@im4u.UTEXAS.EDU>
Date:     Thu, 2 Oct 86 12:43:49 EDT
From: Dan Franklin <im4u!dan@prophet.bbn.com>
To: "Guest Moderator, John B. Chambers" <std-unix%ut-sally.UUCP@im4u.UTEXAS.EDU>
Subject:  Re:  Case sensitive file names

I can see that it will be hard to emulate POSIX filenames on top of an
operating system such as MS-DOS or VMS, but the benefits of changing the
POSIX spec must be weighed against the costs.  Suppose we changed the spec
so that it permitted a POSIX implementor to provide either a
case-sensitive or case-insensitive filesystem, their choice (which I think
is what Mark is proposing).  There are three groups of people who will be
affected: those who write POSIX emulators, those who write programs for
POSIX, and those who *use* POSIX and its programs.  The last group will be
the largest and most important by far; the emulator writers will be the
smallest group.

So how would users be affected?  It might benefit them, because
case-insensitivity might really be better than case-sensitivity.  However,
in the absence of a controlled study, let's assume the null hypothesis:
that it makes no big difference.  More than "proof by assertion" is needed!

Regardless of which is really better, some users will probably benefit
because they will be used to other operating systems providing
case-insensitivity, particularly MS-DOS.

However, if we really make it an implementor's choice, users will
be hurt by the fact that each POSIX system they encounter will be
different.  In fact, this system-to-system difference will probably
cause more problems than optional case insensitivity would solve.

What about people who write POSIX programs?  They will lose.  To the extent
that POSIX permits two possible underlying filesystems, a truly portable
POSIX program will have to be prepared for either one.  For many programs
it may not matter what the FS looks like, but if it does matter, it will
mean extra work.

Finally, there are all those emulator writers.  They might find it easier;
then again, they might not.  If I were going to do an emulator on top of
MS-DOS, then (since I don't work for Microsoft) I would probably use the
existing filesystem just as a base to build the POSIX filesystem, almost
the way UNIX builds a named hierarchical filesystem space out of inodes.
Going to case insensitivity wouldn't help me a bit, because of the other
limitations Mark mentioned.  It might help Microsoft, because they could
change the 8+3 convention at the same time.  But unless they were willing
to do that, it wouldn't help them either.  VAX-VMS might be easier, but
again there are other problems I would have to solve.  Case-insensitivity
would help me some, but I'd still have a lot of work ahead of me.

But arguments regarding emulator-writing are beside the point.  No matter
what POSIX does on this, it will always be possible to write a POSIX
emulator on top of an existing operating system.  So the ease of *using*
the system must take precedence over the ease of writing it.

For the reasons above, I believe that making case-insensitivity an *option*
would be a bad idea.  Changing the spec to *insist* on case-insensitivity
might be a good idea, but it would cause enough problems w.r.t. existing
UNIX systems that it ought to be very strongly motivated.  To start with:
is it really much easier for people to use such a system?

	Dan Franklin

Volume-Number: Volume 7, Number 14

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/03/86)

From davest%tektronix.csnet@CSNET-RELAY.ARPA Fri Oct  3 14:04:03 1986
Message-Id: <8610031825.AA25406@tektronix.TEK>
To: std-unix@SALLY.UTEXAS.EDU
Subject: Re: Case sensitive file names
Newsgroups: mod.std.unix
In-Reply-To: <5860@ut-sally.UUCP>
Organization: Tektronix, Inc., Beaverton, OR.
Date: 03 Oct 86 11:25:11 PDT (Fri)
From: "David C. Stewart" <davest%tektronix.csnet@CSNET-RELAY.ARPA>
Source-Info:  From (or Sender) name not authenticated.

In article <5860@ut-sally.UUCP> Mark Horton <mark@cbosgd.att.com> writes:
>It's also reasonable to leave the case alone, but ignore case in
>comparisons.  There is also probably a good argument for keeping
>it case sensitive (after all, there are probably 5 or 6 people out
>there who really need both makefile and Makefile, or both mail and
>Mail, for some reason that escapes me at the moment.)

	I can think of one well-used exception right away: make(1), as it
works now, will look for rules in `makefile' first, and if `Makefile'
exists in the same directory, it will not be used by make.  On the
other hand, Glenn Fowler's Fourth Generation Make [1] chooses the
opposite order of accepting default rules files, i.e., it tries
`Makefile' first and, if one does not exist, it tries `makefile'.
It is claimed that this is a feature, rather than an annoyance since
Fourth Generation makefiles are incompatible with old-style makefiles.
Thus, one can maintain the old make makefile in `makefile' and the new make
makefile in `Makefile'.
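
	The two search orders can be compared with a small Python sketch.
It is not taken from either make; the function name is hypothetical and
the directory is modelled as a plain set of names, which only behaves
this way if the underlying filesystem is case sensitive.

```python
def default_rules_file(names, order=("makefile", "Makefile")):
    """Return the first name in `order` present in the listing, else None."""
    for candidate in order:
        if candidate in names:
            return candidate
    return None

both = {"makefile", "Makefile", "main.c"}

# Traditional make(1) order as described above: `makefile' wins.
assert default_rules_file(both) == "makefile"

# The opposite order described for Fourth Generation Make: `Makefile' wins.
assert default_rules_file(both, order=("Makefile", "makefile")) == "Makefile"
```

	On a case-folding filesystem the set could never hold both names,
so the two orders would always pick the same file.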

	This may just be picking nits, but I think the point is that
case sensitivity in the file system is a Unix feature, like it or
not.  There may be other applications that depend on case-sensitive
file names that would become non-portable.

[1] Fowler, Glenn S., "The Fourth Generation Make", Proceedings of the
Usenix Association Summer Conference, Portland, OR, 1985.  (Note that
the actual release of nmake in the AT&T Toolchest differs in this
respect from the function described in this paper.)

--
David C. Stewart                          uucp:    tektronix!davest
Unix Systems Support Group                csnet:   davest@TEKTRONIX
Tektronix, Inc.                           phone:   (503) 627-5418

Volume-Number: Volume 7, Number 15

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/04/86)

From sun!gorodish!guy@utastro.UUCP Fri Oct  3 15:34:59 1986
Date: Fri, 3 Oct 86 12:26:22 PDT
From: sun!gorodish!guy@utastro.UUCP (Guy Harris)
Message-Id: <8610031926.AA09026@gorodish.sun.com>
To: ut-sally!std-unix@utastro.uucp
Subject: Re: Case sensitive file names

> From: mark@cbosgd.att.com (Mark Horton)
> Subject: Case sensitive file names

> I think this is a mistake.  UNIX is the only major operating system
> that treats things like file names, logins, host names, and commands
> as case sensitive.

It's been a while since I used Multics; I think it was case-sensitive.  Of
course, I don't know whether it counts as "major" here or not; I don't know
how many sites are around.  Are you sure there are no others?

> It's also reasonable to leave the case alone, but ignore case in
> comparisons.

This would probably be the best scheme (I think the Xerox Alto's operating
system did this).  Some people may want to use mixed case in file names for
aesthetic reasons, for example.

> There is also probably a good argument for keeping it case sensitive
> (after all, there are probably 5 or 6 people out there who really need
> both makefile and Makefile...

This means UNIX probably can't change, at least not without a fair bit of
pain.  I know of at least one directory on a UNIX system that has both
"makefile" and "Makefile" in it; this would cause some upset on a
case-mapping UNIX system.

However, there is another problem with case mapping.  It's dependent on the
language the text is in!  Doing case mapping is all very well and good for
English-speaking users; the algorithm for mapping characters between cases
in English is straightforward.  However, in German "ss" is a single special
character in lower-case but "SS" in upper case.  Even if you don't have
anomalies like this, the current schemes proposed by AT&T for "international
UNIX" use various ISO codes; this means that the character whose hex value
is E6 is the "ae" ligature in the ISO Latin Alphabet #1, and thus matches
the character whose hex value is C6 (which is the "AE" ligature); however,
in the JIS C6226 Kanji set, it is probably the first byte of a two-byte
sequence representing a Kanji symbol, and I don't think it gets case mapped
at all.
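
The dependence on character set can be seen in a short Python sketch.
The helper name is hypothetical, and using Python's "euc_jp" codec (JIS
two-byte codes with the high bit set) to stand in for the Kanji case is
an assumption of the example, not something the proposals specify.

```python
def upper_in_charset(raw, charset):
    """Upper-case raw bytes by interpreting them in the named character set."""
    return raw.decode(charset).upper().encode(charset)

# Byte E6 read as ISO Latin-1 is the lower-case "ae" letter; it maps to C6.
assert upper_in_charset(b"\xe6", "latin-1") == b"\xc6"

# German sharp s (DF in Latin-1) has no single upper-case form; it becomes "SS".
assert upper_in_charset(b"\xdf", "latin-1") == b"SS"

# The same E6 byte as the first half of a two-byte Kanji code is left alone.
assert upper_in_charset(b"\xe6\xa1", "euc_jp") == b"\xe6\xa1"

# Without knowing the character set, only the ASCII letters can be mapped.
assert b"\xe6abc".upper() == b"\xe6ABC"
```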

This means that the operating system would have to know what character set a
particular character was in, so that it could map its case correctly; this
would be best done with sequences embedded in the file name indicating
shifts in the character set to which bytes belong.  (These same sequences
should be used in text files, character strings in programs, etc.  Other
suggestions include a per-file character set designator, that would
presumably apply to any files containing character strings, including
directories; however, this means that *all* strings in that file must be in
the same character set, which is not always a reasonable restriction.)  It
would then have to know how to do case mapping for all character sets
supported by the system, and would have to be modified or have new
information supplied to it if a new character set was to be supported.

Volume-Number: Volume 7, Number 16

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

Date: Fri, 3 Oct 86 20:07:32 edt
From: Robert Viduya <gatech!gitpyr!robert@seismo.UUCP>
Subject: Re: Case sensitive file names

> Date: Mon, 29 Sep 86 12:33:36 edt
> From: mark@cbosgd.att.com (Mark Horton)
> Subject: Case sensitive file names

I've found a useful rule to be used in deciding cases like this is to
decide in favor of the more general and flexible.  A couple of times
I've been guilty of saying, "Well, I can't think of any good reason for
this particular feature, so I'll get rid of it", only to discover,
later on, a good reason for a feature.  I don't believe in artificial
limits mainly because the person who implements the limit generally
hasn't considered ALL possible reasons for going beyond the limit.

> I think this is a mistake.  UNIX is the only major operating system
> that treats things like file names, logins, host names, and commands
> as case sensitive.  The net effect of this is that users get
> confused, since they have to get the capitalization right every time.
> To avoid confusion, everybody always just uses lower case.  So
> there are few, if any, benefits from a two-case system, and any time
> anyone tries to do something that isn't pure lower case, it causes
> confusion for somebody and often breaks some program.

It isn't difficult to explain Unix's case-sensitivity to a user and,
once explained, the case-sensitivity tends to be one of the few things
a user remembers without having to be reminded.  What confusion may be
caused by case-sensitivity is lost in the much greater confusion caused
by trying to learn a new operating system.

> Another problem is that emulations on other operating systems,
> such as VMS or MS DOS, will become impossible without drastic
> changes to their file systems.  Given the problems in the above
> paragraph, plus politics as usual, I think it is unlikely that
> other systems will be changed to have case sensitive file systems.
> After all, it's not like it was easiest to make the VMS filesystem
> case insensitive - that took extra effort on their part.

But, on the other hand, adapting a VMS or MS-DOS filesystem to coexist
with Unix in a Unix environment would be trivial as far as filenames
are concerned.  The fact that Unix allows *any* ASCII character in its
filenames (except for the path separator, '/', and the string
terminator, NUL), makes it almost ideal for adapting other, foreign
filesystems to it because most of the special graphic characters (!, @, #,
$, etc.) can already be represented in a filename without having to
be mapped to something else (unlike other, more restrictive, operating
systems).


				robert

---
Robert Viduya					     robert@pyr.ocs.gatech.edu

Office of Computing Services					(404) 894-4660
Georgia Institute of Technology
Atlanta, Georgia	30332

Volume-Number: Volume 7, Number 17

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

Date: Fri, 3 Oct 86 23:56:26 edt
From: mark@cbosgd.att.com (Mark Horton)
Subject: Re:  Case sensitive file names

>Finally, there are all those emulator writers.  They might find it easier;
>then again, they might not.  If I were going to do an emulator on top of
>MS-DOS, then (since I don't work for Microsoft) I would probably use the
>existing filesystem just as a base to build the POSIX filesystem, almost
>the way UNIX builds a named hierarchical filesystem space out of inodes.
>Going to case insensitivity wouldn't help me a bit, because of the other
>limitations Mark mentioned.  It might help Microsoft, because they could
>change the 8+3 convention at the same time.  But unless they were willing
>to do that, it wouldn't help them either.  VAX-VMS might be easier, but
>again there are other problems I would have to solve.  Case-insensitivity
>would help me some, but I'd still have a lot of work ahead of me.

I'm not concerned very much about the amount of work the emulator
writer has to do, but I am concerned about the quality of the
resulting emulation.  If I'm a user of an emulator which is written
on an otherwise-reasonable case insensitive filesystem (VMS comes
to mind) which emulates case sensitivity, then apparent POSIX filenames
will bear little resemblance to real native filenames.  Either there's
an external table somewhere not unlike the UNIX directory/inode # tables,
or else file names are somehow encoded into longer native filenames.
I'm living with the latter kind of system now (Sun's PC/NFS, which makes
UNIX filesystems look like DOS filesystems) and the contortions it has
to go through to fit ordinary UNIX file names into DOS filenames are
a serious inconvenience.  The former kind of system makes it impossible
to access native files from inside the POSIX environment, unless someone
is awfully clever.
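
To make the "encoded into longer native filenames" case concrete, here is
one possible encoding sketched in Python.  It is not how PC/NFS or any
real emulator works; the escape character and function names are
hypothetical, the native system is assumed to store upper case only, and
names are assumed to be plain ASCII.

```python
ESC = "$"  # hypothetical escape character reserved by the encoding

def to_native(posix_name):
    """Encode a case-sensitive name for an upper-case-only native filesystem."""
    out = []
    for ch in posix_name:
        if ch == ESC:
            out.append(ESC + ESC)    # a literal escape character is doubled
        elif ch.isupper():
            out.append(ESC + ch)     # mark letters that really were upper case
        else:
            out.append(ch.upper())   # lower case is stored folded
    return "".join(out)

def from_native(native_name):
    """Recover the original case-sensitive name from its native encoding."""
    out = []
    chars = iter(native_name)
    for ch in chars:
        if ch == ESC:
            out.append(next(chars))  # the escaped character is kept verbatim
        else:
            out.append(ch.lower())
    return "".join(out)

assert to_native("makefile") == "MAKEFILE"
assert to_native("Makefile") == "$MAKEFILE"
assert from_native("$MAKEFILE") == "Makefile"
```

Even this simple scheme shows the cost: every upper-case letter adds a
character, so mixed-case names quickly outgrow a short native limit.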

On the other hand, if case insensitive is an option for the emulator,
then two possibilities occur: (1) the vendor of the native operating
system can otherwise upgrade their filesystem to allow a clean POSIX
implementation (maybe they will arrange that their native OS conforms
directly to POSIX; wouldn't you consider it strongly if the market
starts to demand POSIX compatibility?) and (2) True UNIX systems have
the option to evolve to case insensitive, should a study be done and
the world conclude that insensitive is better.

I agree that a study should be done; I have my own intuitive feelings
on the subject, and there is quite a collection of operating systems
out there that went to extra work to be case insensitive; they can't
all be wrong, can they?  But by all means, this would make a great
human factors study for somebody.

	Mark

Volume-Number: Volume 7, Number 18

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

Date: Sat, 4 Oct 86 04:19:12 CDT
From: dutoit!dmr@research.UUCP
Subject:  Case sensitive file names

The suggestion that POSIX be required (worse, permitted) to conflate
cases in file names is utterly loony.  We have enough portability
problems already in reconciling System V with 4.x without trying to
make Unix compatible with MS-DOS.

It is granted that Stu Feldman committed a rare lapse of taste in
accepting both `makefile' and `Makefile' (thus dooming everyone to
typing `cat ?akefile') and that Fowler apparently compounded the
distinction to the point of felony by encouraging both kinds of
?akefiles to exist and have different meanings.

Nevertheless, neither the possibility of silliness in choosing file
name conventions nor the dubious advantages of permitting Unix to be
embedded in other systems are relevant; what is important is that such
a subtle yet central change would be certain to make transport of
programs and of files more onerous.  This is not a wise thing for an
endeavor devoted to promoting portability.

	Dennis Ritchie

Volume-Number: Volume 7, Number 19

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

Date: Sat, 4 Oct 86 16:54:37 PDT
From: hoptoad!gnu@lll-crg.ARPA (John Gilmore)
Subject: Re: Case sensitive file names

> From: mark@cbosgd.att.com (Mark Horton)
> Another problem is that emulations on other operating systems,
> such as VMS or MS DOS, will become impossible without drastic
> changes to their file systems.

I think we should eliminate the hierarchical file system too (-:).
After all, VM/370 doesn't use it, nor does CP/M.  It would be too hard
to emulate.  (Thank Bog that MSDOS and the Mac added the feature, and
that Atari and Amiga started that way, or somebody might actually take
me seriously!)  We could consider getting rid of devices-as-files, though --
there's an idea that none of those people have picked up :-).

> After all, it's not like it was easiest to make the VMS filesystem
> case insensitive - that took extra effort on their part.

Their feeling it was worth the work for VMS doesn't make it right for Unix.

> I think it's a mistake to move in the direction of requiring other
> operating systems to become case sensitive.

Nobody is requiring anything of any other operating system.  We're
defining a *new* operating system here.

My impression was that the "new operating system" was supposed to look
very much like the set of features-in-common to the various Unix operating
systems.  If we are trying to standardize an environment that will
run under other operating systems, somebody better tell us quick.
I thought the "Portable Operating System" stuff was just a legalese hack
because we can't use the trademarked name "Unix".  Was I wrong?

>                                                        But I think
> it would be a mistake to require other systems to change if they
> are to support a POSIX emulation on top of them.  (On the other hand,
> it may be reasonable to expect other operating systems to support
> more general file name lengths and character sets, rather than things
> like the MS DOS 8+3 convention.  But in practice, this may be too
> painful to fix.)

Either they will implement POSIX compatibility or they won't.  If we
define POSIX systems to be case insensitive, MSDOS would not qualify
anyway, since you can't use an arbitrary 14-character file name.  VMS
would have problems with files whose names contained [, ], or colon,
etc.  So they will have to provide some form of file name translation,
and they should handle the case issue at the same time they handle the
length and allowable character set issues.

Volume-Number: Volume 7, Number 20

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

From: axiom!drilex!dricej@harvard.UUCP
Date: Mon, 6 Oct 86 10:24:22 edt
Subject: Re:  Case sensitive file names

I fully support Mark Horton's points about making case-insensitivity
optional in POSIX.  The fact remains that case-sensitivity in file names
is a Unix parochialism, and not a very good one, at that.  I've found that
case-sensitivity is not hard to teach, just hard to get along with.  I am
in a situation that is not unusual these days--I use several operating
systems each day (Unix, MS-DOS, VM/CMS, Burroughs MCP).  To remember the
peculiarities of each one is difficult--and case-sensitivity in file names
(and switches) is such a peculiarity.  The uppercase-lowercase system just
wasn't designed to convey that much information (in English)!  Look at
e. e. cummings!
---
Craig Jackson
UUCP: {harvard,linus}!axiom!drilex!dricej
BIX:  cjackson

Volume-Number: Volume 7, Number 22

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

Date:     Mon, 6 Oct 86 0:30:45 EDT
From: Bernie Cosell <cosell@prophet.bbn.com>
Subject:  Case Sensitive file names

To my view, the case folders have to make a VERY strong case that
case-sensitivity is a bad thing before we could justify BUILDING IN
that somewhat arbitrary limitation onto the operating system.  As has
been mentioned, if the filesystem is left alone, then it is easy to
envision that for certain uses, certain users *might* want to use (or
to use utilities that use...) a variant of (or layer on) stdio that
simply toupper's the filename string in the fopen call.  The users
that didn't need or want such a limitation should be free to do as
they wish.
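
Such a layer is small.  The sketch below uses Python rather than C stdio,
and the wrapper name is hypothetical; it folds only the final path
component and leaves the directory part and the filesystem itself alone.

```python
import os

def open_folded(path, mode="r"):
    """Open a file after upper-casing only the last component of its path."""
    directory, name = os.path.split(path)
    return open(os.path.join(directory, name.upper()), mode)
```

A program that wants folded names calls open_folded("notes.txt", "w") and
gets a file named NOTES.TXT; programs that call open() directly keep full
case sensitivity.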

Note that most of these other systems that are being presented as
exemplars have pretty horrible filename conventions (most punctuation
marks are not legal, certainly control chars aren't legal, the
equivalent of '..' is *built in* to the kernel, the operation of '.'
is *built in*).  I've always thought that was a crock!

From what I've heard of the arguments so far, coupled with my biases,
I'd vote to keep it case-sensitive (but then, of course, I don't have
a vote so that hardly matters... :-).
 
  /Bernie

Volume-Number: Volume 7, Number 23

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (10/06/86)

Date: Mon, 6 Oct 86 09:56:50 edt
From: philabs!nyit!rick@seismo.CSS.GOV
Subject: Re: Case sensitive file names

Regarding comments by Mark Crispin <MRC%PANDA@SUMEX-AIM.Stanford.EDU>:

> I would like to add a loud "Bravo!" to Mark Horton's message!  The present
> case sensitivity of the Unix filesystem is a real drag, and something that
> has regularly and reliably caused me problems when working in a heterogenous
> environment.

What specifically is wrong with case-sensitivity?  I work on both
case-sensitive (UNIX) and other (TOPS-20) systems regularly, and
have no problems in switching between them.

>              As far as I can tell, the only individuals who actually *like*
> case sensitivity in Unix are the high-schoolish hackers who think it's really
> cute to write programs with separate -1, -l, -I, and -L switches.

And many software professionals.

> I think that the most reasonable proposal is to do a free case match on input,
> so that "more foobar" is the same as "More FooBar", etc.  On output, you first
> do a free case match to see if there is an extant file and if so preserve the
> case of that file.  In other words, if I overwrite FooBar but specify foobar
> or FOOBAR, the file is still called FooBar.  Otherwise, use whatever case the
> user specifies.  Renaming would always use the case the user specifies, so the
> user can rename foobar to FooBar, etc.

Changing the UNIX kernel to behave like this is, of course, within the
capabilities of a single programmer.  However, not all filename recognition
under UNIX occurs in the kernel, and you're going to have an
awesome task finding and rewriting all those user-mode programs that
know implicitly that filenames are case-sensitive.  The problem is
exacerbated by the fact that you're going to a more complex scheme
than what was there in the first place.

> ... a FooBar.Txt file is possible on TOPS-20, but only by
> F<^V>o<^V>oB<^V>a<^V>r.T<^V>x<^V>t.
> For once, I don't favor the TOPS-20 way of doing things.  TOPS-20's scheme is
> alright if you started with case independence to begin with, but I don't think
> it would fit in well into Unix, and certainly not without a major flag day.  I
> hope that my suggestion above could fit in with only minimal inconvenience.

It could fit in *part of the way* with minimal inconvenience.

> I found on TOPS-20 that no serious user used case-sensitive filenames.

You've got the cart before the horse.  No serious TOPS-20 users used
case-sensitive filenames because of the inconvenience in entering
filenames with embedded lowercase characters.  On output, filenames
with embedded ^V characters are aesthetically unpleasant as well.

> Everybody
> appreciated the case-insensitivity of the interface, even though it took the form
> of coercing to upper case.

You can replace the word "appreciated" with the words "became accustomed to".
Also, this argument fails on the grounds that it's hard to get people
to vote rationally on a subject that involves a decision to change
from something they're comfortable with.

>                         My experience also suggests that case sensitivity is
> a pain in the a**; I tried writing a major utility in Interlisp using mixed case
> function and variable names and eventually gave up when most of my errors turned
> out to be case errors.

How about keeping all variable names in one case?

>                     It's *so* much easier to keep the shift lock key down...
> 
> -- Mark --

It's just as easy to leave the shift lock key up.  Many typists do.


The issue of case-sensitivity is a subjective one, thus you'll always
find many vehement proponents on both sides of the fence.  At this
point in the development of UNIX, such a fundamental change in the
behavior of the OS would receive at best only partial acceptance among
the myriad UNIX implementations, leading to even more divergence.
This effect diametrically opposes the purpose of a standard.

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY  11568
(516) 686-7644

{decvax,seismo}!philabs!nyit!rick


Volume-Number: Volume 7, Number 25

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/07/86)

From: seismo!hadron!jsdy@sally.utexas.edu (Joseph S. D. Yao)
To: ut-sally!std-unix@sally.utexas.edu
Date: Sun, 5 Oct 86 11:52:40 edt
Summary: Case sensitivity is useful; harms only those not used to it.
Organization: Hadron, Inc., Fairfax, VA

In <5860@ut-sally.UUCP>, mark@cbosgd.att.com (Mark Horton) writes:
> Message-Id: <8609291633.AA10479@cbosgd.ATT.COM>
> Newsgroups: mod.std.unix
> 
> I note that the committee recently decided that all file names
> in conforming systems must be case sensitive, for example,
> makefile and Makefile must be different files.  ...
> I think this is a mistake.  UNIX is the only major operating system
> that treats things like file names, logins, host names, and commands
> as case sensitive.  The net effect of this is that users get
> confused, since they have to get the capitalization right every time.

Since this is primarily an opinion, I'll say that I think any such
"confusion" is a product of someone getting wedded to odd ways of
doing things in a single-case environment, and not really learning
their own language.  Only the followers of the late great e. e.
cummings have any problem with "normal" use of different cases.
(Yes, German does it differently.  Fine!  AS LONG AS THERE IS A
STANDARD CONVENTION, I am willing to let Nouns be Uppercasen.)
I use both cases for reasons and, now that I have been weaned away
(for years!) from single-case environments, I find them very limiting.
After all, we DO have two cases here, and they are separate characters
that can be used separately.  Not to mention that UC-lc conversion
is only easy in the USASCII standard -- ISO and other conversions may
be quite difficult.

The sole time I like case independence is on the occasional text
search (often because some @#$% case-independent language allowed
a whimsical program to vary case without care).  Vi/ex's ":set ic"
mode works well for this, but I wish there were an "ignorecase"
flag to the grep family.  (-ic:ascii / -ic:deutsche / ... ?)
[ There is:  "grep -i"  -mod ]

(Anecdote: UPPER CASE ONLY is a product of the original TTYs' design.
A study had said that  l o w e r  case was easier to read!  but it
was decided to be UC-only, when a Board member asked the president
whether he wanted to be responsible when the name of God came over the
wires ... in lower case ...)

The emulation argument,
> Another problem is that emulations on other operating systems,
> such as VMS or MS DOS, will become impossible without drastic
> changes to their file systems.
almost swayed me, except that this is not an emulation document,
this is an OS document!
[ It's neither:  it's an interface document.  -mod ]
  And I remembered that it's quite possible
to provide a "flexnames"-type of mapping: RATFOR does something
similar.  Perhaps POSIX might wish to add a codicil, regarding
emulations ("hosted" implementations?), that gives some relaxation
and some requirements for minimum performance.  Perhaps they do
not want to relax their standards for emulations at all.  Their
privilege (considering that the Committee includes many vendors).
[ Hosted systems have been considered in excruciating detail
in writing the standard.  -mod ]

In article <5865@ut-sally.UUCP>, MRC%PANDA@SUMEX-AIM (Mark Crispin) writes:
>case sensitivity of the Unix filesystem is a real drag, and something that
>has regularly and reliably caused me problems when working in a heterogenous
>environment.

See above.

There follow several comments on the use of mixed case.  OF COURSE
people won't use mixed case when the operating system stands in the
way of using it comfortably!  And if hackers aren't taught better
than to mix 1, I, L, O, and 0 in their codes (as a certain major
stinker of a company does -- using them EXCLUSIVELY -- in software
released with an alleged source-code license!), then people should
undertake to educate them ... and their alleged educators.

When I name a file FooBar, I better well come back and find it named
FooBar ... NOT FOOBAR or foobar or (God help us) FoObAr.

>		    ...  It's *so* much easier to keep the shift lock key down...

I HATE it when people do this to my terminal ... and leave it
that way ...
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)

Volume-Number: Volume 7, Number 26

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/07/86)

The discussion has been interesting and has brought up some topics,
such as what case insensitivity means in non-English languages, that
many of the readers were evidently unaware of.  However, it's getting
a bit out of hand.

IEEE P1003.1 is interested in promoting portability of applications
by defining a UNIX-like operating system interface.  Any major change
from a feature of *every* variant of UN*X, such as case-sensitive
file names (really, filenames as uninterpreted byte strings), needs
major justification before being considered.  So further assertions
of the form "I want it because I like it" are not of interest.  It
would be most interesting to see the results of a survey on user
reaction to case sensitivity or insensitivity, but this newsgroup
isn't the place to conduct such a survey, and it's not clear that
the results would be relevant to 1003.1 anyway (what does case
mean in Japanese or Finnish?).

So, unless you've got something new to say on this subject, please
let's go on to something else.

Volume-Number: Volume 7, Number 27

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/09/86)

From: caip!uw-beaver!geops!uw-atm!james@seismo.css.gov (James M Synge)
Date: Wed, 8 Oct 86 09:17:04 pdt

Just a note on usefulness:  I've used two machines (Xerox XDE and Amiga) where
the case of a filename is preserved, but not used for comparisons.  This means
I can create a file called READ_ME, and be sure that it stands out (to some
extent) in a directory listing where most filenames are lower case or mixed
case.  This feature is a nice convenience.  Not essential, but nice.

I find it irritating to find filenames like makefile and Makefile in the
same directory, because I must then try to remember the searching scheme used
by make.  There are similar problems with mail and Mail.

None of this is to say it SHOULD be done one way or the other.  I simply want
it kept in mind that people must use these systems, and they have preferences
based on such things as levels of irritability and ease of use; not because
something is "right".
---
---------------------------------------------------------------------------
James M Synge, Department of Atmospheric Sciences, University of Washington
VOX: 1 206 543 0308 (Work)   1 206 455 2025 (Home)
UUCP: uw-beaver!geops!uw-atm!james
ARPA: geops!uw-atm!james@beaver.cs.washington.edu

Volume-Number: Volume 7, Number 36

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/09/86)

From: seismo!nbs-amrf!libes@sally.utexas.edu (Don Libes)
Date: Wed, 8 Oct 86 19:54:05 EDT

I write programs for both case-sensitive (CS) and case-insensitive
(CI) systems.  As an applications programmer, I prefer case-sensitivity.

Why?  Because my code on the CI system is full of calls to upper(),
lower(), isupper() and islower(), while the CS programs don't have
any of that.  On the CS system, case is important - it would be a
mistake to map it either way.

On the other hand, take the CI system.  If I have a user-supplied
filename, depending upon the system I may have to case-map it before
calling open.  But suppose I'm reading a directory and I want to
match the filename against the entries.  Now, I definitely have to
case-map it before doing a string comparison.  Unless you want to
supply me with a filecmp() which is just a case-map wrapped around
a strcmp().  Seems silly.
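
A minimal sketch of the kind of filecmp() described above, for
illustration only.  The name filecmp comes from the post; the body is an
assumption, and it relies on tolower() being meaningful for a
single-byte character set.

```c
#include <ctype.h>

/* Hypothetical filecmp(): orders two file names the way strcmp()
 * does, but folds each byte to lower case before comparing.
 * Assumes a single-byte character set where tolower() applies. */
int filecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;
    }
}
```

On a CI system every comparison of a user-supplied name against a
directory entry would have to go through a routine like this one
instead of strcmp().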

Now you may think, I'm getting annoyed over one little case-map,
but as MRC points out, OSs tend to go about this in a big way.  For
example, VMS has case-insensitive filenames, logical names, device
names, usernames, symbols, etc.  Every time I deal with an object,
the first thing I have to do is start worrying about case.
Depending upon the utility, library, language, etc I'm working with
I then have to start thinking if their interfaces are
case-sensitive or not.  I find all of this quite annoying.

That is why, as an application programmer, I much prefer case-sensitivity.

Please don't tell me I am insensitive to users.  I am not about to
argue here whether or not users have the intelligence to hold down
the shift key at the appropriate times.

As far as m/Mail, m/Makefile goes, the problem is not that users
find them easily confused.  That should've been obvious to the
genius who reused the name.  If you want, I can easily choose
filenames that you will find confusing, even in the same case.

As far as emulators go, I daily use Eunice, which faces this very
problem of handling case-sensitive file names in a case-insensitive
environment.  As far as case-mapping goes, their solution is very
elegant.  (No other claims about the elegance of Eunice are
proffered here.)  I.e., UNIX programs see a case-sensitive
filesystem.  Further, they are also allowed arbitrary characters in
a filename, outside the legal VMS character set.

Don Libes   {seismo,umcp-cs}!nbs-amrf!libes

Volume-Number: Volume 7, Number 37

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/09/86)

From: seismo!mnetor!spectrix!clewis (Chris Lewis)
Date: Wed Oct  8 11:00:33 1986
Organization: Spectrix Microsystems Inc., Toronto, Ontario, Canada

Mod,

I'll leave this to your judgement whether to post this or not...
[ Judgement?  Who, me?  -mod ]

I'd rather everybody be very careful about making global statements like
"all other systems are case insensitive".  Many of the examples given
so far, eg: CP/M, VM/CMS *are* case sensitive.   When dealing with these 
O/S's right down at the "system call level" (if you could call it that), 
they *do* respect case.  The upper-casing is done in the command 
interpreters (CCP, EXEC1, EXEC2, optionally in REX), and in the utilities.  
[Most of the time it's damn difficult to get lower case into a VM/CMS 
system in any way].  [That comment was not by the moderator -mod]
However, down deep (eg: CP/M BDOS, VM/CMS FSOPEN/FSREAD) 
these systems will create files with mixed case and respect case in file 
name searches.  One of my CP/M floppies still has a lower case named 
file on it because Microsoft basic isn't smart enough to upper case 
file names, and I haven't gotten around to writing the assembler code 
to delete it.  One of the favorite CCP hacks is to zap the upper-case
command line code.

Yes, there are some systems that are truly case insensitive - Honeywell
GCOS comes to mind - it keeps its file names in BCD!

Further, I wonder whether any sort of conformance would help - systems
differ so much from one another that case [in]sensitivity is
a very minor part.  Eg: 18 character 3 blank separated part file names 
in CMS (gack ptui!) etc., etc., etc....  When writing an emulator you'll
almost always have to write your own filesystem handler anyways, with
specific escapes for "native mode" files.

Mind you, it would be nice to have file transfer utilities (eg: tar)
warn you that the file names you are putting on your tape may not be
unique/representable on a non-UNIX target.  Eg: warn when two files
on the tape differ only in case and when two files have the same name
within the first 8 chars.
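
The two checks suggested above could be sketched roughly as follows.
Both function names are hypothetical, the 8-character limit is the one
mentioned in the post, and case folding assumes a single-byte character
set; this is not taken from any real tar.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the two names are different byte strings that become
 * equal once case is ignored (e.g. "makefile" and "Makefile"). */
int differ_only_in_case(const char *a, const char *b)
{
    if (strcmp(a, b) == 0)
        return 0;
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns 1 if two distinct names agree in their first 8 characters,
 * so they would collide on a target that keeps only 8. */
int collide_in_first_8(const char *a, const char *b)
{
    return strcmp(a, b) != 0 && strncmp(a, b, 8) == 0;
}
```

A transfer utility would run each new name against the names already
written and print a warning when either function returns 1.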

Chris Lewis
UUCP: {utzoo|utcs|yetti|genat|seismo}!mnetor!spectrix!clewis
Phone: (416)-474-1955

Volume-Number: Volume 7, Number 38

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/09/86)

From: seismo!hpscda!hpdsd!hpda!hpisoa1!davel (Dave Lennert)
Date: Mon, 6 Oct 86 15:24:20 pdt

Perhaps a good approach might be to require POSIX applications to specify
filenames to the system as all lowercase in order to be truly portable.
As long as the filenames don't contain *mixed* case then case shifting to
all lower or all upper (or leaving case insensitive) on the part of the
system won't result in name collisions.

A variant on this is to require all hardcoded filenames in applications
to be lowercase, but filenames supplied by a user could be passed to the
system unshifted by the application.  The user should know the case
limitations of the underlying system.

However, my preference is to have the POSIX interface support
case-sensitive filenames.  I agree with others that this will not be hard
to implement given some of the other things that will have to be
provided.

Volume-Number: Volume 7, Number 39

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/10/86)

From: seismo!philabs!phri!cooper!cooper!chris (Chris Lent)
Date: Tue, 7 Oct 86 19:26:42 edt

Just wondering,
	Why not set up a few functions to determine how the heck
each operating system handles filenames?

	For case sensitivity how about something like:
		isfsense() 
	which could be a macro to a constant or a function.

	Or better how about:
		isflegal(fname)
		char *fname;
	which would tell you if the operating system approves of your file
	name? Of course this could be done through existing functions
by opening the file, but this way COULD be implemented to reduce
file access overhead.

	But I think a good solution would be to follow FORTRAN-77's
example with the inquire statement which can get back the fully expanded
filename of an already open unit-number (file descriptor) or a
closed file.  I've found all F-77's I've tried to give back the
full pathnames on files.

	But I think that a minimum allowable character set for
filenames might be sufficient.  That is 'A-Z0-9.' would be fine
for most users I've run into.
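
A rough sketch of what such queries might look like, combining the
isflegal() idea with the minimum character set 'A-Z0-9.' suggested
above.  Neither name exists in any standard; ISFSENSE and its value are
assumptions made for the example, and the modern prototype style is
used instead of the declaration style shown in the post.

```c
#include <string.h>

/* Hypothetical constant in the spirit of isfsense(): 1 if the host
 * treats file names as case sensitive.  The value is an assumption. */
#define ISFSENSE 1

/* Hypothetical isflegal(): accepts only names drawn from the minimal
 * set 'A-Z0-9.' proposed in the post.  Empty names are rejected. */
int isflegal(const char *fname)
{
    static const char ok[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.";

    if (*fname == '\0')
        return 0;
    /* strspn() counts the leading run of allowed characters; the name
     * is legal only if that run reaches the terminating null. */
    return fname[strspn(fname, ok)] == '\0';
}
```

A check like this needs no file access at all, which is the overhead
saving the post is after.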

Well that's about it
Chris Lent
ihnp4!philabs!phri!cooper!chris

Volume-Number: Volume 7, Number 44

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/14/86)

From: mcvax!axis!philip (Philip Peake)
Organization: Axis Digital, 135 rue d'Aguesseau, Boulogne, 92100, FRANCE

>OK, here's a new topic.  File names.
>
>UNIX is the only major operating system
>that treats things like file names, logins, host names, and commands
>as case sensitive.  The net effect of this is that users get
>confused, since they have to get the capitalization right every time.

This is mainly because such users move from restrictive environments
where they are forced to use a single case.  If you look at the problems
of *NEW* users - those not having been crippled by having already worked
in a single case environment, the natural method of working is in
two cases. I have never found anyone incapable of understanding that
upper and lower case letters are different.

>To avoid confusion, everybody always just uses lower case.

Maybe you do, but, there are many people who don't.

>there are few, if any, benefits from a two-case system, and any time
>anyone tries to do something that isn't pure lower case, it causes
>confusion for somebody and often breaks some program.

This is mainly bad software engineering. Taken to its logical conclusion
one could say that letting users get at programs often breaks them
(the programs, that is, (usually)) so let's ban users.

>Another problem is that emulations on other operating systems,
>such as VMS or MS DOS, will become impossible without drastic
>changes to their file systems.  Given the problems in the above
>paragraph, plus politics as usual, I think it is unlikely that
>other systems will be changed to have case sensitive file systems.
>After all, it's not like it was easiest to make the VMS filesystem
>case insensitive - that took extra effort on their part.

It seems to me that this extra effort was needed to circumvent the
extra effort needed in making their system work correctly with
all the legal ascii characters - it was designed by a team of
people who had been mentally crippled by using such a one-case
system.

>I think it's a mistake to move in the direction of requiring other
>operating systems to become case sensitive.  If anything, motion in
>the other direction might be of more benefit.

This seems like a retrograde step.

>Note: I am NOT suggesting that UNIX should have a case insensitive
>filesystem that maps everything to UPPER CASE like MS DOS.  There is
>nothing wrong with mapping everything to lower case, for example.
>It's also reasonable to leave the case alone, but ignore case in
>comparisons.  There is also probably a good argument for keeping
>it case sensitive (after all, there are probably 5 or 6 people out
>there who really need both makefile and Makefile, or both mail and
>Mail, for some reason that escapes me at the moment.)

Here we have a typing error, I think that you really meant 5*10^4 or
6*10^4, didn't you Mark ?

This seems to be a logical extension to the ridiculous proposal
for command names and options which came from Bell Labs some time
ago - all lower case, single letter options etc.

If you want to use upper and lower case for login names, it is a simple
matter to re-write login to be case insensitive.

If you want the same for file name handling in the shell, again it is
fairly simple to add a test for some environment variable, which
would force upper-lower case equivalence. Exactly what happens then
if you have both Makefile and makefile (which is another case of bad
software engineering - that make accepts both)?  You get both files,
or maybe an error. That's your problem, but I want to keep the ability
to use both cases.

In a more general case, are you suggesting that UNIX is going to be
forever tied to ASCII?  What about internationalisation issues - how
do you handle non-English alphabets where case may be CRITICALLY
important?

I would propose that the current scheme is a good one - allow file names
to be composed of any characters in the base character set.

Philip Peake

Volume-Number: Volume 7, Number 56

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/16/86)

[ *sigh*  Below you will find two examples of proof by emotion,
one for case sensitivity, one for case insensitivity.  Now that
we have one on each side together like this, how about let's
either use facts and arguments or go on to another subject?

Below the second example there is a somewhat new point, marked
by another interjection from the moderator.  -mod ]

From: seismo!mcvax!gec-mi-at.co.uk!adam
Date: Thu, 16 Oct 86 09:29:20 -0100
Organization: Marconi Instruments Ltd., St. Albans, Herts, UK

>I would like to add a loud "Bravo!" to Mark Horton's message!  The present
>case sensitivity of the Unix filesystem is a real drag....

No NO nO NO nO No no! Case sensitivity is a bonus. If you can't handle it,
it's your problem. I've worked with both case-sensitive, -preserving and
-insensitive systems, and I prefer them in that order.

       -Adam.

From: pyramid!lll-crg!nike!ucbcad!ucbvax!excelan!donp (Don Provan)
Date: Wed, 15 Oct 86 09:58:48 pdt

This is a good example of why people coming from other operating
systems so often dislike UNIX.  Two people pointed out what is
clearly a bug in UNIX which particularly upsets them.  Many people
responded that it was a feature.  Hrumph!

[ Below is the new point.  -mod ]

If you're so concerned about correct handling of foreign languages,
why don't you start by handling English correctly?  In English,
"Make" and "make" are considered identical.  Capitalization rarely
has an effect on meaning.  Yet in UNIX, "Makefile" and "makefile" are
two different files with different "meanings".  Where are your *NEW*
users that are going to understand this sudden departure from a rule
of their native tongue?

[ The point is wrong.  Capitalization is significant in English:
internet and Internet do not have the same meaning, nor do john and
John (for readers outside the States, perhaps I should point out that
john with no capital refers to a toilet).  The distinction applies
not only to proper names but also in Emphasis and in syntax at the
beginning of sentences.  -mod ]

I am not sufficiently versed in foreign languages to understand the
issues concerning capitalization there.  It sounds like in some cases
the rules of what letters are equivalent (such as "A" and "a" in
English) might require tailoring.  If you're going to support foreign
languages in a meaningful way, i assume you're going to make lots of
other modifications, too.  For example, "Makefile" would need to have
a different name, right?  (I suppose the UNIX utilities themselves
already have names far enough removed from English so that they're no
problem.  What *does* "ls" stand for, anyway?)

[ As a moderately good reader of French and Spanish, I believe I can
state that the same sort of capitalization conventions exist in them as
in English, but with different details as to when capitalization is
appropriate.  The lexical details also differ:  the capital of ll (a single
letter in Spanish) is usually Ll, except when it's LL; in French, whether
an e with an acute accent still has an accent in its capital E form
depends on whether you're in France, Belgium, Quebec, Louisiana, etc.

I understand Greek is an interesting language:  there are several kinds
of lower case forms of some letters, to be used in different places in
a word (beginning, middle, end).  Similar distinctions exist in Arabic.

And, as several people have pointed out, case isn't meaningful in
Chinese, Korean, or Japanese kanji.  Also, the number of bytes used to
encode a character changes with the language, and multiple languages
should be supportable on the same system (in Japan, they commonly use
English, Japanese in romaji, and Japanese in kanji; in Scandinavian
countries I suspect they have a lot of English interspersed with the
national language in technical literature).

In most European countries, UNIX command names are used unchanged,
and Makefile does not in fact have a different name.  Would some
Europeans care to comment?
-mod ]

Having done a lot of case insensitive work, i've always felt that the
UNIX case sensitivity was from laziness.  If i were to be charitable,
i might go so far as to call it a shortcut.

[ See Doug Gwyn's previous article for a good explanation of why file
names are case sensitive (or, rather, byte streams uninterpreted by the
kernel) in UNIX (see Barry Shein's article for a good explanation of why
some other systems are case insensitive).  In places where there was a
reason for case insensitivity (e.g., to match mail standards), it has
been done.  -mod ]

  But it's ridiculous to
say it makes more sense or it makes UNIX easier for new users or it
allows UNIX to support foreign languages.

[ "Ridiculous" is not an argument.  -mod ]

						don provan

Volume-Number: Volume 7, Number 62

std-unix@ut-sally.UUCP (10/17/86)

From: cbosgd!cbosgd.ATT.COM!mark@ucbvax.berkeley.edu (Mark Horton)
Date: Fri, 17 Oct 86 11:20:32 edt
Organization: AT&T Medical Information Systems, Columbus

Don Provan raises some interesting questions about foreign languages.
In general, I think we know how to do a case insensitive comparison
appropriately, by extending a function (I think it's called strcoll,
but I don't have my X3J11 draft handy) defined in ANSI C; the function
is like strcpy, but the destination buffer gets a translation of the
string that will collate properly when a lexicographic comparison like
strcmp is used.  If we extend this function to also translate to one
case (as appropriate) and allow each country to define its own function,
it's technically possible to ignore case.  Whether it's fast enough for
the UNIX filesystem is unclear, although this problem is not restricted
to UNIX.
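
As a rough illustration of the transform-then-strcmp approach described
above: in the C standard as eventually published, the function with this
behavior is strxfrm(), and strcoll() compares two strings directly.  The
sketch below uses strxfrm() as it stands and does NOT do the case
folding proposed above; that extension is hypothetical.  The helper name
xfrm_compare is invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Compares two strings by transforming each with strxfrm() and then
 * applying plain strcmp() to the results.  Returns <0, 0 or >0 like
 * strcmp(); returns 0 with *err set to 1 if memory runs out. */
int xfrm_compare(const char *a, const char *b, int *err)
{
    /* With a length of 0, strxfrm() only reports the space needed. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ta = malloc(na);
    char *tb = malloc(nb);
    int r = 0;

    *err = (ta == NULL || tb == NULL);
    if (!*err) {
        strxfrm(ta, a, na);
        strxfrm(tb, b, nb);
        r = strcmp(ta, tb);
    }
    free(ta);
    free(tb);
    return r;
}
```

The ordering obtained this way follows the current locale's collation
rules, so a per-country definition would be selected through the locale
rather than by swapping the function itself.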

I think it would be interesting to hear what other, case-insensitive
operating systems do about these issues.  What do MS DOS, or VM/CMS,
or VMS, or whatever, do with their case insensitive file names in
Europe, or Japan, or wherever?

If the answer is that file names are restricted to use the same character
set as in the USA, and that extra letters are disallowed, then we need to
know how well this is accepted by the users on other systems.  Maybe it's
good enough.  Do users in other countries often create files whose names
contain extra letters?  If they try, does the shell get in the way if their
letter happens to be "|", for example?

If the answer is that other operating systems have forced other countries
to put up with Americanisms, and that POSIX is an opportunity to break new
ground by handling other languages properly, then by all means let's do it
right.  This might require 8 bit characters in file names, for example.

Incidentally, I've seen it claimed here that UNIX allows arbitrary byte
streams in file names.  Perhaps this is the intent, but in reality the
UNIX filesystem is far from a transparent path.  There are lots of
restrictions, some of which are:

	The slash character is special.
	The null character is special.
	Sequences of more than 14 chars not containing a slash are
		either illegal or only significant to 14 chars or
		significant to 256 chars, depending on the version of UNIX.
	Characters with the 8th bit turned on are not allowed.
	Since many commands take names beginning with "-" as flags,
		file names beginning with "-" don't always work.
	Since the shell treats many of the punctuation characters
		specially, file names containing space, #, $, &, *, (, ),
		[, ], ;, ', ", \, |, <, >, and ? do not always work
		properly.  Even if you quote them, the shell strips
		off the quotes, so that if multiple layers of shell
		are involved (for example, uux) it still fails.

Because some of these problems only affect certain uses of the filesystem
(whether or not you go through the shell, whether or not you're going
through a command that takes arguments) it's not unusual for casual users
to create a file and then have trouble using, renaming, or even removing it.
I recall that removing a file whose 8th bit has been set is a frequent topic
on net.unix.
	
If the filesystem were really transparent, the designers of /proc would
not have had to encode process ID's in ASCII digits, they could have
directly used the binary representation.

It's for these reasons that I feel that a conservative UNIX user should
restrict themselves to certain "reasonable" filename conventions; basically
using only lower case letters, digits, and a few safe punctuation characters
such as . and - in their filenames.  Just because it's possible to put a
space in a file name doesn't make it a good idea.
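
One way to sketch such a convention as a check.  The function name is
hypothetical, and the exact rule set (at most 14 characters, no leading
"-") is an assumption pieced together from the restrictions listed
above, not something the post specifies.

```c
#include <string.h>

/* Returns 1 if name follows the conservative convention sketched in
 * the post: 1 to 14 characters, only a-z, 0-9, '.' and '-', and no
 * leading '-' (which many commands would take for a flag). */
int is_conservative_name(const char *name)
{
    static const char ok[] = "abcdefghijklmnopqrstuvwxyz0123456789.-";
    size_t len = strlen(name);

    if (len == 0 || len > 14 || name[0] == '-')
        return 0;
    /* Every character must come from the allowed set. */
    return strspn(name, ok) == len;
}
```

Names that pass this test avoid the shell, flag, length and 8th-bit
problems listed above on the systems discussed in this thread.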

	Mark

Volume-Number: Volume 7, Number 67

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/18/86)

From: mordor!jdb@sally.utexas.edu (John Bruner)
Reply-To: jdb@s1-c.arpa 
Date: Fri, 17 Oct 86 14:39:08 PDT
Organization: S-1 Project, LLNL

It seems to me that there are three alternatives.  POSIX can specify
that conforming implementations must be case sensitive, must be case
insensitive, or may be either case sensitive or case insensitive.

If a conforming system must be case insensitive, then UNIX doesn't
conform.  If UNIX is to be included in the set of POSIX-compatible
systems, then case sensitivity must be permitted.

If a conforming system may be case sensitive or case insensitive,
then a lot of programs won't be portable.  Ignore for the moment
all existing UNIX code and consider new program development.  I
believe that programmers on one kind of system won't bother
with the library routines that are used to compare and/or convert
mixed-case names to monocase.  It doesn't matter what people "ought"
to do.  A well-known example of this effect is 4.2BSD.  The source
code is full of variables that should be declared "long" but --
since on the VAX "long" and "int" are identical -- are not.  In the
same way, optional case sensitivity will spawn code that only runs
correctly in the environment where it was written.

Therefore, I believe that case sensitivity must be retained, and
it should not be made optional.

Volume-Number: Volume 7, Number 68

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/18/86)

From: rbj@icst-cmr.arpa (Jim Cottrell)
Date: Fri, 17 Oct 86 16:57:43 EDT

> Having done a lot of case insensitive work, i've always felt that the
> UNIX case sensitivity was from laziness.  If i were to be charitable,
> i might go so far as to call it a shortcut.

I prefer to call it optimization. Case insensitivity must be enforced.
By my count, that's at least two instructions per character, plus loop
control (unless you have something like VAX's `move translated characters').
That ought to negate any speedup from hashing or name translation caching.

What is lazy is people refusing to learn the difference.

[ See previous comments about argument by character assassination.  -mod ]

	(Root Boy) Jim Cottrell		<rbj@icst-cmr.arpa>
	YOW!! I'm in a very clever and adorable INSANE ASYLUM!!


Volume-Number: Volume 7, Number 69

std-unix@ut-sally.UUCP (10/20/86)

From: cbosgd!cbosgd.ATT.COM!mark@ucbvax.berkeley.edu (Mark Horton)
Date: Sun, 19 Oct 86 23:11:35 edt
Organization: AT&T Medical Information Systems, Columbus

>If a conforming system may be case sensitive or case insensitive,
>then a lot of programs won't be portable.  Ignore for the moment
>all existing UNIX code and consider new program development.  I
>believe that programmers on one kind of system won't bother
>with the library routines that are used to compare and/or convert
>mixed-case names to monocase.  It doesn't matter what people "ought"
>to do.  A well-known example of this effect is 4.2BSD.  The source
>code is full of variables that should be declared "long" but --
>since on the VAX "long" and "int" are identical -- are not.  In the
>same way, optional case sensitivity will spawn code that only runs
>correctly in the environment where it was written.
>
>Therefore, I believe that case sensitivity must be retained, and
>it should not be made optional.

I'm sorry, but I don't buy this argument.  It seems to be based on
the assumption that case insensitivity will be implemented by the
use of subroutines for case-insensitive operations, with a different
user interface from that available today.  I think such an implementation
is silly, even if other operating systems may do it that way.

I'm talking about file names only.  I do not advocate even considering
making all of the user interfaces in UNIX case insensitive.  While it
might have once been a good idea to design them that way, I feel it's
far too late for someone to decree that all the upper and lower case
keys in, say, vi must be equivalent.

I think it's a given that existing code won't be rewritten to use new
interfaces, even if we come up with a wonderful way to do it.  Vi still
uses raw terminfo, even though curses would have been much easier and
better.  Also, there are lots of binaries out there that can't even be
recompiled.  Any solution to this problem must be in the kernel, or possibly
in libc underneath such subroutines as open, unlink, and chmod, (if you
have shared libraries or full source to recompile) or it won't work all
the time.

The obvious implementation is for the code in the kernel, when mapping a
filename to an inode number, to do a case-insensitive comparison when
checking each filename element in a directory.  This would be pretty
simple to add, although issues such as speed and international variations
would probably require a clever case-insensitive comparison, possibly
using a country-specific case mapping table with some flags or other
hacks to deal with single-multiple glyph mappings like SS to ess-tset.
There might even be a performance GAIN if creation of a directory entry
included calculating an appropriate hash function which is also stored
in the directory and used for initial comparisons.
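
A user-space sketch of the lookup just described, under several
assumptions: the structure layout, the function names and the simple
multiply-by-33 hash are all invented for the example, case folding is
plain tolower() with no country-specific table, and nothing here is
taken from a real kernel.

```c
#include <ctype.h>
#include <stddef.h>

struct dirent_sketch {          /* hypothetical directory entry */
    const char   *name;         /* name as created, case preserved */
    unsigned long folded_hash;  /* hash of the case-folded name */
    unsigned long inode;
};

/* Hash of the name with every byte folded to lower case. */
unsigned long folded_hash(const char *s)
{
    unsigned long h = 5381;

    for (; *s; s++)
        h = h * 33 + (unsigned long)tolower((unsigned char)*s);
    return h;
}

/* Returns 1 if the two names are equal once case is ignored. */
static int same_ignoring_case(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns the inode of the entry matching name without regard to
 * case, or 0 if there is none.  The stored hash rejects most
 * non-matching entries before any string comparison is done. */
unsigned long lookup(const struct dirent_sketch *dir, size_t n,
                     const char *name)
{
    unsigned long h = folded_hash(name);
    size_t i;

    for (i = 0; i < n; i++)
        if (dir[i].folded_hash == h &&
            same_ignoring_case(dir[i].name, name))
            return dir[i].inode;
    return 0;
}
```

The entry keeps the name exactly as it was created, so a listing still
shows "FooBar" while a lookup of "foobar" finds it.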

I see no need to map everything to lower case when creating the directory
entry.  Let the entries be in mixed case; this allows more readable names.
I don't know what to do about sorting (e.g. in the shell or ls) - it might
be case sensitive or insensitive sorting, and good arguments can probably
be made for both.

The behavior I'm concerned about is that, if the user types, say, "mail"
and there's a command "Mail" in the search path, it should still work.
If the file "FooBar" exists and the user cats "foobar", because somebody
read that name over the phone, it should find it.

	Mark

Volume-Number: Volume 7, Number 72

std-unix@ut-sally.UUCP (10/20/86)

From: pyramid!utzoo!henry (Henry Spencer)
Date: Mon, 20 Oct 86 04:07:24 CDT

> If the filesystem were really transparent, the designers of /proc would
> not have had to encode process ID's in ASCII digits, they could have
> directly used the binary representation.

This is rather a red herring, since they wouldn't have done this even if
it had been trivially possible.  The ASCII representation is a whole lot
more useful for human beings, and isn't a significant nuisance to programs.
The extra code needed to do it isn't much (yes, I have read it).

> It's for these reasons that I feel that a conservative UNIX user should
> restrict themselves to certain "reasonable" filename conventions...

Agreed, but that is not the topic of the discussion.  Standards must address
requirements other than those of conservative human users.  It is a serious
mistake for a standard to attempt to legislate morality.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry



Volume-Number: Volume 7, Number 73

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/26/86)

From: guy@sun.com (Guy Harris)
Date: Mon, 20 Oct 86 10:49:33 PDT

Responses to a couple of messages:

>From Mark Horton:

> Any solution to this problem must be in the kernel, or possibly
> in libc underneath such subroutines as open, unlink, and chmod, (if you
> have shared libraries or full source to recompile) or it won't work all
> the time.

Any solution to this problem must be applied to operating systems other than
UNIX.  As John Bruner pointed out, mandating case-insensitivity will only
have the effect of removing UNIX from the list of standard-conforming
systems.  Changing the semantics of file names at this late date is unlikely
to meet with approval from many UNIX vendors and users.  For one thing, what
are you going to do about directories that contain files named, say,
"makefile" and "Makefile" (yes, they exist)?  You may feel that having
directories like this is a mistake, but declaring them to be a mistake isn't
going to make them go away.

There seem to be two issues here:

1) Should POSIX mandate case-sensitivity?

2) Should UNIX be changed to be case-insensitive if POSIX doesn't mandate
case-sensitivity?

These are rather separate issues.  A case can be made that POSIX should not
mandate case-sensitivity.  Applications must then not depend on
case-sensitivity.  This will affect programs that create files with names
other than those provided by the user.  It could also affect programs that
*read* directories, since they'd have to know that "foobar" and "FoOBaR"
refer to the same file.

I see great difficulty in changing UNIX to be case-insensitive, however.  It
certainly wouldn't pose any great *implementation* difficulties, but I would
not like to bet that no user or program would be greatly affected.

>From Mark R. Crispin:

>     It seems that the two sides in this issue boil down to this:
> . "gee, since we're defining a standard portable operating system
>   that isn't necessarily the present de facto Unix, let's fix
>   this case sensitivity cretinism"
> . "case sensitivity is what makes Unix better than any other
>   operating system, and only a cretin can't understand why this
>   is wonderful"

Not really.  A POSIX standard that does not *mandate* case-sensitivity need
not *forbid* it.  And I have seen *no* arguments that "case sensitivity is
what makes UNIX better than any other operating system."

>      Let's start by discarding the arguments which are bogus.
> The most glaring of these has got to be the international
> compatibility argument.  The only advocates of this argument seem
> to be pro case sensitivity Americans who have seized upon this as
> an argument to shore up their position without really thinking
> over the issue carefully.

Well, it may seem that way, but it isn't.  I admit to being a United States
citizen, but I am not unreservedly pro-case-sensitivity.  I see the merits
to both sides of the argument, but I see more problems with
case-insensitivity than with case-sensitivity.

>      Unix does not allow arbitrary strings in filenames.  Any
> number of "funny" characters must be within a quoted string.  I
> can't say
> 	rm foo.bar;1
> I have to say
> 	rm "foo.bar;1"
> Guess what.  A number of foreign keyboards use those "funny"
> characters to be non-English glyphs.

As the moderator pointed out, the shell, not the operating system,
interprets these funny characters.  Applications need not get file names
passed as arguments from the shell.  The office automation system we
developed at CCI had its own shell, which did no parsing of path names
whatsoever; the only characters it forbade were the slash and the null
character (because they are not allowed in UNIX filenames) and those
characters its forms package didn't allow you to type in (because we never
got around to changing it to do so).  I frequently used file names
containing blanks within this application, even though it made it
inconvenient to manipulate those files using commands typed at the UNIX
shell.

>      I have yet to hear of any organization in Japan using kanzi
> or hirogana or katakana in filenames.

I have a document in front of me from ASCII Corporation in Japan, describing
changes made to 4.2BSD to support Kanji and Kana.  It says:

	It is possible to create a file whose name contains Kana and/or
	Kanji characters, since the file system and Kanji version of
	the shell support it.  However, we don't recommend such filenames,
	because it is impossible to handle such files from ASCII terminals.

The argument used against it would not apply if, for example, no terminals
attached to the machine were ASCII terminals and the site didn't expect to
export these files to machines with only ASCII terminals attached.  The
developers of it may be coming from a more "traditional" UNIX environment,
where you have many ASCII terminals attached to the machine and where you
frequently exchange files with other sites not running the same hardware and
software that you are running.  In an office environment, it may be possible
to provide everyone with a Kanji/Kana terminal, and it may not be as
important to worry about exchanging files with some random development
machine in the United States.

>   There are good reasons for
> this!  One is that there isn't a single way of representing
> written Japanese.  In older terminals, the high order bit when
> set indicated katakana (much as DEC VT220's use the high order
> bit for their "international characters").  Modern Japanese
> terminals use the JIS (Japanese Industrial Standard) system of
> ESCAPE followed by two bytes to define a 14 bit character.

The system they describe uses "Shift JIS" code for Kanji, and supports both
terminals that use this code and the regular JIS code for Kanji; it does
code conversion between the codes for JIS-Kanji terminals.

>      Some German keyboards use various 7-bit glyphs (I believe
> "@" is umlaut-a) for their umlauts and ess-tset.  Or, there's the
> VT220 system.  I just tried creating a file called Goethestrasse
> (using umlaut-o for "oe" and ess-tset for "ss") on my local Unix
> system using my VT220 clone.  It made "GVthestra_e", the 7-bit
> form.

The latter sounds like ISO Latin Alphabet No. 1; "umlaut-O" has the hex code
D6 and capital V has the code 56; 56 hex + 80 hex is D6 hex.  (I believe DEC
recommended the VT220 code set to ISO for standardization.)
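
The arithmetic above can be checked with a short Python sketch.  It assumes
the name was typed with a capital umlaut-O (the case the hex codes above
describe) and that something along the way simply cleared the eighth bit;
strip_eighth_bit() is a hypothetical helper, not a real system routine.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    # Clear the high-order bit of every byte, as a 7-bit path would.
    return bytes(b & 0x7F for b in data)


# "G", capital umlaut-O, "thestra", ess-tset, "e", encoded as ISO Latin-1.
name = "G\u00d6thestra\u00dfe"
encoded = name.encode("latin-1")

stripped = strip_eighth_bit(encoded).decode("ascii")
print(stripped)  # prints GVthestra_e
```

Umlaut-O (D6 hex) becomes "V" (56 hex) and ess-tset (DF hex) becomes "_"
(5F hex), which matches the "GVthestra_e" reported in the quoted message.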

>   Dare I mention that in German, only nouns (and the first
> word in a sentence) are capitalized?

The same is true of English; so what?

>      The point is that Unix does *not* support international
> character sets in filenames.  It supports 7-bit USASCII.  So
> let's leave that issue to rest.

As the moderator pointed out, this is not the case.  The kernel supports all
characters except slash and the null character, except for the 4.[23]BSD
kernel which (too helpfully) refuses to create files with characters in
their name that have the eighth bit set.  Certain UNIX utilities do not
handle 8-bit characters; this is not, however, an intrinsic characteristic
of the UNIX system.  I would ask European and Asian customers what they
wanted the UNIX system to do about character sets other than 7-bit USASCII
before I casually dismissed the possibility of supporting them.

>      I haven't yet heard of any serious use of full 8-bit bytes
> for filenames on any other operating system, which, if you are
> serious about supporting international character sets, you must
> do.  There's this small problem of getting 8-bit (as opposed to
> 7-bit) ASCII through various pieces of hardware and networks
> which think that the high order bit is parity...

Not all such pieces of hardware have this limitation.  The paper from ASCII
Corporation simply says "Kana and Kanji terminals must be set up to use 8
bit no parity mode."  If other terminals use a 7-bit encoding of an 8-bit
data stream, the terminal driver can do code translation transparently to
the rest of the system.

The fact that most OSes haven't solved these problems, and don't provide for
full 8-bit characters in file names, doesn't mean there is no demand for
full 8-bit characters in file names.  The users in non-English-speaking
countries may just have learned to get around this problem, and either use
English-language file names or approximate their native spelling in file
names.

Volume-Number: Volume 7, Number 76

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/26/86)

From: @SUMEX-AIM.ARPA:MRC@PANDA  (Mark Crispin)
Date: Mon 20 Oct 86 05:42:50-PDT
Postal-Address: 1802 Hackett Ave.; Mountain View, CA  94043-4431
Phone: +1 (415) 968-1052

     The XDE Lisp machine file server I use has a file system of the
sort that Mark Horton describes.  That is, it accepts and preserves
mixed case in filenames, but in name selection it does a case-independent
match.

     I find that on this file server I am much more likely to use a file
name such as TokyoPaper.FirstDraft.  In fact, this file server encourages
me to mix case like this freely, since there is no cost in doing so.  I
can edit "tokyopaper.firstdraft" or "TOKYOPAPER.FIRSTDRAFT" or even
"tOKYOpAPER.fIRSTdRAFT" and the system is still smart enough to figure
out I mean TokyoPaper.FirstDraft.

     On the DEC-20 and Unix file servers, it's single case and hyphens.
I end up using something like "tokyo-paper.first-draft".

     These were personal observations.  However, I know for a fact that
nobody uses mixed case on our Unix-based file server.  The Leaf (Xerox
Lisp machine file access protocol) server on Unix was modified to coerce
all filenames to be entirely lowercase on the Unix machine's disk and to
coerce it back to all uppercase in the other direction.  There were/are
two reasons:
 (1) transfers to/from the third file server, a DEC-20, were hopeless
     otherwise since the Unix system would insist that two identical files
     were different because the case of the names didn't match
 (2) the users found the case dependence to be a serious problem.

-- Mark --
-------

Volume-Number: Volume 7, Number 78

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/26/86)

From: mckenney@sri-unix.arpa (Paul E. McKenney)
Date: Thu, 23 Oct 86 17:27:21 pdt
Organization: SRI, Menlo Park, CA.

Ok, how about a compromise proposal?

Keep roughly the same case-sensitivity in the kernel interface that exists
now.  This means that (for example) 'unlink("abc")' and 'unlink("ABC")' will
remove two different files.

Keep the normal shell interface for filenames.  This means that (again, for
example) 'rm abc' and 'rm ABC' will again remove two different files.

Make escape completion case insensitive.  (Escape completion is used in some
versions of BSD 4.x csh, perhaps elsewhere also.  It allows a user to
type the first part of a filename (or command name) and then hit
ESC.  The system will complete the filename as best it can.  If it cannot
unambiguously determine the filename from the part given by the user, it
will beep after having supplied as much of the filename as it can without
problems with ambiguity.  There is also usually a feature that allows the
user to display all filenames that match what he has typed so far --
control-D serves this function in some variants of BSD 4.2 csh.)

In other words, if a user types 'rm abc<ESC>' (where <ESC> represents the
ESC key), and there is a file named 'ABC', and there is no other file that
matches the pattern '[aA][bB][cC]', the shell (-not- the kernel) will
backspace over the 'abc' and overwrite it with 'ABC' so that the command
line will look as if the user had typed 'rm ABC'.  The user may then
hit RETURN if he wishes to execute the command, or he may further edit
the command line (using his usual backspace/delete, etc. characters).
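
A minimal Python sketch of the matching step might look like the following.
It is only a sketch under stated assumptions: complete() is a hypothetical
name, matching is on a case-folded prefix, and when the result is ambiguous
the typed text is extended with lower-case characters (a choice made here,
not something the proposal specifies).

```python
import os


def complete(prefix, names):
    """Return (text, unambiguous) for a case-insensitive completion."""
    folded = prefix.lower()
    matches = [n for n in names if n.lower().startswith(folded)]
    if not matches:
        return prefix, False       # nothing to offer; the shell would beep
    if len(matches) == 1:
        return matches[0], True    # overwrite with the name as stored
    # Ambiguous: extend only as far as every candidate agrees, ignoring case.
    common = os.path.commonprefix([m.lower() for m in matches])
    return prefix + common[len(prefix):], False


print(complete("abc", ["ABC", "xyz"]))  # prints ('ABC', True)
print(complete("a", ["ABC", "abd"]))    # prints ('ab', False)
```

The kernel interface is untouched in this scheme: the shell rewrites the
command line and then passes the exact stored name to the system call.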

This escape-mapping facility should be supplied in a library routine so that
application programs can easily act the same way.  It would be nice if such
a function could work with keywords, hostnames, etc. as well as filenames.

This proposal has the following advantages:

o	It does not impact existing software (addition of the case-insensitive
	ESC does not add any functionality, it just makes it easier on users).

o	It answers Mark Horton's 'filename-over-the-phone' problem
	<6049@ut-sally.UUCP> (just tell the user to type 'foobar<ESC>').

o	It allows users from a case-insensitive environment a helpful tool
	to ease their transition (let's face it -- if it is different than
	whatever you are used to, it ain't friendly -- regardless of whether
	you are used to case sensitivity, case insensitivity, or hieroglyphics).

o	Removes the need for millions and millions of 'upper()' calls in
	application code mentioned by Dan Libes <5959@ut-sally.UUCP>
	(although the additional code to do good escape-completion is far
	from trivial!).

o	Removes the need for 'isfsense()' or 'isflegal()' (Chris Lent,
	<5971@ut-sally.UUCP>) since all implementations could use the same
	definition of legal characters in a pathname.  Note that 'isflegal()'
	is still useful for programs that are trying to be portable across
	different operating systems.

This proposal leaves the following two issues unresolved:

o	Whether the eighth bit on characters within a filename should be
	significant.  The developers of BSD 4.[23] must have had some good
	reason for making it insignificant, but the only reason that comes
	to mind is that most terminals cannot easily specify the eighth bit
	(just like some older terminals cannot easily specify lower case!).

o	Whether there should be some escaping mechanism to allow slash ("/")
	and ASCII NUL in a filename.  I cannot think of a reason for allowing
	this that seems worth the trouble -- any comments?


			Paul E. McKenney
			mckenney@sri-unix.arpa
			{pyramid,rutgers,ucbvax!hplabs}!sri-unix!mckenney

Volume-Number: Volume 7, Number 89

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (10/27/86)

From: seismo!enea!tut!intrin.uucp!jty (Jyrki Yli-Nokari)
Date: Mon, 27 Oct 86 20:54:42 -0200
Organization: Intrinsic Oy, Tampere, Finland

There seems to be misunderstanding about Unix not accepting 8 bit characters
in file names.

I would like to point out that Unix is perfectly happy to include
ANY 8 bit characters in the file name, EXCEPT slash '/' or null '\0'.

[ Depends on which system you're referring to:  some really do
strip the eighth bit in the file system, not in the shell.
Though there are many shells that also strip that bit,
as you point out.  -mod ]

The REAL problem is the shell that strips the 8:th bit off for its
own purposes.

At least IBM's AIX and HP's HP-UX have fixed this problem.

Regardless of the case sensitivity we MUST start from the fact
that characters are made out of at least eight bits, not seven = USASCII.

Now that I use 7 bit modified ascii character set,
the O umlaut in my terminal is really a backslash '\'
as far as Unix is concerned.

Try explaining that to a casual end-user, who wants to create a file
called '\rkki'.
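
To make the effect concrete, here is a small Python sketch of what a 7-bit
national character set does to a typed name.  Only the O umlaut to backslash
pairing comes from the message above; the other five pairs are assumed from
the usual Finnish/Swedish 7-bit variant, and as_seen_by_unix() is a
hypothetical helper.

```python
# Glyph shown on the terminal -> 7-bit ASCII character the system receives.
NATIONAL_TO_ASCII = {
    "\u00c4": "[",   # A umlaut (assumed)
    "\u00d6": "\\",  # O umlaut, as described in the message
    "\u00c5": "]",   # A ring (assumed)
    "\u00e4": "{",   # a umlaut (assumed)
    "\u00f6": "|",   # o umlaut (assumed)
    "\u00e5": "}",   # a ring (assumed)
}


def as_seen_by_unix(typed: str) -> str:
    # Replace each national glyph by the ASCII character sharing its code.
    return "".join(NATIONAL_TO_ASCII.get(ch, ch) for ch in typed)


print(as_seen_by_unix("\u00d6rkki"))  # prints \rkki
```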

Volume-Number: Volume 7, Number 94

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/02/86)

From: seismo!enea!chalmers.UUCP!jacob (Jacob Hallen)
Date: Sun, 2 Nov 86 01:38:02 -0100
Organization: Dept. of CS, Chalmers, Sweden

I would like to point out one small but very useful advantage with
case sensitive filenames.
In a directory containing many files it's difficult to spot files
with names like makefile, readme and instructions. Given the names
Makefile, Readme and Instructions these files will appear first
in the listing where they are easy to find.
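
The ordering effect can be seen in a few lines of Python, assuming the
listing is sorted by character code, as a plain byte-wise comparison would
do.  The file names are made up for the example.

```python
names = ["instructions.old", "main.c", "Makefile", "Readme",
         "zlib.c", "Instructions", "alloc.c"]

# Character-code order: every upper-case letter sorts before any lower-case one.
byte_order = sorted(names)

# Case-insensitive order, for comparison.
folded_order = sorted(names, key=str.lower)

print(byte_order[:3])    # prints ['Instructions', 'Makefile', 'Readme']
print(folded_order[:3])  # prints ['alloc.c', 'Instructions', 'instructions.old']
```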

Jacob Hallen

Volume-Number: Volume 8, Number 18

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/03/86)

From: guy@sun.com (Guy Harris)
Date: Mon, 3 Nov 86 00:54:24 PST

> I would like to point out one small but very useful advantage with
> case sensitive filenames.
> In a directory containing many files it's difficult to spot files
> with names like makefile, readme and instructions. Given the names
> Makefile, Readme and Instructions these files will appear first
> in the listing where they are easy to find.

This is an advantage of file systems that permit both upper-case and
lower-case letters in file names.  File systems with case-sensitive file
names, and file systems with case-insensitive file names, can both permit
two cases of letters in file names.

[ Perhaps the third kind is called case-coercing? -mod ]

Volume-Number: Volume 8, Number 19

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/03/86)

From: @SUMEX-AIM.ARPA:MRC@PANDA (Mark Crispin)
Date: Sun 2 Nov 86 10:54:35-PST
Postal-Address: 1802 Hackett Ave.; Mountain View, CA  94043-4431
Phone: +1 (415) 968-1052

Jacob Hallen -

     You missed the point, I think.  Very few if any of us in the
case-independence camp are arguing that case should be coerced into
all upper (e.g. TOPS-20) or all lower (e.g. what you have to do with
a Unix file server in a case-independent network environment).  You
should be allowed to create a file called ReadMe.

     What we are asking for is that if you try to access the ReadMe
file by specifying "readme" or "Readme" or "README" or even "rEADmE"
you should get the ReadMe file instead of a file not found error.
Furthermore, if you open "readme", "Readme", etc. for write, it should
supercede the ReadMe file and the resulting file should have the
original case of ReadMe.

     In other words, finding a file for read will match any case.
Finding a file for write will match any case, supercede any such older
file, and will preserve the case of that older file.  The only way to
change the case would be with rename; the source name would be case
independent but the destination case would be preserved.  Of course,
you could also change the case by deleting ReadMe and then opening
README for write...
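
     The rules in the last two paragraphs can be sketched as a toy
in-memory directory in Python.  This is only an illustration under
assumptions: the class CaseRetainingDir and its methods are hypothetical,
and lower-casing stands in for whatever case mapping a real system would
use.

```python
class CaseRetainingDir:
    """Toy directory: case is retained on creation, ignored on lookup."""

    def __init__(self):
        self._entries = {}  # folded name -> [stored name, contents]

    def read(self, name):
        # Any case matches; KeyError plays the role of "file not found".
        stored, data = self._entries[name.lower()]
        return stored, data

    def write(self, name, data):
        key = name.lower()
        if key in self._entries:
            self._entries[key][1] = data       # supersede, keep the old case
        else:
            self._entries[key] = [name, data]  # new file, keep the given case
        return self._entries[key][0]

    def rename(self, src, dst):
        # Source is matched in any case; destination case is preserved.
        _, data = self._entries.pop(src.lower())
        self._entries[dst.lower()] = [dst, data]

    def delete(self, name):
        del self._entries[name.lower()]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())


d = CaseRetainingDir()
d.write("ReadMe", "first version")
d.write("README", "second version")  # supersedes; the name stays "ReadMe"
print(d.read("rEADmE"))              # prints ('ReadMe', 'second version')
```

     Changing the stored case then takes either d.rename("readme",
"README") or a delete followed by a fresh write under the new spelling.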

     This gives you all the directory advantages of a case-dependent
filesystem.  The only "feature" you lose is the ability to create a
separate Readme, ReadMe, readme, and README set of files.  I personally
believe that anybody who creates files which differ only in case deserves
to be shot or at least have his employment terminated with extreme
prejudice.  [ I suggest readers interpret that last sentence as a
hypothetical statement applying to none of them.  -mod ]

     There are filesystems that behave in this manner, and they are
quite pleasant to use.  Please, if you support case-dependence, don't
give the "mixed case filesystems" class of arguments.  The only two
arguments you really have are (1) it is a "feature" (however dubious)
that you can create Makefile and makefile as separate files in the
same directory, and (2) Unix does it this way.
-------

Volume-Number: Volume 8, Number 25

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/03/86)

From: seismo!utai!utcsri!mcgill-vision!mouse
Date: Thu, 30 Oct 86 05:28:02 EST

In article <6107@ut-sally.UUCP> mckenney@sri-unix.arpa (Paul E. McKenney) writes:
[that what he said above leaves unaddressed]
> o	Whether the eighth bit on characters within a filename should be
> 	significant.  The developers of BSD 4.[23] must have had some good
> 	reason for making it insignificant, but the only reason that comes
> 	to mind is that most terminals cannot easily specify the eighth bit
> 	(just like some older terminals cannot easily specify lower
> 	case!).

There are also programs (the shell comes to mind) that use the eighth
bit for their own purposes.  I believe the shell uses it as a quote
indicator.  Although it is not relevant to filenames, I seem to recall
seeing some code along the lines of curses that used the eighth bit to
indicate highlighting.  Also, all the 7-bit characters can be specified
to (say) rm by careful use of quotes or backslashes.  With most
terminals, this is not possible for 8th-bit-set characters, and even if
the terminal and tty driver could handle it, as I implied above the
shell would strip it anyway.  So you *couldn't* do anything with such
files.

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu

Volume-Number: Volume 8, Number 26

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/04/86)

From: chris@mimsy.umd.edu (Chris Torek)
Date: Tue, 4 Nov 86 07:33:44 EST

We seem to have three proposals:

CS: Case sensitive file systems.  This is what all major Unix variants
    (V6, V7, SysIII, SysV, 2BSD, and 4BSD) now support.

CC: Case coercive file systems (file names forced to all upper or all
    lower case).

CR: Case retaining but otherwise insensitive file systems (new names
    are created according to the given case; matches are not case
    sensitive).

I sincerely hope that no one is seriously suggesting POSIX adopt
CC: no one seems to like such systems much.  That leaves CS and
CR.  The case for CR appears to be that those who have used both
CS and CR prefer CR.  This may be true; I have seen no studies,
but the anecdotes do seem to favour it.  I have used such a system,
and did not think it so wonderful, but for the sake of argument,
let us assume that CR really is objectively better than CS---so
much so that 5BSD and System V Release N+1 will have CR style file
systems.  Fine.

But as I understand it, POSIX is intended to be an interface
specification for something that resembles `Unix' (whatever `Unix'
may be).  If that is indeed the case, the only sensible choice is
CS, for, as I noted above, this is what all major Unix variants
*do*.  *They all agree:* file names are case sensitive.  Should
we make standard something that no one uses?  I say no!  When
5BSD and Release N+1 come out, then we can create a new standard
to describe these wonderful new systems, but until then, let
us write something that describes what we have now.

I believe that the first standard for *anything* that already exists
should describe the existing implementations, at least wherever
they agree.  Afterward, feel free to invent new improved standards,
so as to foist progress upon vendors.  Indeed, it might not be a
bad idea to publish two standards virtually simultaneously: That
Which Is, and That Which Should Be.  But list first That Which Is.

[ There really are (or at least were) two discussions going on here:
one about what should be in POSIX, the other about what UNIX should do.
I haven't seen any recent arguments that POSIX should do anything but
reflect what UNIX currently does, i.e., case sensitive file names
(really file names as uninterpreted byte streams).  -mod ]

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

Volume-Number: Volume 8, Number 34

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/04/86)

From: campbell%maynard.UUCP@harvisr.harvard.edu (Larry Campbell)
Date: Tue, 4 Nov 86 10:19:11 EST
Organization: The Boston Software Works, Inc.

>From: @SUMEX-AIM.ARPA:MRC@PANDA (Mark Crispin)

>     What we are asking for is that if you try to access the ReadMe
>file by specifying "readme" or "Readme" or "README" or even "rEADmE"
>you should get the ReadMe file instead of a file not found error.
>Furthermore, if you open "readme", "Readme", etc. for write, it should
>supercede [sic] the ReadMe file and the resulting file should have the
>original case of ReadMe.
>
>     In other words, finding a file for read will match any case.
>Finding a file for write will match any case, supercede [sic] any such older
>file, and will preserve the case of that older file.  The only way to
>change the case would be with rename; the source name would be case
>independent but the destination case would be preserved.  Of course,
>you could also change the case by deleting ReadMe and then opening
>README for write...

>     There are filesystems that behave in this manner, and they are
>quite pleasant to use.  Please, if you support case-dependence, don't
>give the "mixed case filesystems" class of arguments.  The only two
>arguments you really have are (1) it is a "feature" (however dubious)
>that you can create Makefile and makefile as separate files in the
>same directory, and (2) Unix does it this way.

Sorry to keep beating this dead horse, but some people just haven't
yet caught on to one of the principal design fundamentals of UNIX.

	"Keep it small and simple."

As has already been pointed out, the system (I'm deliberately avoiding
the term "kernel") treats filenames as uninterpreted strings of bytes.
Adding case folding to the system adds complexity to the system that
provides only a tiny benefit (is it really that hard to type the correct
filename?).

I think everyone agrees that creating "Makefile" and "makefile" in the
same directory is braindamaged.  What I disagree with is the notion
that the system should be in the business of preventing this.  Should
the C compiler enforce a certain Hamming distance between identifiers?

Note also that case folding is only "simple" in some languages.  As has
already been pointed out, there are languages (like German) where case
folding is decidedly complex.  And in an international environment, the
case folding algorithm may need to be different for each user.
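
The German case is easy to demonstrate in Python, whose built-in string
methods apply the language-neutral Unicode case mappings (an assumption
worth stating: they do not apply any per-user or per-country rules).

```python
word = "stra\u00dfe"        # "strasse" spelled with ess-tset

upper = word.upper()        # ess-tset has no single upper-case letter here
round_trip = upper.lower()  # lower-casing does not restore the ess-tset

print(upper)                # prints STRASSE
print(round_trip == word)   # prints False
print(word.casefold() == "STRASSE".casefold())  # prints True
```

Upper-casing changes the length of the name, and the round trip does not
return the original spelling, so "fold to one case and compare" is not a
simple byte-for-byte operation.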

I wish I could remember who said this, but someone once pointed out
that "One of the reasons Dennis Ritchie is a genius is that whenever
someone says `Wouldn't it be nice if UNIX had feature X?', instead
of saying `Wow, yeah, I'll go hack that in', he says, `Yep, sure would.'"
-- 
Larry Campbell       MCI: LCAMPBELL          The Boston Software Works, Inc.
UUCP: {alliant,wjh12}!maynard!campbell      120 Fulton Street, Boston MA 02109
ARPA: campbell%maynard.uucp@harvisr.harvard.edu     (617) 367-6846

Volume-Number: Volume 8, Number 35

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/05/86)

From: pyramid!utzoo!henry (Henry Spencer)
Date: Tue, 4 Nov 86 19:48:36 CST

> There are also programs (the shell comes to mind) that use the eighth
> bit for their own purposes.  I believe the shell uses it as a quote
> indicator...

I believe that in recent times, AT&T has made strenuous efforts to stamp
out such extraneous uses of the 8th bit, so that full 8-bit character sets
could be used.  I know other vendors have as well.  So this objection will
not be valid much longer, even if it remains valid today.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

Volume-Number: Volume 8, Number 36

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/05/86)

From: seismo!hadron!jsdy@sally.utexas.edu (Joseph S. D. Yao)
Cc: jsdy@sally.utexas.edu
Date: Tue, 4 Nov 86 21:16:04 est

>From: @SUMEX-AIM.ARPA:MRC@PANDA  (Mark Crispin)
>Date: Mon 20 Oct 86 05:42:50-PDT
>     On the DEC-20 and Unix file servers, it's single case and hyphens.
>I end up using something like "tokyo-paper.first-draft".

This sounds like a local convention.  Unix filenames may contain
any ASCII character, including upper and lower cases, except for
NUL and '/'.

>nobody uses mixed case on our Unix-based file server.  The Leaf (Xerox
>Lisp machine file access protocol) server on Unix was modified to coerce
>all filenames to be entirely lowercase on the Unix machine's disk and to
>coerce it back to all uppercase in the other direction.  There were/are
>two reasons:
> (1) transfers to/from the third file server, a DEC-20, were hopeless
>     otherwise since the Unix system would insist that two identical files
>     were different because the case of the names didn't match
> (2) the users found the case dependence to be a serious problem.

We now see the source of the discrepancy.  (2) obviously came first:
people who were used to the older (I did NOT say antique ;-) ) file
system on the 20's, and wanted not to worry about filename conversion,
tried to make the restrictions on Unix file names a combination of
the DEC-20's and what they PERCEIVED as the Unix conventions.  This
indubitably has caused further consternation among people familiar
with one or the other but not both systems.

The Leaf server apparently gives this version of Unix a modified file
system with an attempt at monocase restriction.  I have no idea how
prevalent it is, but my off-hand observation is "not very."  I don't
think arguments based on what it does can be very compelling.
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)

Volume-Number: Volume 8, Number 38

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/07/86)

From: mcvax!axis!philip@seismo.css.gov (Philip Peake)
Date: Fri, 7 Nov 86 09:41:00 -0100
Organization: Axis Digital, 135 rue d'Aguesseau, Boulogne, 92100, FRANCE

In article <6226@ut-sally.UUCP>:
>From: chris@mimsy.umd.edu (Chris Torek)
>Date: Tue, 4 Nov 86 07:33:44 EST
>
>We seem to have three proposals:
>
>CS: Case sensitive file systems.  This is what all major Unix variants
>    (V6, V7, SysIII, SysV, 2BSD, and 4BSD) now support.
>
>CC: Case coercive file systems (file names forced to all upper or all
>    lower case).
>
>CR: Case retaining but otherwise insensitive file systems (new names
>    are created according to the given case; matches are not case
>    sensitive).
>
>I sincerely hope that no one is seriously suggesting POSIX adopt
>CC: no one seems to like such systems much.

This one line invalidates completely the rest of this article.
WHY do people trying to defend one of their pet ideas always claim
that 'no one' wants the opposite?

There is at least ONE person who DOES want such filesystems - me!
And I DO suggest that POSIX adopt such a system.

Philip

[ Nor are arguments consisting solely of "I want it" very useful.  -mod ]

Volume-Number: Volume 8, Number 47

std-unix@ut-sally.UUCP (Moderator, John Quarterman) (11/09/86)

From: seismo!enea!chalmers.UUCP!jacob (Jacob Hallen)
Date: Fri, 7 Nov 86 23:35:50 -0100
Organization: Dept. of CS, Chalmers, Sweden

>We seem to have three proposals:
>
>CS: Case sensitive file systems.  This is what all major Unix variants
>    (V6, V7, SysIII, SysV, 2BSD, and 4BSD) now support.
>
>CC: Case coercive file systems (file names forced to all upper or all
>    lower case).
>
>CR: Case retaining but otherwise insensitive file systems (new names
>    are created according to the given case; matches are not case
>    sensitive).
>

There is a serious flaw in the CR case! You lose orthogonality in
the interpretation of commands since creations, moves, copies and
some other file operations will interpret arguments literally while
other commands will have their arguments interpreted in the flexible way.
A move by the way is a good example since one argument will be treated
in one way and the other in the other way.

Jacob Hallen

Volume-Number: Volume 8, Number 49

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (11/17/86)

References:


>From im4u!rbj@icst-cmr.ARPA Mon Nov 10 16:31:53 1986
Date: Mon, 10 Nov 86 16:40:48 EST
From: Root Boy Jim <im4u!rbj@icst-cmr.ARPA>

Re: Volume-Number: Volume 8, Number 25
>      This gives you all the directory advantages of a case-dependent
> filesystem.  The only "feature" you lose is the ability to create a
> separate Readme, ReadMe, readme, and README set of files.  I personally
> believe that anybody who creates files which differ only in case deserves
> to be shot or at least have his employment terminated with extreme
> prejudice.  [ I suggest readers interpret that last sentence as a
> hypothetical statement applying to none of them.  -mod ]

There are several uses I can think of:

	1) linking: cd /etc; ln passwd PASSWD
		This makes it less likely that I will lose my passwd
		file even if I do `rm p*'.
	2) old versions: cd /etc; cp passwd PASSWD
		Keeps a backup version. Note that these two uses may
		conflict if I decide to `cp /dev/null PASSWD'!
	3) filename completion: using (1) and the 4.3 csh, I can type
		`vi /etc/P<ESC><RET>'. Ok, ok, emacs then :-)
	4) intermediate files: instead of picking a new name, I can
		just change case. Yes I know I can use other methods.
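
Use (1) depends on pattern matching being case sensitive. A small
sketch, using Python's fnmatch.fnmatchcase as a stand-in for shell
globbing (an assumption; the shell does its own expansion) and
modelling case-insensitive matching by folding both sides:

```python
from fnmatch import fnmatchcase

names = ["passwd", "PASSWD", "group"]

# Case-sensitive globbing, as in the Unix shells: p* leaves PASSWD alone.
sensitive = [n for n in names if fnmatchcase(n, "p*")]

# Case-insensitive matching, modelled by folding both sides to lower case.
insensitive = [n for n in names if fnmatchcase(n.lower(), "p*".lower())]

print(sensitive)    # ['passwd']
print(insensitive)  # ['passwd', 'PASSWD']
```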

While I generally think it undesirable to depend on case for human 
distinction, it comes in quite handy sometimes. I have seen the same
trick used in C programs as well, #defining foo to union_name.Foo.
Before you flame the usage, my source is the Berkeley VLSI tools.

	(Root Boy) Jim Cottrell		<rbj@icst-cmr.arpa>
	Was John Hinckley allowed to watch `Taxi Driver' last night?


Volume-Number: Volume 8, Number 54

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (11/22/86)

>From uw-beaver!uw-vlsi!mprvaxa!ubc-vision!utai!utcsri!mcgill-vision!mouse@nike.UUCP Wed Nov 19 04:57:50 1986
Date: Sun, 16 Nov 86 02:46:16 EST
From: der Mouse  <uw-beaver!ubc-vision!mcgill-vision!mouse@nike.UUCP>

> Please, if you support case-dependence, don't give the "mixed case
> filesystems" class of arguments.  The only two arguments you really
> have are (1) it is a "feature" (however dubious) that you can create
> Makefile and makefile as separate files in the same directory, and
> (2) Unix does it this way.

I think everyone arguing over case sensitivity is missing something.
Why treat letters specially?  For example, UNIX treats a and A
differently just as it treats = and % differently.  I see no reason to
restrict filenames to [a-zA-Z0-9] and a few special characters like .
and -; and given that uniformity, case folding makes as much (or as
little) sense as folding 0123456789 onto !"#$%&'()* (to pick a
particularly silly example).
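
To make the comparison concrete, here is a sketch of both folds as
plain character translations (the function names are made up for
illustration). Neither is more "natural" than the other as far as the
byte values are concerned.

```python
def fold_case(name):
    # ASCII case folding: map A-Z onto a-z, leave everything else alone.
    return "".join(chr(ord(c) | 0x20) if "A" <= c <= "Z" else c for c in name)

# The "silly" fold from the text: 0123456789 onto !"#$%&'()*
DIGIT_FOLD = str.maketrans("0123456789", "!\"#$%&'()*")

def fold_digits(name):
    return name.translate(DIGIT_FOLD)

print(fold_case("Makefile"))    # makefile
print(fold_digits("tmp01234"))  # tmp!"#$%
```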

I would say that (1) is not particularly useful, but it can be nice to
be able to create files named D.mcgill-X04T2 and D.mcgill-X04t2 in the
same directory.  This is less of an issue, though; it's just as easy to
make a program use base-36 as base-62 or base-126.
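
A sketch of such a sequence-number encoder, with the alphabet as a
parameter (the names BASE36, BASE62 and encode are illustrative, not
taken from any particular program). Dropping one case only shrinks the
number of names available at a given width.

```python
import string

BASE36 = string.digits + string.ascii_uppercase   # one case only
BASE62 = BASE36 + string.ascii_lowercase          # both cases

def encode(n, alphabet, width):
    """Encode non-negative n as a fixed-width string over alphabet."""
    base = len(alphabet)
    if n < 0 or n >= base ** width:
        raise ValueError("sequence number does not fit")
    digits = []
    for _ in range(width):
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))

print(encode(12345, BASE36, 4))   # 09IX
print(encode(12345, BASE62, 4))   # 03D7
print(36 ** 4, 62 ** 4)           # 1679616 14776336
```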

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu


Volume-Number: Volume 8, Number 57

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (11/22/86)

References:


>From bu-cs!bzs@harvard.UUCP Wed Nov 19 07:19:28 1986
Date: Tue, 18 Nov 86 21:35:03 EST
From: bu-cs!bu-cs.BU.EDU!bzs@harvard.UUCP (Barry Shein)


The problem with a file system where you cannot have ReadMe and
README is that you are throwing away possibilities. This also
means that I cannot have tmp01234A, tmp01234B, ... , tmp01234a, ...

I fear that although many people have applications that are small and
have small requirements, they should not place restrictions on those
with large requirements. Use your imagination: consider MasterCard's
data base for a moment, or some of the multi-library catalog systems
people are building. They may need (and have machines that have no
trouble with) many thousands of files whose names may serve as primary
keys (why not, it's one way to guarantee write-through on update...)

Next they'll be telling us we should only allow 16-bit ints because
any number larger than 16 bits is hard to type in and error prone
anyhow.

I still suggest the use of 'stty lcase' if that's what you want
(alias run 'stty -lcase; \!* ; stty lcase' :-)

	-Barry Shein, Boston University



Volume-Number: Volume 8, Number 58

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (11/22/86)

Date: Thu, 20 Nov 86 08:39:02 -0200
From: mcvax!crin!tombre@seismo.UUCP (Karl Tombre)
Organization: C.R.I.N., Nancy, France

On the use of case in filenames:

>There are several uses I can think of:
>
>	1) linking: cd /etc; ln passwd PASSWD
>		This makes it less likely that I will lose my passwd
>		file even if I do `rm p*'.
>	2) old versions: cd /etc; cp passwd PASSWD
>		Keeps a backup version. Note that these two uses may
>		conflict if I decide to `cp /dev/null PASSWD'!
>	3) filename completion: using (1) and the 4.3 csh, I can type
>		`vi /etc/P<ESC><RET>'. Ok, ok, emacs then :-)
>	4) intermediate files: instead of picking a new name, I can
>		just change case. Yes I know I can use other methods.
>

Well, and how about directories? I know of at least two tools that use case
in their directory names: rn (News directory) and the mail mode in Unipress
emacs (Messages directory). So I generalized this use. All my directories
begin with an uppercase letter, the other files with lowercase. This provides
an easy way to tell a directory from a file. Of course, that's what I do in
my home directory; I let /usr, /etc and so on remain in lower case :-)
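
A small sketch of the effect, assuming a listing sorted by byte value
(the entry names other than News and Messages are invented for the
example): the capitalized names group together at the front, and the
split can be computed from the name alone.

```python
entries = ["notes", "News", "todo", "Messages", "paper.tex", "Src"]

# Plain byte-order sort: 'A'-'Z' (65-90) come before 'a'-'z' (97-122).
listing = sorted(entries)
print(listing)
# ['Messages', 'News', 'Src', 'notes', 'paper.tex', 'todo']

# Under the naming convention, the first letter tells the two kinds apart.
dirs = [e for e in listing if e[0].isupper()]
files = [e for e in listing if not e[0].isupper()]
```

On a case-coercive or case-insensitive file system this convention
carries no information.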



Volume-Number: Volume 8, Number 60

std-unix@ut-sally.UUCP (Guest Moderator, John B. Chambers) (11/28/86)

>From rutgers!ames!ucbcad!ucbvax!decwrl!amdcad!amd!pesnta!valid!sbs@im4u.UUCP Sun Nov 23 06:32:49 1986
Date: Sat, 22 Nov 86 21:06:10 pst
From: rutgers!ames!ucbcad!ucbvax!decwrl!amd!valid!valid!sbs@im4u.UUCP (Steven Brian McKechnie Sargent)

4.2BSD refuses to namei a file with 8-bit character(s) because that's a good
sign that the directory entry has been thumped.  The super-user is allowed
to namei files with 8-bit characters.

In non-USASCII environments (where 8-bit characters are plausible), this is
bad planning.

S.



Volume-Number: Volume 8, Number 61

std-unix@ut-sally.UUCP (12/04/86)

References:

Date: Mon, 1 Dec 86 13:49:52 PST
From: guy@Sun.COM (Guy Harris)

> 4.2BSD refuses to namei a file with 8-bit character(s) because that's a good
> sign that the directory entry has been thumped.  The super-user is allowed
> to namei files with 8-bit characters.

4.2BSD refuses to namei a file with 8-bit character(s) because files like
that are a royal pain to deal with, due to both the Bourne and C shell
stripping all arguments to 7 bits before passing them to programs - not
because they are most likely to appear in smashed directory entries.  The
super-user is NOT allowed to namei files with 8-bit characters; the error
returned in 4.2BSD is EPERM, but that doesn't mean it won't be given to the
super-user.  The error was changed to EINVAL in 4.3BSD.
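
A sketch of the two behaviours just described, as a model rather than
the actual kernel or shell source (the names namei_check and
shell_strip are hypothetical): the lookup rejects any name containing
a byte with the high bit set, for every caller, and the shell masks
each argument byte to 7 bits.

```python
import errno

def namei_check(name, bsd43=True):
    """Return 0 if the name is acceptable, else an errno value."""
    if any(b & 0x80 for b in name):
        # EPERM in 4.2BSD, EINVAL in 4.3BSD, per the text; root gets it too.
        return errno.EINVAL if bsd43 else errno.EPERM
    return 0

def shell_strip(arg):
    """Model of a shell that strips arguments to 7 bits."""
    return bytes(b & 0x7F for b in arg)

name = b"caf\xe9"                          # one byte has the high bit set
print(namei_check(name) == errno.EINVAL)   # True
print(shell_strip(name))                   # b'cafi'
```

Even if the kernel accepted such a name, a shell that strips to 7 bits
would hand programs a different name from the one typed, so it would
no longer match the file.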

The point still stands, however, that the kernel shouldn't enforce
restrictions like this.  The System V Release 3 Bourne shell has been fixed
to handle 8-bit arguments, so you can use "rm -i *" or something like that
if you want to remove files with 8-bit characters in their names.  Some
Japanese companies have also fixed the C shell to handle files with names
containing 8-bit characters.


Volume-Number: Volume 8, Number 62