[comp.unix.internals] Help! modifying os to support >14 char filenames

cliff@motcsd.csd.mot.com (cliff.rodriguez) (09/07/90)

We are working on a project to convert our system V based system (ver 3)
from 14 char file names to something much larger.  Has anyone out there
done this, or heard it done?  I need to know if this is going to be the
slow tedious task I think it is.   Any suggestion on how to speed up the
work or some magic answer would be appreciated... thanks in advance...cliff
-- 
--------------------------------------------------------------------------------
Cliff Rodriguez voice:408-366-4788 fax:408-366-4125, Cupertino, CA. USA
uunet! { apple | pyramid } motcsd!cliff 
cliff@csd.mot.com

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/08/90)

In article <1430@engadm2.csd.mot.com> cliff@motcsd.csd.mot.com (cliff.rodriguez) writes:
>We are working on a project to convert our system V based system (ver 3)
>from 14 char file names to something much larger.  Has anyone out there
>done this, or heard it done?  I need to know if this is going to be the
>slow tedious task I think it is.   Any suggestion on how to speed up the
>work or some magic answer would be appreciated... thanks in advance...cliff

Ask DEC -- they upgraded the length of the VAX/VMS filenames some
time back. You better not ask _how_ they did it 'tho; you might
be sick. They didn't (and I am talking as a system programmer about
the time the change was made, things may have been cleaned up
since) make the filename _contiguous_ in the (directory) entry --
there happened to be a bit of space left over at the end and...

In U*X a directory entry is defined in dir.h -- you _may_
redefine the maximum length & recompile. Why have you got only
14-char filenames? Is this _really_ V?

-Kym Horsell

guy@auspex.auspex.com (Guy Harris) (09/10/90)

>In U*X a directory entry is defined in dir.h -- you _may_
>redefine the maximum length & recompile.

And then dump and restore all your file systems, since you've then just
changed the on-disk file format.  Also, fix up a bunch of programs that
read directories directly to use "readdir()" instead, and make sure no
programs "know" that file names are limited to 14 characters.
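
As a rough illustration of the "readdir()" approach, here is a minimal
sketch that walks a directory through the dirent library instead of
reading 16-byte records from the directory file. The helper name
longest_name_len() is made up for this example; opendir(), readdir()
and closedir() are the standard calls. Nothing in it assumes a
14-character limit.

```c
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the longest entry name in
 * the directory `path`, or -1 if the directory cannot be opened.
 * The name length comes from strlen() on d_name, not from a constant.
 */
long longest_name_len(const char *path)
{
    DIR *dp = opendir(path);
    struct dirent *de;
    long longest = 0;

    if (dp == NULL)
        return -1;

    while ((de = readdir(dp)) != NULL) {
        long len = (long)strlen(de->d_name);
        if (len > longest)
            longest = len;
    }
    closedir(dp);
    return longest;
}
```

A program written this way keeps working if the on-disk entry format
changes, because only the library has to know that format.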

>Why have you got only 14-char filenames?

Presumably because he's using a system with only the V7-based S5 file
system. 

>Is this _really_ V?

The standard file system with S5 releases prior to S5R4 is V7-based, and
has a 14-character limit on file names, yes.  S5R4 also comes with the
4.3BSD file system, which has a 255-character limit....

meissner@osf.org (Michael Meissner) (09/10/90)

In article <4040@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris)
writes:

| >In U*X a directory entry is defined in dir.h -- you _may_
| >redefine the maximum length & recompile.
| 
| And then dump and restore all your file systems, since you've then just
| changed the on-disk file format.  Also, fix up a bunch of programs that
| read directories directly to use "readdir()" instead, and make sure no
| programs "know" that file names are limited to 14 characters.
| 
| >Why have you got only 14-char filenames?
| 
| Presumably because he's using a system with only the V7-based S5 file
| system. 

The historical reason for the 14 character filename is that under V7
the directory entry was the inode + filename within directory.  Since
the inode was 2 bytes, making filenames 14 bytes meant that all
directory entries were the same size, but not so big it wasted space
for the average filesystem.
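
A sketch of that layout, for illustration only: the struct and helper
names below are invented, and uint16_t stands in for the 2-byte inode
number. The point is that 2 + 14 gives a 16-byte entry, and that a
14-character name fills the field with no terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>

#define V7_DIRSIZ 14                 /* bytes reserved for the name */

/* Illustrative model of a fixed-size V7/S5 directory entry. */
struct v7_direct {
    uint16_t d_ino;                  /* 2-byte inode number */
    char     d_name[V7_DIRSIZ];      /* NUL-padded, not NUL-terminated */
};

/*
 * Length of the name stored in an entry.  A full 14-character name has
 * no trailing NUL, so the scan must stop at V7_DIRSIZ.
 */
size_t v7_name_len(const struct v7_direct *d)
{
    size_t n = 0;

    while (n < V7_DIRSIZ && d->d_name[n] != '\0')
        n++;
    return n;
}
```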
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

calhoun@usaos.uucp (Warren D. Calhoun) (09/10/90)

In <4040@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>>Why have you got only 14-char filenames?

>Presumably because he's using a system with only the V7-based S5 file
>system. 

>>Is this _really_ V?

>The standard file system with S5 releases prior to S5R4 is V7-based, and
>has a 14-character limit on file names, yes.  S5R4 also comes with the
>4.3BSD file system, which has a 255-character limit....

Can you say POSIX compliance?

-- 
| SSG W.D. Calhoun                  |       UUCP: ...!uunet!usaos!calhoun    |
| Gas Turbine Engine (52F) Branch   |   INTERNET: calhoun%usaos@uunet.uu.net |
| The U.S. Army Ordnance School     | CompUServe: 76336.2212@compuserve.com  |
| Fort Belvoir, Virginia  22060     |      Voice: (703) 664-3396/3595        | 

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/12/90)

On 7 Sep 90 07:35:20 GMT, cliff@motcsd.csd.mot.com (cliff.rodriguez) said:

cliff> We are working on a project to convert our system V based system
cliff> (ver 3) from 14 char file names to something much larger.

Do you *really* need much larger? Why? If something like 30 instead of
14 would do, an easy hack exists.

cliff> Has anyone out there done this, or heard it done?  I need to know
cliff> if this is going to be the slow tedious task I think it is.

Well, this can be (and has been) done in two ways:

1) keeping the current organization, but just extending the size limit.
For example, you could have directory entries that are 32 bytes long,
for 30 byte file names, or 64 bytes long, for 62 byte file names. This
does not require much more than changing a #define or two and
recompiling the kernel, the dirent library, and a few applications that
do not use it (mkfs, fsck, etc...). It does make directories grow in
size, but I think that's not too important -- many directories are well
under 512 bytes, i.e. 32 entries, and doubling the entry size to 32
bytes would not consume any additional disk or memory at all in this
case.

2) Adopt a variable length name directory scheme. This can be (less
easily) done by borrowing the relevant part of the 4.xBSD filesystem
code and plugging it in. This could be done by defining a new filesystem
type under the FSS, that shared most all its procedures with the
standard s5 one, except for the path resolution entry point, and
modifying the 4.xBSD filesystem source to have an FSS style interface.
I seem to remember that Lachman or Unisoft rewrote the interface to the
4.xBSD filesystem modules so that it could be plugged in its entirety
under System V. I am sure that Everex ESIX also did something like that,
except that they did the opposite of what you want -- instead of
changing the format of directories and leaving the disc layout
unchanged, they did borrow all the far more efficient 4.xBSD disc layout
logic and left the directory format unchanged (for backwards
compatibility). If you go the 4.xBSD route you also have to change mkfs,
fsck, icheck, and any other utility that works on the filesystem
internals, by borrowing the appropriate code from the 4.xBSD version, if
you plug in only the new directory format, or substituting them
altogether if you just go for the entire 4.xBSD fast filesystem logic.
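
To make the difference in option 2) concrete, here is a small sketch of
how much space a variable length entry takes. It is modelled, from
memory, on the 4.2/4.3BSD directory entry (a 4-byte inode number, a
2-byte record length and a 2-byte name length, followed by the name
padded to a 4-byte boundary); treat the exact layout as an assumption
and check it against your own BSD headers. The names ffs_dirhdr and
ffs_entry_size are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of a BSD-style variable length directory entry (assumed). */
struct ffs_dirhdr {
    uint32_t d_ino;      /* inode number */
    uint16_t d_reclen;   /* total length of this record */
    uint16_t d_namlen;   /* length of the name that follows */
};

/*
 * Minimum bytes needed for an entry whose name is `namlen` characters:
 * the header, plus the name and its NUL rounded up to a multiple of 4.
 */
size_t ffs_entry_size(size_t namlen)
{
    return sizeof(struct ffs_dirhdr) + ((namlen + 1 + 3) & ~(size_t)3);
}
```

Under these assumptions a 14-character name costs 24 bytes and a
255-character name 264 bytes, so short names stay cheap while long ones
remain possible.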

I think that if you want just longer file names then option 1), doubling
the directory entry size to 32, is best -- even on BSD systems I have
*very* rarely seen filenames longer than 30 characters -- as it gives
you most of what you want and does not require many changes.
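
The space argument for option 1) is simple arithmetic, sketched below
under the assumption of 512-byte blocks and fixed-size entries (the
function name dir_blocks is made up). A directory with up to 16 entries
fits in one block with either 16-byte or 32-byte entries; only larger
directories grow.

```c
/*
 * Number of 512-byte blocks needed to hold `nentries` fixed-size
 * directory entries of `entry_size` bytes each (rounded up).
 */
unsigned long dir_blocks(unsigned long nentries, unsigned long entry_size)
{
    unsigned long bytes = nentries * entry_size;

    return (bytes + 511) / 512;
}
```

For example, dir_blocks(16, 16) and dir_blocks(16, 32) are both 1,
while dir_blocks(32, 32) is 2 where dir_blocks(32, 16) is still 1.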

If you want a look-and-feel like the 4.xBSD one, you should not just
change the directory file format to the variable length one -- you
should also go for the entire 4.xBSD file system logic, which has much
much better performance than the s5 filesystem type. This is what AT&T
themselves did with System V.4.

Going all the way to the 4.xBSD filesystem type instead of the s5 one
can be done most easily by taking the System V.4 implementation or the
4.3BSD-reno one, and changing its interface with the rest of the kernel
from their VFS style one to the FSS one. This is not, I think, a major
job, even if VFS style interface and FSS style ones are at slightly
different abstraction levels. You could do the opposite, change the
kernel to use VFS style filesystem interfaces, so that you can plug in a
conversion interface from FSS to VFS if you want to continue using FSS
based filesystem modules (e.g. the Xenix or DOS filesystems) and put in
the 4.xBSD style filesystem type without change. I think that if you
want to ease the transition to System V.4, and already have, as you
should, System V.4 source, this is the way to go -- modifying the V.3
kernel for V.4's VFS instead of FSS, and putting in a module that
presents a VFS interface to the kernel and an FSS one to V.3 style
filesystem types (since the FSS interface is lower level and more
restrictive than the VFS one, I think doing the opposite is much harder,
but I cannot say for sure without looking hard at the V.3 FSS and V.4
VFS interface details).

	Note that 4.3BSD-reno and System V.4 (and SunOS) use
	a VFS style interface that is very similar, but not identical,
	regrettably. No two major UNIX variants define exactly the same
	interface to installable filesystem modules.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

guy@auspex.auspex.com (Guy Harris) (09/12/90)

>>The standard file system with S5 releases prior to S5R4 is V7-based, and
>>has a 14-character limit on file names, yes.  S5R4 also comes with the
>>4.3BSD file system, which has a 255-character limit....
>
>Can you say POSIX compliance?

Yes, I can.  I can even say it on a system with a 255-character limit on
filenames; there is *NOTHING* about a limit higher than 14 characters
that violates POSIX.  POSIX says the *minimum* limit that a system may
impose is 14 characters.

The *actual* limit is pathname-dependent (consider an S5R4 system with
both S5 and UFS file systems mounted, for example), and its value for
some particular directory can be fetched with "pathconf()". 
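
A minimal sketch of that query, assuming a POSIX system; the wrapper
name name_max_for() is invented, while pathconf() and _PC_NAME_MAX are
the standard interface.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: maximum file name length for entries created
 * in directory `dir`.  Returns -1 either on error (errno is set) or
 * when the file system imposes no fixed limit (errno stays 0).
 */
long name_max_for(const char *dir)
{
    errno = 0;
    return pathconf(dir, _PC_NAME_MAX);
}
```

A program can size its buffers from this value at run time instead of
compiling in 14, 255 or any other constant.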

jeff@quark.WV.TEK.COM (Jeff Beadles) (09/12/90)

pcg@cs.aber.ac.uk (Piercarlo Grandi) babbles:
|On 7 Sep 90 07:35:20 GMT, cliff@motcsd.csd.mot.com (cliff.rodriguez) said:
|
|cliff> We are working on a project to convert our system V based system
|cliff> (ver 3) from 14 char file names to something much larger.
|
|Do you *really* need much larger? Why? If something like 30 instead of
|14 would do, an easy hack exists.

We could use a few less "hacks". :-)

|cliff> Has anyone out there done this, or heard it done?  I need to know
|cliff> if this is going to be the slow tedious task I think it is.
|
|Well, this can be (and has been) done in two ways:
|
|1) keeping the current organization, but just extending the size limit.
|For example, you could have directory entries that are 32 bytes long,
|for 30 byte file names, or 64 bytes long, for 62 byte file names. This
|does not require much more than changing a #define or two and
|recompiling the kernel, the dirent library, and a few applications that
|do not use it (mkfs, fsck, etc...). It does make directories grow in
|size, but I think that's not too important -- many directories are well
|under 512 bytes, i.e. 32 entries, and doubling the entry size to 32
|bytes would not consume any additional disk or memory at all in this
|case.
...

This is **NOT** true.  There are hard-coded user programs that depend on a
14 character filename limit!  It's by no means as easy as changing a
#define or two.

For example, what will this code fragment do with >14 character filenames?

...
	for(i=0; i<14; i++)
		if(*xx)
			*yy++ = *xx++;
		else
			break;
	*yy ='\0';
...

This is typical of parts of the SYSV 3.2 code.  True, this is not a robust way
to handle this, but it is typical of the code.  The ONLY way to find these
sorts of problems is by inspection or searching.
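
For comparison, here is one way such a loop could be rewritten so that
the bound comes from the destination buffer rather than from a literal
14. This is only a sketch; copy_name() is a made-up name, not anything
from the SYSV sources.

```c
#include <stddef.h>

/*
 * Copy the NUL-terminated name `src` into `dst`, which holds `dstsize`
 * bytes.  The result is always NUL-terminated (if dstsize > 0).
 * Returns 0 if the whole name fit, -1 if it had to be truncated.
 */
int copy_name(char *dst, size_t dstsize, const char *src)
{
    size_t i;

    if (dstsize == 0)
        return -1;

    for (i = 0; i + 1 < dstsize && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return src[i] == '\0' ? 0 : -1;
}
```

With the buffer sized from the system's name limit, the same code works
for 14, 30 or 255 character names, and the return value reports
truncation instead of hiding it.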


	-Jeff
-- 
Jeff Beadles  jeff@onion.pdx.com

machina@uts.amdahl.com (Miguel A. Ramirez) (09/19/90)

In article <8738@orca.wv.tek.com> jeff@onion.pdx.com (Jeff Beadles) writes:
>pcg@cs.aber.ac.uk (Piercarlo Grandi) babbles:
>|
>|Do you *really* need much larger? Why? If something like 30 instead of
>|14 would do, an easy hack exists.
>
>We could use a few less "hacks". :-)

I'll second this! 


>|Well, this can be (and has been) done in two ways:
>|
>|1) keeping the current organization, but just extending the size limit.
[...]
>This is **NOT** true.  There are hard-coded user programs that depend on a
>14 character filename limit!  It's by no means as easy as changing a
>#define or two.
>
>For example, what will this code fragment do with >14 character filenames?
>...
>	for(i=0; i<14; i++)
>		if(*xx)
>			*yy++ = *xx++;
>		else
>			break;
>	*yy ='\0';
>...
>
>This is typical of parts of the SYSV 3.2 code.  True, this is not a robust way
>to handle this, but it is typical of the code.  The ONLY way to find these
>sorts of problems is by inspection or searching.
>
Finally, someone on the net with a much better grip on reality. There's no 
such thing as an easy hack. BTW, Piercarlo, did you test not only the kernel
but also all the commands that were affected by the long file name support? 
No? But I thought it was an easy hack? 
-- 

Miguel A. Ramirez, | machina@uts.amdahl.com | {sun,uunet}!amdahl!machina

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/21/90)

On 19 Sep 90 06:18:17 GMT, machina@uts.amdahl.com (Miguel A. Ramirez) said:

	[ ... on how many user programs have a hard-coded 14 for the
	max length of file name in a System V environment, and thus
	would not respond to a change in a #define ... ]

machina> There's no such thing as an easy hack.

Indeed, being able to change a #define'd symbol from 14 to 30 is not an
easy hack, and I have only been trying to sound funny; much more
seriously, it is the reason why the #define'd symbol has been there
for at least 10 years, to allow for *easy* parametrization.

Some user programs have been silly enough to avoid it? They would break
under *any* scheme to change the filename length, so changing the
#define is the _easiest_ way out, because at least it impacts the kernel
and kernel-level utilities less than the other choices. Once you have
decided you want to change the maximum file name length, what is 'easy'
is relative to that decision.

Maybe I should have written 'an easy (relative to the other
alternatives) way to satisfy your requirement in a strict sense and with
the easiest and smallest (relatively, but also, to me, in absolute
terms) changes to kernel and command sources is ...' instead of 'an easy
hack is ...'.

My postings are already too long -- I don't really want to write out
everything in extenso, just to avoid people like you getting confused.

News is not Congress (yet :->), and articles are not Bills.

machina> BTW, Piercarlo did you test not only the kernel but also all
machina> the commands that were affected by the long file name support?

Actually, I think I have listed most of those in the standard System V
distribution (I think I forgot SCCS, and no doubt some others) in some
past article.  Let me repeat, very few programs actually scan
directories. Most work on collections of file pathnames, and of these
only a few break a file pathname into file names (most are happy with
mucking with suffixes at the end of pathnames).

One may want to have a look at the BSD sources (freely available to
those that have a System V source license) and scan them for usage of
MAXNAMLEN; one would easily find the list of programs (modulo some BSD
vs. System V differences) where one has to become suspicious, because
the BSD crew put in all those MAXNAMLENs when they had to convert from
the V7/V32/4.1BSD/System III/System V to the FFS directory organization.

machina> No?  But I thought it was an easy hack?  --

You seem so sure, you must have done so. So please, for our information,
post a list of commands and libraries in the standard System V
distribution that have a hard-coded 14, the relative percentage of
source files, and in how many places for each.

Also, what is easy depends on many factors; what is easy for me may be a
big problem to you, for example, or (improbably :->) vice versa. For me
using find(1), xargs(1) and egrep(1) is easy, for example, and not even
too time consuming :-) :-) :-).

The difficult problems lie elsewhere, even if somebody may seem daunted by
the consequences of changing a fairly old and well understood symbolic
constant.

As to the real problems, another example; from evidence easily available
using any AT&T, Sun or Dec (just to mention three big names) UNIX
kernel, doing trivial tuning or even just avoiding gross misdesigns of
page (and working set) replacement kernel modules and of the programs
that run under them seems to be impossibly difficult (for those
manufacturers, at least) or to require an inordinate number of releases.

	(spoiler: actually paging or swapping module design is not an
	easy task for anybody, especially if compared with merely
	changing the file name length leaving the directory structure
	the same. It is a bit easier though than what would be apparent
	from the problems that AT&T, Sun and Dec (and many others) seem
	to have with it).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

machina@uts.amdahl.com (Miguel A. Ramirez) (09/26/90)

In article <PCG.90Sep21142146@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
*On 19 Sep 90 06:18:17 GMT, machina@uts.amdahl.com (Miguel A. Ramirez) said:
*
*	[ ... on how many user programs have a hard-coded 14 for the
*	max length of file name in a System V environment, and thus
*	would not respond to a change in a #define ... ]
*
*
*Maybe I should have written 'an easy (relative to the other
*alternatives) 

Yes, this would have been better. 
[...]
*
*machina> No?  But I thought it was an easy hack?  --
*
*You seem so sure, you must have done so. So please, for our information,
*post a list of commands and libraries in the standard System V
*distribution that have a hard-coded 14, the relative percentage of
*source files, and in how many places for each.

Ah Piercarlo, I'd love to! Especially since I have to give this info to our 
test group.   But company policy prohibits me from giving 
away valuable information. Trust me, if you ever have the pleasure 
of working on an Amdahl running UTS 2.1 or greater you'll see what I mean.


-- 

Miguel A. Ramirez, | machina@uts.amdahl.com | {sun,uunet}!amdahl!machina