[comp.std.unix] Query about <dirent.h>

std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (11/22/89)

From: Andy Tanenbaum <uunet!cs.vu.nl!ast>

The <dirent.h> header is required by P1003.1 to have a field
	d_name []

Now the question arises about what size to use there.  One possibility is
	d_name[NAME_MAX+1]

However, doing this means that <limits.h> must be included.  As far as I
can see, the following is a conforming application:

#include <dirent.h>

main()
{
  (void) opendir("/usr");
}

If one uses NAME_MAX in <dirent.h>, we have a conforming application that
won't even compile.

Another solution is to put

#define _NAME_MAX  14

in <dirent.h> and use that (assuming the result of the discussion on
name space pollution permits this).  It might work, but it is hardly
elegant.

What's an implementer to do?

Andy Tanenbaum (ast@cs.vu.nl)

Volume-Number: Volume 17, Number 65

gwyn@BRL.MIL (Doug Gwyn) (11/23/89)

From: gwyn@BRL.MIL (Doug Gwyn)

In article <437@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>Now the question arises about what size to use there.  One possibility is
>	d_name[NAME_MAX+1]
>However, doing this means that <limits.h> must be included.

You, the implementer, could manually replace that NAME_MAX with the
appropriate value (perhaps found by inspecting <limits.h>).  This is
the same issue as occurs when declaring v*printf() in <stdio.h>; the
related header need not (nay, MUST not) be included, but a compatible
type (or value in the NAME_MAX case) must be used.

>What's an implementer to do?

What I did in my implementation was to cheat:
	char	d_name[1];
We were careful to word the IEEE Std 1003.1 specification so that
this is explicitly allowed.

Volume-Number: Volume 17, Number 66

std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (11/24/89)

From: Andy Tanenbaum <uunet!cs.vu.nl!ast>

In article <438@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) writes:

>You, the implementer, could manually replace that NAME_MAX with the
>appropriate value (perhaps found by inspecting <limits.h>). 

It is true I could just put 14 there, or #define it to be _NAME_MAX in that
file, but that seems poor programming practice.  If that constant ever gets
changed in <limits.h> but not in <dirent.h> disaster will strike.
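One way to keep Gwyn's hard-coded value and still guard against the drift Tanenbaum worries about, offered only as a sketch under assumptions: the header sizes d_name with a private literal, and the implementation's own build, which is free to include both headers, refuses to compile if the two values disagree. Every name here (`LIMITS_NAME_MAX`, `_DIRENT_NAME_MAX`, `struct dirent_sketch`) is a hypothetical stand-in for what would really live in <limits.h> and <dirent.h>. The check uses the C89-compatible negative-array-size trick.

```c
#include <assert.h>
#include <stddef.h>

/* --- stand-in for <limits.h> (hypothetical) --- */
#define LIMITS_NAME_MAX 14

/* --- stand-in for <dirent.h> (hypothetical): it uses only a private,
   implementation-reserved macro, so <limits.h> need not be included --- */
#define _DIRENT_NAME_MAX 14
struct dirent_sketch {
    char d_name[_DIRENT_NAME_MAX + 1];  /* name plus terminating '\0' */
};

/* --- compiled only when building the implementation itself, where both
   headers may be included.  If one value is edited but not the other,
   the array size becomes -1 and the build fails at compile time. --- */
typedef char dirent_name_max_matches_limits
    [(_DIRENT_NAME_MAX == LIMITS_NAME_MAX) ? 1 : -1];

/* Bytes available in d_name, including the terminator. */
size_t dirent_sketch_name_capacity(void)
{
    return sizeof(((struct dirent_sketch *)0)->d_name);
}
```

Such a check would sit in the implementation's own sources, never in the installed header, so a conforming application still sees only <dirent.h>.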

>What I did in my implementation was to cheat:
>	char	d_name[1];

What happens when a program allocates a struct dirent in a program?  The
compiler will not allocate enough storage and it will crash when used.

Is it legal to add a line

#include <limits.h> 

in <dirent.h>?

Andy Tanenbaum (ast@cs.vu.nl)

Volume-Number: Volume 17, Number 69

mark@jhereg.Minnetech.MN.ORG (Mark H. Colburn) (11/24/89)

From: mark@jhereg.Minnetech.MN.ORG (Mark H. Colburn)

In article <437@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
>
>The <dirent.h> header is required by P1003.1 to have a field
>	d_name []
>
>Now the question arises about what size to use there.  One possibility is
>	d_name[NAME_MAX+1]
>

At least three implementations that I know of define d_name as follows:

	d_name[1];

And, they put it at the end of the structure.  In this way, when the
structure is allocated, the implementation may allocate enough space for
the directory name, no matter what it is.  For a good, publicly available
example, you might want to check out Doug Gwyn's dirent library.
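A minimal sketch of the allocation Colburn describes, using a hypothetical `struct entry` rather than any real `struct dirent`. It declares the name as a C99 flexible array member (`char d_name[];`), the form in which C99, a decade after this thread, standardized the trick; a C89 implementation would write `d_name[1]` as in the posts. Storage is sized from `offsetof` plus the name length, so each entry takes only what its own name needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type; the name array must be the last member. */
struct entry {
    unsigned long e_ino;   /* hypothetical fixed-size member */
    char d_name[];         /* C99 flexible array member */
};

/* Allocate one entry with just enough room for `name` and its '\0'.
   Returns NULL if malloc fails; the caller releases it with free(). */
struct entry *entry_new(unsigned long ino, const char *name)
{
    size_t len = strlen(name);
    size_t need = offsetof(struct entry, d_name) + len + 1;
    struct entry *e;

    if (need < sizeof(struct entry))
        need = sizeof(struct entry);   /* never less than the fixed part */
    e = malloc(need);
    if (e == NULL)
        return NULL;
    e->e_ino = ino;
    memcpy(e->d_name, name, len + 1);  /* copies the terminator too */
    return e;
}
```

Because `sizeof(struct entry)` covers only the fixed members, a plain `struct entry` variable, or a struct assignment from `*e`, has no room for the name; that is exactly the failure Tanenbaum raises.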

-- 
Mark H. Colburn                       mark@Minnetech.MN.ORG
Open Systems Architects, Inc.

Volume-Number: Volume 17, Number 71

gwyn@BRL.MIL (11/25/89)

From: gwyn@BRL.MIL

In article <441@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
->	char	d_name[1];
-What happens when a program allocates a struct dirent in a program?  The
-compiler will not allocate enough storage and it will crash when used.

That's what happens when programmers assume things that are not promised
by the standards.

-Is it legal to add a line
-#include <limits.h> 
-in <dirent.h>?

NO.

Volume-Number: Volume 17, Number 70

std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (11/26/89)

From: Andy Tanenbaum <uunet!cs.vu.nl!ast>

In article <442@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) writes:
>That's what happens when programmers assume things that are not promised
>by the standards.
I don't follow.  What is it that the standards don't promise?  Surely
a programmer may declare a struct dirent, since readdir() returns a pointer
to one of them.  Furthermore, a programmer may assume that d_name is an
array of characters that can hold a file name.  I don't see how you can
put a file name in 1 character.  I don't see any alternative than to
allocate NAME_MAX+1 characters there.  Why doesn't the standard require
<dirent.h> to have <limits.h> as a prerequisite, so that NAME_MAX
is at least known?

Andy Tanenbaum (ast@cs.vu.nl)

Volume-Number: Volume 17, Number 72

karish@forel.stanford.edu (Chuck Karish) (11/28/89)

From: karish@forel.stanford.edu (Chuck Karish)

In article <442@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) wrote:
>In article <441@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>->	char	d_name[1];
>-What happens when a program allocates a struct dirent in a program?  The
>-compiler will not allocate enough storage and it will crash when used.
>
>That's what happens when programmers assume things that are not promised
>by the standards.

This is spelled out in the rationale (B.5.1.1).

>-Is it legal to add a line
>-#include <limits.h> 
>-in <dirent.h>?
>
>NO.

A citation would be more useful here than this proclamation.

I haven't been able to find anything in the 1003.1 documents that would
prohibit this.

The form of a header is defined by the implementation.  There are many
places in the Standard where it is required that a particular
identifier be available when a particular header is #included, but I
haven't found any that require that identifiers not be visible when the
headers to which they are assigned have not been #included.

Portable application code must #include headers as listed in the
function descriptions in the standard, if only for compatibility
with implementations that don't support ANSI C.  It will be easier
to identify non-portable code under implementations that refrain from
#including, for example, <sys/types.h> in <sys/stat.h>.

	Chuck Karish		karish@mindcraft.com
	(415) 323-9000		karish@forel.stanford.edu

Volume-Number: Volume 17, Number 75

gwyn@BRL.MIL (11/28/89)

From: gwyn@BRL.MIL

In article <444@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>In article <442@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) writes:
>>That's what happens when programmers assume things that are not promised
>>by the standards.
>I don't follow.  What is it that the standards don't promise?

From IEEE Std 1003.1-1988: "The character array d_name is of unspecified size".

>Surely a programmer may declare a struct dirent, ...

Sure, but that doesn't mean that it will be particularly useful to do so.

>Furthermore, a programmer may assume that d_name is an array of characters
>that can hold a file name.

That is the important thing that is NOT promised by the standard.
d_name must have type char[] and the filename that it represents
must be null terminated, with no more than NAME_MAX bytes before
the null terminator.

>I don't see how you can put a file name in 1 character.  I don't see any
>alternative than to allocate NAME_MAX+1 characters there.

Because it is wasteful to allocate so much storage for what are typically
short strings, many (maybe even most) implementations allocate just as
much as is actually needed for each individual struct dirent (including
possibly a few bytes for alignment).  Practically all C compilers support
this kind of "type punning", but the programmer needs to be careful not to
make unwarranted assumptions.

The reason the standard does not specify char d_name[NAME_MAX+1] is
precisely to allow this particular implementation technique.
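A rough sketch of allocating "just as much as is actually needed ... including possibly a few bytes for alignment": several variable-length records packed back to back in one buffer, each padded so the next one starts suitably aligned. The record layout (`struct rec`, `rec_len`) and the helpers `rec_size()` and `pack_names()` are hypothetical illustrations, not any real system's directory format; the flexible array member is C99 and `_Alignof` is C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record; the name must be the last member. */
struct rec {
    unsigned short rec_len;  /* bytes from this record to the next one */
    char d_name[];           /* null-terminated name (C99 flexible member) */
};

/* Alignment every record start must respect (C11 _Alignof). */
#define REC_ALIGN _Alignof(struct rec)

/* Bytes for a record whose name has `len` characters: header, name,
   terminator, then padding up to a multiple of REC_ALIGN. */
static size_t rec_size(size_t len)
{
    size_t n = offsetof(struct rec, d_name) + len + 1;

    return (n + REC_ALIGN - 1) / REC_ALIGN * REC_ALIGN;
}

/* Pack `count` names into one malloc'd buffer and store its length in
   *used.  Returns NULL if malloc fails.  Assumes each record is shorter
   than USHRT_MAX bytes so its size fits in rec_len. */
char *pack_names(const char *const names[], size_t count, size_t *used)
{
    size_t total = 0, off = 0, i;
    char *buf;

    for (i = 0; i < count; i++)
        total += rec_size(strlen(names[i]));
    buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;
    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        struct rec *r = (struct rec *)(buf + off);

        r->rec_len = (unsigned short)rec_size(len);
        memcpy(r->d_name, names[i], len + 1);
        off += r->rec_len;
    }
    *used = total;
    return buf;
}
```

A reader walks the buffer by adding each `rec_len` to the current offset; short names cost a few bytes each instead of a full NAME_MAX+1.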

>Why doesn't the standard require <dirent.h> to have <limits.h> as a
>prerequisite, so that NAME_MAX is at least known?

IEEE Std 1003.1-1988 explains how a programmer who wished to obtain
that information may do so.  Since the implementation of <dirent.h>
does not require the macro NAME_MAX, it would have been pretty silly
to have required <limits.h> to be included before <dirent.h>.
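For reference, the run-time mechanism 1003.1 offers is pathconf() (or fpathconf()) with `_PC_NAME_MAX`, which asks about a particular directory instead of relying on a compile-time NAME_MAX. A minimal sketch follows; the function name `name_max_for()` and the fallback value `NAME_MAX_GUESS` used when the system reports no fixed limit are assumptions, not anything the standard prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fallback when pathconf() reports no definite limit. */
#define NAME_MAX_GUESS 255L

/* Return the maximum filename length for entries in directory `dir`,
   or -1 on error (errno is left as set by pathconf()). */
long name_max_for(const char *dir)
{
    long n;

    errno = 0;
    n = pathconf(dir, _PC_NAME_MAX);
    if (n == -1) {
        if (errno != 0)
            return -1;          /* real error, e.g. ENOENT */
        return NAME_MAX_GUESS;  /* -1 with errno unchanged: no fixed limit */
    }
    return n;
}
```

A buffer for one name in that directory would then need `name_max_for(dir) + 1` bytes.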

Volume-Number: Volume 17, Number 74

henry@utzoo.uucp (11/28/89)

From: henry@utzoo.uucp

>From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
>I don't follow.  What is it that the standards don't promise?  Surely
>a programmer may declare a struct dirent...

That is exactly what is not promised:  that you can declare a `struct
dirent' (as opposed to a `struct dirent *') that is of any use to you.
The only use for `struct dirent' defined in 1003.1 is that readdir()
returns a pointer to one, and that the thing that pointer points to has
a member `d_name' that you can examine.  There is no promise that the
type `struct dirent' is good for anything else whatsoever.
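A small sketch of the usage Spencer describes as the only promised one: hold the `struct dirent *` that readdir() returns and look at `d_name` through it, never declaring or copying a `struct dirent`. The function `count_with_prefix()` is a hypothetical example; opendir(), readdir(), and closedir() are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Count entries in `path` whose names start with `prefix`.  struct dirent
   is touched only through readdir()'s pointer and only via d_name.
   Returns -1 if the directory cannot be opened. */
long count_with_prefix(const char *path, const char *prefix)
{
    DIR *dp = opendir(path);
    struct dirent *ent;
    size_t plen = strlen(prefix);
    long count = 0;

    if (dp == NULL)
        return -1;
    while ((ent = readdir(dp)) != NULL) {
        if (strncmp(ent->d_name, prefix, plen) == 0)
            count++;
    }
    closedir(dp);
    return count;
}
```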

                                     Henry Spencer at U of Toronto Zoology
                                 uunet!attcan!utzoo!henry henry@zoo.toronto.edu


Volume-Number: Volume 17, Number 73

dmr@research.att.com (Dennis Ritchie) (11/29/89)

From: dmr@research.att.com (Dennis Ritchie)

I wish Gwyn et al. had sounded a bit more embarrassed about using
`char d_name[1]' in struct dirent.  Tanenbaum is absolutely correct to
question it; it is an abuse of the language and would not pass a
compiler system with careful run-time checks.  From the language point
of view, there is no reason at all to forbid declaring an instance of
struct dirent, or copying the value pointed to by the value of readdir().
Gwyn's definition implies incorrect C.  (Well, the definition itself
is well-defined; what is not is the expectation that there is more than
1 character actually in the d_name array.)

There is no such type as char[], and `char d_name[]' may not appear
in a structure, and if the declaration is `char d_name[1]' then
you may not refer to d_name[i] when i>1.

I don't have the POSIX wording at hand, but if it forbids
`struct dirent d = *readdir(dp)' then it is flaky.

	Dennis Ritchie
	dmr@research.att.com
	att!research!dmr

Volume-Number: Volume 17, Number 76

jms@Apple.COM (John Sovereign) (12/01/89)

From: jms@Apple.COM (John Sovereign)

Since NAME_MAX is filesystem-dependent, NAME_MAX is probably a poor choice for
an implementation's definition of d_name, unless the implementation KNOWS that
it will only talk to filesystems which limit filenames to NAME_MAX.

In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>From: dmr@research.att.com (Dennis Ritchie)
>
>I don't have the POSIX wording at hand, but if it forbids
>`struct dirent d = *readdir(dp)' then it is flaky.
>

POSIX (and the "historical implementation" which introduced this) is flaky.

John Sovereign
jms@apple.com

Volume-Number: Volume 17, Number 77

std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (12/01/89)

From: Doug Gwyn <uunet!smoke.brl.mil!gwyn>

In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>I wish Gwyn et al. had sounded a bit more embarrassed about using
>`char d_name[1]' in struct dirent.

Here is the line in question taken directly from my PD dirent implementation:
	char		d_name[1];	/* name of file */	/* non-ANSI */
You will note that I'm well aware that a trick is being used here.

I don't like such tricks either.  The problem is, the alternatives
were all worse:
	char	*d_name;	/* programs need to know whether d_name
				   specifies an array or not, due to a
				   generic C botch in using array names;
				   P1003 used to be ambiguous about this
				   but finally required it to be an array */
	char	d_name[HUGE_NUMBER];	/* valid, but wastes a lot of space */
	char	d_name[0];	/* worse than [1] according to ANSI C */
	char	d_name[];	/* almost certain to cause a diagnostic */

>There is no such type as char[], and `char d_name[]' may not appear
>in a structure, and if the declaration is `char d_name[1]' then
>you may not refer to d_name[i] when i>1.

Certainly it is unportable usage, i.e. not guaranteed to work
by the C language specification.  However, there is a large body
of existing C code (typically implementing network protocols) that
relies on exactly this trick, precisely because there is no really
good alternative.  I have yet to hear of a production UNIX system
where this trick fails.  (Perhaps 10th Edition UNIX is one?)

Probably what I really should have done was to parameterize the "1":
	char	d_name[1+_DNAME_LEN];	/* _DNAME_LEN=0 if you can,
					   _DNAME_LEN=PATH_MAX otherwise */
That would allow the dirent package installer a quick solution for
C environments that are fussier about this than the typical UNIX ones.
I may do this for future releases of my package.
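Gwyn's parameterized declaration, filled out as a compilable sketch. `_DNAME_LEN` is his name; the struct name `dirent_param` and the helper function are hypothetical. An installer on a bounds-checking C environment would override the default, for instance with a `-D` option on the compiler command line.

```c
#include <assert.h>
#include <stddef.h>

/* Installer's choice: leave at 0 to rely on the [1] trick (entries are
   allocated larger than the struct), or set it to a real maximum for C
   environments that check array bounds. */
#ifndef _DNAME_LEN
#define _DNAME_LEN 0
#endif

struct dirent_param {                  /* hypothetical stand-in */
    char d_name[1 + _DNAME_LEN];       /* name of file */
};

/* Bytes guaranteed to exist in d_name for a plain struct object. */
size_t dirent_param_static_room(void)
{
    return sizeof(((struct dirent_param *)0)->d_name);
}
```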

>I don't have the POSIX wording at hand, but if it forbids
>`struct dirent d = *readdir(dp)' then it is flaky.

It says:
	The readdir() function returns a pointer to an object of type
	struct dirent that includes the member:

		Member	Member
		 Type	 Name		Description
		______	______	________________________
		char[]	d_name	Null-terminated filename

	The character array d_name is of unspecified size, but the
	number of bytes preceding the terminating null character
	shall not exceed {NAME_MAX}.

I believe my implementation meets these specifications, taken
literally.

At one time, the description of readdir() contained a warning about
copying struct dirents, but by the time of the final Std 1003.1 the
entire section had been rewritten and this got lost in the shuffle.
I think some other unwanted changes were introduced too, but at the
moment I can't recall what they were.  (We also have to keep beating
down attempts to legislate support for seekdir() and telldir().)

Anyway, the whole point of the words "unspecified size" really was to
permit implementations to use the [1] trick so they could allocate
a relatively small struct_dirent+secret_extension if the C compiler
permitted it.  Otherwise either NAME_MAX+1 or some other defined
implementation constant would have been specified in Std 1003.1
(as for c_cc[NCCS]).

I would have preferred char *d_name; however, that would be as hard
for an application to copy as a struct_dirent+secret_extension.

Certainly
	char	d_name[1+PATH_MAX];	/* use actual value for PATH_MAX */
is a legal and portable declaration for d_name that meets the POSIX
specs.  I happen not to like it because PATH_MAX is potentially
unbounded in an ideal networked universe, and always allocating big
chunks of space of which a tiny portion is used bothers me more than
this particular well-known implementation-specific cheat.

My advice for applications using dirent facilities is NOT to assume
that a literal copy of the struct dirent is good for anything.  If
you need to keep the entry string around, you should allocate storage
for it based on its strlen().  (Since the other members of a struct
dirent are unspecified, you can't use them anyway in a POSIX-portable
application.)
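A sketch of that advice in practice: each name is copied into storage sized by strlen(), never by `sizeof(struct dirent)` or `sizeof ent->d_name`. The helpers `copy_name()`, `list_names()`, and `free_names()` are hypothetical; only the dirent calls and the standard C library functions are real APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Copy one entry name into fresh storage sized by strlen(). */
static char *copy_name(const struct dirent *ent)
{
    size_t len = strlen(ent->d_name);
    char *s = malloc(len + 1);

    if (s != NULL)
        memcpy(s, ent->d_name, len + 1);
    return s;
}

/* Release an array built by list_names(). */
void free_names(char **names)
{
    size_t i;

    if (names == NULL)
        return;
    for (i = 0; names[i] != NULL; i++)
        free(names[i]);
    free(names);
}

/* Read every entry name in `path` into a malloc'd, NULL-terminated
   array of malloc'd strings.  Returns NULL on any failure. */
char **list_names(const char *path)
{
    DIR *dp = opendir(path);
    struct dirent *ent;
    char **names = NULL;
    size_t n = 0;

    if (dp == NULL)
        return NULL;
    while ((ent = readdir(dp)) != NULL) {
        char **grown = realloc(names, (n + 2) * sizeof *names);

        if (grown == NULL)
            goto fail;
        names = grown;
        if ((names[n] = copy_name(ent)) == NULL)
            goto fail;
        n++;
        names[n] = NULL;            /* keep the array terminated */
    }
    closedir(dp);
    if (names == NULL) {            /* no entries at all: return { NULL } */
        names = malloc(sizeof *names);
        if (names != NULL)
            names[0] = NULL;
    }
    return names;

fail:
    while (n > 0)
        free(names[--n]);
    free(names);
    closedir(dp);
    return NULL;
}
```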

There are numerous related issues with IEEE Std 1003.1 that we could
get into.  For example, it is not stated whether or not it is safe
for an application to use a copy of a struct dirent or of several other
system data structures where the struct has a different address from
the one that the allocator (e.g. readdir()) assigned.  (Presumably an
implementation could depend on the object residing in a known place.)
Also, since there are no constraints on other struct dirent member
names, the traditional practice of using d_* for these is unsafe;
instead the "always reserved for the implementation" name space must
be used to avoid problems like
	#define d_namlen 42
	#include <dirent.h>
I don't know if there's much point in going into such problems in
more detail.  My personal feeling is that 1003.1 serves ONE useful
purpose:  By specifying it in OS procurements (in ADDITION to more
useful specs such as ANSI/ISO C and SVID), one can obtain portable
interfaces for some otherwise problematic areas such as reliable
signals and terminal modes.  I wish I could say the same about other
1003.* standards-in-progress, but I cannot.  1003.2 in particular
seems to be legislating an utterly horrible environment instead of
specifying cleanly the UNIX utility subset of interest to portable
applications.  You can bet I'm not going to include it in procurement
specifications.
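To make the `#define d_namlen 42` hazard above concrete: if the header's struct had a member literally named `d_namlen`, that application macro would turn the declaration into `unsigned short 42;` and the header would fail to compile. The sketch below uses a member from the reserved name space instead, as Gwyn recommends. The struct `dirent_ns`, the member `__d_namlen`, and the helper `dirent_ns_set()` are hypothetical, and a double-underscore name would only be legitimate inside an implementation's own header.

```c
#include <assert.h>
#include <string.h>

/* An application may do this before including the header, since
   d_namlen is not a name POSIX assigns to <dirent.h>. */
#define d_namlen 42

/* Hypothetical header contents: a member in the implementation's
   reserved name space is immune to the application's macro. */
struct dirent_ns {
    unsigned short __d_namlen;   /* reserved-for-implementation name */
    char d_name[16];             /* fixed size only to keep the sketch short */
};

/* Store `name` and record its length in the private member. */
void dirent_ns_set(struct dirent_ns *e, const char *name)
{
    size_t len = strlen(name);

    assert(len < sizeof e->d_name);   /* sketch: caller keeps names short */
    memcpy(e->d_name, name, len + 1);
    e->__d_namlen = (unsigned short)len;
}
```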

Volume-Number: Volume 17, Number 78