std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (11/22/89)
From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
The <dirent.h> header is required by P1003.1 to have a field
d_name []
Now the question arises about what size to use there. One possibility is
d_name[NAME_MAX+1]
However, doing this means that <limits.h> must be included. As far as I
can see, the following is a conforming application:
#include <dirent.h>
int main(void)
{
    (void) opendir("/usr");
    return 0;
}
If one uses NAME_MAX in <dirent.h>, we have a conforming application that
won't even compile.
Another solution is to put
#define _NAME_MAX 14
in <dirent.h> and use that (assuming the result of the discussion on
name space pollution permits this). It might work, but it is hardly
elegant.
What's an implementer to do?
Andy Tanenbaum (ast@cs.vu.nl)
Volume-Number: Volume 17, Number 65
gwyn@BRL.MIL (Doug Gwyn) (11/23/89)
From: gwyn@BRL.MIL (Doug Gwyn)
In article <437@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>Now the question arises about what size to use there. One possibility is
>    d_name[NAME_MAX+1]
>However, doing this means that <limits.h> must be included.
You, the implementer, could manually replace that NAME_MAX with the
appropriate value (perhaps found by inspecting <limits.h>). This is the
same issue as occurs when declaring v*printf() in <stdio.h>; the related
header need not (nay, MUST not) be included, but a compatible type (or
value in the NAME_MAX case) must be used.
>What's an implementer to do?
What I did in my implementation was to cheat:
    char d_name[1];
We were careful to word the IEEE Std 1003.1 specification so that this
is explicitly allowed.
Volume-Number: Volume 17, Number 66
std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (11/24/89)
From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
In article <438@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) writes:
>You, the implementer, could manually replace that NAME_MAX with the
>appropriate value (perhaps found by inspecting <limits.h>).
It is true I could just put 14 there, or #define it to be _NAME_MAX in that
file, but that seems poor programming practice. If that constant ever gets
changed in <limits.h> but not in <dirent.h> disaster will strike.
>What I did in my implementation was to cheat:
>    char d_name[1];
What happens when a program allocates a struct dirent in a program? The
compiler will not allocate enough storage and it will crash when used.
Is it legal to add a line
#include <limits.h>
in <dirent.h>?
Andy Tanenbaum (ast@cs.vu.nl)
Volume-Number: Volume 17, Number 69
mark@jhereg.Minnetech.MN.ORG (Mark H. Colburn) (11/24/89)
From: mark@jhereg.Minnetech.MN.ORG (Mark H. Colburn)
In article <437@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
>
>The <dirent.h> header is required by P1003.1 to have a field
>    d_name []
>
>Now the question arises about what size to use there. One possibility is
>    d_name[NAME_MAX+1]
>
At least three implementations that I know of define d_name as follows:
    d_name[1];
And they put it at the end of the structure. In this way, when the
structure is allocated, the implementation may allocate enough space for
the entry's name, no matter what it is.
For a good, publicly available example, you might want to check out
Doug Gwyn's dirent library.
-- 
Mark H. Colburn  mark@Minnetech.MN.ORG
Open Systems Architects, Inc.
Volume-Number: Volume 17, Number 71
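Mark's description can be made concrete. The following is a minimal sketch and is not taken from any of the implementations he mentions. `struct sketch_dirent` and `sketch_dirent_new()` are hypothetical names. The trailing member is also written as a C99 flexible array member (`char d_name[];`) rather than the `d_name[1]` trick. That feature was standardized ten years after this thread, and C99 guarantees that the over-allocated storage is usable, which the `[1]` form does not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout for illustration only; real implementations
   carry other members (inode number, record length, and so on). */
struct sketch_dirent {
    long d_ino;        /* illustrative fixed-size member */
    char d_name[];     /* C99 flexible array member; must come last */
};

/* Allocate one entry sized for `name`: the fixed part of the struct
   (including any trailing padding) plus the name and its '\0'. */
struct sketch_dirent *sketch_dirent_new(long ino, const char *name)
{
    size_t len = strlen(name);
    struct sketch_dirent *e = malloc(sizeof *e + len + 1);

    if (e == NULL)
        return NULL;
    e->d_ino = ino;
    memcpy(e->d_name, name, len + 1);   /* copies the terminating '\0' too */
    return e;
}
```

A short name such as "README" then costs only a few bytes beyond the fixed part, which is the saving the `[1]` trick was after. The price is that `sizeof(struct sketch_dirent)` says nothing about how long any particular entry's name is.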
gwyn@BRL.MIL (11/25/89)
From: gwyn@BRL.MIL
In article <441@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
-> char d_name[1];
-What happens when a program allocates a struct dirent in a program? The
-compiler will not allocate enough storage and it will crash when used.
That's what happens when programmers assume things that are not promised
by the standards.
-Is it legal to add a line
-#include <limits.h>
-in <dirent.h>?
NO.
Volume-Number: Volume 17, Number 70
std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (11/26/89)
From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
In article <442@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) writes:
>That's what happens when programmers assume things that are not promised
>by the standards.
I don't follow. What is it that the standards don't promise? Surely a
programmer may declare a struct dirent, since readdir() returns a pointer
to one of them. Furthermore, a programmer may assume that d_name is an
array of characters that can hold a file name. I don't see how you can
put a file name in 1 character. I don't see any alternative to
allocating NAME_MAX+1 characters there.
Why doesn't the standard require <dirent.h> to have <limits.h> as a
prerequisite, so that NAME_MAX is at least known?
Andy Tanenbaum (ast@cs.vu.nl)
Volume-Number: Volume 17, Number 72
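A program can sidestep Andy's question entirely by working only through the pointer that readdir() returns and never declaring or copying a struct dirent of its own. Here is a sketch of that usage. `dir_contains()` is a hypothetical helper name. opendir(), readdir() and closedir() are the standard 1003.1 calls.

```c
#include <dirent.h>
#include <string.h>

/* Returns 1 if `dir` has an entry called `name`, 0 if it does not, and
   -1 if the directory cannot be opened. Only the pointer returned by
   readdir() and its d_name member are used; no struct dirent object is
   declared or copied, so the declared size of d_name never matters. */
int dir_contains(const char *dir, const char *name)
{
    DIR *dp = opendir(dir);
    struct dirent *ep;
    int found = 0;

    if (dp == NULL)
        return -1;
    while ((ep = readdir(dp)) != NULL) {
        if (strcmp(ep->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dp);
    return found;
}
```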
karish@forel.stanford.edu (Chuck Karish) (11/28/89)
From: karish@forel.stanford.edu (Chuck Karish)
In article <442@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) wrote:
>In article <441@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>-> char d_name[1];
>-What happens when a program allocates a struct dirent in a program? The
>-compiler will not allocate enough storage and it will crash when used.
>
>That's what happens when programmers assume things that are not promised
>by the standards.
This is spelled out in the rationale (B.5.1.1).
>-Is it legal to add a line
>-#include <limits.h>
>-in <dirent.h>?
>
>NO.
A citation would be more useful here than this proclamation. I haven't
been able to find anything in the 1003.1 documents that would prohibit
this. The form of a header is defined by the implementation. There are
many places in the Standard where it is required that a particular
identifier be available when a particular header is #included, but I
haven't found any that require that identifiers not be visible when the
headers to which they are assigned have not been #included.
Portable application code must #include headers as listed in the
function descriptions in the standard, if only for compatibility with
implementations that don't support ANSI C. It will be easier to identify
non-portable code under implementations that refrain from #including,
for example, <sys/types.h> in <sys/stat.h>.
Chuck Karish    karish@mindcraft.com
(415) 323-9000  karish@forel.stanford.edu
Volume-Number: Volume 17, Number 75
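In practice, Karish's advice means naming every header that a function's description lists. Here is a minimal sketch with a hypothetical helper, `is_directory()`. The application includes <sys/types.h> and <sys/stat.h> itself rather than relying on <sys/stat.h> to pull in <sys/types.h>. It therefore still compiles on an implementation that, as Karish suggests, refrains from nested includes.

```c
#include <sys/types.h>   /* named explicitly, not assumed from <sys/stat.h> */
#include <sys/stat.h>

/* Returns 1 if `path` names a directory, 0 if it names something else,
   and -1 if stat() fails (for example, the path does not exist). */
int is_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```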
gwyn@BRL.MIL (11/28/89)
From: gwyn@BRL.MIL
In article <444@longway.TIC.COM> Andy Tanenbaum <uunet!cs.vu.nl!ast> writes:
>In article <442@longway.TIC.COM> gwyn@brl.arpa (Doug Gwyn) writes:
>>That's what happens when programmers assume things that are not promised
>>by the standards.
>I don't follow. What is it that the standards don't promise?
From IEEE Std 1003.1-1988: "The character array d_name is of unspecified
size".
>Surely a programmer may declare a struct dirent, ...
Sure, but that doesn't mean that it will be particularly useful to do so.
>Furthermore, a programmer may assume that d_name is an array of characters
>that can hold a file name.
That is the important thing that is NOT promised by the standard. d_name
must have type char[] and the filename that it represents must be null
terminated, with no more than NAME_MAX bytes before the null terminator.
>I don't see how you can put a file name in 1 character. I don't see any
>alternative to allocating NAME_MAX+1 characters there.
Because it is wasteful to allocate so much storage for what are typically
short strings, many (maybe even most) implementations allocate just as
much as is actually needed for each individual struct dirent (including
possibly a few bytes for alignment). Practically all C compilers support
this kind of "type punning", but the programmer needs to be careful not
to make unwarranted assumptions. The reason the standard does not specify
    char d_name[NAME_MAX+1]
is precisely to allow this particular implementation technique.
>Why doesn't the standard require <dirent.h> to have <limits.h> as a
>prerequisite, so that NAME_MAX is at least known?
IEEE Std 1003.1-1988 explains how a programmer who wishes to obtain that
information may do so. Since the implementation of <dirent.h> does not
require the macro NAME_MAX, it would have been pretty silly to have
required <limits.h> to be included before <dirent.h>.
Volume-Number: Volume 17, Number 74
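The run-time route Gwyn presumably has in mind is pathconf() with _PC_NAME_MAX, declared in <unistd.h>. It reports the limit for the filesystem holding a given path. Because that limit can vary from one filesystem to another, <limits.h> is allowed to leave NAME_MAX undefined. Here is a small sketch. The wrapper name `name_max_for()` and its convention of returning 0 for "no limit" are assumptions made for this example.

```c
#include <errno.h>
#include <unistd.h>

/* Query the filename length limit for the filesystem containing `dir`.
   Returns the limit, 0 if the system reports no limit at all, or -1 on
   error (errno then says why). */
long name_max_for(const char *dir)
{
    long limit;

    errno = 0;                              /* distinguish "no limit" from error */
    limit = pathconf(dir, _PC_NAME_MAX);
    if (limit == -1)
        return errno == 0 ? 0 : -1;         /* -1 with errno untouched: no limit */
    return limit;
}
```

A program that wants its own buffer for names in a particular directory can size it from this value, which also covers the filesystem-dependence that Sovereign raises later in the thread.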
henry@utzoo.uucp (11/28/89)
From: henry@utzoo.uucp
>From: Andy Tanenbaum <uunet!cs.vu.nl!ast>
>I don't follow. What is it that the standards don't promise? Surely
>a programmer may declare a struct dirent...
That is exactly what is not promised: that you can declare a
`struct dirent' (as opposed to a `struct dirent *') that is of any use
to you. The only use for `struct dirent' defined in 1003.1 is that
readdir() returns a pointer to one, and that the thing that pointer
points to has a member `d_name' that you can examine. There is no
promise that the type `struct dirent' is good for anything else
whatsoever.
Henry Spencer at U of Toronto Zoology
uunet!attcan!utzoo!henry   henry@zoo.toronto.edu
Volume-Number: Volume 17, Number 73
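Under Henry's reading, a program touches only the pointer that readdir() returns and its d_name member. Here is a sketch that stays within that constraint while still keeping the names after the directory is closed. The hypothetical helpers `collect_names()` and `free_names()` copy each name into storage sized by strlen(), so nothing depends on how large d_name is declared to be.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Free an array of n names built by collect_names(). */
void free_names(char **names, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        free(names[i]);
    free(names);
}

/* Copy every entry name in `dir` into freshly allocated strings.
   On success returns 0 and stores the array in *out and its length in
   *count. Returns -1 if the directory cannot be opened or memory runs out. */
int collect_names(const char *dir, char ***out, size_t *count)
{
    DIR *dp = opendir(dir);
    struct dirent *ep;                       /* only ever used as a pointer */
    char **names = NULL;
    size_t n = 0;

    if (dp == NULL)
        return -1;
    while ((ep = readdir(dp)) != NULL) {
        size_t len = strlen(ep->d_name);     /* actual length of this name */
        char **grown = realloc(names, (n + 1) * sizeof *names);

        if (grown == NULL)
            goto fail;
        names = grown;
        names[n] = malloc(len + 1);          /* exactly enough for the name */
        if (names[n] == NULL)
            goto fail;
        memcpy(names[n], ep->d_name, len + 1);
        n++;
    }
    closedir(dp);
    *out = names;
    *count = n;
    return 0;

fail:
    free_names(names, n);                    /* frees the n completed copies */
    closedir(dp);
    return -1;
}
```

For brevity, the sketch does not distinguish a readdir() error from the end of the directory.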
dmr@research.att.com (Dennis Ritchie) (11/29/89)
From: dmr@research.att.com (Dennis Ritchie)
I wish Gwyn et al. had sounded a bit more embarrassed about using
`char d_name[1]' in struct dirent. Tanenbaum is absolutely correct to
question it; it is an abuse of the language and would not pass a compiler
system with careful run-time checks.
From the language point of view, there is no reason at all to forbid
declaring an instance of struct dirent, or copying the value pointed to by
the value of readdir(). Gwyn's definition implies incorrect C. (Well,
the definition is well-defined, but not the expectation that there is more
than 1 character actually in the d_name array.)
There is no such type as char[], and `char d_name[]' may not appear
in a structure, and if the declaration is `char d_name[1]' then
you may not refer to d_name[i] when i>1.
I don't have the POSIX wording at hand, but if it forbids
`struct dirent d = *readdir(dp)' then it is flaky.
Dennis Ritchie
dmr@research.att.com
att!research!dmr
Volume-Number: Volume 17, Number 76
jms@Apple.COM (John Sovereign) (12/01/89)
From: jms@Apple.COM (John Sovereign)
Since NAME_MAX is filesystem-dependent, NAME_MAX is probably a poor choice
for an implementation's definition of d_name, unless the implementation
KNOWS that it will only talk to filesystems which limit filenames to
NAME_MAX.
In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>From: dmr@research.att.com (Dennis Ritchie)
>
>I don't have the POSIX wording at hand, but if it forbids
>`struct dirent d = *readdir(dp)' then it is flaky.
>
POSIX (and the "historical implementation" which introduced this) is flaky.
John Sovereign
jms@apple.com
Volume-Number: Volume 17, Number 77
std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (12/01/89)
From: Doug Gwyn <uunet!smoke.brl.mil!gwyn>
In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>I wish Gwyn et al. had sounded a bit more embarrassed about using
>`char d_name[1]' in struct dirent.
Here is the line in question taken directly from my PD dirent
implementation:
    char d_name[1];     /* name of file */  /* non-ANSI */
You will note that I'm well aware that a trick is being used here. I
don't like such tricks either. The problem is, the alternatives were all
worse:
    char *d_name;               /* programs need to know whether d_name
                                   specifies an array or not, due to a
                                   generic C botch in using array names;
                                   P1003 used to be ambiguous about this
                                   but finally required it to be an
                                   array */
    char d_name[HUGE_NUMBER];   /* valid, but wastes a lot of space */
    char d_name[0];             /* worse than [1] according to ANSI C */
    char d_name[];              /* almost certain to cause a diagnostic */
>There is no such type as char[], and `char d_name[]' may not appear
>in a structure, and if the declaration is `char d_name[1]' then
>you may not refer to d_name[i] when i>1.
Certainly it is unportable usage, i.e. not guaranteed to work by the C
language specification. However, there is a large body of existing C code
(typically implementing network protocols) that relies on exactly this
trick, precisely because there is no really good alternative. I have yet
to hear of a production UNIX system where this trick fails. (Perhaps 10th
Edition UNIX is one?)
Probably what I really should have done was to parameterize the "1":
    char d_name[1+_DNAME_LEN];  /* _DNAME_LEN=0 if you can,
                                   _DNAME_LEN=PATH_MAX otherwise */
That would allow the dirent package installer a quick solution for C
environments that are fussier about this than the typical UNIX ones. I
may do this for future releases of my package.
>I don't have the POSIX wording at hand, but if it forbids
>`struct dirent d = *readdir(dp)' then it is flaky.
It says:
    The readdir() function returns a pointer to an object of type
    struct dirent that includes the member:
        Member    Member
        Type      Name      Description
        ______    ______    ________________________
        char[]    d_name    Null-terminated filename
    The character array d_name is of unspecified size, but the number of
    bytes preceding the terminating null character shall not exceed
    {NAME_MAX}.
I believe my implementation meets these specifications, taken literally.
At one time, the description of readdir() contained a warning about
copying struct dirents, but by the time of the final Std 1003.1 the entire
section had been rewritten and this got lost in the shuffle. I think some
other unwanted changes were introduced too, but at the moment I can't
recall what they were. (We also have to keep beating down attempts to
legislate support for seekdir() and telldir().)
Anyway, the whole point of the words "unspecified size" really was to
permit implementations to use the [1] trick so they could allocate a
relatively small struct_dirent+secret_extension if the C compiler
permitted it. Otherwise either NAME_MAX+1 or some other defined
implementation constant would have been specified in Std 1003.1 (as for
c_cc[NCCS]). I would have preferred
    char *d_name;
however, that would be as hard for an application to copy as a
struct_dirent+secret_extension. Certainly
    char d_name[1+PATH_MAX];    /* use actual value for PATH_MAX */
is a legal and portable declaration for d_name that meets the POSIX specs.
I happen not to like it because PATH_MAX is potentially unbounded in an
ideal networked universe, and always allocating big chunks of space of
which a tiny portion is used bothers me more than this particular
well-known implementation-specific cheat.
My advice for applications using dirent facilities is NOT to assume that
a literal copy of the struct dirent is good for anything. If you need to
keep the entry string around, you should allocate storage for it based
on its strlen().
(Since the other members of a struct dirent are unspecified, you can't use
them anyway in a POSIX-portable application.)
There are numerous related issues with IEEE Std 1003.1 that we could get
into. For example, it is not stated whether or not it is safe for an
application to use a copy of a struct dirent or of several other system
data structures where the struct has a different address from the one
that the allocator (e.g. readdir()) assigned. (Presumably an
implementation could depend on the object residing in a known place.)
Also, since there are no constraints on other struct dirent member names,
the traditional practice of using d_* for these is unsafe; instead the
"always reserved for the implementation" name space must be used to avoid
problems like
    #define d_namlen 42
    #include <dirent.h>
I don't know if there's much point in going into such problems in more
detail. My personal feeling is that 1003.1 serves ONE useful purpose: by
specifying it in OS procurements (in ADDITION to more useful specs such as
ANSI/ISO C and SVID), one can obtain portable interfaces for some
otherwise problematic areas such as reliable signals and terminal modes.
I wish I could say the same about other 1003.* standards-in-progress, but
I cannot. 1003.2 in particular seems to be legislating an utterly
horrible environment instead of specifying cleanly the UNIX utility subset
of interest to portable applications. You can bet I'm not going to
include it in procurement specifications.
Volume-Number: Volume 17, Number 78
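Gwyn's `#define d_namlen 42` example can be spelled out. In this sketch, `struct sketch_entry` and `sketch_fill()` are hypothetical stand-ins for header contents. The application's macro would break a member declared as `d_namlen`. A member spelled `__d_namlen`, a name reserved to the implementation, is out of the application's reach. Both sides appear in one file only so that the sketch compiles on its own.

```c
#include <string.h>

/* Application code: nothing in 1003.1-1988 reserves the name d_namlen,
   so an application may define it as a macro before including headers. */
#define d_namlen 42

/* Hypothetical header contents. Had the extra member been declared as
       unsigned short d_namlen;
   the macro above would turn it into "unsigned short 42;" and the header
   would fail to compile. A name starting with two underscores is reserved
   for the implementation, so no conforming application can collide with it. */
struct sketch_entry {
    unsigned short __d_namlen;   /* implementation-private extra member */
    char d_name[256];            /* size chosen arbitrarily for the sketch */
};

/* Fill an entry, truncating if necessary, and record the name length
   in the private member. */
void sketch_fill(struct sketch_entry *e, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof e->d_name)
        len = sizeof e->d_name - 1;       /* leave room for the '\0' */
    memcpy(e->d_name, name, len);
    e->d_name[len] = '\0';
    e->__d_namlen = (unsigned short)len;
}
```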