[comp.unix.questions] opendir

jdc@naucse.UUCP (John Campbell) (06/25/89)

A favorite program of mine, which works with Doug Gwyn's dirent
routines, was reposted, and when built it yielded the following:

   RE - can't read RE
   Ma - can't read Ma

with each file name truncated to two letters.  It turns out that this
happened because Doug Gwyn's dirent.h file (at least my version) does
not put a 14-character limit (or any limit) on file names.  Instead
he defined the following (/usr/include/sys/dirent.h):

======start of doug's dirent.h======
/*
	<sys/dirent.h> -- file system independent directory entry (SVR3)

	last edit:	25-Apr-1987	D A Gwyn

	prerequisite:	<sys/types.h>
*/

struct dirent				/* data from getdents()/readdir() */
	{
	long		d_ino;		/* inode number of entry */
	off_t		d_off;		/* offset of disk directory entry */
	unsigned short	d_reclen;	/* length of this record */
	char		d_name[1];	/* name of file */	/* non-POSIX */
	};

/* The following nonportable ugliness could have been avoided by defining
   DIRENTSIZ and DIRENTBASESIZ to also have (struct dirent *) arguments. */
#define	DIRENTBASESIZ		(((struct dirent *)0)->d_name \
				- (char *)&((struct dirent *)0)->d_ino)
#define	DIRENTSIZ( namlen )	((DIRENTBASESIZ + sizeof(long) + (namlen)) \
				/ sizeof(long) * sizeof(long))

/* DAG -- the following was moved from <dirent.h>, which was the wrong place */
#define	MAXNAMLEN	512		/* maximum filename length */

#ifndef NAME_MAX
#define	NAME_MAX	(MAXNAMLEN - 1)	/* DAG -- added for POSIX */
#endif
=========end of doug's dirent.h========

What this means is that memcpy (x, y, sizeof (struct dirent)) was copying
only the first two bytes of the name (d_name[1] plus structure alignment
padding).  Anyway, I made patches based on DIRENTSIZ and DIRENTBASESIZ to
work around this--but it added a mess to the code and some
#define DIRENTSIZ 0 lines for other Unix implementations.
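
To make the arithmetic concrete, here is a minimal sketch.  The names
demo_dirent, name_bytes_in_fixed_copy() and record_size() are made up for
illustration and are not part of either package; the struct simply copies
the member list from the header above, and record_size() repeats the
DIRENTSIZ() arithmetic with offsetof() standing in for DIRENTBASESIZ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in with the same members as the struct dirent
   quoted above. */
struct demo_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[1];   /* really the start of a longer name */
};

/* Name bytes carried along by memcpy(x, y, sizeof (struct demo_dirent)):
   the one declared byte of d_name plus any trailing padding. */
size_t name_bytes_in_fixed_copy(void)
{
    size_t base = offsetof(struct demo_dirent, d_name);

    assert(base < sizeof(struct demo_dirent));
    return sizeof(struct demo_dirent) - base;
}

/* Same arithmetic as DIRENTSIZ(namlen): room for the name plus a
   terminating null, rounded up to a multiple of sizeof(long). */
size_t record_size(size_t namlen)
{
    size_t base = offsetof(struct demo_dirent, d_name);

    return (base + sizeof(long) + namlen) / sizeof(long) * sizeof(long);
}
```

Assuming 4-byte long and off_t and 2-byte struct alignment, d_name sits at
offset 10 and the struct is padded to 12 bytes, so
name_bytes_in_fixed_copy() would come out as 2, which matches the "RE" and
"Ma" output.  A copy sized with record_size(strlen(name)) takes the whole
name instead.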

My question (great, now he gets to it) is, "Is this the best way to
build a portable opendir(), readdir(), etc. package?" Handling arbitrary 
length file names is always a bit more work.  Also, I run on a 3b1 SYSV
(sort of) machine and I have another opendir() package by Scott Hazen 
Mueller (scott@zorch.uucp) that would have worked with the original code 
since it assumes a fixed size for each directory entry (ndir.h):

======start of scott's ndir.h======
/*	@(#)ndir.h	1.7	10/7/87	*/
#if defined(HP9K5)
/* He should have included it instead of this, but prevent confusion */
#include <ndir.h>
#else /* other */
#ifndef DEV_BSIZE
#define	DEV_BSIZE	512
#endif
#define DIRBLKSIZ	DEV_BSIZE
#define	MAXNAMLEN	255

struct	directy {
	long	d_ino;			/* inode number of entry */
	short	d_reclen;		/* length of this record */
	short	d_namlen;		/* length of string in d_name */
	char	d_name[MAXNAMLEN + 1];	/* name must be no longer than this */
};

/*
 * The DIRSIZ macro gives the minimum record length which will hold
 * the directory entry.  This requires the amount of space in struct directy
 * without the d_name field, plus enough space for the name with a terminating
 * null byte (dp->d_namlen+1), rounded up to a 4 byte boundary.
 */

#ifdef DIRSIZ
#undef DIRSIZ
#endif /* DIRSIZ */
#define DIRSIZ(dp) \
    ((sizeof (struct directy) - (MAXNAMLEN+1)) + (((dp)->d_namlen+1 + 3) &~ 3))

/*
 * Definitions for library routines operating on directories.
 */
typedef struct _dirdesc {
	int	dd_fd;
	long	dd_loc;
	long	dd_size;
	char	dd_buf[DIRBLKSIZ];
} DIR;
#ifndef NULL
#define NULL 0
#endif
extern	DIR *opendir();
extern	struct directy *readdir();
extern	void closedir();

#define rewinddir(dirp)	seekdir((dirp), (long)0)
#endif /* other */
======end of scott's ndir.h======

What is the consensus?  Which way should the package work?  I know
Doug's stuff is widespread (and good), but is there a reason to
change its implementation?  What does POSIX say?
-- 
	John Campbell               ...!arizona!naucse!jdc
                                    CAMPBELL@NAUVAX.bitnet
	unix?  Sure send me a dozen, all different colors.

gwyn@smoke.BRL.MIL (Doug Gwyn) (06/27/89)

In article <1509@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
>======start of doug's dirent.h======

Here is the current version:

/*
	<sys/dirent.h> -- file system independent directory entry (SVR3)

	last edit:	27-Oct-1988	D A Gwyn

	prerequisite:	<sys/types.h>
*/

struct dirent				/* data from getdents()/readdir() */
	{
	long		d_ino;		/* inode number of entry */
	off_t		d_off;		/* offset of disk directory entry */
	unsigned short	d_reclen;	/* length of this record */
	char		d_name[1];	/* name of file */	/* non-ANSI */
	};

#ifdef BSD_SYSV				/* (e.g., when compiling getdents.c) */
extern struct dirent	__dirent;	/* (not actually used) */
/* The following is portable, although rather silly. */
#define	DIRENTBASESIZ		(__dirent.d_name - (char *)&__dirent.d_ino)

#else
/* The following nonportable ugliness could have been avoided by defining
   DIRENTSIZ and DIRENTBASESIZ to also have (struct dirent *) arguments.
   There shouldn't be any problem if you avoid using the DIRENTSIZ() macro. */

#define	DIRENTBASESIZ		(((struct dirent *)0)->d_name \
				- (char *)&((struct dirent *)0)->d_ino)
#endif

#define	DIRENTSIZ( namlen )	((DIRENTBASESIZ + sizeof(long) + (namlen)) \
				/ sizeof(long) * sizeof(long))

/* DAG -- the following was moved from <dirent.h>, which was the wrong place */
#define	MAXNAMLEN	512		/* maximum filename length */

#ifndef NAME_MAX
#define	NAME_MAX	(MAXNAMLEN - 1)	/* DAG -- added for POSIX */
#endif

>What this means is that memcpy (x, y, sizeof (struct dirent)) ...

It is strongly implied by the POSIX spec that readdir() "owns" the
contents of this struct; an attempt to keep it from getting overwritten
by making a copy of it is doomed, because there is no portable way to
know how big the actual allocation for the struct dirent is; IEEE 1003.1
specifically states that the character array d_name is of unspecified
size, and this was done deliberately to allow implementations such as
mine.  In some drafts we had required d_name to be a char* rather than
an array (thus the "non-POSIX" comment in the version you posted).
After Section 5.1 had gotten straightened out, taking into account my
feedback, some more comments (from Berkeley, I think) were received and
further changes were made, unfortunately with no opportunity for further
review.  (This was a generic problem with the 1003.1 balloting process.)
There are a lot of things that needed to be more clearly specified.  For
example, can a DIR be copied to make a separate but equal handle on a
directory stream?  (The answer is "no", but it's not specified in
IEEE Std 1003.1.)

The way to save a directory entry is to either copy the name string,
using strlen() to determine the proper size, or to use the (non-POSIX)
telldir() function to obtain a position for a later seekdir().  I
recommend not using telldir()/seekdir() for reasons other than their
being nonstandard.  Note that d_name is the only member of a struct
dirent that POSIX mentions; therefore it's the only part you can
portably use anyway.
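
A minimal sketch of the first approach (copying the name string), assuming
a system that supplies <dirent.h>.  The helper names save_entry_name() and
first_entry_name() are hypothetical and not part of the package; only
d_name is touched, and its length comes from strlen() rather than from
sizeof (struct dirent).

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of the entry's name, or NULL if malloc()
   fails.  The caller frees it. */
char *save_entry_name(const struct dirent *dp)
{
    size_t len;
    char *copy;

    assert(dp != NULL);
    len = strlen(dp->d_name) + 1;       /* include the terminating null */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, dp->d_name, len);
    return copy;
}

/* Return a saved copy of the first name readdir() reports for path,
   or NULL if the directory cannot be opened or has no entries. */
char *first_entry_name(const char *path)
{
    DIR *dirp = opendir(path);
    struct dirent *dp;
    char *name = NULL;

    if (dirp == NULL)
        return NULL;
    dp = readdir(dirp);
    if (dp != NULL)
        name = save_entry_name(dp);     /* copy before closedir() */
    closedir(dirp);
    return name;
}
```

The saved string stays valid after further readdir() calls and after
closedir(), which is the property the memcpy() of the whole struct was
trying to get.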

>My question (great, now he gets to it) is, "Is this the best way to
>build a portable opendir(), readdir(), etc. package?"

Certainly I think so.  The only system dependency (apart from
stretching the limits of the C language) is isolated in the getdents()
function, which is either a system call (SVR3) or an emulation of one.
Thus porting the package to a previously unsupported environment
consists almost entirely of devising a working getdents() emulation.

By the way, the reason for my using DIRENTBASESIZ etc. as you see them
is that the SVR3 implementors had done so, and my package is intended
to be usable as a direct replacement for SVR3's.  The comment in the
code explains how it could be done better were SVR3 compatibility not
a requirement.  (Actually, much of the SVR3 implementation appears to
have been based on an earlier version of my package.  Very strange
feedback loops we have operating here!)