[net.unix] reading directory under SYSTEM V

guy@sun.uucp (Guy Harris) (09/04/86)

> Could anybody suggest a method of reading directory information from a C
> program under SYSTEM V UNIX? I need something equivalent to the opendir and 
> readdir functionality supplied under BSD.

This is a UNIX question, not a C question (adding the phrase "from a C
program" does not magically transform a question about UNIX into a question
about C), so I'm moving discussion to "net.unix".

The opendir/readdir/etc. code was originally written for 4.1BSD, which uses
the V7 file system, as does S5.  The V7 file
system version was posted to the net by Kirk McKusick a while ago (I mailed
a copy to the person who asked the question originally).

Unfortunately, the file that is included by most 4.2 programs that defines
the data structures used by the directory library is <sys/dir.h>, which is
already used to define the native directory structure on systems with the V7
file system.  This means you can't conveniently drop the new "dir.h" include
file into such a system and avoid the need for #ifdefs when building a
program for both systems with the V7 file system and systems with the
4.2BSD file system.
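
As a rough illustration of the #ifdefs in question, here is a minimal sketch
of one source file that builds against either header.  The macro names
USE_NDIR_COMPAT, USE_BSD42_DIR and DIRENTRY are assumptions made up for this
example, as is the helper list_entries(); none of them come from the library
itself.  With neither macro defined it falls back to a 1003.1-style
<dirent.h>, and it uses ANSI prototypes so that it builds with a current
compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical configuration macros; the names are assumptions. */
#if defined(USE_NDIR_COMPAT)        /* V7 file system plus the compatibility library */
#include <dir.h>
#define DIRENTRY struct direct
#elif defined(USE_BSD42_DIR)        /* native 4.2BSD directory library */
#include <sys/dir.h>
#define DIRENTRY struct direct
#else                               /* system with a 1003.1-style <dirent.h> */
#include <dirent.h>
#define DIRENTRY struct dirent
#endif

/*
 * Write the name of every entry in "path" to "out", one per line.
 * Returns the number of entries seen, or -1 if the directory
 * could not be opened.
 */
int list_entries(const char *path, FILE *out)
{
    DIR *dirp;
    DIRENTRY *dp;
    int count = 0;

    assert(path != NULL && out != NULL);
    dirp = opendir(path);
    if (dirp == NULL)
        return -1;
    while ((dp = readdir(dirp)) != NULL) {
        fprintf(out, "%s\n", dp->d_name);
        count++;
    }
    closedir(dirp);
    return count;
}
```

A caller would use it as list_entries(".", stdout).  Only the header name and
the structure tag vary between the branches; the calls themselves are the
same in all three.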

The IEEE 1003.1 standard has a slightly modified version of the directory
library routines.  Programs using them include <dirent.h>, which is not part
of V7, 4BSD, S3, or S5 prior to S5R3, so it can be dropped into such a system
(I hope nobody else has added a different include file with the same name).  S5R3
provides a 1003.1-compatible version of the directory library.  The only
other difference between the two versions of the directory library is that
the original one calls the directory entry structure "struct direct", which
collides with the "struct direct" specifying the format of directories on
disk, as specified in <sys/dir.h>, while the new one calls it "struct
dirent" to avoid this collision.  Systems using the 4.2BSD file system need
merely provide a <dirent.h> that specifies a "struct dirent" identical to a
"struct direct" in order to provide a 1003.1-compatible version of the
directory library routines.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

guy@sun.UUCP (09/08/86)

(Trying once again to move this discussion to "net.unix", where it belongs,
and away from "net.lang.c", where it doesn't; reading directories has
nothing to do with C, and everything to do with UNIX.)

> This may not thrill you, but have you considered forking an 'ls' into
> a tmp file and reading that back? (or, similarly, using a popen() to
> ls)?

Oh, all right, I'll post Kirk McKusick's implementation of the directory
library for systems with the V7 file system (V7, 4.1BSD, System III, System
V), if you insist.  (Kirk posted it to net.sources a long time ago, but I
guess people have lost track of it.)

The one nuisance is that 4.2BSD programs include <sys/dir.h> to get the
declarations of the structure that "readdir" returns a pointer to.
<sys/dir.h> is, on UNIX systems that don't have support for multiple file
system types, the declaration of the format of a directory entry on disk.
4.2BSD can get away with this because the format of the structure that
"readdir" returns a pointer to is the same as the format of a directory
entry on disk; this is not true on other systems.  This means you can't
conveniently just drop in this package and build programs for both file
systems from the same source.
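
To make the difference between the two formats concrete, here is a minimal
sketch of turning a V7-style on-disk entry into a structure with a
null-terminated name, in the spirit of what a readdir for the V7 file system
has to do.  The names v7_entry, portable_entry and convert_entry() are
invented for this example, and the 16-bit inode field is an assumption about
the V7 on-disk layout.

```c
#include <assert.h>
#include <string.h>

#define V7_NAMESIZ 14   /* width of the name field in a V7 directory entry */

/* On-disk entry; the name is NOT null-terminated when it is 14 bytes long. */
struct v7_entry {
    unsigned short ino;             /* assumed 16 bits, 0 means a free slot */
    char name[V7_NAMESIZ];
};

/* In-memory entry handed back to the caller (hypothetical layout). */
struct portable_entry {
    unsigned long ino;
    unsigned short namlen;
    char name[V7_NAMESIZ + 1];      /* room for the terminating null */
};

/*
 * Convert one on-disk entry.  Returns 1 if the slot was in use and
 * "out" was filled in, 0 if the slot was free and "out" is untouched.
 */
int convert_entry(const struct v7_entry *in, struct portable_entry *out)
{
    assert(in != NULL && out != NULL);
    if (in->ino == 0)
        return 0;
    out->ino = in->ino;
    memcpy(out->name, in->name, V7_NAMESIZ);
    out->name[V7_NAMESIZ] = '\0';   /* guarantee termination */
    out->namlen = (unsigned short)strlen(out->name);
    return 1;
}
```

The extra byte in the in-memory name is the whole point: a full 14-character
name has no terminator on disk, so the on-disk structure cannot be handed to
the caller as it stands.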

The IEEE 1003.1 standard fixes this collision by giving the structure that
"readdir" returns a pointer to the name "struct dirent", which doesn't collide
with the "struct direct" that's defined in various UNIX systems' <sys/dir.h>.
The include file that defines a "struct dirent" is <dirent.h>.  This is what
is implemented in S5R3.  You can easily add a <dirent.h> to systems using the
4.2BSD file system and directory library.
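
For comparison, here is a minimal sketch of a directory search written
against <dirent.h> and "struct dirent".  The function name find_entry() and
its return convention are assumptions for this example.  It compares only
d_name, since as far as I know 1003.1 does not promise a d_namlen member, and
it uses ANSI prototypes so that it builds with a current compiler.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Look for an entry called "name" in the directory "dirpath".
 * Returns 1 if found, 0 if not found, -1 if the directory
 * could not be opened.
 */
int find_entry(const char *dirpath, const char *name)
{
    DIR *dirp;
    struct dirent *dp;

    assert(dirpath != NULL && name != NULL);
    dirp = opendir(dirpath);
    if (dirp == NULL)
        return -1;
    while ((dp = readdir(dirp)) != NULL) {
        if (strcmp(dp->d_name, name) == 0) {
            closedir(dirp);         /* close before every return */
            return 1;
        }
    }
    closedir(dirp);
    return 0;
}
```

A call such as find_entry(".", "Makefile") searches the current directory.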

Here is Kirk's posting:

From: sun!decwrl!decvax!ucbvax!mckusick (Kirk Mckusick)
Newsgroups: net.sources
Title: directory access compatibility routines
Article-I.D.: ucbvax.191
Posted: Sat Apr  2 18:36:51 1983
Received: Sun Apr  3 00:38:10 1983

#!/bin/csh
# Run this file as shell script
mkdir libndir
chdir libndir
echo extracting README
cat > README <<'EOF'
The purpose of this library is to simulate the new flexible length
directory names on top of the old directory structure. It allows
programs to be converted to the new directory access interface, so
that they need only be relinked when 4.2bsd becomes available.
'EOF'
echo extracting Makefile
cat > Makefile <<'EOF'
# @(#)Makefile	4.8 (Berkeley) 3/21/83

DESTDIR	=
CFLAGS=	-O

OBJS=	closedir.o opendir.o readdir.o seekdir.o telldir.o
DIST=	README Makefile dir.h closedir.c opendir.c readdir.c \
	seekdir.c telldir.c directory.3s

.c.o:
	${CC} ${CFLAGS} -I. -c $*.c
	-ld -x -r $*.o
	mv a.out $*.o

libndir: ${OBJS}
	ar cru libndir ${OBJS}
	ranlib libndir

install: libndir
	cp dir.h ${DESTDIR}/usr/include/dir.h
	cp libndir ${DESTDIR}/usr/lib/libndir.a
	ranlib ${DESTDIR}/usr/lib/libndir.a
	cp directory.3s ${DESTDIR}/usr/man/man3/directory.3s

clean:
	rm -f libndir ${OBJS}

dist: ${DIST}
	echo "#!/bin/csh" >dist
	echo "# Run this file as shell script" >> dist
	echo "mkdir libndir" >> dist
	echo "chdir libndir" >> dist
	for i in ${DIST}; do ( \
		echo "echo extracting $$i" >> dist; \
		echo "cat > $$i <<'EOF'" >> dist; \
		cat $$i >> dist; \
		echo "'EOF'" >> dist); done
	chmod 775 dist
'EOF'
echo extracting dir.h
cat > dir.h <<'EOF'
/*	dir.h	4.4	82/07/25	*/

/*
 * A directory consists of some number of blocks of DIRBLKSIZ
 * bytes, where DIRBLKSIZ is chosen such that it can be transferred
 * to disk in a single atomic operation (e.g. 512 bytes on most machines).
 *
 * Each DIRBLKSIZ byte block contains some number of directory entry
 * structures, which are of variable length.  Each directory entry has
 * a struct direct at the front of it, containing its inode number,
 * the length of the entry, and the length of the name contained in
 * the entry.  These are followed by the name padded to a 4 byte boundary
 * with null bytes.  All names are guaranteed null terminated.
 * The maximum length of a name in a directory is MAXNAMLEN.
 *
 * The macro DIRSIZ(dp) gives the amount of space required to represent
 * a directory entry.  Free space in a directory is represented by
 * entries which have dp->d_reclen >= DIRSIZ(dp).  All DIRBLKSIZ bytes
 * in a directory block are claimed by the directory entries.  This
 * usually results in the last entry in a directory having a large
 * dp->d_reclen.  When entries are deleted from a directory, the
 * space is returned to the previous entry in the same directory
 * block by increasing its dp->d_reclen.  If the first entry of
 * a directory block is free, then its dp->d_ino is set to 0.
 * Entries other than the first in a directory do not normally have
 * dp->d_ino set to 0.
 */
#define DIRBLKSIZ	512
#define	MAXNAMLEN	255

#ifdef pdp11
#define u_long long
#endif

struct	direct {
	u_long	d_ino;			/* inode number of entry */
	u_short	d_reclen;		/* length of this record */
	u_short	d_namlen;		/* length of string in d_name */
	char	d_name[MAXNAMLEN + 1];	/* name must be no longer than this */
};

/*
 * The DIRSIZ macro gives the minimum record length which will hold
 * the directory entry.  This requires the amount of space in struct direct
 * without the d_name field, plus enough space for the name with a terminating
 * null byte (dp->d_namlen+1), rounded up to a 4 byte boundary.
 */
#undef DIRSIZ
#define DIRSIZ(dp) \
    ((sizeof (struct direct) - (MAXNAMLEN+1)) + (((dp)->d_namlen+1 + 3) &~ 3))

#ifndef KERNEL
/*
 * Definitions for library routines operating on directories.
 */
typedef struct _dirdesc {
	int	dd_fd;
	long	dd_loc;
	long	dd_size;
	char	dd_buf[DIRBLKSIZ];
} DIR;
#ifndef NULL
#define NULL 0
#endif
extern	DIR *opendir();
extern	struct direct *readdir();
extern	long telldir();
extern	void seekdir();
#define rewinddir(dirp)	seekdir((dirp), (long)0)
extern	void closedir();
#endif /* KERNEL */
'EOF'
echo extracting closedir.c
cat > closedir.c <<'EOF'
static char sccsid[] = "@(#)closedir.c 4.2 3/10/82";

#include <sys/types.h>
#include <dir.h>

/*
 * close a directory.
 */
void
closedir(dirp)
	register DIR *dirp;
{
	close(dirp->dd_fd);
	dirp->dd_fd = -1;
	dirp->dd_loc = 0;
	free(dirp);
}
'EOF'
echo extracting opendir.c
cat > opendir.c <<'EOF'
static char sccsid[] = "@(#)opendir.c 4.4 11/12/82";

#include <sys/param.h>
#include <dir.h>

extern char *malloc();

/*
 * open a directory.
 */
DIR *
opendir(name)
	char *name;
{
	register DIR *dirp;
	register int fd;

	if ((fd = open(name, 0)) == -1)
		return NULL;
	if ((dirp = (DIR *)malloc(sizeof(DIR))) == NULL) {
		close (fd);
		return NULL;
	}
	dirp->dd_fd = fd;
	dirp->dd_loc = 0;
	dirp->dd_size = 0;	/* telldir reads this before the first readdir */
	return dirp;
}
'EOF'
echo extracting readdir.c
cat > readdir.c <<'EOF'
static char sccsid[] = "@(#)readdir.c	4.1	(Berkeley)	83/03/21";

#include <sys/types.h>
#include <dir.h>

/*
 * read an old style directory entry and present it as a new one
 */
#define	ODIRSIZ	14

struct	olddirect {
	ino_t	od_ino;
	char	od_name[ODIRSIZ];
};

/*
 * get next entry in a directory.
 */
struct direct *
readdir(dirp)
	register DIR *dirp;
{
	register struct olddirect *dp;
	static struct direct dir;

	for (;;) {
		if (dirp->dd_loc == 0) {
			dirp->dd_size = read(dirp->dd_fd, dirp->dd_buf, 
			    DIRBLKSIZ);
			if (dirp->dd_size <= 0)
				return NULL;
		}
		if (dirp->dd_loc >= dirp->dd_size) {
			dirp->dd_loc = 0;
			continue;
		}
		dp = (struct olddirect *)(dirp->dd_buf + dirp->dd_loc);
		dirp->dd_loc += sizeof(struct olddirect);
		if (dp->od_ino == 0)
			continue;
		dir.d_ino = dp->od_ino;
		strncpy(dir.d_name, dp->od_name, ODIRSIZ);
		dir.d_name[ODIRSIZ] = '\0'; /* ensure null termination */
		dir.d_namlen = strlen(dir.d_name);
		dir.d_reclen = DIRBLKSIZ;
		return (&dir);
	}
}
'EOF'
echo extracting seekdir.c
cat > seekdir.c <<'EOF'
static char sccsid[] = "@(#)seekdir.c 4.9 3/25/83";

#include <sys/param.h>
#include <dir.h>

/*
 * seek to an entry in a directory.
 * Only values returned by "telldir" should be passed to seekdir.
 */
void
seekdir(dirp, loc)
	register DIR *dirp;
	long loc;
{
	long curloc, base, offset;
	struct direct *dp;
	extern long lseek();

	curloc = telldir(dirp);
	if (loc == curloc)
		return;
	base = loc & ~(DIRBLKSIZ - 1);
	offset = loc & (DIRBLKSIZ - 1);
	(void) lseek(dirp->dd_fd, base, 0);
	dirp->dd_loc = 0;
	while (dirp->dd_loc < offset) {
		dp = readdir(dirp);
		if (dp == NULL)
			return;
	}
}
'EOF'
echo extracting telldir.c
cat > telldir.c <<'EOF'
static char sccsid[] = "@(#)telldir.c 4.1 2/21/82";

#include <sys/types.h>
#include <dir.h>

/*
 * return a pointer into a directory
 */
long
telldir(dirp)
	DIR *dirp;
{
	extern long lseek();	/* undeclared, it would default to int */

	return (lseek(dirp->dd_fd, 0L, 1) - dirp->dd_size + dirp->dd_loc);
}
'EOF'
echo extracting directory.3s
cat > directory.3s <<'EOF'
..TH DIRECTORY 3  "OGC Revision  8/02/82"
.TH DIRECTORY 3X 8/1/82
.UC 4.1b Compatibility
.SH NAME
opendir, readdir, telldir, seekdir, rewinddir, closedir \- flexible length directory operations
.SH SYNOPSIS
.B #include <dir.h>
.PP
.SM
.B DIR
.B *opendir(filename)
.br
.B char *filename;
.PP
.SM
.B struct direct
.B *readdir(dirp)
.br
.B DIR *dirp;
.PP
.SM
.B long
.B telldir(dirp)
.br
.B DIR *dirp;
.PP
.SM
.B seekdir(dirp, loc)
.br
.B DIR *dirp;
.br
.B long loc;
.PP
.SM
.B rewinddir(dirp)
.br
.B DIR *dirp;
.PP
.SM
.B closedir(dirp)
.br
.B DIR *dirp;
.PP
.SM
.B cc ... -lndir
.SH DESCRIPTION
The purpose of this library is to simulate
the new flexible length directory names of 4.2bsd Unix
on top of the old directory structure of 4.1bsd.
It allows programs to be converted immediately
to the new directory access interface,
so that they need only be relinked
when 4.2bsd becomes available.
.PP
.I opendir
opens the directory named by
.I filename
and associates a
.I directory stream
with it.
.I opendir
returns a pointer to be used to identify the
.I directory stream
in subsequent operations.
The pointer
.SM
.B NULL
is returned if
.I filename
cannot be accessed or is not a directory.
.PP
.I readdir
returns a pointer to the next directory entry.
It returns
.B NULL
upon reaching the end of the directory or detecting
an invalid
.I seekdir
operation.
.PP
.I telldir
returns the current location associated with the named
.I directory stream.
.PP
.I seekdir
sets the position of the next
.I readdir
operation on the
.I directory stream.
The new position reverts to the one associated with the
.I directory stream
when the
.I telldir
operation was performed.
Values returned by
.I telldir
are good only for the lifetime of the DIR pointer from 
which they are derived.
If the directory is closed and then reopened, 
the 
.I telldir
value may be invalidated
due to undetected directory compaction.
It is safe to use a previous
.I telldir
value immediately after a call to
.I opendir
and before any calls to
.I readdir.
.PP
.I rewinddir
resets the position of the named
.I directory stream
to the beginning of the directory.
.PP
.I closedir
causes the named
.I directory stream
to be closed,
and the structure associated with the DIR pointer to be freed.
.PP
See /usr/include/dir.h for a description of the fields available in
a directory entry.
The preferred way to search the current directory for entry "name" is:
.br
.sp
	len = strlen(name);
.br
	dirp = opendir(".");
.br
	for (dp = readdir(dirp); dp != NULL; dp = readdir(dirp))
.br
		if (dp->d_namlen == len && !strcmp(dp->d_name, name)) {
.br
			closedir(dirp);
.br
			return FOUND;
.br
		}
.br
	closedir(dirp);
.br
	return NOT_FOUND;
.SH LINKING
This library is accessed by specifying "-lndir" as the
last argument to the compile line, e.g.:
.br
.sp
	cc -o prog prog.c -lndir
.SH "SEE ALSO"
/usr/include/dir.h,
open(2),
close(2),
read(2),
lseek(2)
.SH AUTHOR
Kirk McKusick.
Report problems to mckusick@berkeley or ucbvax!mckusick.
'EOF'
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)