[comp.sys.amiga.tech] What's my tree?

mwm@mica.berkeley.edu (Mike (I'll think of something yet) Meyer) (07/24/89)
Now that we've all had fun with finding out our names, I can ask the
next "this is harder than it looks" question.

I've got a "generalized" file tree walking routine. I think the
interface is reasonable (at least, it'll do everything I want it to
do), but I'd like to hear from others who might find this usefull.
Better yet, I'd like for some such routine to appear as part of 1.4,
so I don't have to fix mine to deal with links...

Anyway, the interface is like so:

treewalk(rootlock, userfunc, flags) ;
BPTR rootlock ;
int (*userfunc)(dirlock, fib) ;
int flags ;

BPT dirlock ;
struct FileInfoBlock *fib ;

treewalk returns TRUE if everything went fine, and FALSE if there was
an error of some kind.

rootlock is a lock on the directory that is the root of the tree. That
not being a valid lock, or not being a directory, is considered an
error.

flags is currently only used for two things: specifying whether the
directory tree (note: directory tree, not the file tree) is to be
walked in preorder, postorder, or both. The flags are TREE_PRE,
TREE_POST and TREE_BOTH (== TREE_PRE | TREE_POST). If TREE_BOTH is
specified every directory is visited twice, leave directories one
with one visit right after the other. Unfortunatly, asking for
postorder means directories get scanned twice.

userfunc is called to let the user decide what to do with directories
& files. When a directory is visited, userfunc is first called with
fib == NULL and dirlock being a lock on that directory. After that,
it's called with the same lock, and fib being the FileInfoBlock for
each file in that directory. Userfunc returns either TREE_STOP, to
stop the tree walk at that point, or TREE_CONT, to continue it.

The design goals were 1) robustness: treewalk's stack use isn't
dependent on how deep the tree is; and if allowed to exit, it frees
all resources it uses. Ditto for errors. 2) Speed: initial tests show
it about as fast as a PD find, even though the find is optimized, and
treewalk included all the debugging code. 3) Generality: I designed it
so that every tree-walking application I could think of could use
treewalk to do the work. 4) Cost in the user code: Many applications
don't need full paths, so I decided not to track them.  Passing the
lock to userfunc allows it to get it if it needs it.  Likewise, all
files are passed to userfunc, instead of testing to see if it wants
them inside of treewalk.

Alternatives considered and rejected: 1) specify whether usrfunc
should return the directory (fib == 0) or files or both (or neither) I
decided that this was of low utility, and those routines that wanted
to could verify that condition and return immediately. 2) Making
files, as well as the tree, come out postorder.  That seemed to be of
marginal utility, and it wasn't clear how to do it. 3) Indicating
whether a directory was being visitied pre or post order. Same
comments.

That's the function interface. I'd like to know of any appcliation
that can't be recoded to fit. Please describe the application, as you
can do quite a bit with statics in userfunc. The obvious example (a
recurslive list) is:

Preorder traversal, with the following userfunc:

rlist(BPTR dirlock, struct FileInfoBlock *fib) {
	static char pathname[FMSIZE] ;

	if (fib == NULL) {
		if (getpath(dirlock, pathname)) return TREE_STOP ;
		strcat(pathname, strchr(pathname, ':') == NULL ? ":" : "/") ;
		puts(pathname) ;
		return TREE_CONT ;
		}
	printf("%s%s\n", pathname, fib->fib_FileName) ;
	return TREE_CONT ;
	}


Last thing: I'm trying to build a better-than-find tool to put this at
the CLI level. Base plan to date is:

treewalk <flags> [ { filter-commands } ] [shell command]

flags are: -+ - postorder traversal (pre is the default; post requires
		directories to be scanned twice).

	   -- - preorder, only useful as "-+-" to get pre as well as
	   	post order.
			
	   -f "shell command" only takes one argument

	   -d <dir> start at directory <dir>.

{ filter-commands } (the "{}" are included) consists of a find-like
expression (only cleaner) to decide if this file is going to be
included or not. If I can figure out how, I'm going to allow Rexx
scripts to be used as filters - which will also allow commands to be
run on the files via "address command".

Finally, any files that get through the filter will be fed as
arguments to the "shell command". Filenames will be tacked on until there
is no more room on the command line, unless the "-f" flag is given, in
which case there will be one invocation of the command per file that
passes the filter. If no shell command is given, the file names will
be listed to stdout, one file per line, using the absolute path.

Comments on this would also be appreciated.

	Thanx,
	<mike
--
I know the world is flat.				Mike Meyer
Don't try tell me that it's round.			mwm@berkeley.edu
I know the world stands still.				ucbvax!mwm
Don't try to make it turn around.			mwm@ucbjade.BITNET