[comp.lang.perl] Globbing

tneff@bfmny0.UU.NET (Tom Neff) (02/28/90)

A quick thought -- globbing is too slow right now.  If your system has
CSH, even though you never use it in your life otherwise, Perl will
spawn it to do globbing.  If you forcibly turn off #define CSH, Perl
then spawns a Bourne shell PIPE!  ('echo %s | tr')  I can see the
'echo' as a quick&dirty, but a tool as sophisticated as Perl loading
'tr'???  With do_trans() and so forth already coded?  Yeccch.

At minimum in the second case, Perl should take the output of 'echo'
(which is internal to many Bourne shells) and do its own translate.

Ideally, Perl should do its own globbing.  There is enough globbing code
out to there to "borrow" from, and Perl already has 'dirent'.

Better yet, replace traditional metachar globbing (or supplement it)
with Perl regexp globbing.

	@foo = <!/usr/tmp/id[0-9]{2,5}.$XVAL!>;

Now the programmer has REAL power at his fingertips.  (I chose ! for the
default match delimiter above because the ordinary / is too likely to
conflict with the path spec.)

I should get off my duff and code this as a Perl function

	@foo = &REglob("/usr/tmp/id[0-9]{2,5}.$XVAL");

first for demonstration purposes -- unless Randal wants to take up
the challenge.  :-)

tchrist@convex.COM (Tom Christiansen) (02/28/90)

Tom Neff writes:

>I should get off my duff and code this as a Perl function
>
>	@foo = &REglob("/usr/tmp/id[0-9]{2,5}.$XVAL");
>
>first for demonstration purposes -- unless Randal wants to take up
>the challenge.  :-)

Well, this one will work for the "simple" non-recursive case you posted.
It's based on the &glob function I posted a few days ago.

I don't want to code of the recursive one until patch9.  Note that if
Larry changes perl's globbing to be internal with full rexprs, there will have to be
some kind of support for the old method of "?" and "*" for "." and ".*".
I somewhat like the idea of <!expr!> being REglob and <expr> being SHglob, although
if we could find some unused punctuation, you could add a new special 
variable, like ${ that turned on REglob if set.  Other possibilities
exist.

I notice that with csh you can do tilda globbing:

	($file) = <~tchrist/.cshrc>;

If globbing is internalized, support for tilda would be nice to keep.

Here's my glob tester for Tom Neff's example.   I admit to embarrassment
over using rindex and substr over regexps, but it coded up easier that
way.  

Also, I'd love to use /o on a couple things here, but that would
mean only being able to call the function once.  Any way to invalidate the
/o thing?  I would like to do a grep(/$expr/o, @args) where the $expr is
only compiled once per new grep but not at every internal iteration of
grep, and I don't think it's currently feasible.  That way you could do:

	for $expr (@list) {
	    for $item (@other_list) {
		&foo if $item =~ /$expr/o;
	    }
	}

and have the $expr compiled only once per iteration of the outer loop.

You might also want not to always autmatically exclude files starting with 
dot, or not to always sort the entries.  There's a lot of room for 
discussion here.

--tom


#!/usr/bin/perl

$debug = 1;

for ( print "glob> "; <STDIN>; print "glob> ") {
    chop;
    @list = &REglob($_);
    for ($i = 0; $i <= $#list; $i++) {
	print "$i $list[$i]\n";
    } 
} 


sub REglob {
    local($path) = @_;
    local(@retlist) = ();
    local($root,$expr,$pos);
    local($fakedot) = 0;
    local(*METADIR);		

    if (($pos = rindex($path, '/')) >= $[) {
	$root = substr($path, $[, $pos);
	$expr = substr($path, $pos+1, 999);
    } else {
	$fakedot = 1;
	$root = '.';
	$expr = $path;
    } 
    print "REglob: root is $root, expr is $expr\n" if $debug;

    unless (opendir(METADIR, $root)) {
	warn "glob: can't opendir \"$root\": $!\n";
    } else {
	@retlist = sort grep(/^$expr$/, grep(!/^\./, readdir(METADIR)));
	closedir METADIR;
    }
    unless ($fakedot) { 
	for (@retlist) { 
	    s!^!$root/!; 
	} 
    }
    return @retlist;
}
--

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

merlyn@iwarp.intel.com (Randal Schwartz) (02/28/90)

In article <15209@bfmny0.UU.NET>, tneff@bfmny0 (Tom Neff) writes:
| I should get off my duff and code this as a Perl function
| 
| 	@foo = &REglob("/usr/tmp/id[0-9]{2,5}.$XVAL");
| 
| first for demonstration purposes -- unless Randal wants to take up
| the challenge.  :-)

Nope.  I write only one liners, unless it's for my .sig. :-)

I agree though.  I mean, a few more steps, and we can get rid of the
shell entirely!

Does anyone remember UNIX V6 when globbing was done by a separate
process, /etc/glob?  Dennis Ritchie (research!dmr back then) told me
in private mail (OK, so I'm name-dropping :-) that /etc/glob was the
*first* program coded in C for UNIX.

Maybe we can just rip all the useless parts of csh out, and rename it
/usr/lib/perl/glob? :-)

@a=<;echo Just another Perl hacker,>; print join(" ",@a);
(How's *that* for esoteric and undocumented....)
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

tchrist@convex.COM (Tom Christiansen) (03/01/90)

Here's a recursive glob package complete with SHglob and REglob.
It also handles ~foo constructions.  The only problem is it 
dumps core on most interesting recursive uses.  Hopefully Larry's
recursion fix will make these problems go away.  If he can make 
it work for "/usr/m*/man?" (for the SHglob) without dumping core,
I'll be happy.  There's a testing program for playing with the
package.

--tom

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	testglob
#	glob.pl
# This archive created: Wed Feb 28 13:42:22 1990
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'testglob'" '(686 characters)'
if test -f 'testglob'
then
	echo shar: "will not over-write existing file 'testglob'"
else
sed 's/^	X//' << \SHAR_EOF > 'testglob'
	X#!/usr/bin/perl
	X
	Xsub source {
	X    local($file) = @_;
	X    local($return) = 0;
	X
	X    $return = do $file;
	X    die "couldn't do \"$file\": $!" unless defined $return;
	X    die "couldn't parse \"$file\": $@" if $@;
	X    die "couldn't run \"$file\"" unless $return;
	X}
	X
	X&source('glob.pl');
	X
	X$glob'debug = 1;
	X
	XSTYLE: {
	X    print "Glob style (either RE or SH)? ";
	X    $style = <STDIN>;
	X    redo STYLE unless $style =~ /^\s*(re|sh)/i;
	X    $glob = ($style =~ /^\s*re/i) ? 'REglob' : 'SHglob';
	X}
	X
	X
	Xfor ( print "glob> "; <STDIN>; print "glob> ") {
	X    chop;
	X    @list = &$glob($_);
	X    printf "Globbed %d entries\n", $#list+1;
	X    for ($i = 0; $i <= $#list; $i++) {
	X	print "$i $list[$i]\n";
	X    } 
	X} 
	X
SHAR_EOF
if test 686 -ne "`wc -c < 'testglob'`"
then
	echo shar: "error transmitting 'testglob'" '(should have been 686 characters)'
fi
chmod 775 'testglob'
fi
echo shar: "extracting 'glob.pl'" '(2267 characters)'
if test -f 'glob.pl'
then
	echo shar: "will not over-write existing file 'glob.pl'"
else
sed 's/^	X//' << \SHAR_EOF > 'glob.pl'
	Xpackage glob;
	X
	Xsub main'SHglob {
	X    local($expr) = @_;
	X
	X    printf "SHglob: globbing $expr\n" if $debug;
	X
	X    $expr =~ s/([^\\]?)([.{+\\])/$1\\$2/g;
	X    $expr =~ s/\*/.*/g;
	X    $expr =~ s/\?/./g;
	X
	X    return &main'REglob($expr);
	X} 
	X
	Xsub main'REglob {
	X    local($path) = @_;
	X    local($_);
	X    local(@retlist) = ();
	X    local($root,$expr,$pos);
	X    local($relative) = 0;
	X    local(@dirs);
	X    local($user);
	X
	X    $haveglobbed = 0;
	X
	X    @dirs = split(/\/+/, $path);
	X
	X    if ($dirs[0] =~ m!~(.*)!) {
	X	$dirs[0] = &homedir($1);
	X	return @retlist unless $dirs[0];
	X    } elsif ($dirs[0] eq '') {
	X	$dirs[0] = '/' unless $dirs[0] =~ m!^.{1,2}$!;
	X    } else {
	X	unless ($dirs[0] =~ m!^.{1,2}$!) {
	X	    unshift(@dirs, '.');
	X	    $relative = 1;
	X	}
	X    } 
	X
	X    printf "REglob: globbing %s\n", join('/',@dirs) if $debug;
	X
	X    @retlist = &expand(@dirs);
	X
	X    for (@retlist) {
	X	if ($relative) {
	X	    s!^\./!!o;
	X	}
	X	s!/{2,}!/!g;
	X    } 
	X
	X    return @retlist;
	X}
	X
	Xsub expand {
	X    local($dir, $thisdir, @rest) = @_;
	X    local($nextdir);
	X    local($_);
	X    local(@retlist) = ();
	X    local(*DIR);
	X
	X    unless ($haveglobbed || $thisdir =~ /([^\\]?)[?.*{[+\\]/ && $1 ne '\\') {
	X	@retlist = ($thisdir);
	X    } else {
	X	unless (opendir(DIR,$dir)) {
	X	    warn "glob: can't opendir $dir: $!\n" if $debug;
	X	} else {
	X		@retlist = grep(/^$thisdir$/,readdir(DIR));
	X		@retlist = grep(!/^\./, @retlist) unless $thisdir =~ /^\\\./;
	X		@retlist = sort @retlist;
	X		$haveglobbed++;
	X	} 
	X	closedir DIR;
	X    } 
	X
	X    for (@retlist) {
	X	$_ = $dir . '/' . $_;
	X    }
	X
	X    if ($nextdir = shift @rest) {
	X	local(@newlist) = ();
	X	for (@retlist) {
	X	    push(@newlist,&expand($_,$nextdir,@rest));
	X	} 
	X	@retlist = @newlist;
	X    } 
	X
	X    return @retlist;
	X} 
	X
	Xsub homedir {
	X    local($user) = @_;
	X    local(@pwent);
	X    # global %homedir
	X
	X    if (!$user) {
	X	return $ENV{'HOME'} 		if $ENV{'HOME'};
	X	($user = $ENV{'USER'})  	|| 
	X	    ($user = getlogin) 		|| 
	X	    (($user) = getpwnam($>));
	X	warn "glob'homedir: who are you, user #$>?" unless $user;
	X	return '/';
	X    } 
	X    unless (defined $homedir{$user}) {
	X	if (@pwent = getpwnam($user)) {
	X	    $homedir{$user} = $pwent[$#pwent - 1];
	X	} else {
	X	    warn "glob'homedir: who are you, user #$>?" unless $user;
	X	    $homedir{$user} = '/';
	X	}
	X    }
	X    return $homedir{$user};
	X} 
	X
	X1;
SHAR_EOF
if test 2267 -ne "`wc -c < 'glob.pl'`"
then
	echo shar: "error transmitting 'glob.pl'" '(should have been 2267 characters)'
fi
chmod 664 'glob.pl'
fi
exit 0
#	End of shell archive

ndjc@hobbit.UUCP (Nick Crossley) (03/01/90)

Various people have suggested that perl do its own globbing, rather than
invoke a subshell to do it.  It might be noted that sufficiently recent
versions of System V (3.something onwards?) have a libgen.a, and in that
is the function gmatch, stolen from sh.  This could be yet another thing
for Configure to check.
-- 

<<< standard disclaimers >>>
Nick Crossley, ICL NA, 9801 Muirlands, Irvine, CA 92718-2521, USA 714-458-7282
uunet!ccicpg!ndjc  /  ndjc@ccicpg.UUCP

piet@cs.ruu.nl (Piet van Oostrum) (03/01/90)

In article <15209@bfmny0.UU.NET>, tneff@bfmny0 (Tom Neff) writes:
 `A quick thought -- globbing is too slow right now.  If your system has
 `CSH, even though you never use it in your life otherwise, Perl will
 `spawn it to do globbing.  If you forcibly turn off #define CSH, Perl
 `then spawns a Bourne shell PIPE!  ('echo %s | tr')  I can see the
 `'echo' as a quick&dirty, but a tool as sophisticated as Perl loading
 `'tr'???  With do_trans() and so forth already coded?  Yeccch.
 `
 `At minimum in the second case, Perl should take the output of 'echo'
 `(which is internal to many Bourne shells) and do its own translate.
 `
 `Ideally, Perl should do its own globbing.  There is enough globbing code
 `out to there to "borrow" from, and Perl already has 'dirent'.
 `

Actually, I rewrote the globbing in perl (I mean, I put the code in perl).
I borrowed the globbing code from GNU make.
I have sent the code to Larry, but I never heard anything about it.
Maybe he is afraid to use it :=)
-- 
Piet* van Oostrum, Dept of Computer Science, Utrecht University,
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31-30-531806   Uucp:   uunet!mcsun!hp4nl!ruuinf!piet
Telefax:   +31-30-513791   Internet:  piet@cs.ruu.nl   (*`Pete')

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (03/02/90)

In article <100306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
: Also, I'd love to use /o on a couple things here, but that would
: mean only being able to call the function once.  Any way to invalidate the
: /o thing?  I would like to do a grep(/$expr/o, @args) where the $expr is
: only compiled once per new grep but not at every internal iteration of
: grep, and I don't think it's currently feasible.  That way you could do:
: 
: 	for $expr (@list) {
: 	    for $item (@other_list) {
: 		&foo if $item =~ /$expr/o;
: 	    }
: 	}
: 
: and have the $expr compiled only once per iteration of the outer loop.

You can do it like this:

	for $expr (@list) {
	    eval "#$expr" . '
		for $item (@other_list) {
		    &foo if $item =~ /$expr/o;
		}
	    ';
	}

The purpose of the #$expr is to force recompilation of the otherwise
identical (and hence not recompiled) string.

This may or may not save you time, depending on how long @other_list is.
There is considerable overhead in recompiling each time through the
outer loop.

I could conceivable reset //o at the same time as I reset ?? searches.

Larry