tneff@bfmny0.UU.NET (Tom Neff) (02/28/90)
A quick thought -- globbing is too slow right now. If your system has CSH, even though you never use it in your life otherwise, Perl will spawn it to do globbing. If you forcibly turn off #define CSH, Perl then spawns a Bourne shell PIPE! ('echo %s | tr') I can see the 'echo' as a quick&dirty, but a tool as sophisticated as Perl loading 'tr'??? With do_trans() and so forth already coded? Yeccch. At minimum in the second case, Perl should take the output of 'echo' (which is internal to many Bourne shells) and do its own translate. Ideally, Perl should do its own globbing. There is enough globbing code out to there to "borrow" from, and Perl already has 'dirent'. Better yet, replace traditional metachar globbing (or supplement it) with Perl regexp globbing. @foo = <!/usr/tmp/id[0-9]{2,5}.$XVAL!>; Now the programmer has REAL power at his fingertips. (I chose ! for the default match delimiter above because the ordinary / is too likely to conflict with the path spec.) I should get off my duff and code this as a Perl function @foo = &REglob("/usr/tmp/id[0-9]{2,5}.$XVAL"); first for demonstration purposes -- unless Randal wants to take up the challenge. :-)
tchrist@convex.COM (Tom Christiansen) (02/28/90)
Tom Neff writes: >I should get off my duff and code this as a Perl function > > @foo = &REglob("/usr/tmp/id[0-9]{2,5}.$XVAL"); > >first for demonstration purposes -- unless Randal wants to take up >the challenge. :-) Well, this one will work for the "simple" non-recursive case you posted. It's based on the &glob function I posted a few days ago. I don't want to code of the recursive one until patch9. Note that if Larry changes perl's globbing to be internal with full rexprs, there will have to be some kind of support for the old method of "?" and "*" for "." and ".*". I somewhat like the idea of <!expr!> being REglob and <expr> being SHglob, although if we could find some unused punctuation, you could add a new special variable, like ${ that turned on REglob if set. Other possibilities exist. I notice that with csh you can do tilda globbing: ($file) = <~tchrist/.cshrc>; If globbing is internalized, support for tilda would be nice to keep. Here's my glob tester for Tom Neff's example. I admit to embarrassment over using rindex and substr over regexps, but it coded up easier that way. Also, I'd love to use /o on a couple things here, but that would mean only being able to call the function once. Any way to invalidate the /o thing? I would like to do a grep(/$expr/o, @args) where the $expr is only compiled once per new grep but not at every internal iteration of grep, and I don't think it's currently feasible. That way you could do: for $expr (@list) { for $item (@other_list) { &foo if $item =~ /$expr/o; } } and have the $expr compiled only once per iteration of the outer loop. You might also want not to always autmatically exclude files starting with dot, or not to always sort the entries. There's a lot of room for discussion here. --tom #!/usr/bin/perl $debug = 1; for ( print "glob> "; <STDIN>; print "glob> ") { chop; @list = &REglob($_); for ($i = 0; $i <= $#list; $i++) { print "$i $list[$i]\n"; } } sub REglob { local($path) = @_; local(@retlist) = (); local($root,$expr,$pos); local($fakedot) = 0; local(*METADIR); if (($pos = rindex($path, '/')) >= $[) { $root = substr($path, $[, $pos); $expr = substr($path, $pos+1, 999); } else { $fakedot = 1; $root = '.'; $expr = $path; } print "REglob: root is $root, expr is $expr\n" if $debug; unless (opendir(METADIR, $root)) { warn "glob: can't opendir \"$root\": $!\n"; } else { @retlist = sort grep(/^$expr$/, grep(!/^\./, readdir(METADIR))); closedir METADIR; } unless ($fakedot) { for (@retlist) { s!^!$root/!; } } return @retlist; } -- Tom Christiansen {uunet,uiucdcs,sun}!convex!tchrist Convex Computer Corporation tchrist@convex.COM "EMACS belongs in <sys/errno.h>: Editor too big!"
merlyn@iwarp.intel.com (Randal Schwartz) (02/28/90)
In article <15209@bfmny0.UU.NET>, tneff@bfmny0 (Tom Neff) writes: | I should get off my duff and code this as a Perl function | | @foo = &REglob("/usr/tmp/id[0-9]{2,5}.$XVAL"); | | first for demonstration purposes -- unless Randal wants to take up | the challenge. :-) Nope. I write only one liners, unless it's for my .sig. :-) I agree though. I mean, a few more steps, and we can get rid of the shell entirely! Does anyone remember UNIX V6 when globbing was done by a separate process, /etc/glob? Dennis Ritchie (research!dmr back then) told me in private mail (OK, so I'm name-dropping :-) that /etc/glob was the *first* program coded in C for UNIX. Maybe we can just rip all the useless parts of csh out, and rename it /usr/lib/perl/glob? :-) @a=<;echo Just another Perl hacker,>; print join(" ",@a); (How's *that* for esoteric and undocumented....) -- /=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\ | on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn | \=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/
tchrist@convex.COM (Tom Christiansen) (03/01/90)
Here's a recursive glob package complete with SHglob and REglob. It also handles ~foo constructions. The only problem is it dumps core on most interesting recursive uses. Hopefully Larry's recursion fix will make these problems go away. If he can make it work for "/usr/m*/man?" (for the SHglob) without dumping core, I'll be happy. There's a testing program for playing with the package. --tom #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create: # testglob # glob.pl # This archive created: Wed Feb 28 13:42:22 1990 export PATH; PATH=/bin:/usr/bin:$PATH echo shar: "extracting 'testglob'" '(686 characters)' if test -f 'testglob' then echo shar: "will not over-write existing file 'testglob'" else sed 's/^ X//' << \SHAR_EOF > 'testglob' X#!/usr/bin/perl X Xsub source { X local($file) = @_; X local($return) = 0; X X $return = do $file; X die "couldn't do \"$file\": $!" unless defined $return; X die "couldn't parse \"$file\": $@" if $@; X die "couldn't run \"$file\"" unless $return; X} X X&source('glob.pl'); X X$glob'debug = 1; X XSTYLE: { X print "Glob style (either RE or SH)? "; X $style = <STDIN>; X redo STYLE unless $style =~ /^\s*(re|sh)/i; X $glob = ($style =~ /^\s*re/i) ? 'REglob' : 'SHglob'; X} X X Xfor ( print "glob> "; <STDIN>; print "glob> ") { X chop; X @list = &$glob($_); X printf "Globbed %d entries\n", $#list+1; X for ($i = 0; $i <= $#list; $i++) { X print "$i $list[$i]\n"; X } X} X SHAR_EOF if test 686 -ne "`wc -c < 'testglob'`" then echo shar: "error transmitting 'testglob'" '(should have been 686 characters)' fi chmod 775 'testglob' fi echo shar: "extracting 'glob.pl'" '(2267 characters)' if test -f 'glob.pl' then echo shar: "will not over-write existing file 'glob.pl'" else sed 's/^ X//' << \SHAR_EOF > 'glob.pl' Xpackage glob; X Xsub main'SHglob { X local($expr) = @_; X X printf "SHglob: globbing $expr\n" if $debug; X X $expr =~ s/([^\\]?)([.{+\\])/$1\\$2/g; X $expr =~ s/\*/.*/g; X $expr =~ s/\?/./g; X X return &main'REglob($expr); X} X Xsub main'REglob { X local($path) = @_; X local($_); X local(@retlist) = (); X local($root,$expr,$pos); X local($relative) = 0; X local(@dirs); X local($user); X X $haveglobbed = 0; X X @dirs = split(/\/+/, $path); X X if ($dirs[0] =~ m!~(.*)!) { X $dirs[0] = &homedir($1); X return @retlist unless $dirs[0]; X } elsif ($dirs[0] eq '') { X $dirs[0] = '/' unless $dirs[0] =~ m!^.{1,2}$!; X } else { X unless ($dirs[0] =~ m!^.{1,2}$!) { X unshift(@dirs, '.'); X $relative = 1; X } X } X X printf "REglob: globbing %s\n", join('/',@dirs) if $debug; X X @retlist = &expand(@dirs); X X for (@retlist) { X if ($relative) { X s!^\./!!o; X } X s!/{2,}!/!g; X } X X return @retlist; X} X Xsub expand { X local($dir, $thisdir, @rest) = @_; X local($nextdir); X local($_); X local(@retlist) = (); X local(*DIR); X X unless ($haveglobbed || $thisdir =~ /([^\\]?)[?.*{[+\\]/ && $1 ne '\\') { X @retlist = ($thisdir); X } else { X unless (opendir(DIR,$dir)) { X warn "glob: can't opendir $dir: $!\n" if $debug; X } else { X @retlist = grep(/^$thisdir$/,readdir(DIR)); X @retlist = grep(!/^\./, @retlist) unless $thisdir =~ /^\\\./; X @retlist = sort @retlist; X $haveglobbed++; X } X closedir DIR; X } X X for (@retlist) { X $_ = $dir . '/' . $_; X } X X if ($nextdir = shift @rest) { X local(@newlist) = (); X for (@retlist) { X push(@newlist,&expand($_,$nextdir,@rest)); X } X @retlist = @newlist; X } X X return @retlist; X} X Xsub homedir { X local($user) = @_; X local(@pwent); X # global %homedir X X if (!$user) { X return $ENV{'HOME'} if $ENV{'HOME'}; X ($user = $ENV{'USER'}) || X ($user = getlogin) || X (($user) = getpwnam($>)); X warn "glob'homedir: who are you, user #$>?" unless $user; X return '/'; X } X unless (defined $homedir{$user}) { X if (@pwent = getpwnam($user)) { X $homedir{$user} = $pwent[$#pwent - 1]; X } else { X warn "glob'homedir: who are you, user #$>?" unless $user; X $homedir{$user} = '/'; X } X } X return $homedir{$user}; X} X X1; SHAR_EOF if test 2267 -ne "`wc -c < 'glob.pl'`" then echo shar: "error transmitting 'glob.pl'" '(should have been 2267 characters)' fi chmod 664 'glob.pl' fi exit 0 # End of shell archive
ndjc@hobbit.UUCP (Nick Crossley) (03/01/90)
Various people have suggested that perl do its own globbing, rather than invoke a subshell to do it. It might be noted that sufficiently recent versions of System V (3.something onwards?) have a libgen.a, and in that is the function gmatch, stolen from sh. This could be yet another thing for Configure to check. -- <<< standard disclaimers >>> Nick Crossley, ICL NA, 9801 Muirlands, Irvine, CA 92718-2521, USA 714-458-7282 uunet!ccicpg!ndjc / ndjc@ccicpg.UUCP
piet@cs.ruu.nl (Piet van Oostrum) (03/01/90)
In article <15209@bfmny0.UU.NET>, tneff@bfmny0 (Tom Neff) writes:
`A quick thought -- globbing is too slow right now. If your system has
`CSH, even though you never use it in your life otherwise, Perl will
`spawn it to do globbing. If you forcibly turn off #define CSH, Perl
`then spawns a Bourne shell PIPE! ('echo %s | tr') I can see the
`'echo' as a quick&dirty, but a tool as sophisticated as Perl loading
`'tr'??? With do_trans() and so forth already coded? Yeccch.
`
`At minimum in the second case, Perl should take the output of 'echo'
`(which is internal to many Bourne shells) and do its own translate.
`
`Ideally, Perl should do its own globbing. There is enough globbing code
`out to there to "borrow" from, and Perl already has 'dirent'.
`
Actually, I rewrote the globbing in perl (I mean, I put the code in perl).
I borrowed the globbing code from GNU make.
I have sent the code to Larry, but I never heard anything about it.
Maybe he is afraid to use it :=)
--
Piet* van Oostrum, Dept of Computer Science, Utrecht University,
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31-30-531806 Uucp: uunet!mcsun!hp4nl!ruuinf!piet
Telefax: +31-30-513791 Internet: piet@cs.ruu.nl (*`Pete')
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (03/02/90)
In article <100306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes: : Also, I'd love to use /o on a couple things here, but that would : mean only being able to call the function once. Any way to invalidate the : /o thing? I would like to do a grep(/$expr/o, @args) where the $expr is : only compiled once per new grep but not at every internal iteration of : grep, and I don't think it's currently feasible. That way you could do: : : for $expr (@list) { : for $item (@other_list) { : &foo if $item =~ /$expr/o; : } : } : : and have the $expr compiled only once per iteration of the outer loop. You can do it like this: for $expr (@list) { eval "#$expr" . ' for $item (@other_list) { &foo if $item =~ /$expr/o; } '; } The purpose of the #$expr is to force recompilation of the otherwise identical (and hence not recompiled) string. This may or may not save you time, depending on how long @other_list is. There is considerable overhead in recompiling each time through the outer loop. I could conceivable reset //o at the same time as I reset ?? searches. Larry