[comp.lang.perl] bug in R.E. ?

brossard@sic.epfl.ch (Alain Brossard EPFL-SIC/SII) (05/03/91)

	While writing a perl script, I found that one of my regular
expressions didn't work, and I believe this is due to a bug in perl 4.003.

sasun1[15]$ perl
$pat = ' fwef ';
print $pat =~ /\s*([\S]+)/;	# doesn't work
sasun1[16]$  perl
$pat = ' fwef ';
print $pat =~ /\s*(\S+)/;	# does work
fwefsasun1[17]$ perl
$pat = ' fwef ';
print $pat =~ /\s*([f]+)/;	# works for [f]
fsasun1[18]$  perl
$pat = ' fwef ';
print $pat =~ /\s*(f+)/;	# yep, [f] and f are equivalent
fsasun1[19]$

	So, shouldn't [\S]+ be equivalent to \S+, or [\S] to \S?

	Another bug I have tickled causes my perl script to dump
core after printing a lot (>>100) of lines with an error (?) message:

Word too long.
Word too long.
Word too long.

/sic/news/spool: write failed, file system is full  (#core will be incomplete)

[7]    Segmentation fault    news_du

news%sicsun[338]$ cd ../spool
news%sicsun[339]$ dbx /sic/public/bin/perl
warning: cannot read pcb in core file:  registers' values may be wrong
Reading symbolic information...
Read 19931 symbols
warning: core file read error: address not in data space
warning: core file read error: address not in data space
warning: core file read error: address not in data space
program terminated by signal SEGV (no mapping at the fault address)
(dbx) where
warning: core file read error: address not in data space
warning: core file read error: address not in data space
warning: core file read error: address not in data space
safemalloc(size = 791621423), line 2526 in "install_public/src/sun4-4.1.1/langages/perl-4.003/util.c"
do_subr(arg = 0x2f2f2f2f, gimme = warning: core file read error: address not in data space
bad data address

   I have repeated this "experiment" a few times trying to get a proper
core, but between the file system filling up (core > 36 MBytes) and
cores made unusable by a missing -g or by dynamic linking, this is the
best I have come up with.  The program worked on subsets of news/spool,
but the whole tree makes it croak.

   This is on a sun4 running SunOS 4.1, with perl compiled with -g,
-O, or -O4, with and without dynamic linking.  I'm including the perl
program below in the hope that the problem can be reproduced elsewhere
(any suggestions on how to improve it will be appreciated):

#!/usr/bin/perl

$spool = '/sic/news/spool';
$data = '/sic/news/lib/groups_size';
$rec_size = 9000;       # if spool directory > nn blocks, go down recursively
$min_size = 200;        # don't report spool directory if size <= nn
$diff_size = 1000;      # warn if the change is > nn blocks
$percent_change = 10;   # warn if the change is > nn percent

# Directories which are too big, they should not be scanned directly
@bigdir = ( 'alt', 'comp', 'rec', 'comp/sys' );

chdir( $spool ) || die "Couldn't chdir to $spool: $!";

if( open( FILE, "<$data" ) ) {
    while( <FILE> ) {
        chop;
        ($group, $size) = split( ' ', $_ );
        $groups{$group} = $size;
    }
    close FILE;
}

&scan( "" );		# Never exits from scan, core dumps first!

print "After scan\n";
print "End\n";

sub scan {
    local($DIR, $dir ) = @_;	# Forgot to local(@du), could this be it?

    @dirs = <$DIR[a-z]*>;
    $dirs = join( ' ', @dirs );
    foreach $dir ( @bigdir ) {
        if ( $dirs =~ /\b$dir\b/ )  {
            push( @scan, $dir );
            $dirs =~ s#\b$dir\b##;
        }
    }
    foreach $dir ( @scan ) {
        &scan( "$DIR$dir/" );
    }
    open( DU, "du -s $dirs|" ) || die "Couldn't exec du: $!\n";
    while( <DU> ) {
        chop;
        ($size, $group) = split( ' ', $_ );
        if( ! ($old_size = $groups{$group}) ) { # new group
            $new_groups{$group} = $size if $size > $min_size ;
            $groups{$group} = $size if $size > $min_size ;
        } else {
            $diff = $size - $old_size;
            $percent = (100 * $diff)/$old_size;
            if( $percent > $percent_change || $diff > $diff_size ) {
                printf ( "%25s %6d %3d%% increase: %d\n",
                        $group, $size, $percent, $diff);
            }
            $groups{$group} = $size;
        }
        if ( $size > $rec_size ) { &scan( "$group/" ); }
    }
}


-- 

Alain Brossard, Ecole Polytechnique Federale de Lausanne,
	SIC/SII, EL-Ecublens, CH-1015 Lausanne, Suisse
brossard@sasun1.epfl.ch

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/07/91)

In article <1991May3.175219@sic.epfl.ch> brossard@sasun1.epfl.ch writes:
: 	So, shouldn't [\S]+ be equivalent to \S+, or [\S] to \S?

You won't find any documentation that says it works.  I only implemented
it for the lower case versions--it seemed too easy to say [^\s]+.
I suppose I could be argued out of this...
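
For illustration, here is a minimal sketch of the [^\s]+ spelling mentioned
above, run against the string from the original post.  The variable names
$plain and $negated are made up for this example.

```perl
# Sketch: compare \S+ with the negated class [^\s]+ on the poster's string.
$pat = ' fwef ';

($plain)   = $pat =~ /\s*(\S+)/;      # upper-case escape, outside a class
($negated) = $pat =~ /\s*([^\s]+)/;   # negated lower-case escape, inside a class

print "\\S+    captured: '$plain'\n";
print "[^\\s]+ captured: '$negated'\n";
print "both spellings captured the same text\n" if $plain eq $negated;
```

The negated class only uses the lower case escape, so it stays within what
is described above as implemented.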

: 	Another bug I have tickled causes my perl script to dump
: core after printing a lot (>>100) of lines with an error (?) message:
: 
: Word too long.
: Word too long.
: Word too long.

This is a message from your shell, not from perl.  Probably because you said:

:     @dirs = <$DIR[a-z]*>;

This makes use of the shell to do globbing, which has its advantages and
its disadvantages.  You've just discovered the primary disadvantage--shells
have arbitrary limits.

When writing a program like this, it's better to use opendir() and readdir().
It won't run into the limits of the shell, and it runs faster too.
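
As a rough sketch of that suggestion (not the poster's code): the helper
below, hypothetically named list_lower, returns roughly what
<$DIR[a-z]*> asks the shell for, but reads the directory itself.  It
assumes the same convention as the script, where the prefix is either ''
or a path ending in '/'.

```perl
# Sketch: list entries under $prefix whose names start with a lower-case
# letter, without starting a shell.  list_lower is a hypothetical helper.
sub list_lower {
    local($prefix) = @_;        # '' for the current directory, else 'alt/' etc.
    local($where, @names, $name);
    $where = ($prefix eq '') ? '.' : $prefix;
    opendir(D, $where) || die "Couldn't opendir $where: $!\n";
    foreach $name (sort readdir(D)) {
        # readdir returns bare names, so put the prefix back on
        push(@names, "$prefix$name") if $name =~ /^[a-z]/;
    }
    closedir(D);
    @names;
}

@dirs = &list_lower('');
print join(' ', @dirs), "\n";
```

Since the names never pass through a shell command line, the length of
the list is limited only by memory.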

Larry

bbs@hankel.rutgers.edu (Barry Schwartz) (05/12/91)

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) writes:

]This is a message from your shell, not from perl.  Probably because you said:
]
]:     @dirs = <$DIR[a-z]*>;
]
]This makes use of the shell to do globbing, which has its advantages and
]its disadvantages.  You've just discovered the primary disadvantage--shells
]have arbitrary limits.
]
]When writing a program like this, it's better to use opendir() and readdir().
]It won't run into the limits of the shell, and it runs faster too.

I just want to make a pitch for readdir().  At first it would
seem easier to use shell globbing, but using readdir is easy
once you start using it, _plus it gives you the power of Perl
regular expressions as your globbing mechanism_.  That's saved
me trouble on at least one occasion.
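
A small sketch of what that can look like (match_entries is a made-up
helper name, and the pattern is only an example): selecting entries whose
names consist entirely of digits, which a glob such as [0-9]* cannot
express because it also accepts names like "12abc".

```perl
# Sketch: pick directory entries with a Perl pattern instead of a shell glob.
# match_entries is a hypothetical helper, not something from this thread.
sub match_entries {
    local($dir, $pattern) = @_;
    local(@found);
    opendir(D, $dir) || die "Couldn't opendir $dir: $!\n";
    @found = sort grep(/$pattern/, readdir(D));
    closedir(D);
    @found;
}

# Names made only of digits, sorted as strings.
@numbered = &match_entries('.', '^\d+$');
print "@numbered\n";
```

The pattern is passed as a string, so the same helper can serve different
filters without changes.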


-- 
Barry Schwartz       bbs@hankel.rutgers.edu    trashman@kb2ear.uucp