[comp.lang.perl] perl memory usage?

les@chinet.chi.il.us (Leslie Mikesell) (03/20/90)

Does perl release memory previously used by an associative array
when you do %array = (); ?  I have a script that gathers data
into a global array, then calls a subroutine that loads a similar
local array with existing data from a file, then sorts the keys and
merges the items back to the file.  At the end of the subroutine
both arrays are set to ().  After many passes through this, the program
slows to a crawl - should I be using undef %array or something else
to release the data storage and hash table?  The keys for the array
happen to be unique and thus not re-used by subsequent passes in case
that matters.

Les Mikesell
  les@chinet.chi.il.us

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (03/21/90)

In article <1990Mar19.210743.15896@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
: Does perl release memory previously used by an associative array
: when you do %array = (); ?

Yes.

: I have a script that gathers data
: into a global array, then calls a subroutine that loads a similar
: local array with existing data from a file, then sorts the keys and
: merges the items back to the file.  At the end of the subroutine
: both arrays are set to ().  After many passes through this, the program
: slows to a crawl - should I be using undef %array or something else
: to release the data storage and hash table?  The keys for the array
: happen to be unique and thus not re-used by subsequent passes in case
: that matters.

Unique keys are no problem.  Not here anyway.  An undef should be
unnecessary.  In fact, the %array = () should be unnecessary on the
local array--Perl effectively does an undef when you exit the scope of
the local.
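
The scoping behaviour described above can be sketched as follows. This is a minimal illustration for a later Perl 5 (it uses strict and our, which the Perl 3 in this thread does not have), and the names %seen and one_pass are hypothetical. It shows only what the program can see of the hash contents, not what the process returns to the operating system.

```perl
use strict;
use warnings;

our %seen;                        # package ("global") hash, as in the question

sub one_pass {
    local %seen;                  # fresh, empty value for the duration of the call
    $seen{$_} = 1 for 1 .. 1000;
    return scalar keys %seen;     # number of keys visible inside the sub
}

$seen{outer} = 1;
my $inside = one_pass();          # keys seen while the local was in effect
my $after  = scalar keys %seen;   # the outer hash is restored on exit

%seen = ();                       # explicit clear of the global hash
my $cleared = scalar keys %seen;

print "inside=$inside after=$after cleared=$cleared\n";
```

No explicit %seen = () is needed inside one_pass: the localized value is discarded when the sub returns.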

You should first of all check with ps to see if the process is actually
growing.  If it's not, you're barking up the wrong hash table.

If it is, you wouldn't perhaps be on one of those fun machines where
free() is a no-op?  If so, you might try Perl's malloc/free instead.
If that doesn't work, try defining the union overhead strut in malloc.c.

On the other hand, it might be something else you're doing that is
triggering some obscure bug.  Try to reduce it to a minimal test case.

It's also possible that your process is growing because somehow the %array = ()
is being ineffective (typographical error, or whatever).  But that should
only slow you down whenever it decides to double the size of the table--the
hash algorithm's speed is independent of the table size.

Larry

les@chinet.chi.il.us (Leslie Mikesell) (03/23/90)

In article <7480@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>: Does perl release memory previously used by an associative array
>: when you do %array = (); ?

>You should first of all check with ps to see if the process is actually
>growing.  If it's not, you're barking up the wrong hash table.

Well, it turns out that the arrays and subroutines don't have anything
to do with it.  Memory is being gobbled by simple scalar assignments
out of the pattern match chunks.  With patchlevel 14 on AT&T SysvR3.2.1
'386 unix, each pass adds about 7 blocks to the memory size shown by ps.
Commenting out the line $show = "$` <-> $'" ; fixes it.  Help!!!

# test perl memory usage
print `ps -fl |grep perl` ;
$pre =  'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'  ;
$post = 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz' ;
for $z (1..50) {
        print "Pass $z \n";
        for  $x (1..1000) {
                $foo{$x} = $pre . $x . $post ; # assoc array
        }
        @keylist = sort(keys(%foo));
        while ( $keyitem = shift(@keylist)) {
                if ( $foo{$keyitem} =~ /4/ ) {
        #### THIS NEXT LINE IS THE PROBLEM ######
                        $show = "$` <-> $'" ;
        #               print "$show \n" ;
                }
        }
        %foo = ();
        print `ps -fl |grep perl` ;
}
#------
Les Mikesell
  les@chinet.chi.il.us

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (03/23/90)

In article <1990Mar22.210715.4191@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
: Well, it turns out that the arrays and subroutines don't have anything
: to do with it.  Memory is being gobbled by simple scalar assignments
: out of the pattern match chunks.  With patchlevel 14 on AT&T SysvR3.2.1
: '386 unix, each pass adds about 7 blocks to the memory size shown by ps.
: Commenting out the line $show = "$` <-> $'" ; fixes it.  Help!!!

Well, actually, it turns out to be in the pattern match itself.  The reason
the assignment enabled it is that pattern matching has to do extra junk
if any of $&, $` or $' are mentioned elsewhere in the program.  It was
in the extra stuff that the booboo was.

Fixed in 16.
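
One way to sidestep the extra work is to avoid mentioning $&, $` and $' anywhere in the program. A minimal sketch, assuming a later Perl 5 in which the @- and @+ arrays hold the start and end offsets of the last match (they are not available at the patchlevels discussed here); the sample string is made up:

```perl
use strict;
use warnings;

my $text = 'aaa4zzz';
my $show = '';

if ($text =~ /4/) {
    # $-[0] is the offset where the match starts, $+[0] where it ends.
    my $before = substr($text, 0, $-[0]);   # what $` would have held
    my $after  = substr($text, $+[0]);      # what $' would have held
    $show = "$before <-> $after";
}

print "$show\n";
```

Because none of the three special variables appears in the source, the matcher has no reason to save the whole string on their behalf.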

Larry

les@chinet.chi.il.us (Leslie Mikesell) (03/27/90)

In article <7515@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:

>Well, actually, it turns out to be in the pattern match itself.  The reason
>the assignment enabled it is that pattern matching has to do extra junk
>if any of $&, $` or $' are mentioned elsewhere in the program. 

Perhaps a funky regexp would work better anyway, but I couldn't come
up with one.  I'm trying to merge items like:

identifier (used for key in associative array)
text (multi-line)
SUMMARY:
summary-text (multi-line)
STATUS:
text ...

If I find an updated item without the SUMMARY: entry, I want to grab the
summary-text from the old entry and insert it into the new above the
STATUS line.  My first attempt at pattern-matching with bracketed substrings
failed on these multi-line strings, so I switched to the $` and $' and
some tmp variables.  Is there a better way?  Note that I don't know which
(if either) entry contains the SUMMARY: or that an old entry even exists,
so the ability to test the success of the individual matches is handy.

Les Mikesell
  les@chinet.chi.il.us

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (03/27/90)

In article <1990Mar26.174959.20102@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
: Perhaps a funky regexp would work better anyway, but I couldn't come
: up with one.  I'm trying to merge items like:
: 
: identifier (used for key in associative array)
: text (multi-line)
: SUMMARY:
: summary-text (multi-line)
: STATUS:
: text ...
: 
: If I find an updated item without the SUMMARY: entry, I want to grab the
: summary-text from the old entry and insert it into the new above the
: STATUS line.  My first attempt at pattern-matching with bracketed substrings
: failed on these multi-line strings, so I switched to the $` and $' and
: some tmp variables.  Is there a better way?  Note that I don't know which
: (if either) entry contains the SUMMARY: or that an old entry even exists,
: so the ability to test the success of the individual matches is handy.

I'd probably write this as

	$* = 1;
	if ($new !~ /\nSUMMARY:\n/) {
	    if (($was) = ($old =~ /^SUMMARY:\n([^\0]*)^STATUS:/)) {
		substr($new,index($new,"STATUS:\n"),0) = $was;
	    }
	}

or some such.  The thing to remember is that . doesn't match newline,
so use [^\0] to match newlines too.  (On older patchlevels you may have
to say \000 instead.)
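
For comparison, here is a hedged sketch of the same merge in a later Perl 5, where the /m modifier takes over the job of $* = 1 and the /s modifier lets . match newlines, so [^\0] is not needed. The non-greedy .*? is also a later addition to the language. The sample entries are invented for illustration.

```perl
use strict;
use warnings;

my $old = "item-1\nold text\nSUMMARY:\nfirst line\nsecond line\nSTATUS:\nopen\n";
my $new = "item-1\nnew text\nSTATUS:\nclosed\n";

if ($new !~ /^SUMMARY:\n/m) {
    # /m: ^ matches after any newline.  /s: . matches newlines too.
    if (my ($was) = $old =~ /^SUMMARY:\n(.*?)^STATUS:/ms) {
        # Zero-length lvalue substr inserts $was just before "STATUS:\n".
        substr($new, index($new, "STATUS:\n"), 0) = $was;
    }
}

print $new;
```

As in the code above, only the summary text itself is carried over, and the sketch assumes $new always contains a STATUS: line.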

Depending on the sizes of the relative text sections, it might be faster
to do it all with index, since [^\0]* has to match all the way to the
end and then back off.

	if (index($new, "\nSUMMARY:\n") < $[) {
	    $beg = index($old, "\nSUMMARY:\n");
	    if ($beg >= $[) {
		$end = index($old, "\nSTATUS:\n");
		substr($new,index($new,"STATUS:\n"),0) =
		    substr($old,$beg + 10, $end - $beg - 9);
	    }
	}
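
The offsets 10 and 9 in the index version come from the length of the "\nSUMMARY:\n" marker. A small sketch that spells the arithmetic out, assuming the default array base of 0 (so the tests against $[ become tests against 0) and using invented sample entries:

```perl
use strict;
use warnings;

my $old = "item-1\nold text\nSUMMARY:\nfirst line\nsecond line\nSTATUS:\nopen\n";
my $new = "item-1\nnew text\nSTATUS:\nclosed\n";

my $marker = "\nSUMMARY:\n";                 # 10 characters long

if (index($new, $marker) < 0) {              # new entry has no SUMMARY: section
    my $beg = index($old, $marker);
    if ($beg >= 0) {                         # old entry does have one
        my $end = index($old, "\nSTATUS:\n");
        # The text starts right after the marker and runs through the
        # newline found at $end, so that it ends with a newline.
        my $start = $beg + length($marker);  # same as $beg + 10
        my $len   = $end - $start + 1;       # same as $end - $beg - 9
        substr($new, index($new, "STATUS:\n"), 0) = substr($old, $start, $len);
    }
}

print $new;
```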

If there are more headers than that, it often becomes worthwhile to
take a parsing pass on it and put the entries into separate variables
or entries in an associative array.  Then you end up with wonderful
statements like

	$new{'SUMMARY'} = $old{'SUMMARY'} unless $new{'SUMMARY'};

A funky split like

	@new = split(/(^[A-Z]+):\n/,$new);
	unshift(@new,"FRONTSTUFF");
	%new = @new;		# alternating keys and values

comes to mind.  But that's probably not worthwhile for your thing.
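
A sketch of how the split and the one-line merge fit together, written for a later Perl 5 (the /m modifier stands in for $* = 1). The helper name parse_entry and the sample entries are hypothetical, and the padding step for an entry that ends on a bare header is an added assumption, not part of the suggestion above.

```perl
use strict;
use warnings;

# Split one entry into header => text pairs.  The text before the first
# header is stored under the key FRONTSTUFF.
sub parse_entry {
    my ($entry) = @_;
    # The parentheses make split return the header names as fields too.
    my @fields = split /^([A-Z]+):\n/m, $entry;
    unshift @fields, 'FRONTSTUFF';
    # split drops trailing empty fields, so pad if the last header had no text.
    push @fields, '' if @fields % 2;
    return @fields;                          # alternating keys and values
}

my %old = parse_entry("item-1\nold text\nSUMMARY:\nfirst line\nsecond line\nSTATUS:\nopen\n");
my %new = parse_entry("item-1\nnew text\nSTATUS:\nclosed\n");

$new{'SUMMARY'} = $old{'SUMMARY'} unless $new{'SUMMARY'};

print "$_ => $new{$_}" for sort keys %new;
```

Writing the merged entry back out would still need the headers emitted in the right order, which the hash does not remember.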

Larry

les@chinet.chi.il.us (Leslie Mikesell) (03/28/90)

In article <7556@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:

>	$* = 1;
>	if ($new !~ /\nSUMMARY:\n/) {
>	    if (($was) = ($old =~ /^SUMMARY:\n([^\0]*)^STATUS:/)) {
>		substr($new,index($new,"STATUS:\n"),0) = $was;
>	    }
>	}

Yep, that does it! (Actually, I need the SUBJECT: line also, but I get
the idea).  Thanks!

Now,
  Just One More Thing

Wouldn't it be nice if perl had a built-in "compress" function for things
that are really too small to fork a process for each item (say about
news article size...) and you don't want to mung them all together so
that you have to uncompress the whole mess to find one?

Les Mikesell
  les@chinet.chi.il.us