tbc@hp-lsd.COS.HP.COM (Tim Chambers) (10/09/90)
I have just written my first perl script. Ok, so it's not as cryptic as TECO (which I never did master) but it's still a pretty nonintuitive language to me. So I ask the perl wizards to help me get used to programming in this new beasty. I'm enclosing the perl script, the sh script I started with, and an example of the program's input. My questions are: 1. Why doesn't the "next unless" line I've commented out do what I want it to? (Trust me, it doesn't on my HP 9000 Series 300 HP-UX implementation.) 2. Although I realize this isn't a teaching center, would anyone care to offer some advice on how to optimize my perl implementation of the algorithm? The algorithm I use is (1) capture in an array, (2) sort the array, (3) format and print the array elements. I found it clumsy to do all in perl, but it still runs faster than the filters I use in the shell script. Are there perl-isms I am missing that make the algorithm run more efficiently? (Especially my by_free sorting subroutine.) ############################################################################## #!/usr/local/bin/perl # $Header: mydf,v 1.3 90/10/08 17:17:19 tbc Exp $ sub by_free { @a = split(' ', $a); @b = split(' ', $b); $a[3] lt $b[3] ? 1 : $a[3] gt $b[3] ? -1 : 0; } open (BDF_PIPE, "bdf $* |"); # output print "Filesystem Free (Mb) %Used\n"; print "------------------------------|---------|-----\n"; for ($i=0;<BDF_PIPE>;$i++) { # why doesn't this work?! # next unless (/^.*\%.*$/); next unless /^[^ ]+ /; next if /Mounted/; @lines[$i] = $_; }; close (BDF_PIPE); @sortedlines = sort by_free @lines; foreach $line (@sortedlines) { ($f1, $f2, $f3, $avail, $capacity, $dirname) = split(' ', $line); printf '%s', $dirname; for ($i = 30; $i > length($dirname); $i--) { printf ('.'); } printf "%10.1f%6s\n", $avail / 1000, $capacity; }; print "\n"; ############################################################################## #!/bin/sh # $Header: mydf,v 1.2 90/09/06 08:42:15 tbc Exp $ echo "Filesystem Free (Mb) %Used" echo "------------------------------|---------|-----" # first filter to get around bdf's "pretty" format, then use awk bdf $* | cut -c36-80 | fgrep '%' | sort -nr | \ awk '{ printf("%s", $3) for (i = 30 ; i > length($3) ; i --) printf(".") printf ("%10.1f%6s\n", $1 / 1000, $2) }' echo ############################################################################## # this is what "bdf" does, in case it's not on non-HP-UX systems $ bdf Filesystem kbytes used avail capacity Mounted on /dev/root 487022 405811 32508 93% / /dev/dsk/17s7 278082 240761 23416 91% /dsk17.07 /dev/dsk/17s3 278082 202040 48233 81% /users4 /dev/dsk/0s2 278082 188859 61414 75% /disk3 /dev/dsk/0s4 119429 74980 32506 70% /users2 /dev/dsk/0s3 230127 166056 41058 80% /users hp-lsd:/usr/spool/notes 387657 248645 131258 65% /usr/spool/notes hplsdry:/work2/hp64000/hptwk/twk_db 90607 69076 17000 80% /usr/hp64000/hptwk/twk_db
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/24/90)
In article <18840001@hp-lsd.COS.HP.COM> tbc@hp-lsd.COS.HP.COM (Tim Chambers) writes: : I have just written my first perl script. Ok, so it's not as cryptic as TECO : (which I never did master) but it's still a pretty nonintuitive language to : me. Strange, seems pretty intuitive to me... :-) And no, I wasn't the person who wrote a Fortran compiler in TECO. : So I ask the perl wizards to help me get used to programming in this new : beasty. I'm enclosing the perl script, the sh script I started with, and an : example of the program's input. My questions are: : : 1. Why doesn't the "next unless" line I've commented out do what I want it : to? (Trust me, it doesn't on my HP 9000 Series 300 HP-UX implementation.) Probably because of a bug I just fixed, presuming you're on patchlevel 36 or 37. : 2. Although I realize this isn't a teaching center, would anyone care to : offer some advice on how to optimize my perl implementation of the : algorithm? The algorithm I use is (1) capture in an array, (2) sort the : array, (3) format and print the array elements. I found it clumsy to do : all in perl, but it still runs faster than the filters I use in the shell : script. Are there perl-isms I am missing that make the algorithm run more : efficiently? (Especially my by_free sorting subroutine.) I'll talk about your script point by point, and then give how I'd write it. : ############################################################################## : #!/usr/local/bin/perl : # $Header: mydf,v 1.3 90/10/08 17:17:19 tbc Exp $ : : sub by_free { : @a = split(' ', $a); : @b = split(' ', $b); You realize that this splits each line multiple times? This is most of your extra overhead. : $a[3] lt $b[3] ? 1 : $a[3] gt $b[3] ? -1 : 0; As of 36, you can just say $b[3] <=> $a[3]; : } : : open (BDF_PIPE, "bdf $* |"); A nit, but you want open (BDF_PIPE, "bdf @ARGV |"); or better, open (BDF_PIPE, "bdf @ARGV |") || die "Can't run bdf: $!\n"; : # output : print "Filesystem Free (Mb) %Used\n"; : print "------------------------------|---------|-----\n"; Maybe a hair more efficient and readable (your call) to say: print <<EndHeader; Filesystem Free (Mb) %Used ------------------------------|---------|----- EndHeader : for ($i=0;<BDF_PIPE>;$i++) { In general, use of index variables (such as $i) is a warning sign that there's a better way to do it in Perl. : # why doesn't this work?! : # next unless (/^.*\%.*$/); The bug I mentioned. The optimizer thought it could do tail matching on /%$/. But it's silly to write the pattern that way anyway. Why not next unless /%/; But it's better not to reject lines based on something toward the end of the line. : next unless /^[^ ]+ /; This is better, in that it fails at the beginning of the line on the lines it fails on. But when it succeeds, why check more than one character? I'd also invert the logic: next if /^ /; : next if /Mounted/; Again, this is checking for something at the end. Why not this? next if /^Filesystem/; : @lines[$i] = $_; This also stores info you aren't going to use later. In my example below I'll turn this logic inside out. : }; : : close (BDF_PIPE); Not strictly necessary on an input pipe, but it doesn't hurt anything, and is good documentation. It IS necessary to close OUTPUT pipes when you're done with them. : @sortedlines = sort by_free @lines; This is basically an extra scratch variable, since you can insert the sort into the foreach. (Though it doesn't hurt, because when you insert the sort into the foreach, it ends up making an internal scratch variable anyway...) : foreach $line (@sortedlines) { : ($f1, $f2, $f3, $avail, $capacity, $dirname) = split(' ', $line); Yet another split. If you split on /\s+/, you could do the nfs filesystems too, since $f1 would just be null. I include the nfs filesystems in mine. : printf '%s', $dirname; printf is slower than print. Should be just print $dirname; : for ($i = 30; $i > length($dirname); $i--) { : printf ('.'); : } Again, use print, not printf. Better, don't use a loop at all: print '.' x (30 - length($dirname)); : printf "%10.1f%6s\n", $avail / 1000, $capacity; : }; : : print "\n"; Okay, here's my version: #!/usr/local/bin/perl open (BDF_PIPE, "bdf @ARGV |"); print <<End; Filesystem Free (Mb) %Used ------------------------------|---------|----- End while (<BDF_PIPE>) { ($fs, $kbytes, $used, $avail, $capacity, $dirname) = split; next unless $kbytes > 0; # next unless $fs; # uncomment to delete nfs filesystems $line{$avail} = sprintf("%30.30s%10.1f%6s\n", $dirname . '.' x 30, $avail / 1000, $capacity); } sub revnum { $b <=> $a; } foreach $key (sort revnum keys(%line)) { print $line{$key}; } print "\n"; We split just once per line, using a split that will leave $fs null on the nfs lines. We find a single test that weeds out both the header and the 1st of each pair of nfs lines. We make an associative array keyed on the field we will sort on, formatting the output string right then and there. After we've finished the input, we just do a simple reverse numeric sort on the keys of the line array, and print out each value. Actually, we could scrap that loop too and use a slice like this: print @line{sort revnum keys(%line)}, "\n"; That *doesn't* make a scratch variable internally. Larry
tchrist@convex.COM (Tom Christiansen) (10/24/90)
In article <10080@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes an excellent analysis of how to optimize a beginning perl script. One little picky point I have is the pipe open. I first posted about this kind of thing in <5016@convex.convex.com> entitled "Testing successful pipe opens" on 31 January for archive nuts. It's been awhile, so I'll bring it up again for people who weren't reading then. > open (BDF_PIPE, "bdf @ARGV |"); > >or better, > > open (BDF_PIPE, "bdf @ARGV |") || die "Can't run bdf: $!\n"; > That will almost never fail. You might get back EGAIN or EPROCLIM due to fork failure, but far more often, the fork succeeds but the execvp fails, and you get no notification of this. This is exactly what happened to me when I ran the script, since on my system, the command is "df" not "bdf". This is one of those little gotchas in perl. One solution is to test for whether $? is set after the close. Note that Larry said that close is not strictly necessary on an input pipe. This, I would say, is a good enough reason to close it and make sure the close went well. Thus: close BDF_PIPE || die "bdf pipe didn't close: $?"; although at PL18 the close doesn't correctly fail on setting $?; it does do the right thing at later patches. For earlier versions, use: close BDF_PIPE; die "bdf pipe didn't close: $?" if $?; You can also find out at the time of the open, although this is trickier. A solution for a command without metacharacters would be like this: ($pid = open (BDF_PIPE, "bdf @ARGV |")) && kill(0, $pid) || die "can't start pipe: $!"; or for people preferring C syntax to shell: if (!($pid = open (BDF_PIPE, "bdf @ARGV |")) || !kill(0, $pid)) { die "can't start pipe: $!"; } If, however, there are shell meta-characters, the shell may not yet have failed by the time the kill happens, so for that case you would want to introduce a delay with sleep, (although I dislike it) as in: ($pid = open (BDF_PIPE, "cmd >foo |")) && (sleep(1), kill(0, $pid)) || die "can't start pipe: $!"; or ($pid = open(BDF_PIPE, "$cmd >$foo |")) || die "fork failed: $!"; sleep 1; kill(0, $pid) || die "cmd \"$cmd > $foo\" failed: $?"; Note that if you're opening for output to a non-existent command, you will eventually get hit with a SIGPIPE. I often install a plumber function for this: sub plumber { die "$0: plumber: broken pipe"; } SIG{'PIPE'} = 'plumber'; I thought that now that we have caller(), I should be able to catch where this actually happened to me, but that doesn't seem to work. It complains that "There is no caller" when I try to use caller from the &plumber routine, which is too bad. --tom
fuchs@it.uka.de (Harald Fuchs) (10/24/90)
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes: >: open (BDF_PIPE, "bdf $* |"); >A nit, but you want > open (BDF_PIPE, "bdf @ARGV |"); >or better, > open (BDF_PIPE, "bdf @ARGV |") || die "Can't run bdf: $!\n"; Why? open() returns the pid of the subprocess, and this should never be zero even if there is no executable `bdf' in our $PATH. Is there a canonical way to check if an open() involving a pipe succeeded? -- Harald Fuchs <fuchs@it.uka.de> <fuchs%it.uka.de@relay.cs.net> ... <fuchs@telematik.informatik.uni-karlsruhe.dbp.de> *gulp*
dave@vlsi-mentor.jpl.nasa.gov (David Hayes) (10/25/90)
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes: >In general, use of index variables (such as $i) is a warning sign that >there's a better way to do it in Perl. Really? Please enlighten me.... I want to process an array of stuff @stuff. Lets say that I want to do a "tolower" on each element of stuff (an array of text strings). The current way I do this is (yes I am a C programmer to the core): for($i=0; $i<$#stuff; $i++) { $stuff[$i] =~ tr/A-Z/a-z/; } Is there a better way? ---- Dave Hayes dave@vlsi-mentor.jpl.nasa.gov {ucbvax,ames}!elroy!vlsi-mentor!dave "Attempt to find truth by realizing that it will generally find YOU." -- ---- Dave Hayes dave@vlsi-mentor.jpl.nasa.gov {ucbvax,ames}!elroy!vlsi-mentor!dave "Attempt to find truth by realizing that it will generally find YOU."
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/25/90)
In article <1990Oct25.063526.1571@vlsi-mentor.jpl.nasa.gov> dave@vlsi-mentor.jpl.nasa.gov (David Hayes) writes: : lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes: : >In general, use of index variables (such as $i) is a warning sign that : >there's a better way to do it in Perl. : : Really? Please enlighten me.... : : I want to process an array of stuff @stuff. Lets say that I want to : do a "tolower" on each element of stuff (an array of text strings). : : The current way I do this is (yes I am a C programmer to the core): : : for($i=0; $i<$#stuff; $i++) { : $stuff[$i] =~ tr/A-Z/a-z/; : } : : Is there a better way? Yes. There are some iterators that cause the loop variable to refer to each array element directly, much like a C programmer would use a pointer to iterate through an array, only without having to dereference anything. You can simply say foreach $elem (@stuff) { $elem =~ tr/A-Z/a-z/; } or, more briefly, for (@stuff) { # equivalent to "foreach $_ (@stuff)" tr/A-Z/a-z/; } or even more briefly, grep(tr/A-Z/a-z/, @stuff); The other place that index variables pop up unnecessarily is when people write linear searches down a numerically indexed array, when they should be using associative arrays to do a direct lookup, or using one of the other built-in searching mechanisms. Larry
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/26/90)
In article <107608@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes: : You can also find out at the time of the open, although this is trickier. : A solution for a command without metacharacters would be like this: : : ($pid = open (BDF_PIPE, "bdf @ARGV |")) && kill(0, $pid) : || die "can't start pipe: $!"; : : or for people preferring C syntax to shell: : : if (!($pid = open (BDF_PIPE, "bdf @ARGV |")) || !kill(0, $pid)) : { die "can't start pipe: $!"; } : : : If, however, there are shell meta-characters, the shell may not yet : have failed by the time the kill happens... It's worse than that. You're assuming vfork semantics. If they don't have vfork, the parent can continue executing even before the child has a chance to exec. : ...so for that case you would : want to introduce a delay with sleep, (although I dislike it) as in: : : ($pid = open (BDF_PIPE, "cmd >foo |")) && (sleep(1), kill(0, $pid)) : || die "can't start pipe: $!"; : : or : : ($pid = open(BDF_PIPE, "$cmd >$foo |")) || die "fork failed: $!"; : sleep 1; : kill(0, $pid) || die "cmd \"$cmd > $foo\" failed: $?"; sleep(1) isn't necessarily good enough either, since it can sleep anywhere from 0 to 1 seconds. In general this is unnecessary, since you simply get an immediate eof on the pipe read, and then you can pick up the exit status with close, as Tom rightly pointed out. : Note that if you're opening for output to a non-existent command, you : will eventually get hit with a SIGPIPE. I often install a plumber : function for this: : : sub plumber { die "$0: plumber: broken pipe"; } : SIG{'PIPE'} = 'plumber'; You can arrange for a faulty exec on an input pipe to give you a SIGPIPE too: $pid = open(IN,"-|"); die "Can't fork: $!\n" unless defined $pid; unless ($pid) { exec "program", "that", "may", "not", "exist"; kill('PIPE', getppid); exit 1; } : I thought that now that we have caller(), I should be able to catch : where this actually happened to me, but that doesn't seem to work. It : complains that "There is no caller" when I try to use caller from the : &plumber routine, which is too bad. Yeah, that'd be nice, but signals happen asynchronously, and you don't necessarily have a consistent context, so it currently pretends you don't have a caller. A sort subroutine doesn't have a context either, currently, though that has no problems with context consistency, and could be done fairly efficiently (we'd only have to establish the context once per sort, not once for each sub call). I could probably make signal handlers have a context, but I'll have to look at the consequences a little more. Larry
dupuy@cs.columbia.edu (Alexander Dupuy) (10/29/90)
In article <10080@jpl-devvax.JPL.NASA.GOV> Larry Wall writes: | #!/usr/local/bin/perl | | open (BDF_PIPE, "bdf @ARGV |"); | | print <<End; | Filesystem Free (Mb) %Used | ------------------------------|---------|----- | End | | while (<BDF_PIPE>) { | ($fs, $kbytes, $used, $avail, $capacity, $dirname) = split; | next unless $kbytes > 0; | # next unless $fs; # uncomment to delete nfs filesystems | $line{$avail} = sprintf("%30.30s%10.1f%6s\n", | $dirname . '.' x 30, | $avail / 1000, $capacity); | } | | sub revnum { $b <=> $a; } | | foreach $key (sort revnum keys(%line)) { | print $line{$key}; | } | | print "\n"; | | We split just once per line, using a split that will leave $fs null on | the nfs lines. We find a single test that weeds out both the header | and the 1st of each pair of nfs lines. We make an associative array | keyed on the field we will sort on, formatting the output string | right then and there. Just hope that you never have two filesystems that are both 86% full, since you will only see the output for one of them. You have to be careful when manually optimizing a program that you don't change the semantics inadvertently... I would use the $dirname field as the key, and do an assoc array lookup in the revnum function, with the performance penalty, in order to know that I'm getting the right answer. @alex -- -- inet: dupuy@cs.columbia.edu uucp: ...!rutgers!cs.columbia.edu!dupuy