[comp.lang.perl] wc clone

tchrist@convex.COM (Tom Christiansen) (12/20/90)

After reading a comment from Barry Shein on his having written a complete
wc program in just 6 lines of Snobol in alt.religion.computers, I set out
to do so in perl.  Well, I'm down to 7 of perl, but without using
comma-kludges, I can't quite trim off one more line.  I've only thought
about it for 5 minutes or so, I do admit.  Maybe I'm missing something
obvious.

    #!/usr/bin/perl -n
    $tchars += $chars += length;
    $twords += $words += s/(\S+)/$&/g;
    next unless eof;
    printf "%8d %8d %8d %s\n", $., $words, $chars, ($ARGV eq '-'?'':$ARGV);
    $tlines += ($.+0) + ($. = !reset wc);
    next unless $files++ && eof();
    printf "%8d %8d %8d %s\n", $tlines, $twords, $tchars, "total";

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

tchrist@convex.COM (Tom Christiansen) (12/20/90)

I save a LOT of time if I make the part that counts words throw out the
string instead of save it:

    $twords += $words += s/\S+//g;

I go from 1.7 seconds on /etc/termcap to 1.0 seconds.    It's still
a long ways from the C wc's 0.25 seconds, but not too shabby either.
Anybody got a cleverer (and faster) algorithm?

I sometimes wish I could just count the number of occurrences without
having to make a new string.  Not very often, though.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

tchrist@convex.COM (Tom Christiansen) (12/21/90)

From the keyboard of bzs@world.std.com (Barry Shein):

:Imagine, in perl, if you could insert any expression midway into a
:pattern so whenever whitespace was hit you could increment $words
:right there, let's say "/(\S+)@$words++@/" was a pattern which
:incremented $words every time a space run was found, that's a common
:thing in snobol. Your example is darn close, tho, at least the loop
:has been eliminated via the use of /g, that was the spirit of the
:thing.
:
:Just various ways to find mapcar nirvana...

(Barry, are you sure you're not just leading me on? :-)

Permit me to introduce you to an eval in another guise, the /e modifier:

[this one even works -- the last one had a bug. :-(]

0  #!/usr/bin/perl -n
1  $chars += length; 
2  s/\S+/$words++/eg;
3  next unless eof;
4  printf "%8d %8d %8d %s\n", $., $words, $chars, ($ARGV eq '-'?'':$ARGV);
5  $tlines += $.; $twords += words; $tchars += $chars; reset 'wc'; $. = 0;
6  next unless $files++ && eof();
7  printf "%8d %8d %8d %s\n", $tlines, $twords, $tchars, "total";

While evals are pretty neat, this one does slow things down a lot.  Putting
it in the line loop like that makes this program run 3.5 times longer
than with line two as simply:

    $words += s/\S+//g;

Better to count when all done in this case, but there are lots of more
complex and interesting things you can do with /e easily.  Let's say you
also want to create a count of all the distinct words and then print them
out in descending numeric order by count at the end of each file:

    #!/usr/bin/perl -n
    s/\S+/$saw{$&}++/eg;
    next unless eof;
    sub down { $saw{$b} <=> $saw{$a}; }
    for (sort down keys %saw) { printf "%8d %s\n", $saw{$_}, $_; };

Is that the kind of thing you are looking for, Barry?  

We've strayed pretty from religion here (I think:-), so I'm redirecting
followups back into comp.lang.perl.  

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

tchrist@convex.COM (Tom Christiansen) (12/21/90)

Shorter stil:

0 #!/usr/bin/perl -n
1 $chars += length, $words += s/\S+//eg, next unless eof;
2 printf "%8d %8d %8d %s\n", $., $words, $chars, ($ARGV eq '-'?'':$ARGV);
3 $tlines += $.; $twords += $words; $tchars += $chars;reset 'wc'; $. = 0;
4 printf "%8d %8d %8d %s\n", $tlines, $twords, $tchars, "total" if $files++ && eof();

Thanks to Randal for squishing up the last few lines.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

merlyn@iwarp.intel.com (Randal L. Schwartz) (12/22/90)

In article <1990Dec21.110952.27897@convex.com>, tchrist@convex (Tom Christiansen) writes:
| 3 $tlines += $.; $twords += $words; $tchars += $chars;reset 'wc'; $. = 0;

I don't like the 'wc' in there.

Here's a weirder way, that actually says what you are doing more
clearly:

($tlines, $twords, $tchars, $., $words, $chars) =
	($tlines + $., $twords + $words, $tchars + $chars, 0, 0, 0);

(With short enough variable names, this fits on one line easily.)

@a[3,2,1,0] = ("hacker,","Perl","another","Just"); print "@a"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/