ben@cunixf.cc.columbia.edu (Ben Fried) (12/07/90)
I've written a filter in perl to do printer page counting for DEC ln03 printers. Right now it seems to be working fine, but it runs abysmally slowly. The vast majority of time is taken up in the following loop: mainloop: while(read(STDIN, $_, 2048)) { foreach $i ($[..length($_)) { $num_to_print = $i; $oc=$c; $c = substr($_, $i, 1); if ($tektronixmode) { if (($c eq "\f") && ($oc eq "\033")) { $pageno++; } } else { if ($ascii_mode) { if ($c eq "\n") { $lineno++; if ($lineno == $lines_per_page) { $lineno = 0; $pageno++; } } } else { if ($c eq "\f") { $lineno = 0; $pageno++; } } } if ($pageno >= $max_pages) { $debug && print STDERR "aborted at page $pageno\n"; print STDOUT substr($_, $[, $num_to_print); last mainloop; } } print STDOUT; } Can anyone give me some pointers to speed this up? I already tried using ord to cast $c and $oc to numbers, and using integer comparisons agains the ords of \n, \f, and \033, but that actually ran slower than the above code. All that really matters is that I be able to count the number of pages on the input stream, and be able to break out of the loop after either (a) EOF on stdin, or (b) $pageno == $max_pages, whichever comes first. I was thinking of doing a split(//), and then using grep to count occurrences, but couldn't think of a good way of stopping when $pageno hits $max_pages. Now that I think about it, maybe I should be able to do this, I just have to come up with a more complicated EXPR for grep. Perhaps it's time I went through the JAPH collection more closely. Still, would splitting input into arrays and using grep be any faster? Ben -- Benjamin Fried ben@cunixf.cc.columbia.edu rutgers!columbia!ben
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (12/07/90)
In article <1990Dec6.194412.9132@cunixf.cc.columbia.edu> ben@cunixf.cc.columbia.edu (Ben Fried) writes: : I've written a filter in perl to do printer page counting for DEC ln03 : printers. Right now it seems to be working fine, but it runs abysmally : slowly. The vast majority of time is taken up in the following loop: : : mainloop: : while(read(STDIN, $_, 2048)) { : foreach $i ($[..length($_)) { : $num_to_print = $i; : $oc=$c; : $c = substr($_, $i, 1); : if ($tektronixmode) { : if (($c eq "\f") && ($oc eq "\033")) { : $pageno++; : } : } else { : if ($ascii_mode) { : if ($c eq "\n") { : $lineno++; : if ($lineno == $lines_per_page) { : $lineno = 0; : $pageno++; : } : } : } else { : if ($c eq "\f") { : $lineno = 0; : $pageno++; : } : } : } : : if ($pageno >= $max_pages) { : $debug && print STDERR "aborted at page $pageno\n"; : print STDOUT substr($_, $[, $num_to_print); : last mainloop; : } : : } : : print STDOUT; : } That looks like a reasonable C solution, though even in C I'd avoid testing all those conditionals on every character. Here's an equivalent in Perl. if ($tektronixmode) { $/ = "\f"; while (<>) { $pageno++ if /\033\f$/; &blowup, last if $pageno > $max_pages; print; } } elsif ($ascii_mode) { $max_lines = $max_pages * $lines_per_page; while (<>) { &blowup, last if $. > $max_lines; print; } } else { $/ = "\f"; while (<>) { &blowup, last if $. > $max_pages; } } sub blowup { $debug && print STDERR "aborted at page $pageno\n"; } I'm not sure that $ascii_mode is a valid distinction, however. Perhaps you really mean: if ($tektronixmode) { $/ = "\f"; while (<>) { $pageno++ if /\033\f$/; &blowup, last if $pageno > $max_pages; print; } } else { while (<>) { $lineno++; $pageno++, $lineno = 0 if /\f/; $pageno++, $lineno = 0 if $lineno > $lines_per_page; &blowup, last if $pageno > $max_pages; print; } } Or something like that... Larry
merlyn@iwarp.intel.com (Randal Schwartz) (12/07/90)
In article <1990Dec6.194412.9132@cunixf.cc.columbia.edu>, ben@cunixf (Ben Fried) writes: | mainloop: | while(read(STDIN, $_, 2048)) { | foreach $i ($[..length($_)) { | $num_to_print = $i; | $oc=$c; | $c = substr($_, $i, 1); | if ($tektronixmode) { | if (($c eq "\f") && ($oc eq "\033")) { | $pageno++; | } | } else { | if ($ascii_mode) { | if ($c eq "\n") { | $lineno++; | if ($lineno == $lines_per_page) { | $lineno = 0; | $pageno++; | } | } | } else { | if ($c eq "\f") { | $lineno = 0; | $pageno++; | } | } | } | | if ($pageno >= $max_pages) { | $debug && print STDERR "aborted at page $pageno\n"; | print STDOUT substr($_, $[, $num_to_print); | last mainloop; | } | | } | | print STDOUT; | } Hmm. That's probably the right way to do it in some other language besides Perl. (It smells vaguely BASIC-like, or maybe C-like.) Anyway, as a first cut, I'd do something like: $lines_per_page = 66; $max_pages = 100; # you forgot this while (<>) { $lineno++; $pageno++, $lineno = 0 if /\f/ || ($lineno >= $lines_per_page); die "Aborted at page $pageno\n" if $pageno >= $max_pages; print; } Yes? Maybe? OK, so I didn't add the tek vs. ascii mode, but that'd only make it one line longer. :-) I think you'll find this runs a teensy-tiny bit faster. :-) print "Just another Perl hacker," -- /=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\ | on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn | \=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
ben@cunixf.cc.columbia.edu (Ben Fried) (12/07/90)
In <1990Dec6.205447.13500@iwarp.intel.com> Randal writes: [ My source code ] > Hmm. That's probably the right way to do it in some other language > besides Perl. (It smells vaguely BASIC-like, or maybe C-like.) > > Anyway, as a first cut, I'd do something like: > > $lines_per_page = 66; $max_pages = 100; # you forgot this They're set elsewhere in the program; sorry for not including that. > while (<>) { > $lineno++; > $pageno++, $lineno = 0 if /\f/ || ($lineno >= $lines_per_page); > die "Aborted at page $pageno\n" if $pageno >= $max_pages; > print; > } > > Yes? Maybe? OK, so I didn't add the tek vs. ascii mode, but that'd > only make it one line longer. :-) > > I think you'll find this runs a teensy-tiny bit faster. :-) I failed to adequately explain my problem: only in about 1/3 of the cases are there newlines in the input, in the rest they may not appear at all, or if they do they do not delimit lines, but appear as binary data. In those cases (plain ascii files), Randal's code is 100% correct. However, in the remainder of the cases, I cannot depend on newlines or any other line (or record, or whatever you want to call it) delimiter being present to allow easy use of the while (<>) construct. In fact, one might see a sequence sort of like: \n\f\n\n\n\nxxxxweriuweru925349853495\033\fasdfsdfasdf\033\f9r8\f3495\fasasf\n In this case, the \n's and plain \f's should be ignored - they're data, not line separators - pages are separated by \033\f, and plain \f's are just data, and max_pages might very well be reached after the first \033\f combination there. In other cases, though, \f by itself is a page-eject; it depends on what mode the printer is in. > print "Just another Perl hacker," > -- Whatever... Ben -- Benjamin Fried ben@cunixf.cc.columbia.edu rutgers!columbia!ben
worley@compass.com (Dale Worley) (12/08/90)
From: merlyn@iwarp.intel.com (Randal Schwartz)
$lines_per_page = 66; $max_pages = 100; # you forgot this
while (<>) {
$lineno++;
$pageno++, $lineno = 0 if /\f/ || ($lineno >= $lines_per_page);
die "Aborted at page $pageno\n" if $pageno >= $max_pages;
print;
}
Well, to start with, if there are two form-feeds in one line, $pageno
will only get incremented by 1.
I may be wrong, but to find out the number of times a regexp matches a
string, you have to write an explicit loop.
Dale Worley Compass, Inc. worley@compass.com
--
In _EE_Times_, 16 October 1989, Intel VP David House says: "Bill Gates
[software maker Microsoft's chairman] says no matter how much more power we
can supply, he'll develop some really exciting software that will bring the
machine to its knees."
carl@udwarf.tymnet.com (Carl Baltrunas & Cherie Marinelli 0.1.9) (12/08/90)
In article <1990Dec6.194412.9132@cunixf.cc.columbia.edu>, ben@cunixf.cc.columbia.edu (Ben Fried) writes: > > I've written a filter in perl to do printer page counting for DEC ln03 > printers. Right now it seems to be working fine, but it runs abysmally > slowly. The vast majority of time is taken up in the following loop: > > [perl script deleted] > > All that really matters is that I be able to count the > number of pages on the input stream, and be able to break out of the > loop after either (a) EOF on stdin, or (b) $pageno == $max_pages, > whichever comes first. > ...more stuff deleted... > Ben > -- > Benjamin Fried > ben@cunixf.cc.columbia.edu rutgers!columbia!ben I know that Ben didn't mention this, and so it wasn't in Randal's or Larry's responses, so I thought that I'd bring it up. (I've had to write filters on non-unix machines to create microfiche-able output, so I've had to do this). What about line length? (Page width or whatever you want to call it) ..and what about line wrap on long lines. Some print filters do these things for you, and this changes the the characteristics of the output from straight \n and \f counting. So, can you tell perl that while reading input lines to break lines at the point you want to make it wrap, or do you have to use something like the following modifications to randal's response: $lines_per_page = 66; $max_pages = 100; # you forgot this while (<>) { $lineno += ( (length($_) + $line_width - 1) / $line_width ); $pageno += ( $lineno + $lines_per_page ) / $lines_per_page, \ $lineno = 0 if /\f/; $pageno++, $lineno -= $lines_per_page if ($lineno >= $lines_per_page); die "Aborted at page $pageno\n" if $pageno >= $max_pages; print; } And one thing I left out... what to do about \t's (TABS) in text, since they push the line width quite a bit depending on position. (I won't mention all those [which font?] [what about kerning?] word processing things. They must require handling each byte. :-) PS. I guess an array of flags and point sizes could be built first based on the font used and a state machine that uses the byte value to lookup the width of each character and flags to add kerning offsets and can we break the line here... would make a pretty fast way of processing it. Randal, can this be done easily in perl? How about a 1 liner? -Carl Carl A Baltrunas - Catalyst Art, Cherie Marinelli - Bijoux {sumex, apple}!oliveb!tymix!tardis!udwarf!{carl or cherie} {carl or cherie}%udwarf@tardis.tymnet.com {carl or cherie}@udwarf.tymnet.com
ben@cunixf.cc.columbia.edu (Ben Fried) (12/10/90)
In article <1990Dec7.212127.8161@uvaarpa.Virginia.EDU> Dale Worley writes: > From: merlyn@iwarp.intel.com (Randal Schwartz) > > $lines_per_page = 66; $max_pages = 100; # you forgot this > while (<>) { > $lineno++; > $pageno++, $lineno = 0 if /\f/ || ($lineno >= $lines_per_page); > die "Aborted at page $pageno\n" if $pageno >= $max_pages; > print; > } > > Well, to start with, if there are two form-feeds in one line, $pageno > will only get incremented by 1. > > I may be wrong, but to find out the number of times a regexp matches a > string, you have to write an explicit loop. This is a good point - neither Randal's nor Larry's posted code handles this case. Larry's code for non-plaintext files works just great: in those cases, the _only_ way that the printer can reach end of page is by a \f or an \033\f, depending on mode; his idea of setting $/ to \f (and in the second case matching against /\033\f$/) works great. This made me wish that perl had some operator for determining the count of matches of a pattern; is that something that sounds generally useful? > Dale Worley Compass, Inc. worley@compass.com Ben -- Benjamin Fried ben@cunixf.cc.columbia.edu rutgers!columbia!ben
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (12/10/90)
In article <1990Dec10.022814.5349@cunixf.cc.columbia.edu> ben@cunixf.cc.columbia.edu (Ben Fried) writes:
: This made me wish that perl had some operator for determining the count
: of matches of a pattern; is that something that sounds generally useful?
Look up what the return value is for either tr/// or s///g. Guess what?
Larry