[comp.lang.perl] perl compared to other Unix tools

stef@zweig.exodus (Stephane Payrard) (12/08/90)

> 
> From: chip@tct.uucp (Chip Salzenberg)
> Newsgroups: comp.lang.perl
> Subject: Re: Needed: a pointer for a perl compare script (long, sorry..)
> Date: 6 Dec 90 17:09:27 GMT
> Organization: Teltronics/TCT, Sarasota, FL
> 
> According to goer@quads.uchicago.edu (Richard L. Goerwitz):
> >Perl is not the only language around that is optimized for file,
> >string, and symbol processing, which has associative arrays, and handles
> >sorting and printing elegantly.  If you can't think of any examples off-
> >hand then mail me, and I'll be glad to provide you with a few.
> 
> Come now, Richard.  If you criticize in public, you must put up your
> facts in public.  Name these other languages.  Oh yes, and please
> include availability and cost information.
> -- 
> Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
>       "I'm really sorry I feel this need to insult some people..."
>             -- John F. Haugh II    (He thinks HE'S sorry?)

I agree very much with Chip; if you know a better tool than perl, you
should not leave us in the dark.

I am curious to know which tools are better at the dirty tasks that
involve string pattern-matching, some nontrivial processing, and some
system calls, with a big file (say .5 to .1 MB) as the main input, in
a reasonable amount of time (say less than 1 minute).  Surely not nawk.

I am sure that nawk, sed, ex (or any tool, or combination of tools,
that comes with a "standard" Unix distribution) would never let me
write the kind of programs I have written with perl:
        - they lack the functionality offered by perl
        - they lack the performance of perl
        - they offer no direct access to the OS (system calls)
I don't claim that perl is the answer to every problem, but it is
certainly the best I know of for the class of programs I defined
above.


The idea of combining basic tools using pipes, backquotes or whatever
is a UNIX myth propagated by most of the UNIX books.  Each time I have
tried to do a nontrivial task this way, it turned out to be almost
impossible for a simple-minded guy like me ;-).  Each command/shell has
a different set of metacharacters; this makes combining these atomic
tools very tricky ("How many backquotes should I put before this
character?").  Moreover, none of these tools comes with a debugger.
Anyway, if they did, it would not be very useful when your script is a
complicated combination of those tools.

I am sure that when the perl book comes out, it will be easier to
learn perl than to acquire the UNIX expertise necessary to use and
combine the UNIX atomic tools and shells (grep, sed, wc, awk, sh,
csh, expr...).

I am confident that, someday, someone will come up with a program that
lets perl be used as an extensible interactive shell; this will
relegate sh and csh to the rank of historically interesting tools.

In the meantime you need the UNIX expertise, because the perl
documentation constantly refers to the UNIX documentation.

An extreme example of what can be done with perl:

I have written a 600-line program in perl which deals with a
PostScript file generated by FrameMaker; it lets me preview and
interactively browse the corresponding document (using NeWS/TNT); it
extracts information to build menus; one of the menus lets me go
directly to any chapter of the documentation, and keyboard accelerators
move from the current page to the next/previous one.  This program
is able to browse a .5MB file, generate a data file (used for
subsequent runs) and pop up the browser window in about 30 seconds
on a Sun 4/110, assuming the TNT toolkit is already loaded.
Subsequent runs pop up the window in 5-6 seconds.

Don't ask for this program: it uses a not-yet-released version of TNT
and makes many assumptions about the browsed file.

I am quite sure Larry never intended perl to be used to write
simple windowed tools, but with perl/TNT, it fits the bill.

It is quite exciting to use NeWS with perl because NeWS is an
interpreter as well.  So perl can generate "on the fly" the NeWS code
which deals with the windowed part of the tool.  I prefer not to
imagine a program such as the one I described written with whatever X
toolkit and Display PostScript.  Fooey.
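
The "generate the code on the fly" idea can be sketched without NeWS
itself.  The fragment below has perl emit plain PostScript for a list of
chapter titles.  The titles, the page coordinates and the sub names
ps_string and ps_menu are made up for illustration; the real program's
NeWS/TNT calls are not shown in this post, so none are guessed at here.

```perl
use strict;
use warnings;

# Escape the characters that are special inside a PostScript (string).
sub ps_string {
    my ($text) = @_;
    $text =~ s/([\\()])/\\$1/g;
    return "($text)";
}

# Turn a list of titles into PostScript that draws one title per line.
sub ps_menu {
    my (@titles) = @_;
    my $ps = "/Helvetica findfont 12 scalefont setfont\n";
    my $y  = 720;                      # start near the top of the page
    for my $title (@titles) {
        $ps .= sprintf("72 %d moveto %s show\n", $y, ps_string($title));
        $y  -= 16;                     # move down for the next title
    }
    return $ps . "showpage\n";
}

print ps_menu("1. Introduction", "2. Syntax (short)", "3. Examples");
```

The output is ordinary text, so it can be written to whatever handle
leads to the interpreter that will execute it.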


In fact, perl is so powerful that I am very much tempted to write in
it stuff I should write in C.  And I will write more in perl if Larry
some day comes up with an equivalent of C structs, because pack()
and unpack() are a horrible kludge.  I don't really know how Larry
could fit such an extension into the language, syntactically and
semantically.
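
For readers who have not met the idiom being complained about, here is
a minimal sketch of pack() and unpack() standing in for a C struct.  The
record layout (a 16-bit id, a 32-bit size, an 8-byte name) and the sub
names rec_pack and rec_unpack are invented for illustration and do not
come from any program in this thread.

```perl
use strict;
use warnings;

# Hypothetical layout, roughly: struct rec { uint16 id; uint32 size; char name[8]; }
# (pack adds no alignment padding, unlike a C compiler might)
# n  = 16-bit unsigned, big-endian
# N  = 32-bit unsigned, big-endian
# A8 = 8-byte text field, space-padded by pack, trailing spaces stripped by unpack
my $TEMPLATE = "n N A8";

# Build the binary record from named fields.
sub rec_pack {
    my (%f) = @_;
    return pack($TEMPLATE, $f{id}, $f{size}, $f{name});
}

# Split a binary record back into named fields.
sub rec_unpack {
    my ($bytes) = @_;
    my %f;
    @f{qw(id size name)} = unpack($TEMPLATE, $bytes);
    return %f;
}

my $bytes = rec_pack(id => 7, size => 4096, name => "perl");
my %rec   = rec_unpack($bytes);
printf "%d bytes: id=%d size=%d name=%s\n",
    length($bytes), $rec{id}, $rec{size}, $rec{name};
```

This prints "14 bytes: id=7 size=4096 name=perl".  The field names live
only in the two subs, and the template string has to be kept in step
with them by hand, which is the inconvenience a real struct would remove.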



        stef

--
Stephane Payrard -- stef@eng.sun.com -- (415) 336 3726
SMI 2550 Garcia Avenue M/S 10-09  Mountain View CA 94043

                     
                     

goer@ellis.uchicago.edu (Richard L. Goerwitz) (12/08/90)

In article <STEF.90Dec7130410@zweig.exodus> stef@eng.sun.com writes:
>> 
>> According to goer@quads.uchicago.edu (Richard L. Goerwitz):
>> >Perl is not the only language around that is optimized for file,
>> >string, and symbol processing, which has associative arrays, and handles
>> >sorting and printing elegantly.  If you can't think of any examples off-
>> >hand then mail me, and I'll be glad to provide you with a few.
>> 
>> Come now, Richard.  If you criticize in public, you must put up your
>> facts in public.  Name these other languages.  Oh yes, and please
>> include availability and cost information.
>
>I agree very much with Chip; if you know a better tool than perl, you
>should not leave us in the dark.

I think everyone is getting the wrong impression.  When I posted, I had
just read a description of a very specific problem.  I then read a res-
ponse in which someone declared perl uniquely able to handle it.  While
in some cases this is true, it was not true in the case I had just read
about.  The point was not that there were other tools out there which could
replace perl, but rather that certain features found in perl (e.g. good
string handling facilities, associative arrays, and what not) were by no
means unique, and that for problems which required such facilities, perl
was by no means a unique tool.

I fully expect that once perl stabilizes, and the documentation begins
to become readily accessible, it will become widely installed, and will
become the tool of choice for most tasks now whipped together using a
bunch of heterogenous tools, and glued in place with /bin/sh.  Perl is
filling a very important niche.

Please continue perling!

-Richard

les@chinet.chi.il.us (Leslie Mikesell) (12/09/90)

In article <1990Dec8.020706.28417@midway.uchicago.edu> goer@ellis.uchicago.edu (Richard L. Goerwitz) writes:
>The point was not that there were other tools out there which could
>replace perl, but rather that certain features found in perl (e.g. good
>string handling facilities, associative arrays, and what not) were by no
>means unique, and that for problems which required such facilities, perl
>was by no means a unique tool.

Ok, sticking to the text handling features relating to the original question,
there may be other languages that would easily sort text by keys.  But there
was also a mention of needing to manipulate it when a match occurred.
Does anything else let you do those wonderful combination test, assign
and regexp extract like perl's:

if (($got1,$got2,$got3) = ($var =~ /(pattern1) (pattern2) (pattern3)/)) {
  # ... do whatever you want with $got1 etc.
}

Or handle multi-line regexps like this piece from the example I posted
where it takes everything between a SUMMARY: line and STATUS: line
in one item and inserts it before the STATUS: in an update which lacks
the SUMMARY information?

local ($*) = 1 ;  # multi-line match needed
[...]
# snarf summary from old - note multi-line 
if (($status) = $oitems{$oldid} =~ /(^SUMMARY:\n[^\0]*)^STATUS:/) {
# and insert into new
substr($nitems{$newid},index($nitems{$newid},"STATUS:\n"),0) = $status ;
}

Yes, you could loop over the lines (or characters) explicitly, but why?
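
Both idioms above can be tried as one self-contained script.  This is a
sketch with made-up sample data (the keys old1 and new1 and the item
text are assumptions).  It also uses the /m modifier where the posted
code sets $*, since later perls dropped that variable in favour of /m.

```perl
use strict;
use warnings;

# Test, assign and extract in one expression: the list assignment is
# true only when the match succeeds, and the captures land in the list.
my $var = "alpha beta gamma";
if (my ($got1, $got2, $got3) = ($var =~ /(\w+) (\w+) (\w+)/)) {
    print "got: $got1, $got2, $got3\n";
}

# Multi-line extract and splice.  The /m modifier makes ^ match after
# every newline, which is the effect $* = 1 has in the posted code.
my %oitems = (old1 => "ID: 1\nSUMMARY:\nfirst line\nsecond line\nSTATUS:\nopen\n");
my %nitems = (new1 => "ID: 1\nSTATUS:\nclosed\n");

if (my ($summary) = $oitems{old1} =~ /(^SUMMARY:\n[^\0]*)^STATUS:/m) {
    # a zero-length substr() used as an lvalue inserts without overwriting
    my $at = index($nitems{new1}, "STATUS:\n");
    substr($nitems{new1}, $at, 0) = $summary;
}
print $nitems{new1};
```

After the splice, the new item carries the SUMMARY: block from the old
one immediately before its own STATUS: line.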

Les Mikesell
  les@chinet.chi.il.us

goer@ellis.uchicago.edu (Richard L. Goerwitz) (12/10/90)

In article <1990Dec09.052353.18018@chinet.chi.il.us>
les@chinet.chi.il.us (Leslie Mikesell) writes:
>
>Ok, sticking to the text handling features relating to the original question,
>there may be other languages that would easily sort text by keys.  But there
>was also a mention of needing to manipulate it when a match occurred.
>Does anything else let you do those wonderful combination test, assign
>and regexp extract like perl's:
>
>if (($got1,$got2,$got3) = ($var =~ /(pattern1) (pattern2) (pattern3)/)) {
>  # ... do whatever you want with $got1 etc.
>}
>
>Or handle multi-line regexps like this piece from the example I posted
>where it takes everything between a SUMMARY: line and STATUS: line
>in one item and inserts it before the STATUS: in an update which lacks
>the SUMMARY information?
>
>local ($*) = 1 ;  # multi-line match needed
>[...]
># snarf summary from old - note multi-line 
>if (($status) = $oitems{$oldid} =~ /(^SUMMARY:\n[^\0]*)^STATUS:/) {
># and insert into new
>substr($nitems{$newid},index($nitems{$newid},"STATUS:\n"),0) = $status ;
>}

Again, it's not the string processing tools that make perl unique.
It's the combination of tools and their particularly facile integration
with the operating system that make perl unique.

The regexp stuff you mention above is peanuts in languages like Snobol
and Icon.  In fact, regular expressions are felt, by Snobol and Icon
programmers, to be insufficiently powerful for the sorts of things they
do.  Multi-line matches, non-regular languages, and other bits of
trickery are the bread and butter of languages like Snobol and Icon.
Note, though, that to do the things you mention above takes more space
in at least Icon than perl - that is, if you restrict yourself to pat-
terns that can be recognized using a deterministic finite state auto-
maton.  And for this restricted pattern-type, perl will probably run
faster than Icon and Snobol (but what about Spitbol?).  There are ups
and downs to everything.

I guess what I'm saying is that statements like the one I'm responding
to above indicate that people really don't know about the grand old
tradition of nonnumeric processing we see in systems like COMIT (ee
gads), SNOBOL4, Spitbol, Icon, and offshoots like awk, nawk, and now
languages which incorporate elements from these, like perl.  I really
never wanted to get into any argument here.  I've never taken a
course from a computer science department in my life (I'm currently
finishing up a PhD in Near Eastern Languages), and I feel out of my
element.  When people started taking me to task for saying that perl
wasn't uniquely suited to sorting, hashing, and matching tasks, I guess
I felt I had to say something.

As I've said before, perl is a neat tool, and if it had no usefulness,
I would not be here.

Keep on perling!

-Richard (goer@sophist.uchicago.edu)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/10/90)

Here are the three biggest things I can't really do in Perl but can do
with (some) other UNIX tools:

1. Compile some large subset of the language to portable C code.

2. Pass descriptors back and forth between programs. This is hellishly
useful for combining programs in different languages, for passing
messages securely, and for minimizing the overhead of a modular resource
controller. Practically every system in existence has some mechanism for
descriptor passing, but Perl doesn't standardize it.

3. Use signal-schedule (aka non-preemptive) threads. In various
languages I can schedule threads to execute when the program receives a
``signal''---including signals such as ``descriptor 2 is writable,''
``we have just taken control of resource x,'' etc. This makes coroutines
and multithreaded programs a joy rather than a pain to write. Different
kinds of signals are available under different UNIX variants, but Perl
could certainly standardize the basic mechanism.

If Perl had these features, my objections about portability, efficiency,
and interoperability would almost disappear.

---Dan

tchrist@convex.COM (Tom Christiansen) (12/11/90)

In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>2. Pass descriptors back and forth between programs. This is hellishly
>useful for combining programs in different languages, for passing
>messages securely, and for minimizing the overhead of a modular resource
>controller. Practically every system in existence has some mechanism for
>descriptor passing, but Perl doesn't standardize it.

I'm not sure what you want here.  It's pretty easy in perl to 
connect processes through a file descriptor:	

    if (open(HANDLE, "|-")) {
        # parent code writes to HANDLE
    } else {
        # child code just reads from STDIN per usual
    }

or else:

    if (open(HANDLE, "-|")) {
        # parent code reads from HANDLE
    } else {
        # child code just writes to STDOUT per usual
    }

(I know -- I didn't check that open returned undefined.)
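
With the missing check filled in, the second form might look like the
sketch below.  The message text and the @lines array are placeholders;
the point is only that the forking open returns undef when the fork
fails, 0 in the child, and the child's pid in the parent.

```perl
use strict;
use warnings;

my @lines;

# open(HANDLE, "-|") forks.  It returns the child's pid in the parent,
# 0 in the child, and undef if the fork failed.
my $pid = open(HANDLE, "-|");
die "cannot fork: $!" unless defined $pid;

if ($pid) {
    # parent: read whatever the child prints on its STDOUT
    @lines = <HANDLE>;
    close(HANDLE) or warn "child exited with status $?\n";
    print "parent read: $_" for @lines;
} else {
    # child: STDOUT is connected to the parent's HANDLE
    print "hello from child $$\n";
    exit 0;
}
```

The close() in the parent also waits for the child, so its exit status
ends up in $?.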

You can also play more elaborate games using explicit pipe() calls.
For unrelated processes, you're going to have to use named pipes
or sockets.   How does C offer a more standard mechanism for
passing descriptors which Perl can't use?
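
The explicit pipe() route for related processes can be sketched as
follows.  The handle names and the message are placeholders, and this
only covers a parent and its forked child; it does not pass a
descriptor to an unrelated process.

```perl
use strict;
use warnings;

# One-way pipe from child to parent, set up by hand.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # child: keep only the write end
    close($reader);
    print {$writer} "status: ok\n";
    close($writer);
    exit 0;
}

# parent: keep only the read end, so EOF arrives once the child closes
close($writer);
my @report = <$reader>;
close($reader);
waitpid($pid, 0);
print "child said: $_" for @report;
```

Each side closes the end it does not use; if the parent kept its copy
of the write end open, the read would never see end-of-file.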

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/12/90)

As quoted from <1990Dec8.020706.28417@midway.uchicago.edu> by goer@ellis.uchicago.edu (Richard L. Goerwitz):
+---------------
| I fully expect that once perl stabilizes, and the documentation begins
| to become readily accessible, it will become widely installed, and will
+---------------

I daresay Perl is more widely installed than Icon.  And more widely installed
than nawk.

As far as documentation goes --- the Perl manpage was enough to get me going
in Perl.  I have yet to find Griswold & Griswold locally, and I'm not in a
position to order it from Prentice-Hall; the Icon interpreter sits, compiled
but unused, on my machine waiting for me to learn enough Icon to try to use
it.  Additionally, the number of Perl examples in the distribution is more
than enough to get one started even without the manual.  (I say "started", not
"fully knowledgeable"... but I can't even get started from the Icon examples.)

On the other hand, I want to learn Icon.  Show me something along the lines of
the Perl manpage and I'll see what I can accomplish.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/12/90)

As quoted from <110275@convex.convex.com> by tchrist@convex.COM (Tom Christiansen):
+---------------
| In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
| >2. Pass descriptors back and forth between programs. This is hellishly
| 
| I'm not sure what you want here.  It's pretty easy in perl to 
| connect processes through a file descriptor:	
+---------------

I think he means ioctl(streamfd, I_SENDFD, fd) or the socket equivalent.

Problem is, I use plenty of machines that *don't* support it.  This is about
as portable as that alarm() replacement that uses setitimer... less so, in
fact, as SVR3 with Streams support has I_SENDFD.

++Brandon

goer@quads.uchicago.edu (Richard L. Goerwitz) (12/12/90)

In article <1990Dec12.005636.17687@NCoast.ORG>
allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:

>On the other hand, I want to learn Icon.  Show me something along the lines of
>the Perl manpage and I'll see what I can accomplish.

Brief overviews of Icon can be ftp'd from a number of sites, the best being
cs.arizona.edu.  Cd to icon/ and grab "technical" reports 90-1, 90-2, 90-6.
Don't expect Icon to fill perl's shoes.  It's not a good system administra-
tion language.  It occupies a different niche.

-Richard

worley@compass.com (Dale Worley) (12/13/90)

   X-Name: Brandon S. Allbery KB8JRR

   I have yet to find Griswold & Griswold locally, and I'm not in a
   position to order it from Prentice-Hall;

Most bookstores will special order books.

Dale

Dale Worley		Compass, Inc.			worley@compass.com
--
The workers ceased to be afraid of the bosses.  It's as if they suddenly
threw off their chains. -- a Soviet journalist, about the Donruss coal strike

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/14/90)

In article <1990Dec12.010203.18075@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
> As quoted from <110275@convex.convex.com> by tchrist@convex.COM (Tom Christiansen):
> | In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> | >2. Pass descriptors back and forth between programs. This is hellishly
> | I'm not sure what you want here.  It's pretty easy in perl to 
> | connect processes through a file descriptor:	
> I think he means ioctl(streamfd, I_SENDFD, fd) or the socket equivalent.

Yes. I seem to have this huge pile of secure resource managers, all of
which create a descriptor pointing to a secure resource, then fork off a
child process with access to that descriptor. In the latest program I
tried an option for passing the descriptor up to another process, which
would take control. You can't imagine how much better life would be if
there were a standard protocol and library routine for this job.

Add non-preemptive threads to this message-passing language, and it
would finally be conceivable that UNIX system resources be implemented
in---and used by---Perl.

> Problem is, I use plenty of machines that *don't* support it.  This is about
> as portable as that alarm() replacement that uses setitimer...  less so, in
> fact, as SVR3 with Streams support has I_SENDFD.

Uh, other way around? SVR3 with Streams does indeed have I_SENDFD, which
is why descriptor passing *is* so portable.

---Dan

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/16/90)

As quoted from <15024:Dec1322:59:4090@kramden.acf.nyu.edu> by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
+---------------
| In article <1990Dec12.010203.18075@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
| > as portable as that alarm() replacement that uses setitimer...  less so, in
| > fact, as SVR3 with Streams support has I_SENDFD.
| 
| Uh, other way around? SVR3 with Streams does indeed have I_SENDFD, which
| is why descriptor passing *is* so portable.
+---------------

Oops.  Mental typo.  Yeah, but the machine I use most often doesn't have
Streams (we have the add-on package but have yet to install it because the
network board we want to use it with is so unreliable...).

Non-preemptive multithreading:  yesterday at work, I laid out a nonpreemptive
thread system of sorts.  It's not particularly easy to rewrite something big
like Perl to use the implementation I came up with, but it's there.  (I have
some fairly bizarre convolutions between a 4GL and a Prolog interpreter at
work to get a job done --- bizarre it may be, but it runs 20x faster than the
4GL-only version.  The threading is for the interface to the Prolog, so if
necessary I can have more than one running.)

++Brandon