[comp.lang.perl] perl

tchrist@convex.COM (Tom Christiansen) (06/14/90)

In article <18498@well.sf.ca.us> gregs@well.sf.ca.us (Greg Strockbine) writes:
>I'm just starting to look at perl. Is there a good reason
>to use it instead of sed, awk, etc.?

That's a good question, the quick answer to which, IMHO, is yes.  I know
this'll probably spark yet another net jihad, but I'm nonetheless going to
try to substantiate that claim.

Most of us have written, or at least seen, shell scripts from hell.  While
often touted as one of UNIX's strengths because they're conglomerations of
small, single-purpose tools, these shell scripts quickly grow so complex
that they're cumbersome and hard to understand, modify, and maintain.

Because perl is one program rather than a dozen others (sh, awk, sed, tr,
wc, sort, grep, ...), it is usually clearer to express yourself in perl
than in sh and allies, and often more efficient as well.  You don't need
as many pipes, temporary files, or separate processes to do the job.  You
don't need to go shoving your data stream out to tr and back and to sed
and back and to awk and back and to sort and back and then to sed and
back again.  Doing so can often be slow, awkward, and/or confusing.

Anyone who's ever tried to pass command line arguments into a sed script
of moderate complexity or above can attest to the fact that getting the
quoting right is not a pleasant task.  In fact, quoting in general in the
shell is just not a pleasant thing to code or to read.

In a heterogeneous computing environment, the available versions of many
tools vary too much from one system to the next to be utterly reliable.
Does your sh understand functions on all your machines?  What about your
awk?  What about local variables?  It is very difficult to do complex
programming without being able to break a problem up into subproblems of
lesser complexity.  You're forced to resort to using the shell to call
other shell scripts and let UNIX's power of spawning processes serve as
your subroutine mechanism, which is inefficient at best.  That means your
script will require several separate scripts to run, and getting all these
installed, working, and maintained on all the different machines in your
local configuration is painful.  

Maybe if nawk had been available sooner and for free and for all
architectures, I would use it for more, but it isn't free (yes, there's
gawk, but that's not been out long) and actually isn't powerful enough for
some of the things I need to do.  Perl is free, and its Configure script
has knowledge of how to compile perl for a veritable plethora of different
hardware and software platforms.

Besides being faster, perl is a more powerful tool than sh, sed, or awk.
I realize these are fighting words in some camps, but so be it.  There
exists a substantial niche between shell programming and C programming
that perl conveniently fills.  Tasks of this nature seem to arise
extremely often in the realm of systems administration.  Since a system
administrator almost invariably has far too much to do to devote a week to
coding up every task before him in C, perl is especially useful for him.
Larry Wall, perl's author, has been known to call it "a shell for C
programmers."

In what ways is perl more powerful than the individual tools?  The list
is pretty long, so what follows is not necessarily exhaustive.
To begin with, you don't have to worry about arbitrary and annoying
restrictions on string length, input line length, or number of elements in
an array.  These are all virtually unlimited, i.e. limited to your
system's address space and virtual memory size.

Perl's regular expression handling is far and away the best I've ever
seen.  For one thing, you don't have to remember which tool wants which
particular flavor of regular expressions, or lament the fact that one
tool doesn't allow (..|..) constructs or +'s or \b's or whatever.   With
perl, it's all the same, and as far as I can tell, a proper superset of
all the others.

Perl has a fully functional symbolic debugger (written, of course, in
perl) that is an indispensable aid in debugging complex programs.  Neither
the shell nor sed/awk/sort/tr/... has such a thing.

Perl has a loop control mechanism that's more powerful even than C's.  You
can do the equivalent of a break or continue (last and next in perl) of
any arbitrary loop, not merely the nearest enclosing one.  You can even do
a kind of continue (redo) that doesn't trigger the re-initialization part
of a loop, something you want to do from time to time.
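
For instance, a minimal sketch of labeled loop control (the label and the
data are made up):

    @lines = ('alpha beta', 'gamma quit delta', 'epsilon');
    LINE: foreach $line (@lines) {
        foreach $word (split(' ', $line)) {
            last LINE if $word eq 'quit';   # "break" out of both loops
            next LINE if $word eq 'beta';   # "continue" the *outer* loop
            print "$word\n";                # prints "alpha", then "gamma"
        }
    }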

Perl's data-types and operators are richer than the shells' or awk's,
because you have scalars, numerically-indexed arrays (lists), and
string-indexed (hashed) arrays.  Each of these holds arbitrary data
values, including floating point numbers, for which mathematical built-in
functions and power operators are available.
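
A quick sketch of the three side by side (the values are invented):

    $pi = 3.14159;                       # scalar
    @primes = (2, 3, 5, 7, 11);          # numerically-indexed array
    %uid = ('root', 0, 'daemon', 1);     # string-indexed (hashed) array
    print sqrt($pi), " ", 2 ** $primes[2], " ", $uid{'root'}, "\n";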

As for operators, to start with, you've got all of C's (except for
addressing operators, which aren't relevant), so unlike in awk, you don't
have to remember whether ~ or ^ or ^= or whatever are really there.
Furthermore, you've got distinct relational operators for strings
versus numeric operations: == for numeric equality (0x10 == 16) and 'eq'
for string equality ('010' ne '8'), and all the other possibilities as
well.  You've got a range operator, so you can have expressions like
(1..10) or even ('a'..'zzz').   You can use it to say things like
    if (/^From/ .. /^$/) { # process mail header
or 
    if (/^$/ .. eof) { # process mail body
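
To make those fragments concrete, here's a minimal sketch that tags each
line of a mail message read from stdin (the HEAD/BODY labels are invented):

    while (<>) {
        if (/^From/ .. /^$/) {
            print "HEAD: $_";       # from the From line through the blank
        } else {
            print "BODY: $_";
        }
    }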

There's a string repetition operator, so ('-' x 72) is a row of dashes.

You can operate on entire arrays conveniently, not just with things like
push and pop and join and split, but also with array slices:
    @a = @b[$i..$j];
and built-in mapcar-like abilities for arrays, like
    for (@list) { s/^foo//; }
and
    for $x (@list) { $x *= 3; }
or
    @x = grep(!/^#/, @y);
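
Put together in a self-contained sketch (the data are made up):

    @y = ('# comment', 'alpha', 'beta', '# more', 'gamma');
    @x = grep(!/^#/, @y);                # ('alpha', 'beta', 'gamma')
    @a = @x[0..1];                       # slice: ('alpha', 'beta')
    for $x (@a) { $x =~ tr/a-z/A-Z/; }   # mapcar-like: modify in place
    print join(' ', @a), "\n";           # prints "ALPHA BETA"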

Speaking of lisp, you can generate strings, perhaps with sprintf(), and
then eval them.  That way you can generate code on the fly.  You can even
do lambda-type functions that return newly-created functions that you can
call later.  The scoping of variables is dynamic; fully recursive
subroutines are supported, and you can pass or return any type of data
into or out of your subroutines.
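
A minimal sketch of the eval trick (the subroutine name is made up):

    $name = 'tally';
    eval "sub $name { \$total += \$_[0]; \$total; }";
    print &tally(5), ' ', &tally(2), "\n";      # prints "5 7"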

You have a built-in automatic formatter for generating pretty-printed
forms with automatic pagination and headers and center-justified and
text-filled fields like "%(|fmt)s" if you can imagine what that would
actually be were it legal.
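
For instance, a minimal report sketch, shown flush left because format
definitions are column-sensitive (the fields are invented):

format STDOUT_TOP =
Login     Real Name
--------- --------------------
.
format STDOUT =
@<<<<<<<< @<<<<<<<<<<<<<<<<<<<
$login,   $name
.
($login, $name) = ('larry', 'Larry Wall');
write;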

There's a mechanism for writing suid programs that can be made more secure
than even C programs thanks to an elaborate data-tracing mechanism that
understands the "taintedness" of data derived from external sources.  It
won't let you do anything really stupid that you might not have thought of.

You have access to just about any system-related function or system call,
like ioctl's, fcntl, select, pipe and fork, getc, socket and bind and
connect and accept, and indirect syscall() invocation, as well as things
like getpwuid(), gethostbyname(), etc.  You can read in binary data laid
out by a C program or system call using structure-conversion templates.
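
For instance, a minimal sketch of unpacking a C-style record (the file
name and layout are invented):

    open(FILE, 'records.dat') || die "can't open records.dat: $!";
    read(FILE, $buf, 24);       # two 4-byte longs plus a 16-byte string
    ($uid, $gid, $name) = unpack('l l a16', $buf);
    print "$uid $gid $name\n";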

At the same time you can get at the high-level shell-type operations like
the -r or -w tests on files or `backquote` command interpolation.  You can
do file-globbing with the <*.[ch]> notation or do low-level readdir()s as
suits your fancy.
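
A short sketch of those idioms in play:

    foreach $f (<*.[ch]>) {
        print "$f\n" if -w $f;       # only the writable ones
    }
    chop($host = `hostname`);        # backquotes, just as in the shell
    print "on $host\n";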

Dbm files can be accessed using simple array notation.  This is really
nice for dealing with system databases (aliases, news, ...), for efficient
access to large data-sets, and for keeping persistent data.
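
A minimal sketch (the dbm file name is made up):

    dbmopen(%visits, '/tmp/visits', 0666) || die "dbmopen: $!";
    $visits{$ENV{'USER'}}++;                 # persists between runs
    print "$visits{$ENV{'USER'}} visits so far\n";
    dbmclose(%visits);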

Don't be dismayed by the apparent complexity of what I've just discussed.
Perl is actually very easy to learn because so much of it derives from 
existing tools.  It's like an interpreted C with sh, sed, awk, and a lot
more built into it.

I hope this answers your question.

--tom
--

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

wwm@pmsmam.uucp (Bill Meahan) (06/15/90)

OK, I'm convinced - where can I get the LATEST version of perl?

BTW I don't have Internet access so I can't FTP from anywhere.
I used to have a method, but it doesn't seem to work any more.
-- 
Bill Meahan  WA8TZG		uunet!mailrus!umich!pmsmam!wwm
I don't speak for Ford - the PR department does that!

"stupid cat" is unnecessarily redundant

jv@mh.nl (Johan Vromans) (06/16/90)

In article <103056@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
| ... lots of reasons why to use perl ...

To which I would like to add the splendid programming support
environment (including multi-window symbolic debugger) available in
GNU Emacs.

|		 "EMACS belongs in <sys/errno.h>: Editor too big!"

That's the one...

	Johan
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------

merlyn@iwarp.intel.com (Randal Schwartz) (09/13/90)

In article <1990Sep11.211401.1556@ccu1.aukuni.ac.nz>, russell@ccu1 (Russell J Fulton;ccc032u) writes:
| I noticed the two scripts posted in response to a request for reaper programs
| were Pearl scripts. We are relatively new to UNIX and I have not come across
| Pearl before. Could some kind soul please send me a brief description of
| Pearl, and information on where to get it. (Or a pointer to where I can get
| the information.)
| 
| It looks like a powerful tool for doing admin work!

A public reply, in case there are other lurkers with the same request...

Perl is a freely available (under the GNU Copyleft) program written by
Larry Wall, the author of the 'rn' newsreader (and a prolific hacker
and writer, I might add).

Perl is a mixture of sed, awk, sh, C, and your favorite wishlist.
It's best at handling nearly any task that you would have used a
convoluted shell script for, and then some.  It's also *very*
portable, thanks to a fairly robust Configure script-- it's even
running on MS-DOS.

Well, here, let me quote the manpage...

NAME
     perl - Practical Extraction and Report Language

DESCRIPTION
     Perl is an interpreted language optimized for scanning arbi-
     trary  text  files,  extracting  information from those text
     files, and printing reports based on that information.  It's
     also  a good language for many system management tasks.  The
     language is intended to be practical  (easy  to  use,  effi-
     cient,  complete)  rather  than  beautiful  (tiny,  elegant,
     minimal).  It combines (in  the  author's  opinion,  anyway)
     some  of the best features of C, sed, awk, and sh, so people
     familiar with those languages should have little  difficulty
     with  it.  (Language historians will also note some vestiges
     of csh, Pascal,  and  even  BASIC-PLUS.)  Expression  syntax
     corresponds  quite  closely  to C expression syntax.  Unlike
     most Unix utilities, perl does  not  arbitrarily  limit  the
     size  of your data--if you've got the memory, perl can slurp
     in your whole file as a  single  string.   Recursion  is  of
     unlimited  depth.   And  the hash tables used by associative
     arrays grow as necessary to  prevent  degraded  performance.
     Perl  uses sophisticated pattern matching techniques to scan
     large amounts of data very quickly.  Although optimized  for
     scanning  text, perl can also deal with binary data, and can
     make dbm files look like associative arrays  (where  dbm  is
     available).   Setuid  perl scripts are safer than C programs
     through a dataflow tracing  mechanism  which  prevents  many
     stupid  security  holes.   If  you have a problem that would
     ordinarily use sed or awk or sh, but it exceeds their  capa-
     bilities  or must run a little faster, and you don't want to
     write the silly thing in C, then perl may be for you.  There
     are  also  translators to turn your sed and awk scripts into
     perl scripts.  OK, enough hype.


Perl can be fetched anon-ftp from devvax.jpl.nasa.gov:/pub/perl.3.0,
as well as anon-uucp from osu-cis.  Many other sites also stock Perl.
Perl was posted to comp.sources.unix a while back.

Support is excellent.  Perl has its own newsgroup "comp.lang.perl" and
mailing list "perl-users-request@virginia.edu".  Larry reads and posts
frequently, and in his spare time even answers private mail questions.
He's fast to respond to bugs/feature-wishlists.  (Perl 3.0 is already
at patchlevel 28 after having been released only 9 months ago.)

Perl comes with a 70-page manpage, and will soon be documented in a
Nutshell Handbook with over 200 pages of full examples, detailed
descriptions of operations, tutorials, and cookbook-style Perl
recipes.

Read (and post to!) comp.lang.perl for further information.

print "Just another Perl hacker,"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

lisch@mentor.com (Ray Lischner) (10/19/90)

When compiling Perl, patchlevel 36, on an HP Apollo DN4500
running Domain/OS 10.3 with the native C compiler,
version 6.7 (316), the optimizer generates bad code
for eval.c.  Using -opt 2 works (or -W0,-opt,2 for /bin/cc).

-- 
Ray Lischner        UUCP: {uunet,apollo,decwrl}!mntgfx!lisch

david@cs.odu.edu (Wm. David Vegh) (02/27/91)

I am rather new to the perl scene and I am looking for a book/reference
that will help me learn it.  Any suggestions?

Also, which is the latest version of perl? (and where to ftp from...)


Thanks in advance,

	David

================
david@cs.odu.edu

emv@ox.com (Ed Vielmetti) (04/03/91)

(is there a uucp mapping project member in the house?)

i'm interested in developing code that does sanity checking on uucp
map entries.  ideally you would feed it one of the many files in
comp.mail.maps and it would flag any errors and format the
information in the map uniformly.  it would be good to make it
interactive or batch so that errors could be fixed or so that it could
be used to explore the uucp maps.

this could well make use of a number of external databases or servers;
for instance, you'd like to get the latitude/longitude right, either
from the zip code or from the city name.  the telephone number should
match up similarly.  map entries which were too old (per the #W line)
would be flagged as such.

with the help of pathalias, it could form a nice browser; if you want
to see who is connected to who and both ends of the link, it should be
straightforward to trace the path.  

just thinking about all the possible things you might want to do, and
what all the external dbm's you would want to keep around between runs
so that you could make lookups arbitrarily quick.  say you want to
answer the query "show me all the sites in alabama connected to
uunet".  or "where is handwriting research corp".  or to the extent
that people put real information in their maps "who is running news
and mail on a mac".

thoughts?  ideas?  specs?  working code :-) ?  the geographic name
server at martini.eecs.umich.edu port 3000 is a home for some of this
information.  there has to be a telco database lying around somewhere
to get at least rough agreements on phone numbers.  i have a skeleton
command line parser that could be thrown at the interactive part.
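
to give the flavor of the batch side, a minimal sketch in perl (the #X
field letters come from the map format; the checks themselves are just
placeholders for real ones):

    #!/usr/bin/perl
    # scan map entries; complain about empty or missing fields
    while (<>) {
        if (/^#([A-Z])\s*(.*)/) {
            ($field, $value) = ($1, $2);
            $seen{$field}++;
            print "$ARGV:$.: empty #$field field\n" unless $value;
        }
    }
    foreach $f ('N', 'S', 'O', 'C', 'E', 'T', 'P', 'L', 'W') {
        print "missing #$f line\n" unless $seen{$f};
    }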

send me mail or post, i'll summarize as needed.  if you have ideas,
comp.mail.uucp would be best; if you have code, contact me and i'll
try to coordinate mushing things together.

-- 
 Msen	Edward Vielmetti
/|---	moderator, comp.archives
	emv@msen.com

"With all of the attention and publicity focused on gigabit networks,
not much notice has been given to small and largely unfunded research
efforts which are studying innovative approaches for dealing with
technical issues within the constraints of economic science."  
							RFC 1216