[comp.unix.questions] perl

rfinch@caldwr.UUCP (Ralph Finch) (04/08/89)

Perl:

What is it?
What does it do?
How do I get it?

Thanks,
-- 
Ralph Finch
...ucbvax!ucdavis!caldwr!rfinch

gregs@well.sf.ca.us (Greg Strockbine) (06/14/90)

I'm just starting to look at perl. Is there a good reason
to use it instead of sed, awk, etc.?

brnstnd@kramden.acf.nyu.edu (06/14/90)

In article <18498@well.sf.ca.us> gregs@well.sf.ca.us (Greg Strockbine) writes:
> I'm just starting to look at perl. Is there a good reason
> to use it instead of sed, awk, etc.?

Sure: it can handle much longer lines. Much much longer lines. It can do
anything your shell can. It has a reasonably pleasant syntax.

It is, on the other hand, somewhat more difficult to program efficiently
than a judicious combination of sh, sed, and awk.

---Dan

tchrist@convex.COM (Tom Christiansen) (06/14/90)

In article <18498@well.sf.ca.us> gregs@well.sf.ca.us (Greg Strockbine) writes:
>I'm just starting to look at perl. Is there a good reason
>to use it instead of sed, awk, etc.?

That's a good question, the quick answer to which, IMHO, is yes.  I know
this'll probably spark yet another net jihad, but I'm nonetheless going to
try to substantiate that claim.

Most of us have written, or at least seen, shell scripts from hell.  While
often touted as one of UNIX's strengths because they're conglomerations of
small, single-purpose tools, these shell scripts quickly grow so complex
that they're cumbersome and hard to understand, modify, and maintain.

Because perl is one program rather than a dozen others (sh, awk, sed, tr,
wc, sort, grep, ...), it is usually clearer to express yourself in perl
than in sh and allies, and often more efficient as well.  You don't need
as many pipes, temporary files, or separate processes to do the job.  You
don't need to go shoving your data stream out to tr and back and to sed
and back and to awk and back and to sort back and then back to sed and
back again.  Doing so can often be slow, awkward, and/or confusing.

Anyone who's ever tried to pass command line arguments into a sed script
of moderate complexity or above can attest to the fact that getting the
quoting right is not a pleasant task.  In fact, quoting in general in the
shell is just not a pleasant thing to code or to read.

In a heterogeneous computing environment, the available versions of many
tools vary too much from one system to the next to be utterly reliable.
Does your sh understand functions on all your machines?  What about your
awk?  What about local variables?  It is very difficult to do complex
programming without being able to break a problem up into subproblems of
lesser complexity.  You're forced to resort to using the shell to call
other shell scripts and allow UNIX's power of spawning processes to serve
your subroutine mechanism, which is inefficient at best.  That means your
script will require several separate scripts to run, and getting all these
installed, working, and maintained on all the different machines in your
local configuration is painful.  

Maybe if nawk had been available sooner and for free and for all
architectures, I would use it for more, but it isn't free (yes, there's
gawk, but that's not been out long) and actually isn't powerful enough for
some of the things I need to do.  Perl is free, and its Configure script
has knowledge of how to compile perl for a veritable plethora of different
hardware and software platforms.

Besides being faster, perl is a more powerful tool than sh, sed, or awk.
I realize these are fighting words in some camps, but so be it.  There
exists a substantial niche between shell programming and C programming
that perl conveniently fills.  Tasks of this nature seem to arise
extremely often in the realm of systems administration.  Since a system
administrator almost invariably has far too much to do to devote a week to
coding up every task before him in C, perl is especially useful for him.
Larry Wall, perl's author, has been known to call it "a shell for C
programmers."

In what ways is perl more powerful than the individual tools?  The list
is pretty long, so what follows is not necessarily exhaustive.
To begin with, you don't have to worry about arbitrary and annoying
restrictions on string length, input line length, or number of elements in
an array.  These are all virtually unlimited, i.e. limited to your
system's address space and virtual memory size.
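
A quick sketch of the point (the 50000-field record here is just an
illustrative number, not anything from the post):

```perl
# No arbitrary limits: build a single "line" far longer than the
# classic 1024-byte buffers of many sed and awk implementations.
$line = join(":", 1 .. 50000);          # one very long record
@fields = split(/:/, $line);            # and split it right back apart
print scalar(@fields), " fields, ", length($line), " bytes\n";
```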

Perl's regular expression handling is far and above the best I've ever
seen.  For one thing, you don't have to remember which tool wants which
particular flavor of regular expressions, or lament the fact that one
tool doesn't allow (..|..) constructs or +'s or \b's or whatever.   With
perl, it's all the same, and as far as I can tell, a proper superset of
all the others.
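
For instance, alternation, one-or-more, and word boundaries -- which
various tools support in various incompatible subsets -- all work in one
flavor here (the sample text is made up):

```perl
# One regular expression flavor everywhere: (..|..), +, and \b all
# work in matches, substitutions, and splits alike.
$text = "cats eat bats; a cat sat on a mat";
@found = $text =~ /\b([bc]at\w*)\b/g;   # every cat- or bat-word
print "@found\n";                       # cats bats cat
```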

Perl has a fully functional symbolic debugger (written, of course, in
perl) that is an indispensable aid in debugging complex programs.  Neither
the shell nor sed/awk/sort/tr/... have such a thing.

Perl has a loop control mechanism that's more powerful even than C's.  You
can do the equivalent of a break or continue (last and next in perl) of
any arbitrary loop, not merely the nearest enclosing one.  You can even do
a kind of continue that doesn't trigger the re-initialization part of a
loop, something you want to do from time to time.
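
A small made-up example of labeled loop control (the "continue that
doesn't re-initialize" mentioned above is perl's redo):

```perl
# 'last' and 'next' can name any enclosing loop, not just the
# innermost one; 'redo' restarts a loop body without re-running
# the loop's increment step.
OUTER: foreach $i (1 .. 3) {
    foreach $j (1 .. 3) {
        next OUTER if $j > $i;    # jump straight to OUTER's next pass
        push(@pairs, "$i$j");
    }
}
print "@pairs\n";                 # 11 21 22 31 32 33
```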

Perl's data-types and operators are richer than the shells' or awk's,
because you have scalars, numerically-indexed arrays (lists), and
string-indexed (hashed) arrays.  Each of these holds arbitrary data
values, including floating point numbers, for which built-in mathematical
functions and a power operator are available.
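
All three types in a few lines (the circle-area computation is just an
invented illustration):

```perl
# Scalars, numerically-indexed arrays, and string-indexed (hashed)
# arrays, all holding arbitrary values including floating point.
$pi    = 3.14159265;              # scalar
@radii = (1, 2.5, 4);             # numerically-indexed array
%area  = ();                      # string-indexed array
foreach $r (@radii) {
    $area{$r} = $pi * $r ** 2;    # ** is the power operator
}
printf "r=2.5 area=%.2f\n", $area{2.5};   # r=2.5 area=19.63
```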

As for operators, to start with, you've got all of C's (except for
addressing operators, which aren't relevant), so you don't have to
remember whether ~ or ^ or ^= or whatever are really there, as you do in
awk.  Furthermore, you've got distinct relational operators for strings
versus numeric operations: == for numeric equality (0x10 == 16) and 'eq'
for string equality ('010' ne '8'), and all the other possibilities as
well.  You've got a range operator, so you can have expressions like
(1..10) or even ('a'..'zzz').  You can use it to say things like
    if (/^From/ .. /^$/) { ... }   # process mail header
or 
    if (/^$/ .. eof)     { ... }   # process mail body

There's a string repetition operator, so ('-' x 72) is a row of dashes.

You can operate on entire arrays conveniently, and not just with things
like push and pop and join and split, but also with array slices:
    @a = @b[$i..$j];
and built-in mapcar-like abilities for arrays, like
    for (@list) { s/^foo//; }
and
    for $x (@list) { $x *= 3; }
or
    @x = grep(!/^#/, @y);

Speaking of Lisp, you can generate strings, perhaps with sprintf(), and
then eval them.  That way you can generate code on the fly.  You can even
do lambda-type functions that return newly-created functions that you can
call later. The scoping of variables is dynamic, fully recursive subroutines
are supported, and you can pass or return any type of data into or out 
of your subroutines.
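
A toy version of the eval trick (the add5 subroutine and its name are
hypothetical, made up for the example):

```perl
# Generate perl code as a string, then eval it into existence:
# a poor man's lambda that builds an adder subroutine on the fly.
$name = "add5";
$n    = 5;
$code = sprintf('sub %s { return $_[0] + %d; }', $name, $n);
eval $code;
die $@ if $@;                 # complain if the generated code was bad
print &add5(37), "\n";        # 42
```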

You have a built-in automatic formatter for generating pretty-printed
forms with automatic pagination and headers and center-justified and
text-filled fields like "%(|fmt)s" if you can imagine what that would
actually be were it legal.

There's a mechanism for writing suid programs that can be made more secure
than even C programs thanks to an elaborate data-tracing mechanism that
understands the "taintedness" of data derived from external sources.  It
won't let you do anything really stupid that you might not have thought of.

You have access to just about any system-related function or system call,
like ioctl's, fcntl, select, pipe and fork, getc, socket and bind and
connect and accept, and indirect syscall() invocation, as well as things
like getpwuid(), gethostbyname(), etc.  You can read in binary data laid
out by a C program or system call using structure-conversion templates.
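
A sketch of the structure-template idea; the C struct and field values
here are invented, and the record is faked up in memory with pack()
rather than read from a real file:

```perl
# Decode a fixed binary record laid out like a hypothetical
#   struct rec { long id; char name[8]; };
# using a structure-conversion template with unpack().
$record = pack("l a8", 1234, "wheel");  # fake up one record in memory
($id, $name) = unpack("l a8", $record); # pull the fields back out
$name =~ s/\0+$//;                      # trim the NUL padding
print "$id $name\n";                    # 1234 wheel
```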

At the same time you can get at the high-level shell-type operations like
the -r or -w tests on files or `backquote` command interpolation.  You can
do file-globbing with the <*.[ch]> notation or do low-level readdir()s as
suits your fancy.
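
All three conveniences together, on a throwaway file (the /tmp filename
is made up for the demonstration):

```perl
# Shell-style operations from inside perl: -r file tests,
# `backquote` command interpolation, and <*.pat> file globbing.
open(TMP, "> /tmp/glob_demo.c") || die "open: $!";
print TMP "int main() { return 0; }\n";
close(TMP);
print "readable\n" if -r "/tmp/glob_demo.c";   # file test, as in sh
$today = `date`;                               # backquote interpolation
@sources = </tmp/glob_demo.[ch]>;              # glob notation
print "@sources\n";
unlink("/tmp/glob_demo.c");                    # clean up after ourselves
```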

Dbm files can be accessed using simple array notation.  This is really
nice for dealing with system databases (aliases, news, ...), efficient
access mechanisms over large data-sets, and for keeping persistent data.
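
A minimal sketch, assuming a writable /tmp and a dbm library compiled
in (the alias-file name and key are invented):

```perl
# A dbm file bound to an associative array: whatever you store
# persists on disk across runs, via plain array notation.
dbmopen(%ALIAS, "/tmp/alias_demo", 0666) || die "dbmopen: $!";
$ALIAS{"postmaster"} = "root";              # just assign to store
dbmclose(%ALIAS);
dbmopen(%ALIAS, "/tmp/alias_demo", undef);  # reopen: still there
print $ALIAS{"postmaster"}, "\n";           # root
dbmclose(%ALIAS);
unlink(</tmp/alias_demo*>);                 # clean up the dbm files
```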

Don't be dismayed by the apparent complexity of what I've just discussed.
Perl is actually very easy to learn because so much of it derives from 
existing tools.  It's like an interpreted C with sh, sed, awk, and a lot
more built into it.

I hope this answers your question.

--tom
--

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

chris@utgard.uucp (Chris Anderson) (06/14/90)

In article <18498@well.sf.ca.us> gregs@well.sf.ca.us (Greg Strockbine) writes:
>I'm just starting to look at perl. Is there a good reason
>to use it instead of sed, awk, etc.?

Absolutely!  It does much more for you than any of the other standard
utilities.  Anything you can do in them, you can do in perl... usually
faster and more portably.  Its regular expression handling is better
than that supplied with egrep or sed, it is much more efficient than
anything that I've used before for text manipulation, and you can use
it with binary files as well.

I use it a lot for systems administration duties, since the scripts 
will run without change on multiple machines (be careful, though,
perl includes functions for dealing with sockets, symbolic links, and
file locking if you compile it on a BSD machine; AT&T doesn't have those
features yet).  But you can test at runtime for missing features, so
you can still write fairly portable scripts using those functions.
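
The usual idiom for such a runtime probe (sketched here for symlink;
a call your perl wasn't compiled with dies, so wrap it in eval):

```perl
# Probe at runtime for a function this perl may not have: an
# unimplemented call dies, and eval catches the death in $@.
eval { symlink("", "") };
$has_symlink = ($@ !~ /unimplemented/);
print $has_symlink ? "symlink available\n" : "no symlink here\n";
```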

Read comp.lang.perl for awhile, and don't be scared off by the syntax.

Chris


-- 
| Chris Anderson  						       |
| QMA, Inc.		        email : {csusac,sactoh0}!utgard!chris  |
|----------------------------------------------------------------------|
| My employer never listens to me, so why should he care what I say?   |

wwm@pmsmam.uucp (Bill Meahan) (06/15/90)

OK, I'm convinced - where can I get the LATEST version of perl?

BTW I don't have Internet access so I can't FTP from anywhere.
I used to have a method, but it doesn't seem to work any more.
-- 
Bill Meahan  WA8TZG		uunet!mailrus!umich!pmsmam!wwm
I don't speak for Ford - the PR department does that!

"stupid cat" is unnecessarily redundant

jv@mh.nl (Johan Vromans) (06/16/90)

In article <103056@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
| ... lots of reasons why to use perl ...

To which I would like to add the splendid programming support
environment (including multi-window symbolic debugger) available in
GNU Emacs.

|		 "EMACS belongs in <sys/errno.h>: Editor too big!"

That's the one...

	Johan
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------

awol@vpnet.chi.il.us (Al Oomens) (12/21/90)

Does anyone have any documentation on the perl language? I have a copy
of perl that was ported to MS-DOS, but there was no documentation
whatsoever! If you have any documentation which you could mail, please
let me know. Please don't just mail me a large doc file; I'll respond
by mail, so that I won't get several copies mailed to me from all over.
			Thanks!
			Al