[comp.lang.perl] Should I SWITCH to perl ?

jimmy@therien.cs.UAlberta.CA (Jimmy the X-man) (02/02/91)

Folks,

I have been a sh/awk/sed user for a number of years; however, I was 
told by someone recently that perl can replace ALL of the above. Is
this true ? Was this the intention of the original design ? In other
words, if I concentrate in developing my perl skills, can I forget
about ever using the other utilities ?

I looked at the man page; it is quite detailed. However, learning 
this language requires a non-trivial effort and I would like to get
some input from experienced users before I change directions.

As a start, can anyone mail me the perl translations of the following
scripts ?  If anyone wants to respond, it would help if you put
comments which described what a given line does. This will help me
learn perl and decide if I should switch to using this language.

---------------------------------------------------------------------
#!/bin/sh -v
# Copy selected files to a backup directory for project quant. No
# arguments required.
QUA=/usr/alta/edm/jones/quant
   if [ ! -d $HOME/Bak_quant ];then
   mkdir $HOME/Bak_quant
   fi
# find files which do NOT match files in the name-list; do not descend
# into the /usr/alta/edm/jones/quant/inc directory
   for f in `find $QUA ! \(  -name "Makefile" -o -name "Makefile.bak" \
             -o -name "lib*.a" -o -name "quant" -o -name '*.o' \) \
             -print -name inc -prune | sed '1d'`
   do
   cp -r $f $HOME/Bak_quant/
   done
) &

---------------------------------------------------------------------

A perl script needed to put single quotes in the hex numbers in a
Fortran DATA statement of the form

       DATA A,B, /3, Z0F/, C, D, E,
     C /ZABC012, 10.0, ZFFF/

The DATA statement has 1 or more lines and all lines have
spaces/tabs at the beginning. If a line is a continuation
of the previous line, there is a character in the 6th 
column/field. There are many other lines of code in the program;
this operation needs to be done only on the lines beginning
with ^[ TAB]*DATA (line-type 1) 
      OR 
lines following line-type 1 and having ANY character in 
column/field 6 (line-type 2). A line type 2 may have spaces/tabs
between the character in column/field 6 and the following 
characters, if any
      OR 
line-type 2's following other line-type 2 lines

Thus, the above should generate
       DATA A,B, /3, Z'0F'/, C, D, E,
     C /Z'ABC012', 10.0, Z'FFF'/

---------------------------------------------------------------------

Thanks in advance for the responses, folks.


Jimmy Mason
jimmy@cs.UAlberta.CA
--
jimmy@cs.UAlberta.CA

tchrist@convex.COM (Tom Christiansen) (02/03/91)

From the keyboard of jimmy@therien.cs.UAlberta.CA (Jimmy the X-man):
:I have been a sh/awk/sed user for a number of years; however, I was 
:told by someone recently that perl can replace ALL of the above. Is
:this true ? Was this the intention of the original design ? In other
:words, if I concentrate in developing my perl skills, can I forget
:about ever using the other utilities ?

:I looked at the man page; it is quite detailed. However, learning 
:this language requires a non-trivial effort and I would like to get
:some input from experienced users before I change directions.

Oh my!  In some newsgroups, that's sufficient incitement to start a riot,
if not an outright jihad.  Of course, you've found a safe haven for such
revolutionary thoughts, so we'll be more gentle on you here.  Just
don't tell the folks in alt.religion.computers what heresy's afoot.

I'll leave the exposition of the original design goals to perl's author,
and give you the impressions of a mere user.

Donning my vestments as advocatus diaboli, just because you learn
something new, doesn't mean you should entirely forget the old.  UNIX is a
pluralistic environment in which many paths can lead to the solution, some
more circuitously than others.  Different problems can call for different
solutions.  If you force yourself to program in nothing but perl, you may
be short-changing yourself and taking the more tortuous route for some
problems.

Now, that being said, I shall now reveal my true colors as perl disciple
and perhaps not infrequent evangelist.  Perl is without question the
greatest single program to appear to the UNIX community (although it runs
elsewhere too) in the last 10 years.  It makes programming fun again.  It's
simple enough to get a quick start on, but rich enough for some very
complex tasks.  I frequently learn new things about it despite having used
it nearly daily since Larry first released it to the general public about
four years ago or so.  Heck, sometimes even Larry learns something new
about perl!  The Artist is not always aware of the breadth and depth of
his own work.

[You can skip ahead to the translations by searching for /^: if you want.  In the
 next few pages I elaborate on why a programmer would want to hone his
 perl skills.  I plagiarize myself (and one or two others) a good deal
 here from things I've posted earlier.]

It is indeed the case that perl is a strict superset of sed and awk, so
much so that s2p and a2p translators exist for these utilities.  You can
do anything in perl that you can do in the shell, although perl is not
strictly speaking a command interpreter.  It's more of a programming
language.

Most of us have written, or at least seen, shell scripts from hell.  While
often touted as one of UNIX's strengths because they're conglomerations of
small, single-purpose tools, these shell scripts quickly grow so complex
that they're cumbersome and hard to understand, modify, and maintain.  After
a certain point of complexity, the strength of the UNIX philosophy of
having many programs that each does one thing well becomes its weakness.

The big problem with piping tools together is that there is only one
pipe.  This means that several different data streams have to get
multiplexed into a single data stream, then demuxed on the other end of
the pipe.  This wastes processor time as well as human brain power.

For example, you might be shuffling through a pipe a list of filenames,
but you also want to indicate that certain files have a particular
attribute, and others don't.  (E.g., certain files are more than ten
days old.)  Typically, this information is encoded in the data stream
by appending or prepending some special marker string to the filename.
This means that both the pipe feeder and the pipe reader need to know
about it.  Not a pretty sight.

Because perl is one program rather than a dozen others (sh, awk, sed, tr,
wc, sort, grep, ...), it is usually clearer to express yourself in perl
than in sh and allies, and often more efficient as well.  You don't need
as many pipes, temporary files, or separate processes to do the job.  You
don't need to go shoving your data stream out to tr and back, to sed and
back, to awk and back, to sort and back, and then to sed and back again.
Doing so is often slow, awkward, and/or confusing.

Anyone who's ever tried to pass command line arguments into a sed script
of moderate complexity or above can attest to the fact that getting the
quoting right is not a pleasant task.  In fact, quoting in general in the
shell is just not a pleasant thing to code or to read.

In a heterogeneous computing environment, the available versions of many
tools vary too much from one system to the next to be utterly reliable.
Does your sh understand functions on all your machines?  What about your
awk?  What about local variables?  It is very difficult to do complex
programming without being able to break a problem up into subproblems of
lesser complexity.  You're forced to resort to using the shell to call
other shell scripts and let UNIX's power of spawning processes serve as
your subroutine mechanism, which is inefficient at best.  That means your
script will require several separate scripts to run, and getting all these
installed, working, and maintained on all the different machines in your
local configuration is painful.  With perl, all you need do is get
it installed on the system -- which is really pretty easy thanks to
Larry's Configure program -- and after that you're home free.

Perl is even beginning to be included in some software and hardware
vendors' standard software distributions.  I predict we'll see a lot
more of this in the next couple years.

Besides being faster, perl is a more powerful tool than sh, sed, or awk.
I realize these are fighting words in some camps, but so be it.  There
exists a substantial niche between shell programming and C programming
that perl conveniently fills.  Tasks of this nature seem to arise with
extreme frequency in the realm of systems administration.  Since a system
administrator almost invariably has far too much to do to devote a week to
coding up every task before him in C, perl is especially useful for him.
Larry Wall, perl's author, has been known to call it "a shell for C
programmers."  I like to think of it as a "BASIC for UNIX."  I realize
that this carries both good and bad connotations.  So be it.

In what ways is perl more powerful than the individual tools?  This list
is pretty long, so what follows is not necessarily an exhaustive list.
To begin with, you don't have to worry about arbitrary and annoying
restrictions on string length, input line length, or number of elements in
an array.  These are all virtually unlimited, i.e., limited only by your
system's address space and virtual memory size.

Perl's regular expression handling is far and above the best I've ever
seen.  For one thing, you don't have to remember which tool wants which
particular flavor of regular expressions, or lament the fact that one
tool doesn't allow (..|..) constructs, or +'s, or \b's, or whatever.   With
perl, it's all the same, and as far as I can tell, a proper superset of
all the others.

Perl has a fully functional symbolic debugger (written, of course, in
perl) that is an indispensable aid in debugging complex programs.  Neither
the shell nor sed/awk/sort/tr/... have such a thing.

Perl has a loop control mechanism that's more powerful even than C's.  You
can do the equivalent of a break or continue (last and next in perl) of
any arbitrary loop, not merely the nearest enclosing one.  You can even do
a kind of continue that doesn't trigger the re-initialization part of a
loop, something you want to do from time to time.

Perl's data-types and operators are richer than the shells' or awk's,
because you have scalars, numerically-indexed arrays (lists), and
string-indexed (hashed) arrays.  Each of these holds arbitrary data
values, including floating point numbers, for which built-in math
subroutines and power operators are available.  It can handle
binary data of arbitrary size.

As in Lisp, you can generate strings, perhaps with sprintf(), and
then eval them.  That way you can generate code on the fly.  You can even
do lambda-type functions that return newly-created functions that you can
call later. The scoping of variables is dynamic, fully recursive subroutines
are supported, and you can pass or return any type of data into or out
of your subroutines.

You have a built-in automatic formatter for generating pretty-printed
forms with automatic pagination and headers and center-justified and
text-filled fields like "%(|fmt)s" if you can imagine what that would
actually be were it legal.

There's a mechanism for writing suid programs that can be made more secure
than even C programs thanks to an elaborate data-tracing mechanism that
understands the "taintedness" of data derived from external sources.  It
won't let you do anything really stupid that you might not have thought of.

You have access to just about any system-related function or system call,
like ioctl's, fcntl, select, pipe and fork, getc, socket and bind and
connect and attach, and indirect syscall() invocation, as well as things
like getpwuid(), gethostbyname(), etc.  You can read in binary data laid
out by a C program or system call using structure-conversion templates.

At the same time you can get at the high-level shell-type operations like
the -r or -w tests on files or `backquote` command interpolation.  You can
do file-globbing with the <*.[ch]> notation or do low-level readdir()s as
suits your fancy.

Dbm files can be accessed using simple array notation.  This is really
nice for dealing with system databases (aliases, news, ...), efficient
access mechanisms over large data-sets, and for keeping persistent data.

Don't be dismayed by the apparent complexity of what I've just discussed.
Perl is actually very easy to learn because so much of it derives from
existing tools.  It's like an interpreted C with sh, sed, awk, and a lot
more built in to it.  There's a very considerable quantity of code out 
there already written in perl, including libraries to handle things
you don't feel like reimplementing.  

:As a start, can anyone mail me the perl translations of the following
:scripts ?  If anyone wants to respond, it would help if you put
:comments which described what a given line does. This will help me
:learn perl and decide if I should switch to using this language.

:#!/bin/sh -v
:# Copy selected files to a backup directory for project quant. No
:# arguments required.
:QUA=/usr/alta/edm/jones/quant
:   if [ ! -d $HOME/Bak_quant ];then
:   mkdir $HOME/Bak_quant
:   fi
:# find files which do NOT match files in the name-list; do not descend
:# into the /usr/alta/edm/jones/quant/inc directory
:   for f in `find $QUA ! \(  -name "Makefile" -o -name "Makefile.bak" \
:             -o -name "lib*.a" -o -name "quant" -o -name '*.o' \) \
:             -print -name inc -prune | sed '1d'`
:   do
:   cp -r $f $HOME/Bak_quant/
:   done
:) &

Well, I don't know what the trailing ") &" means -- it looks like
something was truncated.

This is actually something that I might well do in shell.  One advantage
though to using perl is that you can write a short-circuit find in it -- it
runs faster because it doesn't have to stat all the child nodes.  There's
a good example of this on pages 304-305 of the Camel Book (Larry and
Randal's book on Perl), so I won't do that here.  Instead, I'll just use
what you have there and do basically a verbatim translation (untested).

    #!/usr/bin/perl
    $QUA = '/usr/alta/edm/jones/quant';   # gotta love that Latin
    $bak = "$ENV{'HOME'}/Bak_quant";      # the backup directory
    if (! -d $bak) {                      # same -d test the shell has
        mkdir($bak, 0777) || die "can't mkdir $bak: $!";
    }
    for $f (`find $QUA blah blah blah`) { # your find arguments go here
        chop $f;                          # strip the trailing newline
        print `cp -r $f $bak`;
    }

As you see, there's not a whole lot of difference, so I wouldn't bother,
unless I were concerned about speed, in which case I'd use the fast-find
mentioned above.

:A perl script needed to put single quotes in the hex numbers in a
:Fortran DATA statement of the form
:
:       DATA A,B, /3, Z0F/, C, D, E,
:     C /ZABC012, 10.0, ZFFF/
:
:The DATA statement has 1 or more lines and all lines have
:spaces/tabs at the beginning. If a line is a continuation
:of the previous line, there is a character in the 6th 
:column/field. There are many other lines of code in the program;
:this operation needs to be done only on the lines beginning
:with ^[ TAB]*DATA (line-type 1) 
:      OR 
:lines following line-type 1 and having ANY character in 
:column/field 6 (line-type 2). A line type 2 may have spaces/tabs
:between the character in column/field 6 and the following 
:characters, if any
:      OR 
:line-type 2's following other line-type 2 lines
:
:Thus, the above should generate
:       DATA A,B, /3, Z'0F'/, C, D, E,
:     C /Z'ABC012', 10.0, Z'FFF'/

Now, here's a problem that's more to perl's liking.  Perl was designed to
be a text-processing language, and while it's grown to be far more than
that, able to handle files and processes and binary data as well, the
degree to which your application meets this criterion will determine how
good a fit perl is as a solution.

I think this may do your job for you.  It seemed to work on the few test
cases I put together.  I didn't actually write that first line that way
at first; it had if clauses.  Once I thought it worked, I scrunched it
together into ?: for the sake of brevity, the soul of job security. :-)

    #!/usr/bin/perl -p
    # a DATA line starts with optional blanks, then DATA; a continuation
    # line has some character in column 6 (five spaces, then a non-blank)
    next unless $in_data = ($in_data ? (/^ {5}\S/ || /^[ \t]*DATA\b/i)
                                     : /^[ \t]*DATA\b/i);
    s/\bZ([0-9A-F]+)/Z'$1'/gi;  # wrap the hex digits after Z in quotes

You could make it an in-place edit by changing the invocation line to

    #!/usr/bin/perl -pi.bak

which would also keep a back-up for you.  There are probably many other
ways of writing this.  If anyone read this far, perhaps they'll offer some.

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el?  Now
 he can finally have the whole O/S built-in to his editor like he
 always wanted!" --me (Tom Christiansen <tchrist@convex.com>)