jimmy@therien.cs.UAlberta.CA (Jimmy the X-man) (02/02/91)
Folks, I have been a sh/awk/sed user for a number of years; however, I
was told by someone recently that perl can replace ALL of the above.
Is this true?  Was this the intention of the original design?  In
other words, if I concentrate on developing my perl skills, can I
forget about ever using the other utilities?

I looked at the man page; it is quite detailed.  However, learning
this language requires a non-trivial effort, and I would like to get
some input from experienced users before I change directions.

As a start, can anyone mail me the perl translations of the following
scripts?  If anyone wants to respond, it would help if you put in
comments which described what a given line does.  This will help me
learn perl and decide if I should switch to using this language.

---------------------------------------------------------------------
#!/bin/sh -v
# Copy selected files to a backup directory for project quant.  No
# arguments required.
QUA=/usr/alta/edm/jones/quant
if [ ! -d $HOME/Bak_quant ]; then
    mkdir $HOME/Bak_quant
fi
# find files which do NOT match files in the name-list; do not descend
# into the /usr/alta/edm/jones/quant/inc directory
for f in `find $QUA ! \( -name "Makefile" -o -name "Makefile.bak" \
        -o -name "lib*.a" -o -name "quant" -o -name '*.o' \) \
        -print -name inc -prune | sed '1d'`
do
    cp -r $f $HOME/Bak_quant/
done
) &
---------------------------------------------------------------------

A perl script is needed to put single quotes around the hex numbers in
a Fortran DATA statement of the form

      DATA A,B, /3, Z0F/, C, D, E,
     C      /ZABC012, 10.0, ZFFF/

The DATA statement has 1 or more lines, and all lines have spaces/tabs
at the beginning.  If a line is a continuation of the previous line,
there is a character in the 6th column/field.  There are many other
lines of code in the program; this operation needs to be done only on
the lines beginning with ^[ TAB]*DATA (line-type 1)
      OR
lines following line-type 1 and having ANY character in column/field 6
(line-type 2).  A line-type 2 may have spaces/tabs between the
character in column/field 6 and the following characters, if any,
      OR
line-type 2's following other line-type 2 lines.

Thus, the above should generate

      DATA A,B, /3, Z'0F'/, C, D, E,
     C      /Z'ABC012', 10.0, Z'FFF'/

---------------------------------------------------------------------

Thanks in advance for the responses, folks.

Jimmy Mason    jimmy@cs.UAlberta.CA
--
jimmy@cs.UAlberta.CA
tchrist@convex.COM (Tom Christiansen) (02/03/91)
From the keyboard of jimmy@therien.cs.UAlberta.CA (Jimmy the X-man):
:I have been a sh/awk/sed user for a number of years; however, I was
:told by someone recently that perl can replace ALL of the above.  Is
:this true?  Was this the intention of the original design?  In other
:words, if I concentrate on developing my perl skills, can I forget
:about ever using the other utilities?
:
:I looked at the man page; it is quite detailed.  However, learning
:this language requires a non-trivial effort and I would like to get
:some input from experienced users before I change directions.

Oh my!  In some newsgroups, that's sufficient incitement to start a
riot, if not an outright jihad.  Of course, you've found a safe haven
for such revolutionary thoughts here, so we'll be gentle with you.
Just don't tell the folks in alt.religion.computers what heresy's
afoot here.

I'll leave the exposition of the original design goals to perl's
author, and give you the impressions of a mere user.

Donning my vestments as advocatus diaboli: just because you learn
something new doesn't mean you should entirely forget the old.  UNIX
is a pluralistic environment in which many paths can lead to the
solution, some more circuitously than others.  Different problems can
call for different solutions.  If you force yourself to program in
nothing but perl, you may be short-changing yourself and taking the
more tortuous route for some problems.

Now, that being said, I shall reveal my true colors as perl disciple
and perhaps not infrequent evangelist.  Perl is without question the
greatest single program to appear to the UNIX community (although it
runs elsewhere too) in the last 10 years.  It makes programming fun
again.  It's simple enough to get a quick start on, but rich enough
for some very complex tasks.  I frequently learn new things about it
despite having used it nearly daily since Larry first released it to
the general public about four years ago.
Heck, sometimes even Larry learns something new about perl!  The
Artist is not always aware of the breadth and depth of his own work.

[You can skip ahead to the translations of the quoted scripts (the
lines marked with /^:) if you want.  In the next few pages I elaborate
on why a programmer would want to hone his perl skills.  I plagiarize
myself (and one or two others) a good deal here from things I've
posted earlier.]

It is indeed the case that perl is a strict superset of sed and awk,
so much so that s2p and a2p translators exist for these utilities.
You can do anything in perl that you can do in the shell, although
perl is not, strictly speaking, a command interpreter.  It's more of a
programming language.

Most of us have written, or at least seen, shell scripts from hell.
While often touted as one of UNIX's strengths because they're
conglomerations of small, single-purpose tools, these shell scripts
quickly grow so complex that they're cumbersome and hard to
understand, modify, and maintain.  After a certain point of
complexity, the strength of the UNIX philosophy of having many
programs that each does one thing well becomes its weakness.

The big problem with piping tools together is that there is only one
pipe.  This means that several different data streams have to get
multiplexed into a single data stream, then demuxed on the other end
of the pipe.  This wastes processor time as well as human brain power.

For example, you might be shuffling a list of filenames through a
pipe, but you also want to indicate that certain files have a
particular attribute, and others don't.  (E.g., certain files are more
than ten days old.)  Typically, this information is encoded in the
data stream by appending or prepending some special marker string to
the filename.  This means that both the pipe feeder and the pipe
reader need to know about it.  Not a pretty sight.
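For contrast, here's a minimal perl sketch of that same situation (the
ten-day age test comes from the example above; everything else is
invented for illustration).  The marker-string encoding disappears
because the attribute can simply live in an associative array keyed by
filename:

```perl
#!/usr/bin/perl
# No marker strings: record the "more than ten days old" attribute
# in an associative array keyed by filename, alongside the name list.
foreach $f (@ARGV) {
    $old{$f} = (-M $f > 10);        # -M gives the file's age in days
}
# The downstream "reader" consults the array instead of parsing markers.
foreach $f (@ARGV) {
    print $f, ($old{$f} ? " (old)" : ""), "\n";
}
```

The feeder and the reader here are just two loops in one process, so
neither needs to know about any encoding convention.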
Because perl is one program rather than a dozen others (sh, awk, sed,
tr, wc, sort, grep, ...), it is usually clearer to express yourself in
perl than in sh and allies, and often more efficient as well.  You
don't need as many pipes, temporary files, or separate processes to do
the job.  You don't need to go shoving your data stream out to tr and
back and to sed and back and to awk and back and to sort and back and
then back to sed and back again.  Doing so can often be slow, awkward,
and/or confusing.

Anyone who's ever tried to pass command-line arguments into a sed
script of moderate complexity or above can attest to the fact that
getting the quoting right is not a pleasant task.  In fact, quoting in
the shell in general is just not a pleasant thing to code or to read.

In a heterogeneous computing environment, the available versions of
many tools vary too much from one system to the next to be utterly
reliable.  Does your sh understand functions on all your machines?
What about your awk?  What about local variables?  It is very
difficult to do complex programming without being able to break a
problem up into subproblems of lesser complexity.  You're forced to
resort to using the shell to call other shell scripts and to let
UNIX's power of spawning processes serve as your subroutine mechanism,
which is inefficient at best.  That means your script will require
several separate scripts to run, and getting all of these installed,
working, and maintained on all the different machines in your local
configuration is painful.

With perl, all you need do is get it installed on the system -- which
is really pretty easy thanks to Larry's Configure program -- and after
that you're home free.  Perl is even beginning to be included in some
software and hardware vendors' standard software distributions.  I
predict we'll see a lot more of this in the next couple of years.

Besides being faster, perl is a more powerful tool than sh, sed, or
awk.
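To make the "fewer pipes and processes" point concrete, here is a
hedged sketch: a pipeline on the order of `grep error | tr a-z A-Z |
sort -u` done as a single perl process (the pattern "error" is
invented for illustration):

```perl
#!/usr/bin/perl
# One process standing in for grep | tr | sort -u:
# keep lines matching a pattern, fold them to upper case,
# and print each distinct result once, in sorted order.
while (<>) {
    next unless /error/i;   # the grep stage
    tr/a-z/A-Z/;            # the tr stage
    $seen{$_}++;            # collect for the sort -u stage
}
print sort keys %seen;
```

No quoting gymnastics, no multiplexing, and the intermediate data
never leaves the process.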
I realize these are fighting words in some camps, but so be it.  There
exists a substantial niche between shell programming and C programming
that perl conveniently fills.  Tasks of this nature seem to arise with
extreme frequency in the realm of systems administration.  Since a
system administrator almost invariably has far too much to do to
devote a week to coding up every task before him in C, perl is
especially useful for him.  Larry Wall, perl's author, has been known
to call it "a shell for C programmers."  I like to think of it as a
"BASIC for UNIX."  I realize that this carries both good and bad
connotations.  So be it.

In what ways is perl more powerful than the individual tools?  The
list is pretty long, so what follows is not necessarily exhaustive.

To begin with, you don't have to worry about arbitrary and annoying
restrictions on string length, input line length, or number of
elements in an array.  These are all virtually unlimited, i.e.,
limited only by your system's address space and virtual memory size.

Perl's regular expression handling is far and away the best I've ever
seen.  For one thing, you don't have to remember which tool wants
which particular flavor of regular expressions, or lament the fact
that one tool doesn't allow (..|..) constructs, or +'s, or \b's, or
whatever.  With perl, it's all the same, and as far as I can tell, a
proper superset of all the others.

Perl has a fully functional symbolic debugger (written, of course, in
perl) that is an indispensable aid in debugging complex programs.
Neither the shell nor sed/awk/sort/tr/... has such a thing.

Perl has a loop control mechanism that's more powerful even than C's.
You can do the equivalent of a break or continue (last and next in
perl) on any arbitrary loop, not merely the nearest enclosing one.
You can even do a kind of continue (redo) that doesn't trigger the
re-initialization part of a loop, something you may want to do from
time to time.
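A small sketch of the labeled-loop claim (the data are made up):
`last LABEL` breaks out of any enclosing loop by name, which C's bare
break cannot do without a goto.

```perl
#!/usr/bin/perl
# Scan lines of numbers for the first negative value, leaving BOTH
# loops the moment it turns up.
@lines = ("3 1 4", "1 -5 9", "2 6 5");

LINE: for $i (0 .. $#lines) {
    @nums = split(' ', $lines[$i]);
    for $j (0 .. $#nums) {
        if ($nums[$j] < 0) {
            ($where_i, $where_j) = ($i, $j);
            last LINE;      # exits the outer loop, not just the inner one
        }
    }
}
print "first negative value at line $where_i, word $where_j\n";
```

`next LINE` works the same way for the continue case.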
Perl's data types and operators are richer than the shells' or awk's:
you have scalars, numerically-indexed arrays (lists), and
string-indexed (hashed) arrays.  Each of these holds arbitrary data
values, including floating point numbers, for which built-in math
functions and a power operator are available.  It can handle binary
data of arbitrary size.

In a lisp-like vein, you can generate strings, perhaps with sprintf(),
and then eval them.  That way you can generate code on the fly.  You
can even do lambda-type functions that return newly-created functions
that you can call later.  The scoping of variables is dynamic, fully
recursive subroutines are supported, and you can pass or return any
type of data into or out of your subroutines.

You have a built-in automatic formatter for generating pretty-printed
forms with automatic pagination and headers and center-justified and
text-filled fields like "%(|fmt)s", if you can imagine what that would
actually be were it legal.

There's a mechanism for writing suid programs that can be made more
secure than even C programs, thanks to an elaborate data-tracing
mechanism that understands the "taintedness" of data derived from
external sources.  It won't let you do anything really stupid that you
might not have thought of.

You have access to just about any system-related function or system
call, like ioctl's, fcntl, select, pipe and fork, getc, socket and
bind and connect and attach, and indirect syscall() invocation, as
well as things like getpwuid(), gethostbyname(), etc.  You can read in
binary data laid out by a C program or system call using
structure-conversion templates.

At the same time you can get at the high-level shell-type operations
like the -r or -w tests on files or `backquote` command interpolation.
You can do file-globbing with the <*.[ch]> notation or do low-level
readdir()s as suits your fancy.

Dbm files can be accessed using simple array notation.
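The "structure-conversion templates" referred to are pack() and
unpack(); a minimal sketch, assuming an invented record layout of a
long, a short, and an 8-byte string:

```perl
#!/usr/bin/perl
# pack() lays data out the way a C struct might;
# unpack() with the same template pulls the fields back apart.
$template = "l s a8";                       # long, short, 8-char string
$record = pack($template, 1991, 42, "quant");
($year, $num, $name) = unpack($template, $record);
$name =~ s/\0+$//;                          # strip the NUL padding from a8
print "$year $num $name\n";
```

The same templates read binary data written by C programs, so long as
the template matches the struct's actual layout.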
This is really nice for dealing with system databases (aliases, news,
...), efficient access mechanisms over large data sets, and for
keeping persistent data.

Don't be dismayed by the apparent complexity of what I've just
discussed.  Perl is actually very easy to learn because so much of it
derives from existing tools.  It's like an interpreted C with sh, sed,
awk, and a lot more built into it.  There's a very considerable
quantity of code out there already written in perl, including
libraries to handle things you don't feel like reimplementing.

:As a start, can anyone mail me the perl translations of the following
:scripts?  If anyone wants to respond, it would help if you put in
:comments which described what a given line does.  This will help me
:learn perl and decide if I should switch to using this language.

:#!/bin/sh -v
:# Copy selected files to a backup directory for project quant.  No
:# arguments required.
:QUA=/usr/alta/edm/jones/quant
: if [ ! -d $HOME/Bak_quant ]; then
:   mkdir $HOME/Bak_quant
: fi
:# find files which do NOT match files in the name-list; do not descend
:# into the /usr/alta/edm/jones/quant/inc directory
: for f in `find $QUA ! \( -name "Makefile" -o -name "Makefile.bak" \
:   -o -name "lib*.a" -o -name "quant" -o -name '*.o' \) \
:   -print -name inc -prune | sed '1d'`
: do
:   cp -r $f $HOME/Bak_quant/
: done
:) &

Well, I don't know what the trailing ") &" means -- it looks like
something was truncated.  This is actually something that I might well
do in shell.  One advantage to using perl, though, is that you can
write a short-circuit find in it -- it runs faster because it doesn't
have to stat all the child nodes.  There's a good example of this on
pages 304-305 of the Camel Book (Larry and Randal's book on Perl), so
I won't do that here.  Instead, I'll just use what you have there and
do basically a verbatim translation (untested).

#!/usr/bin/perl
$QUA = '/usr/alta/edm/jones/quant';     # gotta love that Latin
$bak = "$ENV{'HOME'}/Bak_quant";        # where the backups go
if (!
-d $bak) {                              # make the backup dir if needed
    mkdir($bak, 0777) || die "can't mkdir $bak: $!";
}
for $f (`find $QUA blah blah blah`) {   # same find arguments as above
    chop $f;                            # strip the trailing newline
    print `cp -r $f $bak`;              # copy, echoing any output
}

As you see, there's not a whole lot of difference, so I wouldn't
bother, unless I were concerned about speed, in which case I'd use the
fast-find mentioned above.

:A perl script is needed to put single quotes around the hex numbers
:in a Fortran DATA statement of the form
:
:      DATA A,B, /3, Z0F/, C, D, E,
:     C      /ZABC012, 10.0, ZFFF/
:
:The DATA statement has 1 or more lines and all lines have
:spaces/tabs at the beginning.  If a line is a continuation
:of the previous line, there is a character in the 6th
:column/field.  There are many other lines of code in the program;
:this operation needs to be done only on the lines beginning
:with ^[ TAB]*DATA (line-type 1)
:      OR
:lines following line-type 1 and having ANY character in
:column/field 6 (line-type 2).  A line-type 2 may have spaces/tabs
:between the character in column/field 6 and the following
:characters, if any,
:      OR
:line-type 2's following other line-type 2 lines.
:
:Thus, the above should generate
:      DATA A,B, /3, Z'0F'/, C, D, E,
:     C      /Z'ABC012', 10.0, Z'FFF'/

Now, here's a problem that's more to perl's liking.  Perl was designed
to be a text-processing language, and while it's grown to be far more
than that, able to handle files and processes and binary data as well,
the degree to which your application meets this criterion will
determine how good a fit perl is as a solution.

I think this may do your job for you.  It seemed to work on the few
test cases I put together.  I didn't really write that first line that
way the first time; it had if clauses.  I scrunched it together into
?: after I thought it worked, for the sake of brevity, the soul of job
security. :-)

#!/usr/bin/perl -p
next unless $in_data = ($in_data ?
/^ {5}\S/ : /^[ \t]*DATA/i);
s/\bZ([0-9A-F]+)\b/Z'$1'/gi;

(The first regexp starts us off on a line beginning with optional
whitespace and DATA; once we're in a DATA statement, the other regexp
keeps us there so long as column 6 holds a continuation character.
The substitution then wraps quotes around each Z-prefixed hex
constant.)

You could make it into an in-place edit by changing the invocation
line to

#!/usr/bin/perl -pi.bak

which would also keep a back-up for you.  There are probably many
other ways of writing this.  If anyone read this far, perhaps they'll
offer some.

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el?  Now
he can finally have the whole O/S built-in to his editor like he
always wanted!" --me (Tom Christiansen <tchrist@convex.com>)