bevan@cs.man.ac.uk (Stephen J Bevan) (03/05/91)
In the past I've written various programs to extract information from
files.  To do this I've used :- Common Lisp, Emacs Lisp, awk, sh, ksh
and csh.  As this is a bit of a nightmare as regards maintenance, I'd
like to move to a single language for doing this sort of task.  The
likely contenders seem to be Perl, Python and Icon.  Rather than FTP
all of them and wade through the documentation, I was wondering if
anybody has experiences with them that they'd like to share?  I'm
particularly interested in comments from people who have used (or at
least looked at) more than one of them.

As a guide to the sort of things I'm interested in :-

  + Does the language have any arbitrary limits?  e.g. the length of
    a line ... etc.

  + How fast is it?  This can be compared to whatever you like, but
    preferably to each other.  I'm not really interested if XXX is
    only X% quicker than YYY on average (whatever that may be).

  + Does it give `reasonable' error messages?  i.e. something better
    than the equivalent of `awk bailing out on line X'.

  + Does it have a debugger?  If not, are there any extra facilities
    for debugging above and beyond simply inserting `printf' (change
    as appropriate) statements.

  + Does it allow conditional interpretation/compilation?  i.e.
    anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Some other points to note :-

  + The scripts won't be distributed, so arguments that XXX is
    installed on more machines than YYY aren't relevant.

  + The fact that Perl has a C-like syntax is NOT an advantage in my
    book.  (I'm not saying it's a disadvantage either; I just don't
    think it's important either way.)

email/post as you think is appropriate (note the followup to
comp.lang.misc).  I will summarize email replies after a suitable
period.

Thanks in advance,
Stephen J. Bevan          bevan@cs.man.ac.uk
tchrist@convex.COM (Tom Christiansen) (03/07/91)
From the keyboard of bevan@cs.man.ac.uk (Stephen J Bevan):

I'll answer the questions from a Perl standpoint, and let others
address the other languages.  I'm afraid I'm not eminently qualified
to make comparisons, as I've only looked at, not actually programmed
in, the other languages mentioned.

:As a guide to the sort of things I'm interested in :-
:
: + Does the language have any arbitrary limits? e.g. the length of a
:   line ... etc.

This is one of Perl's strong points: it has no such arbitrary limits.
Any variable (including of course the current input line) can be as
big as your virtual memory space allows, your regexps can be as long
as you want, you can have any number of elements in your lists and
tables, and binary data is handled gracefully, meaning strings with
null bytes or 8-bit data don't confuse anything.  If you say:

    $unix = `cat vmunix`;

and you can malloc that much, you'll get the whole kernel in your
string.

: + How fast is it? This can be compared to whatever you like, but
:   preferably to each other.  I'm not really interested if XXX is
:   only X% quicker than YYY on average (whatever that may be).

It really depends on what you're trying to do.  For some things, Perl
has a speed comparable to C: these are things that require a lot of
pattern matching.  Perl's regexp facilities are very rich, powerful,
and highly optimized.  Perl even goes Boyer-Moore one better: instead
of just compiling your regexps, you can do something like compiling
the pattern space, using the study operator, which can make your
program really scream.  On the other hand, for most general
programming it is definitely going to be slower than C (2-5x), but
faster than if you'd stitched the component sed and awk pieces
together with sh.

: + Does it give `reasonable' error messages? i.e. something better
:   than the equivalent of `awk bailing out on line X'.

Perl always tells you the file and line in error, as well as printing
out the next two tokens it was looking at when the parser got
indigestion.
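As an illustration of the study operator mentioned above, here is a
minimal sketch of my own (not from the original post); study is purely
a performance hint, so removing it never changes which matches are
found:

```perl
# Sketch: study() the subject string once, then run many matches
# against it.  study builds per-character lookup tables for $text to
# speed up repeated searches; it does not affect match results.
$text = "the quick brown fox jumps over the lazy dog\n" x 1000;
study $text;

$count = 0;
foreach $word ('fox', 'dog', 'quick') {
    $count++ while $text =~ /$word/g;
}
print "$count matches\n";    # 3000: each word occurs once per line
```

Note that study applies to the string being searched, not to the
pattern, so it pays off only when one string is matched against many
patterns.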
It can find run-away strings or regexps (newlines aren't
end-of-statement tokens -- they're just like spaces) and tell you
where you went wrong.  It doesn't just bail out on the first error,
but tries to recover and report as much as it can find.  Usually it
makes a decent stab at this.

Furthermore, it's easy for you to generate your own error messages,
even of this form.  You can get at the current errno/strerror
message, the current file and line information for you (or your
caller), and even any syntax or other fatal runtime errors that
occurred in code protected by an exception handler.

: + Does it have a debugger? If not, are there any extra facilities
:   for debugging above and beyond simply inserting `printf' (change
:   as appropriate) statements.

Yes, Perl comes with a fully-featured symbolic debugger that does
most of what you're used to if you're an sdb or dbx user:
breakpoints, searching, tracing, examining or setting of variables,
etc.  In fact, the Perl debugger is often used as a form of
interactive Perl to test out new things, since you are free to type
legit (or otherwise) Perl at the debugger, and you'll get immediate
feedback.

: + Does it allow conditional interpretation/compilation? i.e.
:   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Sure -- Perl will call cpp if you add a -P flag to it.  That way you
can say things like

    #if defined(sun) || defined(vax)

An alternative is to execute system-dependent code (symlinks, dbm
stuff, etc.) inside an active exception handler to trap any fatals
that might occur if one of these isn't supported by your O/S.

A few other features are perhaps worth noting:

  + It's easy to develop library routines for run-time inclusion by
    your other programs.  These library modules have control over the
    visibility of identifiers, so you don't accidentally stomp on
    commonly-used variable or function names.
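The "exception handler" referred to above is Perl's eval; a minimal
sketch of my own (not from the post) of trapping a fatal runtime
error:

```perl
# Sketch: code wrapped in eval that dies (here a division by zero,
# which is normally fatal) leaves its message in $@ instead of
# terminating the program.  $x keeps the divide from being folded
# away at compile time.
$result = eval { $x = 0; 1 / $x };
if ($@) {
    print "caught: $@";     # e.g. "Illegal division by zero at ..."
} else {
    print "got $result\n";
}
```

The same wrapper works around symlink, dbm, or syscall code that may
not exist on a given O/S: the program keeps running and $@ tells you
what failed.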
    Scoping is in general dynamic in nature, although the package
    protection mentioned above provides a sort of static scoping on a
    module basis, i.e. modules can have their own private "file
    static" kind of identifiers.  Modules can also have private
    initialization code.

  + Perl is very well integrated into the UNIX environment.  It's
    easy to open pipes to and from other processes, set up
    co-routines via pipes and forks, and get at low-level things like
    file descriptors to dup or fcntl or ioctl.  Perl already has
    hooks for most of the common system calls and C library routines,
    so you have to call fewer external programs, which speeds up your
    program.  For system calls not covered, you can get at them via
    the syscall() function if your system supports it.  You can even
    link in your own C functions if you want, such as for interacting
    with SQL or the like.  Extensive library routines are already
    provided, doing things like getopts, tcsh-like completion,
    "infinite precision" arithmetic, and termcap functions, plus a
    bunch more.

  + While only scalars (atoms) and lists are what the purists might
    call first-class data types (anonymous temporaries), you can pass
    all three basic data types (scalars, lists, and tables AKA assoc
    arrays) as function params or return values.  For other semi-data
    types, like filehandles, picture formats, regular expressions,
    and subroutines, you can use a scalar variable instead of a
    literal identifier and indirect through it.  And of course, if
    all else fails, you can always use an eval.  This means you can
    write a function that builds a new, uniquely-named function and
    returns you the name of the new function; you then assign that to
    a variable and call the new function through it.

  + Other support facilities include translators for your old sed and
    awk scripts, a sort of "perl lint" switch (-w), and a security
    checking mechanism for setuid programs that catches brain-dead
    moves likely to be exploitable by a cracker -- something even C
    can't do.
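The eval trick for building a uniquely-named function at run time and
calling it through a variable might look like the following sketch
(the names make_adder and adder1 are my own invention, not from the
post):

```perl
# Sketch: build a new, uniquely-named subroutine with a string eval
# and return its name; the caller stores the name in a scalar and
# calls the new sub indirectly through it.
$counter = 0;

sub make_adder {
    local($amount) = @_;
    local($name) = "adder" . ++$counter;    # invented naming scheme
    eval "sub $name { local(\$x) = \@_; \$x + $amount; }";
    die $@ if $@;
    $name;                                  # hand back the new name
}

$fn = &make_adder(5);      # $fn now holds the string "adder1"
print &$fn(10), "\n";      # calls adder1(10), printing 15
```

The backslashes keep $x and @_ out of the interpolated eval string
while $amount and $name are filled in, so each generated sub carries
its own increment baked in.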
  + Because Perl derives so heavily from C, sed, and awk, if you
    already know these, your ramp-up time will be much less than if
    you were really starting from scratch.

--tom
--
I get so tired of utilities with arbitrary, undocumented, compiled-in
limits.  Don't you?

    Tom Christiansen        tchrist@convex.com      convex!tchrist
chrise@hpnmdla.hp.com (Chris Eich) (03/07/91)
> + Does the language have any arbitrary limits?  e.g. the length of a
>   line ... etc.
The perl man page says:

    While none of the built-in data types have any arbitrary size
    limits (apart from memory size), there are still a few arbitrary
    limits: a given identifier may not be longer than 255 characters;
    sprintf is limited on many machines to 128 characters per field
    (unless the format specifier is exactly %s); and no component of
    your PATH may be longer than 255 if you use -S.
There is another limit (due to the use of yacc): expression complexity.
On my HP-UX 7.0 system (3.044 perl), the following code:
    $Expr = '(1)';
    while (eval '$Val = ' . $Expr) {
        print $Val, "\n";
        $Expr = '(' . $Expr . '+1)';
    }
    print $@, "\n";
demonstrates that limit like so:
    1
    2
    3
    ...
    140
    141
    142
    yacc stack overflow in file (eval) at line 1, next 2 tokens "1)"
I ran into this today while trying to get around the exact same problem
in bc(1)! Larry, is this worth a mention on the man page?
Chris