[comp.lang.perl] Which to use :- Perl, Python, Icon, ... ?

bevan@cs.man.ac.uk (Stephen J Bevan) (03/05/91)

In the past I've written various programs to extract information from
files.  To do this I've used :- Common Lisp, Emacs Lisp, awk, sh, ksh
and csh.  As this is a bit of a nightmare as regards maintenance, I'd
like to move to a single language for doing this sort of task.  The
likely contenders for this seem to be Perl, Python and Icon.

Rather than FTP all of them and wade through the documentation, I was
wondering if anybody has experiences with them that they'd like to
share?
I'm particularly interested in comments from people who have used (or
at least looked at) more than one of them.

As a guide to the sort of things I'm interested in :-

  + Does the language have any arbitrary limits? e.g. the length of a
    line ... etc.

  + How fast is it?  This can be compared to whatever you like, but
    preferably to each other.  I'm not really interested in whether XXX
    is only X% quicker than YYY on average (whatever that may be).

  + Does it give `reasonable' error messages?  i.e. something better
    than the equivalent of `awk bailing out on line X'.

  + Does it have a debugger?  If not, are there any extra facilities
    for debugging above and beyond simply inserting `printf' (change
    as appropriate) statements.

  + Does it allow conditional interpretation/compilation? i.e.
    anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Some other points to note :-

  + The scripts won't be distributed, so arguments that XXX is
    installed on more machines than YYY aren't relevant.

  + The fact that Perl has a C like syntax is NOT an advantage in my
    book.  (I'm not saying it's a disadvantage either, I just don't
    think it's important either way).

email/post as you think is appropriate (note the followup to
comp.lang.misc).  I will summarize email replies after a suitable period. 

Thanks in advance,

Stephen J. Bevan		bevan@cs.man.ac.uk

tchrist@convex.COM (Tom Christiansen) (03/07/91)

From the keyboard of bevan@cs.man.ac.uk (Stephen J Bevan):

I'll answer the questions from a perl standpoint, and let others
address the other languages.  I'm afraid I'm not eminently qualified
to make comparisons, as I've only looked at, not actually programmed
in, the other languages mentioned.

:As a guide to the sort of things I'm interested in :-
:
:  + Does the language have any arbitrary limits? e.g. the length of a
:    line ... etc.

This is one of perl's strong points: it has no such arbitrary limits.  Any
variable (including of course the current input line) can be as big as
your virtual memory space allows, your regexps can be as long as you want,
you can have any number of elements in your lists and tables, and binary
data is handled gracefully, meaning strings with null bytes or 8-bit data
don't confuse anything.

If you say:

    $unix = `cat vmunix`;

and you can malloc that much, you'll get the whole kernel in your string.


:  + How fast is it?  This can be compared to whatever you like, but
:    preferably to each other.  I'm not really interested in whether XXX
:    is only X% quicker than YYY on average (whatever that may be).

It really depends on what you're trying to do.  For some things, Perl
has a speed comparable to C: these are things that require a lot of
pattern-matching.  Perl's regexp facilities are very rich, powerful, and
highly optimized.  Perl even goes Boyer-Moore one better.  Instead of
just compiling your regexps, you can go as far as pre-indexing the
search space with the study operator, which can make your program
really scream.  On the other hand, for most general programming, it is
definitely going to be slower than C (2-5x), but faster than if you'd
stitched the component sed and awk pieces together with sh.
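
For instance, a minimal sketch (the file name and patterns here are
made up for illustration):

    # scan a big file for several patterns; study pre-indexes
    # each line so the repeated matches below run faster
    open(LOG, "biglog") || die "can't open biglog: $!\n";
    while (<LOG>) {
        study;                          # index $_ for the matches below
        print "panic:  $_" if /panic/;
        print "reboot: $_" if /reboot/;
        print "login:  $_" if /login/;
    }
    close(LOG);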


:  + Does it give `reasonable' error messages?  i.e. something better
:    than the equivalent of `awk bailing out on line X'.

Perl always tells you the file and line in error, as well as printing out
the next two tokens it was looking at when the parser got indigestion.
It can find runaway strings or regexps (newlines aren't end-of-statement
tokens -- they're just like spaces) and tell you where you went wrong.
It doesn't just bail out on the first error, but tries to recover
and report as many errors as it can find.  Usually it makes a decent
stab at this.

Furthermore, it's easy to generate your own error messages, even
of this form.  You can get at the current errno/strerror message, the
current file and line information for you (or your caller), and even any
syntax or other fatal runtime errors that occurred in code protected by an
exception handler.
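
A quick sketch of these facilities (the subroutine name is made up):

    # die appends "at FILE line N." if the message has no newline,
    # and $! holds the current errno/strerror message
    open(CONF, "/etc/motd") || die "can't open /etc/motd: $!";

    # caller tells a subroutine where it was called from
    sub whine {
        local($pkg, $file, $line) = caller;
        print STDERR "$_[0] at $file line $line.\n";
    }

    # eval traps fatal runtime errors, leaving the message in $@
    eval '$x = 1 / 0';
    print "caught: $@" if $@;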

:  + Does it have a debugger?  If not, are there any extra facilities
:    for debugging above and beyond simply inserting `printf' (change
:    as appropriate) statements.

Yes, perl comes with a fully-featured symbolic debugger that does most
of what you're used to if you're an sdb or dbx user: breakpoints,
searching, tracing, examining or setting of variables, etc.  In fact,
the perl debugger is often used as a form of interactive perl to test
out new things, since you are free to type legit (or otherwise) perl 
at the debugger, and you'll get immediate feedback.
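
You start it by adding the -d switch; from memory, a session runs
something like this (the script name is made up):

    % perl -d myprog
    DB<1> b 12          # set a breakpoint at line 12
    DB<2> c             # continue until it's hit
    DB<3> p $count      # print out a variable
    DB<4> s             # single-step one statement
    DB<5> $count = 10   # any perl you type is simply executed
    DB<6> q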

:  + Does it allow conditional interpretation/compilation? i.e.
:    anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Sure -- perl will call cpp if you add a -P flag to it.  That way
you can say things like

#if defined(sun) || defined(vax)

An alternative is to execute system-dependent code (symlinks, dbm
stuff, etc) in an active exception handler to trap any fatals that
might occur if one of these isn't supported by your O/S.
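
Something along these lines (the paths are made up):

    # on a system without symlinks, the symlink call would be a
    # fatal "unimplemented" error; eval traps it instead
    eval 'symlink("/tmp/real", "/tmp/alias")';
    if ($@) {
        print STDERR "no symlinks here, falling back to a copy\n";
        # ... fall-back code would go here ...
    }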

A few other features perhaps worth noting:

+  It's easy to develop library routines for run-time inclusion by your 
   other programs.  These library modules have control over visibility of
   identifiers so you don't accidentally stomp on commonly-used variable
   or function names.  Scoping is in general dynamic,
   although the package protection mentioned above provides for a sort of
   static scoping on a module basis, i.e. modules can have their own
   private "file static" kind of identifiers.  Modules can also have
   private initialization code.
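
   A bare-bones sketch of such a library (all names made up; note that
   ' is the package delimiter):

    # mylib.pl -- loaded at run-time with: require 'mylib.pl';
    package mylib;

    $count = 0;                 # really $mylib'count, so it's private

    sub main'bump {             # define the entry points in main
        $count++;
    }

    sub main'tally {
        $count;
    }

    1;                          # require insists the file return true

   A program then just says require 'mylib.pl'; and calls &bump and
   &tally without ever touching $count.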

+  Perl is very well integrated into the UNIX environment.  It's easy 
   to open pipes to and from other processes, set up co-routines via pipes
   and forks, and get at low-level things like file descriptors to dup or
   fcntl or ioctl.  Perl already has hooks for most of the common system
   calls and C library routines, so you have to call fewer external
   programs, which speeds up your program.  For system calls not covered,
   you can get at them via the syscall() function if your system supports
   it.  You can even link in your own C functions if you want, such as for
   interacting with SQL or the like.  Extensive library routines are
   already provided, doing things like getopts, tcsh-like completion, 
   "infinite precision" arithmetic, termcap functions, plus a bunch more.

+  While only scalars (atoms) and lists are what the purists might call
   first-class data types (anonymous temporaries), you can pass all three
   basic data types (scalars, lists, tables AKA assoc arrays) as function
   params or return values.  For other semi-data types, like filehandles,
   picture formats, regular expressions, and subroutines, you can use a
   scalar variable instead of a literal identifier and indirect through
   it.  And of course, if all else fails, you can always use an eval.
   This means you can write a function that builds a new, uniquely-named
   function and returns you the name of the new function, assign that to a
   variable, and call the new function through the variable.
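
   A small sketch of that last trick (names made up):

    # build a uniquely named function on the fly, then call it
    # indirectly through a scalar holding its name
    sub make_adder {
        local($n) = @_;
        local($name) = "adder_$n";
        eval "sub $name { \$_[0] + $n; }";
        $name;                          # hand back the new name
    }

    $func = &make_adder(5);
    print &$func(10), "\n";             # prints 15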

+  Other support facilities include translators for your old sed and awk 
   scripts, a sort of "perl lint" switch (-w), and a security checking
   mechanism for setuid programs that catches brain-dead moves that are
   likely to be exploitable by a cracker, something even C can't do.
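
   For instance (file names made up):

    % a2p report.awk > report.pl        # translate an awk script
    % s2p cleanup.sed > cleanup.pl      # likewise for sed
    % perl -w report.pl                 # and let -w grumble at it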

+  Because Perl derives so heavily from C, sed, and awk, if you already 
   know these, your ramp-up time will be much less than if you were
   really starting from scratch.

--tom
--
	I get so tired of utilities with arbitrary, undocumented,
	compiled-in limits.  Don't you?

Tom Christiansen		tchrist@convex.com	convex!tchrist

chrise@hpnmdla.hp.com (Chris Eich) (03/07/91)

  + Does the language have any arbitrary limits? e.g. the length of a
    line ... etc.

The perl man page says:

    While none of the built-in data types have any arbitrary size limits
    (apart from memory size), there are still a few arbitrary limits: a
    given identifier may not be longer than 255 characters; sprintf is
    limited on many machines to 128 characters per field (unless the
    format specifier is exactly %s); and no component of your PATH may
    be longer than 255 if you use -S.

There is another limit (due to the use of yacc):  expression complexity.

On my HP-UX 7.0 system (3.044 perl), the following code:

    $Expr = '(1)';
    while (eval '$Val = ' . $Expr) {
        print $Val, "\n";
        $Expr = '(' . $Expr . '+1)';
    }
    print $@, "\n";

demonstrates that limit like so:

    1
    2
    3
    ...
    140
    141
    142
    yacc stack overflow in file (eval) at line 1, next 2 tokens "1)"

I ran into this today while trying to get around the exact same problem
in bc(1)!  Larry, is this worth a mention on the man page?

Chris