[comp.lang.misc] Which to use :- Perl, Python, Icon, ... ?

bevan@cs.man.ac.uk (Stephen J Bevan) (03/05/91)

In the past I've written various programs to extract information from
files.  To do this I've used :- Common Lisp, Emacs Lisp, awk, sh, ksh
and csh.  As this is a bit of a nightmare as regards maintenance, I'd
like to move to a single language for this sort of task.  The
likely contenders seem to be Perl, Python and Icon.

Rather than FTP all of them and wade through the documentation, I was
wondering if anybody has experiences with them that they'd like to
share?
I'm particularly interested in comments from people who have used (or
at least looked at) more than one of them.

As a guide to the sort of things I'm interested in :-

  + Does the language have any arbitrary limits? e.g. the length of a
    line ... etc.

  + How fast is it?  This can be compared to whatever you like, but
    preferably to each other.  I'm not really interested if XXX is only
    X% quicker than YYY on average (whatever that may be).

  + Does it give `reasonable' error messages?  i.e. something better
    than the equivalent of `awk bailing out on line X'.

  + Does it have a debugger?  If not, are there any extra facilities
    for debugging above and beyond simply inserting `printf' (change
    as appropriate) statements?

  + Does it allow conditional interpretation/compilation? i.e.
    anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Some other points to note :-

  + The scripts won't be distributed, so arguments about XXX is
    installed on more machines than YYY aren't relevant.

  + The fact that Perl has a C like syntax is NOT an advantage in my
    book.  (I'm not saying it's a disadvantage either, I just don't
    think it's important either way).

email/post as you think is appropriate (note the followup to
comp.lang.misc).  I will summarize email replies after a suitable period. 

Thanks in advance,

Stephen J. Bevan		bevan@cs.man.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/06/91)

In article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk> bevan@cs.man.ac.uk (Stephen J Bevan) writes:
>   + Does the language have any arbitrary limits? e.g. the length of a
>     line ... etc.

*Languages* rarely set arbitrary (semantic) limits. I regularly write
text-processing utilities in C that accept lines of any length. All the
GNU utilities do too.

Yes, all of the languages in question were designed for text processing,
and have the unrestricted builtins you want.

>   + How fast is it?  This can be compared to whatever you like, but
>     preferably to each other.  I'm not really interested if XXX is only
>     X% quicker than YYY on average (whatever that may be).

All these languages are interpreters and hence are relatively slow. The
exact answer depends on your application.

>   + Does it give `reasonable' error messages?  i.e. something better
>     than the equivalent of `awk bailing out on line X'.

I have no idea what you mean by ``reasonable.''

>   + Does it have a debugger?  If not, are there any extra facilities
>     for debugging above and beyond simply inserting `printf' (change
>     as appropriate) statements?

Perl does. I haven't seen one for Icon, and I'm not yet familiar enough
with the Python package.

>   + Does it allow conditional interpretation/compilation? i.e.
>     anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Who cares? We're talking about interpreters.

---Dan

guido@cwi.nl (Guido van Rossum) (03/06/91)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>>   + Does it have a debugger?  If not, are there any extra facilities
>>     for debugging above and beyond simply inserting `printf' (change
>>     as appropriate) statements?

>Perl does. I haven't seen one for Icon, and I'm not yet familiar enough
>with the Python package.

Python currently has some debugging support: when an unhandled
exception occurs, it prints a stack backtrace showing source code
lines.  (The backtrace is arguably upside-down, but then most stacks
are...)

Interactively, you can then use a traceback module (not yet
documented, but supplied with the distribution: lib/tb.py) which
allows you to inspect the stack frames: print local and global
variables, and even execute statements in the context of a given stack
frame.

There is also a disassembly module for the bytecode used by the
Python interpreter (lib/dis.py).

There is no single-step or breakpoint facility, but it should not be
hard to implement this in the Python interpreter -- I may do so in the
next release.  (You can force a dump at a particular place by writing
"raise SystemError", but you can't continue yet.)

There is rudimentary support for reloading modules after you've
edited them without leaving the interpreter, but there are problems
that I haven't fully solved yet: you may or may not have to re-execute
the initialization code of other modules that reference the reloaded
module, depending on what use the other module makes of the reloaded
module.

But actually, I do most of my debugging by inserting print statements
in the rare cases where the stack backtrace doesn't immediately show
what went wrong.

--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
"It's probably pining for the fiords"

tchrist@convex.COM (Tom Christiansen) (03/07/91)

From the keyboard of bevan@cs.man.ac.uk (Stephen J Bevan):

I'll answer the questions from a Perl standpoint, and let
others address the other languages.  I'm afraid I'm not eminently
qualified to make comparisons, as I've only looked at, not actually
programmed in, the other languages mentioned.

:As a guide to the sort of things I'm interested in :-
:
:  + Does the language have any arbitrary limits? e.g. the length of a
:    line ... etc.

This is one of perl's strong points: it has no such arbitrary limits.  Any
variable (including of course the current input line) can be as big as
your virtual memory space allows, your regexps can be as long as you want,
you can have any number of elements in your lists and tables, and binary
data is handled gracefully, meaning strings with null bytes or 8-bit data
don't confuse anything.

If you say:

    $unix = `cat vmunix`;

and you can malloc that much, you'll get the whole kernel in your string.

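A rough Python equivalent of the same whole-file slurp, for comparison
(the scratch file here stands in for vmunix, and the helper name is
invented):

```python
import os
import tempfile

def slurp(path):
    # Read an entire file into a single string; the only limit is
    # available memory, and NUL bytes pass through untouched.
    with open(path, "rb") as f:
        return f.read()

# Demonstrate on a scratch file (a real use might pass /vmunix).
fd, name = tempfile.mkstemp()
os.write(fd, b"kernel\x00bytes")
os.close(fd)
data = slurp(name)
os.unlink(name)
```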

:  + How fast is it?  This can be compared to whatever you like, but
:    preferably to each other.  I'm not really interested if XXX is only
:    X% quicker than YYY on average (whatever that may be).

It really depends on what you're trying to do.  For some things, Perl
has a speed comparable to C: these are things that require a lot of
pattern-matching.  Perl's regexp facilities are very rich, powerful, and
highly optimized.  Perl does Boyer-Moore one better: besides just
compiling your regexps, you can pre-scan the text to be searched with
the study operator, which can make your program really scream.  On the
other hand, for most general programming, it is definitely going to be
slower than C (2-5x), but faster than if you'd stitched the component sed
and awk pieces together with sh.


:  + Does it give `reasonable' error messages?  i.e. something better
:    than the equivalent of `awk bailing out on line X'.

Perl always tells you the file and line in error, as well as printing out
the next two tokens it was looking at when the parser got indigestion.
It can find run-away strings or regexps (newlines aren't end of statement
tokens -- they're just like spaces) and tell you where you went wrong.
It doesn't just bail out on the first error, but tries to recover 
and give you as much as it can find.  Usually it makes a decent stab
at this.

Furthermore, it's easy for you to generate your own error messages, even
of this form.  You can get at the current errno/strerror message, the
current file and line information for you (or your caller), and even any
syntax or other fatal runtime errors that occurred in code protected by an
exception handler.

:  + Does it have a debugger?  If not, are there any extra facilities
:    for debugging above and beyond simply inserting `printf' (change
:    as appropriate) statements?

Yes, perl comes with a fully-featured symbolic debugger that does most
of what you're used to if you're an sdb or dbx user: breakpoints,
searching, tracing, examining or setting of variables, etc.  In fact,
the perl debugger is often used as a form of interactive perl to test
out new things, since you are free to type legit (or otherwise) perl 
at the debugger, and you'll get immediate feedback.

:  + Does it allow conditional interpretation/compilation? i.e.
:    anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Sure -- perl will call cpp if you add a -P flag to it.  That way
you can say things like

#if defined(sun) || defined(vax)

An alternative is to execute system-dependent code (symlinks, dbm
stuff, etc) in an active exception handler to trap any fatals that
might occur if one of these isn't supported by your O/S.
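
The run-time probe described here translates to any language with
exceptions; a sketch in Python, with an invented helper name:

```python
import os
import tempfile

def supports_symlinks(tmpdir):
    # Probe a system-dependent feature by simply attempting it inside an
    # exception handler, rather than compiling the call in or out.
    target = os.path.join(tmpdir, "probe_target")
    link = os.path.join(tmpdir, "probe_link")
    try:
        open(target, "w").close()
        os.symlink(target, link)
        return True
    except (OSError, AttributeError, NotImplementedError):
        return False
    finally:
        for p in (link, target):
            try:
                os.unlink(p)
            except OSError:
                pass

has_symlinks = supports_symlinks(tempfile.mkdtemp())
```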

A few other features perhaps worth noting:

+  It's easy to develop library routines for run-time inclusion by your 
   other programs.  These library modules have control over visibility of
   identifiers so you don't accidentally stomp on commonly-used variable
   or function names.  Scoping is in general dynamic, although the
   package protection mentioned above provides a sort of
   static scoping on a module basis, i.e. modules can have their own
   private "file static" kind of identifiers.  Modules can also have
   private initialization code.

+  Perl is very well integrated into the UNIX environment.  It's easy 
   to open pipes to and from other processes, set up co-routines via pipes
   and forks, and get at low-level things like file descriptors to dup or
   fcntl or ioctl.  Perl already has hooks for most of the common system
   calls and C library routines, so you have to call fewer external
   programs, which speeds up your program.  For system calls not covered,
   you can get at them via the syscall() function if your system supports
   it.  You can even link in your own C functions if you want, such as for
   interacting with SQL or the like.  Extensive library routines are
   already provided, doing things like getopts, tcsh-like completion, 
   "infinite precision" arithmetic, termcap functions, plus a bunch more.
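
The pipe plumbing described in this point can be sketched with Python's
later subprocess module for comparison; the child command here is
illustrative:

```python
import subprocess
import sys

# Open a pipe to a child process, feed it input, and read the result
# back -- the same round trip as Perl's open(FH, "| cmd") idioms.
child = subprocess.run(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    input="hello from the parent\n",
    capture_output=True,
    text=True,
)
result = child.stdout
```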

+  While only scalars (atoms) and lists are what the purists might call
   first-class data types (anonymous temporaries), you can pass all three
   basic data types (scalars, lists, tables AKA assoc arrays) as function
   params or return values.  For other semi-data types, like filehandles,
   picture formats, regular expressions, and subroutines, you can use a
   scalar variable instead of a literal identifier and indirect through
   it.  And of course, if all else fails, you can always use an eval.
   This means you can write a function that builds a new, uniquely-named
   function and returns you the name of the new function, assign that to a
   variable, and call the new function through the variable.
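
A Python sketch of the same build-a-function-and-call-it-through-a-
variable trick (all names invented for illustration):

```python
_counter = 0

def make_adder(n):
    # Build a brand-new function at run time and return a reference to
    # it; the caller then invokes it through a variable, much as
    # described above for Perl's eval-built, uniquely named subs.
    global _counter
    _counter += 1
    def adder(x):
        return x + n
    adder.__name__ = "adder_%d" % _counter   # unique name, for show
    return adder

plus_three = make_adder(3)
```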

+  Other support facilities include translators for your old sed and awk 
   scripts, a sort of "perl lint" switch (-w), and a security checking
   mechanism for setuid programs that catches brain-dead moves that are
   likely to be exploitable by a cracker, something even C can't do.

+  Because Perl derives so heavily from C, sed, and awk, if you already 
   know these, your ramp-up time will be much less than if you were
   really starting from scratch.

--tom
--
	I get so tired of utilities with arbitrary, undocumented,
	compiled-in limits.  Don't you?

Tom Christiansen		tchrist@convex.com	convex!tchrist

peter@ficc.ferranti.com (Peter da Silva) (03/07/91)

In article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk> bevan@cs.man.ac.uk (Stephen J Bevan) writes:
> As this is a bit of a nightmare as regards maintenance, I'd
> like to move to a single language for this sort of task.  The
> likely contenders seem to be Perl, Python and Icon.

You should probably consider TCL as well. I'm not familiar with Python
and only have a reading-the-manuals acquaintance with Perl and a bit more
with Icon. Icon is certainly the most "real" language, in terms of having a
consistent design, of any of these. TCL has the advantage that it is almost
trivial to extend it with C code, and merge it with existing C applications.
So...

>   + Does the language have any arbitrary limits? e.g. the length of a
>     line ... etc.

No.

>   + How fast is it?  This can be compared to whatever you like, but
>     preferably to each other.  I'm not really interested if XXX is only
>     X% quicker than YYY on average (whatever that may be).

Well, we replaced some shell scripts with TCL scripts here and got on the
order of a factor of a hundred speedup. So I'd say it's "fast enough". I
suspect that the other languages you're considering are on the same order of
magnitude. 

>   + Does it give `reasonable' error messages?  i.e. something better
>     than the equivalent of `awk bailing out on line X'.

You can get a complete call trace of the error if that's your thing.

>   + Does it have a debugger?  If not, are there any extra facilities
>     for debugging above and beyond simply inserting `printf' (change
>     as appropriate) statements?

You can trace code execution on a statement by statement basis. You can also
redefine "proc" to put whatever code wrappers you want around procedures,
and redefine any of the functions in the language similarly.

>   + Does it allow conditional interpretation/compilation? i.e.
>     anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Yes, since it simply executes the input file you can put a proc inside
an if with no problem.
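
The same trick works in any language that executes definitions as
ordinary statements; a sketch in Python for concreteness (the function
is invented):

```python
import sys

# The interpreter executes the file top to bottom, so a definition
# inside an `if` is itself conditional -- no preprocessor required.
if sys.platform.startswith("win"):
    def path_sep():
        return "\\"
else:
    def path_sep():
        return "/"
```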

>   + The fact that Perl has a C like syntax is NOT an advantage in my
>     book.  (I'm not saying it's a disadvantage either, I just don't
>     think it's important either way).

TCL is like a text-oriented Lisp, but lets you write algebraic expressions
for simplicity and to avoid scaring people away.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

cjeffery@cs.arizona.edu (Clinton Jeffery) (03/08/91)

From article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk>, by bevan@cs.man.ac.uk (Stephen J Bevan):
> In the past I've written various programs to extract information from
> files...[deletions] I'd like to move to a single language for
> this sort of task. The likely contenders seem to be Perl,
> Python and Icon. I was wondering if anybody has experiences with them
> that they'd like to share?

Would someone please post or e-mail me a reference for Python?

Generally, my guess is that you would find Icon better for complex
string analysis tasks that don't fit into neat regular expressions,
or for programs that use extracted information in complex ways.
Perl looks better for the jobs that it was designed to handle.

<Insert admitted Icon bias here.>
-- 
| Clint Jeffery, U. of Arizona Dept. of Computer Science
| cjeffery@cs.arizona.edu -or- {noao allegra}!arizona!cjeffery
--

tchrist@convex.COM (Tom Christiansen) (03/08/91)

From the keyboard of cjeffery@cs.arizona.edu (Clinton Jeffery):
:From article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk>, by bevan@cs.man.ac.uk (Stephen J Bevan):
:Generally, my guess is that you would find Icon better for complex
:string analysis tasks that don't fit into neat regular expressions,
:or for programs that use extracted information in complex ways.
:Perl looks better for the jobs that it was designed to handle.

As in?  What kind of "complex ways" are you talking about?  I think
that's what the poster wants.

--tom
--
	I get so tired of utilities with arbitrary, undocumented,
	compiled-in limits.  Don't you?

Tom Christiansen		tchrist@convex.com	convex!tchrist

bevan@cs.man.ac.uk (Stephen J Bevan) (03/08/91)

> Would someone please post or e-mail me a reference for Python?

Seeing as I started this by mentioning Python, here's what I know
about it.  It was advertised in a few newsgroups (it definitely
appeared in comp.archives) and the source was posted to alt.sources.
Anyway here's part of the post.

From Guido van Rossum <guido@cwi.nl> (the author of Python)

> I have placed tarred, compressed versions of Python and STDWIN on the
> anonymous ftp archive server "wuarchive.wustl.edu", in pub, under the names
> python0.9.1.tar.Z and stdwin0.9.4.tar.Z.  This includes official patch#1
> for Python.  I will also place the Postscript of the manuals there under
> the names pythondoc1.ps.Z and pythondoc2.ps.Z.

I now have the python documentation and have just finished reading it
(about ten minutes ago).  As I haven't read much about Perl and Icon,
I'm not in a position to comment on how it compares yet.

> Generally, my guess is that you would find Icon better for complex
> string analysis tasks that don't fit into neat regular expressions,
> or for programs that use extracted information in complex ways.
> Perl looks better for the jobs that it was designed to handle.

I'm specifically after a language which allows flexible extraction of
information from multiple files and the creation of subsequent files.
I currently use awk for most of these tasks, but the built-in line
length limits, the inability to define my own functions, and the
terrible error messages are a pain%.
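
For concreteness, the kind of extraction script being described might
look like this in Python (the ERROR: pattern and the sample input are
invented for illustration):

```python
import re

def extract_errors(lines):
    # A user-defined function, unrestricted line lengths, and real
    # regexps: exactly the things awk was making painful.
    rx = re.compile(r"ERROR:\s*(\S+)")
    hits = []
    for line in lines:
        m = rx.search(line)
        if m:
            hits.append(m.group(1))
    return hits

report = extract_errors(["ok", "ERROR: disk_full", "ERROR: no_swap"])
```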

> <Insert admitted Icon bias here.>

Well, if I can show my bias here, I'd forget all about Perl, AWK,
Python, Icon ... etc. if Scheme had things like regexps.  Granted, I
could implement them myself (e.g. as an extension to ELK), but I'm trying
not to invent yet another language but use one that is already
defined.

Stephen J. Bevan			bevan@cs.man.ac.uk

% Some, if not all, of these may be solved by using gawk from GNU
  rather than the awk that comes with SunOS.