bevan@cs.man.ac.uk (Stephen J Bevan) (03/05/91)
In the past I've written various programs to extract information from
files. To do this I've used :- Common Lisp, Emacs Lisp, awk, sh, ksh
and csh. As this is a bit of a nightmare as regards maintenance, I'd
like to move to a single language for doing these sorts of tasks. The
likely contenders for this seem to be Perl, Python and Icon.
Rather than FTP all of them and wade through the documentation, I was
wondering if anybody has experiences with them that they'd like to
share?
I'm particularly interested in comments from people who have used (or
at least looked at) more than one of them.
As a guide to the sort of things I'm interested in :-
+ Does the language have any arbitrary limits? e.g. the length of a
line ... etc.
+ How fast is it? This can be compared to whatever you like, though
preferably to each other. I'm not really interested if XXX is only X%
quicker than YYY on average (whatever that may be).
+ Does it give `reasonable' error messages? i.e. something better
than the equivalent of `awk bailing out on line X'.
+ Does it have a debugger? If not, are there any extra facilities
for debugging above and beyond simply inserting `printf' (change
as appropriate) statements?
+ Does it allow conditional interpretation/compilation? i.e.
anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.
Some other points to note :-
+ The scripts won't be distributed, so arguments about XXX is
installed on more machines than YYY aren't relevant.
+ The fact that Perl has a C-like syntax is NOT an advantage in my
book. (I'm not saying it's a disadvantage either; I just don't
think it's important either way.)
email/post as you think is appropriate (note the followup to
comp.lang.misc). I will summarize email replies after a suitable period.
Thanks in advance,
Stephen J. Bevan bevan@cs.man.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/06/91)
In article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk> bevan@cs.man.ac.uk
(Stephen J Bevan) writes:
> + Does the language have any arbitrary limits? e.g. the length of a
>   line ... etc.

*Languages* rarely set arbitrary (semantic) limits. I regularly write
text-processing utilities in C that accept lines of any length. All the
GNU utilities do too. Yes, all of the languages in question were designed
for text processing, and have the unrestricted builtins you want.

> + How fast is it? This can be compared to whatever you like, though
>   preferably to each other. I'm not really interested if XXX is only
>   X% quicker than YYY on average (whatever that may be).

All these languages are interpreters and hence are relatively slow.
The exact answer depends on your application.

> + Does it give `reasonable' error messages? i.e. something better
>   than the equivalent of `awk bailing out on line X'.

I have no idea what you mean by ``reasonable.''

> + Does it have a debugger? If not, are there any extra facilities
>   for debugging above and beyond simply inserting `printf' (change
>   as appropriate) statements?

Perl does. I haven't seen one for Icon, and I'm not yet familiar enough
with the Python package.

> + Does it allow conditional interpretation/compilation? i.e.
>   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Who cares? We're talking about interpreters.

---Dan
guido@cwi.nl (Guido van Rossum) (03/06/91)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> + Does it have a debugger? If not, are there any extra facilities
>>   for debugging above and beyond simply inserting `printf' (change
>>   as appropriate) statements?
> Perl does. I haven't seen one for Icon, and I'm not yet familiar enough
> with the Python package.

Python currently has some debugging support: when an unhandled exception
occurs, it prints a stack backtrace showing source code lines. (The
backtrace is arguably upside-down, but then most stacks are...)
Interactively, you can then use a traceback module (not yet documented,
but supplied with the distribution: lib/tb.py) which allows you to
inspect the stack frames: print local and global variables, and even
execute statements in the context of a given stack frame. There is also
a disassembly module for the bytecode used by the Python interpreter
(lib/dis.py).

There is no single-step or breakpoint facility, but it should not be
hard to implement this in the Python interpreter -- I may do so in the
next release. (You can force a dump at a particular place by writing
"raise SystemError", but you can't continue yet.)

There is rudimentary support for reloading modules after you've edited
them without leaving the interpreter, but there are problems that I
haven't fully solved yet: you may or may not have to re-execute the
initialization code of other modules that reference the reloaded module,
depending on what use the other module makes of the reloaded module.

But actually, I do most of my debugging by inserting print statements in
the rare cases where the stack backtrace doesn't immediately show what
went wrong.

--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
"It's probably pining for the fiords"
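[Editorial sketch: the post-mortem inspection described above can be
written in modern Python, where the lib/tb.py facilities became the
standard traceback module. The failing function and its argument are
purely illustrative.]

```python
import sys
import traceback

def broken(x):
    return 1 / x  # raises ZeroDivisionError when x == 0

try:
    broken(0)
except ZeroDivisionError:
    # Print a source-level stack backtrace, like the one Python
    # prints for an unhandled exception.
    traceback.print_exc()
    # The traceback object exposes each stack frame, so you can
    # inspect local variables at the point of failure.
    tb = sys.exc_info()[2]
    while tb.tb_next is not None:   # walk to the innermost frame
        tb = tb.tb_next
    print(tb.tb_frame.f_locals)     # the locals of broken(): {'x': 0}
```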
tchrist@convex.COM (Tom Christiansen) (03/07/91)
From the keyboard of bevan@cs.man.ac.uk (Stephen J Bevan):
I'll answer the questions from a Perl standpoint, and let others
address the other languages. I'm afraid I'm not eminently qualified
to make comparisons, as I've only looked at, not actually programmed
in, the other languages mentioned.
:As a guide to the sort of things I'm interested in :-
:
: + Does the language have any arbitrary limits? e.g. the length of a
: line ... etc.
This is one of perl's strong points: it has no such arbitrary limits. Any
variable (including of course the current input line) can be as big as
your virtual memory space allows, your regexps can be as long as you want,
you can have any number of elements in your lists and tables, and binary
data is handled gracefully, meaning strings with null bytes or 8-bit data
don't confuse anything.
If you say:
$unix = `cat vmunix`;
and you can malloc that much, you'll get the whole kernel in your string.
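[Editorial sketch: the same freedom holds in Python -- strings are
bounded only by memory, and embedded NUL bytes or 8-bit data pass
through untouched. The sizes here are arbitrary.]

```python
# A string can be as large as virtual memory allows; the language
# imposes no fixed line-length or variable-size limit.
big = 'x' * 10_000_000
assert len(big) == 10_000_000

# Binary data with embedded NUL bytes is handled gracefully too:
# nothing truncates at the NUL.
blob = b'abc\x00def'
assert len(blob) == 7
assert blob.find(b'\x00') == 3
```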
: + How fast is it? This can be compared to whatever you like, but each
: other preferably. I'm not really interested if XXX is only X%
: quicker than YYY on average (whatever that may be).
It really depends on what you're trying to do. For some things, Perl's
speed is comparable to C's: these are things that require a lot of
pattern matching. Perl's regexp facilities are very rich, powerful, and
highly optimized. Perl even goes Boyer-Moore one better: beyond just
compiling your regexps, you can prepare the strings to be searched with
the study operator, which can make your program really scream. On the
other hand, for most general programming it is definitely going to be
slower than C (2-5x), but faster than if you'd stitched the component
sed and awk pieces together with sh.
: + Does it give `reasonable' error messages? i.e. something better
: than the equivalent of `awk bailing out on line X'.
Perl always tells you the file and line in error, as well as printing out
the next two tokens it was looking at when the parser got indigestion.
It can find run-away strings or regexps (newlines aren't end of statement
tokens -- they're just like spaces) and tell you where you went wrong.
It doesn't just bail out on the first error, but tries to recover
and report as many errors as it can find. Usually it makes a decent
stab at this.
Furthermore, it's easy to generate your own error messages, even in
this form. You can get at the current errno/strerror message, the
current file and line information (for you or your caller), and even
any syntax or other fatal runtime errors that occurred in code
protected by an exception handler.
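[Editorial sketch: Python exposes the same errno/strerror information
through exceptions. The missing path below is hypothetical.]

```python
import errno
import os

# Run failure-prone code under an exception handler and recover the
# system error number and its strerror text, rather than dying.
try:
    os.stat('/no/such/file')
except OSError as e:
    assert e.errno == errno.ENOENT
    print('stat failed:', os.strerror(e.errno))
```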
: + Does it have a debugger? If not, are there any extra facilities
: for debugging above and beyond simply inserting `printf' (change
: as appropriate) statements.
Yes, perl comes with a fully-featured symbolic debugger that does most
of what you're used to if you're an sdb or dbx user: breakpoints,
searching, tracing, examining or setting of variables, etc. In fact,
the perl debugger is often used as a form of interactive perl to test
out new things, since you are free to type legit (or otherwise) perl
at the debugger, and you'll get immediate feedback.
: + Does it allow conditional interpretation/compilation? i.e.
: anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.
Sure -- perl will call cpp if you add a -P flag to it. That way
you can say things like
#if defined(sun) || defined(vax)
An alternative is to execute system-dependent code (symlinks, dbm
stuff, etc) in an active exception handler to trap any fatals that
might occur if one of these isn't supported by your O/S.
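[Editorial sketch: the same trap-the-failure idiom works in Python --
instead of conditional compilation, attempt the system-dependent call
and fall back when the OS doesn't support it. The copy fallback is an
assumption, not part of the original post.]

```python
import os
import shutil

def make_link(src, dst):
    """Symlink if the platform supports it, else fall back to a copy."""
    try:
        os.symlink(src, dst)       # may fail on systems without symlinks
    except (AttributeError, OSError, NotImplementedError):
        shutil.copyfile(src, dst)  # portable fallback (assumption)
```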
A few other features perhaps worth noting:
+ It's easy to develop library routines for run-time inclusion by your
other programs. These library modules have control over visibility of
identifiers so you don't accidentally stomp on commonly-used variable
or function names. Scoping is in general dynamic, although the
package protection mentioned above provides a sort of
static scoping on a module basis, i.e. modules can have their own
private "file static" kind of identifiers. Modules can also have
private initialization code.
+ Perl is very well integrated into the UNIX environment. It's easy
to open pipes to and from other processes, set up co-routines via pipes
and forks, and get at low-level things like file descriptors to dup or
fcntl or ioctl. Perl already has hooks for most of the common system
calls and C library routines, so you have to call fewer external
programs, which speeds up your program. For system calls not covered,
you can get at them via the syscall() function if your system supports
it. You can even link in your own C functions if you want, such as for
interacting with SQL or the like. Extensive library routines are
already provided, doing things like getopts, tcsh-like completion,
"infinite precision" arithmetic, termcap functions, plus a bunch more.
+ While only scalars (atoms) and lists are what the purists might call
first-class data types (anonymous temporaries), you can pass all three
basic data types (scalars, lists, tables AKA assoc arrays) as function
params or return values. For other semi-data types, like filehandles,
picture formats, regular expressions, and subroutines, you can use a
scalar variable instead of a literal identifier and indirect through
it. And of course, if all else fails, you can always use an eval.
This means you can write a function that builds a new, uniquely-named
function and returns you the name of the new function, assign that to a
variable, and call the new function through the variable.
+ Other support facilities include translators for your old sed and awk
scripts, a sort of "perl lint" switch (-w), and a security checking
mechanism for setuid programs that catches brain-dead moves that are
likely to be exploitable by a cracker, something even C can't do.
+ Because Perl derives so heavily from C, sed, and awk, if you already
know these, your ramp-up time will be much less than if you were
really starting from scratch.
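[Editorial sketch: the UNIX-integration point above -- opening pipes to
and from other processes -- looks like this in Python's subprocess
module; the command is arbitrary.]

```python
import subprocess

# Open a pipe from another process and collect its output,
# analogous to opening "command |" as a filehandle.
proc = subprocess.Popen(['echo', 'hello'],
                        stdout=subprocess.PIPE, text=True)
out, _ = proc.communicate()
print(out.strip())   # hello
```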
--tom
--
I get so tired of utilities with arbitrary, undocumented,
compiled-in limits. Don't you?
Tom Christiansen tchrist@convex.com convex!tchrist

peter@ficc.ferranti.com (Peter da Silva) (03/07/91)
In article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk> bevan@cs.man.ac.uk
(Stephen J Bevan) writes:
> As this is a bit of a nightmare as regards maintenance, I'd
> like to move to a single language for these sorts of tasks. The
> likely contenders for this seem to be Perl, Python and Icon.

You should probably consider TCL as well. I'm not familiar with Python
and only have a reading-the-manuals acquaintance with Perl, and a bit
more with Icon. Icon is certainly the most "real" language, in terms of
having a consistent design, of any of these. TCL has the advantage that
it is almost trivial to extend it with C code, and merge it with
existing C applications. So...

> + Does the language have any arbitrary limits? e.g. the length of a
>   line ... etc.

No.

> + How fast is it? This can be compared to whatever you like, though
>   preferably to each other. I'm not really interested if XXX is only
>   X% quicker than YYY on average (whatever that may be).

Well, we replaced some shell scripts with TCL scripts here and got on
the order of a factor of a hundred speedup. So I'd say it's "fast
enough". I suspect that the other languages you're considering are on
the same order of magnitude.

> + Does it give `reasonable' error messages? i.e. something better
>   than the equivalent of `awk bailing out on line X'.

You can get a complete call trace of the error if that's your thing.

> + Does it have a debugger? If not, are there any extra facilities
>   for debugging above and beyond simply inserting `printf' (change
>   as appropriate) statements?

You can trace code execution on a statement by statement basis. You
can also redefine "proc" to put whatever code wrappers you want around
procedures, and redefine any of the functions in the language
similarly.

> + Does it allow conditional interpretation/compilation? i.e.
>   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Yes, since it simply executes the input file you can put a proc inside
an if with no problem.

> + The fact that Perl has a C-like syntax is NOT an advantage in my
>   book. (I'm not saying it's a disadvantage either; I just don't
>   think it's important either way.)

TCL is like a text-oriented Lisp, but lets you write algebraic
expressions for simplicity and to avoid scaring people away.
--
Peter da Silva. `-_-' peter@ferranti.com +1 713 274 5180. 'U`
"Have you hugged your wolf today?"
cjeffery@cs.arizona.edu (Clinton Jeffery) (03/08/91)
From article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk>, by
bevan@cs.man.ac.uk (Stephen J Bevan):
> In the past I've written various programs to extract information from
> files...[deletions] I'd like to move to a single language for doing
> these sorts of tasks. The likely contenders for this seem to be Perl,
> Python and Icon. I was wondering if anybody has experiences with them
> that they'd like to share?

Would someone please post or e-mail me a reference for Python?

Generally, my guess is that you would find Icon better for complex
string analysis tasks that don't fit into neat regular expressions, or
for programs that use extracted information in complex ways. Perl
looks better for the jobs that it was designed to handle. <Insert
admitted Icon bias here.>
--
| Clint Jeffery, U. of Arizona Dept. of Computer Science
| cjeffery@cs.arizona.edu -or- {noao allegra}!arizona!cjeffery
--
tchrist@convex.COM (Tom Christiansen) (03/08/91)
From the keyboard of cjeffery@cs.arizona.edu (Clinton Jeffery):
:From article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk>, by
:bevan@cs.man.ac.uk (Stephen J Bevan):
:Generally, my guess is that you would find Icon better for complex
:string analysis tasks that don't fit into neat regular expressions,
:or for programs that use extracted information in complex ways.
:Perl looks better for the jobs that it was designed to handle.

As in? What kind of "complex ways" are you talking about? I think
that's what the poster wants.

--tom
--
I get so tired of utilities with arbitrary, undocumented,
compiled-in limits. Don't you?
Tom Christiansen tchrist@convex.com convex!tchrist
bevan@cs.man.ac.uk (Stephen J Bevan) (03/08/91)
> Would someone please post or e-mail me a reference for Python?

Seeing as I started this by mentioning Python, here's what I know
about it. It was advertised in a few newsgroups (it definitely
appeared in comp.archives) and the source was posted to alt.sources.
Anyway, here's part of the post. From Guido van Rossum <guido@cwi.nl>
(the author of Python):

> I have placed tarred, compressed versions of Python and STDWIN on the
> anonymous ftp archive server "wuarchive.wustl.edu", in pub, under the
> names python0.9.1.tar.Z and stdwin0.9.4.tar.Z. This includes official
> patch#1 for Python. I will also place the Postscript of the manuals
> there under the names pythondoc1.ps.Z and pythondoc2.ps.Z.

I now have the Python documentation and have just finished reading it
(about ten minutes ago). As I haven't read much about Perl and Icon,
I'm not in a position to comment on how it compares yet.

> Generally, my guess is that you would find Icon better for complex
> string analysis tasks that don't fit into neat regular expressions,
> or for programs that use extracted information in complex ways.
> Perl looks better for the jobs that it was designed to handle.

I'm specifically after a language which allows flexible extraction of
information from multiple files and the creation of subsequent files.
I currently use awk for most of these tasks, but the built-in line
length limits, not being able to define my own functions, and the
terrible error messages are a pain%.

> <Insert admitted Icon bias here.>

Well, if I can show my bias here, I'd forget all about Perl, AWK,
Python, Icon ... etc. if Scheme had things like regexps. Granted, I
could implement them myself (e.g. an extension to ELK), but I'm trying
not to invent yet another language but to use one that is already
defined.

Stephen J. Bevan bevan@cs.man.ac.uk

% Some, if not all, of these may be solved by using gawk from GNU
rather than the awk that comes with SunOS
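[Editorial sketch: the poster's core task -- regexp extraction across
multiple files, with no line-length limit -- can be written in Python.
The pattern and file layout below are hypothetical.]

```python
import re

# Scan any number of files for lines matching a pattern and collect
# the captured field along with its source file.
PATTERN = re.compile(r'^Error:\s+(.*)')

def extract(paths):
    hits = []
    for path in paths:
        with open(path) as f:
            for line in f:              # lines may be arbitrarily long
                m = PATTERN.match(line)
                if m:
                    hits.append((path, m.group(1)))
    return hits
```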