bevan@cs.man.ac.uk (Stephen J Bevan) (03/05/91)
In the past I've written various programs to extract information from files. To do this I've used Common Lisp, Emacs Lisp, awk, sh, ksh and csh. As this is a bit of a nightmare as regards maintenance, I'd like to move to a single language for these sorts of tasks. The likely contenders seem to be Perl, Python and Icon. Rather than FTP all of them and wade through the documentation, I was wondering if anybody has experience with them that they'd like to share? I'm particularly interested in comments from people who have used (or at least looked at) more than one of them.

As a guide to the sort of things I'm interested in :-

 + Does the language have any arbitrary limits? e.g. the length of a
   line ... etc.

 + How fast is it? This can be compared to whatever you like, but each
   other preferably. I'm not really interested if XXX is only X%
   quicker than YYY on average (whatever that may be).

 + Does it give `reasonable' error messages? i.e. something better
   than the equivalent of `awk bailing out on line X'.

 + Does it have a debugger? If not, are there any extra facilities
   for debugging above and beyond simply inserting `printf' (change
   as appropriate) statements.

 + Does it allow conditional interpretation/compilation? i.e.
   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Some other points to note :-

 + The scripts won't be distributed, so arguments that XXX is
   installed on more machines than YYY aren't relevant.

 + The fact that Perl has a C-like syntax is NOT an advantage in my
   book. (I'm not saying it's a disadvantage either, I just don't
   think it's important either way.)

email/post as you think is appropriate (note the followup to comp.lang.misc). I will summarize email replies after a suitable period.

Thanks in advance,
Stephen J. Bevan  bevan@cs.man.ac.uk
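To make the sort of task described concrete, here is a minimal sketch in Python (one of the contenders): pull `name = value` settings out of several files and emit a merged summary. The pattern, file layout, and function names are invented for illustration, not taken from any particular language's distribution.

```python
import re
import sys

# Hypothetical extraction task: collect every "name = value" line from
# the input files and print a sorted summary.  The pattern is invented
# for illustration.
ASSIGN = re.compile(r'^\s*(\w+)\s*=\s*(.+?)\s*$')

def extract(lines):
    """Return a dict of name -> value for every assignment line seen."""
    found = {}
    for line in lines:              # lines may be arbitrarily long
        m = ASSIGN.match(line)
        if m:
            found[m.group(1)] = m.group(2)
    return found

if __name__ == '__main__':
    result = {}
    for path in sys.argv[1:]:
        with open(path) as f:
            result.update(extract(f))
    for name in sorted(result):
        print('%s = %s' % (name, result[name]))
```

Unlike the awk mentioned above, nothing here imposes a line-length limit, and user-defined functions come for free.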
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/06/91)
In article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk> bevan@cs.man.ac.uk (Stephen J Bevan) writes:

> + Does the language have any arbitrary limits? e.g. the length of a
>   line ... etc.

*Languages* rarely set arbitrary (semantic) limits. I regularly write text-processing utilities in C that accept lines of any length. All the GNU utilities do too. Yes, all of the languages in question were designed for text processing, and have the unrestricted builtins you want.

> + How fast is it? This can be compared to whatever you like, but each
>   other preferably. I'm not really interested if XXX is only X%
>   quicker than YYY on average (whatever that may be).

All these languages are interpreted and hence relatively slow. The exact answer depends on your application.

> + Does it give `reasonable' error messages? i.e. something better
>   than the equivalent of `awk bailing out on line X'.

I have no idea what you mean by ``reasonable.''

> + Does it have a debugger? If not, are there any extra facilities
>   for debugging above and beyond simply inserting `printf' (change
>   as appropriate) statements.

Perl does. I haven't seen one for Icon, and I'm not yet familiar enough with the Python package.

> + Does it allow conditional interpretation/compilation? i.e.
>   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Who cares? We're talking about interpreters.

---Dan
guido@cwi.nl (Guido van Rossum) (03/06/91)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>> + Does it have a debugger? If not, are there any extra facilities
>>   for debugging above and beyond simply inserting `printf' (change
>>   as appropriate) statements.
>
> Perl does. I haven't seen one for Icon, and I'm not yet familiar enough
> with the Python package.

Python currently has some debugging support: when an unhandled exception occurs, it prints a stack backtrace showing source code lines. (The backtrace is arguably upside-down, but then most stacks are...) Interactively, you can then use a traceback module (not yet documented, but supplied with the distribution: lib/tb.py) which allows you to inspect the stack frames: print local and global variables, and even execute statements in the context of a given stack frame. There is also a disassembly module for the bytecode used by the Python interpreter (lib/dis.py).

There is no single-step or breakpoint facility, but it should not be hard to implement this in the Python interpreter -- I may do so in the next release. (You can force a dump at a particular place by writing "raise SystemError", but you can't continue yet.)

There is rudimentary support for reloading modules after you've edited them without leaving the interpreter, but there are problems I haven't fully solved yet: you may or may not have to re-execute the initialization code of other modules that reference the reloaded module, depending on what use the other module makes of it.

But actually, I do most of my debugging by inserting print statements, in the rare cases where the stack backtrace doesn't immediately show what went wrong.

--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
"It's probably pining for the fiords"
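The backtrace facility described above survives in later Pythons as the standard `traceback` module; a sketch in modern syntax (not the 0.9.1 syntax of the time), showing a backtrace captured from a handled exception:

```python
import traceback

# A deliberately failing call chain, so there is a stack to look at.
# The function names are invented for illustration.
def parse(field):
    return int(field)                  # raises ValueError on bad input

def process(record):
    return parse(record.split(':')[1])

try:
    process('name:not-a-number')
except ValueError:
    # The formatted backtrace names each frame and its source line,
    # which is the behaviour described for unhandled exceptions.
    tb_text = traceback.format_exc()

print(tb_text)
```

The same module offers `traceback.print_stack()` for a dump without an exception, much like the "raise SystemError" trick mentioned above.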
tchrist@convex.COM (Tom Christiansen) (03/07/91)
From the keyboard of bevan@cs.man.ac.uk (Stephen J Bevan):

I'll answer the questions from a perl standpoint, and let others address the other languages. I'm afraid I'm not eminently qualified to make comparisons, as I've only looked at, not actually programmed in, the other languages mentioned.

: As a guide to the sort of things I'm interested in :-
:
: + Does the language have any arbitrary limits? e.g. the length of a
:   line ... etc.

This is one of perl's strong points: it has no such arbitrary limits. Any variable (including of course the current input line) can be as big as your virtual memory space allows, your regexps can be as long as you want, you can have any number of elements in your lists and tables, and binary data is handled gracefully, meaning strings with null bytes or 8-bit data don't confuse anything. If you say:

    $unix = `cat vmunix`;

and you can malloc that much, you'll get the whole kernel in your string.

: + How fast is it? This can be compared to whatever you like, but each
:   other preferably. I'm not really interested if XXX is only X%
:   quicker than YYY on average (whatever that may be).

It really depends on what you're trying to do. For some things Perl has speed comparable to C: those that require a lot of pattern matching. Perl's regexp facilities are very rich, powerful, and highly optimized. Perl even goes Boyer-Moore (B-M) one better: instead of just compiling your regexps, you can do something like compiling the pattern space with the study operator, which can make your program really scream. On the other hand, for most general programming it is definitely going to be slower than C (2-5x), but faster than if you'd stitched the component sed and awk pieces together with sh.

: + Does it give `reasonable' error messages? i.e. something better
:   than the equivalent of `awk bailing out on line X'.

Perl always tells you the file and line in error, as well as printing out the next two tokens it was looking at when the parser got indigestion.
It can find runaway strings or regexps (newlines aren't end-of-statement tokens -- they're just like spaces) and tell you where you went wrong. It doesn't just bail out on the first error, but tries to recover and report as much as it can find. Usually it makes a decent stab at this.

Furthermore, it's easy for you to generate your own error messages, even of this form. You can get at the current errno/strerror message, the current file and line information for you (or your caller), and even any syntax or other fatal runtime errors that occurred in code protected by an exception handler.

: + Does it have a debugger? If not, are there any extra facilities
:   for debugging above and beyond simply inserting `printf' (change
:   as appropriate) statements.

Yes, perl comes with a fully-featured symbolic debugger that does most of what you're used to if you're an sdb or dbx user: breakpoints, searching, tracing, examining or setting variables, etc. In fact, the perl debugger is often used as a form of interactive perl to test out new things, since you are free to type legit (or otherwise) perl at the debugger and get immediate feedback.

: + Does it allow conditional interpretation/compilation? i.e.
:   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Sure -- perl will call cpp if you add a -P flag to it. That way you can say things like

    #if defined(sun) || defined(vax)

An alternative is to execute system-dependent code (symlinks, dbm stuff, etc.) inside an active exception handler to trap any fatals that might occur if one of these isn't supported by your O/S.

A few other features perhaps worth noting:

 + It's easy to develop library routines for run-time inclusion by your
   other programs. These library modules have control over the
   visibility of identifiers, so you don't accidentally stomp on
   commonly-used variable or function names.
   Scoping is in general dynamic, although the package protection
   mentioned above provides a sort of static scoping on a module
   basis, i.e. modules can have their own private "file static" kind
   of identifiers. Modules can also have private initialization code.

 + Perl is very well integrated into the UNIX environment. It's easy
   to open pipes to and from other processes, set up co-routines via
   pipes and forks, and get at low-level things like file descriptors
   to dup or fcntl or ioctl. Perl already has hooks for most of the
   common system calls and C library routines, so you have to call
   fewer external programs, which speeds up your program. For system
   calls not covered, you can get at them via the syscall() function
   if your system supports it. You can even link in your own C
   functions if you want, such as for interacting with SQL or the
   like. Extensive library routines are already provided, doing things
   like getopts, tcsh-like completion, "infinite precision"
   arithmetic, termcap functions, plus a bunch more.

 + While only scalars (atoms) and lists are what purists might call
   first-class data types (anonymous temporaries), you can pass all
   three basic data types (scalars, lists, and tables AKA associative
   arrays) as function parameters or return values. For other
   semi-data types, like filehandles, picture formats, regular
   expressions, and subroutines, you can use a scalar variable instead
   of a literal identifier and indirect through it. And of course, if
   all else fails, you can always use an eval. This means you can
   write a function that builds a new, uniquely-named function and
   returns its name; you can then assign that name to a variable and
   call the new function through the variable.

 + Other support facilities include translators for your old sed and
   awk scripts, a sort of "perl lint" switch (-w), and a mechanism
   that checks setuid programs for brain-dead moves likely to be
   exploitable by a cracker -- something even C can't do.
 + Because Perl derives so heavily from C, sed, and awk, if you
   already know these, your ramp-up time will be much less than if you
   were really starting from scratch.

--tom
--
I get so tired of utilities with arbitrary, undocumented, compiled-in limits.  Don't you?
Tom Christiansen   tchrist@convex.com   convex!tchrist
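The function-building indirection mentioned above -- a function that constructs a new function at run time and hands it back through a variable -- has a close analogue in Python, sketched here with invented names:

```python
def make_scaler(factor):
    """Build and return a new function at run time.

    Each call produces a distinct function closed over its own
    'factor'; the caller holds it in an ordinary variable and calls
    through that variable, much as described for Perl's eval trick.
    """
    def scaler(x):
        return x * factor
    return scaler

double = make_scaler(2)   # 'double' is a variable holding a function
triple = make_scaler(3)
```

Python gets this without eval because functions are themselves first-class values.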
peter@ficc.ferranti.com (Peter da Silva) (03/07/91)
In article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk> bevan@cs.man.ac.uk (Stephen J Bevan) writes:

> As this is a bit of a nightmare as regards maintenance, I'd like to
> move to a single language for doing these sorts of tasks. The likely
> contenders for this seem to be Perl, Python and Icon.

You should probably consider TCL as well. I'm not familiar with Python, and I have only a reading-the-manuals acquaintance with Perl and a bit more with Icon. Icon is certainly the most "real" language of these, in terms of having a consistent design. TCL has the advantage that it is almost trivial to extend with C code and to merge with existing C applications. So...

> + Does the language have any arbitrary limits? e.g. the length of a
>   line ... etc.

No.

> + How fast is it? This can be compared to whatever you like, but each
>   other preferably. I'm not really interested if XXX is only X%
>   quicker than YYY on average (whatever that may be).

Well, we replaced some shell scripts with TCL scripts here and got on the order of a factor of a hundred speedup, so I'd say it's "fast enough". I suspect the other languages you're considering are on the same order of magnitude.

> + Does it give `reasonable' error messages? i.e. something better
>   than the equivalent of `awk bailing out on line X'.

You can get a complete call trace of the error if that's your thing.

> + Does it have a debugger? If not, are there any extra facilities
>   for debugging above and beyond simply inserting `printf' (change
>   as appropriate) statements.

You can trace code execution on a statement-by-statement basis. You can also redefine "proc" to put whatever code wrappers you want around procedures, and redefine any of the functions in the language similarly.

> + Does it allow conditional interpretation/compilation? i.e.
>   anything like +FEATURE in Lisp or #ifdef FEATURE/#endif in C.

Yes. Since it simply executes the input file, you can put a proc inside an if with no problem.
> + The fact that Perl has a C like syntax is NOT an advantage in my
>   book. (I'm not saying it's a disadvantage either, I just don't
>   think it's important either way).

TCL is like a text-oriented Lisp, but lets you write algebraic expressions for simplicity and to avoid scaring people away.
--
Peter da Silva.  `-_-'  peter@ferranti.com  +1 713 274 5180.
                 'U`  "Have you hugged your wolf today?"
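The point above about putting a proc inside an if holds for Python too, since its interpreter likewise just executes the file top to bottom; a minimal sketch with an invented feature flag:

```python
import sys

# Feature-conditional definition: because the file is simply executed,
# which version of report() exists depends on a run-time test.  The
# '--verbose' flag is invented for illustration.
VERBOSE = '--verbose' in sys.argv

if VERBOSE:
    def report(msg):
        return 'REPORT: ' + msg
else:
    def report(msg):
        return msg
```

This gives the effect of #ifdef FEATURE/#endif without a preprocessor.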
cjeffery@cs.arizona.edu (Clinton Jeffery) (03/08/91)
From article <BEVAN.91Mar5123224@tiger.cs.man.ac.uk>, by bevan@cs.man.ac.uk (Stephen J Bevan):

> In the past I've written various programs to extract information from
> files... [deletions] I'd like to move to a single language for doing
> these sorts of tasks. The likely contenders for this seem to be Perl,
> Python and Icon. I was wondering if anybody has experiences with them
> that they'd like to share?

Would someone please post or e-mail me a reference for Python?

Generally, my guess is that you would find Icon better for complex string-analysis tasks that don't fit into neat regular expressions, or for programs that use extracted information in complex ways. Perl looks better for the jobs it was designed to handle. <Insert admitted Icon bias here.>
--
| Clint Jeffery, U. of Arizona Dept. of Computer Science
| cjeffery@cs.arizona.edu -or- {noao allegra}!arizona!cjeffery
--
tchrist@convex.COM (Tom Christiansen) (03/08/91)
From the keyboard of cjeffery@cs.arizona.edu (Clinton Jeffery):

: Generally, my guess is that you would find Icon better for complex
: string analysis tasks that don't fit into neat regular expressions,
: or for programs that use extracted information in complex ways.
: Perl looks better for the jobs that it was designed to handle.

As in? What kind of "complex ways" are you talking about? I think that's what the poster wants to know.

--tom
--
I get so tired of utilities with arbitrary, undocumented, compiled-in limits.  Don't you?
Tom Christiansen   tchrist@convex.com   convex!tchrist
bevan@cs.man.ac.uk (Stephen J Bevan) (03/08/91)
> Would someone please post or e-mail me a reference for Python?

Seeing as I started this by mentioning Python, here's what I know about it. It was advertised in a few newsgroups (it definitely appeared in comp.archives) and the source was posted to alt.sources. Anyway, here's part of the post, from Guido van Rossum <guido@cwi.nl> (the author of Python):

> I have placed tarred, compressed versions of Python and STDWIN on the
> anonymous ftp archive server "wuarchive.wustl.edu", in pub, under the
> names python0.9.1.tar.Z and stdwin0.9.4.tar.Z. This includes official
> patch#1 for Python. I will also place the Postscript of the manuals
> there under the names pythondoc1.ps.Z and pythondoc2.ps.Z.

I now have the Python documentation and have just finished reading it (about ten minutes ago). As I haven't read much about Perl and Icon, I'm not yet in a position to comment on how it compares.

> Generally, my guess is that you would find Icon better for complex
> string analysis tasks that don't fit into neat regular expressions,
> or for programs that use extracted information in complex ways.
> Perl looks better for the jobs that it was designed to handle.

I'm specifically after a language which allows flexible extraction of information from multiple files and the creation of subsequent files. I currently use awk for most of these tasks, but the built-in line-length limits, the inability to define my own functions, and the terrible error messages are a pain.%

> <Insert admitted Icon bias here.>

Well, if I can show my bias here, I'd forget all about Perl, awk, Python, Icon, etc. if Scheme had things like regexps. Granted, I could implement them myself (e.g. as an extension to ELK), but I'm trying not to invent yet another language, rather to use one that is already defined.

Stephen J. Bevan  bevan@cs.man.ac.uk

% Some, if not all, of these may be solved by using gawk from GNU rather than the awk that comes with SunOS.