bevan@cs.man.ac.uk (Stephen J Bevan) (03/30/91)
[Note: I've crossposted to all the groups I sent my original message to.
This was at the request of some of the respondents.]

Here are the results of my question regarding which language to use for
writing programs to extract information from files, generate reports
... etc. I initially suggested languages like Perl, Icon, Python ...

As part of my original message I said :-

> Rather than FTP all of them and wade through the documentation, I was
> wondering if anybody has experiences with them that they'd like to
> share?

I would like to thank the following people for replying :-

Dan Bernstein      - brnstnd@kramden.acf.nyu.edu
Tom Christiansen   - tchrist@convex.COM
Chris Eich         - chrise@hpnmdla.hp.com
Richard L. Goerwitz - goer@midway.uchicago.edu
Clinton Jeffery    - cjeffery@cs.arizona.edu
Guido van Rossum   - guido@cwi.nl
Randal L. Schwartz - merlyn@iWarp.intel.com
Peter da Silva     - peter@ficc.ferranti.com
Alan Thew          - QQ11@LIVERPOOL.AC.UK
Edward Vielmetti   - emv@ox.com
??                 - russell@ccu1.aukuni.ac.nz

Most of the replies were about Perl, so I didn't learn much about the
other languages I suggested (other than very general things). Even
though I was originally hoping not to have to ftp anything, I ended up
getting the source to Python, GAWK, TCL and Icon, and the texinfo manual
for Perl.

To save you going through my list of good and bad points of the
languages I looked at, here is a summary of how I see the languages :-

TCL    - an embedded language i.e. an extension language for large
         programs (IMHO only if you haven't got, or don't like, Scheme
         based ones like ELK).
Perl   - the de facto UNIX scripting language. You name it, and you can
         probably cobble a solution together in Perl. Beyond the fact
         that a lot of people use it, I can see nothing to recommend it.
         It's a bit like C in that respect.
Python - Good prototyping language with a consistent design.
         It might not have all the low level UNIX stuff built in, but
         by using modules, it's easy to add the necessary things in an
         ordered way.
Icon   - the `nearly' language. A well designed language that never
         seemed to make it into general use. It seems to cover the
         ground all the way from AWK type applications to Prolog/Lisp
         ones. If I wasn't already happy with Scheme, I'd use this for
         more general programming. I would recommend people at least
         look at this language.
GAWK   - simple scripting language. Definitely better than `old' awk.
         I would only use it if the job were really simple or if
         something like Python or TCL were not available.

Note I wouldn't expect anybody to make a choice based on what I say. I
suggest you get the source/manuals yourself and have a good long look at
the language/implementation before you decide.

For the types of things _I_ want to do, it would be a tie between Icon
and Python. Having said that, given that I'd have to extend both to
cover the sort of things I want to do, I'll probably use Scheme instead
(ELK in particular). The reason I didn't just use Scheme in the first
place is that I was hoping one of the languages would have all the
facilities I want without me having to extend it myself.

Before the summary of the languages themselves, I thought I'd try to
list some of the things I was looking for. (Actually, I showed an
earlier version of this summary to somebody and they didn't understand
some of the terms I was using, so this is my attempt at an explanation.)

Note that most of the things are to do with structuring the code and the
like. This is not the sort of thing you usually worry about when writing
small scripts, but I plan to convert and write a number of tools, some
of which are around the 1000 LOC mark. For example, I'd like to convert
a particular lex/yacc/C program I have into the chosen language.

You can skip ahead to the actual summary by searching for SUMMARY.
(Well, I can do this in GNUS; I don't know about other news readers like
rn.)

Packages/Modules
----------------
These are a mechanism for splitting up the name space so that function
name clashes are reduced. Most systems work by declaring a package,
after which all functions defined from then on are members of that
package. You then access the functions using the package prefix, or
import the whole package so that you don't have to use the prefix.
The following is an example in CommonLisp :-

    ;;; foo.lsp                  ;;; bar.lsp
    (in-package 'foo)            (in-package 'bar)
    (export '(bob))              (export '(bob))
    (defun bob (a b) ...)        (defun bob (x) ...)

    ;;; main.lsp
    (foo:bob 10 20)
    (bar:bob 3)

Packages are not perfect, but they do help. You can get the same effect
by using package prefixes as a naming convention :-

    ;;; foo.lsp                  ;;; bar.lsp
    (defun foo-bob (a b) ...)    (defun bar-bob (x) ...)

    ;;; main.lsp
    (foo-bob 10 30)
    (bar-bob 4)

The advantage of packages over this is that you don't have to use the
package prefix inside the package itself when you want to call a
function. This can be a saving if you have lots of functions in a
package and only a few are exported.

Exception Handling
------------------
This is useful for dealing with errors that shouldn't happen, e.g.
reaching the end of the file when you were looking for some valid data.
For example, in CommonLisp :-

    (defun foo (x y)
      ...
      (if (catch 'some-unexpected-error (bar x y) nil)
          (handle-the-exception ...)))

    (defun bar (a b)
      ...
      (if (something-wrong) (throw 'some-unexpected-error t))
      ...)

Here the function `foo' calls `bar', and if `bar' throws
`some-unexpected-error' whilst processing, the `catch' in `foo' returns
true and the exception handler is run. (The example is a bit primitive
as I'm trying to save space.)

The advantage of this is that you don't have to explicitly pass back all
sorts of error codes from your functions to handle unusual errors. It
also usually means you won't have so many nested `if's to handle the
special cases, which makes your code clearer.
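
For comparison, here is a rough sketch of the same shape in Python. It
assumes a present-day Python interpreter rather than the 0.9.1 release
reviewed below, and the names `UnexpectedEndOfData', `foo' and `bar' are
made up purely for illustration.

```python
class UnexpectedEndOfData(Exception):
    """Hypothetical error: the input ran out before valid data was found."""


def bar(lines):
    # Return the first non-blank line (stripped), or raise if there is none.
    for line in lines:
        stripped = line.strip()
        if stripped:
            return stripped
    raise UnexpectedEndOfData("no valid data before end of input")


def foo(lines):
    # The unusual case is handled in one place; bar does not have to
    # return an error code for its caller to test.
    try:
        return bar(lines)
    except UnexpectedEndOfData:
        return None
```

Note that `bar' only ever returns real data, so `foo' needs no nested
`if's to tell a result apart from an error code.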

Records/Tuples/Aggregates/Structs
---------------------------------
It's handy to be able to define objects that contain a certain number of
elements. You can then pass these objects around and access the
individual bits. For example in CommonLisp :-

    (defstruct point x y)

This declares `point' as a type containing two items called `x' and `y'.
Some languages don't name the items; they rely on position instead. I
see these as equivalent (assuming you have some sort of pattern
matching).

Provide/Require
---------------
This is a primitive facility for declaring that one package depends on
another one. For example in CommonLisp :-

    ;;; foo.lsp
    (defun bob (a b) ...)
    (provide 'foo)

    ;;; main.lsp
    (require 'foo)
    (bob 10 3)

The above declares that the file `foo' provides the function `bob' and
that the file `main' requires `foo' to be loaded for it to work. So if
you load `main' and `foo' hasn't been loaded, it is automatically loaded
by the system.

C Interface
-----------
How easy is it to call C from the language? Is there a dynamic loading
facility, i.e. do I have to recompile the program to use some arbitrary
C code, or can it load in a .o file at runtime?

Arbitrary Restrictions
----------------------
This really applies to the implementations rather than the languages.
However, as there is only one implementation for most of the languages
I'm looking at, the two tend to be synonymous.

If there is one thing I hate about an [implementation of a] language,
it's arbitrary restrictions. For example, `the length of the input line
must not exceed 80 characters', or `strings must be less than 255
characters long'. I can accept some initial restrictions if :-

  1) they are documented.
  2) they will be removed in future versions.

Note. I realise that some restrictions are not arbitrary, or at least
not under the control of the language implementor, e.g. the number of
open files under UNIX.
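
Going back to the Records/Tuples point for a moment, here is a small
sketch of why I treat named and positional items as equivalent. It is
written for a present-day Python (not the 0.9.1 release reviewed below),
and `Point', `norm1_named' and `norm1_positional' are names invented for
the example.

```python
from collections import namedtuple

# A record with named items, much like (defstruct point x y).
Point = namedtuple("Point", ["x", "y"])


def norm1_named(p):
    # Access the items by name.
    return abs(p.x) + abs(p.y)


def norm1_positional(p):
    # Access the items by position, using a simple form of pattern
    # matching (tuple unpacking) to give them local names.
    x, y = p
    return abs(x) + abs(y)
```

Both functions give the same answer for the same point, and the
positional one works equally well on a plain two-element tuple.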

SUMMARY
-------
If you want to know more about the languages, there follows a brief
description of each one, how to get an implementation, and some good and
bad points as I see them. Each point is preceded by a character
indicating the type of point :-

  +  good point
  -  bad point
  *  just a point to note
  !  subjective point

Other than the `*' items, I guess it is all subjective. However, I've
tried to put things that are generally good/bad in `+'/`-' and limit
really subjective statements to `!'.

TCL - version 4.0 patch level 1
-------------------------------
TCL (Tool Command Language) was developed by John Ousterhout at
Berkeley. It started out as a small language that could be embedded in
applications. It has now been extended by some people at hackercorp into
more of a general purpose shell type programming language. It is
described by Peter da Silva (one of the people who extended it) as :-

> TCL is like a text-oriented Lisp, but lets you write algebraic
> expressions for simplicity and to avoid scaring people away.

The language itself for some reason reminds me of csh, even though I can
only point to two things (the use of `set' and `$') which are definitely
like csh.

Unless you have other ideas about what an extension language should look
like (e.g. IMO it should be Scheme), then I'd definitely recommend this.
It's small, and integrates easily with other C programs (you can even
have multiple TCL interpreters in an application!)

Version 5.0 is available by anonymous ftp from sprite.berkeley.edu as
tk.tar.Z (it's part of an X toolkit called Tk). Note that although it
has a higher version number than the one above, it does not include the
extensions mentioned above. These will apparently be integrated soon.

Version 4.0 pl1 is available by anonymous ftp from media-lab.ai.mit.edu
(sorry, I can't remember the exact path).

+ exceptions.
+ packages, called libraries. However, there is only one name-space.
  The libraries are used as a way of storing single versions of code
  rather than as a solution to the name space pollution problem.
+ provide/require
+ C interface is excellent. You can easily go TCL->C and C->TCL.
- No dynamic loading ability that I'm aware of.
- Arbitrary line length limit on `gets' and `scan', i.e. the commands
  that read lines from files/strings. I would guess this will go away in
  the next version.
- No records. The main data types are strings/lists/associative arrays.
+ extensive test suite included.
! doesn't look to have been tested on many systems. The above version
  actually failed to link on a SPARCstation running SunOS 4.1 as the
  source refers to `strerror'. This has apparently been fixed in patch
  level 2.
+ lots of example code included in distribution.
+ extensive documentation (all in nroff)
+ Can trace execution.
! To make arguments evaluate, you must enclose them in {} or [].
  This shouldn't be a problem, except that being used to Lisp like
  languages I expect to quote constants.
! The extensions, though useful, are not seamless, e.g. some string
  facilities are in the core language and some in the extensions. This
  might be sorted out when the hackercorp extensions are officially
  merged with the Berkeley core language and released by Berkeley.
+ As part of the extensions, you get tclsh. This is a shell which you
  can type commands directly into.
+ scan contexts. This is sort of regular expressions on files rather
  than strings.

Python - version 0.9.1
----------------------
Available by anonymous ftp from wuarchive.wustl.edu as
pub/python0.9.1.tar.Z or, for Europeans, via the info server at
hp4nl.nluug.nl.

I couldn't think of a good way to describe this, so I'm blatantly
copying the following from the Python tutorial :-

    Python is a simple, yet powerful programming language that bridges
    the gap between C and shell programming, and is thus ideally suited
    for rapid prototyping.
    Its syntax is put together from constructs borrowed from a variety
    of other languages; most prominent are influences from ABC, C,
    Modula-3 and Icon

So far so good. Here's some more from the tutorial :-

    Because of its more general data types Python is applicable to a
    much larger problem domain than Awk or even Perl, yet most simple
    things are at least as easy in Python as in those languages.

i.e. Python seems to be designed for larger tasks than you would
undertake using the shell/awk/perl.

+ packages.
+ exceptions (based on Modula 2/3 modules)
+ records (actually tuples. I'm not sure they do everything I want as
  the documentation is a bit vague in this area). Other main types are
  lists, sets, tables (associative arrays).
+ C interface is good. No dynamic linking that I am aware of.
- Arbitrary restrictions: there is a line length limit on readline.
  This has been fixed and I would guess the fix will appear in the next
  release.
+ lots of example python programs included. There is even a TCL
  (version 2ish) interpreter!
+ Object oriented features. Based on Modula 3 i.e. classes with methods,
  all of which are virtual (to use a C++ term).
* any uncaught errors produce a stack trace.
+ disassembler included
+ can inspect stack frames via traceback module
- no single step or breakpoint facility (maybe in the next release)
+ functions can return multiple values.
* The default output command `print' inserts a space between each field
  output.
! I don't like the above, or rather I would like the option of not
  having it done.
* Documentation includes tutorial and library reference as TeX files.
  Both are incomplete, but there is enough in them to be able to write
  Python code. The reference manual is not yet finished, and is not
  currently distributed with the source.
+ Python mode for Emacs.
  (It's primitive, but it's a start.)

Icon - version 8
----------------
To quote from one of the Icon books :-

    Icon is a high-level, general purpose programming language that
    contains many features for processing nonnumeric data, particularly
    for textual material consisting of strings of characters.

Available :-
  In USA :- ??, consult `archie'.
  In UK  :- I picked up a copy from the sources archive at Imperial
            College. The JANET address is 00000510200001

- no packages. Everything is in one namespace. However ...
- no exceptions.
+ Object oriented features. An extension to the language called Idol is
  included. This converts Idol into standard Icon. Idol itself looks (to
  me) like Smalltalk.
+ has records. Other types include :- sets, lists, strings, tables
+ unlimited line length when reading (Note. the newline is discarded)
! The only language that has enough facilities to be able to re-write
  some of my Lex/Yacc code.
+ stack trace on error.
+ C interface is good. Can extend the language by building a `personal
  interpreter'. No dynamic linking.
+ extensive documentation: 9 technical reports in all (PostScript and
  ASCII)
- Unix interface is quite primitive. If you just want to use a command,
  you can use `callout'; anything more complicated requires building a
  personal interpreter (not as difficult as it may sound)
+ extensive test suite
+ Usenet group exists specifically for it - comp.lang.icon
- Unless you use Idol, all procedures are at the same level i.e. one
  scope.
- regular expressions not supported. However, in many cases you can use
  the Icon functions `find', `match', `many' and `upto' instead.
+ Can trace execution.
* Pascal/C like syntax i.e. uses {} but has a few more keywords than C.
+ lots of example programs included.
+ can define your own iterators i.e. your own procedures for iterating
  through arbitrary structures.
+ co-expressions. Powerful tool, hard to explain briefly. See chapter 13
  of the Icon Programming Language.
- co-expressions haven't been implemented on Sun 4s (the type of machine
  I use)
+ has an `initial' section in procedures that is only ever executed once
  and allows you to initialise C like static variables with the result
  of other functions (unlike C).
+ arbitrary precision integers.

As well as the excellent documentation included in the source, there are
two books on Icon available (I skimmed through both of them) :-

    The Icon Programming Language
    Ralph E. Griswold and Madge T. Griswold
    Prentice Hall 1983

    The Implementation of the Icon Programming Language
    Ralph E. Griswold and Madge T. Griswold
    Princeton University Press 1986

The second one is particularly useful if you are considering extending
Icon yourself. Appendix E of this book also contains a list of projects
that could be undertaken to extend and improve Icon. Here are some
projects that, if implemented, would greatly improve the usefulness of
Icon :-

  E.2.4     Add a regular expression data type. Modify the functions
            find and match to operate appropriately when their first
            argument is a regular expression.
  E.2.5  \  All of these suggest extending
  E.5.4   | the string scanning facilities to
  E.5.5  /  cope with files and strings in a uniform way.
  E.12.1    Provide a way to load functions (written in C) at runtime

Perl
----
Available :-
  USA :- ??, consult `archie'
  UK  :- Imperial sources archive

I received more responses about Perl than anything else, so I assume
that most people already know a lot about the language. Here are some
edited highlights from a message I received from Tom Christiansen :-

First some good words from Tom :-

> ... I shall now reveal my true colors as perl disciple
> and perhaps not infrequent evangelist. Perl is without question the
> greatest single program to appear to the UNIX community (although it runs
> elsewhere too) in the last 10 years. It makes programming fun again. It's
> simple enough to get a quick start on, but rich enough for some very
> complex tasks.
> ...
> perl is a strict superset of sed and awk, so much so that s2p and
> a2p translators exist for these utilities. You can do anything in
> perl that you can do in the shell, although perl is not strictly
> speaking a command interpreter. It's more of a programming language.

and now some of the low points of Perl.

[Note this is only a small part of a long post that explained a lot of
good things about Perl. As most people seem to use/like Perl, I thought
I'd highlight some of the things wrong with the language, and what
better place to get information than from the designer of the language.
Note also that this is from a message dated June 90, so some of it may
be out of date.]

Larry Wall :-

> The basic problem with Perl is that it's not about complex data structures.
> Just as spreadsheet programs take a single data structure and try to
> cram the whole world into it, so too Perl takes a few simple data structures
> and drives them into the ground. This is both a strength and a weakness,
> depending on the complexity and structure of the problem.
>
> The basic underlying fault of Perl is that there isn't a real good way
> of building composite structures, or to make one variable refer to a piece
> of another variable, without giving an operational definition of it.
>
> ... In a sense, the problem with Perl is not that it is too
> complicated or hard to learn, but that perhaps it is not expressive
> enough for the effort you put into learning it. Then again, maybe it
> is. Your call. Some people are excited about Perl because, despite
> its obvious faults, it lets them get creative.
>
> There are many things I'd do differently if I were designing Perl from
> scratch. It would probably be a little more object oriented. Filehandles
> and their associated magical variables would probably be abstract types
> of some sort. I don't like the way the use of $`, $&, $' and $<digit>
> impact the efficiency of the language.
> I'd probably consider some kind
> of copy-on-write semantics like many versions of BASIC use. The subroutine
> linkage is currently somewhat problematical in how efficiently it can
> be implemented. And of course there are historical artifacts that wouldn't
> be there.

I think the above is a very fair summary of the low points of the
language. At one point it says `... perhaps it is not expressive enough
for the effort you put into learning it. Then again, maybe it is. Your
call'. Well, _my_ call is that it is not.

Note I didn't actually pick up the source to this, just the manual.
Consequently I haven't been able to check all the points listed below.

+ packages.
! Note that in the examples I've seen in comp.lang.perl, people don't
  seem to use the facility; instead they put everything directly in
  `main' (i.e. the top level scope) rather than in the local scope.
+ exceptions
+ provide/require
* C Interface ?? I couldn't find this in the documentation I had.
+ No arbitrary restrictions
+ has a source level debugger
+ Well integrated with Unix (nearly all system calls are built in !)
! However, like Unix, only one name space seems to be used (see above)
* C like syntax
+ source contains texinfo manual. You can always buy the (Camel) book
  for more information.
- no records. Other types: lists, strings, tables (associative arrays)
* some types have distinct scopes.
! You prefix the name with `@', `$', `%' to indicate which type you
  want. This is one of the ugliest things I've ever seen.
! Uses lots of short names to hold often used things, e.g. `$_' is the
  current input, `$.' is the current line number. I guess some people
  must like this, but I prefer names like `input' and `line-number'
  myself.
+ includes programs to convert existing awk, find and sed scripts into
  Perl.
+ Usenet news group - comp.lang.perl
+ Perl mode for Emacs.

GAWK
----
Available :-
  USA :- prep.ai.mit.edu, probably other places as well. Consult
         `archie'
  UK  :- Imperial sources archive.

A few points about GNU awk, as it seems to fix some of the problems with
`old' awk.

- no packages
- no exceptions
- no C interface
- no records
+ allows user defined functions
+ can read and write to arbitrary files
+ much more informative error messages than the old awk.