[comp.lang.perl] Survey Results : Perl vs Icon vs ....

bevan@cs.man.ac.uk (Stephen J Bevan) (03/30/91)
[Note I've crossposted to all the groups I sent my original message
to.  This was at the request of some of the respondents (sp?)]

Here are the results of my question regarding which language to use for
writing programs to extract information from files, generate reports
... etc.  I initially suggested languages like Perl, Icon, Python ...

As part of my original message I said :-

> Rather than FTP all of them and wade through the documentation, I was
> wondering if anybody has experiences with them that they'd like to
> share?

I would like to thank the following people for replying :-

Dan Bernstein - brnstnd@kramden.acf.nyu.edu 
Tom Christiansen - tchrist@convex.COM
Chris Eich - chrise@hpnmdla.hp.com
Richard L. Goerwitz - goer@midway.uchicago.edu
Clinton Jeffery - cjeffery@cs.arizona.edu
Guido van Rossum - guido@cwi.nl
Randal L. Schwartz - merlyn@iWarp.intel.com
Peter da Silva - peter@ficc.ferranti.com
Alan Thew - QQ11@LIVERPOOL.AC.UK
Edward Vielmetti - emv@ox.com
?? - russell@ccu1.aukuni.ac.nz

Most of the replies were about Perl, so I didn't learn much about the
other languages I suggested (other than very general things).
Even though I was originally hoping not to have to ftp any stuff, I
ended up getting the source to Python, GAWK, TCL, Icon and the texinfo
manual for Perl.

To save you going through my list of good and bad points of the
languages I looked at, here is the summary of what I see the languages
as :-

TCL - an embedded language i.e. an extension language for large
      programs (IMHO only if you haven't got, or don't
      like, Scheme based ones like ELK).
Perl - the de facto UNIX scripting language.  You name it, and you
       can probably cobble a solution together in Perl.
       Beyond the fact that a lot of people use it, I can see nothing
       to recommend it.  It's a bit like C in that respect.
Python - Good prototyping language with a consistent design.  It might
         not have all the low level UNIX stuff built in, but by using 
         modules, it's easy to add the necessary things in an ordered way.
Icon - the `nearly' language.  Well designed language, that never seemed
       to make it into general use.  Seems to cover the ground all the
       way from AWK type applications to Prolog/Lisp ones.
       If I wasn't already happy with Scheme, I'd use this for more
       general programming.
       I would recommend people at least look at this language.
GAWK - simple scripting language.  Definitely better than `old' awk.
       I would only use it if the job were really simple or if
       something like Python or TCL were not available.

Note I wouldn't expect anybody to make a choice on what I say.  I
suggest you get the source/manuals yourself and have a good long look
at the language/implementation before you decide.

For the types of things _I_ want to do, it would be a tie between Icon
and Python.  Having said that, given that I'd have to extend both to
cover the sort of things I want to do, I'll probably use Scheme
instead (ELK in particular).  The reason I didn't just use Scheme in
the first place is that I was hoping one of the languages would have
all the facilities I want without me having to extend them myself.

Before the summary of the languages themselves, I thought I'd try and
list some of the things I was looking for.  (Actually, I showed an
earlier version of this summary to somebody and they didn't understand
some of the terms I was using, so this is my attempt at an
explanation).  Note that most of the things are to do with structuring
the code and the like.  This is not the sort of thing you usually worry
about when writing small scripts, but I plan to convert and write a
number of tools, some of which are around the 1000 LOC mark.  For
example, I'd like to convert a particular lex/yacc/C program I have
into the chosen language.

You can skip ahead to the actual summary by searching for SUMMARY.
(Well I can do this in GNUS, I don't know about other news readers
like rn)

Packages/Modules
----------------
These are a mechanism for splitting up the name space so that function
name clashes are reduced.  Most systems work by declaring a package
and then all functions listed from then on are members of that
package.  You then access the functions using the package prefix, or
import the whole package so that you don't have to use the prefix.
The following is an example in CommonLisp :-

;;; foo.lsp                     ;;; bar.lsp
(in-package 'foo)               (in-package 'bar)
(export '(bob))                 (export '(bob))
(defun bob (a b) ...)           (defun bob (x) ...)

;;; main.lsp
(foo:bob 10 20)
(bar:bob 3)

Packages are not perfect, but they do help.  You can get the same
effect by declaring implicit package prefixes :-

;;; foo.lsp                     ;;; bar.lsp
(defun foo-bob (a b) ...)       (defun bar-bob (x) ...)

;;; main.lsp
(foo-bob 10 30)
(bar-bob 4)

The advantage of packages over this is that you don't have to use a
package prefix in the package itself when you want to call a function.
This can be a saving if you have lots of functions in a package, and
only a few are exported.
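
For comparison, here is a minimal sketch of the same idea in
present-day Python, one of the languages surveyed below.  The module
names `foo' and `bar' mirror the Lisp example.  Building the modules in
memory with `types.ModuleType' is only an assumption to keep the sketch
in one file (normally each would be its own file), and the function
bodies are placeholders, since the Lisp example elides them.

```python
import types

# Build two in-memory modules so the sketch fits in one file.
# In practice these would be separate files, foo.py and bar.py.
foo = types.ModuleType("foo")
bar = types.ModuleType("bar")


def _foo_bob(a, b):
    """foo's version of bob: takes two arguments (placeholder body)."""
    return a + b


def _bar_bob(x):
    """bar's version of bob: takes one argument (placeholder body)."""
    return x * 2


# Each module exports its own `bob`; the two names do not clash.
foo.bob = _foo_bob
bar.bob = _bar_bob

print(foo.bob(10, 20))  # call foo's bob via the module prefix
print(bar.bob(3))       # call bar's bob via the module prefix
```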

Exception Handling
------------------
This is useful for dealing with errors that shouldn't happen, e.g.
reaching the end of the file when you were looking for some valid
data.  For example, in CommonLisp :-

(defun foo (x y)
  ...
  (if (catch 'some-unexpected-error (bar x y) nil)
      (handle-the-exception ...)))

(defun bar (a b)
  ...
  (if (something-wrong) (throw 'some-unexpected-error t))
  ...)

Here the function `foo' calls `bar', and if any error occurs whilst
processing, it is handled by the exception handler.  (The example is a
bit primitive as I'm trying to save space).

The advantage of this is that you don't have to explicitly pass back
all sorts of error codes from your functions to handle unusual errors.
It also usually means you won't have so many nested `if's to handle
the special cases, therefore, making your code clearer.
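
The same pattern can be sketched in present-day Python using the
end-of-file case mentioned above.  The exception class name and the
"first non-blank line" rule are hypothetical, chosen only to give `bar'
something that can go wrong.

```python
class UnexpectedEndOfFile(Exception):
    """Raised when input ends before valid data is found (hypothetical)."""


def bar(lines):
    """Return the first non-blank line, or raise if the input runs out."""
    for line in lines:
        if line.strip():
            return line.strip()
    raise UnexpectedEndOfFile("no valid data before end of input")


def foo(lines):
    """Call bar and handle the unusual case in one place."""
    try:
        return bar(lines)
    except UnexpectedEndOfFile as err:
        return "handled: " + str(err)


print(foo(["", "  ", "data"]))
print(foo(["", "  "]))
```

Note that `bar' returns no error code at all; the unusual case travels
up to `foo' by itself.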

Records/Tuples/Aggregates/Structs
---------------------------------
It's handy to be able to define objects that contain a certain number of
elements.  You can then pass these objects around and access the
individual bits.  For example in CommonLisp :-

(defstruct point x y)

This declares `point' as a type containing two items called `x' and
`y'.  Some languages don't name the items, they rely on position
instead.  I see these as equivalent (assuming you have some sort of
pattern matching)
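
As a small illustration of why named and positional access can be seen
as equivalent, here is a sketch in present-day Python using
`collections.namedtuple'.  The `Point' type mirrors the Lisp `point';
the values are made up.

```python
from collections import namedtuple

# A record type with two named fields, like (defstruct point x y).
Point = namedtuple("Point", ["x", "y"])

p = Point(x=3, y=4)

# Access the fields by name ...
by_name = p.x + p.y

# ... or by position, using tuple unpacking as a simple form of
# pattern matching.
x, y = p
by_position = x + y

print(by_name, by_position)
```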

Provide/Require
---------------
This is a primitive facility for declaring that one package depends on
another one.  For example in CommonLisp :-

;;; foo.lsp
(defun bob (a b) ...)
(provide 'foo)

;;; main.lsp
(require 'foo)
(bob 10 3)

The above declares that the file `foo' provides the function `bob' and
that the file `main' requires `foo' to be loaded for it to work.
So when you load in `main' and `foo' hasn't been loaded, it is
automatically loaded by the system.
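
A rough equivalent in present-day Python, where the import machinery
plays the role of provide/require.  The `require' helper is a
hypothetical name, and the standard `colorsys' module merely stands in
for `foo'.

```python
import importlib
import sys


def require(name):
    """Load module `name` unless it is already loaded, then return it."""
    if name not in sys.modules:
        importlib.import_module(name)
    return sys.modules[name]


first = require("colorsys")   # loads the module if it is not loaded yet
second = require("colorsys")  # already loaded, so nothing is re-read

print(first is second)
```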

C Interface
-----------
How easy is it to call C from the language?
Is there a dynamic loading facility i.e. do I have to recompile the
program to use some arbitrary C code, or can it load in a .o file at
runtime?
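
To make the question concrete, this is roughly what runtime loading
looks like in present-day Python using the `ctypes' module (a later
addition, not something in the release reviewed below).  The sketch
assumes a POSIX system where the C library can be found or is already
linked into the running process.

```python
import ctypes
import ctypes.util

# Locate the C library at runtime.  If it cannot be found, the path is
# None and CDLL(None) falls back to the symbols already linked into
# the running process (POSIX only).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)
```

Nothing is recompiled here; the C function is looked up by name when
the script runs.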

Arbitrary Restrictions
----------------------
This really applies to the implementations rather than the languages.
However, as there is only one implementation for most of the languages
I'm looking at, they tend to be synonymous.

If there is one thing I hate about [an implementation of] a language,
it's arbitrary restrictions.  For example, `the length of the input
line must not exceed 80 characters', or `strings must be less than 255
characters long'.  I can accept some initial restrictions if :-

1) they are documented.
2) they will be removed in future versions.

Note. I realise that some restrictions are not arbitrary, or at least
not under the control of the language implementor e.g. the number of
open files under UNIX.

SUMMARY
-------
If you want to know more about the languages, there follows a brief
description of the languages, how to get an implementation and some
good and bad points as I see them.  Each point is preceded by a
character indicating the type of point :-

    +  good point
    -  bad point
    *  just a point to note
    !  subjective point

Other than the `*' items, I guess it is all subjective, however, I've
tried to put things that are generally good/bad in `+'/`-' and limit
really subjective statements to `!'.

                   TCL - version 4.0 patch level 1
                   -------------------------------

TCL (Tool Command Language) was developed by John Ousterhout at Berkeley.
It started out as a small language that could be embedded in
applications.  It has now been extended by some people at hackercorp
into more of a general purpose shell type programming language.
It is described by Peter Da Silva (one of the people who extended it)
as :-

> TCL is like a text-oriented Lisp, but lets you write algebraic
> expressions for simplicity and to avoid scaring people away.

The language itself for some reason reminds me of csh even though I
can only point to two things (the use of `set' and `$') which are
definitely like csh.

Unless you have other ideas about what an extension language should
look like (e.g. IMO it should be Scheme), then I'd definitely
recommend this.  It's small, and integrates easily with other C
programs (you can even have multiple TCL interpreters in an
application!)

Version 5.0 is available by anonymous ftp from sprite.berkeley.edu as
tk.tar.Z (it's part of an X toolkit called Tk).  Note, although it has
a higher number than the one above, it does not include the extensions
mentioned above.  These will apparently be integrated soon.

Version 4.0 pl1 is available by anonymous ftp from
media-lab.ai.mit.edu (sorry can't remember the exact path)

+  exceptions.
+  packages, called libraries
   However there is only one name-space.  The libraries are used as a
   way of storing single versions of code rather than as a solution to
   the name space pollution problem.
+  provide/require
+  C interface is excellent.  You can easily go TCL->C and C->TCL.
-  No dynamic loading ability that I'm aware of.
-  Arbitrary line length limit on `gets' and `scan'. i.e. the commands
   that read lines from files/strings.  I would guess this will go
   away in the next version.
-  No records.  The main data types are strings/lists/associative arrays
+  extensive test suite included.
!  doesn't look to have been tested on many systems.  The above
   version actually failed to link on a SPARCstation running SunOS 4.1
   as the source refers to `strerror'.  This has apparently been fixed
   in patch level 2.
+  lots of example code included in distribution.
+  extensive documentation (all in nroff)
+  Can trace execution.
!  To make arguments evaluate, you must enclose them in {} or []
   This shouldn't be a problem, except that being used to Lisp like
   languages I expect to quote constants.
!  The extensions though useful, are not seamless. e.g. some string
   facilities are in the core language and some in the extensions.
   This might be resolved when the hackercorp extensions are officially
   merged with the Berkeley core language and released by Berkeley.
+  As part of the extensions, you get tclsh.  This is a shell which you
   can type commands directly into.
+  scan contexts.  This is sort of regular expressions on files rather
   than strings.
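
A rough analogue of the scan context idea in present-day Python,
assuming it amounts to a set of pattern/action pairs applied line by
line to a file.  This is not TCL code; `scan_lines', the rule list and
the sample log are all hypothetical.

```python
import io
import re


def scan_lines(stream, rules):
    """Apply each (pattern, action) rule to every line of a stream.

    `rules` is a list of (compiled regex, callable) pairs; the callable
    receives the line number and the match object.
    """
    for line_number, line in enumerate(stream, start=1):
        for pattern, action in rules:
            match = pattern.search(line)
            if match:
                action(line_number, match)


errors = []
rules = [
    # Collect the line number and message of every ERROR line.
    (re.compile(r"ERROR: (.*)"),
     lambda n, m: errors.append((n, m.group(1)))),
]

log = io.StringIO("ok\nERROR: disk full\nok\nERROR: no route\n")
scan_lines(log, rules)
print(errors)
```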

                        Python - version 0.9.1
                        ----------------------

Available by anonymous ftp from wuarchive.wustl.edu as
pub/python0.9.1.tar.Z or for Europeans via the info server at
hp4nl.nluug.nl

I couldn't think of a good way to describe this, so I'm blatantly
copying the following from the Python tutorial :-

    Python is a simple, yet powerful programming language that bridges
    the gap between C and shell programming, and is thus ideally
    suited for rapid prototyping.  Its syntax is put together from
    constructs borrowed from a variety of other languages; most
    prominent are influences from ABC, C, Modula-3 and Icon

So far so good, here's some more from the tutorial :-

    Because of its more general data types Python is applicable to a
    much larger problem domain than Awk or even Perl, yet most simple
    things are at least as easy in Python as in those languages.

i.e. Python seems to be designed for larger tasks than you would
undertake using the shell/awk/perl.

+  packages.
+  exceptions (based on Modula 2/3 modules)
+  records (actually tuples.  I'm not sure they do everything I want
   as the documentation is a bit vague in this area)
   Other main types are lists, sets, tables (associative arrays)
+  C interface is good.  No dynamic linking that I am aware of.
-  Arbitrary Restrictions
   line length limit on readline.
   This has been fixed and I would guess will appear in the next release.
+  lots of example python programs included.
   There is even a TCL (version 2ish) interpreter!
+  Object oriented features.
   Based on Modula 3 i.e. classes with methods, all of which are
   virtual (to use a C++ term).
*  any uncaught errors produce a stack trace.
+  disassembler included
+  can inspect stack frames via traceback module
-  no single step or breakpoint facility
   (maybe in the next release)
+  functions can return multiple values.
*  The default output command `print' inserts a space between each
   field output.
!  I don't like the above, or rather I would like the option of not
   having it done.
*  Documentation includes tutorial and library reference as TeX files.
   Both are incomplete, but there is enough in them to be able to
   write Python code.  The reference manual is not yet finished, and
   is not currently distributed with the source.
+  Python mode for Emacs.
   (It's primitive, but it's a start)
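
Two of the points above (multiple return values, and the space that
`print' inserts) can be sketched as follows.  This uses present-day
Python syntax, not that of 0.9.1; the `sep' argument in particular is a
later addition, and `min_max' is a made-up example function.

```python
import io


def min_max(values):
    """Return two results at once, packed as a tuple."""
    return min(values), max(values)


low, high = min_max([3, 1, 4, 1, 5])

# By default print separates the fields with a single space ...
default_out = io.StringIO()
print("low", low, file=default_out)

# ... but sep="" switches the separator off.
joined_out = io.StringIO()
print("low", low, sep="", file=joined_out)

print(repr(default_out.getvalue()), repr(joined_out.getvalue()))
```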

                           Icon - version 8
                           ----------------

To quote from one of the Icon books :-

    Icon is a high-level, general purpose programming language that
    contains many features for processing nonnumeric data,
    particularly for textual material consisting of strings of
    characters.

Available :-
In USA :- ??, consult `archie'.
In UK :-  I picked up a copy from the sources archive at Imperial College.
          The JANET address is 00000510200001

-  no packages.  Everything is in one namespace.  However ...
-  no exceptions.
+  Object oriented features.
   An extension to the language called Idol is included.
   This converts Idol into standard Icon.
   Idol itself looks (to me) like Smalltalk.
+  has records.  Other types include :- sets, lists, strings, tables
+  unlimited line length when reading
   (Note. the newline is discarded)
!  The only language that has enough facilities to be able to re-write
   some of my Lex/Yacc code.
+  stack trace on error.
+  C interface is good.  Can extend the language by building `personal
   interpreter'.  No dynamic linking.
+  extensive documentation
   9 technical reports in all (PostScript and ASCII)
-  Unix interface is quite primitive.
   If you just want to use a command, you can use `callout', anything
   more complicated requires building a personal interpreter (not as
   difficult as it may sound)
+  extensive test suite
+  Usenet group exists specifically for it - comp.lang.icon
-  Unless you use Idol, all procedures are at the same level
   i.e. one scope.
-  regular expressions not supported.
   However, in many cases, you can use the Icon functions `find',
   `match', `many' and `upto' instead.
+  Can trace execution.
*  Pascal/C like syntax
   i.e. uses {} but has a few more keywords than C.
+  lots of example programs included.
+  can define your own iterators
   i.e. your own procedures for iterating through arbitrary structures.
+  co-expressions.  Powerful tool, hard to explain briefly.  See
   chapter 13 of the Icon Programming Language.
-  co-expressions haven't been implemented on Sun 4s (the type of
   machine I use)
+  has an `initial' section in procedures that is only ever executed
   once and allows you to initialise C like static variables with the
   result of other functions (unlike C).
+  arbitrary precision integers.
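
To give a flavour of user-defined iterators without showing Icon
itself, here is a loose illustration using a present-day Python
generator.  It only covers the "iterate through an arbitrary
structure" part and does not attempt Icon's backtracking; `walk' and
the nested list are made-up examples.

```python
def walk(tree):
    """Yield every leaf of a nested list, depth first."""
    for item in tree:
        if isinstance(item, list):
            # Recurse into sub-lists and pass their leaves along.
            yield from walk(item)
        else:
            yield item


leaves = list(walk([1, [2, [3, 4]], 5]))
print(leaves)
```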

As well as the excellent documentation included in the source, there
are two books on Icon available (I skimmed through both of them) :-

    The Icon Programming Language
    Ralph E. Griswold and Madge T. Griswold
    Prentice Hall 1983

    The Implementation of the Icon Programming Language
    Ralph E. Griswold and Madge T. Griswold
    Princeton University Press 1986

The second one is particularly useful if you are considering
extending Icon yourself.  Appendix E of this book also contains a list
of projects that could be undertaken to extend and improve Icon.

Here are some projects, that if implemented, would greatly improve the
usefulness of Icon :-

E.2.4 Add a regular expression data type.  Modify the functions find
      and match to operate appropriately when their first argument is a
      regular expression.

E.2.5 \  All of these suggest extending
E.5.4  | the string scanning facilities to
E.5.5 /  cope with files and strings in a uniform way.

E.12.1 Provide a way to load functions (written in C) at runtime


                                 Perl
                                 ----
Available :-
USA :- ??, consult `archie'
UK :- Imperial sources archive

I received more responses about Perl than anything else, so I assume
that most people already know a lot about the language.

Here are some edited highlights from a message I received from Tom
Christiansen :-

First some good words from Tom :-

> ... I shall now reveal my true colors as perl disciple
> and perhaps not infrequent evangelist.  Perl is without question the
> greatest single program to appear to the UNIX community (although it runs
> elsewhere too) in the last 10 years.  It makes programming fun again.  It's
> simple enough to get a quick start on, but rich enough for some very
> complex tasks.

> ... perl is a strict superset of sed and awk, so much so that s2p and
> a2p translators exist for these utilities.  You can do anything in
> perl that you can do in the shell, although perl is not strictly
> speaking a command interpreter.  It's more of a programming language.

and now some of the low points of Perl.  [Note this is only a small
part of a long post, that explained a lot of good things about Perl.
As most people seem to use/like Perl, I thought I'd highlight some of
the things wrong with the language, and what better place to get
information than from the designer of the language.  Note also that
this is from a message dated June 90, so some of it may be out of date.]

Larry Wall :-

> The basic problem with Perl is that it's not about complex data structures.
> Just as spreadsheet programs take a single data structure and try to
> cram the whole world into it, so too Perl takes a few simple data structures
> and drives them into the ground.  This is both a strength and a weakness,
> depending on the complexity and structure of the problem.
> 
> The basic underlying fault of Perl is that there isn't a real good way
> of building composite structures, or to make one variable refer to a piece
> of another variable, without giving an operational definition of it.
> 
> ...  In a sense, the problem with Perl is not that it is too
> complicated or hard to learn, but that perhaps it is not expressive
> enough for the effort you put into learning it.  Then again, maybe it
> is.  Your call.  Some people are excited about Perl because, despite
> its obvious faults, it lets them get creative.
> 
> There are many things I'd do differently if I were designing Perl from
> scratch.  It would probably be a little more object oriented.  Filehandles
> and their associated magical variables would probably be abstract types
> of some sort.  I don't like the way the use of $`, $&, $' and $<digit>
> impact the efficiency of the language.  I'd probably consider some kind
> of copy-on-write semantics like many versions of BASIC use.  The subroutine
> linkage is currently somewhat problematical in how efficiently it can
> be implemented.  And of course there are historical artifacts that wouldn't
> be there.

I think the above is a very fair summary of the low points of the
language.  At one point it says `... perhaps it is not expressive
enough for the effort you put into learning it.  Then again, maybe it
is.  Your call'.  Well _my_ call is that it is not.

Note I didn't actually pick up the source to this, just the manual.
Consequently I haven't been able to check all the points listed below.

+  packages.
!  Note in the examples that I've seen in comp.lang.perl, people don't
   seem to use the facility, instead they put everything directly in
   `main' (i.e. the top level scope) rather than in the local scope.
+  exceptions
+  provide/require
*  C Interface ??  I couldn't find this in the documentation I had.
+  No arbitrary restrictions
+  has a source level debugger
+  Well integrated with Unix (nearly all system calls are built in !)
!  However, like Unix, only one name space seems to be used (see above)
*  C like syntax
+  source contains texinfo manual.
   You can always buy the (Camel) book for more information.
-  no records.  Other types lists, strings, tables (associative arrays)
*  some types have distinct scopes.
!  You prefix the name with `@', `$', `%' to indicate which type
   you want.  This is one of the ugliest things I've ever seen.
!  Uses lots of short strings to contain often used things e.g. `$_'
   is the current input, `$.' is current line number.  I guess some
   people must like this, but I prefer names like `input' and
   `line-number' myself.
+  includes programs to convert existing awk, find and sed scripts into
   Perl.
+  Usenet news group - comp.lang.perl
+  Perl mode for Emacs.
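
To show what the named style preferred above looks like in practice,
here is a sketch in present-day Python.  The names `line_number' and
`line' are arbitrary choices standing in for Perl's `$.' and `$_', and
the three-line input is made up.

```python
import io

source = io.StringIO("alpha\nbeta\ngamma\n")

numbered = []
# `line_number` and `line` play the roles of Perl's `$.` and `$_`.
for line_number, line in enumerate(source, start=1):
    numbered.append((line_number, line.rstrip("\n")))

print(numbered)
```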

                                 GAWK
                                 ----
Available :- 
USA :- prep.ai.mit.edu, probably other places as well.  Consult `archie'
UK :- Imperial sources archive.

A few points about GNU awk as it seems to fix some of the problems
with `old' awk.

-  no packages
-  no exceptions
-  no C interface 
-  no records
+  allows user defined functions
+  can read and write to arbitrary files
+  much more informative error messages than the old awk.