[net.sources] Icon Overview Document

whm (04/01/83)

Due to popular request, here's the overview of Icon.  If you like
what you see, the UNIX implementation of Icon is available from
the Computer Science Department of The University of Arizona.
This implementation runs on VAX UNIX systems and split I/D PDP-11
V7 UNIX systems.  The system includes both an interpreter and a
compiler for Icon.  Complete source code for all system components
is included.  To get an Icon system, specify the following where
applicable:

	Your Name
	Address
	Organization
	Telephone
	Network Address
	Desired tape density (800 or 1600 bpi)

and send a tape at least 600 feet long (or check/money order for $15
payable to The University of Arizona) to:

	Icon Project
	Department of Computer Science
	The University of Arizona
	Tucson, AZ  85721
	
Nroff, troff, or vtroff this with the -ms macro package.
-----
.if \nv .ss 10
.hy 14
.de do
.if '\\$1'' .in +.5i
.if !'\\$1'' .in +\\$1
.vs 6p
\&.
\&.
\&.
.vs
.in
..
.ds t 	
.ds mi -
.ds fm '
.ds sl /
.ds mi -
.ds >= \v'-1p'>\v'1p'=
.ds >: \v'-1p'>\v'1p':
.ds >> \v'-1p'>>\v'1p'
.ds <= \v'-1p'<\v'1p'=
.ds <: \v'-1p'<\v'1p':
.ds << \v'-1p'<<\v'1p'
.ds <- \v'-1p'<\v'1p'-
.ds -> -\v'-1p'>\v'1p'
.ds <> \v'-1p'<\v'1p'-\v'-1p'>\v'1p'
.ds b \|
.ds el \fR.\^.\^.\fP
.ds sd \s8\v'.2m'\h'-0.4n'
.ds su \v'-.2m'\s0
.ds 0 \fIexpr\fP
.ds 1 \fIexpr\*(sd1\*(su\fP
.ds 2 \fIexpr\*(sd2\*(su\fP
.ds 3 \fIexpr\*(sd3\*(su\fP
.ds i \fIexpr\*(sdi\*(su\fP
.ds n \fIexpr\*(sdn\*(su\fP
.de Ds
.DS
.if ''\\$1' .ft B
.if \nv .ft B
.if !''\\$1' .ft \\$1
.tr -\\(mi'\\(fm/\\(sl
.if t .ss 9
.if \nv .ss 20
..
.de De
.if t .ss 4
.if \nv .ss 10
.DE
.ft R
.tr -\\*(mi'\\*(fm/\\*(sl
..
.if \nv .ss 10
.LP
.am Ds
.ft B
..
.tr *\(**
.ce 10
\fBAn Overview of the Icon Programming Language\u*\d\fR
.sp 1
Ralph E. Griswold
.sp 1
Department of Computer Science
The University of Arizona, Tucson, AZ  85721
.sp 1
February 27, 1983
.ce 0
.FS
\u*\dThis work was supported by the National Science Foundation under
Grant MCS81-01916.
.FE
.NH
Introduction
.PP
Icon is a high-level programming language with extensive facilities for
processing strings and lists. Icon has several novel features, including
expressions that may produce sequences of results, goal-directed
evaluation that automatically searches for a successful result, and
string scanning that allows operations on strings to be formulated at
a high conceptual level.
.PP
Icon resembles SNOBOL4 [1] in its emphasis on high-level string
processing and a design philosophy that allows ease of programming
and short, concise programs. Like SNOBOL4, storage allocation and
garbage collection are automatic in Icon, and there are few restrictions on the
sizes of objects. Strings, lists, and other structures are created
during program execution and their size does not need to be known when
a program is written.
Values are converted to expected types automatically; for example,
numeral strings read in as input can be used in numerical computations
without explicit conversion.
Whereas SNOBOL4 has a pattern-matching facility that is separate from
the rest of the language, string scanning is integrated with the
rest of the language facilities in Icon.
Unlike SNOBOL4,
Icon has an expression-based syntax with reserved words;
in appearance, Icon programs resemble those of several other conventional programming
languages.
.PP
Examples of the kinds of problems for which Icon is well suited are:
.in .5i
.IP \(bu
text analysis, editing, and reformatting
.IP \(bu
document preparation
.IP \(bu
symbolic mathematics
.IP \(bu
text generation
.IP \(bu
program parsing and translation
.IP \(bu
data laundry
.IP \(bu
graph manipulation
.in 0
.PP
Icon is implemented in C [2] and runs under UNIX\u\(dg\d
.FS
\u\(dg\dUNIX is a trademark of Bell Laboratories.
.FE
on the PDP-11, VAX-11, and Onyx C8002 computers. Implementations for
other computers and operating systems are presently underway.
An earlier version of Icon [3] is available on several large-scale
computers, including the CRAY-1, DEC-10, IBM 360/370, PRIME 450/550/650, DG
MV8000,
and CDC Cyber/6000.
.PP
A brief description of some of the representative features of Icon
is given in the following sections. This description is not rigorous
and does not include many features of Icon. See [4] for a
complete description.
.NH
Strings
.PP
Strings of characters may be arbitrarily long, limited only by the
architecture of the computer on which Icon is implemented. A string
may be specified literally by enclosing it in double quotation marks,
as in
.Ds
greeting := "Hello world"
.De
which assigns an 11-character string to \fBgreeting\fR, and
.Ds
address := ""
.De
which assigns the zero-length \fIempty\fR string to \fBaddress\fR.
The number of characters in a string \fBs\fR, its size, is given
by \fB*s\fR. For example, \fB*greeting\fR is 11 and \fB*address\fR
is 0.
.PP
Icon uses the ASCII character set, extended to 256 characters.
There are escape conventions, similar to those of C, for representing
characters that cannot be keyboarded.
.PP
Strings also can be read in and written out, as in
.Ds
line := read()
.De
and
.Ds
write(line)
.De
Strings can be constructed by concatenation, as in
.Ds
element := "(" || read() || ")"
.De
If the concatenation of a number of strings is to be written
out, the \fBwrite\fR function can be used with several arguments
to avoid actual concatenation:
.Ds
write("(",read(),")")
.De
.PP
Substrings can be formed by subscripting strings with range
specifications that indicate, by position, the desired range of
characters. For example,
.Ds
middle := line\^[10:20]
.De
assigns to \fBmiddle\fR the string of characters of \fBline\fR between
positions 10 and 20.
Similarly,
.Ds
write(line\^[2])\fR
.De
writes the second character of \fBline\fR.
The value 0 is used to refer to the position after the last character
of a string. Thus
.Ds
write(line\^[2:0])
.De
writes the substring of \fBline\fR from the second character to the end, thus
omitting the first character.
.PP
An assignment can be made to the substring of string-valued variable
to change its value. For example,
.Ds
line[2] := "..."
.De
replaces the second character of \fBline\fR by three dots. Note that
the size of \fBline\fR changes automatically.
.PP
There are many functions for analyzing strings. An example is
.Ds
find(s1,\*bs2)
.De
which produces the position in \fBs2\fR at which \fBs1\fR occurs as
a substring. For example, if the value of \fBgreeting\fR is as
given earlier,
.Ds
find("or",\*bgreeting)
.De
produces the value 8.
See Section 4.2 for the handling of situations in which \fBs1\fR does not
occur in \fBs2\fR, or in which it occurs at several different positions.
.NH
Character Sets
.PP
While strings are sequences of characters, \fIcsets\fR are sets of characters
in which membership rather than order is significant. Csets are
represented literally using single enclosing quotation marks, as
in
.Ds
vowels := 'aeiouAEIOU'
.De
Two useful built-in csets are \fB&lcase\fR and \fB&ucase\fR, which
consist of the lowercase and uppercase letters, respectively.
Set operations are provided for csets. For example,
.Ds
letters := &lcase ++ &ucase
.De
forms the cset union of the lowercase and uppercase letters and assigns the
resulting cset to \fBletters\fR, while
.Ds
consonants := letters -- 'aeiouAEIOU'
.De
forms the cset difference of the letters and the vowels and assigns the
resulting cset to \fBconsonants\fR.
.PP
Csets are useful in situations in which any one of a number of characters
is significant. An example is the string analysis function
.Ds
upto(c,\*bs)
.De
which produces the position \fBs\fR at which any character in \fBc\fR occurs.
For example,
.Ds
upto(vowels,\*bgreeting)
.De
produces 2. Another string analysis function that uses csets is
.Ds
many(c,\*bs)
.De
which produces the position in \fBs\fR after an initial substring consisting
only of characters that occur in \fBs\fR.
An example of the use of \fBmany\fR is in locating words. Suppose, for
example, that a word is defined to consist of a string of letters.
The expression
.Ds
write(line\^[1:many(letters,\*bline)])
.De
writes a word at the beginning of \fBline\fR. Note the use of the
position returned by a string analysis function to specify the
end of a
substring.
.NH
Expression Evaluation
.NH 2
Conditional Expressions
.PP
In Icon there are \fIconditional expressions\fR that may \fIsucceed\fR and
produce a result, or may \fIfail\fR and not produce any result. An example
is the comparison operation
.Ds
i > j
.De
which succeeds (and produces the value of \fBj\fR) provided that the value
of \fBi\fR is greater than the value of \fBj\fR, but fails otherwise.
.PP
The success or failure of conditional operations is used instead of
Boolean values to drive control structures in Icon. An example is
.Ds
if i > j then k := i else k := j
.De
which assigns the value of \fBi\fR to \fBk\fR if the value of \fBi\fR
is greater than the value of \fBj\fR, but assigns the value of \fBj\fR to
\fBk\fR \%otherwise.
.PP
The usefulness of the concepts of success and failure is illustrated by
\fBfind(s1,\*bs2)\fR, which fails
if \fBs1\fR does not occur as a substring of \fBs2\fR.
Thus
.Ds
if i := find("or",line) then write(i)
.De
writes the position at which \fBor\fR occurs in \fBline\fR, if it occurs,
but does not write a value if it does not occur.
.PP
Many expressions in Icon are conditional. An example is \fBread()\fR,
which produces the next line from the input file, but fails when the
end of the file is reached. The following expression is typical of
programming in Icon and illustrates the integration of conditional
expressions and conventional control structures:
.Ds
while line := read() do
   write(line)
.De
This expression copies the input file to the output file.
.PP
If an argument of a function fails, the function is not called,
and the function call fails as well. This ``inheritance'' of failure allows the
concise formulation of many programming tasks. Omitting the optional
\f3do\fR clause in \f3while-do\fR, the previous expression can be
rewritten as
.Ds
while write(read())
.De
.NH 2
Generators
.PP
In some situations, an expression may be capable of producing more than
one result. Consider
.Ds
sentence := "Store it in the neighboring harbor"
find("or",\*bsentence)
.De
Here \fBor\fR occurs in \fBsentence\fR at positions 3, 23, and 33. Most
programming languages treat this situation by selecting one of the
positions, such as the first, as the result of the expression. In Icon,
such an expression is a \fIgenerator\fR and is capable of producing
all three positions.
.PP
The results that a generator produces depend on context. In a situation
where only one result is needed, the first is produced, as in
.Ds
i := find("or",\*bsentence)
.De
which assigns the value 3 to \fBi\fR.
.PP
If the result produced by a generator does not lead to the success of
an enclosing expression, however, the generator is \fIresumed\fR
to produce another value. An example is
.Ds
if (i := find("or",\*bsentence)) > 5 then write(i)
.De
Here the first result produced by the generator, 3, is assigned to
\fBi\fR, but this value is not greater than 5 and the comparison
operation fails. At this point, the generator is resumed and
produces the second position, 23, which is greater than 5. The
comparison operation then succeeds and the value 23 is written.
Because of the inheritance of failure and the fact that comparison
operations return the value of their right argument, this expression
can be written in the following more compact form:
.Ds
write(5 < find("or",\*bsentence))
.De
.PP
Goal-directed evaluation is inherent in the expression evaluation
mechanism of Icon and can be used in arbitrarily complicated situations.
For example,
.Ds
find("or",\*bsentence1) = find("and",\*bsentence2)
.De
succeeds if \fBor\fR occurs in \fBsentence1\fR at the same position
as \fBand\fR occurs in \fBsentence2\fR.
.PP
A generator can be resumed repeatedly to produce all its results by
using the \f3every-do\fR control structure. An example is
.Ds
every i := find("or",\*bsentence)
   do write(i)
.De
which writes all the positions at which \fBor\fR occurs in \fBsentence\fR.
For the example above, these are 3, 23, and 33.
.PP
Generation is inherited like failure, and this expression can be written
more concisely by omitting the optional \f3do\fR clause:
.Ds
every write(find("or",\*bsentence))
.De
.PP
There are several built-in generators in Icon. One of the most frequently
used of these is
.Ds
i to j
.De
which generates the integers from \fBi\fR to \fBj\fR. This generator can be
combined with \f3every-do\fR to formulate the traditional \f3for\fR-style
control structure:
.Ds
every k := i to j do
   f(k)
.De
Note that this expression can be written more compactly as
.Ds
every f(i to j)
.De
.PP
There are a number of other control structures related to generation.
One is \fIalternation\fR,
.Ds
\*1 | \*2
.De
which generates the results of \*1 followed by the results of \*2.
Thus
.Ds
every write(find("or",\*bsentence1) | write("or",\*bsentence2)
.De
writes the positions of \fBor\fR in \fBsentence1\fR followed by
the positions of \fBor\fR in \fBsentence2\fR. Again, this sentence can
be written more compactly by using alternation in the second
argument of \fBfind\fR:
.Ds
every write("or",\*bsentence1 | sentence2)
.De
.PP
Another use of alternation is illustrated by
.Ds
(i | j | k) = (0 | 1)
.De
which succeeds if any of \fBi\fR, \fBj\fR, or \fBk\fR has the value 0 or 1.
.NH
String Scanning
.PP
The string analysis and synthesis operations described in
Sections 2 and 3 work best for relatively simple operations on strings.
For complicated operations, the bookkeeping involved in keeping track of
positions in strings becomes burdensome and error prone.
In such cases, Icon has a string scanning facility that is
analogous in many respects to pattern matching in SNOBOL4. In string
scanning, positions are managed automatically and attention is
focused on a current position in a string as it is examined by a sequence of
operations.
.PP
The string scanning operation has the form
.Ds
s ? \*0
.De
where \fBs\fR is the \fIsubject\fR string to be examined and \*0 is an expression that
performs the examination.
A position in the subject, which starts at 1, is the focus of examination.
.PP
\fIMatching functions\fR change this position.
One matching function, \fBmove(i)\fR, moves the position by \fBi\fR and
produces the substring of the subject between the previous and new
positions. If the position cannot be moved by the specified amount
(because the subject is not long enough), \fBmove(i)\fR fails. A
simple example is
.Ds
line ? while write(move(2))
.De
which writes successive two-character substrings of \fBline\fR, stopping
when there are no more characters.
.PP
Another matching function is \fBtab(i)\fR, which sets the position in the
subject to \fBi\fR and also returns the substring of the subject between
the previous and new positions.
For example,
.Ds
line ? if tab(10) then write(tab(0))
.De
first sets the position in the subject to 10 and then to the end of the subject, writing
\fBline\^[10:0]\fR.
Note that no value is written if the subject is not long enough.
.PP
String analysis functions such as \fBfind\fR
can be used in string scanning. In this context, the string that they
operate on is not specified and is taken to be the subject. For example,
.Ds
line ? while write(tab(find("or")))
   do move(2)
.De
writes all the substrings of \fBline\fR prior to occurrences of \fBor\fR.
Note that \fBfind\fR produces a position, which is then used by \fBtab\fR
to change the position and produce the desired substring. The \fBmove(2)\fR
skips the \fBor\fR that is found.
.PP
Another example of the use of string analysis functions in scanning is
.Ds
line ? while tab(upto(letters)) do
   write(tab(many(letters)))
.De
which writes all the words in \fBline\fR.
.PP
As illustrated in the examples above, any expression may occur in
the scanning expression. Unlike SNOBOL4, in which the operations that
are allowed in pattern matching are limited and idiosyncratic, string
scanning is completely integrated with the rest of the operation
repertoire of Icon.
.NH
Structures
.NH 2
Lists
.PP
While strings are sequences of characters, lists in Icon are sequences
of values of arbitrary types. Lists are created by enclosing the lists
of values in brackets. An example is
.Ds
car1 := ["buick",\*b"skylark",\*b1978,\*b2450]
.De
in which the list \fBcar1\fR has four values, two of which are strings
and two of which are integers. Note that the values in a list need not
all be of the same type. In fact, any kind of value can occur in a list
\(em even another list, as in
.Ds
inventory := [car1,\*bcar2,\*bcar3,\*bcar4]
.De
.PP
Lists also can be created by
.Ds
a := list(i,\*bx)
.De
which creates a list of \fBi\fR values, each of which has the value
\fBx\fR.
.PP
The values in a list can be referenced by position much like the
characters in a string. Thus
.Ds
car1\^[4] := 2400
.De
changes the last value in \fBcar1\fR to 2400.
A reference that is out of the range of the list fails. For example,
.Ds
write(car1\^[5])
.De
fails.
.PP
The values in a list \fBa\fR are generated by \fB!a\fR. Thus
.Ds
every write(!a)
.De
writes all the values in \fBa\fR.
.PP
Lists can be manipulated like stacks and queues. The function
\fBpush(a,\*bx)\fR
adds the value of \fBx\fR to the left end of the list \fBa\fR,
automatically increasing the size of \fBa\fR by one. Similarly,
\fBpop(a)\fR removes the leftmost value from \fBa\fR, automatically
decreasing the size of \fBa\fR by one, and produces the removed value.
.PP
A list value in Icon is a pointer (reference) to a structure. Assignment
of a structure
in Icon does not copy the structure itself but only the pointer to it. Thus the
result of
.Ds
demo := car1
.De
causes \fBdemo\fR and \fBcar1\fR to reference the same list. Graphs with
loops can be constructed in this way. For example,
.Ds
node1 := ["a"]
node2 := [node1,\*b"b"]
push(node1,\*bnode2)
.De
constructs a structure that can be pictured as follows:
.ne 2i
.nf
.in 1i
.ft B
.sp 2
.cs B 20
node1  .->a--.
       |     |
       |     |
node2  '--b<-'
.sp 2
.cs B
.in 0
.fi
.NH 2
Tables
.PP
Icon has a table data type similar to that of SNOBOL4. Tables essentially
are sets of pairs of values, an \fIentry value\fR and a corresponding
\fIassigned value\fR. The entry and assigned values may be of any type,
and the assigned value for any entry value can be looked up automatically.
Thus tables provide a form of associative access in contrast with the
positional access to values in lists.
.PP
A table is created by an expression such as
.Ds
symbols := table(x)
.De
which assigns to \fBsymbols\fR a table with the default assigned value
\fBx\fR.
Subsequently, \fBsymbols\fR can be referenced by any entry value, such as
.Ds
symbols\^["there"] := 1
.De
which assigns the value 1 to the \fBthere\fRth entry in symbols.
.PP
Tables grow automatically as new entry values are added.
For example, the following program segment produces a
table containing a
count of the
words that appear in the input file:
.Ds
words := table(0)
while line := read() do
   line ? tab(upto(letters)) do
      words\^[tab(many(letters))] +:= 1
.De
Here the default assigned value for each word is 0, as given
in \fBtable(0)\fR, and \fB+:=\fR is an augmented assignment operation that
increments the assigned values by one.
There are augmented assignment operations for all binary operators.
.PP
Tables can
be converted to lists, so that their entry and assigned values can be
accessed by position.
This is done by \fBsort(t)\fR, which produces a list of two-element
lists from \fBt\fR, where each two-element list consists of an
entry value and its corresponding assigned value. For example,
.Ds
wordlist := sort(words)
every pair := !wordlist do
   write(pair\^[1],\*b" : ",\*bpair\^[2])
.De
writes the words and their counts from \fBwords\fR.
.NH
Procedures
.PP
An Icon program consists of a sequence of procedure declarations.
An example of a procedure declaration is
.Ds
procedure max(i,\*bj)
   if i > j then return i else return j
end
.De
where the name of the procedure is \fBmax\fR and its formal parameters
are \fBi\fR and \fBj\fR. The \f3return\fR expressions return the value of
\fBi\fR or \fBj\fR, whichever is larger.
.PP
Procedures are called like built-in functions. Thus
.Ds
k := max(*s1,\*b*s2)
.De
assigns to \fBk\fR the size of the longer of the strings \fBs1\fR and
\fBs2\fR.
.PP
A procedure also may suspend instead of returning. In this case, a
result is produced as in the case of a return, but the procedure
can be resumed to produce other results. An example is
the following procedure that generates the words in the input file.
.Ds
procedure genword()
   local line, letters, words
   letters := &lcase ++ &ucase
   while line := read() do
      line ? while tab(upto(letters)) do {
         word := tab(many(letters))
         suspend word
         }
end
.De
The braces enclose a compound expression.
.PP
Such a generator is used in the same way that a built-in generator is
used. For example
.Ds
every word := genword() do
   if find("or",\*bword) then write word
.De
writes only those words that contain the substring \fBor\fR.
.NH
An Example
.PP
The following program sorts graphs topologically.
.Ds
.ta 3.8i
procedure main()
   local sorted, nodes, arcs, roots
   while nodes := read() do {	# get next node list
      arcs := read()	# get arc list
      sorted := ""	# sorted nodes
	# nodes without predecessors
      while *(roots := nodes -- snodes(arcs)) > 0 do {
         sorted ||:= roots	# add to sorted nodes
         nodes --:= roots	# delete these nodes
         arcs := delarcs(arcs,\*broots)	# delete their arcs
         }
      if *arcs = 0 then write(sorted)	# successfully sorted
      else write("graph has cycle")	# cycle if node remains
   }
end
.De
.Ds
procedure snodes(arcs)
   local nodes
   nodes := ""
   arcs ? while move(1) do {	# predecessor
      move(2)	# skip "->"
      nodes ||:= move(1)	# successor
      move(1)	# skip ";"
      }
   return nodes
end
.De
.Ds
procedure delarcs(arcs,\*broots)
   local newarcs, node
   newarcs := ""
   arcs ? while node := move(1) do {	# get predecessor node
      if many(roots,\*bnode) then move(4)	# delete arc from root node
      else newarcs ||:= node || move(4)	# else keep arc
      }
   return newarcs
end
.De
Graph nodes are represented
by single characters with a list of the nodes on one input line followed by
a list of arcs. For example, the graph
.nf
.ft B
.cs B 20
.in 1i
.ne 2i
.sp 2
        .---------------.
        |               |
        |               \o'v|'
        a------>b------>c
        \o'^|'       |       \o'^|'
        |       |       |
        |       \o'|v'       |
        d------>e-------'
.sp 1
.cs B
.fi
.in 0
.sp 1
.ft R
is given as
.Ds
abcde
a\*(->b;a\*(->c;b\*(->c;b\*(->e;d\*(->a;d\*(->e;e\*(->c;
.De
for which the output is
.Ds
dabec
.De
.PP
The nodes are represented by csets and automatic type conversion
is used to convert strings to csets and vice versa.
Note the use of augmented assignment operations for concatenation and in the computation of
cset differences.
.SH
Acknowledgement
.PP
Icon was designed by the the author in collaboration with Dave Hanson,
Tim Korb, Cary Coutant, and Steve Wampler. The current implementation is
largely the work of Cary Coutant and Steve Wampler with recent
contributions by Bill Mitchell.
Dave Hanson and Bill Mitchell also made several helpful suggestions on the presentation
of material in this paper.
.SH
References
.LP
.IP 1.
Griswold, Ralph E., Poage, James F., and Polonsky, Ivan P.
\fIThe SNOBOL4 Programming Language\fR, second edition.
Prentice-Hall, Inc., Englewood Cliffs, New Jersey. 1971.
.IP 2.
Kernighan, Brian W. and Ritchie, Dennis M. \fIThe C
Programming Language\fR. Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey. 1978.
.IP 3.
Griswold, Ralph E. \fIDifferences Between Versions 2 and 5 of Icon\fR.
Technical Report TR 83-5,
Department of Computer Science, The University of Arizona. March 1983.
.IP 4.
Griswold, Ralph E. and Griswold, Madge T. \fIThe Icon Programming
Language\fR. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
1983.