[comp.lang.misc] ap

byron@archone.tamu.edu (Byron Rakitzis) (06/04/91)

After compiling perl on my system and being nauseated by the syntax of
the language, I've decided to try to come up with my own alternative.
I'm going to call it ap, or anti-perl.

Right now I'm thinking that ap will be a super-awk that is less
confusing for a C programmer to learn. I'm not sure if I want the
implicit looping over stdin (though that's kind of nice) and I
definitely don't want the

	pattern { action }

syntax that awk has. It will have an integer and a string datatype, and
you should be able to build arrays out of those objects (associative
arrays too). Functions would be a nice thing to have, but it must
always be easy to toss off a quick one-line ap script, i.e., in the
most trivial case I would like something like

	ypcat hosts | ap 'print $1'

or something similar to work just right. I hate having to place braces
around that simple statement as one has to do in awk.

Most importantly, ap will be driven by an easy-to-understand grammar
with C-like syntax. There may be 2 or 3 ways to perform a particular
task, but there will not be 10,000 as there are in perl.

The main deficiency of awk that I see is its inability to interface
well with Unix. Up until recently, awk did not even have ARGC and ARGV,
not to mention things like file redirection. This is where perl has
taken a step in the "right" direction. Of course, it could be argued,
why put symlink(2) into ap when you have ln(1)? Well, this is why perl
was written:  Unix today just cannot provide any performance with shell
scripts; for better or for worse this has to be coded into the command
interpreter.

Ideas are welcome. I really want to write this thing; perl is a
disgrace to the Unix community.
--
Byron Rakitzis
byron@archone.tamu.edu

tml@extro.ucc.su.OZ.AU (Tim Long) (06/12/91)

I read Byron's comments on perl and awk with some sympathy.
I have had thoughts along similar, although not identical, lines for
some time.  By coincidence, I have just designed and implemented a
language to address similar issues, on which I would be grateful to
hear people's opinions.  But first I'll just mention my own motivations:

1) To have a freely available general purpose interpretive language on
UNIX systems.  (As opposed to the many more special purpose ones such
as awk and the shell).  This can be re-phrased as: To have a UNIX
language like DOS has BASIC.

2) To have a freely available language suitable for embedding in other
programs and systems.

3) To allow programming on UNIX systems which do not have development
systems (which are becoming very common).

So I guess the design spec was to make a freely available general
purpose language suitable both for system supported, and embedded use.

By embedded use I mean both within stand-alone devices (like PostScript)
and as an adjunct to applications.  The source is arranged to be amenable
to this.

Although I have been brooding on it for some time I have only actually
done it in the last month.  I'm reasonably happy with the result at this
stage but welcome comment.  There is a preliminary manual entry which
describes the language, but it's just a manual entry.  I'll try to give
some more background here.

The language, which I am calling ICI for the time being, has dynamic
typing and object management, with all the flavour (flow control constructs,
operators and syntax) of C.  You can write very C like code, if you wish
(pointers work), but you can take advantage of the more flexible data
handling to make things a lot easier.

I have tried to keep the design carefully divided into the language
and its fundamental functions and then other groups of functions which
relate to the operating environment.  Naturally the UNIX shell level
version has almost all of these included.

I could try to convey the nature of the language here, but it
is probably better just to skim the manual entry.  So I'll include
it here and continue the general discussion after that.  It's
about 14 pages, but you can start skipping after you get to the
standard functions (it finishes after the next line of minuses)...
----------------------------------------------------------------------



     ICI(1)		                    			ICI(1)



     NAME
	  ici  -  General purpose interpretive programming language

     SYNOPSIS
	  ici [	file ] [ -f file ] [ -i	prog ] [ -digit	] [ -l lib ] [ args... ]

     DESCRIPTION
	  Ici parses ICI program modules as indicated by its
	  arguments.  They may or may not cause	code to	execute	as
	  they are parsed.  But	after the modules have been read, if
	  main is defined as an	external function it will be called
	  with the otherwise unused arguments (as an integer count and
	  a pointer to the first element of an array of	strings).

	  The options are:

	  file	    If the first argument does not start with a	hyphen
		    it is taken	to be a	program	module as if specified
		    with the -f flag.  This may be used to allow ICI
		    programs to	execute	directly with the #! facility.

	  -f file   The	file is	parsed as an ICI module.

	  -i prog   The	prog argument is parsed	directly as an ICI
		    module.

	  -digit    An ICI module is read from the file	descriptor
		    digit.

	  -l lib    An ICI module is read from $ICILIB/liblib.ici.  If
		    ICILIB is not defined as an environment variable,
		    /usr/ici will be used.

	  other	    Any	argument not listed above is gathered into the
		    arguments which will be available to the program.

	  --	    All	further	arguments are gathered into the
		    arguments which will be available to the program.

	  Note that argument parsing is two-pass: all the "unused"
	  arguments are determined and assigned to argc and argv
	  before the first module is parsed.

	  If an	error occurs which is not dealt	with by	the program
	  itself, a suitable error message will	be printed and ici
	  will exit.

	  The remainder	of this	manual entry is	a brief	description of
	  the language.

	OVERVIEW
	  ICI has dynamic typing and flexible data types with the flow
	  control constructs and operators of C.  It is	designed to
	  allow	all types of programs to be written without the
	  programmer having to take responsibility for memory
	  management and error handling.  There	are standard functions
	  to provide the sort of support provided by the standard I/O
	  and the C libraries, as well as additional types and
	  functions to support common needs such as simple data	bases
	  and character	based screen handling.

	  A programmer familiar	with C should be able to write ICI
	  programs after reading this document.

	STATEMENTS
	  An ICI source	module consists	of a sequence of statements.
	  Statements may be any	of the following:

	       expression ;
	       compound-statement
	       if ( expression ) statement
	       if ( expression ) statement else	statement
	       while ( expression ) statement
	       do statement while ( expression ) ;
	       for ( exp(opt) ;	exp(opt) ; exp(opt) ) statement
	       switch (	expression ) compound-statement
	       case constant-expression	:
	       default :
	       break expression(opt) ;
	       continue	expression(opt)	;
	       return expression(opt) ;
	       ;
	       storage-class ident function-body
	       storage-class decl-list ;

	  In contrast to C, all	statement forms	are allowed at all
	  scopes.  But in order	to distinguish declarations and
	  function definitions from ordinary expressions, the storage
	  class	(extern, static	or auto) is compulsory.

	  There	is no goto statement, but break	and continue
	  statements may have an optional expression signifying	how
	  many levels to affect.  (Not in this version.)

	  The term constant-expression above refers to an expression
	  that is evaluated exactly once, at parse time.  In other
	  respects it is unrestricted; it may call functions and have
	  side-effects.

	  Switch statements must be followed by	a compound statement,
	  not just any statement as in C.  Furthermore,	each case-
	  label	and the	default	must label statements at the top level
	  of this compound statement.

	OBJECTS	AND LVALUES
	  In ICI objects are dissociated from the storage locations
	  (variables, for instance) which refer	to them.  That is, any
	  place	which stores a value actually stores a reference to
	  the value.  The value	itself,	whether	it is a	simple integer
	  or a large structure,	has an independent existence.  The
	  type of an object is associated with the value, not with any
	  storage locations which may be referring to it.  Thus	ICI
	  variables are	dynamically typed.  The	separation of storage
	  location and value is	transparent in most situations,	but in
	  some ways is distinguishable from the	case in	a language
	  such as C where an object is isomorphic with the storage it
	  occupies.

	  ICI assignment and function argument passing does not
	  transfer a copy of an	object,	but transfers a	reference to
	  the object (that is, the new variable	refers to the same
	  object).  Thus it is straight	forward	to have	two variables
	  referring to the same	object;	but this does not mean that
	  assigning to one effects the value of	the other.
	  Assignment, even in its most heavily disguised forms,	always
	  assigns a new	object to a storage location.  (Even an
	  operation such a "++i" makes the variable "i"	refer to the
	  object whos value is one larger than the object which	it
	  previously referred to.)
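The reference semantics above can be sketched in Python, which shares this
model (this is an analogy, not ICI itself; the variable names are
illustrative):

```python
# Variables hold references to values; assignment always rebinds
# the location to a (possibly new) object.
a = [1, 2, 3]
b = a            # b now refers to the same object as a
b.append(4)      # mutating the shared object is visible through a
assert a == [1, 2, 3, 4]

i = 5
j = i
i += 1           # like ICI's "++i": i is rebound to a different object
assert j == 5    # j still refers to the old value
assert i == 6
```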

	  The normal storage locations are the elements	of arrays and
	  structures.  Simple variables	are actually structure
	  elements, although this is not apparent in everyday
	  programming.

	  Some object types are	"atomic" (scalar), that	is their
	  internal structure is	not modifiable.	 Atomic	data types
	  have the property that all objects with the same value are
	  in fact the same object.  Integers, floating point numbers,
	  strings and functions	are atomic by nature.  The only
	  standard non-atomic data types are arrays and	structures.
	  An atomic (constant) version of any aggregate	type (array or
	  structure) can be obtained.  Several of the intrinsically
	  atomic types do allow	read-only access to their interior
	  through indexes, structure keys or pointers.	(Strings for
	  example allow	indexing to obtain one character sub-strings.)

	TYPES
	  Each of the following	paragraphs is tagged with the internal
	  name of the type, as returned	by the typeof()	function:

	  int  Integers	are 32 bit signed integers.  All the usual C
	       integer operations work on them.	 When they are
	       combined	with a float, a	promoted value is used in the
	       usual C style.  Integers	are atomic.

	  float
	       All floating point is carried out in the	host machine's
	       double precision	format.	 All the usual C floating
	       point operations	work.  Floats are atomic.

	  string
	       Strings are atomic sequences of characters.  Strings
	       may be indexed and have the address taken of internal
	       elements.  The value of fetching	a sub-element of a
	       string is the one character string at that position
	       unless the index	is outside the bounds of the string,
	       in which	case the result	is the empty string.  The
	       first character of a string has index 0.

	       Strings may be used with	comparison operators, addition
	       operators (which	concatenate) and regular expression
	       matching	operators.  The	standard function sprintf is a
	       good way	of generating and formatting strings from
	       mixed data.
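The indexing rule for strings can be sketched in Python (an analogy only;
the helper name is made up for illustration):

```python
# ICI-style string indexing: fetching a sub-element yields a
# one-character string, or the empty string outside the bounds.
def ici_index(s, i):
    return s[i] if 0 <= i < len(s) else ""

assert ici_index("abc", 1) == "b"
assert ici_index("abc", 7) == ""   # out of bounds: empty string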

	  NULL The NULL	type only has one value, NULL (the same	name
	       as the type).  The NULL value is	the general undefined
	       value.  Anything	uninitialised is generally NULL.

	  array
	       Arrays always start at 0	but extend to positive indexes
	       dynamically as elements are written to them.  A read of
	       any element either not yet assigned to or outside the
	       bounds of the array will	produce	NULL.  A write to
	       negative	indexes	will produce an	error, while a write
	       to positive indexes will	extend the array.  Note	that
	       arrays do not attract an	implicit ampersand as in C.
	       Use &a[0] to obtain a pointer to	the first element of
	       an array	"a".

	       The function array() and	array constants	(see below)
	       can be used to create new arrays.
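These array semantics can be sketched with a small Python class (purely
illustrative; ICI arrays are a built-in type, and the class name here is
invented):

```python
# ICI-style array: reads outside the bounds give NULL (None here),
# writes to positive indexes extend the array, negative writes fail.
class ICIArray(list):
    def get(self, i):
        return self[i] if 0 <= i < len(self) else None

    def set(self, i, v):
        if i < 0:
            raise IndexError("write to negative index is an error")
        if i >= len(self):
            self.extend([None] * (i + 1 - len(self)))
        self[i] = v

a = ICIArray()
a.set(2, "x")                      # extends the array to length 3
assert a.get(2) == "x"
assert a.get(5) is None            # unassigned/out of bounds: NULL
assert list(a) == [None, None, "x"]
```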

	  struct
	       Structures are collections of storage locations named
	       by arbitrary keys.  Structures acquire storage
	       locations and member names as they are assigned to.
	       Elements	which do not exist read	as NULL.  Pointers may
	       be taken	to any member, but pointer arithmetic is only
	       possible	amongst	element	names which are	simple
	       integers.

	       Note that normal	structure dereferencing	with
	       struct.member is	as per C, and the member name is a
	       string.	Member names which are determined at run time
	       may be specified	by enclosing the key in	brackets as
	       per: struct.(expr), in which case the key may be	any
	       object (derived from any	expression).  Thus
	       struct.("mem" + "ber") is the same as struct.member. An
	       "index" may also	be used, as per: struct[expr], and has
	       the same	meaning	as struct.(expr).  (This is true in
	       general,	all data types which allow any indexing	of
	       their internal structure	operate	through	the same
	       mechanism and these are only notational variations.)

	       The function struct() and structure constants (see
	       below) can be used to create new	structures.

	       From a theoretical standpoint structures	are a more
	       general type than arrays.  But in practice arrays have
	       some properties structures do not (intrinsic order,
	       length and different concatenation semantics, as	well
	       as less storage overhead).

	       Note that by ignoring the value associated with a key,
	       structures are sets (and	addition performs set union,
	       see below).

	  ptr  Pointers	point to places	where things are stored, but a
	       pointer may be taken to any object and a	nameless
	       storage location	will be	fabricated if necessary.  They
	       allow all the usual C operations.  Pointer arithmetic
	       works as	long as	the pointer points to an aggregate
	       element which is	indexed	by an integer (for instance
	       all elements of arrays, and amongst structure elements
	       which have integer keys).  Pointers are atomic.

	       Note that pointers point	to a storage location, not to
	       the value of an object itself.  Thus if "a" is an
	       array, after "p = &a;", the expression "*p" will	have
	       the same	value as "a" even if "a" becomes a structure
	       (through	assignment).

	       Note that it is not possible to generate	pointers which
	       are in any way illegal or dangling.  Also note that
	       because assignment and argument passing does not	copy
	       values, pointers	are not	required as often as they are
	       in C.

	  func Functions are the result	of a function declaration and
	       function	constants. They	are generally only applicable
	       to the function call operation and equality testing.
	       They do not attract an implicit ampersand as in C.
	       Functions are atomic.  (Code fragments within functions
	       are also	atomic and thus	shared amongst all functions.)

	  regexp
	       Regular expressions are atomic items produced by	either
	       regular expression constants (see below)	or compiled at
	       run-time	from a string.	They are applicable to the
	       regular expression comparison operators described
	       below.

	  file Files are returned and used by some of the standard
	       functions.  See below.

	  window
	       Windows are produced and	used by	some of	the standard
	       functions.  See below.

	  Other	types (pc, catch, mark,	op, module and src) are	used
	  internally and are not likely	to be encountered in ordinary
	  programming.

	LEXICON
	  Lexicon is as	per C, although	there is no preprocessor yet,
	  with the following additions:

	  Adjacent string constants separated only by white space form
	  one concatenated string literal (as per ANSI C).

	  The sequence of a "#" character (not at the start of a line),
	  followed by any characters except a newline up to the next
	  "#", is a compiled regular expression.

	  The sequences	!~, ~~,	~~=, ~~~, $, @,	[{, }],	[<, and	>] are
	  new tokens.

	  The names NULL and onerror are keywords.

	EXPRESSIONS
	  Expressions are full C expressions (with standard precedence
	  and associativity) with some additions.  The overall syntax
	  of an	expression is:

	  expression:
	       primary
	       prefix-unary expression
	       expression postfix-unary
	       expression binop	expression

	  primary:
	       NULL
	       int-literal
	       float-literal
	       char-literal
	       string-literal
	       regular-expression
	       [ expression-list ]
	       [< assignment-list >]
	       [{ function-body	}]
	       ident
	       ( expression )
	       primary ( expression-list(opt) )
	       primary [ expression ]
	       primary . struct-key
	       primary -> struct-key

	  struct-key:
	       ident
	       ( expression )

	  prefix-unary:
	       * & + - ! ~ ++ -- $ @

	  postfix-unary:
	       ++ --

	  binop:
	       * / % + - >> << < > <= >=
	       == != ~ !~ ~~ ~~~ & ^ | && || : ?
	       = += -= *= /= %=	>>= <<=	&= ^= |= ~~=
	       ,

	  expression-list:
	       expression
	       expression , expression-list

	  assignment-list:
	       assignment
	       assignment , assignment-list

	  assignment:
	       struct-key = expression

	  The effect and properties of various expression elements are
	  discussed in groups below:

	  simple constants
	       integers	and floats are recognised and interpreted as
	       they are	in C. Character	literals (such as 'a') have
	       the same	meaning	as in C	(ie. they are integers,	not
	       characters).  String literals have the same lexicon as
	       C except	that they produce strings (see Types above).
	       Both character and string literals allow	the additional
	       ANSI C backslash escapes (\e \v \a \? \xhh).  Regular
	       expressions are those of	ed(1).

	  complex constants

	  [ expression-list ]

	  [< assignment-list >]

	  [{ function-body }]
	       Because variables are intrinsically typeless it is
	       necessary that initialisers, even of aggregates,	be
	       completely self-describing.  This is one	of the reasons
	       these forms of constants	have been introduced.  The
	       first is	an array initialised to	the given values, the
	       second is a structure with the given keys initialised
	       to the given values.  The third is a function.  The
	       values in the first two are all computed	as constant
	       expressions (not	meaning	that they are made atomic or
	       may only	contain	constants, just	that they are computed
	       once when they are first	parsed).

	  primary ( expression-list(opt) )
	       Function	calls have the usual semantics.	 But if	there
	       are more	actual parameters than there are formal
	       parameters in the function's definition,	and the
	       function	has an auto variable called "vargs", the
	       remaining actual	parameters will	be formed into an
	       array and assigned to this variable.  If	there is no
	       excess of actual	parameters any "vargs" variable	will
	       be undisturbed, in particular, any initialisation it
	       has will	be effective.
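The "vargs" convention resembles Python's *args, which gives a close
sketch (an analogy; the function name is illustrative):

```python
# Excess actual parameters beyond the declared formals are collected
# into an array-like object, as ICI would assign them to "vargs".
def join(sep, *vargs):
    return sep.join(str(v) for v in vargs)

assert join("-", "a", "b", "c") == "a-b-c"
assert join("-") == ""     # no excess arguments: vargs is empty
```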

	  prefix-unary (* & + -	! ~ ++ -- $ @)
	       Apart from "$" and "@", the prefix unary	operators have
	       the same	meaning	as they	do in C.  The "*" operator
	       requires	a ptr as an argument.  The "-" operator
	       requires	an int or float.  "!" and "~" require ints.
	       "++" and	"--" work with any values which	can be placed
	       on the left of a	"+ 1" or "- 1" operation (see below).
	       The rest	("&", "+", "$",	"@") work with any types.  A
	       "+" always has no effect.  If the operand of an "&" is
	       not an lvalue in	the usual sense, a one element array
	       will be fabricated to hold the value and	a pointer to
	       this element will result.  The "$" operator causes the
	       affected expression to be evaluated at parse time (thus
	       "$sin(0.5)" will cause the value to be computed once no
	       matter how many times the term is used).	 The "@"
	       operator	returns	the "atomic" form of an	object.	 This
	       is a no-op for simple types.  When applied to an
	       aggregate the result is a read-only version of the
	       same, which will	be the same object as all other	atomic
	       forms of	equal aggregates (as per ==).

	  regular expression matches (~	!~ ~~ ~~= ~~~)
	       These binary operators perform regular expression
	       matches.	 In all	cases one operand must be a string and
	       the other a regular expression.	The operator ~
	       performs the match and returns 1 or 0 depending on
	       whether the string did or didn't match the expression.
	       Likewise	for !~ with opposite values.

	       The operator ~~ matches the string and regular
	       expression and returns the portion of the string
	       matched by the \(...\) enclosed portion of the regular
	       expression, or NULL if the match	failed.	 The ~~=
	       operator	is the equivalent assignment operator and
	       follows the usual rules.

	       The ~~~ operator	matches	the string and the regular
	       expression and returns an array of the portions of the
	       string matched by the \(...\) portions of the regular
	       expression, or NULL if the match	failed.	 (This may
	       move to a function.)
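The effect of ~~ and ~~~ can be sketched with Python's re module, where
\(...\) groups correspond to (...) groups (an analogy; the function names
are made up):

```python
import re

# ~~ returns the portion matched by the first group, or NULL (None).
def tilde_tilde(s, pattern):
    m = re.search(pattern, s)
    return m.group(1) if m else None

assert tilde_tilde("user@host", r"@(.*)") == "host"
assert tilde_tilde("no-at-sign", r"@(.*)") is None

# ~~~ returns an array of all the group matches, or NULL.
def tilde3(s, pattern):
    m = re.search(pattern, s)
    return list(m.groups()) if m else None

assert tilde3("user@host", r"(.*)@(.*)") == ["user", "host"]
```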

	  assignment operators
	       As previously mentioned,	assignment always sets a
	       storage location	to a new object.  The old value	is
	       irrelevant (although it may have	been used in the
	       process of a compound assignment	operator).  Thus there
	       is no implicit cast on assignment, so assigning an int
	       to what is currently a float will result	in an int.

	       Assigning to a currently	unknown	variable will
	       implicitly declare the variable as static.

	  other	binary operators
	       The usual C binary operators work as they do in C and
	       on the same range of types.  In addition:

	       The == and != operators work on all types.  Arrays and
	       structures are equal if they contain the	same objects
	       in the same positions.

	       The + and += operators will concatenate strings, arrays
	       and structures (in the last case, where identical keys
	       occur the values	of the right hand operand take
	       precedence).

	       The << and <<= operators will shift an array, losing
	       elements	from the front and shortening the array	as a
	       whole.

	       The <, >, <=, >=	operators work on strings, making
	       lexical comparisons.
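The non-C behaviours above can be sketched in Python (an analogy, not
ICI itself; the names are illustrative):

```python
# "+" on structures: where the same key occurs in both operands,
# the right-hand operand's value takes precedence.
a = {"x": 1, "y": 2}
b = {"y": 99, "z": 3}
merged = {**a, **b}
assert merged == {"x": 1, "y": 99, "z": 3}

# "<<" on an array: drop elements from the front, shortening the
# array as a whole; a Python slice gives the same result.
shifted = [1, 2, 3, 4][2:]
assert shifted == [3, 4]
```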

	VARIABLES, SCOPES AND INITIALISERS
	  There	are exactly three levels of scope.  Extern (visible
	  globally by all code), static	(visible by code in the
	  module), and auto (visible by	code in	the function).	The
	  variables in the first two are persistent and	static.	 Auto
	  variables have a fresh instantiation created each time a
	  function is entered, and lost	on exit	(unless	there are
	  references to them).	Implicitly declared variables are
	  static.

	  All types of declarations may occur anywhere; they are
	  simple statements, unlike in C.  They have their effect
	  entirely at parse time and thus produce no code.  But	the
	  rules	about scope still apply.  No matter where an extern
	  declaration is made, once it is parsed that variable is
	  visible globally.  Similarly once an auto declaration	is
	  parsed that variable is visible throughout the scope of the
	  function.

	  Note that initialisers are constant expressions.  They are
	  evaluated once at parse time.	 Even initialisers of autos.
	  Every	time a set of auto variables is	instantiated (by
	  function entry) the variables	are set	to these initial
	  values, NULL if there	is no initialiser.
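Python's default arguments are evaluated once at definition time, which
makes a close sketch of this once-at-parse-time behaviour (an analogy;
the function is illustrative):

```python
# "log=[]" is evaluated once, like an ICI initialiser; each call
# then re-uses that same initial object.
def f(log=[]):
    log.append(1)
    return log

f()
f()
# Unlike C, the same initial object is shared across calls.
assert f() == [1, 1, 1]
```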

	STANDARD FUNCTIONS
	  The following	functions form part of the language definition
	  and should be	present	in all implementations,	including
	  embedded systems.

	  call(func, array)
	       Calls the function with arguments taken from the	array.
	       Thus the	statement call(func, ["a", "b"]); is
	       equivalent to func("a", "b");.  Returns the return
	       value of	the function.

	  array(...)
	       Returns a new array formed from the arguments, of which
	       there may be any	number,	including zero.

	  struct([key, value...])
	       Returns a structure initialised with the	paired keys
	       and values given	as arguments, of which there may be
	       any even	number,	including zero.

	  string = sprintf(format, args...)
	       Returns a string formatted as per printf(3S) from the
	       format and arguments.  All flags	and conversions	are
	       supported up to System 5.3's.  The new ANSI n and p
	       conversions are not provided.  Precision	and field
	       width * specifications are allowed.  Type checking is
	       strict.

	  copy(any)
	       Returns a copy of its argument.	A null operation for
	       all types except	arrays and structures.	To simulate
	       C's structure assignment	use: "a	= copy(b)" in place of
	       "a = b".	 Note that this	is a "top level" copy.	Sub-
	       aggregates are the same sub-aggregates in the copy as
	       in the original.
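This "top level" copy matches Python's shallow copy, which gives a
concrete sketch (an analogy only):

```python
import copy

# copy() duplicates the top level only; sub-aggregates are the
# same objects in the copy as in the original.
a = {"nums": [1, 2], "name": "x"}
b = copy.copy(a)

b["name"] = "y"            # top-level change: original unaffected
a["nums"].append(3)        # sub-aggregate is shared with the copy

assert a["name"] == "x"
assert b["nums"] == [1, 2, 3]
```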

	  eval(any)
	       Evaluates its argument in the current scope.  This is a
	       null operation for any type except strings.  For	these
	       it will return the value	of the variable	of that	name
	       as looked up in the current scope.

	  exit(int)
	       Exits with the given status.

	  fail(str)
	       Generates a failure with	the given message (see Error
	       handling	above).

	  float(any)
	       Returns a floating point	interpretation of its argument
	       (an int, string or float; otherwise it will return 0.0).

	  int(any)
	       Returns an integer interpretation of its	argument (a
	       float, string or int; otherwise it will return 0).

	  string(any)
	       Returns a string	interpretation of its argument (an
	       int, float or string; otherwise it will return the type name
	       in angle	brackets).

	  typeof(any)
	       Returns the type	name of	an object (a string).

	  parse(file/string [,module])
	       Parses the file or string in a new module, or the
	       context of the given module if supplied.

	  regexp(string)
	       Return the regular expression compiled from the string.

	  sizeof(any)
	       Returns the number of elements the object has (ie.
	       elements	of an array or key/value pairs in a struct or
	       characters in a string, returns 1 for all other types).

	  push(array, any)
	       Adds the	object to the end of the array,	extending it
	       in the process.

	  pop(array)
	       Returns the last object in the array and shortens the
	       array by one in the process.  It will return NULL if
	       the array is empty already.

	  keys(struct)
	       Returns an array	of the keys (ie. member	names) of the
	       struct.

	  smash(string1, string2)
	       Returns an array	of sub strings from string1 which were
	       delimited by the	first character	from string2.
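smash() can be sketched with Python's str.split (an analogy; the
function name follows the manual's):

```python
# Split string1 on the first character of string2.
def smash(s1, s2):
    return s1.split(s2[0])

assert smash("a:b:c", ":") == ["a", "b", "c"]
assert smash("one two", " x") == ["one", "two"]  # only s2[0] matters
```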

	  str =	subst(string1, regexp, string2 [, flag])
	       (Coming	soon.)	Returns	a copy of string1 with sections
	       that matched regexp replaced by string2, globally if
	       flag is given as 1.

	  str =	tochar(int)
	       Returns a one character string made from the integer
	       character code.

	  int =	toint(str)
	       Return the character code of the	first character	of the
	       string.

	  int =	rand([int])
	       Returns a pseudo-random number in the range 0 ..	2^15 -
	       1.  If an argument is supplied this is used to seed the
	       random number generator.

	  string/array = interval(string/array,	start [,len])
	       Returns the interval of the string or array starting at
	       index start and continuing until the end, or for len
	       elements if len is supplied.  Interval extraction
	       outside the bounds of the object will merely leave out
	       the absent elements.
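Python slicing clips rather than failing on out-of-bounds ranges, so it
makes a close sketch of interval() (an analogy; the function name follows
the manual's):

```python
# Extract an interval of a string or array; ranges past the bounds
# merely leave out the absent elements, as in ICI.
def interval(obj, start, length=None):
    return obj[start:] if length is None else obj[start:start + length]

assert interval("abcdef", 2) == "cdef"
assert interval("abcdef", 4, 10) == "ef"   # absent elements left out
assert interval([1, 2, 3], 1, 1) == [2]
```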

	  array	= explode(string)
	       Return an array of the integer character	codes of the
	       characters in the string.

	  string = implode(array)
	       Returns a string	formed from the	concatenation of the
	       integer character codes and strings found in the	array.
	       Objects of other	types are ignored.
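The explode()/implode() pair can be sketched with ord() and chr() (an
analogy; the function names follow the manual's):

```python
# explode: string -> array of integer character codes.
def explode(s):
    return [ord(c) for c in s]

# implode: concatenate character codes and strings found in the
# array; objects of other types are ignored, as in ICI.
def implode(a):
    parts = []
    for x in a:
        if isinstance(x, int):
            parts.append(chr(x))
        elif isinstance(x, str):
            parts.append(x)
        # other types are silently ignored
    return "".join(parts)

assert explode("ab") == [97, 98]
assert implode([104, "i", 3.5]) == "hi"
```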

	  file = sopen(string, mode)
	       Returns a file (read only) which	when read will return
	       successive characters from the string.

	  module = module(string)
	       Return a	new module with	its name taken from the	string
	       argument.

	  obj =	waitfor(obj...)
	       Blocks (waits) until an event indicated by any of its
	       arguments occurs, then returns that argument.  The
	       interpretation of an event depends on the nature	of
	       each argument.  A file argument is triggered when input
	       is available on the file.  A float argument waits for
	       that many seconds to expire, an int for that many
	       milliseconds (they then return 0, not the argument
	       given).	Other interpretations are implementation
	       dependent.  Where several events	occur simultaneously,
	       the first as listed in the arguments will be returned.

	       Note that in implementations that support many basic
	       file types, some	file types may always appear ready for
	       input, despite the fact they are	not.

	  unixfuncs()
	       When first called, will define as external functions
	       the unix	system interface functions described below (if
	       available).  Subsequent calls are ignored.

	  vstack()
	       Return a	copy of	the variable (scope) stack.  Index 0
	       is the outermost	scope.	It will	contain	functions,
	       each optionally followed	by a structure of the local
	       variables.  (Only for debuggers obviously.)

	STANDARD EXTERNAL VARIABLES
	  externs
	       A structure of all the extern variables.

	  argc A count of the otherwise	unused arguments to the
	       interpreter.

	  argv An array	of strings, which are the otherwise unused
	       arguments to the	interpreter.  (Note this is different
	       from the	argument to main, which	is a pointer to	the
	       first element of	this array as it is in C.  It is
	       probably	easier to use the globals in general.)

	  stdin
	       Standard	input.

	  stdout
	       Standard	output.

	  stderr
	       Standard	error output.

	OTHER FUNCTIONS
	  The following	functions will be present on systems where the
	  environment permits.	Missing	file arguments are interpreted
	  as standard input or output as appropriate.  Pretty obvious,
	  but more details later.
	       printf(fmt, args...)
	       fprintf(file, fmt, args...)
	       file = fopen(name, mode)
	       file = popen(cmd, mode) /* UNIX only. */
	       status =	system(cmd)
	       str = getchar([file])
	       str = getline([file])
	       str = getfile([file])
	       put(str [,file])
	       fflush([file])
	       fclose(file)

	UNIX FUNCTIONS
	  The following	functions will be available on UNIX systems or
	  systems that can mimic UNIX.	See unixfuncs()	above.	They
	  all return an	integer.  On failure they raise	a failure with
	  the error set	to the appropriate system error	message
	  derived from errno.  These interfaces	are raw.  Use at your
	  own risk.
	       access(), acct(), alarm(), chdir(),
	       chmod(), chown(), chroot(), close(), creat(),
	       dup(), _exit(), fork(), getpid(), getpgrp(),
	       getppid(), getuid(), geteuid(), getgid(), getegid(),
	       kill(), link(), lseek(), mkdir(), mknod(),
	       nice(), open(), pause(), rmdir(), setpgrp(),
	       setuid(), setgid(), signal(), sync(),
	       ulimit(), umask(), unlink(),
	       clock(), system(), lockf(), sleep(),
	       /* Rest on the way. */

	DATA BASE FUNCTIONS
	  Simple non-indexed, but otherwise fully locked and
	  functional data base support.	 Not for speed.	 If your
	  application needs a serious data base, get one, don't	use
	  this.	 Use this for configuration info and all that
	  peripheral stuff.

	  The array's are arrays of strings, which are the fields of a
	  record.  The "keyfieldno" is which field number of the
	  record is the	key for	this operation.	 The "dbname" is a
	  file name, one table per file.  It will be created if	it
	  does not exist, but an empty file is ok too.  Use UNIX
	  permissions for access control.  Read	access on read-only
	  files	is ok.	db_get() returns NULL if not found.  More
	  details later.
	       array = db_get(dbname, keyfieldno, value)
	       array = db_delete(dbname, keyfieldno, value) /* Returns old data. */
	       db_set(dbname, keyfieldno, array)
	       db_add(dbname, array)
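	  For example (a sketch; the file name is arbitrary and
	  field numbers are assumed to count from zero):

	       db_add("phones.db", array("Tim", "123-4567"));
	       rec = db_get("phones.db", 0, "Tim");
	       if (rec != NULL)
		    printf("%s\n", rec[1]);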

	WINDOWS
	  Upon first reference to any of the window routines standard
	  input	is placed in the appropriate modes for non-echoing
	  character at a time input.  All input	from the terminal
	  should be fetched with w_getchar() and w_edit().  Upon exit
	  (including interrupt)	all modes will be restored.

	  win =	w_push(line, col, nlines, ncols)
	       Pushes an opaque	rectangular window on the screen at
	       the given line and col, which are in screen
	       coordinates.  But special values	of -1 or -2 for	line
	       or col indicate centering or right justification
	       (bottom justification for line) for that	aspect of the
	       position.  The window will have the given number	of
	       lines and columns, unless line or col are less than or
	       equal to	zero, in which case they will be that much
	       less than the full screen size.	The window is
	       initially clear and on top of all previous windows.

	  w_pop(win)
	       "Pops" the window from the screen; re-exposing anything
	       which the window	was hiding.  Any window	may be popped
	       from the	screen,	whether	it is the top window or	not.
	       After a window has been popped it is dead and cannot be
	       put back.  Make a new window to do this.	 Note that if
	       a window	is not referenced it will get popped when the
	       next garbage collection occurs, but windows should
	       always be popped	explicitly.

	  w_paint(win, line, col, text [,tabs])
	       Paints the text on the window at	the given line and
	       column (in the window's space), with auto-indent	on
	       subsequent lines	(indicated by a	\n character in	the
	       text).

	       A string tab specification reminiscent of troff (and
	       most word processors) may be given.  If supplied it
	       must be a concatenation of tab-specs.  Each tab-spec
	       consists of an optional "+" character, followed by a
	       decimal number, followed by an optional leader
	       character, followed by one of the letters "L", "C" or "R".

	       If the "+" is supplied the tab position is at a
	       relative offset from the previous one, else it is a
	       distance from the left margin of this text block.  If a
	       leader character	is given the distance between the
	       current column and the start of the next	text will be
	       filled with that	character, else	a direct motion	will
	       be used (use an explicit	space leader to	clear an
	       area).  If an "L" tab is	set, the next field of text
	       will start at the tab stop, if a	"C" tab	is set the
	       next field of text will be centered on the tab stop,
	       and if an "R" tab is set	the next field of text will
	       end on the tab stop.  The "next field of	text" is the
	       text after the tab character up to the next tab,
	       newline or end of string.

	       The last	tab-spec in the	string will be used
	       repeatedly.  Scanning of	the tabs starts	again on each
	       new line.  If no	tab specification is given multiple-
	       of-8 column tabs	are used, but relative to the start
	       position.

	       For example, a three part title in an 80	column window
	       could be	painted	with the tab spec "40C80R".
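	       In ICI that might read (a sketch; the tab characters
	       in the text separate the fields):

		    w_paint(win, 0, 0, "Left\tCentre\tRight", "40C80R");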

	  win =	w_textwin(line,	col, text [,tabs])
	       Pushes a	window in the same manner as w_push() (with
	       the same	interpretation of line and col)	of just
	       sufficient size to hold the given text as it is set by
	       w_paint() with a box around it.  It is allowable for
	       column positions in the text being set to be negative
	       during the sizing phase of this operation.

	  w_mesg(str)
	       Pushes a	boxed one line window centred at the bottom of
	       the screen and containing the string.  It will be
	       automatically removed after the next keystroke.

	  w_cursorat(win, line,	col)
	       Sets the	cursor position	for this window	(in the
	       window's	space).	 When the window is the	top window on
	       the screen, the real screen cursor will be at this
	       position.

	  str =	w_getchar()
	       Returns the next	character from the terminal, without
	       echo and	without	canonical input	processing.  For
	       ordinary	ASCII characters a one character string	is
	       returned.  For special keys an appropriate multi
	       character string	is returned (currently "F0", "F1" ...
	       "F32", "LEFT", "RIGHT", "UP", "DOWN", "HOME", "END",
	       "PGUP", "PGDOWN").

	       The screen is refreshed before waiting for user
	       input.

	  w_ungetchar(str)
	       Pushes a	character back.	 Only one character of push-
	       back is allowed.	 Only the first	16 characters of the
	       string will be significant (all "characters" returned
	       by w_getchar() are shorter than this).

	  str =	w_edit(win, line, col, width, str)
	       Allows traditional editing of an	input field at the
	       given position and width	and initially containing the
	       given string.  Editing will proceed until any unusual
	       character is pressed (that is, not a printing ASCII
	       character or one	of the field editing keys such as
	       backspace).  At that point the character	which caused
	       termination will	be pushed back on the input stream and
	       the current text	of the field returned.	The next call
	       to w_getchar() will return the key which	terminated
	       editing.

	  w_box(win)
	       Draws a box around the inside edge of the window.

	  w_clear(win)

	  w_refresh()

	  w_suspend()
	       Restores	the terminal to	normal modes and moves the
	       cursor to the bottom left.  The next window operation
	       will revive the screen.
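	  As a sketch of these routines in combination, the following
	  pushes a boxed message centred on the screen, waits for a
	  keystroke (which also refreshes the screen) and removes it:

	       win = w_textwin(-1, -1, "Hello, world");
	       w_getchar();
	       w_pop(win);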

     EXAMPLES
	  The following	shell command line will	print Hello world.
	       ici -p 'printf("Hello world.\n");'

	  The following	program	prints the basename of its argument:
	       #!ici
	       printf("%s\n", argv[1] ~~ #\([^/]*\)$#);

	  The following	example	illustrates a simple grep like
	  program.  The	first line makes a Bourne shell	pump the
	  program in through file descriptor 3,	and passes any
	  arguments to the shell script	on to the ICI program.	File
	  descriptor 3 is used to avoid	disturbing the standard	input.
	  This works on all UNIXes, but of course 4.2+ and 5.4+ can use
	  the #! mechanism.  Note that errors (such as those encountered upon
	  failure to open a file) are not checked for.	The program
	  can be expected to exit with an appropriate message should
	  they occur.
	       exec ici	-3 -- "$0" "$@"	3<<'!'

	       extern
	       main(argc, argv)
	       {
		   if (argc < 2)
		    fail(sprintf("usage: %s pattern [files...]", argv[0]));
		   pattern = regexp(argv[1]);
		   if (argc == 2)
		    grep("", stdin);
		   else
		   {
		    for	(i = 2;	i < argc; ++i)
			grep(sprintf("%s:", argv[i]), fopen(argv[i], "r"));
		   }
	       }

	       static
	       grep(prefix, file)
	       {
		   while ((s = getline(file)) != NULL)
		   {
		    if (s ~ pattern)
			printf("%s%s\n", prefix, s);
		   }
		   if (file != stdin)
		    fclose(file);
	       }

     SEE ALSO
	  awk(1), ed(1), printf(3S), etc.

     BUGS
	  There	is a problem with the right-associativity of ? :
	  stuff.  Use brackets when combining multiple ? : operators
	  for the time being.

	  There	is an occasional problem with the screen updating with
	  multiple windows.

	  A && or || expression	may not	result in exactly 0/1 if it
	  gets to the last clause.

     AUTHOR
	  Tim Long, May	'91.
----------------------------------------------------------------------
Returning to the general: my intention was not to replace any of the
special purpose tools like the shell, awk, sed etc., nor was it to
make a replacement for real programming languages like C.  Rather, I
regard it as a casual programming tool filling much the same niche as
BASIC.  As such it doesn't have specific language features dedicated
to special tasks (like doing something for each line of input text).
But it does (or will) have a broad base of simple primitives to make
most routine tasks easy.  And of course it is extensible.  But you will
notice that almost none of its "library" features are the ultimate
expression of that area of software technology.

In practice every major application has some principle, or piece of
software technology, or bit of hardware which is its reason
for existence as a product.  But products can't run on one leg.
Inevitably the endless series of tack-on bits has to be supplied,
usually with a great deal of re-invention taking place.  I have thought
of ICI as assisting in that area.  The theory is that if something is
a major focus of an application, you won't be using these dicky little
features to do it.  But for all those other bits, which aren't your
real business, you can just use the stuff provided and hack up the
rest in a somewhat more amenable programming environment than raw C.

Getting back to the language itself...

You can easily see from the above how it is like C.  What is probably
not so obvious is how it is not like C.  Here is a grab bag of things
to convey some of the flavour.

A lot of the usual messing around with strings can be handled
by the regular expression operators.  The ~~= operator is particularly
useful.  For example, to reduce a string s which holds a file name
to its basename:

	s ~~= #\([^/]*\)$#;

I know it looks a bit insane, but regular expressions are
like that.  I'm not going to apologise for using # rather than /
to delimit regular expressions.  It was necessary to avoid lexical
ambiguity and you get used to it in no time.

I don't seem to have written the bit in the manual on error handling.
I'll quickly describe it here.  The actual syntax of a compound
statement is:

compound-statement:
	{ statement(rpt) }
	{ statement(rpt) } onerror statement

In other words compound statements may have an optional "onerror"
followed by a statement.  Errors work on the principle that the lower
levels of a program know what happened, but the higher levels know
what to do about it.  When an error occurs, either raised by the
interpreter because of something the program did or explicitly
by the program, an error message is generated and stored in
the global variable "error".

The execution stack is then automatically unwound until an onerror
clause is found, and execution resumes there.  The unwinding will unwind
past function calls, recursive calls to the interpreter (through
the parse function) etc.

If there is no onerror clause in the scope of the execution, the main
interpreter loop will bounce the error message all the way out to the
invoking system.  In the UNIX case this will print the message along
with the source file name, the function name and the line number (which
is also available).
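For example (a sketch of the syntax described above, relying on fopen
raising an error on failure):

static
readfile(name)
{
    {
	file = fopen(name, "r");
	s = getfile(file);
	fclose(file);
	return s;
    }
    onerror
	return NULL;	/* The message is in "error". */
}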

Although the manual entry doesn't go into that sort of detail it is
important to know what things raise errors in what circumstances.
But the basic philosophy is that the casual programmer can just
ignore the possibility of errors (like failure to open a file)
and expect the finished program to exit with a suitable message when
things go wrong.  The grep program given in the manual is an example
of this.  One error is checked for explicitly so it can give its
own usage message, but failures to open files or syntactically incorrect
regular expressions are allowed to fall out naturally.

I seem to be wandering a bit here, back to some examples...

Functions are of course just another datum.  A function called
"fred" is just a variable which has been assigned a function.
You could re-define the getchar function (even though it is an
intrinsic function coded in C) with either:

extern
getchar()
{
    return tochar(rand() % 256);
}

OR

extern getchar	= [{(){return tochar(rand() % 256);}}];

The second is a little perverse, but function constants make more sense
in examples like:

    sort(stuff, [{(a, b){return a < b ? -1 : (a > b ? 1 : 0);}}]);

Where the sort comparison function is given in-line so you don't have
to go chasing all over the code to find the two line function.
(There is a growing library which contains functions like sort, but
it is not in a fit state for discussion yet.)

They also make more sense when doing object oriented stuff.
Suppose you want to define a set of methods in a type.  You can just
assign the functions directly into the type with:

static type = struct();

type.add =
[{
    (a, b)
    {
	return ....;
    }
}];

type.sub =
[{
    (a, b)
    {
	return ....;
    }
}];

Or you could build it in one hit like:
type =
[<
    add =
    [{
	(a, b)
	{
	    return ....;
	}
    }],

    sub =
    [{
	(a, b)
	{
	    return ....;
	}
    }],
>];

The variable argument support handles all possibilities.  One nice
example of its use comes from the way libraries are done.

Because code is parsed at run-time, you don't want to have to parse
thousands of lines of libraries for every one line program.  Instead,
a library will just define stub functions, which invoke a standard
(library) function called autoload().  They look like this:

extern sort()	{auto vargs; return autoload("sort", sort, vargs);}

Because the function has an auto variable called "vargs", any
unused arguments (ie. all of them) are assigned to it.  These are
then passed on to autoload.  The arguments to autoload are a
file name (it will prefix it with the standard lib dir), the function
being re-defined and the arguments.  It will parse the file, check
that it redefined the function and then call it with the arguments.

From then on of course the new function is defined and the old one gets
garbage collected like all lost data.  The loaded file could define
several functions, and any autoload definitions they have will also
be replaced at the same time.  The current version of autoload looks
like this:

/*
 * Parse the given file and transfer control to the newly loaded version
 * of the function as if that was what was called in the first place.
 * A loaded file can define more than one function.  They will all
 * be replaced on the first load.  See examples below.
 */
extern
autoload(file, func, args)
{
    auto	stream;

    file = "/usr/ici/" + file;
    parse(stream = fopen(file, "r"));
    fclose(stream);
    if (func == eval(func.name))
	fail(sprintf("function %s was not found in %s", func.name, file));
    return call(eval(func.name), args);
}

Notice that it references a sub-field of the function like a structure
field.  This is something that the manual entry doesn't go into details
about but you can do things like that.  A function, for instance, has
sub fields of: "name" a name for the function (for simple declarations
this is the name the function was first declared as), "args" an atomic
array of the declared formal parameters, "autos" an atomic struct of all
the autos and their initial values, and there are a few other fields too.

Also notice how it uses the "eval" function to check the value of a
variable whose name is determined at run time, and then its use of the
call function to call a function with a run-time determined variable
argument list.

Again notice that it doesn't need to worry about any errors except
those it wants to check for explicitly.  The others will happen
correctly automatically.  This one feature can save a lot of code.

The sequence of operations on function entry is very deliberate and
you can do some neat things with it.  In particular, formal parameters
are just auto variables which are initialised with the corresponding
actual parameter.  But they are initialised with this after the
explicit initialisations have been applied.  Thus you can use an
explicit initialisation to give a default value to an argument which
is optional, without messing about with the "vargs" variable.
For example:

static
getstuff(file)
{
    auto	file = stdin;

    ....
}

Structure keys (and switch statements which use a struct) work on the
key being the same object as the tag.  Thus switching on strings, ints,
floats, functions etc. is fine.  But you can also use aggregate keys by
always using atomic versions of them:

	switch (@array(this, that))
	{
	case @["one thing", "the other"]:
		...
	case @[1, 2]:
		...
	case @[x, y]:
		...
	}

You will notice that because things refer to other things, rather
than actually holding them, you use pointers far less often than you
do in C.  In fact you can start to treat structured data types in a
much more casual fashion.

I have hardly scratched the surface here, but this is getting a bit long
so I'll terminate this section.

A few practicalities: on my 386 the initial load image (text+data)
comes in at around 110K (85K text + 25K data), of which a disconcerting
amount comes from curses, even though all I want it to do is read
a terminfo entry.  After that, time and space are as proportional to
the needs of the program as I could make it.  (These sorts of
interpretive languages often have nasty non-linear time or space
performance characteristics due to garbage collection and stuff.
I have tried to be careful to avoid this sort of behaviour.)
For some tasks memory use can be better than expected, because of
object sharing...

Memory is only needed to hold distinct atomic objects, so although
technically there are reasonable memory overheads for, say, an integer,
in practice most programs don't have very many distinct integers at any
given point in time.  After the first instance of a given number you are
only paying the overhead of the storage location which refers to each
additional reference, which is 4 bytes for array elements and 8 bytes
for structure elements.

In fact it can happen that large arrays of floating point numbers
(which are 8 bytes each) can occupy less space than you would at first
expect.  I have been thinking of shifting integers to 64 bits, because
there would be no overhead in memory use (they already use the same size data
block as floats) and I suspect the performance loss would be marginal.
But more to the point, 32 bits is just not enough. (A set of good portable
64 bit routines will be gratefully accepted.)

I think I have mentioned that it is also designed for embedded systems.
This means that:

a) It is easy to link the interpreter into other C programs; there
are as few external symbols as I could manage and it uses just a few
classic library functions.

b) It is easy to write intrinsic functions (ie. functions written in
C which can be called from ICI code).

c) It is easy to call ICI functions from C (although at the moment there
is slightly more overhead than the inverse direction).

d) Where necessary, additional types can be introduced without disturbing
the rest of the interpreter.  (An example of this is the character
based screen handler.  It is done in a single source module with only
one reference to it (in a configuration array), yet its "window" type
integrates fully with the rest of the interpreter.)

I think this will have to do for now.  I'll post the source, the manual
and some sample programs somewhere soon.

By the way, I have always regarded designing a programming language
as the height of arrogance.  And I can only defend this by saying I
did it for me.
-- 
Tim Long
tml@extro.ucc.su.OZ.AU

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (06/12/91)

tml@extro.ucc.su.OZ.AU (Tim Long) writes:

>1) To have a freely available general purpose interpretive language on
>UNIX systems.  (As opposed to the many more special purpose ones such
>as awk and the shell).  This can be re-phrased as: To have a UNIX
>language like DOS has BASIC.

First thing I do after installing DOS on a PC is find BASIC and erase it.
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/15/91)

In article <1991Jun11.173907.28331@metro.ucc.su.OZ.AU>, tml@extro.ucc.su.OZ.AU (Tim Long) writes:
>      NAME
> 	  ici  -  General purpose interpretive programming language
> 
>      SYNOPSIS
> 	  ici [	file ] [ -f file ] [ -i	prog ] [ -digit	] [ -l lib ] [ args... ]
> 
> 	  -f file   The	file is	parsed as an ICI module.
> 
> 	  -i prog   The	prog argument is parsed	directly as an ICI
> 		    module.

I for one would find it rather less confusing if you used the same option
name as AWK and sed, namely "-e", as in the following examples I just tried:
	awk -e 'END {print "Hello, world."}' </dev/null
	(echo a; echo b; echo c) | sed -n -e 1p -e 2p

> 	  -digit    An ICI module is read from the file	descriptor
> 		    digit.

May I suggest a slightly more long-winded but rather prettier scheme?
Allow a file name (anywhere at all) to have the form
	/dev/fd#
where # is an integer with however many digits it needs.  Some research
versions of UNIX already support this directly.  People familiar with it
won't thank you for introducing a new notation.  And it takes less than
half a page of code to implement your own "f_or_fd_open(string, mode)"
function in C, and then use that throughout the implementation of ICI
instead of fopen().  [I have done this, and know what I'm talking about.]

> 	  structure) can be obtained.  Several of the intrinsicly
That's intrinsically				      ^^^^^^^^^^^

> 	  int  Integers	are 32 bit signed integers.  All the usual C
> 	       integer operations work on them.	 When they are
> 	       combined	with a float, a	promoted value is used in the
> 	       usual C style.  Integers	are atomic.

Oh *no*!  What's the good of using an interpreted language if it only
gives me 32-bit integers?  If you use any of the PD or redistributable
bignum packages around, then it is *EASY* to provide bignum arithmetic
in an interpreter.  Yes, the bitwise operations &, |, ^, ~ all make
perfect sense on integers of any size, and if we define
	x << y = floor(x * 2**y)
	x >> y = floor(x * 2**(-y))
then even the shifts make sense.  (The shifts won't agree with C, but
then shifts in C aren't as portable as you might think.)

*Please* give very serious consideration to bignums.  For a scripting
language, why the flaming xxxx should I *care* what size a register is?

> 	  Note that initialisers are constant expressions.  They are
> 	  evaluated once at parse time.	 Even initialisers of autos.

Why?  The restriction to constant initialisers for static and external
variables in C made sense, because the initialisation was done by the
linker.  But that doesn't apply to ICI.  About 80% of my initialisations
to auto variables in C are -not- constant expressions.  Why introduce a
restriction that an interpreter like ICI doesn't need and that doesn't
give the ICI programmer any extra safety?

> 	  The array's are arrays of strings, which are the fields of a
The array's what?

>      EXAMPLES
> 	  The following	shell command line will	print Hello world.
> 	       ici -p 'printf("Hello world.\n");'

The manual page said nothing about a "-p" option.

>	  The first line makes a Bourne shell pump the
> 	  program in through file descriptor 3,	and passes any
> 	  arguments to the shell script	on to the ICI program.

I tried something like that on an Apollo once.  Didn't work.  The
shell already had several descriptors other than 0, 1, and 2 open.

> A few practicalities: on my 386 the initial load image (text+data)
> comes in at around 110K (85K text + 25K data, of which a disconcerting
> amount comes from curses, even though all I want it to do is read
> a terminfo entry.

Surely you can get at a terminfo entry by using just -ltermlib;
you don't have to load -lcurses as well.  According to the SVID the
"Terminfo Level Routines" are setupterm(), tparm(), tputs(), putp(),
vidputs(), vidattr(), and mvcur().  setupterm() defines a bunch of
variables.  If you just use those routines, you shouldn't get much
else from Curses.  If not, complain.
-- 
Q:  What should I know about quicksort?   A:  That it is *slow*.
Q:  When should I use it?  A:  When you have only 256 words of main storage.

dws@margay.cs.wisc.edu (DaviD W. Sanderson) (06/15/91)

In article <6354@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>Allow a file name (anywhere at all) to have the form
>	/dev/fd#
>where # is an integer with however many digits it needs.  Some research
>versions of UNIX already support this directly.

And SVR4.0.  I do NOT think you mean /dev/fdN, but rather /dev/fd/N.
This is how it is in Research UNIX and SVR4.  Also, systems that
support /dev/fd typically have synonyms for the standard file
descriptors:

	/dev/stdin	== /dev/fd/0
	/dev/stdout	== /dev/fd/1
	/dev/stderr	== /dev/fd/2
-- 
       ___
      / __\  U N S H I N E	           DaviD W. Sanderson
     |  |  | I N E			    dws@cs.wisc.edu
_____|  |  |_____  ________
\      / \   |__/ /////__	Fusion Powered Locomotives Made to Order
 \____/   \__|_/  \\\\\______	 (TARDIS model available at extra cost)