byron@archone.tamu.edu (Byron Rakitzis) (06/04/91)
After compiling perl on my system and being nauseated by the syntax of
the language, I've decided to try to come up with my own alternative.
I'm going to call it ap, or anti-perl.

Right now I'm thinking that ap will be a super-awk that is less
confusing for a C programmer to learn. I'm not sure if I want the
implicit looping over stdin (though that's kind of nice) and I
definitely don't want the

	pattern { action }

syntax that awk has. It will have an integer and a string datatype, and
you should be able to build arrays out of those objects (associative
arrays too). Functions would be a nice thing to have, but it must
always be easy to toss off a quick one-line ap script, i.e., in the
most trivial case I would like something like

	ypcat hosts | ap 'print $1'

or something similar to work just right. I hate having to place braces
around that simple statement as one has to do in awk.

Most importantly, ap will be driven by an easy-to-understand grammar
with C-like syntax. There may be 2 or 3 ways to perform a particular
task, but there will not be 10,000 as there are in perl.

The main deficiency of awk that I see is its inability to interface
well with Unix. Up until recently, awk did not even have ARGC and ARGV,
not to mention things like file redirection. This is where perl has
taken a step in the "right" direction. Of course, it could be argued,
why put symlink(2) into ap when you have ln(1)? Well, this is why perl
was written: Unix today just cannot provide any performance with shell
scripts; for better or for worse this has to be coded into the command
interpreter.

Ideas are welcome. I really want to write this thing; perl is a
disgrace to the Unix community.
--
Byron Rakitzis
byron@archone.tamu.edu
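[The braces complaint is concrete and easy to see. A minimal sketch of the
comparison, using printf with made-up hosts data in place of ypcat hosts,
which needs a NIS server:]

```shell
# awk insists on braces around the action even in the trivial case;
# this is exactly what Byron objects to.
printf 'gateway 192.0.2.1\nmailhub 192.0.2.2\n' | awk '{ print $1 }'
# -> gateway
#    mailhub

# The hypothetical ap version would drop them:
#   ypcat hosts | ap 'print $1'
```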
Tom Christiansen <tchrist@convex.COM> (06/05/91)
From the keyboard of byron@archone.tamu.edu (Byron Rakitzis):
:After compiling perl on my system and being nauseated by the syntax of
:the language, I've decided to try to come up with my own alternative.
:I'm going to call it ap, or anti-perl.

nauseated? you must have a weak stomach.

:Right now I'm thinking that ap will be a super-awk that is less
:confusing for a C programmer to learn.

I've never met a *good* C programmer who's had any real problems with
it. There may be a momentary bit of discomfort at seeing dollar signs
and thinking of BASIC, but this quickly passes. Certainly no *good*
programmer finds it hard.

: Functions would be a nice thing to have, but it must
:always be easy to toss off a quick one-line ap script, i.e., in the
:most trivial case I would like something like
:
:	ypcat hosts | ap 'print $1'
:
:or something similar to work just right. I hate having to place braces
:around that simple statement as one has to do in awk.

You mean like these:

	ypcat hosts | perl -ane 'print "$F[0]\n"'
	ypcat hosts | perl -pe 's/\s.*//'

or skipping ypcat:

	perl -e 'while (@F = gethostent) { print "$F[0]\n"; }'
	perl -e '$\ = "\n"; while (($_) = gethostent) { print; }'

:Most importantly, ap will be driven by an easy-to-understand grammar
:with C-like syntax. There may be 2 or 3 ways to perform a particular
:task, but there will not be 10,000 as there are in perl.

Perl does have a C-like syntax at its base. Certainly it's not much
more than 2x as complex, if that. Without C, there'd never have been a
perl.

But certainly you have a point -- down with expressiveness! Let's also
redesign English so there are no synonyms, either lexical or phonetic.
Who cares whether connotations, expressiveness, or etymologies get lost
in the shuffle? They just clutter up the language and make it gross.

And let's please fix C while we're at it. The way you declare arrays of
pointers to functions returning pointers to functions returning
pointers to stat structs really nauseates me.
Plus did you know that C has several ways of expressing the same thing?
It's really disgusting. Check out the flow control: all you need is a
for loop; we should abolish the while and the do loops. Or how about
saying char *foo versus char foo[] in formal parameters? What stupid
redundancy! Let's all flame Dennis Ritchie.

As you see, this would be a silly thing. Nothing's perfect in this
world. Both Dennis and Larry (and the awk people, too) tried to come up
with something to solve a certain set of needs they had at the time.
These weren't the same sets of design criteria, of course. The fact
that despite their flaws, both C and perl are used proves their
utility.

:Ideas are welcome. I really want to write this thing; perl is a
:disgrace to the Unix community.

No -- your statement is. Fortunately, considering how much perl has
caught on, you're clearly not speaking for the UNIX community. As for
beauty, re-read the third sentence from the perl man page:

	The language is intended to be practical (easy to use,
	efficient, complete) rather than beautiful (tiny, elegant,
	minimal).

I truly suggest that you relax and see whether your initial shock goes
away. If you find perl helps you get your job done more easily, then
use it. If not, then don't! No one's forcing you.

If your sensibilities are so very offended by its lack of apparent
beauty, then fine, go off and write your own language. But don't go
asking the net to tell you how to do it. Great things come from single
visions, not from designs by committee. If whatever you come up with is
really so wonderful, then it will become popular; if not, it won't. Let
empirical evidence decide: good software drives out bad.

--tom
--
Tom Christiansen  tchrist@convex.com  convex!tchrist
"Perl is to sed as C is to assembly language." -me
byron@donald.tamu.edu (Byron Rakitzis) (06/05/91)
I predicted I would get a lot of flak over this. No surprise to see it
coming from one of the perl wizards.

In article <1991Jun04.182921.1685@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>nauseated? you must have a weak stomach.

I'm not the only one. I have found that people either love perl or hate
perl. I think I am not in a small minority when I say that I belong to
the latter category.

>I've never met a *good* C programmer who's had any real problems with it.

What is a *good* programmer? One who has no trouble learning perl? This
is a very circular definition. Otherwise, I don't think you're in any
position to judge my ability as a programmer. This is just ad hominem.

>There may be a momentary bit of discomfort at seeing dollar signs and
>thinking of BASIC, but this quickly passes. Certainly no *good* programmer
>finds it hard.

Ok, I haven't seen any *good* programmer have any trouble with JCL
either. After the initial discomfort of having to type all those //,
the feeling soon passes. (!!)

>You mean like these:
>
>	ypcat hosts | perl -ane 'print "$F[0]\n"'
>	ypcat hosts | perl -pe 's/\s.*//'
>
>or skipping ypcat:
>
>	perl -e 'while (@F = gethostent) { print "$F[0]\n"; }'
>	perl -e '$\ = "\n"; while (($_) = gethostent) { print; }'

Yes, I mean like those. I think my hypothetical example was more
concise than any of the suggested replies above.

>But certainly you have a point -- down with expressiveness! Let's also
>redesign English so there are no synonyms, either lexical or phonetic.

I think you miss the point. An expressive language does not imply a
perl-like language. A language with the powerful features of perl
that's been designed carefully, without trying to incorporate syntactic
features of every programming language known, can be just as
expressive. I don't want my language to have a BASIC way of doing
things, an APL way of doing things, a C way of doing things...

>And let's please fix C while we're at it.
>The way you declare arrays of
>pointers to functions returning pointers to functions returning pointers
>to stat structs really nauseates me. Plus did you know that C has several
>ways of expressing the same thing? It's really disgusting. Check out the
>flow control: all you need is a for loop; we should abolish the while and
>the do loops. Or how about saying char *foo versus char foo[] in formal
>parameters? What stupid redundancy! Let's all flame Dennis Ritchie.

Do you know what part of C Dennis Ritchie is most unhappy with? The
declaration syntax! Yes! It's stupid and ugly and hard to understand.
Sure, a *good* C programmer has no trouble parsing

	double (**foo)(void (*)(int));

but I'm sorry, I forgot I was not one of those.

>If your sensibilities are so very offended by its lack of apparent beauty,
>then fine, go off and write your own language. But don't go asking the
>net to tell you how to do it. Great things come from single visions, not
>from designs by committee.

I don't see what your problem is. Clearly perl was not written in a
single vision. If *anything* is an agglomeration of random features, it
has *got* to be perl.

Also, *EXCUSE ME* for asking the net about their views for an
alternative to perl. Perhaps we should just lock an undergraduate in a
dark room with a DECstation and wait for her to come up with the
successor to perl. After all, programs are not designed by committees,
right?

"A system without perl is like a hockey game without a fight"
--
Byron Rakitzis
byron@archone.tamu.edu
Tom Christiansen <tchrist@convex.COM> (06/05/91)
From the keyboard of byron@donald.tamu.edu (Byron Rakitzis), quoting me:
:>I've never met a *good* C programmer who's had any real problems with [perl].
:
:What is a *good* programmer? One who has no trouble learning perl? This is
:a very circular definition. Otherwise, I don't think you're in any position
:to judge my ability as a programmer. This is just ad hominem.

It wasn't meant to be. It was based on my experiences. The people who
ask the most pointed questions while learning perl have been those who
have a deep knowledge of C, and probably other languages as well. Once
your brain evolves a language-lawyer mentality, you tend to apply this
way of thinking to all you come in contact with. Good C programmers
have always picked it up quickly. Budding programmers can have more
trouble, but at least they just get null strings, not core dumps.

:>There may be a momentary bit of discomfort at seeing dollar signs and
:>thinking of BASIC, but this quickly passes. Certainly no *good* programmer
:>finds it hard.
:
:Ok, I haven't seen any *good* programmer have any trouble with JCL either.
:After the initial discomfort of having to type all those //, the feeling
:soon passes. (!!)

I don't happen to find any parallelism between those two cases.

:>You mean like these:
:	[deleted]
:Yes, I mean like those. I think my hypothetical example was more concise
:than any of the suggested replies above.

Not every problem requires the same tool as its answer: that's why we
have such a wide variety of them. I still use sed and awk from the
command line.

:I think you miss the point. An expressive language does not imply a perl-
:like language. A language with the powerful features of perl that's been
:designed carefully without trying to incorporate syntactic features of
:every programming language known can be just as expressive.
:
:I don't want my language to have a BASIC way of doing things, an APL way
:of doing things, a C way of doing things...
To borrow a quote from a colleague, programs in large languages come
out small; those in small ones come out long. I happen to prefer my
languages big because I want my programs small: less writing time. I've
been shocked by a lot ruder things in this life than mere dollar signs
in front of variable names or having more than one way to express a for
loop.

:Do you know what part of C Dennis Ritchie is most unhappy with? The
:declaration syntax! Yes! It's stupid and ugly and hard to understand.

That's why I chose it. To point out there exist awkwardnesses that the
authors wish they'd done differently had they the chance to redo it.
That doesn't stop their creations from filling vacuums and achieving
widespread use. It just gives us something to complain about. :-)

:I don't see what your problem is. Clearly perl was not written in a single
:vision. If *anything* is an agglomeration of random features, it has *got*
:to be perl.

While we users have contributed somewhat to perl's development, such as
our crying out for the <=> operator, by and large the whole thing is
the brainchild of one man. Certainly he drew from many sources, but
that's not the same as having a committee sit down to define a new
language: along that path lies madness and Ada.

I don't really see what this thread is doing in comp.unix.shell, when
to my ears it sounds a lot more like a candidate for
alt.religion.computers instead. I've directed followups there, which is
where people like to argue about things like this.

I think you'll find that if you set about trying to improve awk, what
you'll come up with will be a lot more perlian than you're expecting.
Read Henry Spencer's paper on awk as a systems programming language.
99% of his laments about awk's shortcomings were things that Larry had
already handled in perl before he'd even spoken with Henry about it.
Common problems often spawn remarkably similar solutions through
convergent evolution.
You're obviously a small-is-beautiful person, one valuing aesthetics
over utility. Fine. If you don't like it, don't use it. But a lot of us
find it the neatest thing since sliced bread, and we get a lot of
mileage out of this funny-looking camel. It's not some prancing show
horse with oh so beautiful lines triggering oohs and ahs from the
audience. But I know which beastie *I'd* rather have in the desert: the
one that gets the job done quickest without breaking down due to lack
of horsepower, or in this case, camelpower.

--tom
--
Tom Christiansen  tchrist@convex.com  convex!tchrist
"Perl is to sed as C is to assembly language." -me
byron@archone.tamu.edu (Byron Rakitzis) (06/05/91)
In article <1991Jun05.013632.3198@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>
>I don't really see what this thread is doing in comp.unix.shell, when
>to my ears it sounds a lot more like a candidate for alt.religion.computers
>instead. I've directed followups there, which is where people like
>to argue about things like this.

I'm sorry, you turned it into a religious argument. I started out
asking for comments and ideas. OK, I did so by talking in an
inflammatory fashion about perl, but that was not my *primary* purpose.

So, any reasonable comments about ap, or any alternative to perl for
that matter, are still welcome. And yes, I think comp.unix.shell is a
reasonable forum for this discussion, since there is no
comp.unix.command.interpreter.
--
Byron Rakitzis
byron@archone.tamu.edu
lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (06/05/91)
In article <16852@helios.TAMU.EDU> byron@archone.tamu.edu (Byron Rakitzis) writes:
: After compiling perl on my system and being nauseated by the syntax of
: the language, I've decided to try to come up with my own alternative.
: I'm going to call it ap, or anti-perl.
You're certainly welcome to try. But if you come up with a beautiful
syntax, you'll be solving a different problem than the one perl solves.
Perl is certainly ugly--I've never claimed otherwise. But the ugliness
is there for A Reason. Several reasons, in fact:
1) Perl is a systematization of a chaotic culture.
2) Perl should do what you expect, for most "you".
3) Perl should evolve without obsoleting old scripts.
1) Perl is a systematization of a chaotic culture.
If you think of Unix culture as a kind of language, it's an exceedingly
undisciplined language. New words are added helter skelter. Dialects
diverge faster than they can be standardized. There are several
alternate forms of syntax (shells), most of which are based on the
ever-confusing multiple-substitution-pass paradigm (shades of transformational
grammar, ugh).
If you try to analyze Perl as a classical computer language, you'll find
that it's more cumbersome than you're used to. If you think of Perl as
a systematization of Unix culture, however, it's quite small and coherent.
Huge languages like English and Unix are obviously good for something,
but Perl is an experiment in moderation, to see if a medium-sized language
can get some of the benefits of a huge language without all of the
annoyances. The question is whether the expressive power increases
faster than the ease of use and learning decreases. Obviously this
is a matter of taste and prior experience.
I'm not a computer scientist, though I have some training in the field.
I have more training in linguistics. I'm also more of a synthesist than
an analyst. Some of my opinions are naturally repugnant to the analytical
mindset--I don't mind a little redundancy if it increases effective
communication. If you want to be able to factor out every little
bit of redundancy from your programming, pick a language with abstract
types and multiple inheritance like C++. But expect that you'll have to
know the semantics of 57 abstract types to figure out what that "+" means...
2) Perl should do what you expect, for most "you".
If you're going to design a language to fill the (huge) niche between
C, sed, awk and the shells, you'd better make it easy for programmers
in those languages to migrate to your language.
People often suggest new features for Perl. (If you're gonna be a language
designer, you'd better get used to that idea.) Believe it or not, I don't
add most of the features people suggest. However, when someone says, "It
didn't do what I expected," my ears perk up.
3) Perl should evolve without obsoleting old scripts.
All those weird characters like $ and @ and & are also there for a reason.
They let me put variable identifiers into their own namespaces, so that
new "reserved words" can be added to the language without blowing up your old
scripts. (For a simply typed language like Perl, they also enhance
readability--you can see the type of something just by looking at it.
There's also the cultural thing about variable interpolation in shells.)
I was never so gratified as when someone hauled out some old Perl version 1
scripts and they ran without a hitch.
: Right now I'm thinking that ap will be a super-awk that is less
: confusing for a C programmer to learn. I'm not sure if I want the
: implicit looping over stdin (though that's kind of nice) and I
: definitely don't want the
:
: pattern { action }
:
: syntax that awk has.
I'd certainly agree that awk's syntax is too distant from C for comfort (to C
programmers, anyway). Perl's syntax from the expression level on down is
basically ripped right out of C. Higher level constructs are also quite
similar, modulo some changes to make loop control more readable. There
are a few additional control constructs that help you write more readable
programs, but aren't by any means required. You can program Perl like a
C programmer, and nobody is going to get very mad at you (unless they have
to read your code, of course... :-).
: It will have an integer and a string datatype, and
: you should be able to build arrays out of those objects (associative
: arrays too). Functions would be a nice thing to have, but it must
: always be easy to toss off a quick one-line ap script, i.e., in the
: most trivial case I would like something like
:
: ypcat hosts | ap 'print $1'
:
: or something similar to work just right. I hate having to place braces
: around that simple statement as one has to do in awk.
I agree with the sentiment, but you have to be very, very careful in
picking your defaults. In language design you can't have your cake and
eat it too. Every time you choose some way to default something, you
force every other construct to indicate in some way or other that it is
not using that default. For instance, awk boxed itself into a design
corner by using newline to terminate commands and the null string to
indicate string concatenation. Likewise by not having any way to refer
to an open file symbolically. Not to mention the implicit looping and
pattern/action construct.
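Two of those design corners can be demonstrated in awk itself; a sketch,
assuming a POSIX awk:

```shell
# Concatenation is the null string: two adjacent expressions join,
# so a forgotten operator silently concatenates instead of erroring.
awk 'BEGIN { x = "foo" "bar"; print x }'
# -> foobar

# Newline terminates a statement, so splitting an expression across
# lines only works after certain tokens such as && or a backslash.
awk 'BEGIN { if (1 &&
             1) print "ok" }'
# -> ok
```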
I suggest very strongly that you start out designing your language with few
implicit semantics. You can then always add a switch to assume some implicit
semantics. Going the other direction is much harder.
: Most importantly, ap will be driven by an easy-to-understand grammar
: with C-like syntax.
You just contradicted yourself there... :-)
Ease of learning is one thing to design for in a language, but it's only one.
Furthermore, you're not designing for ease of learning except for one specific
set of people.
: There may be 2 or 3 ways to perform a particular task, but there will not
: be 10,000 as there are in perl.
I think you may be exaggerating slightly. Perl isn't that good yet.
: The main deficiency of awk that I see is its inability to interface
: well with Unix. Up until recently, awk did not even have ARGC and ARGV,
: not to mention things like file redirection. This is where perl has
: taken a step in the "right" direction. Of course, it could be argued,
: why put symlink(2) into ap when you have ln(1)? Well, this is why perl
: was written: Unix today just cannot provide any performance with shell
: scripts; for better or for worse this has to be coded into the command
: interpreter.
That's only one reason Perl was written. As I mentioned earlier, many
of the design decisions in Perl were for the benefit of (potential)
readability. But Perl also lets you optimize for conciseness, or
writability, or performance (as you mentioned), or portability, or
similarity to any of several languages you may be familiar with. To
gain all these, I consciously traded away a little learnability (at
least in terms of being intimately familiar with all parts of the language).
The hope has been, and I think it was justified, that people would pick
up Perl like they pick up English--enough to get by.
Ask yourself if you really have a clear idea of what you want in your language,
and secondly, if you do, whether you really want enough out of your language.
But good luck, and all that. If you're right and I'm wrong, I'm just glad
I had a 5 year headstart... :-)
: Ideas are welcome. I really want to write this thing; perl is a
: disgrace to the Unix community.
No, Unix is a disgrace to the Unix community. Perl merely inherits some
of that ungracefulness.
When you get down to it, Swiss army knives are ugly too. But everything
is there for a reason.
Love,
Larry Wall
lwall@netlabs.com
: --
: Byron Rakitzis
: byron@archone.tamu.edu
cmf851@anu.oz.au (Albert Langer) (06/05/91)
In article <1991Jun05.013632.3198@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>While we users have contributed somewhat to perl's development, such
>as our crying out for the <=> operator, by and large the whole thing
>is the brainchild of one man. Certainly he drew from many sources, but
>that's not the same as having a committee sit down to define a new
>language: along that path lies madness and Ada.
[...]
>You're obviously a small-is-beautiful person, one valuing aesthetics over
>utility. Fine. If you don't like it, don't use it. But a lot of us find
>it the neatest thing since sliced bread, and we get a lot of mileage out of
>this funny-looking camel. It's not some prancing show horse with oh so
>beautiful lines triggering oohs and ahs from the audience. But I know
>which beastie *I'd* rather have in the desert: the one that gets the job
>done quickest without breaking down due to lack of horsepower, or in this
>case, camelpower.

Haven't you heard that "a camel is a horse designed by a committee"?
Sorry, couldn't resist.

I have nothing against perl, nor against Byron Rakitzis designing
something better (though there is no need for him to attack perl in
order to do so). Nor do I even have anything against Ada, horses or
camels, and I have only the mildest of unbigoted prejudices against
committees. But I will get on my high horse to prevent any mixed
metaphor from passing through the eye of a needle!
--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au
Tom Christiansen <tchrist@convex.COM> (06/06/91)
From the keyboard of byron@archone.tamu.edu (Byron Rakitzis):
:In article <1991Jun05.013632.3198@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
:>
:>I don't really see what this thread is doing in comp.unix.shell, when
:>to my ears it sounds a lot more like a candidate for alt.religion.computers
:>instead. I've directed followups there, which is where people like
:>to argue about things like this.
:
:I'm sorry, you turned it into a religious argument. I started out asking
:for comments and ideas. OK, I did so by talking in an inflammatory fashion
:about perl, but that was not my *primary* purpose.
:So, any reasonable comments about ap, or any alternative to perl for that
:matter, are still welcome. And yes, I think comp.unix.shell is a reasonable
:forum for this discussion, since there is no comp.unix.command.interpreter.
I read unqualified inflammatory comments about "disgrace to UNIX" and
"nauseating", ones without reasoned commentary about why these things were
said. When I see fire, I get heated. Just my nature. It didn't sound
so much like a request for ideas as like a bashing session.
Larry's already posted at length about some of the background on why perl
turned out as it did. I don't know whether that proved illuminating for
you or not.
If you'd like to fill in the details about what you want in a language, or
perhaps what still bothers you about perl, then maybe someone can find a
language to fit your bill. There is a plethora of languages already
out there. I'm not sure we really need another. You could always just
use ksh or rc or something if you want a UNIX command interpreter.
I happen to know some excellent C programmers who do not use perl. I
didn't mean to imply that these went hand in hand. I just meant that they
probably could if they wanted to. In at least some cases, they simply
don't want to, and it's because they consider perl an "unclean" language.
That's fine for them. I just have too much to do in too little time to
worry about the tool being from the grey market, as it were.
--tom
--
Tom Christiansen tchrist@convex.com convex!tchrist
"Perl is to sed as C is to assembly language." -me
tml@extro.ucc.su.OZ.AU (Tim Long) (06/12/91)
I read Byron's comments on perl and awk with some sympathy. I have had
thoughts along similar, although not identical, lines for some time. By
coincidence, I just designed and implemented a language to address
similar issues, which I would be grateful to hear people's opinions on.
But first I'll just mention my own motivations:

1) To have a freely available general purpose interpretive language on
UNIX systems. (As opposed to the many more special purpose ones such as
awk and the shell.) This can be re-phrased as: to have a UNIX language
like DOS has BASIC.

2) To have a freely available language suitable for embedding in other
programs and systems.

3) To allow programming on UNIX systems which do not have development
systems (which are becoming very common).

So I guess the design spec was to make a freely available general
purpose language suitable both for system supported, and embedded, use.
By embedded use I mean both within stand-alone devices (like
PostScript) and as an adjunct to applications. The source is arranged
to be amenable to this.

Although I have been brooding on it for some time I have only actually
done it in the last month. I'm reasonably happy with the result at this
stage but welcome comment. There is a preliminary manual entry which
describes the language, but it's just a manual entry. I'll try to give
some more background here.

The language, which I am calling ICI for the time being, has dynamic
typing and object management, with all the flavour (flow control
constructs, operators and syntax) of C. You can write very C-like code,
if you wish (pointers work), but you can take advantage of the more
flexible data handling to make things a lot easier.

I have tried to keep the design carefully divided into the language and
its fundamental functions, and then other groups of functions which
relate to the operating environment. Naturally the UNIX shell level
version has almost all of these included.
I could try to convey the nature of the language here, but it is
probably better just to skim the manual entry. So I'll include it here
and continue the general discussion after that. It's about 14 pages,
but you can start skipping after you get to the standard functions (it
finishes after the next line of minuses)...

----------------------------------------------------------------------

ICI(1)                                                            ICI(1)

NAME
     ici - general purpose interpretive programming language

SYNOPSIS
     ici [ file ] [ -f file ] [ -i prog ] [ -digit ] [ -l lib ] [ args... ]

DESCRIPTION
     Ici parses ICI program modules as indicated by its arguments.
     They may or may not cause code to execute as they are parsed. But
     after the modules have been read, if main is defined as an
     external function it will be called with the otherwise unused
     arguments (as an integer count and a pointer to the first element
     of an array of strings).

     The options are:

     file      If the first argument does not start with a hyphen it
               is taken to be a program module as if specified with
               the -f flag. This may be used to allow ICI programs to
               execute directly with the #! facility.

     -f file   The file is parsed as an ICI module.

     -i prog   The prog argument is parsed directly as an ICI module.

     -digit    An ICI module is read from the file descriptor digit.

     -l lib    An ICI module is read from $ICILIB/liblib.ici. If
               ICILIB is not defined as an environment variable,
               /usr/ici will be used.

     other     Any argument not listed above is gathered into the
               arguments which will be available to the program.

     --        All further arguments are gathered into the arguments
               which will be available to the program.

     Note that argument parsing is two-pass; all the "unused"
     arguments are determined and assigned to argc and argv before the
     first module is parsed.

     If an error occurs which is not dealt with by the program itself,
     a suitable error message will be printed and ici will exit.

     The remainder of this manual entry is a brief description of the
     language.
OVERVIEW
     ICI has dynamic typing and flexible data types with the flow
     control constructs and operators of C. It is designed to allow
     all types of programs to be written without the programmer having
     to take responsibility for memory management and error handling.
     There are standard functions to provide the sort of support
     provided by the standard I/O and the C libraries, as well as
     additional types and functions to support common needs such as
     simple databases and character based screen handling. A
     programmer familiar with C should be able to write ICI programs
     after reading this document.

STATEMENTS
     An ICI source module consists of a sequence of statements.
     Statements may be any of the following:

          expression ;
          compound-statement
          if ( expression ) statement
          if ( expression ) statement else statement
          while ( expression ) statement
          do statement while ( expression ) ;
          for ( exp(opt) ; exp(opt) ; exp(opt) ) statement
          switch ( expression ) compound-statement
          case constant-expression :
          default :
          break expression(opt) ;
          continue expression(opt) ;
          return expression(opt) ;
          ;
          storage-class ident function-body
          storage-class decl-list ;

     In contrast to C, all statement forms are allowed at all scopes.
     But in order to distinguish declarations and function definitions
     from ordinary expressions, the storage class (extern, static or
     auto) is compulsory.

     There is no goto statement, but break and continue statements may
     have an optional expression signifying how many levels to affect.
     (Not in this version.)

     The term constant-expression above refers to an expression that
     is evaluated exactly once, at parse time. In other respects it is
     unrestricted; it may call functions and have side effects.

     Switch statements must be followed by a compound statement, not
     just any statement as in C. Furthermore, each case-label and the
     default must label statements at the top level of this compound
     statement.
OBJECTS AND LVALUES
In ICI objects are dissociated from the storage locations (variables, for instance) which refer to them. That is, any place which stores a value actually stores a reference to the value. The value itself, whether it is a simple integer or a large structure, has an independent existence. The type of an object is associated with the value, not with any storage locations which may be referring to it. Thus ICI variables are dynamically typed.

The separation of storage location and value is transparent in most situations, but in some ways is distinguishable from the case in a language such as C where an object is isomorphic with the storage it occupies. ICI assignment and function argument passing does not transfer a copy of an object, but transfers a reference to the object (that is, the new variable refers to the same object). Thus it is straightforward to have two variables referring to the same object; but this does not mean that assigning to one affects the value of the other. Assignment, even in its most heavily disguised forms, always assigns a new object to a storage location. (Even an operation such as "++i" makes the variable "i" refer to the object whose value is one larger than the object which it previously referred to.)

The normal storage locations are the elements of arrays and structures. Simple variables are actually structure elements, although this is not apparent in everyday programming.

Some object types are "atomic" (scalar), that is their internal structure is not modifiable. Atomic data types have the property that all objects with the same value are in fact the same object. Integers, floating point numbers, strings and functions are atomic by nature. The only standard non-atomic data types are arrays and structures. An atomic (constant) version of any aggregate type (array or structure) can be obtained.
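A hypothetical fragment illustrating the assignment semantics described above (array() and printf() are the standard functions described later in this entry; the variable names are invented):

    a = array(1, 2, 3);     /* "a" refers to a new array object          */
    b = a;                  /* "b" refers to the same object, not a copy */
    b[0] = 99;              /* modifying the shared array...             */
    printf("%d\n", a[0]);   /* ...is visible through "a" as well: 99     */
    b = "hello";            /* but assignment merely re-binds "b";       */
    printf("%d\n", a[0]);   /* "a" still refers to the array: 99         */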
Several of the intrinsically atomic types do allow read-only access to their interior through indexes, structure keys or pointers. (Strings for example allow indexing to obtain one character sub-strings.)

TYPES
Each of the following paragraphs is tagged with the internal name of the type, as returned by the typeof() function:

int
    Integers are 32 bit signed integers. All the usual C integer operations work on them. When they are combined with a float, a promoted value is used in the usual C style. Integers are atomic.

float
    All floating point is carried out in the host machine's double precision format. All the usual C floating point operations work. Floats are atomic.

string
    Strings are atomic sequences of characters. Strings may be indexed and have the address taken of internal elements. The value of fetching a sub-element of a string is the one character string at that position unless the index is outside the bounds of the string, in which case the result is the empty string. The first character of a string has index 0. Strings may be used with comparison operators, addition operators (which concatenate) and regular expression matching operators. The standard function sprintf is a good way of generating and formatting strings from mixed data.

NULL
    The NULL type only has one value, NULL (the same name as the type). The NULL value is the general undefined value. Anything uninitialised is generally NULL.

array
    Arrays always start at 0 but extend to positive indexes dynamically as elements are written to them. A read of any element either not yet assigned to or outside the bounds of the array will produce NULL. A write to negative indexes will produce an error, while a write to positive indexes will extend the array. Note that arrays do not attract an implicit ampersand as in C. Use &a[0] to obtain a pointer to the first element of an array "a". The function array() and array constants (see below) can be used to create new arrays.
struct
    Structures are collections of storage locations named by arbitrary keys. Structures acquire storage locations and member names as they are assigned to. Elements which do not exist read as NULL. Pointers may be taken to any member, but pointer arithmetic is only possible amongst element names which are simple integers.

    Note that normal structure dereferencing with struct.member is as per C, and the member name is a string. Member names which are determined at run time may be specified by enclosing the key in brackets as per: struct.(expr), in which case the key may be any object (derived from any expression). Thus struct.("mem" + "ber") is the same as struct.member. An "index" may also be used, as per: struct[expr], and has the same meaning as struct.(expr). (This is true in general; all data types which allow any indexing of their internal structure operate through the same mechanism and these are only notational variations.) The function struct() and structure constants (see below) can be used to create new structures.

    From a theoretical standpoint structures are a more general type than arrays. But in practice arrays have some properties structures do not (intrinsic order, length and different concatenation semantics, as well as less storage overhead). Note that by ignoring the value associated with a key, structures are sets (and addition performs set union, see below).

ptr
    Pointers point to places where things are stored, but a pointer may be taken to any object and a nameless storage location will be fabricated if necessary. They allow all the usual C operations. Pointer arithmetic works as long as the pointer points to an aggregate element which is indexed by an integer (for instance all elements of arrays, and amongst structure elements which have integer keys). Pointers are atomic. Note that pointers point to a storage location, not to the value of an object itself.
    Thus if "a" is an array, after "p = &a;", the expression "*p" will have the same value as "a" even if "a" becomes a structure (through assignment). Note that it is not possible to generate pointers which are in any way illegal or dangling. Also note that because assignment and argument passing does not copy values, pointers are not required as often as they are in C.

func
    Functions are the result of a function declaration and function constants. They are generally only applicable to the function call operation and equality testing. They do not attract an implicit ampersand as in C. Functions are atomic. (Code fragments within functions are also atomic and thus shared amongst all functions.)

regexp
    Regular expressions are atomic items produced by either regular expression constants (see below) or compiled at run-time from a string. They are applicable to the regular expression comparison operators described below.

file
    Files are returned and used by some of the standard functions. See below.

window
    Windows are produced and used by some of the standard functions. See below.

Other types (pc, catch, mark, op, module and src) are used internally and are not likely to be encountered in ordinary programming.

LEXICON
Lexicon is as per C, although there is no preprocessor yet, with the following additions:

Adjacent string constants separated only by white space form one concatenated string literal (as per ANSI C). The sequence of a "#" character (not at the start of a line), followed by any character except a newline up to the next "#", is a compiled regular expression. The sequences !~, ~~, ~~=, ~~~, $, @, [{, }], [<, and >] are new tokens. The names NULL and onerror are keywords.

EXPRESSIONS
Expressions are full C expressions (with standard precedence and associativity) with some additions.
The overall syntax of an expression is:

    expression:
        primary
        prefix-unary expression
        expression postfix-unary
        expression binop expression

    primary:
        NULL
        int-literal
        float-literal
        char-literal
        string-literal
        regular-expression
        [ expression-list ]
        [< assignment-list >]
        [{ function-body }]
        ident
        ( expression )
        primary ( expression-list(opt) )
        primary [ expression ]
        primary . struct-key
        primary -> struct-key

    struct-key:
        ident
        ( expression )

    prefix-unary:
        * & + - ! ~ ++ -- $ @

    postfix-unary:
        ++ --

    binop:
        * / % + - >> << < > <= >= == != ~ !~ ~~ ~~~
        & ^ | && || : ? = += -= *= /= %= >>= <<= &= ^= |= ~~= ,

    expression-list:
        expression
        expression , expression-list

    assignment-list:
        assignment
        assignment , assignment-list

    assignment:
        struct-key = expression

The effect and properties of various expression elements are discussed in groups below:

simple constants
    Integers and floats are recognised and interpreted as they are in C. Character literals (such as 'a') have the same meaning as in C (ie. they are integers, not characters). String literals have the same lexicon as C except that they produce strings (see Types above). Both character and string literals allow the additional ANSI C backslash escapes (\e \v \a \? \xhh). Regular expressions are those of ed(1).

complex constants
    [ expression-list ]
    [< assignment-list >]
    [{ function-body }]

    Because variables are intrinsically typeless it is necessary that initialisers, even of aggregates, be completely self-describing. This is one of the reasons these forms of constants have been introduced. The first is an array initialised to the given values, the second is a structure with the given keys initialised to the given values. The third is a function. The values in the first two are all computed as constant expressions (not meaning that they are made atomic or may only contain constants, just that they are computed once when they are first parsed).

primary ( expression-list(opt) )
    Function calls have the usual semantics.
    But if there are more actual parameters than there are formal parameters in the function's definition, and the function has an auto variable called "vargs", the remaining actual parameters will be formed into an array and assigned to this variable. If there is no excess of actual parameters any "vargs" variable will be undisturbed; in particular, any initialisation it has will be effective.

prefix-unary (* & + - ! ~ ++ -- $ @)
    Apart from "$" and "@", the prefix unary operators have the same meaning as they do in C. The "*" operator requires a ptr as an argument. The "-" operator requires an int or float. "!" and "~" require ints. "++" and "--" work with any values which can be placed on the left of a "+ 1" or "- 1" operation (see below). The rest ("&", "+", "$", "@") work with any types. A "+" always has no effect. If the operand of an "&" is not an lvalue in the usual sense, a one element array will be fabricated to hold the value and a pointer to this element will result.

    The "$" operator causes the affected expression to be evaluated at parse time (thus "$sin(0.5)" will cause the value to be computed once no matter how many times the term is used).

    The "@" operator returns the "atomic" form of an object. This is a no-op for simple types. When applied to an aggregate the result is a read-only version of the same, which will be the same object as all other atomic forms of equal aggregates (as per ==).

regular expression matches (~ !~ ~~ ~~= ~~~)
    These binary operators perform regular expression matches. In all cases one operand must be a string and the other a regular expression. The operator ~ performs the match and returns 1 or 0 depending on whether the string did, or didn't, match the expression. Likewise for !~ with opposite values. The operator ~~ matches the string and regular expression and returns the portion of the string matched by the \(...\) enclosed portion of the regular expression, or NULL if the match failed.
    The ~~= operator is the equivalent assignment operator and follows the usual rules. The ~~~ operator matches the string and the regular expression and returns an array of the portions of the string matched by the \(...\) portions of the regular expression, or NULL if the match failed. (This may move to a function.)

assignment operators
    As previously mentioned, assignment always sets a storage location to a new object. The old value is irrelevant (although it may have been used in the process of a compound assignment operator). Thus there is no implicit cast on assignment, so assigning an int to what is currently a float will result in an int. Assigning to a currently unknown variable will implicitly declare the variable as static.

other binary operators
    The usual C binary operators work as they do in C and on the same range of types. In addition: The == and != operators work on all types. Arrays and structures are equal if they contain the same objects in the same positions. The + and += operators will concatenate strings, arrays and structures (in the last case, where identical keys occur the values of the right hand operand take precedence). The << and <<= operators will shift an array, losing elements from the front and shortening the array as a whole. The <, >, <=, >= operators work on strings, making lexical comparisons.

VARIABLES, SCOPES AND INITIALISERS
There are exactly three levels of scope: extern (visible globally by all code), static (visible by code in the module), and auto (visible by code in the function). The variables in the first two are persistent and static. Auto variables have a fresh instantiation created each time a function is entered, and lost on exit (unless there are references to them). Implicitly declared variables are static.

All types of declarations may occur anywhere; they are simple statements, unlike in C. They have their effect entirely at parse time and thus produce no code. But the rules about scope still apply.
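A hypothetical sketch of the three scope levels described above (names invented for illustration):

    extern g = 1;       /* visible globally by all code             */
    static m = 2;       /* visible by code in this module           */

    static f()
    {
        auto a = 3;     /* a fresh instantiation per call           */
        newvar = 4;     /* implicit declaration: becomes static     */
        return g + m + a;
    }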
No matter where an extern declaration is made, once it is parsed that variable is visible globally. Similarly once an auto declaration is parsed that variable is visible throughout the scope of the function.

Note that initialisers are constant expressions. They are evaluated once at parse time. Even initialisers of autos. Every time a set of auto variables is instantiated (by function entry) the variables are set to these initial values, NULL if there is no initialiser.

STANDARD FUNCTIONS
The following functions form part of the language definition and should be present in all implementations, including embedded systems.

call(func, array)
    Calls the function with arguments taken from the array. Thus the statement call(func, ["a", "b"]); is equivalent to func("a", "b");. Returns the return value of the function.

array(...)
    Returns a new array formed from the arguments, of which there may be any number, including zero.

struct([key, value...])
    Returns a structure initialised with the paired keys and values given as arguments, of which there may be any even number, including zero.

string = sprintf(format, args...)
    Returns a string formatted as per printf(3S) from the format and arguments. All flags and conversions are supported up to System 5.3's. The new ANSI n and p conversions are not provided. Precision and field width * specifications are allowed. Type checking is strict.

copy(any)
    Returns a copy of its argument. A null operation for all types except arrays and structures. To simulate C's structure assignment use: "a = copy(b)" in place of "a = b". Note that this is a "top level" copy. Sub-aggregates are the same sub-aggregates in the copy as in the original.

eval(any)
    Evaluates its argument in the current scope. This is a null operation for any type except strings. For these it will return the value of the variable of that name as looked up in the current scope.

exit(int)
    Exits with the given status.
fail(str)
    Generates a failure with the given message (see Error handling above).

float(any)
    Returns a floating point interpretation of its argument (an int, string or float, else it will return 0.0).

int(any)
    Returns an integer interpretation of its argument (a float, string or int, else it will return 0).

string(any)
    Returns a string interpretation of its argument (an int, float or string, else it will return the type name in angle brackets).

typeof(any)
    Returns the type name of an object (a string).

parse(file/string [,module])
    Parses the file or string in a new module, or the context of the given module if supplied.

regexp(string)
    Returns the regular expression compiled from the string.

sizeof(any)
    Returns the number of elements the object has (ie. elements of an array, or key/value pairs in a struct, or characters in a string; returns 1 for all other types).

push(array, any)
    Adds the object to the end of the array, extending it in the process.

pop(array)
    Returns the last object in the array and shortens the array by one in the process. It will return NULL if the array is empty already.

keys(struct)
    Returns an array of the keys (ie. member names) of the struct.

smash(string1, string2)
    Returns an array of sub-strings from string1 which were delimited by the first character from string2.

str = subst(string1, regexp, string2 [, flag])
    (Coming soon.) Returns a copy of string1 with sections that matched regexp replaced by string2, globally if flag is given as 1.

str = tochar(int)
    Returns a one character string made from the integer character code.

int = toint(str)
    Returns the character code of the first character of the string.

int = rand([int])
    Returns a pseudo-random number in the range 0 .. 2^15 - 1. If an argument is supplied this is used to seed the random number generator.

string/array = interval(string/array, start [,len])
    Returns the interval of the string or array starting at index start and continuing till the end, or for len elements if len is supplied.
    Interval extraction outside the bounds of the object will merely leave out the absent elements.

array = explode(string)
    Returns an array of the integer character codes of the characters in the string.

string = implode(array)
    Returns a string formed from the concatenation of the integer character codes and strings found in the array. Objects of other types are ignored.

file = sopen(string, mode)
    Returns a file (read only) which when read will return successive characters from the string.

module = module(string)
    Returns a new module with its name taken from the string argument.

obj = waitfor(obj...)
    Blocks (waits) until an event indicated by any of its arguments occurs, then returns that argument. The interpretation of an event depends on the nature of each argument. A file argument is triggered when input is available on the file. A float argument waits for that many seconds to expire, an int for that many milliseconds (they then return 0, not the argument given). Other interpretations are implementation dependent. Where several events occur simultaneously, the first as listed in the arguments will be returned. Note that in implementations that support many basic file types, some file types may always appear ready for input, despite the fact they are not.

unixfuncs()
    When first called, will define as external functions the unix system interface functions described below (if available). Subsequent calls are ignored.

vstack()
    Returns a copy of the variable (scope) stack. Index 0 is the outermost scope. It will contain functions, each optionally followed by a structure of the local variables. (Only for debuggers obviously.)

STANDARD EXTERNAL VARIABLES

externs
    A structure of all the extern variables.

argc
    A count of the otherwise unused arguments to the interpreter.

argv
    An array of strings, which are the otherwise unused arguments to the interpreter. (Note this is different from the argument to main, which is a pointer to the first element of this array as it is in C.
    It is probably easier to use the globals in general.)

stdin
    Standard input.

stdout
    Standard output.

stderr
    Standard error output.

OTHER FUNCTIONS
The following functions will be present on systems where the environment permits. Missing file arguments are interpreted as standard input or output as appropriate. Pretty obvious, but more details later.

    printf(fmt, args...)
    fprintf(file, fmt, args...)
    file = fopen(name, mode)
    file = popen(cmd, mode)     /* UNIX only. */
    status = system(cmd)
    str = getchar([file])
    str = getline([file])
    str = getfile([file])
    put(str [,file])
    fflush([file])
    fclose(file)

UNIX FUNCTIONS
The following functions will be available on UNIX systems or systems that can mimic UNIX. See unixfuncs() above. They all return an integer. On failure they raise a failure with the error set to the appropriate system error message derived from errno. These interfaces are raw. Use at your own risk.

    access(), acct(), alarm(), chdir(), chmod(), chown(),
    chroot(), close(), creat(), dup(), _exit(), fork(),
    getpid(), getpgrp(), getppid(), getuid(), geteuid(),
    getgid(), getegid(), kill(), link(), lseek(), mkdir(),
    mknod(), nice(), open(), pause(), rmdir(), setpgrp(),
    setuid(), setgid(), signal(), sync(), ulimit(), umask(),
    unlink(), clock(), system(), lockf(), sleep(),
    /* Rest on the way. */

DATA BASE FUNCTIONS
Simple non-indexed, but otherwise fully locked and functional, data base support. Not for speed. If your application needs a serious data base, get one; don't use this. Use this for configuration info and all that peripheral stuff.

The arrays are arrays of strings, which are the fields of a record. The "keyfieldno" is which field number of the record is the key for this operation. The "dbname" is a file name, one table per file. It will be created if it does not exist, but an empty file is ok too. Use UNIX permissions for access control. Read access on read-only files is ok. db_get() returns NULL if not found. More details later.
    array = db_get(dbname, keyfieldno, value)
    array = db_delete(dbname, keyfieldno, value)    /* Returns old data. */
    db_set(dbname, keyfieldno, array)
    db_add(dbname, array)

WINDOWS
Upon first reference to any of the window routines standard input is placed in the appropriate modes for non-echoing character at a time input. All input from the terminal should be fetched with w_getchar() and w_edit(). Upon exit (including interrupt) all modes will be restored.

win = w_push(line, col, nlines, ncols)
    Pushes an opaque rectangular window on the screen at the given line and col, which are in screen coordinates. But special values of -1 or -2 for line or col indicate centering or right justification (bottom justification for line) for that aspect of the position. The window will have the given number of lines and columns, unless line or col are less than or equal to zero, in which case they will be that much less than the full screen size. The window is initially clear and on top of all previous windows.

w_pop(win)
    "Pops" the window from the screen, re-exposing anything which the window was hiding. Any window may be popped from the screen, whether it is the top window or not. After a window has been popped it is dead and cannot be put back. Make a new window to do this. Note that if a window is not referenced it will get popped when the next garbage collection occurs, but windows should always be popped explicitly.

w_paint(win, line, col, text [,tabs])
    Paints the text on the window at the given line and column (in the window's space), with auto-indent on subsequent lines (indicated by a \n character in the text). A string tab specification reminiscent of troff (and most word processors) may be given. If supplied it must be a concatenation of tab-specs. Each tab-spec consists of an optional "+" character, followed by a decimal number, followed by an optional leader character, followed by one of the letters "L", "C" or "R".
    If the "+" is supplied the tab position is at a relative offset from the previous one, else it is a distance from the left margin of this text block. If a leader character is given the distance between the current column and the start of the next text will be filled with that character, else a direct motion will be used (use an explicit space leader to clear an area). If an "L" tab is set, the next field of text will start at the tab stop; if a "C" tab is set the next field of text will be centered on the tab stop; and if an "R" tab is set the next field of text will end on the tab stop. The "next field of text" is the text after the tab character up to the next tab, newline or end of string. The last tab-spec in the string will be used repeatedly. Scanning of the tabs starts again on each new line. If no tab specification is given, multiple-of-8 column tabs are used, but relative to the start position. For example, a three part title in an 80 column window could be painted with the tab spec "40C80R".

win = w_textwin(line, col, text [,tabs])
    Pushes a window in the same manner as w_push() (with the same interpretation of line and col) of just sufficient size to hold the given text as it is set by w_paint(), with a box around it. It is allowable for column positions in the text being set to have negative numbers during the sizing phase of this operation.

w_mesg(str)
    Pushes a boxed one line window centred at the bottom of the screen and containing the string. It will be automatically removed after the next keystroke.

w_cursorat(win, line, col)
    Sets the cursor position for this window (in the window's space). When the window is the top window on the screen, the real screen cursor will be at this position.

str = w_getchar()
    Returns the next character from the terminal, without echo and without canonical input processing. For ordinary ASCII characters a one character string is returned.
    For special keys an appropriate multi character string is returned (currently "F0", "F1" ... "F32", "LEFT", "RIGHT", "UP", "DOWN", "HOME", "END", "PGUP", "PGDOWN"). The screen is refreshed before waiting for user input.

w_ungetchar(str)
    Pushes a character back. Only one character of push-back is allowed. Only the first 16 characters of the string will be significant (all "characters" returned by w_getchar() are shorter than this).

str = w_edit(win, line, col, width, str)
    Allows traditional editing of an input field at the given position and width, initially containing the given string. Editing will proceed until any unusual character is pressed (that is, not a printing ASCII character or one of the field editing keys such as backspace). At that point the character which caused termination will be pushed back on the input stream and the current text of the field returned. The next call to w_getchar() will return the key which terminated editing.

w_box(win)
    Draws a box around the inside edge of the window.

w_clear(win)

w_refresh()

w_suspend()
    Restores the terminal to normal modes and moves the cursor to the bottom left. The next window operation will revive the screen.

EXAMPLES
The following shell command line will print Hello world.

    ici -p 'printf("Hello world.\n");'

The following program prints the basename of its argument:

    #!ici
    printf("%s\n", argv[1] ~~ #\([^/]*\)$#);

The following example illustrates a simple grep like program. The first line makes a Bourne shell pump the program in through file descriptor 3, and passes any arguments to the shell script on to the ICI program. File descriptor 3 is used to avoid disturbing the standard input. This works on all UNIX's, but of course 4.2+ and 5.4+ can use the #! stuff. Note that errors (such as those encountered upon failure to open a file) are not checked for. The program can be expected to exit with an appropriate message should they occur.

    exec ici -3 -- "$0" "$@" 3<<'!'
    extern main(argc, argv)
    {
        if (argc < 2)
            fail(sprintf("usage: %s pattern [files...]", argv[0]));
        pattern = regexp(argv[1]);
        if (argc == 2)
            grep("", stdin);
        else
        {
            for (i = 2; i < argc; ++i)
                grep(sprintf("%s:", argv[i]), fopen(argv[i], "r"));
        }
    }

    static grep(prefix, file)
    {
        while ((s = getline(file)) != NULL)
        {
            if (s ~ pattern)
                printf("%s%s\n", prefix, s);
        }
        if (file != stdin)
            fclose(file);
    }

SEE ALSO
    awk(1), ed(1), printf(3S), etc.

BUGS
There is a problem with the right-associativity of ? : stuff. Use brackets when combining multiple ? : operators for the time being.

There is an occasional problem with the screen updating with multiple windows.

A && or || expression may not result in exactly 0/1 if it gets to the last clause.

AUTHOR
    Tim Long, May '91.

----------------------------------------------------------------------

Returning to the general discussion: my intention was not to replace any of the special purpose tools like the shell, awk, sed etc, nor was it to make a replacement for real programming languages like C. Rather, I regard it as a casual programming tool filling much the same niche as BASIC. As such it doesn't have specific language features dedicated to special tasks (like doing something for each line of input text). But it does (or will) have a broad base of simple primitives to make most routine tasks easy. And of course it is extensible. But you will notice that almost none of its "library" features are the ultimate expression of that area of software technology. In practice every major application has some principle, or piece of software technology, or bit of hardware which is its reason for existence as a product. But products can't run on one leg. Inevitably the endless series of tack-on bits has to be supplied, usually with a great deal of re-invention taking place. I have thought of ICI as assisting in that area. The theory is that if something is a major focus of an application, you won't be using these dicky little features to do it.
But for all those other bits, which aren't your real business, you can just use the stuff provided and hack up the rest in a somewhat more amenable programming environment than raw C.

Getting back to the language itself... You can easily see from the above how it is like C. What is probably not so obvious is how it is not like C. Here is a grab bag of things to convey some of the flavour.

A lot of the usual messing around with strings can be handled by the regular expression operators. The ~~= operator is particularly useful. For example, to reduce a string s which holds a file name to its basename:

    s ~~= #\([^/]*\)$#;

I know it looks a bit insane, but regular expressions are like that. I'm not going to apologise for using # rather than / to delimit regular expressions. It was necessary to avoid lexical ambiguity and you get used to it in no time.

I don't seem to have written the bit in the manual on error handling. I'll quickly describe it here. The actual syntax of a compound statement is:

    compound-statement:
        { statement(rpt) }
        { statement(rpt) } onerror statement

In other words compound statements may have an optional "onerror" followed by a statement. Errors work on the principle that the lower levels of a program know what happened, but the higher levels know what to do about it. When an error occurs, either raised by the interpreter because of something the program did or explicitly by the program, an error message is generated and stored in the global variable "error". The execution stack is then automatically unwound until an onerror clause is found, and execution resumes there. The unwinding will unwind past function calls, recursive calls to the interpreter (through the parse function) etc. If there is no onerror clause in the scope of the execution, the main interpreter loop will bounce the error message all the way out to the invoking system.
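A hypothetical sketch of the onerror mechanism just described (the function and names are invented; fopen, getline, fclose, sprintf and the "error" global are as described above):

    static firstline(name)
    {
        auto file;
        auto s;

        {
            file = fopen(name, "r");    /* raises an error on failure */
            s = getline(file);
            fclose(file);
        }
        onerror
        {
            /* Execution resumes here; the message is in "error". */
            s = sprintf("<unreadable: %s>", error);
        }
        return s;
    }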
In the UNIX case this will print the message along with the source file name, the function name and the line number (which is also available). Although the manual entry doesn't go into that sort of detail, it is important to know what things raise errors in what circumstances. But the basic philosophy is that the casual programmer can just ignore the possibility of errors (like failure to open a file) and expect the finished program to exit with a suitable message when things go wrong. The grep program given in the manual is an example of this. One error is checked for explicitly so it can give its own usage message, but failures to open files or syntactically incorrect regular expressions are allowed to fall out naturally.

I seem to be wandering a bit here; back to some examples... Functions are of course just another datum. A function called "fred" is just a variable which has been assigned a function. You could re-define the getchar function (even though it is an intrinsic function coded in C) with either:

    extern getchar()
    {
        return tochar(rand() % 256);
    }

OR

    extern getchar = [{(){return tochar(rand() % 256);}}];

The second is a little perverse, but function constants make more sense in examples like:

    sort(stuff, [{(a, b){return a < b ? -1 : (a > b ? 1 : 0);}}]);

where the sort comparison function is given in-line so you don't have to go chasing all over the code to find the two line function. (There is a growing library which contains functions like sort, but it is not in a fit state for discussion yet.) They also make more sense when doing object oriented stuff. Suppose you want to define a set of methods in a type.
You can just assign the functions directly into the type with:

        static type = struct();

        type.add = [{ (a, b) { return ....; } }];
        type.sub = [{ (a, b) { return ....; } }];

Or you could build it in one hit like:

        type = [<
            add = [{ (a, b) { return ....; } }],
            sub = [{ (a, b) { return ....; } }],
        >];

The variable argument support handles all possibilities. One nice example of its use comes from the way libraries are done. Because code is parsed at run-time, you don't want to have to parse thousands of lines of libraries for every one-line program. Instead, a library will just define stub functions, which invoke a standard (library) function called autoload(). They look like this:

        extern sort() {auto vargs; return autoload("sort", sort, vargs);}

Because the function has an auto variable called "vargs", any unused arguments (ie. all of them) are assigned to it. These are then passed on to autoload. The arguments to autoload are a file name (it will prefix it with the standard lib dir), the function being re-defined and the arguments. It will parse the file, check that it redefined the function and then call it with the arguments. From then on of course the new function is defined and the old one gets garbage collected like all lost data. The loaded file could define several functions, and any autoload definition they have will also be replaced at the same time. The current version of autoload looks like this:

        /*
         * Parse the given file and transfer control to the newly loaded version
         * of the function as if that was what was called in the first place.
         * A loaded file can define more than one function. They will all
         * be replaced on the first load. See examples below.
         */
        extern
        autoload(file, func, args)
        {
            auto stream;

            file = "/usr/ici/" + file;
            parse(stream = fopen(file, "r"));
            fclose(stream);
            if (func == eval(func.name))
                fail(sprintf("function %s was not found in %s",
                    func.name, file));
            return call(eval(func.name), args);
        }

Notice that it references a sub-field of the function like a structure field. This is something that the manual entry doesn't go into details about, but you can do things like that. A function, for instance, has sub-fields of:

        "name"   a name for the function (for simple declarations this
                 is the name the function was first declared as),
        "args"   an atomic array of the declared formal parameters,
        "autos"  an atomic struct of all the autos and their initial
                 values,

and there are a few other fields too. Also notice how it uses the "eval" function to check the value of a variable whose name is determined at run time, and then its use of the call function to call a function with a run-time determined variable argument list. Again notice that it doesn't need to worry about any errors except those it wants to check for explicitly. The others will happen correctly automatically. This one feature can save a lot of code.

The sequence of operations on function entry is very deliberate and you can do some neat things with it. In particular, formal parameters are just auto variables which are initialised with the corresponding actual parameter. But they are initialised with this after the explicit initialisations have been applied. Thus you can use an explicit initialisation to give a default value to an argument which is optional, without messing about with the "vargs" variable. For example:

        static
        getstuff(file)
        {
            auto file = stdin;
            ....
        }

Structure keys (and switch statements, which use a struct) work on the key being the same object as the tag. Thus switching on strings, ints, floats, functions etc. is fine.
But you can also use aggregate keys by always using atomic versions of them:

        switch (@array(this, that))
        {
        case @["one thing", "the other"]:
            ...
        case @[1, 2]:
            ...
        case @[x, y]:
            ...
        }

You will notice that because things refer to other things, rather than actually holding them, you use pointers far less often than you do in C. In fact you can start to treat structured data types in a much more casual fashion.

I have hardly scratched the surface here, but this is getting a bit long so I'll terminate this section.

A few practicalities: on my 386 the initial load image (text+data) comes in at around 110K (85K text + 25K data), of which a disconcerting amount comes from curses, even though all I want it to do is read a terminfo entry. After that, time and space is as proportional to the needs of the program as I could make it. (These sorts of interpretive languages often have nasty non-linear time or space performance characteristics due to garbage collection and stuff. I have tried to be careful to avoid this sort of behaviour.)

For some tasks memory use can be better than expected, because of object sharing... Memory is only needed to hold distinct atomic objects, so although technically there are reasonable memory overheads for, say, an integer, in practice most programs don't have very many distinct integers at any given point in time. After the first instance of a given number you are only paying the overhead of the storage location which refers to each additional reference, which is 4 bytes for array elements and 8 bytes for structure elements. In fact it can happen that large arrays of floating point numbers (which are 8 bytes each) can occupy less space than you would at first expect.

I have been thinking of shifting integers to 64 bits, because there would be no overhead in memory use (they already use the same size data block as floats) and I suspect the performance loss would be marginal. But more to the point, 32 bits is just not enough.
(A set of good portable 64 bit routines will be gratefully accepted.)

I think I have mentioned that it is also designed for embedded systems. This means that:

a)  It is easy to link the interpreter into other C programs; there are as few external symbols as I could manage and it uses just a few classic library functions.

b)  It is easy to write intrinsic functions (ie. functions written in C which can be called from ICI code).

c)  It is easy to call ICI functions from C (although at the moment there is slightly more overhead than in the inverse direction).

d)  Where necessary, additional types can be introduced without disturbing the rest of the interpreter. (An example of this is the character based screen handler. It is done in a single source module with only one reference to it (in a configuration array), yet its "window" type integrates fully with the rest of the interpreter.)

I think this will have to do for now. I'll post the source, the manual and some sample programs somewhere soon.

By the way, I have always regarded designing a programming language as the height of arrogance. And I can only defend this by saying I did it for me.
--
Tim Long
tml@extro.ucc.su.OZ.AU
phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (06/12/91)
tml@extro.ucc.su.OZ.AU (Tim Long) writes:

>1) To have a freely available general purpose interpretive language on
>UNIX systems. (As opposed to the many more special purpose ones such
>as awk and the shell). This can be re-phrased as: To have a UNIX
>language like DOS has BASIC.

First thing I do after installing DOS on a PC is find BASIC and erase it.
--
/***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu | Guns don't aim guns at   \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks | people; CRIMINALS do!!  /
\***************************************************************************/
oz@ursa.ccs.yorku.ca (Ozan Yigit) (06/14/91)
Tom Christiansen <tchrist@convex.COM> writes:
You're obviously a small-is-beautiful person, one valuing aesthetics over
utility. Fine.
This is utterly silly. Just because one language designer* could not pull
it off doesn't mean size, aesthetics and utility are incompatible. If you
must have an excuse to justify syntactic and semantic mess with a kitchen
sink attachment, try to think up something more creative.
oz
---
In seeking the unattainable, simplicity | Internet: oz@nexus.yorku.ca
only gets in the way. -- Alan J. Perlis | Uucp: utai/utzoo!yunexus!oz
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/15/91)
In article <1991Jun11.173907.28331@metro.ucc.su.OZ.AU>, tml@extro.ucc.su.OZ.AU (Tim Long) writes:

> NAME
>      ici - General purpose interpretive programming language
>
> SYNOPSIS
>      ici [ file ] [ -f file ] [ -i prog ] [ -digit ] [ -l lib ] [ args... ]
>
>      -f file    The file is parsed as an ICI module.
>
>      -i prog    The prog argument is parsed directly as an ICI
>                 module.

I for one would find it rather less confusing if you used the same option name as AWK and sed, namely "-e", as in the following examples I just tried:

        awk -e 'END {print "Hello, world."}' </dev/null
        (echo a; echo b; echo c) | sed -n -e 1p -e 2p

>      -digit     An ICI module is read from the file descriptor
>                 digit.

May I suggest a slightly more long-winded but rather prettier scheme? Allow a file name (anywhere at all) to have the form /dev/fd# where # is an integer with however many digits it needs. Some research versions of UNIX already support this directly. People familiar with it won't thank you for introducing a new notation. And it takes less than half a page of code to implement your own "f_or_fd_open(string, mode)" function in C, and then use that throughout the implementation of ICI instead of fopen(). [I have done this, and know what I'm talking about.]

>      structure) can be obtained.  Several of the intrinsicly

That's intrinsically
       ^^^^^^^^^^^

>      int     Integers are 32 bit signed integers. All the usual C
>              integer operations work on them. When they are
>              combined with a float, a promoted value is used in the
>              usual C style. Integers are atomic.

Oh *no*! What's the good of using an interpreted language if it only gives me 32-bit integers? If you use any of the PD or redistributable bignum packages around, then it is *EASY* to provide bignum arithmetic in an interpreter. Yes, the bitwise operations &, |, ^, ~ all make perfect sense on integers of any size, and if we define

        x << y = floor(x * 2**y)
        x >> y = floor(x * 2**(-y))

then even the shifts make sense.
(The shifts won't agree with C, but then shifts in C aren't as portable as you might think.) *Please* give very serious consideration to bignums. For a scripting language, why the flaming xxxx should I *care* what size a register is?

>      Note that initialisers are constant expressions. They are
>      evaluated once at parse time. Even initialisers of autos.

Why? The restriction to constant initialisers for static and external variables in C made sense, because the initialisation was done by the linker. But that doesn't apply to ICI. About 80% of my initialisations to auto variables in C are -not- constant expressions. Why introduce a restriction that an interpreter like ICI doesn't need and that doesn't give the ICI programmer any extra safety?

>      The array's are arrays of strings, which are the fields of a

The array's what?

> EXAMPLES
>      The following shell command line will print Hello world.
>           ici -p 'printf("Hello world.\n");'

The manual page said nothing about a "-p" option.

>      The first line makes a Bourne shell pump the
>      program in through file descriptor 3, and passes any
>      arguments to the shell script on to the ICI program.

I tried something like that on an Apollo once. Didn't work. The shell already had several descriptors other than 0, 1, and 2 open.

> A few practicalities: on my 386 the initial load image (text+data)
> comes in at around 110K (85K text + 25K data, of which a disconcerting
> amount comes from curses, even though all I want it to do is read
> a terminfo entry.

Surely you can get at a terminfo entry by using just -ltermlib; you don't have to load -lcurses as well. According to the SVID the "Terminfo Level Routines" are setupterm(), tparm(), tputs(), putp(), vidputs(), vidattr(), and mvcur(). setupterm() defines a bunch of variables. If you just use those routines, you shouldn't get much else from Curses. If not, complain.
--
Q: What should I know about quicksort?
A: That it is *slow*.
Q: When should I use it?
A: When you have only 256 words of main storage.