[comp.software-eng] Some issues in adjective/adverb languages

xtbjh@levels.sait.edu.au (behoffski) (05/10/91)

Several people have asked me to post a code fragment of how code would 
look using adjectives and algebra.  I have two answers: if you want a 
complete, fully-baked, watertight syntax and semantics, then I haven't 
got it.  If you want some partially-baked ideas with advantages and 
disadvantages, then I have hundreds.  

What I will do here is post what my current best guess of how to wrap 
up the ideas into a language.  I'll try to give the factors that I 
considered at each step.  Since I've started from a position of 
rejecting everything built upon algebra, I've given myself the near-
impossible task to look at more or less everything and to decide what 
I want to use and what to discard.  The following discussion shows the 
point I reached before I posted the idea; the fact that I posted shows 
that I felt I would never be able to complete this project myself.  

Incidentally, I'm very happy for anybody to use my ideas, and I put 
them in the public domain.  If you use anything, a simple 
acknowledgement would be appreciated.  

----------------------------------------------------------------------

The process of building systems out of the raw hardware is the process 
of building a series of languages -- this idea was described by Prof. 
Vlad Turski in a lecture I attended in 83 or 84.  The lowest-level 
software languages build upon the language defined by the hardware; 
successive layers of languages move away from the hardware towards the 
specific application.  

Other seed ideas included the C macro "is_leaf()" that I saw in the 
netlist comparator Gemini by Carl Ebling; a few very neat ideas in 
COBOL (gasp, shock horror), and my own fervent belief that data 
structures were a critical technology, but were improperly rendered 
in existing languages.  On 16 Sept 1988 I tried to simplify the 
treatment of data structures by defining a standard set of nouns and 
verbs that would be common to *all* structures -- and of course saw 
that this was ridiculous.  And then I saw that implementing all the 
keywords that were defined by the structure would be brilliant -- 
but that this required adjectives and adverbs to be fully-fledged 
building blocks in the language.  

So at any instant during development of the system, the designer is 
working at two levels:
	- the current environment, as defined by the set of nouns, 
		verbs, adjectives and adverbs that are active, and 
	- the visible behaviour of each of the machines that are 
		currently active.  
I claim that a fundamental problem of systems engineering -- trading 
off flexibility against efficiency -- is well captured in this model.  

Each machine has an interface consisting of nouns, verbs, adjectives 
and adverbs.  There may be zero or more adjectives per noun; there 
may be zero or more adverbs per verb.  Since flexibility is the goal 
of the interface, these language components should be totally 
orthogonal.  
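
To make this concrete, here is a very rough sketch of one machine's 
interface pictured as orthogonal word tables.  The C names below are 
my own invention (there is no real Languish implementation behind 
them); the point is only that each noun carries zero or more 
adjectives and each verb carries zero or more adverbs:

	/* Hypothetical sketch: a machine interface as orthogonal word
	 * tables.  None of these names come from a real implementation. */
	typedef int  (*AdjectiveTest)(void *object); /* adjective: does it apply? */
	typedef void (*VerbStep)(void *object);      /* verb: act on one object   */

	struct Noun {
	    const char    *name;          /* e.g. "node"               */
	    const char   **adjectives;    /* zero or more, e.g. "leaf" */
	    AdjectiveTest *tests;         /* one test per adjective    */
	    int            n_adjectives;
	};

	struct Verb {
	    const char  *name;            /* e.g. "get"                */
	    const char **adverbs;         /* zero or more              */
	    int          n_adverbs;
	    VerbStep     step;
	};

	struct Machine {
	    struct Noun *nouns;   int n_nouns;
	    struct Verb *verbs;   int n_verbs;
	};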

A crop of issues comes up here; here are some:
	- what is the minimal request -- a noun and a verb?  
	- what order: [adj...] noun [adv...] verb (see the sketch 
		after this list)? 
	- is it worth distinguishing plural from singular nouns?
	- since a token seems to be a word surrounded by whitespace, 
		how do exceptional tokens (escape chars, strings etc) 
		get defined?  
	- how do adj/adv elements that carry values get handled 
		(e.g. twin-cam .vs. 2 litre)?
	- how do the results get returned?  A stack?  
	- what about mutually exclusive adjs -- "leaf" .vs. "non-leaf"?
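
For the ordering question, a parsed request under the 
[adj...] noun [adv...] verb order might be no more than the following 
(a sketch only, assuming whitespace-separated tokens; the field names 
are mine):

	/* Sketch: the request "leaf nodes get" parsed under the
	 * [adj...] noun [adv...] verb ordering.  Illustrative only. */
	struct Request {
	    const char **adjectives;  int n_adjectives;  /* { "leaf" } */
	    const char  *noun;                           /* "nodes"    */
	    const char **adverbs;     int n_adverbs;     /* none here  */
	    const char  *verb;                           /* "get"      */
	};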

The question of formal .vs. actual parameters crops up here.  You may 
be working with four machines active simultaneously: two separate 
instances of one machine plus two other machines.  All four use the 
formal words "leaf" and "node", so there is much ambiguity.  However, 
the actual noun for "node" is "file" on one machine, "sprites" on 
another, and "person" and "address" on the other two.  So "leaf files" 
is unambiguous even where "leaf nodes" is ambiguous.  If all the words 
in a request are ambiguous, then adding something like "of xyz_machine" 
will be needed to identify the recipient of the request.  
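
As a sketch of how the disambiguation might work (building on the 
structures sketched above; machine_knows_all_words() is an invented 
helper, not a real routine), the recipient could be chosen by seeing 
which active machines know every word in the request:

	#include <stddef.h>                      /* for NULL */

	struct Machine;                          /* as sketched earlier */
	struct Request;

	/* Hypothetical helper: does this machine define every word
	 * (actual noun, adjectives, verb, adverbs) in the request? */
	extern int machine_knows_all_words(struct Machine *m,
	                                   struct Request *req);

	struct Machine *resolve(struct Request *req,
	                        struct Machine **active, int n_active)
	{
	    struct Machine *match = NULL;
	    int i;

	    for (i = 0; i < n_active; i++) {
	        if (machine_knows_all_words(active[i], req)) {
	            if (match != NULL)
	                return NULL;    /* still ambiguous: add "of ..." */
	            match = active[i];
	        }
	    }
	    return match;               /* NULL if nobody claimed it     */
	}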

So far, I've said nothing about types.  What is a type?  A bit?  An 
integer?  A string?  A real?  A binary tree?  A file system?  A 
sparse matrix?  A polynomial?  A sprite?  

I would claim that types are a by-product of defining a machine.  The 
notion "positive" is more accurate and useful than the noun-and-verb 
formula "x > 0".  Languish has no inbuilt types, except being able 
to reference objects via pointers (or some similar anonymous mechanism).  
Anything that you want to know about an object that you've been given, 
you need to refer back to the machine that handed you that object.  
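
Seen from C, the "no inbuilt types, only anonymous references" idea 
might look something like this (IntegerMachine and its fields are my 
own labels, purely for illustration):

	/* Sketch: the only built-in notion is an anonymous reference;
	 * every question about an object goes back through the machine
	 * that handed it out. */
	typedef void *ObjectRef;                  /* anonymous handle     */

	struct IntegerMachine {
	    ObjectRef (*make)(long value);        /* hand out an object   */
	    int       (*positive)(ObjectRef x);   /* adjective "positive" */
	    int       (*odd)(ObjectRef x);        /* adjective "odd"      */
	};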

So in the case of an "integer", for example, there is a single Languish 
machine that defines integers, and defines the primary language 
elements that are available to operate on integers.  This might include 
"odd".  Another mahine might then inherit "integer" from the 
parent machine and add language elements, for example "prime".  The 
user of this second machine might say "prime odd integers", without 
knowing where the individual language elements were derived.  Later 
on, the implementor might choose to implement a single machine that 
knows about both "prime" and "odd", and has a special code fragment 
added to optimise the pairing "prime odd".  This improved machine 
can be substituted without any disruption to the users of the interface.  
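
Continuing the sketch above (all names still hypothetical), the second 
machine and the naive composition of "prime odd" might look like this; 
an optimised replacement could exploit the fact that 2 is the only 
even prime, so "prime odd" collapses to "prime and not equal to 2":

	/* Sketch: a machine that inherits the integer machine and adds
	 * the adjective "prime".  ObjectRef and struct IntegerMachine
	 * are as sketched earlier. */
	struct PrimeIntegerMachine {
	    struct IntegerMachine *base;     /* inherited words, incl. "odd" */
	    int (*prime)(ObjectRef x);       /* new adjective "prime"        */
	};

	/* The obvious composition simply ANDs the independent tests... */
	int prime_odd(struct PrimeIntegerMachine *m, ObjectRef x)
	{
	    return m->prime(x) && m->base->odd(x);
	}

	/* ...whereas a combined machine with a special code fragment for
	 * the pairing could test "prime and not 2" instead, and be
	 * substituted without the interface's users noticing. */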

As you can see, I'm a Software Revolutionary.  I'm also Australian and 
one-legged, but I'm *definitely* not a programmer (mega - (-:).  I would 
claim that to continue to bolt fixes onto algebraic languages -- C++, Ada 
and Modula-3 are cases in point -- is fundamentally flawed.  

Another point about machines and the individual words in the machine 
interface: these words are *not* procedures or functions; they 
have no meaning outside of the interface.  I extend this point to say 
that allocating storage on a verb-by-verb basis -- which is how existing 
stack-oriented languages such as Pascal and C work -- is inefficient 
compared to allocating storage on an interface-by-interface basis.  
Each instance of an interface can be given a statically allocated 
interface area.  The parameters handed through the interface, and the 
results and return control point, can be declared in a fixed piece of 
RAM.  This saves the cost of continually grabbing and releasing stack 
space.  
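
In C terms, the statically allocated interface area might be as simple 
as the following (the field names and the parameter count are my 
guesses, not a worked-out design):

	/* Sketch: one fixed piece of RAM per interface instance, holding
	 * the parameters, the result and the return control point, in
	 * place of per-call stack frames. */
	struct InterfaceArea {
	    void  *parameter[4];         /* arguments handed through */
	    void  *result;               /* where the answer is left */
	    void (*return_point)(void);  /* where control resumes    */
	};

	/* Allocated once per interface instance, for good. */
	static struct InterfaceArea tree_machine_area;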

Since the requests handled through the interface tools provided by 
Languish are relatively few, it is worth the expense of maintaining 
the language set dynamically.  As machines are activated and closed, 
the available set of operators changes accordingly.  
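
The dynamically maintained word set need not be elaborate; a sketch 
(with my own names) is just a list tying each currently active word 
back to the machine that defined it:

	/* Sketch: the available word set grows and shrinks as machines
	 * are activated and closed. */
	struct WordEntry {
	    const char       *word;      /* e.g. "leaf", "nodes", "get" */
	    struct Machine   *owner;     /* the machine defining it     */
	    struct WordEntry *next;
	};

	static struct WordEntry *active_words;  /* updated on activate/close */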

The discussion now switches from what the interface offers to what the 
underlying machinery looks like.  behoffski very quickly gets very 
vague here.  

The Languish code fragment "leaf nodes get", when implemented on 
a sequential machine, might cause the following code to be generated 
and then executed:

	?? some initialisation as defined by "get"
	tree.node.StartEnumeration(thistree, context)
	while tree.node.GetNext(context, this_node) do
		?? some per-node operation as defined by "get"
		if tree.leaf.IsLeaf(this_node) then
			?? remember this node as defined by "get"
		end if
	end while

Since each noun, verb, adjective and adverb is intended to be 
independent, the natural choice is to define a fragment of code 
for each language element.  In the case of adjectives, this is to 
give the sequential-verb version, i.e. the code fragment for "leaf"
implements the test "IsLeaf()".  In the case of all the pieces of 
a structure (like all the nodes of the tree, or all the bits of the 
integer, or all the files in the filesystem), there needs to be code 
to enumerate all the pieces.  The order in which nodes are processed 
might be described by the verb or adverbs, e.g. "pre-traverse" .vs. 
"in-traverse" .vs. "post-traverse".  

What language is used to implement the code fragments?  Currently I'm 
restricting myself to two languages:
	- some sub-level machines with Languish interfaces, or 
	- machine code (or possibly assembly code).

The second case is obviously needed when a machine, such as integer, 
operates directly on the hardware.  The first case is the result 
of the "languages within languages" idea.  I'm currently looking at 
quite aggressive register-based models for storing the noun and the 
loop context in the machine-code case: this has forced me to abandon 
most hope of reusing existing languages.  

Hope the discussion above helps,

behoffski
-- 
Brenton Hoff (behoffski)  | Senior Software Engineer | My opinions are mine
xtbjh@Levels.UniSA.edu.au | AWA Transponder          | (and they're weird).