[comp.lang.misc] Anyone want to design a language?

brnstnd@stealth.acf.nyu.edu (02/17/90)

I'm bored, it's a cloudy day, and I can't stand Ada.

This is about the most formal announcement there'll be of a new, still
unnamed language. I'll bet that almost every programmer's tastes can
be satisfied by a single language, and I'm willing to go the distance
to find out.

So what do you want in a compiled, imperative, perhaps object-oriented
language? Take C as a starting point for good ideas and feel free to
use parts of any other language. Remember: This isn't Ada. If it gets
too complicated, trash it. Simple is beautiful. Modular design is
beautiful. And above all, remember that this is going to be a language
people can actually like.

Don't bother complaining that there are too many languages already.
I know. I'm just jumping on the bandwagon, with the unusual twist that
the evolving design will incorporate (with credit) the ideas of
programmers around the world. Undue formality is out: I'm not a
standards committee.

I'm not too worried about logistics: if and when this project heats up,
I'll start imposing a bit more organization. Until then, I'll just
archive the discussion.

---Dan

kjj@varese.UUCP (Kevin Johnson) (02/17/90)

In article <22569:05:10:24@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>So what do you want in a compiled, imperative, perhaps object-oriented
>language? Take C as a starting point for good ideas and feel free to
>use parts of any other language. Remember: This isn't Ada. If it gets
>too complicated, trash it. Simple is beautiful. Modular design is
>beautiful. And above all, remember that this is going to be a language
>people can actually like.

Rhetorical question: Aren't you talking about C++?
Semi-rhetorical question: What would be this language's intended use?

1. How about string operators.
	I hate handling allocing of space for something silly like strings...

2. Ability to dynamically define new operators

3. Ability to use existing C libraries and headers.
	Otherwise, I want:
		a. screen handling poop
		b. internet poop
		c. X poop
		d. :-)
	Seriously, I would consider the ability to link in existing
	libraries, one way or another, an absolute must.


#include <standard_disclaimer>
.-----------------------------------------------------------------------------.
| Kevin Johnson                                      ...!mcdphx!QIS1!kjj      |
| QIS System Administrator  Motorola MCD             kjj@phx.mcd.mot.com      |

jhallen@wpi.wpi.edu (Joseph H Allen) (02/18/90)

In article <22569:05:10:24@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>I'm bored, it's a cloudy day, and I can't stand Ada.

>So what do you want in a compiled, imperative, perhaps object-oriented
>language? Take C as a starting point for good ideas and feel free to
>use parts of any other language. Remember: This isn't Ada. If it gets
>too complicated, trash it. Simple is beautiful. Modular design is
>beautiful. And above all, remember that this is going to be a language
>people can actually like.

Ok, I'll bite.  Here's a compiled language I'd like to see:

(1)	No semicolons.
(2)	Except for end-of-line comments, which start with ';'.  /* These comments are evil */
(3)	Block structure indicated by indentation level:

		while a!=b
			int q		; Multi-line body
			q=z*5
			r+=foo(q)
		q=6

		while a!=b r=foo(z*5)	; Single line body
		q=6

		if a==b c=d
		
		if a==b
		  c=d
		  r=500
		else
		  q=r
		  s=t

		etc.

	So that you can write blocks on single lines, [ and ] can also be used to
	indicate block structure in the conventional way.

(4)	Overloadable AND definable operators
(5)	All characters allowed in symbols.  For example, a typical definition
	might be:

		int :^&%&^*?: = 8

	This is so that operators can be defined.  There shouldn't be separate
	character sets for operators and identifiers.  I.E., instead of
	detecting the end of identifiers with the presence of operator or
	whitespace characters, the longest possible string which can be
	a symbol is detected:

		if these are symbols:

			abc
			def
			abcdef

		then when the input sees:

			abc	; abc is recognized
			abcdef	; abcdef is recognized
			defabc	; def is recognized and then abc is recognized

	This requires that special separators be used to delimit symbols in
	declarations (or wherever they first appear).  Perhaps to save typing
	there might be a default identifier character set which doesn't
	require these delimiters.
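The longest-match rule above is what compiler writers call maximal munch. A minimal C sketch, assuming the symbol table is just a fixed array holding the three example symbols; `longest_symbol` and `tokenize` are hypothetical names, and a real symbol table would grow as declarations are read:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table holding the example symbols. */
static const char *symbols[] = { "abc", "def", "abcdef" };

/* Length of the longest known symbol starting at p, or 0 if none matches. */
size_t longest_symbol(const char *p)
{
    size_t best = 0;
    for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++) {
        size_t n = strlen(symbols[i]);
        if (n > best && strncmp(p, symbols[i], n) == 0)
            best = n;
    }
    return best;
}

/* Split input into symbols, storing up to max token lengths in lens.
   Returns the token count, or -1 if some text matches no symbol. */
int tokenize(const char *input, size_t *lens, int max)
{
    int count = 0;
    while (*input != '\0') {
        size_t n = longest_symbol(input);
        if (n == 0 || count == max)
            return -1;
        lens[count++] = n;
        input += n;
    }
    return count;
}
```

As with any maximal-munch lexer, greedy matching can reject input that another split would accept: with the symbols ab, abc and cd, the input abcd fails once abc has been taken.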

	Symbol recognition should occur before constant recognition.  I.E.,
	this way you can define:
	
			int :4: = 5	; Make 4 equal to 5

(6)	Nifty C declarations which allow one type to be shared among multiple
	declarations each of which might have an initializer.

		Bad:
			it	: integer;
			this	: integer;

		Good:

			int it = 7, this = 5, that = 0, theother = 10

(7)	However, the convoluted C declaration system needs to be replaced:

		instead of:
		
			int **foo[]   an array of pointers to int pointers
			
		do this:

			[] * * int foo

(8)	Eliminate arrays.  They aren't needed.  Use pointers and macros
	instead.

(9)	For constants:

		$hex
		decimal
		%binary
		'c'		; Character
		'abc'		; String

		(sorry, no octal.  You could do it with 0777 but that's gross)

	These are equivalent strings:

		'a' \ 'b' \ 'c' \ 13 \ 10 \ 0
		'abc' \ 13 \ 10 \ 0
	
	I.E., no escape sequences needed.  Strings are just integer constants
	concatenated together.  Any constant expression can be used in these
	constants, so:
	
		const int CR = 13
		const int LF = 10
		const int EOS = 0
	
		'abc' \ CR \ LF \ EOS
	
	I would prefer ',' for the concatenation character but it's needed
	elsewhere.
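For comparison, here are the same bytes in C, once spelled out as integer constants and once with escape sequences. A small sketch, assuming an ASCII character set (so CR is 13 and LF is 10); the array and function names are made up for illustration:

```c
#include <string.h>

enum { CR = 13, LF = 10, EOS = 0 };

/* 'abc' \ CR \ LF \ EOS written as a plain list of integer constants. */
static const char spelled[] = { 'a', 'b', 'c', CR, LF, EOS };

/* The same bytes using C escape sequences; the compiler appends the 0. */
static const char escaped[] = "abc\r\n";

/* Returns 1 if both arrays hold identical bytes. */
int same_bytes(void)
{
    return sizeof spelled == sizeof escaped
        && memcmp(spelled, escaped, sizeof spelled) == 0;
}
```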

(10) Standard operators, grouped by equal precedence:

	( )	Precedence
	[ ]	Block and precedence

	`	Get symbol from previous scope level (C++'s '::')

	@	Get object at address  (C's '*')
	#	Return address of object (C's '&')
	.	Member selector.  No need for '->'.  Why does C use ->
		anyway?

	~	Bit-wise not
	-	Negate
	sizeof	Size of argument on right
	base	Distance between member indicated on right and base address
		of structure

	>> <<	Shift right and shift left

	*	Multiply
	/	Divide
	//	Modulus
	&	Bit-wise and

	+	Add
	-	Subtract
	|	Bit-wise or
	^	Bit-wise exclusive or

	= += -= |= ^= *= /= //= &= >>= <<=	Assignments
	&&= ||=
	: +: -: |: ^: *: /: //: &: >>: <<:	Assignments which work the
	&&: ||:					other way:

						a += b	means add b to a
							and return the result

						a +: b	means add b to a but
							return the original
							value of a

	== >= <= != > <		Comparison

	!			Logical not
  
	&&			Logical and

	||			Logical or

(11)	Blocks return the last value generated:

		a = [ int q q=r r=t t=q ]	; a gets the original value of r

(12)	Statements return their last value:

		a = if b==c 500 else 1000	; if b equals c a gets 500
						; otherwise it gets 1000

	(This way, there is no need for the '?:' operator)

(13)	Like C++, declarations can be made anywhere.
(14)	Statements

		if expr expr
		else expr
		do expr until expr
		while expr expr
		return			(C's 'return expr' is 'expr return' in
					 this language)
		break
		continue
		goto expr		(gotos take code addresses)

(15)	Structure and code generation rules:

		int	a
		int	b
		
		these are always right next to each other and a is at
		a lower address.  (GNU C actually puts b at a lower address)
	
		The rules for this are the same as in structures
		
		Structure members are placed in the order they appear in
		the definition; they are never sorted.  Bytes are first packed
		and then padded on machines with alignment problems.  I.E.,

			typedef IT
			 int a
			 char b
			 char c
			 char d
			 int e

		(oh did I mention that there is no 'struct' symbol?  Use
		typedef and blocks instead)
		
			b, c and d are all packed into one integer.  That integer has
			1 extra byte of padding in it (assuming 4-byte integers).
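C already guarantees the ordering half of this rule (members at increasing addresses, first member at offset 0) but leaves padding to the implementation. A sketch that checks the guaranteed part for the IT example using offsetof; the 0/4/5/6/8 layout in the comment is what a typical compiler with 4-byte, 4-byte-aligned int produces, not something C promises:

```c
#include <stddef.h>

/* C version of the IT example: declaration order is preserved. */
struct IT {
    int  a;
    char b;
    char c;
    char d;
    int  e;
};

/* Returns 1 if members sit at strictly increasing offsets, as C requires.
   On a typical ABI with 4-byte int the offsets are 0, 4, 5, 6, 8:
   b, c and d share one int-sized slot plus one byte of padding. */
int layout_in_order(void)
{
    return offsetof(struct IT, a) == 0
        && offsetof(struct IT, a) < offsetof(struct IT, b)
        && offsetof(struct IT, b) < offsetof(struct IT, c)
        && offsetof(struct IT, c) < offsetof(struct IT, d)
        && offsetof(struct IT, d) < offsetof(struct IT, e);
}
```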

(16)    Basic types should be:

		int expr	; a signed integer of at least expr bits
		uint expr	; an unsigned integer of at least expr bits

	A set of macros might be used for the machine standard types.
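C99 (well after this thread) standardized nearly this notation in <stdint.h>: int_leastN_t and uint_leastN_t mean "at least N bits" for N = 8, 16, 32 and 64, and int_fastN_t means "the fastest type with at least N bits". A sketch of the "set of macros" idea built on those types; the typedef names are hypothetical:

```c
#include <limits.h>
#include <stdint.h>

/* "int 8", "uint 16" and a fast "int 32" in the proposed notation,
   mapped onto C99 minimum-width types (they may be wider). */
typedef int_least8_t   int8_at_least;    /* hypothetical names */
typedef uint_least16_t uint16_at_least;
typedef int_fast32_t   int32_fast;       /* fastest type with >= 32 bits */

/* Checks that each type's storage is at least the requested width. */
int width_ok(void)
{
    return sizeof(int8_at_least) * CHAR_BIT >= 8
        && sizeof(uint16_at_least) * CHAR_BIT >= 16
        && sizeof(int32_fast) * CHAR_BIT >= 32;
}
```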

(17)	More types shit:

		const		; for addressable constants
		inline		; for small non-addressable constants
				; (and inline functions)
		register	; non-addressable variable
				; fully addressable variable (blank)
		macro		; same as inline but with no type checking

		op LEFT RIGHT RETURN	; an operator or function
					; LEFT indicates left-side arguments
					; RIGHT indicates right-side arguments
					; RETURN is the return type

		op void RIGHT RETURN	; This is a traditional function

(18)	There should be a symbol for the automatic conversion stuff.  This
	way you can control how conversions can work:
	
		op void NSTRING s int CONVERT = atoi(s.text)

	This overloads the conversion function CONVERT to allow automatic
	conversion from NSTRINGs (string with a number in it, say) to
	integers.

(19)	prec SYMBOL expr	sets the precedence of operator SYMBOL to
				expr (a number).

(20)	In this function, the right argument is a pointer to a string (s is an
	address of (#) a character (int 8)), and the function returns a 32-bit
	integer.  When it's called you actually give it an address.

		op void # int 8 s int 32 atoi =
		 ...

	This defines the '+=' operator.  a is a reference to an int.  When you
	call it you put a variable on the left as usual:  x += y  but the
	function will actually receive the address of the variable:
	
		op @ int 32 a int 32 b int :+=: =
		 @a = @a + b

	(and this is the '+:' operator)
	
		op @ int 32 a int 32 b int ::=: =
		 int 32 tmp
		 tmp=@a		; Remember original value of left side
		 @a = @a + b	; Add
		 tmp		; Return original value

	There should also be a modifier so that the '@' is automatically
	assumed in the function (I.E., like VAR parameters in Pascal):

		op ref @ int 32 a int 32 b int ::=: =
		 int 32 tmp
		 tmp=a		; don't need @a since 'ref' is there
		 a = a + b
		 tmp
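In C, where the address has to be passed explicitly, the two definitions above might be written as follows (the function names are hypothetical). add_assign corresponds to '+=' and add_assign_post to '+:', which relates to '+=' the way a++ relates to ++a:

```c
#include <stdint.h>

/* a += b: add b to *a and return the new value. */
int32_t add_assign(int32_t *a, int32_t b)
{
    *a = *a + b;
    return *a;
}

/* a +: b: add b to *a but return the original value of *a. */
int32_t add_assign_post(int32_t *a, int32_t b)
{
    int32_t tmp = *a;   /* remember original value of left side */
    *a = *a + b;        /* add */
    return tmp;         /* return original value */
}
```

The proposed 'ref' qualifier would let the bodies say a instead of *a, much as C++ references do.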

(21)	More about structures

		- Classes == structures
		- There should be a word 'inherit' which copies the contents
		  of the indicated structure definition into the new one.
		  I.E.:
		  
		  typedef me
		   int a
		   int b
		  
		  typedef you
		   inherit me
		   int c

		  is the same as
		  
		  typedef you
		   int a
		   int b
		   int c

		- Inherits with clashing members are not allowed.  Use
		  instances instead.

		- Function arguments are really structures.  If a function
		  returns a structure, that structure is placed on the stack,
		  not in a global variable.

		- Member functions are indicated in function declarations.
		  There should be another type qualifier which indicates a
		  function gets a pointer to the structure and all members of
		  that structure look like local variables to the function.

		- To get the instance.message form, function pointers should
		  be used in the structure.

		- There should be a way to indicate default structure values
		  for when structures are created.  Possibly this could be
		  done in a constructor/destructor system.
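C can approximate most of these points. A hedged sketch, with all names hypothetical. Instead of textually copying me's members, struct you embeds struct me as its first member (C guarantees that a pointer to a structure, suitably converted, points to its initial member). A function pointer in the structure gives the instance.message form, with the instance passed explicitly as the "pointer to the structure" described above:

```c
/* typedef me: int a, int b */
struct me {
    int a;
    int b;
    int (*sum)(const struct me *self);   /* "member function" slot */
};

/* typedef you: inherit me, int c */
struct you {
    struct me base;   /* first member, so a struct you * can act as a struct me * */
    int c;
};

static int me_sum(const struct me *self)
{
    return self->a + self->b;
}

/* Default values at creation time: a tiny "constructor". */
struct you make_you(int a, int b, int c)
{
    struct you y;
    y.base.a = a;
    y.base.b = b;
    y.base.sum = me_sum;
    y.c = c;
    return y;
}
```

Unlike the proposed inherit, the inherited members are reached as y.base.a rather than y.a.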

(22) Named arguments.  You should be able to call a function in two ways:
	func(10,20,30)	; position arguments
	func(`a=20, `c=30, `b=20)	; arguments are specifically named
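Standard C has no named arguments, but combining this item with "function arguments are really structures" from (21) gets close in C99, using a compound literal and designated initializers. A sketch, with func, func_impl and struct func_args as hypothetical names:

```c
/* The argument list is a structure. */
struct func_args {
    int a;
    int b;
    int c;
};

/* The real function takes the structure. */
int func_impl(struct func_args args)
{
    return args.a * 100 + args.b * 10 + args.c;
}

/* func(...) builds the structure from positional or named (.x =) arguments;
   members left out are zero-initialized. */
#define func(...) func_impl((struct func_args){ __VA_ARGS__ })
```

So func(1, 2, 3) and func(.a = 1, .c = 3, .b = 2) are the same call, and omitted members default to zero, which gives a crude form of default argument values.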


There's much, much more to do and there are problems with what I have.  But
this is roughly what my ideal language should look like.  The general goal
is to make it both one step above assembly language and completely extendable.
-- 
            "Come on Duke, lets do those crimes" - Debbie
"Yeah... Yeah, lets go get sushi... and not pay" - Duke

gateley@m2.csc.ti.com (John Gateley) (02/18/90)

In article <22569:05:10:24@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>So what do you want in a compiled, imperative, perhaps object-oriented
>language? Take C as a starting point for good ideas and feel free to
>use parts of any other language.

Let's fix the brain-damaged, complicated syntax to start with:
make all terms in the language look like:
<simple term> ::= number or other constant, etc.
<term> ::= <simple term> | (<term> <term> ...)
Here the first term is an "operation", like a special form name, or
a function call or even possibly another term, and the remaining
terms are "arguments" to the operation.

Presto: easy to understand/learn syntax, no messy parsers, a nice uniform
syntax which allows program manipulation tools to be developed much
more easily.
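To make "no messy parsers" concrete, a reader for this syntax is one short recursive function. A minimal C sketch, assuming the only simple terms are decimal integers and the only operations are + and * (hypothetical choices for illustration; any other operation character is treated as +):

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluate one <term> starting at *pp and advance *pp past it.
   <term> ::= integer | ( op <term> <term> ... ), op being + or *. */
long eval_term(const char **pp)
{
    const char *p = *pp;
    while (isspace((unsigned char)*p))
        p++;

    if (*p != '(') {                       /* <simple term>: a number */
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p && *p != '\0')        /* not a number: skip it, count as 0 */
            end++;
        *pp = end;
        return v;
    }

    p++;                                   /* skip '(' */
    while (isspace((unsigned char)*p))
        p++;
    char op = *p;                          /* the "operation" */
    if (op != '\0')
        p++;
    long acc = (op == '*') ? 1 : 0;

    for (;;) {                             /* the remaining terms are arguments */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == ')' || *p == '\0')
            break;
        long arg = eval_term(&p);
        acc = (op == '*') ? acc * arg : acc + arg;
    }
    if (*p == ')')
        p++;                               /* skip ')' */
    *pp = p;
    return acc;
}

long eval(const char *src)
{
    return eval_term(&src);
}
```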

I don't take credit for this idea of course: it comes from Lisp.

John
gateley@m2.csc.ti.com

brnstnd@stealth.acf.nyu.edu (02/19/90)

Syntax is less important than semantics, though of course a clean,
simple syntax is necessary for a language programmers actually like.
(ALPAL: A Language Programmers Actually Like. Naaah, too pretentious.)

For the moment, general principles are more important than specifics.

There should be some number of macro (preprocessing) levels to handle
trivial syntactic issues. I don't know what system would be best, or
if there even is a best system.

In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
  [ lots of suggestions ]

1, 2, 3. No semicolons. End-of-line comments. Block structure indicated
         by indentation.

These all relate to the syntax of simple statements and control
structures. The most important general issue is whether structures
should be explicitly terminated. The only advantage of C-ish failure to
terminate is that single-statement structures are slightly shorter; and
there are lots of syntactic disadvantages. Is there anyone out there who
really wouldn't like loop ... end/endloop/pool, etc.?

You propose letting indentation determine structure, and using newlines
as statement terminators. It's easy to convert between this and a more
traditional syntax; in fact, it would be nice to have a macro facility
good enough to do the job. Anyway, I favor a syntax that doesn't depend
on lines or indentation: otherwise it's too easy to make syntax errors.
A line-based syntax also feels very dirty: there are exceptions for
multiple statements on a line, exceptions for single-statement
structures, etc.

4. Overloadable and definable operators

This is another syntax issue. The language MUST provide an unambiguous
syntax for everything. Fortran-90 is the only overloading language I
know that does this well. Overloading just means ambiguous abbreviation,
and definable operators are just a more convenient syntax for certain 
functions. I think overloading should be just kept in mind until 
function calls and any object-oriented facilities are worked out.

5. All characters allowed in symbols.

Would you really want to read a program with ?)*[! as an identifier?
I wouldn't mind a macro facility that could handle this, or the ability
to partition the character set the way you want. However, the basic
language must have some namespace control to do any parsing at all.
Also, this language MUST be interoperable with other languages to be
useful.

The issue of defining your own character set relates strongly to the
syntactic argument about overloading. Never force a reader to learn a
new language.

6. C-like initialization power.

Well, okay. Take it for granted that declarations and definitions will
be at least as powerful as in C.

7. int **foo[] becoming [] * * int foo

Yeah. C would be cleaner if all the ``type constructors'' had a single
syntax. This needs to be considered in much more detail to see what
people would like to use. Perhaps there's a simple, readable, consistent
way to provide everything in both prefix and postfix form; then nobody
can complain.

8. Eliminate arrays in favor of pointers and macros.

Say what? You need some way to express the concept of a contiguous
region of memory. That's what arrays are for. How do pointers cleanly
express multidimensional arrays? The language should know something
about arrays, even if just for efficiency.

9. Constants: $hex, decimal, %binary, 'c', 'abc'

This is again a matter of taste; we'll see what people like. Many
different forms of constants can be provided without hurting simplicity
or readability. I don't agree with the combined syntax for strings and
characters: what do you do with single-character strings? The language
shouldn't have to know about strings; Pascal and Ada deal with strings
poorly. (C's problem is that there isn't a good enough syntax to easily
interface the language with different string-storage techniques.) I also
disagree with the idea of leaving out octal: finding a better syntax is
a good idea but there's no reason to take the feature away.

10. Standard operators.

This is, again, something that must be considered in much greater
detail to get right. (Yes, I agree that @ is a much more logical symbol
than * for indirection.) For the moment let's stick to general issues:
You're right that there should be Algol 68C-like assignments that relate
to a = b and a op= b the same way that a++ relates to ++a.

As for =/== vs. :=/= vs. your :=/== vs. statements-ain't-expressions =/=
vs. =/.EQ. vs. ... : I dunno. When I'm coding on paper I alternate
between paper-only left-arrow/= and C's =/==. On the screen I've begun
using preprocessors that can handle my terminal's extended characters.
As many writers have observed, the problem is balancing paper tradition
with ASCII's rather inexpressive character set.

11, 12. BCPL-like statements returning values.

Yes, of course. C's restriction that you can't do something like
a = {if (b == c) 2; else 3; } is purely annoying. At the very least,
the language should solve this the way that GNU's C compiler does.

13. Declarations anywhere.

Yeah.

14. Control flow statements, control structures: [ various ]

I have some rather heretical thoughts on this subject. I'll make them
clear in another message. (Remember that this isn't Ada. Given an
infinite loop ... endloop, if, and break, you don't need to provide
a terminating loop as a basic construct. Define it instead as a standard
macro. Ada's infinite variety of control structures is awful.)
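Dan's suggestion that a terminating loop need not be a primitive can be tried with the C preprocessor: given only an infinite loop, if, and break, a while loop becomes a macro. A sketch with hypothetical LOOP and WHILE_ macros (the trailing underscore only avoids clashing with the keyword):

```c
/* The only primitive loop: runs forever until a break. */
#define LOOP for (;;)

/* A terminating "while" built from LOOP, if and break.  The trailing else
   attaches the loop body, so break and continue in the body still work. */
#define WHILE_(cond) LOOP if (!(cond)) break; else

/* Sum 0 + 1 + ... + (n - 1) using the derived loop. */
int sum_below(int n)
{
    int i = 0, sum = 0;
    WHILE_(i < n) {
        sum += i;
        i++;
    }
    return sum;
}
```

Because the body hangs off the trailing else, break leaves the loop and continue re-tests the condition, just as with the built-in while.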

15. Structure and code generation rules: Variables are in memory in the
    order of declaration.

Yeah. I very much want more control over stack allocation and control
flow than in C. This is not dealt with by any current language and needs
a lot of thought. One idea I've been considering is replacing function
types with statement types. This makes setjmp/longjmp, multiple function
entry points, and various other techniques much cleaner. The problem is,
once again, how and when to allocate stack variables.

I think two goals along these lines are (simpler:) that the language
support varargs (and varargs passing!) cleanly, and (harder, assuming
both a good exception mechanism and OS-generated timer exceptions:)
that the language support enough stack control and longjmp control
that a programmer can build a portable threads library. Note that a
truly working setjmp/longjmp would deal with register variables
correctly; this is probably impossible without OS and hardware support
for a ``register storage vector'' indicating storage locations for all
register variables. It's certainly something to think about...
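The register-variable problem is already visible in ANSI C: after longjmp, a non-volatile local of the function that called setjmp has an indeterminate value if it was changed in between, precisely because it may have lived in a register. The standard's workaround is volatile. A small sketch (the names are illustrative):

```c
#include <setjmp.h>

static jmp_buf env;

/* Simulates an error deep in a call chain. */
static void fail(void)
{
    longjmp(env, 1);
}

/* Returns a local that was modified between setjmp and longjmp.
   Without volatile its value after the jump would be indeterminate. */
int run(void)
{
    volatile int count = 0;
    if (setjmp(env) == 0) {
        count = 1;      /* changed after setjmp ... */
        fail();         /* ... then control jumps back */
    }
    return count;       /* reliably 1 because count is volatile */
}
```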

You mention that structures can disappear in favor of typedef and blocks.
To me it doesn't look like you're simplifying anything; and it's a bad
idea to confuse statement blocks with structure blocks.

Unions can and should be reorderable. union { int a; float b; } and
union { float b; int a; } must, of course, be compatible---except that
in C they'd be initialized differently. (I hope you don't mind unions?)
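The initialization difference comes from C's rule that a brace initializer for a union applies to its first member. C99 designated initializers (which postdate this thread) name the member explicitly and remove the order dependence. A sketch, with made-up type names:

```c
/* Same members, different order. */
union ab { int a; float b; };
union ba { float b; int a; };

/* { 1 } initializes the first member: the int in ab, the float in ba. */
static union ab x = { 1 };
static union ba y = { 1 };

/* Designated initializers name the member, so order no longer matters. */
static union ab p = { .b = 2.5f };
static union ba q = { .b = 2.5f };

int init_check(void)
{
    return x.a == 1 && y.b == 1.0f && p.b == q.b;
}
```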

16. Basic types: int bits, uint bits

I disagree. The basic types should be those types that the machine can
handle quickly. The language must be efficient! It's perfectly fine to
have a standard notation for ``a type long enough to handle N bits'' or
``how many bits are in type X?'' but the language should not make
restrictions on the size of basic types.

(Then again, every case in which portability takes second place to
efficiency must be carefully considered and well documented. Two issues
along these lines are bit sizes and the semantics of mod. As I feel very
strongly that the second should be portable, I shouldn't assume that
nobody feels the same way about the first. Then again, wouldn't a
standard notation for your ``int 8'' be enough?)
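Mod is a concrete case of this trade-off: in C89 the sign of a % b with a negative operand is implementation-defined (C99 later fixed it to truncate toward zero, so -7 % 3 == -1). A sketch of a portable floored modulus, whose result takes the sign of the divisor; floor_mod is a hypothetical name:

```c
/* Floored modulus: the result has the sign of b (b must be nonzero).
   Correct whether the built-in % truncates or floors. */
long floor_mod(long a, long b)
{
    long r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;     /* truncating %: shift into the divisor's sign */
    return r;
}
```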

What about characters? What about floating-point types, which many
machines support better than ints? What about Ada-like fixed-point
types?

ANSI C messed up in its restrictions on void. void should mean a 0-bit
integer, aligned so that any pointer type can be safely converted back
and forth to void *. So dereferencing a void always produces 0;
sizeof(void) is 0; and so on.

I agree with C's philosophy of only allowing bit packing inside
structures. Other packing methods would really mangle the concept of
pointers.

17. const, inline, register, macro, op LEFT RIGHT RETURN

Interesting idea, the last one. The basic function call syntax should be
what the most people like; if there's a clean way to integrate (say) C's
functions, Forth's statements, and Lisp's whatevers, let's do it.

It would be wonderful to have a way to express more complex data flow
than algebraic expressions and single-type function calls. Unfortunately,
I don't know any good syntax or semantics for data flow. (This is NOT
going to become a so-called ``functional'' language, thank you.) Data
flow is just a convenient way to express temporary (register) variables;
whenever I use an expression twice I wonder if there's some natural
``teeing'' extension to C's ``piping'' notation that would simplify my
code.

18. Automatic conversion.

Yeah. Is it inconsistent how C really mangles the representation when you
convert from int to float while it (typically) doesn't change it at all
when you convert from int * to void *? I'm not sure. It may not be wise
to integrate casts with user-defined conversion functions, as the former
are implementation-dependent while the latter should not be.
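The asymmetry is easy to observe. A sketch, assuming 32-bit IEEE 754 floats (the array-size trick below makes compilation fail if float is not 4 bytes); the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check: array size is -1 (an error) unless float is 4 bytes. */
typedef char float_is_32_bits[sizeof(float) == 4 ? 1 : -1];

/* Bit pattern of the float produced by converting an int: the conversion
   changes the representation (1 becomes 0x3F800000 under IEEE 754). */
uint32_t float_bits_of(int32_t i)
{
    float f = (float)i;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Converting int * to void * and back must yield an equal pointer. */
int pointer_round_trip(void)
{
    int x = 0;
    void *v = &x;
    int *back = v;
    return back == &x;
}
```

Only the round trip is guaranteed for pointers; unchanged bits are typical rather than required.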

19. User-defined precedence.

This is yet another syntactic preprocessing problem. Remember that the
language should be readable!

20. Parameter passing: [ various ideas ]

This has to be dealt with very carefully. I like C's solution: it's clean
while allowing every trick Ada can do. A general principle here (which
you appear to disagree with) is that the form of a function call can
make clear the fact that a variable is not modified.

21. Classes equal structures. Inheritance is just including one structure
    in another. Function arguments are really structures.

This is the kind of idea that I'm looking for. Object-oriented
programming can be very clean given a sufficiently powerful syntax
and semantics for function pointers and structures. Your ``inherit''
keyword is beautiful.

Function arguments being structures: This could be useful if it's
combined with a simple way to deal with the program stack.

Default structure values upon creation: This brings up the issue of
whether there should be a way to call an initialization function the
first time a function is called (as in Modula-2, Fortran-90, and a few
other obsolete [1/2 :-)] languages). I don't think there's any point:
all related ``features'' can be much more cleanly implemented by
combining function pointers with the more usual initializations, or
by keeping an appropriate local variable. (Those are the two methods
used in Modula-2 compilers: the point is that if they're easily
implemented with simpler features, they should be. Modularity.)

> 		- Member functions are indicated in function declarations.
> 		  There should be another type qualifyer which indicates a
> 		  function gets a pointer to the structure and all members of
> 		  that structure look like local variables to the function.

I'm not sure what you're getting at here.

22. Named arguments.

Yeah. This is one of Ada's few good features. The syntax is a bit of a
problem, but I'm sure it can be worked out.

> There's much, much more to do

No duh.

> The general goal
> is to make it both one step above assembly language and completely extendable.

And modular. And clean. And robust. And likable, even fun to use!

---Dan

brnstnd@stealth.acf.nyu.edu (02/19/90)

In article <12507@mcdphx.phx.mcd.mot.com> kjj@varese.UUCP (Kevin Johnson) writes:
> Rhetorical question: Aren't you talking about C++?

Of course not. C++ isn't even close to perfect.

Isn't there some change you'd like to make to C++ so that you'd like
programming even better than you do now? Fine, say so. Then iterate.
Hopefully almost everyone's wishes will be synthesized into this new,
still-to-be-named language. (Now there's a name: FOO. Naaah, people
might remember what foo stands for, and lots of young urban CS types
will think it stands for something object oriented.)

> Semi-rhetorical question: What would be this language's intended use?

Similar to C. It will have the ``low-level'' features of C so that
it's appropriate for systems programming, but there's no particular
focus. (I use UNIX C for complex numerical programming, so I may be
biased.)

> 1. How about string operators.
> 	I hate handling allocing of space for something silly like strings...

This is mainly a library problem (though a good syntax helps).

> 2. Ability to dynamically define new operators

Expand. What exactly do you want? We're not talking p-code, you know.
Are you looking for something that can't be implemented on top of the
language?

> 3. Ability to use existing C libraries and headers.

At least to interface with the loader the same as other languages. As
for headers: one of the first standard applications will be a program to
convert C function prototypes to this language. (Having the same macro
processing is too much to ask, because C's macro processor is so
limited. But most libraries do fine with just the function interface.)
It would be nice if the language could compile to C, but it already
looks like C just isn't powerful enough.

> 	Seriously, I would consider the ability to link in existing
> 	libraries, one way or another, an absolute must.

I agree.

---Dan

nick@lfcs.ed.ac.uk (Nick Rothwell) (02/19/90)

In article <22569:05:10:24@stealth.acf.nyu.edu>, brnstnd@stealth writes:
>This is about the most formal announcement there'll be of a new, still
>unnamed language. I'll bet that almost every programmer's tastes can
>be satisfied by a single language,

You're kidding, right?

>Take C as a starting point for good ideas

You're kidding, right? If you're going to mess around with antiquated
low-level languages like C, I don't see the point of bringing yet
another one into the world. Look at the functional languages like ML,
or the newer specification/programming languages, or the modern OO
languages like Cardelli's Quest. Or even take Eiffel and give it a
decent formal semantics.

>---Dan

		Nick.
--
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcvax!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
		       ...als das Kind, Kind war...

jhallen@wpi.wpi.edu (Joseph H Allen) (02/19/90)

In article <4489:05:14:19@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>Syntax is less important than semantics, though of course a clean,
>simple syntax is necessary for a language programmers actually like.
>(ALPAL: A Language Programmers Actually Like. Naaah, too pretentious.)

	Pretentious?  How about D?

>For the moment, general principles are more important than specifics.

>There should be some number of macro (preprocessing) levels to handle
>trivial syntactic issues. I don't know what system would be best, or
>if there even is a best system.

I think you hint at this later, but I think it should be just as easy to
extend/add control statements as it is to extend/add functions.  Perhaps some
macro processing stage which is more heavily interwoven with the language is
needed for this.  I.E., a macro system in which you can say, "I want an
expression here", "I want this symbol here" etc.

>In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
>  [ lots of suggestions ]
>
>1, 2, 3. No semicolons. End-of-line comments. Block structure indicated
>         by indentation.
>
>These all relate to the syntax of simple statements and control
>structures. The most important general issue is whether structures
>should be explicitly terminated. The only advantage of C-ish failure to
>terminate is that single-statement structures are slightly shorter; and
>there are lots of syntactic disadvantages. Is there anyone out there who
>really wouldn't like loop ... end/endloop/pool, etc.?

>You propose letting indentation determine structure, and using newlines
>as statement terminators.

I didn't mean newlines to be statement terminators.  If a statement needs to go
onto another line, that's fine.  Statements should be terminated implicitly
when they can no longer be parsed.  This means we have to be very careful
about not having identifiers which can be both operators and variables.  Also,
infix operators must not share symbols with prefix or postfix operators.  One
problem we will have is with '-'.  When you see:

		it = this + 5
		- 10

Does it mean it=this+5-10? or is the -10 a single return value for the block?
I propose we let this problem stand and solve it with parentheses:

		it = this + 5
		( - 10)

However, new lines could be used to terminate multi-statement lines (the
single statement problem you talked about):  

If the statement (expression.  No reason to distinguish between the two)
starts after the if expression, then it's a single line statement. 

	if expr expr expr expr expr \n

If the statement doesn't start after the if expression then it's a multi-line
block:

	if expr
	 expr
	 expr expr expr
	 expr

which ends when the indentation level becomes lower.
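The "ends when the indentation level becomes lower" rule can be implemented the way Python's tokenizer later did it: keep a stack of indentation widths, open a block when a line is indented more than the top of the stack, and close one block for each level popped when it is indented less. A minimal C sketch, assuming each space or tab counts as one column and blank lines are ignored; count_blocks and MAX_DEPTH are hypothetical names:

```c
#include <stddef.h>

#define MAX_DEPTH 64

/* Count how many blocks the indentation of lines[0..n-1] opens and closes.
   Returns -1 on a dedent that does not return to an enclosing level. */
int count_blocks(const char *const *lines, size_t n, int *opens, int *closes)
{
    int stack[MAX_DEPTH];
    int top = 0;
    stack[0] = 0;                     /* outermost level: column 0 */
    *opens = *closes = 0;

    for (size_t i = 0; i < n; i++) {
        const char *p = lines[i];
        int col = 0;
        while (*p == ' ' || *p == '\t') {
            p++;
            col++;
        }
        if (*p == '\0')
            continue;                 /* blank line: no effect */

        if (col > stack[top]) {       /* deeper: a new block begins */
            if (top + 1 == MAX_DEPTH)
                return -1;
            stack[++top] = col;
            (*opens)++;
        } else {
            while (col < stack[top]) {    /* shallower: close blocks */
                top--;
                (*closes)++;
            }
            if (col != stack[top])
                return -1;            /* dedent to a level never opened */
        }
    }
    *closes += top;                   /* end of input closes everything */
    return 0;
}
```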

> It's easy to convert between this and a more
>traditional syntax; in fact, it would be nice to have a macro facility
>good enough to do the job.

Lets not cop out too early...

>Anyway, I favor a syntax that doesn't depend
>on lines or indentation: otherwise it's too easy to make syntax errors.

I disagree with this.  It's more of a pain when the indentation doesn't
match the block symbols:

	if dfjhkjddf {{{{{{{
		sdfjkhdf
 	  }}}} else {{{{{
		}}}}}}

What people do with C makes things very confusing.

( :-) YOU use the macro processor to make it your way.  The language default
will, of course, be my way.)

>A line-based syntax also feels very dirty: there are exceptions for
>multiple statements on a line, exceptions for single-statement
>structures, etc.

It's absolutely consistent.  Only two rules are needed.  Deeper indentation
means a new block, and when the body statement begins on the same line as the
structure statement, a single-line block is indicated (oh, the end-of-line
terminator shouldn't be "hard".  Instead, all statements beginning on the same
line are part of the block.  The last statement should be able to continue onto
the next line if it has to:

	if a==b a=d+	; + means has to continue
         e		; onto this line
)

>4. Overloadable and definable operators

>I think overloading should be just kept in mind until 
>function calls and any object-oriented facilities are worked out.

Overloading is too convenient not to have built into the language at every
level.  All of the language intrinsics should be as unambiguous as possible.
However, it will be possible for the user to screw up with definable
operators.  I think this is a style issue: don't overload unless you
absolutely have to.

>5. All characters allowed in symbols.
>
>Would you really want to read a program with ?)*[! as an identifier?

Yes.  And spaces should be allowed in symbols too (I hate those stupid _)

>I wouldn't mind a macro facility that could handle this, or the ability
>to partition the character set the way you want.

Sure, take it out of the language, why don't you.

>However, the basic
>language must have some namespace control to do any parsing at all.

No it doesn't.  Operators and other symbols are all distinguished by what's in
the symbol table not by what characters they use.  The LEXer only finds words
in the symbol table and passes this on to the parser.  The LEXer doesn't do
anything else (except constants (if you're a real purist, put these in the
symbol table too- all of them :))
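A minimal sketch of such a LEXer in C, assuming the longest-match rule Joseph gives elsewhere in his proposal (with `abc`, `def` and `abcdef` defined, `defabc` lexes as `def` then `abc`). The fixed table and the names `longest_symbol` and `lex` are hypothetical; a real symbol table would grow as declarations are read, and whitespace is not skipped here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table; in the proposed language it would grow as
   the program declares new symbols and operators. */
static const char *symtab[] = { "abc", "def", "abcdef", "==", "=" };
#define NSYMS (sizeof symtab / sizeof symtab[0])

/* Length of the longest table entry that is a prefix of s (0 if none);
   its index is stored in *which. */
static size_t longest_symbol(const char *s, int *which)
{
    size_t best = 0;
    for (size_t i = 0; i < NSYMS; i++) {
        size_t n = strlen(symtab[i]);
        if (n > best && strncmp(s, symtab[i], n) == 0) {
            best = n;
            *which = (int)i;
        }
    }
    return best;
}

/* Split s into symbol-table indices.  Returns the token count, or -1 if
   some position matches no symbol (or out[] is full). */
static int lex(const char *s, int out[], int max)
{
    int count = 0;
    while (*s) {
        int which = -1;
        size_t n = longest_symbol(s, &which);
        if (n == 0 || count == max)
            return -1;
        out[count++] = which;
        s += n;
    }
    return count;
}
```

Note that `===` comes out as `==` followed by `=`, which is the kind of surprise the longest-match rule invites.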

>Also, this language MUST be interoperable with other languages to be
>useful.

This, and the fact that you can create some very interesting ambiguities, are
the downfall of this.  I think the language shouldn't be restrictive.  People
should just exercise self-control.  Which would annoy you more?  The
language not letting you use '$' in symbols, so that you couldn't access the
VAX's special assembler symbols, or using IBM's graphic characters and then
discovering that doing so isn't very portable?

>The issue of defining your own character set relates strongly to the
>syntactic argument about overloading. Never force a reader to learn a
>new language.

I don't want to start a war here, but I care more about writing than about
reading and maintaining.  Let the managers force rules on the programmers to
make things maintainable.

>8. Eliminate arrays in favor of pointers and macros.
>
>Say what? You need some way to express the concept of a contiguous
>region of memory.

No, no, no.  This should be an initializer issue:

	inline # int array = # 256 dup int

Left side:  non addressable pointer to integers (an equate).
Right side: The address of 256 uninitialized ints.

>That's what arrays are for. How do pointers cleanly
>express multidimensional arrays? The language should know something
>about arrays, even if just for efficiency.

What are you, some kind of math person :) ?  System languages don't need
arrays.  C doesn't even really have arrays: there's no way of passing
multidimensional arrays without separately passing the size of each dimension.

Efficiency is another problem, however.
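A sketch of what "pointers and macros instead of arrays" looks like in today's C: one contiguous block from `malloc` (roughly the `# 256 dup int` idea), a macro for row-major indexing, and the dimensions passed alongside the pointer, which is exactly the inconvenience mentioned above. `AT2`, `sum2` and `alloc_ints` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Row-major element (i, j) of a block with ncols columns. */
#define AT2(p, ncols, i, j) ((p)[(size_t)(i) * (ncols) + (j)])

/* The address of n uninitialized ints (NULL if allocation fails). */
static int *alloc_ints(size_t n)
{
    return malloc(n * sizeof(int));
}

/* The dimensions travel separately from the pointer. */
static long sum2(const int *p, size_t nrows, size_t ncols)
{
    long total = 0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            total += AT2(p, ncols, i, j);
    return total;
}
```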

>9. Constants: $hex, decimal, %binary, 'c', 'abc'
>
>This is again a matter of taste;

No it's not.  It's just plain stupid to do hex constants with 0x... or 0...h
(the C and Intel way).

>I don't agree with the combined syntax for strings and
>characters: what do you do with single-character strings?

The language absolutely must do this.  I find it very annoying that there are
things I can do in assembly language strings that I can't do with C's (namely,
have constant expressions in each character).

Single character strings?  No problem:  string = # 'A'
					string = # 65
					string = # 'ABCDEFG'
					string = # 65 \ 66 \ 67 \ 68 \ 'EFG'

Admittedly, having to put a '#' before each string to get its address is a
pain.
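For comparison, a sketch of the closest standard C gets to per-character constant expressions: a `char` array initialized element by element, where each element may be any integer constant expression (string literals themselves cannot do this). It assumes an ASCII execution character set for the numeric codes; `mixed`, `with_ctrl` and `CTRL` are hypothetical names.

```c
/* Rough equivalent of  # 65 \ 66 \ 67 \ 68 \ 'EFG'  (ASCII assumed). */
static const char mixed[] = { 65, 66, 'A' + 2, 0x44, 'E', 'F', 'G', '\0' };

/* Any element may be a computed constant, e.g. a control character. */
#define CTRL(c) ((c) & 0x1f)
static const char with_ctrl[] = { 'a', CTRL('G'), 'b', '\0' };
```

The cost is the one Joseph objects to for strings generally: every character has to be written as a separate element.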

> The language
>shouldn't have to know about strings; Pascal and Ada deal with strings
>poorly. (C's problem is that there isn't a good enough syntax to easily
>interface the language with different string-storage techniques.)

I agree.  C's problem is solved with macros and overloadable operators.

> I also
>disagree with the idea of leaving out octal: finding a better syntax is
>a good idea but there's no reason to take the feature away.

Ok.  Let's make it the braindamaged type.  Octal numbers end with 'O' (oh).
(Actually, I'm kidding.  I know we have to support octal.  Perhaps there should
even be base-n constants.)
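Base-n conversion already exists at run time in the C library: `strtol` accepts any base from 2 to 36. A sketch of a checking wrapper; `parse_base` is a hypothetical name, and returning -1 on error is an assumption that makes it unsuitable for negative input.

```c
#include <stdlib.h>

/* Parse digits in the given base (2..36); -1 if nothing was parsed or
   unparsed characters remain. */
static long parse_base(const char *digits, int base)
{
    char *end;
    long v = strtol(digits, &end, base);
    return (end != digits && *end == '\0') ? v : -1;
}
```

For example, `parse_base("777", 8)` gives 511 and `parse_base("z", 36)` gives 35.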

>10. Standard operators.

>As for =/== vs. :=/= vs. your :=/== vs. statements-ain't-expressions =/=
>vs. =/.EQ. vs. ... : I dunno. When I'm coding on paper I alternate
>between paper-only left-arrow/= and C's =/==. On the screen I've begun
>using preprocessors that can handle my terminal's extended characters.
>As many writers have observed, the problem is balancing paper tradition
>with ASCII's rather inexpressive character set.

Right.  Allow all characters to be used in symbols.

>You mention that structures can disappear in favor of typedef and blocks.
>To me it doesn't look like you're simplifying anything; and it's a bad
>idea to confuse statement blocks with structure blocks.

What's the difference between this and what C does?  I just don't see the need
for C's 'struct' keyword.  Every structure I ever make begins like this:

	typedef struct foo FOO;
	struct foo
	 {
	 FOO *next;
	 etc...
	 };

>Unions can and should be reorderable. union { int a; float b; } and
>union { float b; int a; } must, of course, be compatible---except that
>in C they'd be initialized differently. (I hope you don't mind unions?)

Frankly I wish there were some easier way to deal with unions.  As far as I'm
concerned, unions are just as difficult to use as casts:

	x->thing.member=7	(union)
	(cast)x->thing=7	(cast)

Perhaps we should have overloadable variables?

>16. Basic types: int bits, uint bits
>
>I disagree. The basic types should be those types that the machine can
>handle quickly. The language must be efficient! It's perfectly fine to
>have a standard notation for ``a type long enough to handle N bits'' or
>``how many bits are in type X?'' but the language should not make
>restrictions on the size of basic types.

This isn't making any restrictions.  The only types provided are the machine
primitive ones (char, short, long, etc.); this just provides a way of selecting
the proper one for the machine being used.

>nobody feels the same way about the first. Then again, wouldn't a
>standard notation for your ``int 8'' be enough?)

Yes, use a header file.
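A sketch of the header-file approach. C99 later standardized much of this in `<stdint.h>`: `int_leastN_t` is the smallest machine type with at least N bits, still chosen from the primitive types. The `INT(bits)` spelling of "int 8" and `BITS_IN` are hypothetical macros, and only N = 8, 16, 32 and 64 are guaranteed to exist.

```c
#include <limits.h>
#include <stdint.h>

/* "int 8", "uint 16", ...: the smallest primitive type with at least
   that many bits. */
#define INT(bits)  int_least##bits##_t
#define UINT(bits) uint_least##bits##_t

/* "How many bits are in type X?" (counts any padding bits as well). */
#define BITS_IN(type) (sizeof(type) * CHAR_BIT)
```

So `INT(8) x;` declares an `int_least8_t`, which on most machines is simply `signed char`.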

>What about characters? What about floating-point types, which many
>machines support better than ints? What about Ada-like fixed-point
>types?

Lots more to do...

>ANSI C messed up in its restrictions on void. void should mean a 0-bit
>integer, aligned so that any pointer type can be safely converted back
>and forth to void *. So dereferencing a void always produces 0;
>sizeof(void) is 0; and so on.

Yes, perhaps there should be both 'void' and 'unspecified'.  There might be
some way to combine this with variable arguments.

>[only allow bit packing in structures]

I don't know.  I think that if the machine can handle chars and ints and also
has alignment problems, char packing isn't that bad.  The only time there is a
problem with this is when you try to increment a pointer from a char to an
int.  Inside structures this isn't a problem, because I want to provide a
special 'sizeof'-like operator, 'base', which returns the distance between the
structure base address and one of its members.
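The proposed 'base' operator corresponds closely to ANSI C's standard `offsetof` macro from `<stddef.h>`, which yields a member's byte offset from the start of its structure. A sketch; `struct packet` and `member_addr` are hypothetical.

```c
#include <stddef.h>

struct packet {
    char tag;
    int  length;
    char payload[16];
};

/* Recover a member's address from the structure's base address. */
static void *member_addr(void *base, size_t offset)
{
    return (char *)base + offset;
}
```

Any padding inserted for alignment is already reflected in the offset, e.g. `offsetof(struct packet, length)` is typically 4, not 1.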

>19. User-defined precedence.

>This is yet another syntactic preprocessing problem. Remember that the
>language should be readable!

This is really just part of definable operators.
-- 
            "Come on Duke, lets do those crimes" - Debbie
"Yeah... Yeah, lets go get sushi... and not pay" - Duke

gateley@m2.csc.ti.com (John Gateley) (02/20/90)

In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
<Ok, I'll bite.  Here's a compiled language I'd like to see:
<(1)	No semicolons.
<(2)	Except for end of line comments.  /* These comments are evil */
<(3)	Block structure indicated by indentation level:
<
<		while a!=b
<			int q		; Multi-line body
<			q=z*5
<			r+=foo(q)
<...

Now, just take it a little further: get rid of all infix notation,
and let "blocks" be denoted by ( and ) and you get:
(begin
  (while (!= a b)
    (let ((q integer))
      (= q (* z 5))
      (+= r (foo q))))
  
  ...)
and you have achieved a truly simple easy to use syntax where you
don't have to worry about indentation (the editor does it for you),
and programs which manipulate source code are much much easier to
write.

<(4)	Overloadable AND definable operators

Using syntax like I described above avoids this issue entirely (there is
no such thing as an operator). The guy that occurs in the first position
of ( ... ) can be overloaded/defined in well behaved ways easily.

<(5)	All characters allowed in symbols.

Using syntax like I described above, the ONLY characters not allowed
in symbols are (, ), and ; (for comments). In real life, this set usually
is a little bit larger, but not much.

<	the longest possible string which can be
<	a symbol is detected:
<		if these are symbols:
<			abc
<			def
<			abcdef
<		then when the input sees:
<			defabc	; def is recognized and then abc is recognized

Uggh, gross!!

<	Symbol recognition should occur before constant recognition.  I.e.,
<	this way you can define:
<			int :4: = 5	; Make 4 equal to 5

You are starting to have to make lots of rules to handle
your way of life; it should be simple, not complex, and
having to worry about details like "Does :4: mean : 4 : or a symbol
named ":4:"?" confuses people.

<(6)	Nifty C declarations which allow one type to be shared among multiple
<	declarations each of which might have an initializer.

Even better, let's support dynamic typing; then we can avoid this
issue as well.

<(7)	However, the convoluted C declaration system needs to be replaced:
<		instead of:
<			int **foo[]   an array of pointers to int pointers
<		do this:
<			[] * * int foo

Hmmm, this is very similar to my arguments for a simpler syntax:
by specifying everything in prefix notation, you avoid the
precedence problem (among other things). Even though you keep
operators and precedence for expressions, you want to remove them
from declarations! The first line is perfectly acceptable if you
remember the precedence/associativity of [] and *. Of course, I
think that it should be ([] (* (* int))) for the type spec.

<(8)	Eliminate arrays.  They aren't needed.  Use pointers and macros
<	instead.

Ummm, if you pursue this argument ad infinitum (is that the right
Latin phrase?), then you wind up with a Turing machine, the lambda calculus,
a combinatory logic, or some other beast which is close to impossible
to program in. On the other hand, if you have a standard interface to
arrays, then looking at other people's code will be easier: you won't
have to learn their version of the macros.

Perhaps, though, you meant to provide a standard set of macros to implement
arrays on top of pointers; I have no argument with this.

<(10) Standard operators.  Grouped together in equal precedence:
< [HUGE TABLE DELETED]

I hate precedence: I can never remember which is which.
Instead, using the lisp-style syntax again avoids this issue:
everything is parenthesized automatically and you don't ever
have to worry about precedence.

(+ 2 (* 3 4) (- 5 6))
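To show how little parsing the fully parenthesized form needs, here is a sketch in C of an evaluator for integer prefix expressions with `+`, `-` and `*`; there is no precedence table at all, since the parentheses carry the structure. The names are hypothetical, and error handling is minimal (unknown text evaluates as 0; unary minus such as `(- 5)` is not given its usual meaning).

```c
#include <ctype.h>
#include <stdlib.h>

static void skip_ws(const char **s)
{
    while (isspace((unsigned char)**s))
        (*s)++;
}

/* Evaluate one expression starting at *s and advance *s past it. */
static long eval_prefix(const char **s)
{
    skip_ws(s);
    if (**s != '(') {                     /* an integer literal */
        char *end;
        long v = strtol(*s, &end, 10);
        if (end == *s) {                  /* not a number: skip one char */
            if (**s) (*s)++;
            return 0;
        }
        *s = end;
        return v;
    }
    (*s)++;                               /* consume '(' */
    skip_ws(s);
    char op = **s;                        /* operator: + - or * */
    if (op) (*s)++;
    long acc = eval_prefix(s);            /* first operand */
    skip_ws(s);
    while (**s && **s != ')') {           /* fold in the remaining operands */
        long v = eval_prefix(s);
        acc = (op == '+') ? acc + v : (op == '-') ? acc - v : acc * v;
        skip_ws(s);
    }
    if (**s == ')')
        (*s)++;                           /* consume ')' */
    return acc;
}

static long eval_str(const char *text)
{
    return eval_prefix(&text);
}
```

The expression above evaluates to 2 + 12 + (-1) = 13.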

<(11)	Blocks return the last value generated:
<(12)	Statements return their last value:

Now go one step further and eliminate the concept of a statement: now
everything is an expression and returns values. In the spirit of something
posted somewhere recently: we have reduced the number 2 (statements and
expressions) to 1 (expressions).

John
gateley@m2.csc.ti.com

gateley@m2.csc.ti.com (John Gateley) (02/20/90)

In article <12507@mcdphx.phx.mcd.mot.com> kjj@varese.UUCP (Kevin Johnson) writes:
>1. How about string operators.
>	I hate handling allocing of space for something silly like strings...

But, string sizes are not known at compile time, and so must be handled
by the heap (i.e. alloc).
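A sketch of why a string operator implies the heap: the result's length is only known at run time, so even a simple concatenation has to `malloc` the space Kevin would rather not manage by hand. `str_cat` is a hypothetical name; the caller must still `free` the result, which is the part built-in strings (or garbage collection) would hide.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated a+b, or NULL if allocation fails. */
static char *str_cat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);   /* also copies b's terminating '\0' */
    return r;
}
```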

>3. Ability to use existing C libraries and headers.

This is truly a difficult problem, because you have to say: "what's so
special about C?  I want my <foo> libraries", where <foo> might be Ada,
Lisp, PDP-11 assembler, or whatever. Instead, why not write a routine
which, given a C library, would convert it into a library for the new
language. This would involve changing the entry points a little, but
that should be about it.

John
gateley@m2.csc.ti.com

peter@ficc.uu.net (Peter da Silva) (02/20/90)

Coroutines!

Some minor C syntax cleanups.

Replace C pointer syntax (prefix *) with Pascal pointer syntax (postfix ^).
This will automatically clean up declarations and get rid of an ugly two
character symbol ("->").

Replace = with :=, but don't replace ==. That way it's *never* OK to say
a = b. Keep +=, -=, etc the same.
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

peter@ficc.uu.net (Peter da Silva) (02/20/90)

> > 2. Ability to dynamically define new operators

> Expand. What exactly do you want?

MATRIX matrix.*(MATRIX a, MATRIX b);


	A := A matrix.* B;

This could be abbreviated to:

	A := A .* B;

But not (as in ADA):

	A := A * B;

daniels@teklds.WR.TEK.COM (Scott Daniels) (02/20/90)

A pet peeve: why only decimal radix for floating point?  How about a notation
like d:xxx.xxx, where d is the largest digit allowable in the radix (thus
solving the "what base do I write the base in?" problem).  Then decimal 4.75
would be 9:4.75, (hex) F:4.C, (octal) 7:4.6, or (binary) 1:100.11.  The
base for the exponent should be the same as the radix of the number itself
(so the exponent indicates radix-point shifts).  Note: the exponent separator
had better be @ or something instead of `E', since E is a hex digit.
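A sketch in C of reading the proposed `d:xxx.xxx` constants. For simplicity the `@` exponent is left out, digits above 9 are taken to be letters (so radixes up to 36), and `parse_radix` is a hypothetical name. All digits are accumulated as one integer and divided once by a power of the radix, which keeps the result exact while the digits fit in a double's 53-bit mantissa.

```c
#include <ctype.h>

/* Value of one digit character (0-9, then A-Z or a-z), or -1. */
static int digit_value(char c)
{
    if (isdigit((unsigned char)c)) return c - '0';
    if (isalpha((unsigned char)c)) return toupper((unsigned char)c) - 'A' + 10;
    return -1;
}

/* Parse "d:int.frac", where d is the largest digit of the radix.
   Returns 0 and sets *out on success, -1 on a malformed constant. */
static int parse_radix(const char *s, double *out)
{
    int maxd = digit_value(s[0]);
    if (maxd < 1 || s[1] != ':')
        return -1;
    int base = maxd + 1;
    double mant = 0.0, scale = 1.0;
    int seen_point = 0, ndigits = 0;
    for (s += 2; *s; s++) {
        if (*s == '.' && !seen_point) { seen_point = 1; continue; }
        int d = digit_value(*s);
        if (d < 0 || d >= base)
            return -1;                    /* digit too large for the radix */
        mant = mant * base + d;
        if (seen_point)
            scale *= base;                /* one radix place per fraction digit */
        ndigits++;
    }
    if (ndigits == 0)
        return -1;
    *out = mant / scale;
    return 0;
}
```

All four spellings of 4.75 from the post come out exactly equal, and `7:4.8` is rejected because 8 is not an octal digit.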

Another thing I would like to be able to do is to indicate that a function
is "pure" (only depends on its args), thus allowing the compiler to 
(1) complain when I violate that, and (2) compile-time evaluate values
to produce constants, and (3) expand inline as it sees fit.

On strings: SAIL had strings which consisted of a length and a pointer part.
I know this prevents the "infinite length string" idea, but (1) substrings
are easy (lexing a line involves copying no character data), and (2) it is
(almost) as easy to get to the last few chars of a string as the first few.
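A sketch of SAIL-style length-plus-pointer strings in C. The struct and function names are hypothetical; the point is that taking a substring or the tail only builds a new descriptor and copies no characters.

```c
#include <stddef.h>
#include <string.h>

/* A string as a pointer plus a length; no terminator is required. */
struct strv {
    const char *ptr;
    size_t      len;
};

static struct strv strv_from(const char *s)
{
    struct strv v = { s, strlen(s) };
    return v;
}

/* Substring [start, start+n), clamped to the string; copies nothing. */
static struct strv strv_sub(struct strv s, size_t start, size_t n)
{
    if (start > s.len) start = s.len;
    if (n > s.len - start) n = s.len - start;
    struct strv r = { s.ptr + start, n };
    return r;
}

/* The last n characters are as cheap to reach as the first n. */
static struct strv strv_tail(struct strv s, size_t n)
{
    return n >= s.len ? s : strv_sub(s, s.len - n, n);
}
```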

On structures: 
(1) How about some way to provide structs which have negatively indexed 
    fields as well as positively indexed fields.  This allows structure 
    elaboration in two ways (great for protocol layering.)
(2) Rather than tightening the layout of records, have a modifier (like
    Pascal's "packed", but not optional to implement) which says "no
    padding", and otherwise use a loose rule that says "adding a field
    must not change the arrangement of variables previously placed," and
    explicitly "fields may be placed in holes in the structure".  Thus the
    compiler is free to use the same layout for the following structs
    (assuming aligned ints):
    struct { char c,d; int a; } and struct { char c; int a; char d; }
(3) Allow "anonymous" incorporation of a structure (or union): ie bring
    the field names to the same level as explicitly provided names, (of
    course name conflicts are errors).
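Point (3) is close to what C11 later added as anonymous structures and unions: the members of an unnamed struct or union member are accessed as if they belonged to the enclosing structure, and duplicate names are a compile-time error. A sketch with hypothetical types; note that standard C11 requires the anonymous member to be written out inline, and pulling in an existing named struct type this way needs a compiler extension (for example GCC/Clang `-fms-extensions`).

```c
/* C11 anonymous members: their fields are promoted to struct message. */
struct message {
    struct {                    /* anonymous structure */
        unsigned char version;
        unsigned char kind;
    };
    union {                     /* anonymous union: one of these is valid */
        int   code;
        float reading;
    };
    unsigned short length;
};
```

With this declaration, `m.version`, `m.code` and `m.length` are all written at the same level.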

On types: 	
>16. Basic types: int bits, uint bits
>I disagree. The basic types should be those types that the machine can
>handle quickly. The language must be efficient! It's perfectly fine to
>have a standard notation for ``a type long enough to handle N bits'' or
>``how many bits are in type X?'' but the language should not make
>restrictions on the size of basic types.
This was provided as a notation for accessing the basic types, but it does
not go far enough.  I would like to be able to give a range which must be
representable (like integral[1..29]) and have the type chosen; I don't
always know the number of bits (and I could always use [0:(1<<7)-1] for
7-bitters).

>(Then again, every case in which portability takes second place to
>efficiency must be carefully considered and well documented. Two issues
>along these lines are bit sizes and the semantics of mod. As I feel very
>strongly that the second should be portable, I shouldn't assume that
>nobody feels the same way about the first. Then again, wouldn't a
>standard notation for your ``int 8'' be enough?)
How about ADDING an operation `modulo' in addition to `%'.  Then we can say
either "fast, fits integer divide," or "result in range [0,modulus)."
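A sketch of the proposed `modulo` next to C's `%`. Since C99, `%` truncates toward zero, so `-7 % 3` is -1 (in C89 the sign was implementation-defined for negative operands); the hypothetical `modulo` below returns the residue in [0, m), assuming m > 0.

```c
/* Mathematical residue of a modulo m, in [0, m); assumes m > 0. */
static long modulo(long a, long m)
{
    long r = a % m;             /* C99: sign follows the dividend */
    return r < 0 ? r + m : r;
}
```

So `modulo(-7, 3)` is 2, where `-7 % 3` gives -1.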

>I agree with C's philosophy of only allowing bit packing inside structures. 
But, it would be nice to have a packed vector of bits available somewhere
(inside structures only would be fine with me).
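Until a language provides one, a packed bit vector is usually built from macros over `unsigned char`. A sketch with hypothetical macro names, placed inside a structure as Scott suggests; the indices are not bounds-checked.

```c
#include <limits.h>
#include <string.h>

/* Bytes needed for nbits bits, and single-bit operations on a byte array. */
#define BITVEC_BYTES(nbits) (((nbits) + CHAR_BIT - 1) / CHAR_BIT)
#define BIT_SET(v, i)   ((v)[(i) / CHAR_BIT] |= (unsigned char)(1u << ((i) % CHAR_BIT)))
#define BIT_CLEAR(v, i) ((v)[(i) / CHAR_BIT] &= (unsigned char)~(1u << ((i) % CHAR_BIT)))
#define BIT_TEST(v, i)  (((v)[(i) / CHAR_BIT] >> ((i) % CHAR_BIT)) & 1u)

struct flags {
    int           count;
    unsigned char bits[BITVEC_BYTES(100)];   /* 100 packed flags */
};
```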

>Function arguments being structures: This could be useful if it's
>combined with a simple way to deal with the program stack.
This probably introduces another form of structure packing (and is a
good idea but...be sure to allow the compiler to delete unused variables
if it can remove them)

Type coercions:
    Something between C's "forget all your type checking" and many other
languages' "you can't get there from here."  How about a coercion that
has both `from' and `to' parts?  Suppose:
	coerce ::= ( to_type : from_type ) e
then for:	( dest_type : source_type ) e;
    It is an error if e is not "easily-coerced" to type "source_type", 
the internal conversion (to dest_type) is performed, and that is the
type of the whole expression.
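A rough approximation of the two-part coercion using C11's `_Generic`: with a single association and no `default`, the expression only compiles if `e` has exactly the type `from`, and is then converted to `to`. This is stricter than Scott's "easily-coerced" (no implicit conversions are accepted), and `COERCE` is a hypothetical macro.

```c
/* COERCE(to, from, e): a compile-time error unless e has type `from`;
   otherwise the value of e converted to `to`. */
#define COERCE(to, from, e) ((to) _Generic((e), from: (e)))

/* e.g.  COERCE(double, int, 2.5)  is rejected: 2.5 is a double. */
```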

-Scott Daniels
daniels@cse.ogi.edu

new@udel.edu (Darren New) (02/20/90)

Actually, I've been playing with a language that has flexibility
as its greatest goal.  Each function is compiled as it is seen
and can in turn compile other functions.  This is essentially what
FORTH and to some extent LISP do. However, in my language there are
no parsers that cannot be overridden.  That is, to parse the language,
each character is read and appended to a buffer. Then each entry in
a "parse" array is called in turn. Once one of the entries recognises
the token in the buffer, it outputs the object code for that token
and clears the buffer. This technique allows overloaded functions,
new literals, "create-on-reference" functions (for example, it could
create a "typecast" function given the name) and so on. This would
be an especial boon if the intermediate code was standardized and 
fairly flexible, allowing optimizations for different architectures
much like the Smalltalk or Pascal PCodes do.  
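A much-simplified sketch of the parse-array dispatch: characters accumulate in a buffer, the buffer is offered to each entry of a table of recognizer functions in turn, and the first entry that accepts emits "object code" and the buffer is cleared. Assumptions that differ from Darren's description: dispatch happens only at whitespace rather than after every character, the two recognizers are hypothetical, and the output is readable text rather than real object code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef int (*recognizer)(const char *tok, char *out, size_t outsz);

static int recog_number(const char *tok, char *out, size_t outsz)
{
    for (const char *p = tok; *p; p++)
        if (!isdigit((unsigned char)*p))
            return 0;                           /* not mine: try the next entry */
    snprintf(out, outsz, "PUSH %s\n", tok);
    return 1;
}

static int recog_word(const char *tok, char *out, size_t outsz)
{
    snprintf(out, outsz, "CALL %s\n", tok);   /* fallback for any other token */
    return 1;
}

/* Earlier entries win; a program could install its own recognizers here. */
static recognizer parse_table[8] = { recog_number, recog_word };
static int parse_count = 2;

/* Compile src into out; returns the number of tokens handled. */
static int compile(const char *src, char *out, size_t outsz)
{
    char buf[64];
    size_t blen = 0, olen = 0;
    int ntok = 0;

    out[0] = '\0';
    for (const char *p = src; ; p++) {
        if (*p && !isspace((unsigned char)*p)) {
            if (blen < sizeof buf - 1)
                buf[blen++] = *p;               /* append to the buffer */
            continue;
        }
        if (blen > 0) {
            buf[blen] = '\0';
            for (int i = 0; i < parse_count; i++)
                if (parse_table[i](buf, out + olen, outsz - olen)) {
                    olen += strlen(out + olen);
                    ntok++;
                    break;
                }
            blen = 0;                           /* clear the buffer */
        }
        if (*p == '\0')
            break;
    }
    return ntok;
}
```

Compiling `2 3 add` yields `PUSH 2`, `PUSH 3` and `CALL add`.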

I think that if you can't add new literal types, you don't have a 
truly new language. To do this, you must be able to define
internal representations of high-level structures in terms of low-level
structures (like bit strings).  From that, the language can be
customized out the ptooks.  Of course, it may not be readable
when you are done, but...           :-)

		   -- Darren

cik@l.cc.purdue.edu (Herman Rubin) (02/20/90)

In article <4489:05:14:19@stealth.acf.nyu.edu>, brnstnd@stealth.acf.nyu.edu writes:
> Syntax is less important than semantics, though of course a clean,
> simple syntax is necessary for a language programmers actually like.
> (ALPAL: A Language Programmers Actually Like. Naaah, too pretentious.)

What is a simple syntax?  Simple for whom, the human or the machine?  For
example, most assembler languages, macro designs, etc., have simple syntax
for the machine but not for the human.

> For the moment, general principles are more important than specifics.
> 
> There should be some number of macro (preprocessing) levels to handle
> trivial syntactic issues. I don't know what system would be best, or
> if there even is a best system.

I find the lack of a versatile typed macro processor extremely inconvenient,
and I would find one preferable to any existing language, even if no other
tools were available.  For example,

			x = y - z

should be the (= -) macro (or some such designation), and it should allow for
the types of the arguments.

> In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
>   [ lots of suggestions ]
> 
> 1, 2, 3. No semicolons. End-of-line comments. Block structure indicated
>          by indentation.
> 
> These all relate to the syntax of simple statements and control
> structures. The most important general issue is whether structures
> should be explicitly terminated. The only advantage of C-ish failure to
> terminate is that single-statement structures are slightly shorter; and
> there are lots of syntactic disadvantages. Is there anyone out there who
> really wouldn't like loop ... end/endloop/pool, etc.?

I believe that we should have semicolons, but that an end-of-line should
terminate a statement unless a specific exception is made.  This is one of
the most common sources of errors in C programs, and is in any case a
nuisance.  I definitely do not like to have to use such clumsiness as
typing unnecessary strings for the convenience of the compiler.  I do not
like endloop/pool.

I also do not believe that indentation is necessarily the right method for
block structure.  For one thing, by the 10th block in, it is certainly a
nuisance.  A suggestion would be to allow arbitrary block labels, and have
an end pseudoinstruction with multiple labels.  This is especially important
when aborting to an explicit earlier place.

			....................

> 4. Overloadable and definable operators
> 
> This is another syntax issue. The language MUST provide an unambiguous
> syntax for everything. Fortran-90 is the only overloading language I
> know that does this well. Overloading just means ambiguous abbreviation,
> and definable operators are just a more convenient syntax for certain 
> functions.

NO NO NO!  An operator is not a function, especially if it is different
for arguments of different types, such as the sum, product, power operators,
etc.

Also, I see no more reason for a function call, or even function notation,
for power than for sum.  It is no more reasonable to require x = pow(y,z)
than x = sum(y,z).

> 5. All characters allowed in symbols.
> 
> Would you really want to read a program with ?)*[! as an identifier?

Only as a macro name (see above), with the macro being more in the form
of

		x ? y )* z [n!

or something similar.

> I wouldn't mind a macro facility that could handle this, or the ability
> to partition the character set the way you want. However, the basic
> language must have some namespace control to do any parsing at all.
> Also, this language MUST be interoperable with other languages to be
> useful.

This means that global names must not be changed by the compiler.  It is
a real nuisance that the function sin in C becomes _sin to the loader, and
that erf in Fortran becomes _erf_.  When writing a program, I should not
have to know from which language the subroutine library got the subroutines
used, nor should I have to replicate subroutine libraries because of this.
It is definitely the case that one may want to use subroutines from different
sources, and this requires that names be unchanged by the compiler.  This even
applies if blocks are used across subroutines.

> The issue of defining your own character set relates strongly to the
> syntactic argument about overloading. Never force a reader to learn a
> new language.

This may be necessary.  My basic operations are frequently so clumsy to
duplicate in the existing languages that it is necessary to do otherwise.
This includes the introduction of operator symbols and strings.  For
example, suppose I want to unpack floating point numbers into their
exponents and mantissas.  I do not want to have to try to do this with
the debilities of languages like C.
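For the basic case, standard C does at least provide `frexp` and `ldexp` in `<math.h>`: `frexp` splits a value into a mantissa in [0.5, 1) and a power-of-two exponent, and `ldexp` puts them back together. A sketch; `split_and_rebuild` is a hypothetical wrapper, and some systems need `-lm` when linking.

```c
#include <math.h>

/* Split x into mantissa * 2^exp2 and rebuild it; exact for finite x. */
static double split_and_rebuild(double x, double *mant, int *exp2)
{
    *mant = frexp(x, exp2);
    return ldexp(*mant, *exp2);
}
```

For 4.75 this gives a mantissa of 0.59375 and an exponent of 3.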

> 6. C-like initialization power.
> 
> Well, okay. Take it for granted that declarations and definitions will
> be at least as powerful as in C.
> 
> 7. int **foo[] becoming [] * * int foo

Or even better @ @ int foo.  This is an unnecessary overloading of *, done
because early UNIX had @ as the line kill character.

> Yeah. C would be cleaner if all the ``type constructants'' had a single
> syntax. This needs to be considered in much more detail to see what
> people would like to use. Perhaps there's a simple, readable, consistent
> way to provide everything in both prefix and postfix form; then nobody
> can complain.
> 
> 8. Eliminate arrays in favor of pointers and macros.
> 
> Say what? You need some way to express the concept of a contiguous
> region of memory. That's what arrays are for. How do pointers cleanly
> express multidimensional arrays? The language should know something
> about arrays, even if just for efficiency.

I agree.  This is one of the great lacks in C.

> 9. Constants: $hex, decimal, %binary, 'c', 'abc'
> 
> This is again a matter of taste; we'll see what people like. Many
> different forms of constants can be provided without hurting simplicity
> or readability. I don't agree with the combined syntax for strings and
> characters: what do you do with single-character strings? The language
> shouldn't have to know about strings; Pascal and Ada deal with strings
> poorly. (C's problem is that there isn't a good enough syntax to easily
> interface the language with different string-storage techniques.) I also
> disagree with the idea of leaving out octal: finding a better syntax is
> a good idea but there's no reason to take the feature away.

Here nothing should be left out.  There is a great need for floating point
numbers not written in decimal: at least octal or hex for the mantissa and
exponent, but with the exponent a power of 2.
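C99 later added part of this: hexadecimal floating constants have a hex mantissa and a power-of-2 exponent, though the exponent after `p` is written in decimal and there is no octal form. `strtod` accepts the same notation at run time. A sketch; the names are hypothetical.

```c
#include <stdlib.h>

/* 0x1.3p+2 is 1.1875 * 2^2 = 4.75 (hex mantissa, binary exponent). */
static const double k = 0x1.3p+2;

/* The same notation parsed at run time. */
static double parse_hexfloat(const char *s)
{
    return strtod(s, NULL);
}
```

`printf("%a", x)` prints a value in the same form.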

> 10. Standard operators.
> 
> This is, again, something that must be considered in much greater
> detail to get right. (Yes, I agree that @ is a much more logical symbol
> than * for indirection.) For the moment let's stick to general issues:
> You're right that there should be Algol 68C-like assignments that relate
> to a = b and a op= b the same way that a++ relates to ++a.

The use of ++ and -- is another example which leads to problems.  I have no
problem with op=, but using bad symbols because you did not think of anything
better is at least highly debatable.  There is also the systematic use of
symbols in C which conflict with long-standing notation.  ASCII is not enough
in any case.

> As for =/== vs. :=/= vs. your :=/== vs. statements-ain't-expressions =/=
> vs. =/.EQ. vs. ... : I dunno. When I'm coding on paper I alternate
> between paper-only left-arrow/= and C's =/==. On the screen I've begun
> using preprocessors that can handle my terminal's extended characters.
> As many writers have observed, the problem is balancing paper tradition
> with ASCII's rather inexpressive character set.

			........................

> 13. Declarations anywhere.
> 
> Yeah.
> 
> 14. Control flow statements, control structures: [ various ]
> 
> I have some rather heretical thoughts on this subject. I'll make them
> clear in another message. (Remember that this isn't Ada. Given an
> infinite loop ... endloop, if, and break, you don't need to provide
> a terminating loop as a basic construct. Define it instead as a standard
> macro. Ada's infinite variety of control structures is awful.)

Mine are even more heretical.  I insist on goto, and frequently terminate
a loop by jumping out of it.  Spaghetti algorithms call for spaghetti code,
and I have lots of them.  Structured programming can cause huge
inefficiencies, as well as being harder to understand.
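A sketch of terminating nested loops by jumping out of them in C, where `goto` leaves both loops at once without flag variables. C has no labeled blocks or multi-label `end`, so the label marks a point after the loops instead; `find2` is a hypothetical function.

```c
/* Find the first (row, col) with m[row][col] == target.  Returns 1 if
   found, 0 otherwise; one goto exits both loops. */
static int find2(int m[][4], int rows, int target, int *row, int *col)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < 4; j++)
            if (m[i][j] == target) {
                *row = i;
                *col = j;
                goto found;
            }
    return 0;
found:
    return 1;
}
```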

> 15. Structure and code generation rules: Variables are in memory in the
>     order of declaration.
> 
> Yeah. I very much want more control over stack allocation and control
> flow than in C. This is not dealt with by any current language and needs
> a lot of thought. One idea I've been considering is replacing function
> types with statement types. This makes setjmp/longjmp, multiple function
> entry points, and various other techniques much cleaner. The problem is,
> once again, how and when to allocate stack variables.

DO NOT INSIST ON PASSING ARGUMENTS WITH STACKS.  Register arguments are
frequently better, and there are numerous other ways, such as argument
arrays.  There are situations where stacks are the way to do it, but 
memory references in general should be avoided where possible.

			.......................

> 16. Basic types: int bits, uint bits
> 
> I disagree. The basic types should be those types that the machine can
> handle quickly. The language must be efficient! It's perfectly fine to
> have a standard notation for ``a type long enough to handle N bits'' or
> ``how many bits are in type X?'' but the language should not make
> restrictions on the size of basic types.

A type need not even consist of adjacent elements.  If a string requires
a beginning address and a length, the pair is the designator of the string.
It may or may not be desirable to have the indices in adjacent memory
locations.  An array descriptor would have the location of the 0,0, ..., 0
element, the dimensions, and if necessary the storage locations; these need
not be adjacent, and some of this information can be shared.
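A sketch of such an array descriptor in C: the address of element (0, 0), the extents, and the distance between neighbouring elements in each dimension. Because access goes through the strides, the elements need not be adjacent, and two descriptors can share one block of storage (here a transpose). The struct and functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical 2-D descriptor. */
struct desc2 {
    double   *origin;     /* address of element (0, 0) */
    size_t    dim[2];     /* extents */
    ptrdiff_t stride[2];  /* element distance between neighbours */
};

static double *elem(const struct desc2 *d, size_t i, size_t j)
{
    return d->origin + (ptrdiff_t)i * d->stride[0] + (ptrdiff_t)j * d->stride[1];
}

/* The transpose shares the storage; only the descriptor changes. */
static struct desc2 transpose(struct desc2 d)
{
    struct desc2 t = { d.origin, { d.dim[1], d.dim[0] },
                       { d.stride[1], d.stride[0] } };
    return t;
}
```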

> (Then again, every case in which portability takes second place to
> efficiency must be carefully considered and well documented. Two issues
> along these lines are bit sizes and the semantics of mod. As I feel very
> strongly that the second should be portable, I shouldn't assume that
> nobody feels the same way about the first. Then again, wouldn't a
> standard notation for your ``int 8'' be enough?)
> 
> What about characters? What about floating-point types, which many
> machines support better than ints? What about Ada-like fixed-point
> types?

It is almost impossible to get full portability on anything other than
integer arithmetic, and even here there are problems.  

			.......................

> 19. User-defined precedence.
> 
> This is yet another syntactic preprocessing problem. Remember that the
> language should be readable!

I suspect that many of the problems are with precedence.  I am not sure
that we would not be better off without trying to make it rigid.  Some of
the precedences in C are gotten wrong by just about everybody.  We could
possibly use numbered parentheses to get around it.
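A classic example of a C precedence that trips people up: `==` binds tighter than `&`, so a test written without parentheses means `x & (1 == 0)`, which is always 0. GCC and Clang warn about it with `-Wparentheses`. The function names are hypothetical.

```c
/* Parses as x & (1 == 0), i.e. x & 0: always 0. */
static int is_even_wrong(unsigned x) { return x & 1 == 0; }

/* What was meant. */
static int is_even_right(unsigned x) { return (x & 1) == 0; }
```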

> 20. Parameter passing: [ various ideas ]
> 
> This has to be dealt with very carefully. I like C's solution: it's clean
> while allowing every trick Ada can do. A general principle here (which
> you appear to disagree with) is that the form of a function call can
> make clear the fact that a variable is not modified.

But it is clumsy and slow.  Now if we had a decent notation, so a function
could return a list of results (NOT a struct), we could get around this.
But trying to keep functions from having side effects is a losing proposition.

> 21. Classes equal structures. Inheritance is just including one structure
>     in another. Function arguments are really structures.

Classes equal typedefs.  Structures are needed for more complicated situations.
DO NOT insist on function arguments being structures; more time can be wasted
by forming the structure than by computing the function.  A list of values is
more general than a structure by far, and the list need not be in consecutive
locations in any way.

			......................

> > The general goal
> > is to make it both one step above assembly language and completely extendable.
> 
> And modular. And clean. And robust. And likable, even fun to use!

All of this can be done, but not portably.  Good code can usually at most
be semi-portable.

------------------------------------------------------------------------------

Another point that I wish to address is what I call the usurpation of notation.
There are many somewhat standard uses of symbols in mathematics which are used
in languages such as C for totally different meanings.  The two most flagrant
ones here are | and ^.  I know of several uses of | in mathematics, none of 
which is "or".  The notation by Backus was a long vertical bar, and it was used
in the sense of || in C.  The most common use of vertical lines in mathematics
is for absolute value, and I believe it should have this meaning in programming
as an overloaded operator.  The use in mathematics extends for well over 100
years.

There are even other uses of ^ in CS before C, none of which was exclusive or.
This taking of symbols and defining their use to be something else because the
inventors of the language are not sufficiently knowledgeable is a bad idea.
Fortran designers avoided this and made it clear that they used * and ** because
these were not already used for something else, and what they wanted to use was
not available.  I suggest that we make every effort to avoid using symbols which
have other meanings.  Whenever mathematical notation disagrees with that of 
applications, it is usually the mathematics that got there first.

Also, make sure that the language is not restricted so that only an idiot can
stand it.  I believe it is possible to produce a language in which good 
programming can be done.

Something to keep in mind is that people will find ways to do things that
you have not thought of.  So it is necessary to allow computer objects to
be used as bit strings, to allow bitwise operations on floating point 
numbers, to allow the use of a number as something other than the language
intended.
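As a sketch of treating a floating point number as a bit string, C can copy a double's bits into a 64-bit integer with memcpy, operate on them, and copy them back. This assumes IEEE 754 double precision, so bit 63 is the sign bit. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumes double is 64-bit IEEE 754 binary64. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "64-bit double assumed");

#define SIGN_BIT ((uint64_t)1 << 63)

/* Copy the bits out; memcpy avoids the aliasing problems of a pointer cast. */
static uint64_t double_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Copy the bits back into a double. */
static double bits_double(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Bitwise operations on a floating point value. */
static double flip_sign(double x)   { return bits_double(double_bits(x) ^ SIGN_BIT); }
static double abs_by_bits(double x) { return bits_double(double_bits(x) & ~SIGN_BIT); }
```

flip_sign(2.5) gives -2.5 and abs_by_bits(-7.25) gives 7.25.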

Furthermore, the programmer may know things the compiler can use, such as
execution frequencies.  The programmer may have a good reason for keeping
something in a register, or even for insisting that it be stored, which is
hard for the compiler to figure out.  I have a natural example of a recursive
program in which several registers should be kept across recursions.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

machaffi@fred.cs.washington.edu (Scott MacHaffie) (02/20/90)

In article <4489:05:14:19@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>14. Control flow statements, control structures: [ various ]
>
>I have some rather heretical thoughts on this subject. I'll make them
>clear in another message. (Remember that this isn't Ada. Given an
>infinite loop ... endloop, if, and break, you don't need to provide
>a terminating loop as a basic construct. Define it instead as a standard
>macro. Ada's infinite variety of control structures is awful.)

Unconditional loops have a serious problem: you have to read all of the
code inside the loop to find out when (or if) it terminates. Replacing
while and for with loop would be a bad idea. Even providing loop
means that people will use it and stick "end loop" inside the loop
(this happens in Ada).

The advantage of a while/for loop is that the terminating condition
(or at least the standard terminating condition) is easy to find.
Then, only exceptional terminating conditions are inside the loop.
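Here is a sketch of Dan's proposal in C terms. It assumes only an infinite loop, if, and break as primitives, so the terminating loop becomes a macro. LOOP, WHILE, sum_to and sum_to_loop are hypothetical names. The second function shows the style MacHaffie objects to, where the exit test sits inside the body.

```c
/* The only primitive loop is the infinite one. */
#define LOOP for (;;)

/* A terminating loop defined as a macro over LOOP, if and break.
   The body follows the else, so break and continue in the body
   still behave as in an ordinary while loop. */
#define WHILE(cond) for (;;) if (!(cond)) break; else

static int sum_to(int n)
{
    int i = 1, total = 0;
    WHILE (i <= n) {            /* exit condition visible at the top */
        total += i;
        i++;
    }
    return total;
}

static int sum_to_loop(int n)
{
    int i = 1, total = 0;
    LOOP {
        if (i > n)              /* exit condition buried in the body */
            break;
        total += i;
        i++;
    }
    return total;
}
```

Both functions return 55 for n = 10. In sum_to the condition is at the top, where a reader looks for it. In sum_to_loop the reader has to find it by reading the body.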

			Scott MacHaffie

kjj@varese.UUCP (Kevin Johnson) (02/20/90)

In article <4623:05:31:06@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>In article <12507@mcdphx.phx.mcd.mot.com> kjj@varese.UUCP (Kevin Johnson) writes:
>> Rhetorical question: Aren't you talking about C++?
>Of course not. C++ isn't even close to perfect.
>Isn't there some change you'd like to make to C++ so that you'd like
>programming even better than you do now? Fine, say so. Then iterate.
>Hopefully almost everyone's wishes will be synthesized into this new,
>still-to-be-named language. (Now there's a name: FOO. Naaah, people
>might remember what foo stands for, and lots of young urban CS types
>will think it stands for something object oriented.)

I don't want to get in a flaming war - but that's an awfully long response
to a rhetorical question :-)
Oh well, that's what I get for not putting on a smily face when I ask a
rhetorical question :-|

>> Semi-rhetorical question: What would be this language's intended use?
>Similar to C. It will have the ``low-level'' features of C so that
>it's appropriate for systems programming, but there's no particular
>focus. (I use UNIX C for complex numerical programming, so I may be
>biased.)

>> 1. How about string operators.
>> 	I hate handling allocing of space for something silly like strings...
>This is mainly a library problem (though a good syntax helps).

A good syntax is the crux of the biscuit (to quote a favorite Zappaism).

>> 2. Ability to dynamically define new operators
>Expand. What exactly do you want? We're not talking p-code, you know.
>Are you looking for something that can't be implemented on top of the
>language?

I realize 'We're not talking p-code'...
How about something similar in flavor to OO methods.
It doesn't have to be as lean and mean as the core operators, but having
the ability to do it would be extremely useful...
BTW, doesn't this smell like the string-operator point(s) brought up
earlier?  A good syntax helps...

>> 3. Ability to use existing C libraries and headers.
>At least to interface with the loader the same as other languages. As
>for headers: one of the first standard applications will be a program to
>convert C function prototypes to this language. (Having the same macro
>processing is too much to ask, because C's macro processor is so
>limited. But most libraries do fine with just the function interface.)

I agree.

>> 	Seriously, I would consider the ability to link in existing
>> 	libraries, one way or another, an absolute must.
>I agree.

Well, now that that's over...
Some of the other replies to your original article mentioned a language
without semicolons, with indentation providing the information about
loop bodies, etc...  Hear, hear!  This feature is extremely cheap
to provide.

In reference to article <4489:05:14:19@stealth.acf.nyu.edu>:

In general you have my vote (for what it's worth) on your responses
in the article.  The following items prompted me to respond:

>Is there anyone out there who really wouldn't like loop ...
>end/endloop/pool, etc.?
My own personal feeling is that they are contrary to human nature.
Well, maybe not that bad, but... (maybe)

>Anyway, I favor a syntax that doesn't depend on lines or indentation:
>otherwise it's too easy to make syntax errors.

The same can, most definitely, be said of the traditional C form...

raymond@twinkies.berkeley.edu (Raymond Chen) (02/20/90)

[I regret that I have but one line to give for my...MUNCH]

One of the few things I like about Pascal is its rigid typing.

If I have defined [using C-style notation]

typedef int xcoord;
typedef int ycoord;
typedef int attrib;

void put_as_at(char c, attrib a, xcoord x, ycoord y) { ... }

then it would be nice if the compiler would flag

{   xcoord x; ycoord y; char c; attrib a;
    put_as_at(y,x,a,c);
}

as potentially erroneous (A warning is fine).  It's amazing how many
stupid errors are caused by passing parameters to a function in the
wrong order.  (And yes, of course, there should be a way to tell
the compiler "No, really, I know what I'm doing, trust me.")
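Plain C typedefs are only aliases, so the swapped call above compiles silently. One workaround is to wrap each coordinate in a one-member structure, which makes each one a distinct type. This sketch assumes the extra `.v` at each use is acceptable. The recording globals are hypothetical stand-ins for a real display routine.

```c
/* Distinct types: a ycoord can no longer be passed where an xcoord
   is expected, because these are different struct types. */
typedef struct { int v; } xcoord;
typedef struct { int v; } ycoord;
typedef struct { int v; } attrib;

/* Hypothetical stand-in for the real output: remember the arguments. */
static char last_c;
static int last_a, last_x, last_y;

static void put_as_at(char c, attrib a, xcoord x, ycoord y)
{
    last_c = c;
    last_a = a.v;
    last_x = x.v;
    last_y = y.v;
}
```

With these definitions, `put_as_at(c, a, y, x)` is a compile-time error rather than a warning. The "trust me" escape hatch is an explicit rewrap such as `(xcoord){ y.v }`, a C99 compound literal.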

Would also be fun if I could invoke the function above as

	put c as a at (x,y)
	^^^   ^^   ^^       <- these guys are the function name

(In the never-ending quest to make pseudo-code a proper computer language!)

As for using indentation to indicate block structure:  This can lead
to ridiculous code when your indentation marches off the edge of the
paper.  It also could create entries into the Obfuscated [language-name]
Code Contest like this:

foo(a,b,c)
	if blah
		while blah
			for blah
				if blah
					for blah
						while blah
							if blah
								for blah
									if blah
										while blah
											if blah
												grumble
		blurfle		// guess what indentation level this is at!
bar(x,y,z)			// this too.  Is it a function call or
				// a function declaration?


Apart from having immense fun with indentation, it also causes problems
if you cut and paste a clump of code from one place to another if the
destination has a different indentation level from the source.

If my memory serves me right, this experiment with "indentation determines
block structure" was used in a Pascal dialect many years ago.  I don't
remember what eventually happened to it.
--
 raymond@math.berkeley.edu         mathematician by training, hacker by choice

poser@csli.Stanford.EDU (Bill Poser) (02/20/90)

In article <2346@castle.ed.ac.uk> nick@lfcs.ed.ac.uk (Nick Rothwell) writes:
>
>You're kidding, right?. If you're going to mess around with antiquated
>low-level languages like C, I don't see the point of bringing yet
>another one into the world. Look at the functional languages like ML,...

I don't quite agree. It's true that there are too many low-level
languages (and there probably always will be), but there is a role
for languages of this type. Many of the new-fangled languages are either
nice for limited sorts of tasks (e.g. logic programming languages)
or carry with them a lot of overhead and/or "protection" from low-level
aspects of the machine. However much you may like ML or Prolog or
whatever, there is a role for system programming languages, and C
is one of the best. Nonetheless, it isn't perfect, so it makes sense
to design improved C-like languages.
Perhaps eventually some of the nicer higher-level languages will prove
to be good for the sorts of applications that C is used for, but
that hasn't yet proved to be the case, has it?

Another point to consider is how easy it is to try out innovations.
Languages that are highly developed in certain directions may be
quite difficult to modify in other directions. Lower-level languages
like C are often better testbeds for innovations. Consider, for example,
the sources of some programming language innovations. Pattern matching
and goal-directed evaluation, as in ICON, have their origin in SNOBOL
(and to some extent in still more primitive languages like COMIT).
SNOBOL's structuring is poor even by the standards of the day (compare
ALGOL) but it led to some important and interesting ideas. Similarly,
object-oriented programming started off with Simula, a language that
in other respects probably wasn't state-of-the-art. And look at how
many innovations come from LISP. (To ward off the claim that LISP
is a high-level language, let me point out that while it was innovative
from the beginning, LISP was for many years quite low level, in terms,
for example, of the primitive nature of its control structures, its
lack of aggregate data structures other than lists, and,
to take a stand on a religious issue, the lack of useful syntax.
It is only in recent years that LISP has acquired more reasonable
control structures and other modern conveniences.)

freek@fwi.uva.nl (Freek Wiedijk) (02/20/90)

In article <111355@ti-csl.csc.ti.com> gateley@m2.csc.ti.com (John Gateley)
writes:
>In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
><(3)	Block structure indicated by indentation level:
><...
>Now, just take it a little further: get rid of all infix notation,
>and let "blocks" be denoted by ( and ) and you get:
>  ...
>and you have achieved a truly simple easy to use syntax where you
>dont have to worry about indentation (the editor does it for you),
>and programs which manipulate source code are much much easier to
>write.

Ehrm, no, I don't like this, because it is too verbose:

% cat foo.foo
while a!=b
	int q
	q=z*5
	r+=foo(q)
% cat foo.sch
(begin
  (while (!= a b)
    (let ((q integer))
      (= q (* z 5))
      (+= r (foo q)))))
% wc -c foo.*
      36 foo.foo
      92 foo.sch
     128 total
% bc
scale=2
92/36
2.55

Your solution has two-and-a-half times as many characters!  In my
opinion the main advantage of C with respect to Pascal, is that C
enables you to write "int" where Pascal forces you to say "integer" :-)

Also, I don't like this amount of parentheses:

      (+= r (foo q)))))
                  ^^^^^

I know that you can let the editor handle it, but it still confuses me.

--
Freek "the Pistol Major" Wiedijk                  Path: uunet!fwi.uva.nl!freek
#P:+/ = #+/P?*+/ = i<<*+/P?*+/ = +/i<<**P?*+/ = +/(i<<*P?)*+/ = +/+/(i<<*P?)**

kjj@varese.UUCP (Kevin Johnson) (02/21/90)

In article <1990Feb20.025947.16211@agate.berkeley.edu> raymond@math.berkeley.edu (Raymond Chen) writes:
>Apart from having immense fun with indentation, it also causes problems
>if you cut and paste a clump of code from one place to another if the
>destination has a different indentation level from the source.

Surely you jest...

peter@ficc.uu.net (Peter da Silva) (02/21/90)

In article <1990Feb20.025947.16211@agate.berkeley.edu> raymond@math.berkeley.edu (Raymond Chen) writes:
> 	put c as a at (x,y)
> 	^^^   ^^   ^^       <- these guys are the function name

> (In the never-ending quest to make pseudo-code a proper computer language!)

Sounds like Smalltalk. Except then it'd be:

	Put: c as: a at: x@y
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'

nick@lfcs.ed.ac.uk (Nick Rothwell) (02/21/90)

In article <12336@csli.Stanford.EDU>, poser@csli (Bill Poser) writes:
>I don't quite agree. It's true that there are too many low-level
>languages (and there probably always will be), but there is a role
>for languages of this type. Many of the new-fangled languages are either
>nice for limited sorts of tasks (e.g. logic programming languages)
>or carry with them a lot of overhead and/or "protection" from low-level
>aspects of the machine. However much you may like ML or Prolog or
>whatever, there is a role for system programming languages, and C
>is one of the best.

True; but the original article said something about programmers being
able to agree on the nice features of a language; whereas what you're
saying above (and I agree) is that this will never happen, since (for
example) I'm going to be hacking away in C on my Mac for a long while
yet, even though I think C is dreadful and that programming language
design has moved on a long way.

I think, though, that any serious time and effort on designing a
better C-style language would be better spent getting decent modern
languages to a state of maturity.

		Nick.
--
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcvax!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
		       ...als das Kind, Kind war...

siebren@piring.cwi.nl (Siebren van der Zee) (02/21/90)

gateley@m2.csc.ti.com (John Gateley) writes:

>In article <12507@mcdphx.phx.mcd.mot.com> kjj@varese.UUCP (Kevin Johnson) writes:
>>1. How about string operators.
>>	I hate handling allocing of space for something silly like strings...

>But, string sizes are not known at compile time, and so must be handled
>by the heap (i.e. alloc).

Right. Now if you're gonna put dynamic allocation in your language
anyway, don't forget to handle "automatic" growing of the stacks
in multithreaded environments.

This cannot be done by the operating system, since the virtual
memory just above the top of the stack that needs to be grown may
be used by another thread's stack.

The compiler can do this by checking at each procedure invocation
that the stack is at least big enough for this frame, and if not,
allocate a stack frame on the heap.  If this allocation fails, you
have a stack overflow. (I did something similar in a multithreading
package for the Atari ST.)
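The check Siebren describes can be simulated in ordinary C. This sketch assumes fixed 256-byte segments and frame sizes that are multiples of the platform's strictest alignment. push_frame and pop_frame are hypothetical names for the code a compiler would emit in each procedure prologue and epilogue.

```c
#include <stddef.h>
#include <stdlib.h>

#define SEG_SIZE 256   /* assumed segment size */

/* One heap-allocated piece of a growable stack. */
struct segment {
    struct segment *prev;          /* segment we grew from */
    size_t used;                   /* bytes of data[] in use */
    unsigned char data[SEG_SIZE];
};

/* Procedure entry: ensure room for a frame of `size` bytes, starting a
   new heap segment if the current one is too full. Returns NULL on
   stack overflow (frame too large, or malloc failed). */
static void *push_frame(struct segment **top, size_t size)
{
    struct segment *s = *top;
    if (size > SEG_SIZE)
        return NULL;
    if (s == NULL || SEG_SIZE - s->used < size) {
        struct segment *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->prev = s;
        n->used = 0;
        *top = s = n;
    }
    void *frame = s->data + s->used;
    s->used += size;
    return frame;
}

/* Procedure exit: release the frame, freeing the segment once empty. */
static void pop_frame(struct segment **top, size_t size)
{
    struct segment *s = *top;
    s->used -= size;
    if (s->used == 0) {
        *top = s->prev;
        free(s);
    }
}
```

Because each new segment comes from malloc, no thread needs the addresses just above another thread's stack. Only a failed malloc, or a frame larger than a segment, is reported as overflow.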

You can also take this message as a funny way to try to convince
you that you're probably not going to succeed in designing a language
that will please everybody. I guess the original poster is probably
already convinced by messages suggesting to implement Lisp, C++,
ML, and what-have-you-there :-)

	Siebren, siebren@cwi.nl

gateley@m2.csc.ti.com (John Gateley) (02/21/90)

In article <447@fwi.uva.nl> freek@fwi.uva.nl (Freek Wiedijk) writes:
>In article <111355@ti-csl.csc.ti.com> gateley@m2.csc.ti.com (John Gateley)
>writes:
>> [Introduces Lisp style syntax]
>Ehrm, no, I don't like this, because it is too verbose:
>
>[character counts deleted]
>
>Your solution has two-and-a-half times as many characters!  In my
>opinion the main advantage of C with respect to Pascal, is that C
>enables you to write "int" where Pascal forces you to say "integer" :-)

If you are worried about character counts, you can abbreviate
the words I spelled out in my example, that takes care of some.
However, as long as the verbosity does not clutter the language (as is
the case with COBOL, though some will probably argue that), I feel that
it should not be an issue. You can always custom build an editor with
macros to do your typing for you.

>Also, I don't like this amount of parentheses:
>      (+= r (foo q)))))
>I know that you can let the editor handle it, but it still confuses me.

This is the main complaint I have heard with Lisp style syntax. However,
after a few sessions with the editor to learn how to use it, the confusion
goes away. Each paren has a matching paren; that's not confusing. It's wondering
which paren goes with which when you see )))))) that makes you feel confused.
However, with automatic indenting (no human mistakes), things line up
nicely, and with the editor to jump from any paren to its partner, life
becomes easy.
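The editor command John relies on, jumping from a paren to its partner, is a simple depth count. This is a minimal sketch with match_paren as a hypothetical name. It ignores parens inside strings and comments.

```c
#include <stddef.h>

/* Given the index of '(' or ')' in s, return the index of its partner,
   or -1 if pos is not a paren or the paren is unbalanced. Scans forward
   from '(' or backward from ')', counting nesting depth. */
static long match_paren(const char *s, size_t len, size_t pos)
{
    int step;
    char open, close;
    long depth = 0;

    if (pos >= len)
        return -1;
    if (s[pos] == '(') {
        step = 1;  open = '(';  close = ')';
    } else if (s[pos] == ')') {
        step = -1; open = ')';  close = '(';
    } else {
        return -1;
    }

    for (long i = (long)pos; i >= 0 && i < (long)len; i += step) {
        if (s[i] == open)
            depth++;
        else if (s[i] == close && --depth == 0)
            return i;
    }
    return -1;
}
```

For the 14-character string `(+= r (foo q))`, match_paren at index 0 returns 13, and at index 13 it returns 0.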

John
gateley@m2.csc.ti.com

jlg@lambda.UUCP (Jim Giles) (02/21/90)

From article <22569:05:10:24@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> [...]     Take C as a starting point for good ideas and feel free to
> use parts of any other language.  [...]

I'm sorry, but I can't find any good ideas in C.

J. Giles

poser@csli.Stanford.EDU (Bill Poser) (02/21/90)

In article <2386@castle.ed.ac.uk> nick@lfcs.ed.ac.uk (Nick Rothwell) writes:
>
>True; but the original article said something about programmers being
>able to agree on the nice features of a language; whereas what you're
>saying above (and I agree) is that this will never happen,

Yes, I agree. Different languages are good for different tasks,
and even within a given area there are real differences in individual
taste. Attempts to design a language that does everything generally
seem to produce unpleasant results (predictable swipe at Ada omitted
for brevity.)

jlg@lambda.UUCP (Jim Giles) (02/21/90)

From article <4489:05:14:19@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> [...]
> A line-based syntax also feels very dirty: there are exceptions for
> multiple statements on a line, exceptions for single-statement
> structures, etc.

I don't understand what exceptions you are referring to.  If the recognized
statement separator is semicolon, then make the end_of_line character an
alias for semicolon.  Now you have a line-based syntax which is (almost)
identical to the non line-based version you started with.  Productivity
experiments have shown that people work better if the end_of_line is
also the end_of_statement and the end_of_comment marker.  (And, don't
bring up compound statements at this point.  Experiments have also
shown that people work better if flow control is _not_ done with 
so-called compound statements.)
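Treating end_of_line as an alias for semicolon is easy to express at the lexical level. The sketch below only counts statements, and count_statements is a hypothetical name. It ignores strings, comments and line continuation, which a real lexer would have to handle.

```c
/* Count statements when '\n' is an alias for ';'. Empty statements
   (blank lines, ";;", a ';' right before a newline) are skipped, so
   line-based and semicolon-based code count the same. */
static int count_statements(const char *src)
{
    int count = 0;
    int in_stmt = 0;   /* non-blank text seen since the last separator? */

    for (const char *p = src; ; p++) {
        char c = *p;
        if (c == ';' || c == '\n' || c == '\0') {
            if (in_stmt)
                count++;
            in_stmt = 0;
            if (c == '\0')
                break;
        } else if (c != ' ' && c != '\t') {
            in_stmt = 1;
        }
    }
    return count;
}
```

Both "a = 1\nb = 2\n" and "a = 1; b = 2;" count as two statements. Blank lines and a trailing semicolon add nothing.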

> 8. Eliminate arrays in favor of pointers and macros.
> 
> Say what? You need some way to express the concept of a contiguous
> region of memory. That's what arrays are for. How do pointers cleanly
> express multidimensional arrays? The language should know something
> about arrays, even if just for efficiency.

Hear, hear!!!
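To make the multidimensional-array point concrete: with only pointers, the programmer must carry the row length around and do the index arithmetic by hand. An array type lets the compiler do that arithmetic. The AT macro and both functions are hypothetical names in this sketch.

```c
#include <stddef.h>

#define COLS 4

/* Pointer-only style: the caller supplies the row length and the macro
   does the index arithmetic an array type would have done. */
#define AT(p, ncols, i, j) ((p)[(size_t)(i) * (size_t)(ncols) + (size_t)(j)])

static long sum_ptr(const int *p, size_t nrows, size_t ncols)
{
    long s = 0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            s += AT(p, ncols, i, j);
    return s;
}

/* Array style: the row length is part of the parameter's type, so the
   compiler computes the address of a[i][j] itself. */
static long sum_array(size_t nrows, int a[][COLS])
{
    long s = 0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}
```

For a 3 by 4 array holding 1 through 12, both functions return 78, and AT(p, 4, i, j) addresses the same element as m[i][j].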

> 13. Declarations anywhere.
> 
> Yeah.

I third that motion!  Compilers are multipass these days everywhere.
Why not take advantage of it?

> 14. Control flow statements, control structures: [ various ]
> 
> I have some rather heretical thoughts on this subject. I'll make them
> clear in another message. (Remember that this isn't Ada. Given an
> infinite loop ... endloop, if, and break, you don't need to provide
> a terminating loop as a basic construct. Define it instead as a standard
> macro. Ada's infinite variety of control structures is awful.)

I disagree.  Ada's control constructs are among the few things they did well.
Furthermore, they aren't all that complicated - they are completely defined
in the last 7 pages of chapter 5 in the Ada standard document (and, _MOST_
of that text is examples).

> [...]
> I agree with C's philosophy of only allowing bit packing inside
> structures. Other packing methods would really mangle the concept of
> pointers.

In languages with recursive data types, direct dynamic memory (like 
ALLOCATABLE in Fortran 90), and type coercion, I've never seen the 
need for pointers _AT_ALL_!!  So, rejecting something because it
interferes with pointers is a null issue.

> 20. Parameter passing: [ various ideas ]
> 
> This has to be dealt with very carefully. I like C's solution: it's clean
> while allowing every trick Ada can do. A general principle here (which
> you appear to disagree with) is that the form of a function call can
> make clear the fact that a variable is not modified.

It is a good idea for side-effects to be visible.  On the other hand, it
is also a good idea (and often more important) for aliasing (or the lack
thereof) to be visible.  C's solution is anything but 'clean'.
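Here is a small illustration of both halves of this exchange, with add_twice as a hypothetical function. In C, the & at the call shows that x may be modified. Nothing at the call shows that both arguments name the same object.

```c
/* Adds *src to *dst twice. If dst and src alias, the second read of
   *src sees the first update, so the result changes. */
static void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
}
```

Starting from x = y = 1, add_twice(&x, &y) leaves x at 3, but add_twice(&x, &x) leaves it at 4. C99 later added the restrict qualifier so that a programmer can promise there is no aliasing. That promise is written in the declaration, not at the call.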

J. Giles

scott@bbxsda.UUCP (Scott Amspoker) (02/21/90)

In article <14241@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <22569:05:10:24@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
>> [...]     Take C as a starting point for good ideas and feel free to
>> use parts of any other language.  [...]
>
>I'm sorry, but I can't find any good ideas in C.

Well, here are a few ideas in C (although not unique to C):

     comments
     if-then-else
     looping
     arrays
     structures

*sigh*  I was thinking about joining in on this discussion but now I think
I'll pass.

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott

ted@nmsu.edu (Ted Dunning) (02/21/90)

In article <12520@mcdphx.phx.mcd.mot.com> kjj@varese.UUCP (Kevin Johnson) writes:

   In article <1990Feb20.025947.16211@agate.berkeley.edu> raymond@math.berkeley.edu (Raymond Chen) writes:
   >Apart from having immense fun with indentation, it also causes problems
   >if you cut and paste a clump of code from one place to another if the
   >destination has a different indentation level from the source.

   Surely you jest...

actually what he meant is that when you take the cards from one part
of the deck and move them to another, you have trouble with
indentation.

--
	Offer void except where prohibited by law.

brnstnd@stealth.acf.nyu.edu (02/21/90)

In article <111357@ti-csl.csc.ti.com> gateley@m2.csc.ti.com (John Gateley) writes:
> >3. Ability to use existing C libraries and headers.
> This is truly a difficult problem, because you have to say: "whats so
> special about C, I want my <foo> libraries" where <foo> might be Ada,
> Lisp, PDP-11 assembler, or whatever.

No. Under UNIX, for example, one can without any trickery load Fortran,
Pascal, and C routines together. This is useful, though it does dictate
that the stack be used in particular ways.

With N languages running around it's impossible to write N^2 translators.

---Dan

xanthian@saturn.ADS.COM (Metafont Consultant Account) (02/21/90)

Better, make the language strictly postfix, give it exactly one
grouping operator, parentheses will do, and you get a regular syntax
and lots of help for the compiler.  See the on going discussion in
comp.lang.forth.  For operators where it makes sense, have a default
number of arguments, and also an optional compilation (overloading) to
handle the variable length list of arguments case.  This completely
eliminates questions of precedence, etc.

For example:

	a b +           is an expression for the sum of a and b.
	(a b c d) +     is an expression for the sum of a, b, c, and d.
and
        a b c Func1     is a fixed number of arguments function call
        (a b c d) Func2 is a variable number of arguments function call
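This is a minimal sketch of how such a postfix syntax might be evaluated. It assumes non-negative integer operands and only + and *. An operator applies to the last two values by default, or to every value in a group if it immediately follows the closing parenthesis. eval_postfix is a hypothetical name, and fixed-arity calls like Func1 are not modeled.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAXSTACK 64

/* Evaluate e.g. "a b +" (default arity 2) or "(a b c d) +" (whole group).
   Returns 0 on malformed input; a sketch, not a full parser. */
static long eval_postfix(const char *s)
{
    long stack[MAXSTACK];
    int sp = 0;              /* number of values on the stack */
    int groups[MAXSTACK];    /* stack height recorded at each '(' */
    int gp = 0;
    int closed = -1;         /* start of a group closed just before, else -1 */

    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;                                /* does not reset closed */
        } else if (isdigit((unsigned char)*s)) {
            char *end;
            if (sp == MAXSTACK) return 0;
            stack[sp++] = strtol(s, &end, 10);
            s = end;
            closed = -1;
        } else if (*s == '(') {
            if (gp == MAXSTACK) return 0;
            groups[gp++] = sp;
            s++;
            closed = -1;
        } else if (*s == ')') {
            if (gp == 0) return 0;
            closed = groups[--gp];
            s++;
        } else if (*s == '+' || *s == '*') {
            char op = *s++;
            int start = (closed >= 0) ? closed : sp - 2;   /* group or 2 args */
            if (start < 0 || start >= sp) return 0;
            long acc = stack[start];
            for (int i = start + 1; i < sp; i++)
                acc = (op == '+') ? acc + stack[i] : acc * stack[i];
            sp = start;
            stack[sp++] = acc;
            closed = -1;
        } else {
            return 0;                           /* unknown token */
        }
    }
    return (sp == 1) ? stack[0] : 0;
}
```

eval_postfix("2 3 +") returns 5 and eval_postfix("(1 2 3 4) +") returns 10. No precedence rules are needed, because the order of evaluation is the order of the tokens.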

--
xanthian@ads.com xanthian@well.sf.ca.us (Kent Paul Dolan)
Again, my opinions, not the account furnishers'.

kevin@argosy.UUCP (Kevin S. Van Horn) (02/22/90)

In article <14242@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>
>In languages with recursive data types, direct dynamic memory (like 
>ALLOCATABLE in Fortran 90), and type coercion, I've never seen the 
>need for pointers _AT_ALL_!!  So, rejecting something because it
>interferes with pointers is a null issue.
>

Would you care to expand on this?  I'm not sure what "direct dynamic memory"
is, for starters.

------------------------------------------------------------------------------
Kevin S. Van Horn            | The means determine the ends.
kevin@argosy.maspar.com      |

jlg@lambda.UUCP (Jim Giles) (02/22/90)

From article <628@bbxsda.UUCP>, by scott@bbxsda.UUCP (Scott Amspoker):
> In article <14241@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> [...]
>>I'm sorry, but I can't find any good ideas in C.
> 
> Well, here are a few ideas in C (although not unique to C):
> [...]

I'm sorry, I should have been more clear.  I can't find any good ideas
in C which aren't done as well or better (usually better) in many other
languages.  This includes languages which predate the invention of C.

> [...]
>      arrays
> [...]

C doesn't _have_ arrays!  It has a strange variant of pointers which
can (on rare occasions) simulate arrays in a way that is almost as
efficient and easy to read as arrays would have been.  Usually,
however, the simulated arrays are less efficient and more cumbersome
to use.
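One concrete form of Giles's complaint is that a C array passed to a function is silently adjusted to a pointer, so the callee loses the array's length. size_in_callee is a hypothetical name, and some compilers warn about exactly this use of sizeof.

```c
#include <stddef.h>

/* The parameter is declared as an array, but C adjusts it to int *,
   so sizeof reports the size of a pointer, not of ten ints. */
static size_t size_in_callee(int a[10])
{
    return sizeof a;
}
```

In the caller, sizeof of an int[10] is 10 * sizeof(int). Inside size_in_callee it is sizeof(int *).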

J. Giles

gateley@m2.csc.ti.com (John Gateley) (02/22/90)

In article <10979@saturn.ADS.COM> xanthian@saturn.ADS.COM (Metafont Consultant Account) writes:
>
>Better, make the language strictly postfix, give it exactly one

So, why is postfix better than prefix? 
One other poster mentioned postfix, and made the claim that it
was better than prefix as well, and I am curious why
y'all think so. (I'd take postfix over infix any day, but prefer
prefix because I am used to it.)

John
gateley@m2.csc.ti.com

cg@myrias.com (Chris Gray) (02/22/90)

In article <22569:05:10:24@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>So what do you want in a compiled, imperative, perhaps object-oriented
>language? Take C as a starting point for good ideas and feel free to
>use parts of any other language. Remember: This isn't Ada. If it gets
>too complicated, trash it. Simple is beautiful. Modular design is
>beautiful. And above all, remember that this is going to be a language
>people can actually like.

The first thing to decide is a bit more detail on what kind of language you
are after - many of the decisions about features are affected by that. Your
hope that one language will satisfy most programmers is pretty well doomed
to failure - programmers are much too particular. For example, the
description posted by Joseph H Allen represents a language I would be very
uncomfortable with.

One goal for ANY language is that it be quickly readable by anyone, whether
they are familiar with that class of language or not. Another goal for a
compilable language is that it be reasonably compilable. C's main problem
with compilation is that the syntax is so ambiguous that a single error
(try putting a semicolon after a function header in gcc!) can lead to
hundreds of error messages. Some of Mr. Allen's ideas would be even worse
in terms of error recovery (and, in my opinion, in terms of readability).

So as to contribute a different viewpoint to this discussion, let me try to
summarize my language. It has some problems, but I and others have found it
to be quite usable. The language is called "Draco" and it currently exists
only for prehistoric CP/M computers and for the Commodore Amiga. Its syntax
is somewhat like that of Algol68, but with much less overloading. Its
semantics is somewhat like C's, but it is much more strongly typed.

Draco is a very "dull" language - it uses regular rules for identifiers, has
pretty standard syntax and semantics, is only slightly extendible, and has
no especially interesting new ideas. It is very easy to parse, and fairly
simple to generate good code for. Most people have no trouble at all reading
it for the first time (given experience in C, Pascal, Algol, whatever).

Rather than try to give a grammar, I'll just type in a mishmash program
that tries to use most of the features.

#drinc:dos/libraries.g		/* include a system header */

extern fred(int i; short j; long k; char c; bool flag)float;

uint MAX_DISKS = 10;

proc hanoi(unsigned MAX_DISKS n; *char left, middle, right)void:

    if n ~= 0 then
	hanoi(n - 1, left, right, middle);
	writeln("Move disk ", n, " from ", left, " peg to ", right, " peg.");
	hanoi(n - 1, middle, left, right);
    fi;
corp;

proc doit()void:

    hanoi(5, "left", "right", "center");
corp;

proc matmult([*,*] float a; [*,*] float b; [*,*] float c)bool:
    register uint i, j, k;
    float sum;

    if dim(a, 2) ~= dim(b, 1) or dim(a, 1) ~= dim(c, 1) or
	dim(b, 2) ~= dim(c, 2)		/* did I get this right???? */
    then
	false
    else
	for i from 0 upto dim(a, 1) - 1 do
	    for j from 0 upto dim(b, 2) - 1 do
		sum := 0;
		...
	    od;
	od;
	true
    fi
corp;

/* actually all declarations have to be inside functions or before all
   functions, but this is only an example */

type
    array_t = [MAX_DISKS * 3, MAX_DISKS + 17] uint,
    element_t = unknown 100,			/* 100 bytes long */
    list_t = struct {
	*list_t l_next;
	element_t l_this;
    },
    thingType_t = enum {red, blue, green},	/* NOT just ints */
    otherThingType_t = union {
	*char ott_name;
	long ott_counter;
	*somethingorothertype ott_default;
    };
/* I won't try to do a user extendible type here - the compiler comes with
   a complex number package that I can use like: */

complex I = (0.,1.);
complex a, b, c;
*list_t BaseList;
[0b101] struct {
    *char name;
    thingType_t colour;
} words := {
    {"red", red}, {"blue", blue}, {"green", green}, {"white", red}, {"", red}
};

proc insert(element_t e)void:
    register *list_t l;

    l := new(list_t);				/* language construct */
    l*.l_next := BaseList;
    l*.l_this := e;
    BaseList := l;
corp;

proc constructs(**[2,3,4][5,6]*float youGottaBeKidding)void:
    thingType_t tt;
    *list_t l;
    ulong n;

    a := b + c;
    if re(a) < 0.0 then
	writeln("a = ", a);
    elif a = complex(1.1, 1.1) then
	readln(a, b, c);
    fi;
    l := BaseList;
    while l ~= nil do
	...
	l := l*.l_next;
    od;
    case tt
    incase red:
	writeln("It was red");
    incase green:
	writeln("It was green");
    incase blue:
	writeln("It was blue");
    default:
	writeln("Somebody boobooed!\(BEL)\(0x07)\(0x06+2)");
    esac;
    l := if tt = red then nil else l*.l_next fi;
    if tt = blue then return fi;
    while
	write("This is a prompt. Enter two integers: ");
	readln(i, j)
    do
	writeln("You entered ", i, '(', i:x:8, ':', i:b:32, ':', i:o:7, ")");
    od;
    n := (0xfdecba98 & 0o23745 >< 0b100000001) << ~n;
    l := pretend(128, *list_t) + 6 * sizeof(list_t);
corp;

Anyone who wants to try out the language and compiler is invited to go out
and buy an Amiga - the compiler, etc. are available freely.
-- 
Chris Gray		Myrias Research, Edmonton	+1 403 428 1616
	cg@myrias.COM          {uunet,alberta}!myrias!cg

djones@megatest.UUCP (Dave Jones) (02/22/90)

From article <14242@lambda.UUCP>, by jlg@lambda.UUCP (Jim Giles):
> Experiments have also
> shown that people work better if flow control is _not_ done with 
> so-called compound statements.)
> 


Please elaborate.


>> ...
>> I agree with C's philosophy of only allowing bit packing inside
>> structures. Other packing methods would really mangle the concept of
>> pointers.
> 
> In languages with recursive data types, direct dynamic memory (like 
> ALLOCATABLE in Fortran 90), and type coercion, I've never seen the 
> need for pointers _AT_ALL_!!  So, rejecting something because it
> interferes with pointers is a null issue.
> 


Huh? I think somebody missed the point. Those 'recursive data structures'
etc. are just teeming with pointers. Whether the programmer declares them,
or the compiler sneaks them in, the hardware still wants them to point to
properly aligned data.

brnstnd@stealth.acf.nyu.edu (02/22/90)

In article <1873@wrgate.WR.TEK.COM> daniels@teklds.WR.TEK.COM (Scott Daniels) writes:
> A pet peeve: why only decimal radix for floating point?

Yeah. (Anyone know why Ada stops at base 16?)
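This is a sketch of what a based floating point literal could mean. It assumes digits 0-9 then a-z for radixes up to 36, with no sign or exponent. digit_value and parse_radix are hypothetical helpers, not any language's actual literal syntax.

```c
/* Value of one digit character, or -1 if it is not a digit at all. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/* Convert e.g. "101.1" in radix 2 to a double. Stops at the first
   character that is not a valid digit for the radix. */
static double parse_radix(const char *s, int radix)
{
    double value = 0.0, scale = 1.0;
    int after_point = 0;

    for (; *s; s++) {
        if (*s == '.') {
            after_point = 1;
            continue;
        }
        int d = digit_value(*s);
        if (d < 0 || d >= radix)
            break;
        if (after_point) {
            scale /= radix;              /* next place value: radix^-k */
            value += d * scale;
        } else {
            value = value * radix + d;   /* Horner's rule for the integer part */
        }
    }
    return value;
}
```

parse_radix("101.1", 2) gives 5.5 and parse_radix("f.8", 16) gives 15.5. C99 later added hexadecimal floating constants such as 0x1.8p1, which equals 3.0, but no other radix.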

> Another thing I would like to be able to do is to indicate that a function
> is "pure" (only depends on its args),

I like this.
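For comparison, GCC and Clang later offered roughly this as a function attribute. This sketch assumes one of those compilers, hence the #ifdef __GNUC__ guard. PURE_FN, square and twice_square are hypothetical names. GCC's const attribute means the result depends only on the arguments. Its weaker pure attribute also allows reading global memory.

```c
/* Expand to GCC/Clang's attribute where available, else to nothing. */
#ifdef __GNUC__
#define PURE_FN __attribute__((const))
#else
#define PURE_FN
#endif

/* Declared "depends only on its args": no side effects, no memory reads. */
static int square(int x) PURE_FN;

static int square(int x)
{
    return x * x;
}

/* The compiler may evaluate square(n) once here and reuse the result. */
static int twice_square(int n)
{
    return square(n) + square(n);
}
```

The attribute is a promise from the programmer. The compiler does not check it.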

> On strings: SAIL had strings which consisted of a length and pointer part

Library problem. It should be kept in mind as a syntax issue.
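Here is a sketch of the SAIL-style representation in C, where the library rather than the programmer handles allocation. The lstring type and the lstr_ functions are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Length plus pointer; the bytes are not NUL-terminated. */
typedef struct {
    size_t len;
    char *ptr;
} lstring;

/* Make an lstring from a C string; on allocation failure, len is 0. */
static lstring lstr_from(const char *s)
{
    lstring r;
    r.len = strlen(s);
    r.ptr = malloc(r.len ? r.len : 1);
    if (r.ptr == NULL)
        r.len = 0;
    else
        memcpy(r.ptr, s, r.len);
    return r;
}

/* Concatenation allocates exactly len(a) + len(b) bytes. */
static lstring lstr_cat(lstring a, lstring b)
{
    lstring r;
    r.len = a.len + b.len;
    r.ptr = malloc(r.len ? r.len : 1);
    if (r.ptr == NULL) {
        r.len = 0;
        return r;
    }
    memcpy(r.ptr, a.ptr, a.len);
    memcpy(r.ptr + a.len, b.ptr, b.len);
    return r;
}

static void lstr_free(lstring *s)
{
    free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
}
```

Concatenating "foo" and "bar" gives a 6-byte string, and lstr_free releases it. Since the length is stored, the bytes may include NUL characters, which C's own strings cannot hold.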

> (1) How about some way to provide structs which have negatively indexed 
>     fields as well as positively indexed fields.

I'm not sure what you're looking for.

  [ struct layout ]

Must this be forced upon the compiler? Quality-of-implementation issues
are important but I can't imagine a portable program taking advantage of
this.

> (3) Allow "anonymous" incorporation of a structure (or union):

Yeah. JHA suggested ``inherit''---makes perfect sense to me.

  [ integral[1..29] ]

This seems reasonable.

  [ % ]

Would someone please show me an example of a real program that uses C's
% in a context where not knowing the sign would be okay? Until that
example shows up, this argument is purely facetious.

It's fine to have two portable operators with different results, like
Ada's rem and mod.
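A sketch of such a pair in C, assuming b != 0 (and not INT_MIN / -1); rem_trunc and mod_floor are hypothetical names. rem follows Ada's rem (result takes the sign of the dividend), mod follows Ada's mod (result takes the sign of the divisor). C89 left the rounding of / and % implementation-defined for negative operands, so rem_trunc corrects a floored result. C99 later fixed % to truncate, and the correction then never fires.

```c
/* rem: quotient truncated toward zero, result has the sign of a.
   C89 only guarantees a == (a/b)*b + a%b; if this compiler floors
   instead, the remainder is off by exactly b, so adjust it. */
int rem_trunc(int a, int b)
{
    int r = a % b;
    if (r != 0 && (r < 0) != (a < 0))
        r -= b;
    return r;
}

/* mod: quotient floored, result has the sign of b (like Ada's mod). */
int mod_floor(int a, int b)
{
    int r = rem_trunc(a, b);
    if (r != 0 && (r < 0) != (b < 0))
        r += b;
    return r;
}
```

With these, -7 rem 2 is -1 while -7 mod 2 is 1, on any conforming compiler.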

> >I agree with C's philosophy of only allowing bit packing inside structures. 
> But, it would be nice to have a packed vector of bits available somewhere
> (inside structures only would be fine with me).

Do you mean an actual array of bits? How would you integrate this with
the normal meaning of arrays?
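The integration problem shows up in how such a vector usually gets written in C today. A single bit has no address, so there is no &bits[i] and no pointer to an element, and every access has to go through a function or macro. A minimal sketch; bitvec and its functions are hypothetical names:

```c
#include <limits.h>
#include <stdlib.h>

/* A packed vector of bits stored in unsigned char. */
typedef struct {
    size_t nbits;
    unsigned char *bytes;
} bitvec;

/* Allocate room for nbits zeroed bits; returns 0 if allocation fails. */
int bitvec_init(bitvec *v, size_t nbits)
{
    size_t nbytes = (nbits + CHAR_BIT - 1) / CHAR_BIT;
    v->nbits = nbits;
    v->bytes = calloc(nbytes ? nbytes : 1, 1);
    return v->bytes != NULL;
}

void bitvec_free(bitvec *v)
{
    free(v->bytes);
    v->bytes = NULL;
}

/* Read bit i (no bounds check). */
int bitvec_get(const bitvec *v, size_t i)
{
    return (v->bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Set bit i to 0 or 1 (no bounds check). */
void bitvec_set(bitvec *v, size_t i, int value)
{
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));
    if (value)
        v->bytes[i / CHAR_BIT] |= mask;
    else
        v->bytes[i / CHAR_BIT] &= (unsigned char)~mask;
}
```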

> Type coercions:
>     Something between C's "forget all your type checking" and many other 
> languages' "you can't get there from here."  How about a coercion that
> has both `from' and `to' parts.

Wait a minute! C's weak typing has nothing to do with the overloading of
its conversion functions. Even Ada, with the strongest typing around,
has overloaded conversions. Merely introducing the above change wouldn't
do anything to C's type checking.

---Dan

brnstnd@stealth.acf.nyu.edu (02/22/90)

In article <10790@june.cs.washington.edu> machaffi@fred.cs.washington.edu.cs.washington.edu (Scott MacHaffie) writes:
> Unconditional loops have a serious problem: you have to read all of the
> code inside the loop to find out when (or if) it terminates. Replacing
> while and for with loop would be a bad idea. Even providing loop
> means that people will use it and stick "end loop" inside the loop
> (this happens in ada).

This is specious. Assume break and if; then while, do, for, and loop
are all equivalent, in the sense that each can be written purely
syntactically in terms of any of the others. Therefore (in line with
the goals of the language, to evolve in another thread) the simplest
construction wins. (There are several criteria for choosing ``loop''
as the simplest; but I don't think anyone will argue, so I won't
elaborate further.)
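To make the equivalence concrete in C terms, for (;;) can play the part of loop ... endloop. The sketch rewrites a while loop and a do loop using only the unconditional loop, if, and break; the function names are made up. A C for loop desugars the same way as while with the step placed at the end of the body, except that a continue in the body must still reach the step.

```c
/* while: the test sits at the top. */
int sum_while(int n)
{
    int i = 0, s = 0;
    while (i < n) { s += i; i++; }
    return s;
}

int sum_loop_as_while(int n)
{
    int i = 0, s = 0;
    for (;;) {                  /* loop ... endloop */
        if (!(i < n)) break;    /* the while test, at the top */
        s += i;
        i++;
    }
    return s;
}

/* do ... while: the test sits at the bottom, so the body runs once. */
int count_do(int n)
{
    int k = 0;
    do { k++; } while (k < n);
    return k;
}

int count_loop_as_do(int n)
{
    int k = 0;
    for (;;) {
        k++;
        if (!(k < n)) break;    /* the do-while test, at the bottom */
    }
    return k;
}
```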

---Dan

brnstnd@stealth.acf.nyu.edu (02/22/90)

In article <14242@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
  [ make syntax line-based ]

I agree! Let's make it column-based, too. And let's reserve a column for
indicating comments. Wow, I think we're onto something here. :-)

> Productivity
> experiments have shown that people work better if the end_of_line is
> also the end_of_statement and the end_of_comment marker.

Comments, yes. Statements, no. The studies you're referring to compared
line-based syntax to Pascal or C, where adding a semicolon after a block
or statement can cause a syntax error or change the meaning of the code.
Such syntactic problems disappear when loops are explicitly terminated.

A study in Computer Languages several years back found that terminated
loops ended up with by far the fewest errors per programmer.

> Ada's control constructs are among the few things they did well.

Given the lack of any macro facility, they did fine. They would have
done much better to provide a standard method to define new control
structures, then reduced the standard set to three or four with no
special cases.

> In languages with recursive data types, direct dynamic memory (like 
> ALLOCATABLE in Fortran 90), and type coercion I've never seen the 
> need for pointers _AT_ALL_!!  So, rejecting something because it
> interferes with pointers is a null issue.

Oh, yeah! In a language with tasks, automatic threading, and message
passing, I've never seen the need for semaphores _AT_ALL_!! C'mon, be
serious.

> It is a good idea for side-effects to be visible.

One solution is declaring ``pure'' functions. A more general solution is
to allow a block to list each outside variable explicitly. This leads to
the idea that the compiler should be able to automatically generate such
lists within the code.

> On the other hand, it
> is also a good idea (and often more important) for aliasing (or the lack
> thereof) to be visible.  C's solution is anything but 'clean'.

You have this hangup with aliasing... :-) I agree that there should at
least be a way to specify that parameters aren't aliased, and to include
that information in the function call syntax. This would take care of
most of the common cases.
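C eventually took part of this route: C99 added the restrict qualifier. It lets a parameter declaration promise that, for the duration of the call, the object is reached only through that pointer. The promise lives in the prototype, not at the call site, and the compiler does not check it. A sketch assuming a C99 compiler; vec_add is a made-up name:

```c
#include <stddef.h>

/* dst, a and b are promised not to overlap, so the compiler may keep
   values in registers or vectorize without re-reading a and b after
   each store to dst.  Calling this with overlapping arrays is undefined. */
void vec_add(size_t n, double *restrict dst,
             const double *restrict a, const double *restrict b)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```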

---Dan

preston@titan.rice.edu (Preston Briggs) (02/22/90)

In article <24349:04:46:47@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>Given the lack of any macro facility, they did fine. They would have
>done much better to provide a standard method to define new control
>structures, then reduced the standard set to three or four with no
>special cases.

Consider perhaps the Scheme ideal of lambda, if, call, set!, and catch
as a base, then adding all the rest of the control structure with macros.

>> In languages with recursive data types, direct dynamic memory (like 
>Oh, yeah! In a language with tasks, automatic threading, and message
>passing, I've never seen the need for semaphores _AT_ALL_!! C'mon, be
>serious.

It's not such a bad point.  ML (and other languages) with recursive data
types manage very nicely without (explicit) pointers.  Heap allocated data
is probably needed, but automatic type coercions aren't.

So, how about garbage collection?

khan@milton.acs.washington.edu (I Wish) (02/22/90)

In article <4489:05:14:19@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>14. Control flow statements, control structures: [ various ]
>
>[...]                      Remember that this isn't Ada. Given an
>infinite loop ... endloop, if, and break, you don't need to provide
>a terminating loop as a basic construct. Define it instead as a standard
>macro. Ada's infinite variety of control structures is awful.

This may be a nit-picky detail, but what's the difference whether a
terminating loop is a "basic construct" or a "standard macro"?  If it's
"standard," everyone using ALPAL will have it, and, I believe, use it.

If you're trying to discourage this variety of control structures, it
seems you'd want to discourage this sort of macro.... and whether it's
actually handled in the preprocessor or the compiler proper is an
efficiency-of-implementation detail that could be transparent.
(Ideally -- I sure *hope* no one wants to redefine "for" (: )
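The point can be seen with the C preprocessor. A terminating loop can be defined as a macro over an unconditional loop, if, and break, and after expansion the compiler sees ordinary code. WHILE is a hypothetical macro name; a break in the body still leaves the loop, and a continue returns to the test.

```c
/* A terminating loop built from for (;;), if and break. */
#define WHILE(cond) for (;;) if (!(cond)) break; else

int count_down(int n)
{
    int steps = 0;
    WHILE (n > 0) {
        n--;
        steps++;
    }
    return steps;
}
```

One practical cost of the macro route is that compiler diagnostics refer to the expanded for/if text, not to WHILE.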
-- 
"indecipherable strangers handing out inexplicable humiliation and an
 unidentified army of horsemen laughing at him in his head ..."
                                                           -- Douglas Adams
Erik Seaberg (khan@milton.u.washington.edu)

new@udel.edu (Darren New) (02/22/90)

In article <111706@ti-csl.csc.ti.com> gateley@m2.csc.ti.com (John Gateley) writes:
>So, why is postfix better than prefix? 

Well, for one it is much easier to parse. I would estimate it is about as 
much easier to parse compared to prefix as prefix is to infix.

Two, it is much more flexible.  For example, in FORTH there is a word 
(aka procedure, function, ...) called : (pronounced COLON :-) that
reads the next string from the input and starts compiling
a new function with its name being that string.  Much like defun in
LISP, except that the syntax is not fixed. With prefix, you must make
a distinction between functions that evaluate their arguments and
functions that do not (defun, cons, etc).  In postfix, the evaluated
arguments come first and the non-evaluated arguments come afterwards.

Of course, I've been working on my own language that is postfix and
completely syntax-free; this may make me see some of this stuff in
a prejudiced way. Maybe it is possible to make prefix as flexible as
postfix, but I don't know how off hand. 

		 -- Darren

mike@cs.arizona.edu (Mike Coffin) (02/22/90)

From article <14244@lambda.UUCP>, by jlg@lambda.UUCP (Jim Giles):
> C doesn't _have_ arrays!  It has a strange variant of pointers which
> can (on rare occasions) simulate arrays in a way that is almost as
> efficient and easy to read as arrays would have been.

A correction for those readers not familiar with C: the above is not
true.  Arrays and pointers are different beasts.  The confusion arises
because array names are *converted* to pointers when passed as
parameters and because the [] operator can be used on both.  To make
an analogy, in both Fortran and C, integers are sometimes converted
automatically to reals (floats in C) and many of the same operators apply
to integers and reals but that doesn't mean that Fortran and C don't
really _have_ an integer data type.
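The difference is visible with sizeof. Inside its own scope an array is a distinct type, but a parameter declared as an array is adjusted to a pointer to its first element. The names table, size_in_scope and size_as_param are made up for the example, and some compilers warn about sizeof on an array parameter for exactly this reason.

```c
#include <stddef.h>

static int table[10];

size_t size_in_scope(void)
{
    return sizeof table;            /* 10 * sizeof(int) */
}

size_t size_as_param(int a[10])     /* the parameter is really 'int *a' */
{
    return sizeof a;                /* sizeof(int *); the 10 is ignored */
}
```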

-- 
Mike Coffin				mike@arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.	{allegra,cmcl2}!arizona!mike
Tucson, AZ  85721			(602)621-2858

gateley@m2.csc.ti.com (John Gateley) (02/23/90)

READABILITY CONSIDERED HARMFUL!

In article <635635738.28255@myrias.com> cg@myrias.com (Chris Gray) writes:
>One goal for ANY language is that it be quickly readable by anyone, whether
>they are familiar with that class of language or not.

I disagree with this statement. It might be fair to say that
a language should be readable by anyone who is familiar with
the basic concepts, but what Chris's statement does is limit
a language to concepts that everyone will be familiar with.
Anything that is subtle, powerful, or exciting also has the unfortunate
drawback that it is harder to read. Consider "call/cc" in Scheme,
forward inferencing a la OPS5, backwards inferencing a la Prolog,
Combinatory logic, first class functions and ...

John
gateley@m2.csc.ti.com

toma@tekgvs.LABS.TEK.COM (Tom Almy) (02/23/90)

In article <111706@ti-csl.csc.ti.com> gateley@m2.csc.ti.com (John Gateley) writes:
>So, why is postfix better than prefix? 
>One other poster mentioned postfix, and made the claim that it
>was better than prefix as well, and I am curious why
>y'all think so. (i'd take postfix over infix any day, but prefer
>prefix because I am used to it).

Most postfix fanatics (typically Forth programmers, of which I am one) will
say postfix is better than prefix (thinking of LISP) because it eliminates
the need for all of those parentheses. In fact, parentheses (as grouping
operators) are only needed if you don't know how many arguments are needed
for a function. You can get rid of most parentheses in a prefix language,
for instance LOGO behaves much like a parenthesis-free LISP. The only
real advantage of postfix is that it can be directly executed in a stack
architecture.
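The stack-execution point is easy to make concrete. A postfix evaluator needs only a tokenizer and a stack: operands are pushed, and each operator pops its two arguments and pushes the result, with no parentheses or precedence rules. A minimal sketch, assuming space-separated non-negative integer tokens and the four arithmetic operators; eval_rpn is a made-up name.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluate an expression such as "3 4 + 2 *".  Stores the value in *out
   and returns 1, or returns 0 on malformed input. */
int eval_rpn(const char *s, long *out)
{
    long stack[64];
    int sp = 0;                         /* number of values on the stack */

    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s)) {
            char *end;
            if (sp == 64) return 0;     /* stack overflow */
            stack[sp++] = strtol(s, &end, 10);
            s = end;
        } else {
            long a, b;
            if (sp < 2) return 0;       /* operator lacks two operands */
            b = stack[--sp];
            a = stack[--sp];
            switch (*s) {
            case '+': stack[sp++] = a + b; break;
            case '-': stack[sp++] = a - b; break;
            case '*': stack[sp++] = a * b; break;
            case '/': if (b == 0) return 0; stack[sp++] = a / b; break;
            default:  return 0;         /* unknown token */
            }
            s++;
        }
    }
    if (sp != 1) return 0;              /* leftover or missing values */
    *out = stack[0];
    return 1;
}
```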

What bothers me is not so much pre vs. in vs. post but what I call mixfix.
Mixfix is a hodgepodge of pre/in/post fix notation that can be very
confusing. At least LISP is consistently prefix (LOGO isn't).

Some offenders:

Back in the days of the calculator wars, there was HP touting postfix while
TI touted algebraic (which I would consider to be infix dyadic functions and
prefix monadic functions eg 4 + sin(-x)). Yet HP wasn't fully postfix --
the register access functions (sto and rcl) were prefix. And TI's monadic
functions were really postfix eg 4 + (x - sin).

Forth, the "king of postfix" uses prefix functions for string arguments.

C is a hodgepodge of pre/post/in fix.  eg ++ is either pre or post, * is
either pre or in, [] is post. In an integer constant 0 and 0x are pre and L
is post.

Tom Almy
toma@tekgvs.labs.tek.com
Standard Disclaimers Apply

econrad@thor.wright.edu (Eric Conrad) (02/23/90)

From article <10979@saturn.ADS.COM>, by xanthian@saturn.ADS.COM (Metafont Consultant Account):
> Better, make the language strictly postfix, give it exactly one
                                     ^^^^^^^

Why not prefix notation?  Prefix notation is more common than postfix
in mathematical literature,
	f(x,y,z) rather than (x,y,z)f
I suspect that it is easier to read for those of us used to reading
from left to right since it emphasizes the operators rather than the
operands.

Of course I haven't used an HP calculator in a long time so I am
probably prejudiced.

-- Eric Conrad
+----------------------------------------------------------+
| Eric Conrad - Wright State University                    |
| "Progress was all right once, but it went on too long."  |
+----------------------------------------------------------+

jlg@lambda.UUCP (Jim Giles) (02/23/90)

From article <390@argosy.UUCP>, by kevin@argosy.UUCP (Kevin S. Van Horn):
> In article <14242@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>>In languages with recursive data types, direct dynamic memory (like 
>>ALLOCATABLE in Fortran 90), and type coercion I've never seen the 
>>need for pointers _AT_ALL_!!  So, rejecting something because it
>>interferes with pointers is a null issue.
> 
> Would you care to expand on this?  I'm not sure what "direct dynamic memory"
> is, for starters.

Ok.  I'll tackle each feature separately.

I)   Recursive data types.
        These are things like linked lists, graphs, trees, etc.  Internally,
        they are almost certainly implemented using pointers, although there
        are other implementations possible if you don't mind a constraint
        against cyclic structures - C.A.R. Hoare claims that cyclic data
        objects are unstructured monsters that shouldn't be allowed.
        (Functional languages usually don't allow cyclic data structures.)

        As an example, consider a possible declaration of a binary tree
        data type (all examples are in a Fortran/C-ish syntax - that is,
        the data types are on the left and a possible initialization
        is on the right of each declared item):

              type b_tree is recursive
                 node   value
                 b_tree left
                 b_tree right
              end type b_tree

        and the data type 'node' might be a discriminated union so that
        the tree could hold a variety of data.  Note that the declaration
        is very like what C does, except that C requires the left and right
        objects to be pointers.  So what is the advantage of this over
        pointers?  Well, for one thing, after you get used to it, it's
        easier to use and to read.  There are no 'dereferencing' operators
        and there is no distinction between '->' and '.' for field selectors.

              b_tree x = null
              b_tree y = null

        Recursive data structures can be empty, these declarations explicitly
        initialize 'x' and 'y' to be empty (which is probably the default,
        but it's better to be sure).

              b_tree a = {1,null,null}

        This declares 'a' to be a tree with one node already defined.  It
        also assumes that data type 'node' is assignment conformable with
        the integer '1' (which the compiler should check).

              x = b_tree{2,a,null}
              y = b_tree{3,x,b_tree{4,null,null}}
              a.value = 7

        These executable statements have built the tree:

                  y:3
                 /  \
               x:2   :4
              /  
            a:7
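For contrast, here is a sketch of roughly the same tree in C. The subtrees must be explicit pointers, field access on them uses '->', and nothing frees the nodes automatically. It assumes an int stands in for the 'node' type; make_tree and build_example are hypothetical names.

```c
#include <stdlib.h>

struct b_tree {
    int value;                       /* stands in for the 'node' type */
    struct b_tree *left, *right;
};

/* Rough counterpart of b_tree{v,l,r}; returns NULL if malloc fails. */
struct b_tree *make_tree(int value, struct b_tree *left, struct b_tree *right)
{
    struct b_tree *t = malloc(sizeof *t);
    if (t != NULL) {
        t->value = value;
        t->left = left;
        t->right = right;
    }
    return t;
}

/* Build the tree pictured above (allocation failures ignored here). */
struct b_tree *build_example(void)
{
    struct b_tree *a = make_tree(1, NULL, NULL);
    struct b_tree *x = make_tree(2, a, NULL);
    struct b_tree *y = make_tree(3, x, make_tree(4, NULL, NULL));
    a->value = 7;                    /* '->', not '.' */
    return y;
}
```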

        Note that new nodes are allocated by the type name applied
        to a conformable list.  Nodes are deallocated when there are
        no references to them (for example: y.right = null will cause
        the node with the value '4' to be deallocated).  Deallocation
        is done through reference counts or garbage collection.  If
        overhead is a problem, an explicit deallocation could be added
        to the language.  The implicit deallocation has the advantage
        of completely eliminating two common types of pointer errors:
        dangling pointers (which still point to memory that some other
        part of the code has deallocated) and orphaned data (which is
        not accessible through any pointer but was never returned to
        the memory manager).
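A minimal sketch of the reference-count option in C. A node is freed when its last reference is released, and freeing a node releases its children, which is roughly what the implicit deallocation described above would do behind the scenes. rc_tree, rc_make, rc_retain, rc_release and the live_nodes counter are hypothetical. Reference counting by itself never reclaims cyclic structures, which is one practical reason to forbid cycles in such types.

```c
#include <stdlib.h>

struct rc_tree {
    int value;
    int refs;                         /* number of references held */
    struct rc_tree *left, *right;
};

static int live_nodes = 0;            /* nodes allocated but not freed */

/* The new node takes over the caller's references to left and right
   and starts with a count of 1. */
struct rc_tree *rc_make(int value, struct rc_tree *left, struct rc_tree *right)
{
    struct rc_tree *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->value = value;
    t->refs = 1;
    t->left = left;
    t->right = right;
    live_nodes++;
    return t;
}

struct rc_tree *rc_retain(struct rc_tree *t)
{
    if (t != NULL) t->refs++;
    return t;
}

/* Drop one reference; free the node and release its children at zero. */
void rc_release(struct rc_tree *t)
{
    if (t != NULL && --t->refs == 0) {
        rc_release(t->left);
        rc_release(t->right);
        free(t);
        live_nodes--;
    }
}
```

Assigning null to y.right in the notation above corresponds to rc_release(y->right) followed by y->right = NULL.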

II)  Direct dynamic memory.

        The concept behind this is simple.  If you want an object to
        be dynamically allocated, you say so in the declaration and
        then you have to code an allocation statement which must be
        executed before the object can be used.  The Fortran 90 syntax
        is used in the following example:

              real, allocatable :: x(:,:)
               ...
              allocate (x(100,1000))
               ... code using 'x'
              deallocate (x)
               ...

        Between the execution of the ALLOCATE and DEALLOCATE statements,
        the object behaves (from the user's point of view) _exactly_
        the same as a statically allocated object does.  In particular,
        dynamically allocated objects are _NOT_ aliased with any other
        objects!  This means that code can be fully optimized, etc..

        Note that arrays can be dynamically allocated to any size.
        Although I didn't show it here, the dimensions of 'x' could
        have been supplied by any integer expression.  For some reason,
        the Fortran committee restricted the ALLOCATABLE attribute
        to arrays only.  Obviously, there is no reason that scalar
        objects can't be dynamically allocated as well.

        The function ALLOCATED(x) returns whether the array is currently
        allocated or not.  The object is automatically deallocated when
        control returns from its scope unless the object also has
        the SAVE attribute.

        There are several advantages to this.  The obvious one is that
        no aliasing is involved (or even possible - without pointers).
        Since ALLOCATE is a statement and not a function call, it is
        generic in its arguments: no error-prone (and oft-omitted)
        type casting on the returned pointer (like C has), and no
        need to manually scale the memory request by sizeof() multiples.
        The ALLOCATE statement is more legible as are the uses of the
        allocated object (no dereferencing operator to mess with, no
        confusion between the object and its pointer, etc.).
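For comparison, a sketch of the C counterpart of the ALLOCATE example above. Writing 'sizeof *x' scales the request without naming the element type again, and in ANSI C the void * result of malloc converts implicitly, so the cast is optional. The row length must be a compile-time constant in C89, whereas both Fortran dimensions can be run-time expressions, and the storage must be freed by hand. row, ROWS, COLS and alloc_matrix are made-up names.

```c
#include <stdlib.h>

enum { ROWS = 100, COLS = 1000 };

typedef double row[COLS];             /* one row of the matrix */

/* Allocate nrows rows of COLS doubles; returns NULL on failure. */
row *alloc_matrix(size_t nrows)
{
    row *x = malloc(nrows * sizeof *x);
    return x;
}
```

After row *x = alloc_matrix(ROWS); the elements are used as x[i][j], and free(x) plays the part of DEALLOCATE.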

III) Run-time type coercion.

        There are two different activities that are both called 'coercion'.
        One is type conversion (like x = (double) i; in C).  You can
        debate whether the language should be 'strict' and not do any
        such conversions automatically vs. a 'non-strict' language
        which allows 'mixed-mode' operations.  This is _NOT_ the kind
        of coercion I am referring to.

        When doing system programming it is often necessary to 'ignore'
        the data type of an object in order to directly manipulate its
        internal structure.  This requires the ability to alter the
        type-tag that the compiler sees for the object - _WITHOUT_
        altering the data itself.  This is the type of coercion I'm
        referring to in this discussion.  'Structured Programming'
        enthusiasts will claim this is ugly and you shouldn't do it.
        Unfortunately, it is often the only efficient way to get
        something done (or, would you rather your customers chose
        someone else's system?).  I won't go into the 'ethical'
        question here.  I am only going to talk about _how_ to do
        the deed - once you've decided that it's a useful feature.

        In C, the usual way is to use a pointer cast:

              double x;
              struct dbl_struct {  /* structure of an IEEE double */
                 unsigned sign_bit : 1;
                 unsigned expon    : 11;
                 unsigned fraction : 52;  /* wider than int on most compilers */
              } *p;
               ...
              p = (struct dbl_struct *) &x;
               ...

        Unfortunately, this won't work since C is allowed to 'pad'
        bit-fields in structs.  So, C users usually just cast to
        a char pointer and shift&mask the fields they need.  Either
        way, this is nothing more than (or less than) a run-time
        EQUIVALENCE statement.  But, what is needed has nothing
        to do with aliasing or pointers!  What is really needed is
        a way to do the type-coercion directly.  How about:

              double x;
              map x as struct {
                 unsigned sign_bit : 1;
                 unsigned expon    : 11;
                 unsigned fraction : 52;
              }
               ...
              x = 1.0;  /* x still works as a regular 'double' when no
                           map field is named                          */
              x.sign_bit = 1;       /* force x negative */
              x.expon = x.expon-3;  /* divide x by 8 */
               ...

        This, together with a rule that 'map' structures are never padded,
        will accomplish what's needed without pointers.  Your code will
        not take a performance hit from possible aliasing, etc..
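For comparison, a sketch of the shift-and-mask route C programmers fall back on, for the 1/11/52-bit IEEE layout. It assumes a C99 compiler (uint64_t and UINT64_C from <stdint.h>) and that double is the 64-bit IEEE format with the same byte order as uint64_t, which holds on common hardware. Copying the bits with memcpy avoids the pointer cast entirely. The function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static double double_of(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

unsigned sign_bit(double x) { return (unsigned)(bits_of(x) >> 63); }
unsigned expon(double x)    { return (unsigned)((bits_of(x) >> 52) & 0x7FF); }
uint64_t fraction(double x) { return bits_of(x) & ((UINT64_C(1) << 52) - 1); }

/* Return x with its exponent field replaced by e (no range checking). */
double with_expon(double x, unsigned e)
{
    uint64_t u = bits_of(x) & ~(UINT64_C(0x7FF) << 52);
    return double_of(u | ((uint64_t)(e & 0x7FF) << 52));
}
```

The 'divide x by 8' step from the map example becomes with_expon(x, expon(x) - 3).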

Notice that all these mechanisms are much more precise than the pointer
implementations are.  Only the recursive data structures involve possible
aliasing - but you could argue that you _want_ to allow aliasing here.
Since they are more precise, they are easier to read and to write as well
as being easier to compile.  I have yet to find any algorithms for which
the above kinds of features aren't sufficient (both functionally and
for efficiency).  So, I don't see the need for pointers at all!

J. Giles

jlg@lambda.UUCP (Jim Giles) (02/23/90)

From article <24349:04:46:47@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> In article <14242@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>   [ make syntax line-based ]
> I agree! Let's make it column-based, too. And let's reserve a column for
> indicating comments. Wow, I think we're onto something here. :-)

Don't laugh! There are people seriously considering using indentation
level as the flow-control mechanism.  If that's not column sensitive,
I don't know what is!  (By the way, indentation level _has_ been used
this way in an existing language: MODCAP (used to be called MADCAP - don't
know why they changed the name).)

> [... end_of_line == end_of_construct ...]
> Comments, yes. Statements, no. The studies you're referring to compared
> line-based syntax to Pascal or C, where adding a semicolon after a block
> or statement can cause a syntax error or change the meaning of the code.
> Such syntactic problems disappear when loops are explicitly terminated.

On the other hand, I've seen a study of C which indicated a large percentage
of syntax errors was due to missing semicolons.  The overwhelming majority
of such errors were at the end_of_line.  This seems to indicate that even
C programmers tend to regard the end_of_line as the end_of_statement (at
least subconsciously).  Meanwhile, I've done my own study of C code available
to me (sources that come with SUN, etc.).  Out of over 5000 lines of C,
I found only 11 cases of a simple statement which wrapped across a line
boundary.  One of these 11 was, manifestly, an error!  So, it causes a
lot of problems and is used very infrequently - why keep it?

> A study in Computer Languages several years back found that terminated
> loops ended up with by far the fewest errors per programmer.

Yes, and the infamous 'Hare' experiments in the 70's showed that programs
using 'IF ... ENDIF' had fewer errors than 'IF() GOTO L ... L:', which
in turn had _FEWER_ errors than 'IF()BEGIN ... END'.  After that, even
N. Wirth gave up on 'compound-statements' as a flow control mechanism.

> [...]
>> Ada's control constructs are among the few thing they did well.
> 
> Given the lack of any macro facility, they did fine. They would have
> done much better to provide a standard method to define new control
> structures, then reduced the standard set to three or four with no
> special cases.

There _are_ only three or four!  IF ... ELSEIF ... ELSE ... ENDIF,
LOOP ... END LOOP, CASE ... END CASE, GOTO.  That's four.  The special
cases all apply to loops: the FOR ... LOOP, the WHILE ... LOOP, and
the LOOP ... EXIT ... END LOOP.

Also, there is a problem with using macros to make control structures:
the lack of standardization.  If you make up a CASE construct which
I find peculiar, I will have difficulty with your code.  Giving me the
expanded version won't be of much help either - unless it turns into
_very_ legible standard code (in which case, why bother inventing a
new control structure?).

> [... no need for pointers _AT_ALL_ ...]
> Oh, yeah! In a language with tasks, automatic threading, and message
> passing, I've never seen the need for semaphores _AT_ALL_!! C'mon, be
> serious.

I am serious!  By your argument the following is valid:

  Oh, yeah! On a machine with conditional jumps, built-in arithmetic,
  and instructions in memory, I've never seen the need for a Turing
  machine _AT_ALL_!!   C'mon, be serious.

And yet, I don't know anyone who does production work on Turing machines.
In the language you described above, semaphores would probably _not_ be
very heavily used (at least, not directly).  And, with a well designed
language, pointers would probably not be needed either (at least, not
directly).  The advantage of these higher level ways of computing is
that they allow the programmer to be more _precise_ about what the
program does.

By the way, the correct analogy for pointers is GOTO statements.  Pointers
and GOTOs are isomorphic to each other when you compare control constructs
to data structuring features.  Pointers can lead to 'spaghetti' data in
the same way that GOTOs lead to 'spaghetti' code.  Pointers have more
severe disadvantages though: whereas GOTOs are usually restricted to
purely local targets, pointers can, and usually do, point outside the
local scope.  The compiler is therefore completely unable to make any
simplifying assumptions about the data flow of any code which involves
pointers together with global data or other pointers or even local data
which has been passed by reference to some other procedure!

J. Giles

jlg@lambda.UUCP (Jim Giles) (02/23/90)

From article <12098@goofy.megatest.UUCP>, by djones@megatest.UUCP (Dave Jones):
> [... don't need pointers ...] 
> 
> Huh? I think somebody missed the point. Those 'recursive data structures'
> etc. are just teeming with pointers. Whether the programmer declares them,
> or the compiler sneaks them in, the hardware still wants them to point to
> properly aligned data.

Huh? I think somebody missed the point.  Those 'structured' control constructs
are just teeming with GOTOs!  Whether the programmer uses them directly, or the
compiler sneaks them in, the hardware still wants them to point to code.

The point is (so to speak), in both these cases, if the programmer doesn't
code the thing directly the compiler can make simplifying assumptions about
the way they are used.  Also, the code is more readable and easier to
maintain if you use the higher-level constructs.

J. Giles

jlg@lambda.UUCP (Jim Giles) (02/23/90)

From article <18172@megaron.cs.arizona.edu>, by mike@cs.arizona.edu (Mike Coffin):
> [...]
> A correction for those readers not familiar with C: the above is not
> true.  Arrays and pointers are different beasts.  The confusion arises
> because array names are *converted* to pointers when passed as
> parameters and because the [] operator can be used on both.  To make
> an analogy, in both Fortran and C, integers are sometimes converted
> automatically to reals (floats in C) and many of the same operators apply
> to integers and reals but that doesn't mean that Fortran and C don't
> really _have_ an integer data type.

A correction for those readers not familiar with C: the above is not
true.  Arrays are pretty useless unless you can pass them around as
procedure arguments.  C converts all arrays to pointers when passing
them to procedures.  AND: YOU _CAN'T_ CONVERT THEM BACK ONCE YOU'RE
THERE!!!!!  They are not treated as arrays anywhere except the scope
in which they were declared - and CAN'T be treated as arrays anywhere
except their home scope.

To make an analogy, it's as if, once an integer was converted to real,
it could _never_ be converted back!  And normal usage of integers
_forces_ you to convert them to real on frequent occasions.  So that,
in effect, you really _DON'T_ have integers.  Fortunately, even C
doesn't really do this to integers.  But it DOES do the corresponding
thing to arrays.

J. Giles

new@udel.edu (Darren New) (02/23/90)

In article <14245@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>I)   Recursive data types.
>        As an example, consider a possible declaration of a binary tree
>        data type (all examples are in a Fortran/C-ish syntax - that is,
>              y = b_tree{3,x,b_tree{4,null,null}}

Ok.  Now how do I write the declaration of a function that returns 
a pointer to the first sub-tree with the value four at its root
without winding up with an aliased value?  AND how do I do this
without using recursion?          -- Darren

sakkinen@tukki.jyu.fi (Markku Sakkinen) (02/23/90)

In article <18172@megaron.cs.arizona.edu> mike@cs.arizona.edu (Mike Coffin) writes:
>From article <14244@lambda.UUCP>, by jlg@lambda.UUCP (Jim Giles):
>> C doesn't _have_ arrays!  It has a strange variant of pointers which
>> can (on rare occasions) simulate arrays in a way that is almost as
>> efficient and easy to read as arrays would have been.
>
>A correction for those readers not familiar with C: the above is not
>true.  Arrays and pointers are different beasts.  The confusion arises
>because array names are *converted* to pointers when passed as
> ...

A further correction for those readers not familiar with C:
Giles's posting is a slight exaggeration. In the sense of memory allocation,
C does have arrays: if you define an external, static, or automatic
array, space is really reserved for it. But arrays certainly are not
first-class objects in the same way as records (struct's) are.
The confusion between arrays and pointers is perhaps the worst
single flaw in C.
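The contrast with structs is easy to demonstrate. Wrapping an array in a struct gives it the first-class treatment that a bare array lacks: it can be assigned, passed, and returned by value. vec3 and scaled are made-up names.

```c
/* An array member makes a struct that can be copied as a whole. */
struct vec3 { double v[3]; };

struct vec3 scaled(struct vec3 p, double k)   /* p is a full copy */
{
    int i;
    for (i = 0; i < 3; i++)
        p.v[i] *= k;
    return p;                                 /* copied back whole */
}
```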

Markku Sakkinen
Department of Computer Science
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

ted@nmsu.edu (Ted Dunning) (02/24/90)

In article <14246@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

   I don't know what is!  (By the way, indentation level _has_ been used
   this way in an existing language: MODCAP (used to be called MADCAP - don't
   know why they changed the name).)


we can only hope that modcap isn't an existing language much longer.

not only did it use a wildly non-standard character set as well as
indentation for blocking, it used variable amounts of white space
around operators to vary their precedence.

--
	Offer void except where prohibited by law.

jlg@lambda.UUCP (Jim Giles) (02/24/90)

From article <11911@nigel.udel.EDU>, by new@udel.edu (Darren New):
> In article <14245@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>>I)   Recursive data types.
>>        As an example, consider a possible declaration of a binary tree
>>        data type (all examples are in a Fortran/C-ish syntax - that is,
>>              y = b_tree{3,x,b_tree{4,null,null}}
> 
> Ok.  Now how do I write the declaration of a function that returns 
> a pointer to the first sub-tree with the value four at its root
> without winding up with an aliased value?  AND how do I do this
> without using recursion?          -- Darren

Well, since there aren't any pointers, a function which returns a pointer
is not a problem - it's impossible.  And, who said anything about not
using recursion?  I didn't.  I think recursion is a grand idea!

Actually, you didn't read my entire post.  I pointed out that recursive
data structures probably _SHOULD_ allow aliasing.  It was the only one
of the set of data structures I gave which did.  However, there are
implementations of recursive data structures which _DON'T_ allow
aliasing. (Many functional programming languages don't allow aliasing -
they most certainly _DO_ have recursion in both data and function
structures.  If you think about it, aliasing isn't an issue in functional
programming since assignment isn't allowed.)

The advantage of using recursive data structures instead of pointers is
mostly notational (no 'dereferencing' operator, no confusion between
pointers and the objects they point to, etc.).  This makes programs
easier to read and maintain.  Also, it helps the compiler to determine
that the _only_ thing a 'b_tree' (for example) can be aliased to is
another 'b_tree'.  There isn't even the possibility of a pointer 'cast'
accidentally (or deliberately) aliasing a 'b_tree' to a char string or
something.  This improves the compiler's ability to optimize the code.

J. Giles

mike@umn-cs.cs.umn.edu (Mike Haertel) (02/24/90)

In article <14245@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>I)   Recursive data types.
  [...]
>              type b_tree is recursive
>                 node   value
>                 b_tree left
>                 b_tree right
>              end type b_tree

This is reasonable, even elegant.

My one question is:  How do you do mutually recursive data structures,
rather than just directly recursive ones?
-- 
Mike Haertel <mike@ai.mit.edu>
"We are trying to support small memory machines." -- Larry McVoy

jlg@lambda.UUCP (Jim Giles) (02/24/90)

From article <3528@tukki.jyu.fi>, by sakkinen@tukki.jyu.fi (Markku Sakkinen):
> In article <18172@megaron.cs.arizona.edu> mike@cs.arizona.edu (Mike Coffin) writes:
>> [...] Arrays and pointers are different beasts. [...]

This is the only part of Mike Coffin's submission which wasn't totally
misleading.  The question is, if arrays and pointers are different beasts,
why does C mangle them together?  Why doesn't it provide a mechanism
for un-mangling them once the damage has been done?

> A further correction for those readers not familiar with C:
> Giles's posting is a slight exaggeration. [...]

But _only_ a slight exaggeration.  Modularity and library use are _very_
important.  You can't do all your array manipulation with just globals
or restrict your work to the same procedure in which the arrays were
declared.  In fact, for programs of any really useful size, passing
array arguments is vital.  In this respect, at least, C really _DOESN'T_
have arrays.

> [...]
> The confusion between arrays and pointers is perhaps the worst
> single flaw in C.

I almost agree.  But C has an awful lot of flaws.  It's hard to choose
just this one as _the_ worst.

J. Giles