[comp.compilers] Enumerated data types.

mandel@forwiss.uni-passau.de (Luis Mandel) (08/23/90)

Hello world,

I'm studing types, in particular enumerated data types of programming
languages. For example using Pascal enumerated data types, we can write
the following:

type
	car_colours = (red, blue, brown, black);

and also we obtain pred, suc and ord as associated functions to the new
data type:

		 suc(red) = blue        and so on.

Then we can't define another enumerated data type that involves the
tokens red, blue, brown or black, because of these functions pred and
suc. For example, if it were possible to write

type 
    bike_colours = (orange, red, green, white);

it were an ambiguity with suc(red) because there are two possibilities.
The same is with functions pre and ord.

Now my question is: anybody knows if there are languages that allows
anything like

	car_colours = (red, blue, brown, black);
    bike_colours = (orange, red, green, white);

and have these functions defined with an extra parameter, for example:

	  suc (car_colour, red) = blue
	  suc (bike_colour, red) = green

Are there problems with this definition of enumerated data types? 
Could be there difficulties trying to specify the semantics of suc,
pred and ord?, Any suggestion?

Thanks in advance, Luis  (mandel@forwiss.uni-passau.de)

--
Luis Mandel                                mandel@forwiss.uni-passau.de
[I'd think that if you always qualified your names, you might as well
call them car_colour_red and bike_colour_red. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

moss@cs.umass.edu (Eliot Moss) (08/24/90)

What you suggest has indeed been done in the Modula-3 language. One can write
down two types such as (the syntax may not be exact; I don't have my Modula-3
Report here at home):

  TYPE t1 = ENUM red, black END;
       t2 = ENUM green, yellow, red END;

To mention the constants anywhere one must write t1.red, t1.black, t2.green,
t2.yellow, or t2.red. Thus there is no ambiguity. It is quite reasonable to
write SUCC (t2.green). There is an ORD function (ORD (t1.red) = 0, etc.)  and
a VAL function (VAL (t2, 1) = t2.yellow), etc. It is specifically illegal, and
must be detected at run-time, if an attempt is made to go out of range with
VAL, SUCC, PRED, etc. There are INC and DEC operators that work on
enumerations as well as integers; enumerations may be bit packed; and you can
ask for the number of values of an enumeration type or variable (NUMBER (t2) =
3). This all seems to work nicely.

Ada also allows multiple types to use the same name, and if a particular
enumeration constant identifier is ambiguous in context, then it can be
qualified (e.g., t2'red) to disambiguate. The Ada overloading may be a little
more convenient for writing, but the Modula-3 approach has the advantage that
adding a new enumeration type in a particular scope never breaks the existing
code.

Cheers!							Eliot

--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA  01003
(413) 545-4206; Moss@cs.umass.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

perelgut@turing.toronto.edu (Stephen Perelgut) (08/24/90)

The Turing Programming Language (developed at the University of Toronto and
the subject of previous postings here) has an approach to the problem.
	type carColor : enum(red, blue, yellow, black)
	type bikeColor: enum(red, orange, yellow, green, blue)

Thus succ(carColor.red) is carColor.blue and never bikeColor.orange. 
Similarly, ord(carColor.red) = ord(bikeColor.red) only co-incidentally and
ord(carColor.blue) not= ord(bikColor.blue)

If you want more information on Turing, try emailing
    "distrib@turing.toronto.edu"
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

skrenta@amix.commodore.com (Rich Skrenta) (08/25/90)

>	car_colours = (red, blue, brown, black);
>	bike_colours = (orange, red, green, white);
>	suc (car_colour, red) = blue
>	suc (bike_colour, red) = green
 
> Are there problems with this definition of enumerated data types? 
> Could be there difficulties trying to specify the semantics of suc,
> pred and ord?, Any suggestion?

The problems aren't with suc() and pred(), but rather with determining
the type of "red" when the parser sees it in the source.  Consider:

	procedure foo (b: bike_colour)

	foo(red);

In order to determine which type "red" is, we need to consult the definition
of "foo".  Does Pascal do this for integer/real constants anyway?

	procedure bar (r: real)

	bar (12);	/* sent in an integer */

Would the 12 be coerced to type real, or will a Pascal compiler flag this
as an error?  If it does the coercion, I don't see a problem with overloading
enumerated type names.

If there was a problem, you could always lexically distinguish enumerated
type names.  Although it takes some getting used to, you could write

	bike_color(blue)

instead of just "blue".

Rich
--
skrenta@blekko.commodore.com
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

dik@cwi.nl (Dik T. Winter) (08/25/90)

(Couldn't mail because the mailer tells me host unknown.)

In article <1990Aug23.134826.2865@forwiss.uni-passau.de> mandel@forwiss.uni-passau.de (Luis Mandel) writes:
 > Now my question is: anybody knows if there are languages that allows
 > anything like
 > 
 > 	car_colours = (red, blue, brown, black);
 >     bike_colours = (orange, red, green, white);

Yes, Ada.

 > and have these functions defined with an extra parameter, for example:
 > 
 > 	  suc (car_colour, red) = blue
 > 	  suc (bike_colour, red) = green

No extra parameter is required in Ada.  The reason is that overload
resolution (the two functions suc overload each other) is not only done
on the type of parameter but also on the required type of the result.
I.e. in:
	ford_colour := suc(red)
it is known that ford_colour is of type car_colour and so it is known
that suc is the function on type car_colour.  There are some contexts
where this is not possible:
	colour_number: integer;
	function num(a: car_colour) return integer ...;
	function num(a: bike_colour) return integer ...;
now the statement:
	colour_number := num(red);
is ambigous because it is valid for both possible explanations of red.
In this case (but also in non-ambiguous cases) the programmer has the
possibility to indicate what type red is meant:
	colour_number := num(car_colour'(red));

Disambiguating this can be far from trivial, especially in complicated
expression.  However, I have read that somebody has proven that when you
scan the expression first from the leaves to the root, next back again
to the leaves, back again to the root and still again to the leaves
would resolve all non-ambigous expressions (and ambiguous expressions are
illegal in Ada).

-- 
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
[Similar comments on Ada arrived from Doug Hahn <hahn@tekcrl.labs.tek.com>,
and Mike Murphy <murphy@mips.COM>.]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) (08/27/90)

In article <1990Aug23.134826.2865@forwiss.uni-passau.de>, mandel@forwiss.uni-passau.de (Luis Mandel) writes:
[asks about re-using the names of enumeration constants]
[and how to make PRED and SUCC and so on work with them]

In a word:  Ada.

Example from LRM 4.7:
	type MASK is (FIX, DEC, EXP, SIGNIF);
	type CODE is (FIX, CLA, DEC, TNZ, SUB);

	M: MASK;
	C: CODE;

	PRINT(MASK'(DEC));	-- DEC is of type MASK
	PRINT(CODE'(DEC));	-- DEC is of type CODE

	for J in CODE'(FIX) .. CODE'(DEC) loop ... end loop;
	-- one of the two bounds has to be qualified, the other needn't be:
	for J in CODE'(FIX) .. DEC loop ... end loop;
	for J in FIX .. CODE'(DEC) loop ... end loop;
	-- or you can do it as a subrange
	for J in CODE range FIX .. DEC loop ... end loop;

	M := FIX;		-- FIX is of type MASK
	C := FIX;		-- FIX is of type CODE

	if M = DEC then ... end if;	-- DEC is of type MASK
	if C = DEC then ... end if;	-- DEC is of type CODE
	if CODE'(FIX) = CODE'(DEC) then ... end if
	-- one of the two operands may omit the type qualifier

	M := MASK'SUCC(M);	-- uses the SUCC function proper to MASKs
	C := CODE'SUCC(C);	-- uses the SUCC function proper to CODEs

If one doesn't like using MASK' and CODE' all the time, one can say

	function SUCC(X: MASK) return MASK renames MASK'SUCC;
	function SUCC(X: CODE) return CODE renames CODE'SUCC;

and then
	M := SUCC(M);		-- uses MASK'SUCC
	C := SUCC(C);		-- uses CODE'SUCC

How does all this work?  Overloading.  Ada lets you have lots of different
things around with the same name, provided their types are sufficient to
tell them apart.  Overloaded name resolution has been described often in
SigPlan Notices and the like.

Algol 68 was the first language I met that allowed overloading, but from
the published discussions of the Algol 68 committee overloading was already
a well known idea then.  Anyone know where it first showed up?  (PL/I had
"generic" functions, was that the first?)  The new functional language
Haskell goes in for overloading in a big way.

The thing which made overloading tricky in Algol 68 was that Algol 68
combined overloading with automatic coercions.  Ada doesn't go in for
automatic coercion.
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

jejones@microware.com (James Jones) (08/27/90)

In article <1990Aug23.134826.2865@forwiss.uni-passau.de> Luis Mandel writes:
>[query about enumerated types with name clashes]

If I remember rightly, Russell has enumerated types.  Since constants in
Russell are linguistically 0-ary functions, figuring out whether a given
instance of "red" would in full Russell be bike_colours$red[] or
car_colour$red[] is a particular case of the type inference problem, and
the same kind of context (i.e. how is the result used?) would be used to
decide whether succ[red] is bike_colours$succ[bike_colours$red[]] or ...
well, you get the idea. :-)

The BSD version of Russell had a kind of convoluted but usually workable
heuristic for type inference.   Hans Boehm showed that the problem of type
inference is undecidable, but I think he's recently written a paper that
appeared in SIGPLAN Notices that perhaps gives a useful subset of the
problem that's easier to figure out.  I hope that a new Russell or language
like it appears that implements such work, because Russell is, IMHO, a
very spiffy language and merits study by any would-be language designer.

	James Jones
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

dik@cwi.nl (Dik T. Winter) (08/28/90)

In article <3621@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) writes:
 > The thing which made overloading tricky in Algol 68 was that Algol 68
 > combined overloading with automatic coercions.  Ada doesn't go in for
 > automatic coercion.

That is incorrect.  Algol 68 allows overloading for operators only, and
coercions are not performed on operands of operators.  For instance in
the context of:
	'proc'('real','int')'real' + = ('real' r, 'int' i)'real': ....
the expression
	1.0 + 1.0
can still not be resolved, i.e. widening does not take place.  Similar
for other coercions.  The only kind of coercions allowed for operands
is 'dereferencing', but that is common to other languages (and implicitly
also in Ada).

Ada is much trickier because overloading not only depends on operand
types (as in Algol 68) but also on result type.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

grover@brahmand.Eng.Sun.COM (Vinod Grover) (08/28/90)

In article <3621@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) writes:
>Algol 68 was the first language I met that allowed overloading, but from
>the published discussions of the Algol 68 committee overloading was already
>a well known idea then.  Anyone know where it first showed up?  ...

I believe that Christopher Strachey used the term "ad hoc polymorphism" to
refer to a breed of overloading. I *think* that was before Algol 68. 

Vinod Grover
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

corbett@lupa.Eng.Sun.COM (Robert Corbett) (08/29/90)

In article <141425@sun.Eng.Sun.COM> grover@brahmand.Eng.Sun.COM (Vinod Grover) writes:
>In article <3621@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) writes:
>>Algol 68 was the first language I met that allowed overloading, but from
>>the published discussions of the Algol 68 committee overloading was already
>>a well known idea then.  Anyone know where it first showed up?  ...

>I believe that Christopher Strachey used the term "ad hoc polymorphism" to
>refer to a breed of overloading. I *think* that was before Algol 68. 

>Vinod Grover
--- 

Many languages that supported overloading were designed before ALGOL 68.
MAD supported user-defined operator overloading.  The operators were
defined by giving assembly code for their implementation.  Aad van
Wijgaarden's paper "a Generalization of ALGOL" described a language that
supported overloading through user-defined rewriting rules.  The form of
overloading provided in ALGOL 68 pales by comparison.

During the early development of ALGOL 68, the ALGOL committee planned to
produce two languages, ALGOL X and ALGOL Y.  ALGOL X was to be a small
change to ALGOL 60 to fix its few remaining bugs.  ALGOL Y was to be the
long-term solution.  ALGOL X originally did not support extensibility.
Eventually, the ALGOL committee decided to retrofit retrofit some of the
advanced features from ALGOL Y to ALGOL X.

					Yours truly,
					Bob Corbett
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

pjj@cs.man.ac.uk (Pete Jinks) (08/29/90)

mandel@forwiss.uni-passau.de (Luis Mandel) writes:

>anyone knows if there are languages that allows anything like
>	car_colours = (red, blue, brown, black);
>	bike_colours = (orange, red, green, white);

I realise that this is an answer to a different question, but in Pascal, in
this limited case, one could write:
	all_colours = (blue, brown, black, red, orange, green, white);
	car_colours = blue .. red;
	bike_colours = red .. white;

Of course, I have had to re-order the colours so that succ & pred now work
differently, but if you are just using them to cycle through all possibilities
it is still OK.
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

anw@maths.nott.ac.uk (Dr A. N. Walker) (08/29/90)

In article <2019@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
[in Algol]
>	'proc'('real','int')'real' + = ('real' r, 'int' i)'real': ....
	^^^^^^ 'op'
> [...].  The only kind of coercions allowed for operands
>is 'dereferencing', [...]

	Also deproceduring and uniting, but not widening, rowing or voiding.
The reason is, of course, to prevent ambiguity -- you are not allowed to
overload operators with operands varying [eg] only in the reference levels,
but you are with operands varying [eg] only in row levels, so you can
disambiguate operands with the wrong reference levels, but not with the
wrong row levels.

	I suspect that at least rowing could have been moved into the other
camp [though there may be some subtleties involving the interaction between
rows and references], but pragmatically one may well want to overload, in
particular, "*" to mean different things for scalars, vectors and matrices.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

kurt@tc.fluke.COM (Kurt Guntheroth) (08/29/90)

Ada allows enumerated types whose member names are overloaded.  You resolve
the ambiguity by specifying the type in ambiguous situations.   The syntax
is <type_specifier>'(<expression>) in Ada (though I'm sure Ada uses
different names for the metasymbols).

    car_colors is (red, blue, brown, black);
    bike_colors is (orange, red, green, white);
	 . . .

    x := car_colors'(red);

There was a certain amount of discussion of how this ought to be done in
SIGPLAN Notices in 1981 or so, but I havn't bothered to go look them up.  As
I remember, the papers were mostly picking on Ada because it was very stylish
to pick on Ada back then.  Now it is stylish simply to ignore Ada.
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

compilers-sender@esegue.segue.boston.ma.us (08/31/90)

dik t. winter (dik@cwi.nl) wrote:
>In article <1990Aug23.134826.2865@forwiss.uni-passau.de> mandel@forwiss.uni-passau.de (Luis Mandel) writes:
>> Now my question is: anybody knows if there are languages that allows
>> anything like
>> 
>> 	car_colours = (red, blue, brown, black);
>>     bike_colours = (orange, red, green, white);
>
>Yes, Ada.
>
>> and have these functions defined with an extra parameter, for example:
>> 
>> 	  suc (car_colour, red) = blue
>> 	  suc (bike_colour, red) = green
>
>No extra parameter is required in Ada.  The reason is that overload
>resolution (the two functions suc overload each other) is not only done
>on the type of parameter but also on the required type of the result.
>I.e. in:
>	ford_colour := suc(red)
>it is known that ford_colour is of type car_colour and so it is known
>that suc is the function on type car_colour.  
> ...

This is not true, because 'suc' is neither a function nor an operator in Ada,
it is an attribute.

Thus (translating Luis Mandel's example into Ada) we have:

    type CAR_COLOURS is (RED, BLUE, BROWN, BLACK);
    type BIKE_COLOURS is (ORANGE, RED, GREEN, WHITE);

and

    CAR_COLOURS'SUCC(RED) = BLUE
    BIKE_COLOURS'SUCC(RED) = GREEN

This shows that the SUCC attribute (whose value is a function) must be
qualified by the type (or subtype) of value to which it is to be applied.

Dik's other comments about the resolution of expressions containing 
enumeration values are quite correct.

Mike.


Michael P. Harrison - Software Group - Inmos Ltd. UK.
UK : mph@inmos.co.uk, US : mph@inmos.com
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.