[comp.lang.scheme] ML-like type-checker for Scheme subset?

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/13/91)

I am pushing for Scheme to be adopted as the (or an) introductory
programming language here.  All of the essential programming concepts
can be presented using Scheme, with the exception of two:

    -- the idea that programming language implementations are entitled
       to quietly deliver incorrect results (Pascal implementations
       are often like this, and C implementations almost always are)

    -- static type checking.

It occurred to me that static type checking could be introduced _after_
the other concepts have been presented, and that an ML-style type checker
would suit Scheme rather well.  There are a handful of functions like memq
which would require union types, but well-typed replacements are easy to
define.

Has anyone already done this?  I _could_ write the code myself, but it
would be a pity to re-invent the wheel.

-- 
Bad things happen periodically, and they're going to happen to somebody.
Why not you?					-- John Allen Paulos.

Chris Haynes <chaynes@sunfish.cs.indiana.edu> (05/13/91)

>Type theory confuses me.
>How does it fits into a language's semantics?
>I see two reasonable, yet mutually exclusive,
>
>
>A) Some have explained typed functional languages
>   as being based on a _typed lambda calculus_.
>...
>
>B) Others have suggested that a type discipline is
>   `imposed' on a language (i.e. that it is
>   not actually _part_ of the language,
>   but is rather a notation for the user
>   to embed statically-checkable invarient assertions).
>...

A and B *may* be mutually exclusive.  In some contexts the model theory of
type systems is studied without regard for the problems of static type
analysis (A), and at other times static type disciplines are studied
without regard for semantics models (B).  But in the case of most practical
type systems views A and B *can* coexist.  That is, a type system may have
both a model theory and a decidable syntactic deduction system.  In this
case one hopes for completeness and consistency theorems, which establish
that these two views are suitably related.

Chris Haynes

fs@caesar.cs.tulane.edu (Frank Silbermann) (05/14/91)

In article <5710@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>	I am pushing for Scheme to be adopted as the
>	(or an) introductory programming language here.
>	The essential programming concepts can be presented
>	with the exception of two:
>
>    -- the idea that programming language implementations are entitled
>       to quietly deliver incorrect results (Pascal implementations
>       are often like this, and C implementations almost always are)
>
>    -- static type checking.
>
>	Static type checking could be introduced _after_
>	the other concepts have been presented,
>	and an ML-style type checker would suit Scheme rather well.
>	There are a handful of functions like memq which would require
>	union types, but well-typed replacements are easy to define.
>	Has anyone already done this?


Type theory confuses me.
How does it fits into a language's semantics?
I see two reasonable, yet mutually exclusive,
interpretations:


A) Some have explained typed functional languages
   as being based on a _typed lambda calculus_.

   If we interpret lambda calculus as a functional notation,
   then an assertion that some lambda expression has type A->B
   means that `A' is the name of the function's input domain,
   and `B' the name of its codomain.

   Therefore, the word `type' must be synonymous with
   `domain' or, more specifically, `pointed complete partial order'.
   (Yet, most authors neglect to develop this association,
    or even to mention the idea of a partial ordering.)


B) Others have suggested that a type discipline is
   `imposed' on a language (i.e. that it is
   not actually _part_ of the language,
   but is rather a notation for the user
   to embed statically-checkable invarient assertions).

   Such a language would be based on a lambda calculus
   of a single type (i.e. untyped).
   To prevent use of undefined operations,
   the type system permits the user to limit
   a function's application to just one of several
   component subdomains.


I get confused when a author neglects to state
which of the two interpretations he is using,
or worse yet, if he alternates from one to the other.
Unfortunately, this confusion is very common.

	Frank Silbermann	fs@cs.tulane.edu
	Tulane University	New Orleans, Louisiana  USA

chaynes@sunfish.cs.indiana.edu (Chris Haynes) (05/14/91)

    It occurred to me that static type checking could be introduced _after_
    the other concepts have been presented, and that an ML-style type checker
    would suit Scheme rather well.  There are a handful of functions like memq
    which would require union types, but well-typed replacements are easy to
    define.

    Has anyone already done this?  I _could_ write the code myself, but it
    would be a pity to re-invent the wheel.

With the help of a student, Hsianlin Dzeng, I have done this.  The
resulting typed dialect of Scheme is called Infer.  It supports most
features of the ML type system, less modules, plus a few additional
features.  I'm pleased with the result, and it has already proved useful at
Indiana University in denotational semantics and types courses.  It may
someday be used in introductory courses as well, but not until it's been
developed further and tutorial material has been written.  Currently the
implementation (about 40 pages of Infer code) still has a lot of rough
edges and I've got fair sized list of features I'd like to add.  My hope is
to contribute a version to the Scheme yellow pages for general use next
spring.

Chris Haynes

jrk@sys.uea.ac.uk (Richard Kennaway) (05/14/91)

In <5710@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

>    -- the idea that programming language implementations are entitled
>       to quietly deliver incorrect results (Pascal implementations
>       are often like this, and C implementations almost always are)

What are you referring to here?

Richard Kennaway          SYS, University of East Anglia, Norwich, U.K.
Internet:  jrk@sys.uea.ac.uk		uucp:  ...mcsun!ukc!uea-sys!jrk

rockwell@socrates.umd.edu (Raul Rockwell) (05/15/91)

Frank Silbermann:
   Type theory confuses me.  How does it fits into a language's
   semantics?  I see two reasonable, yet mutually exclusive,
   interpretations:

      the word `type' must be synonymous with `domain' or, more
      specifically, `pointed complete partial order'.  (Yet, most
      authors neglect to develop this association, or even to mention
      the idea of a partial ordering.)

      Others have suggested that a type discipline is `imposed' on a
      language (i.e. that it is not actually _part_ of the language,
      but is rather a notation for the user to embed
      statically-checkable invarient assertions).

   I get confused when a author neglects to state which of the two
   interpretations he is using, or worse yet, if he alternates from
   one to the other.  Unfortunately, this confusion is very common.

Well, after participating for the last month in the comp.lang.misc
"dynamic typing" (vs. static typing) threads, I almost hesitate to
follow up on this (give me another week :-).

A part of the confusion lies in the tendency of a language implementer
to think both in terms of the implemented language and the target
language.  Quite often, types which are different in one are not
different in the other.  (Though you could easily argue that for such
confusion to exist the implementer hasn't thought things through well
enough).

Anyway, the Right Orthodox View of Types (based on current compiler
technology and what I've been told on comp.lang.misc) seems to be that
the above two cases (type as domain vs type as static assertion) are
synonomous, and that there are no practical differences between the
two.

Your milage will vary.

Raul Rockwell

fs@caesar.cs.tulane.edu (Frank Silbermann) (05/16/91)

>>		A) Some have explained typed functional languages
>>		   as being based on a _typed lambda calculus_.
>>	      	   ... the word `type' must be synonymous with `domain',
>>
>>		B) Others have suggested that a type discipline is
>>		   `imposed' on a language (i.e. that it is
>>		   not actually _part_ of the language,
>>		   but is rather a notation for the user
>>		   to embed statically-checkable invarient assertions).

chaynes@sunfish.cs.indiana.edu (Chris Haynes):
>	In some contexts the model theory of type systems
>	is studied without regard for the problems
>	of static type analysis (A),


(I.e., describing the meaning of functional expressions 
that are _assumed_ to be well-typed)


>	and at other times,static type disciplines are studied
>	without regard for semantics models (B).


Agreed.
If you can both describe the model theory
_and_ perform the static type analysis,
the result is a typed functional programming language
which fits paradigm A).


rockwell@socrates.umd.edu (Raul Rockwell) adds:
>	type as domain vs type as static assertion
>	seem to be synonomous, with no practical differences.

The difference has to do with
the interpretation of type clashes.

If types are domain names (A),
then a type-clash is mathematical _nonsense_.
Under this paradigm, a functional language
can include _only_ well-typed programs.
Ill-typed programs have no meaning.
This severely hampers the usefulness of runtime typechecking,
since the program is invalid _even if_ the type clash
occurs in an unevaluated subexpression.

In paradigm (B),
a type clash _does_ have meaning in the language
(even if that meaning is BOTTOM or ERROR).
Instead of deciding mathematical validity,
the type system weeds out (valid) programs
which are viewed as _undesirable_ (i.e. those
not likely to have captured the programmer's intent).
Because well-typing is merely desired
(but not mathematically essential),
it need not be "strong."  The choice between
dynamic and static typing becomes a tradeoff
between flexibility versus efficiency and safety.

In conclusion, the determination of whether the type signature
names a domain or whether it screens out undesirable portions
of the domain has GREAT bearing upon:

	i)  the language's model theory;
and 
	ii) the validity of dynamic typechecking.

Frank Silbermann	fs@cs.tulane.edu
Tulane University	New Orleans, Louisiana  USA

Chris.Holt@newcastle.ac.uk (Chris Holt) (05/16/91)

fs@caesar.cs.tulane.edu (Frank Silbermann) writes:

>Type theory confuses me.
>How does it fits into a language's semantics?

Do you include the operators used to generate the values of a type
in with that type?

>I see two reasonable, yet mutually exclusive,
>interpretations:

>A) Some have explained typed functional languages
>   as being based on a _typed lambda calculus_.

>   If we interpret lambda calculus as a functional notation,
>   then an assertion that some lambda expression has type A->B
>   means that `A' is the name of the function's input domain,
>   and `B' the name of its codomain.

where a function is a triple
        <domain, codomain, pairs>
I assume you mean.  Okay so far.

>   Therefore, the word `type' must be synonymous with
>   `domain' or, more specifically, `pointed complete partial order'.
>   (Yet, most authors neglect to develop this association,
>    or even to mention the idea of a partial ordering.)

Are you requiring your functions to be monotonic?  Or is this
a red herring?

>B) Others have suggested that a type discipline is
>   `imposed' on a language (i.e. that it is
>   not actually _part_ of the language,
>   but is rather a notation for the user
>   to embed statically-checkable invarient assertions).

>   Such a language would be based on a lambda calculus
>   of a single type (i.e. untyped).
>   To prevent use of undefined operations,
>   the type system permits the user to limit
>   a function's application to just one of several
>   component subdomains.

Is this the same as saying that the type definition
        f: A -> B
means that f is prefixed with a retract for A, and postfixed
with a retract for B?  Or is this what you meant with the
typed version in (A) above?

As far as I can see, the difference between your alternatives
is whether an application to a value not in the domain returns
bottom or is undefined, in the sense of being outside the system.
Personally I have no trouble with
        3 + true
yielding bottom, but it worries some people. Is this what you mean?


-----------------------------------------------------------------------------
 Chris.Holt@newcastle.ac.uk      Computing Lab, U of Newcastle upon Tyne, UK
-----------------------------------------------------------------------------
 "And when they die by thousands why, he laughs like anything." G Chesterton

rockwell@socrates.umd.edu (Raul Rockwell) (05/16/91)

Frank Silbermann:
   If types are domain names then a type-clash is mathematical
   _nonsense_.  Under this paradigm, a functional language can include
   _only_ well-typed programs.  Ill-typed programs have no meaning.
   This severely hampers the usefulness of runtime typechecking, since
   the program is invalid _even if_ the type clash occurs in an
   unevaluated subexpression.

Now I'm confused.

Simple example: x % (y - z) where % indicates division.  It seems to
me that (A) this program has meaning, and (B) there is the potential
for a type clash (y - z might be outside the domain of %).  It also
seems to me that this might be an unevaluated subexpression for
conditions where the type clash occurs.

Color me perplexed.

Raul Rockwell

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/17/91)

In article <1065@sys.uea.ac.uk>, jrk@sys.uea.ac.uk (Richard Kennaway) writes:
> In <5710@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> >    -- the idea that programming language implementations are entitled
> >       to quietly deliver incorrect results (Pascal implementations
> >       are often like this, and C implementations almost always are)
> 
> What are you referring to here?

In most Fortran, Pascal, C, ... implementations I have used,
you can add two positive numbers and get a negative number.
If you read the Fortran, Pascal, and C standards with sufficient
care and attention, you will see that quietly delivering the
wrong answers to integer arithmetic operations like this is _legal_.
Implementations are _also_ free to report overflows (nice one, MIPS!)
but often don't.  Indeed, some systems makes it hard to _notice_ the
overflow.  (For example, one Pascal compiler I used on a 68010
_did_ generate code to trap on overflows, but the support routines
for 32-bit multiplication and division forgot to set the V flag.)

I like rationals, but I *love* bignums.
-- 
There is no such thing as a balanced ecology; ecosystems are chaotic.

ted@nmsu.edu (Ted Dunning) (05/18/91)

In article <5808@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

   In most Fortran, Pascal, C, ... implementations I have used,
   you can add two positive numbers and get a negative number.
   If you read the Fortran, Pascal, and C standards with sufficient
   care and attention, you will see that quietly delivering the
   wrong answers to integer arithmetic operations like this is _legal_.
   Implementations are _also_ free to report overflows (nice one, MIPS!)
   but often don't.  Indeed, some systems makes it hard to _notice_ the
   overflow.  (For example, one Pascal compiler I used on a 68010
   _did_ generate code to trap on overflows, but the support routines
   for 32-bit multiplication and division forgot to set the V flag.)



the terribly sad thing is that there are really two camps, one that
assumes that doing arithmetic modulo 2^32 is natural, and one that
assumes it is a *SIN*.

the result is that when programming in c, i miss bignums in a terrible
way, but when programming in prolog or lisp and trying to `simulate'
real hardware, i take a _big_ hit in performance because i have to do
a silly and redundant mod operation.


why can't both sides realize that both options are important.

--

if feeling bad is a justification for not living, 
    then living is a justification for feeling good.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/18/91)

In article <TED.91May17160826@kythera.nmsu.edu>, ted@nmsu.edu (Ted Dunning) writes:
> the terribly sad thing is that there are really two camps, one that
> assumes that doing arithmetic modulo 2^32 is natural, and one that
> assumes it is a *SIN*.
> 
> the result is that when programming in c, i miss bignums in a terrible
> way, but when programming in prolog or lisp and trying to `simulate'
> real hardware, i take a _big_ hit in performance because i have to do
> a silly and redundant mod operation.
> 
> why can't both sides realize that both options are important.

But the Lisp side *does* realise that both options are important.
Any number of Lisp systems provide something like, oh,
	(%fixnum-+ e1 ... en)
	(%fixnum-* e1 ... en)
	(%fixnum-< e1 ... en)
and so on.  Do, however, remember that there is nothing particularly
natural or "real" about 32-bit integers.  Due to my particular history,
I have come to think of 40-bit (B6700, sign and magnitude, integers
look like floats with exponent 0, hence the missing 8 bits), 24-bit
(B1700), 16-bit (CAI Alpha-LSI-2), and 36-bit (DEC-10) integers are
"natural" and "real".  This is _hardware_ I'm talking about.  (Actually,
the B6700 could support 79-bit integers, but one normally didn't bother.)
In a few years I expect to learn to think of 64 bits as the "obvious" size
for hardware "integers".  I've also had to live with programming languages
which implemented 14-bit, 28-bit, 29-bit, and 30-bit integers, as well
as one rather nice system where SMALLPs were 17 bits (love that Xerox 1109).

There is precisely *one* "major" current language which comes even close
to realising that both options are important (correct implementation of
integer arithmetic + access to the restricted hardware), and that is C.
"unsigned" arithmetic in C is defined to be 2s-complement, and must _not_
signal overflow.  "signed" arithmetic in C is defined as integer arithmetic;
the fact that it _may_ signal overflow comes as an unpleasant surprise to
many C programmers.

Note that in many Lisp implementations, full machine-word-length fixnums
are nearly as expensive to manage as bignums, because they'd really like
to "borrow" a few bits for a tag.

I don't think anybody in the "please give us correct answers" camp would
call it "a *SIN*" to do arithmetic modulo 2^32.  The sin is to make the
false, deceptive, lying, and risky claim that doing so is INTEGER
arithmetic.  Have all the modular arithmetic you want, just don't mistake
it for *integer* arithmetic.

A further point about the naturalness of modular arithmetic:
on the B6700 where I learned to program, integer arithmetic was NOT
modular.  If the hardware couldn't represent the right answer, you
got an "integer overflow" interrupt.  There weren't _any_ instructions
for modular arithmetic.  And do you know what?  Nobody solving their
physics problems, or their chemistry problems, or their OR problems, or
their mathematics problems, or their accounting problems, ..., ever missed
modular arithmetic.  Indeed, people who moved their programs to the B6700
from an IBM/360 often discovered that the results they had been getting in
the past were wildly wrong.

The rising number of requests for C routines to do 64-bit or 128-bit
integer arithmetic on comp.lang.c suggests that many C programmers aren't
finding "modulo 2^32" all that natural either.


Something worth noting:  the "mod" operator is NOT silly (it is after all
what you _want_) nor is it redundant (the Lisp system in which you are
asking for "mod 2^32" may prefer to work in units of 30 bits) nor need it
be a cause of inefficiency.  This is what the C crowd call "a quality-of-
implementation issue"; in the presence of suitable declarations (which are
not _standard_ in Scheme, but are available in several Scheme implementations;
CL has an entirely adequate declaration system) a compiler could notice that
the operation you want can be done directly by the hardware.  No doubt
someone will write in to tell us of a Scheme or CL compiler (Python?) which
does this already.

-- 
There is no such thing as a balanced ecology; ecosystems are chaotic.

fs@caesar.cs.tulane.edu (Frank Silbermann) (05/22/91)

>>   If types are domain names then a type-clash is mathematical
>>   nonsense.  Under this paradigm, a functional language can include
>>   _only_ well-typed programs.  Ill-typed programs have no meaning.
>>   This severely hampers the usefulness of runtime typechecking, since
>>   the program is invalid _even if_ the type clash occurs in an
>>   unevaluated subexpression.

<ROCKWELL.91May15223913@socrates.umd.edu> rockwell@socrates.umd.edu 
Raul Rockwell:
>
>	Simple example: x % (y - z) where % indicates division.
>	It seems to me that
>		(A) this program has meaning, and
>		(B) there is the potential for a type clash
>			(y - z might be outside the domain of %).

One would like to say
that the output domain of `y-z' is the set of integers,
and that the input domain of `%' is the set of nonzero integers.
Alternatively, one might say that `%' is a _partial_ function.

Overlapping domains and partial functions are mathematically
difficult to work with, and  ordinary typed lambda calculus
doesn't seem appropriate.

Domain theory solves this problem by using a new BOTTOM value
to trivially extend partial functions into total functions.
E.g. (x % 0) is given the value BOTTOM.
This converts the problem for a question of,
	"Does this expression make sense, mathematically?"
to
	"Does this expression have a nontrivial meaning?"

Whether you call division-by-zero a type error depends
on whether your types represent domains
(in which case division by zero is not a type error),
or whether they represent computer checkable assertions,
inserted as a runtime debugging aid (in which case
division by zero _would_ be a type error).

My point is that these two concepts have different
logical implications.

	Frank Silbermann	fs@cs.tulane.edu
	Tulane University	New Orleans, Louisiana  USA