[comp.lang.misc] Optional static typing limits

liberte@ncsa.uiuc.edu (Daniel LaLiberte) (04/17/91)

I wonder if we can raise the level of the dicussion 
on static vs dynamic types to consider more generally where
static typing fails to be sufficient or convenient, and why.

I am also interested in the various ways languages that support
*optional* static typing limit the extent of static typing one can do,
and why.  First of all which languages support optional static typing?

I would guess that static typing is made optional in some languages
because the designers recognized that static typing does not handle
all typing situations, and thus dynamic typing (or, alternatively,
C-like void typing) is a useful analog of the goto statement.
But given optional static typing, the language designers could then 
get lazy and not provide sufficient static typing capability for even
well known situations.

One case of optional static typing that I am aware of is in
Objective-C.  And one limitation I have come up against is the static
typing of mutually recursive instance variable types.  An instance
variable in each class is of type pointer to an instance of the other
class.  The problem comes in somehow declaring each class before the
other is declared.  Here is an example:

----- file A.h ----------
// interface of class A

#include "B.h"   // A needs to include B's interface

@interface A:Object
{
	B *b;   // instance variable b is a pointer to B
	...
}

----- file B.h ----------
// interface of class B

#include "A.h"   // B needs to include A's interface  <-- *ERROR*

@interface B:Object
{
	A *a;   // instance variable a is a pointer to A
	...
}

In C++ you could handle this situation by declaring "extern class A"
and "extern class B" instead of including A.h and B.h.  Objective-C
provides no such solution, that I could discover at least.  One is
forced to declare either a or b as just a generic "id" to break the
circularity.  This is simply a silly limitation of the language, but I
am curious to learn of other such examples.

I suppose the problem of hetrogeneous arrays (a realistic application
is where the types of elements and the functions to be applied to them
are not known until runtime) is another area where static typing
systems fall short of the ideal.  A union of types would be better
when it is appropriate than requiring either dynamic typing or
unwieldy static typing and (most likely) multiple inheritance.

Homogeneous generic types are not handled well in some languages,
in that the base type(s) may not be specified so that instances are
suitably constrained.

Here are some more examples:

From: brm@neon.Stanford.EDU (Brian R. Murphy)
> For example, I can't I write a function which returns
> either a boolean or an integer in a complex way.  I can't write my own
> Y combinator.  I can't write a function which allows either a sequence
> of functions which range over numbers or a sequence of numbers as an
> argument.

What are some other categories of problems that don't fit in existing
static typing systems?  Is it possible to come up with a "fix" for
these problems in the ideal language, or will dynamic typing always be
"necessary", like goto is.  Can we organize these and maybe learn
something about the problem?

Dan LaLiberte
National Center for Supercomputing Applications
liberte@ncsa.uiuc.edu

strom@watson.ibm.com (Rob Strom) (04/17/91)

In article <LIBERTE.91Apr16164831@babbage.ncsa.uiuc.edu>, liberte@ncsa.uiuc.edu (Daniel LaLiberte) writes:
|> I wonder if we can raise the level of the dicussion 
|> on static vs dynamic types to consider more generally where
|> static typing fails to be sufficient or convenient, and why.
|> 
|> I am also interested in the various ways languages that support
|> *optional* static typing limit the extent of static typing one can do,
|> and why.  First of all which languages support optional static typing?

I'll answer this question for the Hermes language.

Hermes provides both strong static typing and strong dynamic typing.
That is, it is never possible to apply an operation to an operand
of an inappropriate type.  This guarantee is sometimes enforced
by compile-time checking (static), and sometimes by run-time 
checking (dynamic).  Which technique is used is the programmer's
choice --- in this sense static typing is "optional".

However, I will candidly admit that Hermes is biased towards 
the static approach.  That is, the language design assumes that
most of the time, a variable will: (1) hold values of only a single
type, and (2) that type will be known at compile-time.  The cases
in which the type is not known statically are assumed to occur
less frequently.  The constructs for manipulating dynamically
typed data are more verbose.  This bias comes partially from
the historical tradition of the language, and partially from
the intended application domain:  large, long-lived, multi-application
systems, in which ease of maintenance is more important than
ease of initial writing.  In that domain, type declarations
serve as a form of machine-readable documentation.


|> 
|> I would guess that static typing is made optional in some languages
|> because the designers recognized that static typing does not handle
|> all typing situations, and thus dynamic typing (or, alternatively,
|> C-like void typing) is a useful analog of the goto statement.
|> But given optional static typing, the language designers could then 
|> get lazy and not provide sufficient static typing capability for even
|> well known situations.
|> 
|> [examples from Objective-C and C++]
|> 
|> What are some other categories of problems that don't fit in existing
|> static typing systems?  Is it possible to come up with a "fix" for
|> these problems in the ideal language, or will dynamic typing always be
|> "necessary", like goto is.  Can we organize these and maybe learn
|> something about the problem?
|> 
I agree that there will always be problems for which static typing
will be inadequate and therefore that an escape into dynamic typing
will be necessary.  I don't see the need for "language wars" between
proponents of static vs. dynamic typing.  When static typing is
possible, I prefer it, because I always prefer errors to be caught
before a program module begins executing rather than in the middle
of execution.  When static typing is impossible, I prefer dynamic typing
over escaping from type-checking altogether.  Had Hermes a different
application domain, we might have chosen to make dynamic typing
the default and static the exception, rather than the other way around.

Here are some applications for which Hermes uses dynamic typing rather than static:

o On-the-fly program creation:  Hermes has a data type "program" whose value
  is a syntax-free representation of a Hermes program.  Program values
  can be built at run-time and then instantiated as processes
  with a particular initialization interface.  It is checked at instantiation
  time that the program you have built is type consistent with the
  initialization interface you expect.

o Name servers:  In Hermes, the right to access a service is encoded as
  a value of a particular "output port" type.  A name server is a process
  which, in its simplest form, owns a table of pairs <name, port>.
  This table is typically heterogeneous since the "port" components
  will be of different types, representing the different services
  being named.

o File systems:  This is a situation similar to name servers.  A file
  system is a table of pairs <name, thing>, but the <thing>s are of
  different types.

Here is a situation in which Hermes can use static typing where some other
languages cannot:  Suppose I am managing a collection of operator consoles.
These operator consoles have heterogeneous protocols:  some are ASCII
glass teletypes, some are bitmap displays, some are "windows", etc.
However, all support a standard *interface* for printing and reading
lines, prompting, etc.  In Hermes these console "objects" (called
"processes" in Hermes) are visible only through their ports.  Since
the ports have the same interface, they appear to the console
manager as a homogeneous collection.   The fact that these console
objects have different internals does *not* make them different types.
In some other languages, they *would* look heterogeneous.  For
example, they would be different task types in Ada, and different
classes in Smalltalk.  Hermes would not need dynamic typing to
handle a case like this.

Your Objective-C problem has no analog in Hermes.

Please note that followups are directed to comp.lang.misc.
-- 
Rob Strom, strom@ibm.com, (914) 784-7641
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10958

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/18/91)

In article <LIBERTE.91Apr16164831@babbage.ncsa.uiuc.edu> liberte@ncsa.uiuc.edu (Daniel LaLiberte) writes:
> What are some other categories of problems that don't fit in existing
> static typing systems?  Is it possible to come up with a "fix" for
> these problems in the ideal language, or will dynamic typing always be
> "necessary", like goto is.  Can we organize these and maybe learn
> something about the problem?

Amazing! Someone else is actually trying to solve problems and make
programming easier, instead of arguing about exactly how hard it is!
I like this guy's attitude.

I propose we focus on just one problem: What should a type system look
like in order to allow everything people want to do with types? In other
words, what kinds of constraints should the programmer be able to
express upon objects (in the general sense, not the OO sense) in order
to achieve both safety and generality?

Let's not argue whether dynamic typing is necessary. Let's just do what
we have to so that it's obviously unnecessary. (Is that too upbeat for
comp.lang.misc readers?)

As a starting point: Ben, could you describe the Russell type system in
more detail, or point us to references? I think the basic idea makes
sense. If you apply certain operations to a value inside function f,
then f should specify the value as anything subject to those operations.

And, as a first sample problem to be solved by a typing system: How do
we write a generic, type-safe sorting routine? What's the best that
available languages can do to make this work? Remember, type-safe means
that the sort routine will never tell you ``can't apply < to glorps'' at
run-time, as a dynamically typed system would.

We may have to separate the issues of typing for storage management and
typing for typechecking/security/etc.

Followups to comp.lang.misc.

---Dan

chased@rbbb.Eng.Sun.COM (David Chase) (04/18/91)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>I propose we focus on just one problem: What should a type system look
>like in order to allow everything people want to do with types? In other
>words, what kinds of constraints should the programmer be able to
>express upon objects (in the general sense, not the OO sense) in order
>to achieve both safety and generality?
>
>Let's not argue whether dynamic typing is necessary. Let's just do what
>we have to so that it's obviously unnecessary. (Is that too upbeat for
>comp.lang.misc readers?)

It might be overambitious to make it "unnecessary".  "Rarely
necessary" should suffice.  Unless you want to make a religious issue
out of it, you might make some allowances for what could be done by a
suitably-designed optimizer (or type-inference engine) instead of
unduly burdening the programmer in the name of mandatory efficiency.

Anyhow, check out "Type-dependent Parameter Inference" by Cormack and
Wright, in the June 1990 SIGPLAN Notices (conference proceedings of
PLDI).  I read it recently, and found it quite interesting.  I didn't
understand all of it, but I'm working on it.

See also "Typeful Programming" by Cardelli, Digital Systems Research
Center report # 45.  Their address (since this might not be in your
local library) is

  Digital Systems Research Center
  130 Lytton Avenue
  Palo Alto, CA 94301

You might also look at Report #10, "A Polymorphic \lambda-calculus
with Type:Type".  Same author.

William Cook (formerly of HP, now at Apple) may have published
something on this subject in the last couple of years, but the last
time I remember asking him about it, he told me to go read Cardelli's
papers.

Here's a list of the Russell papers:

  Two by Demers and Donahue in the 1980 POPL.

  Boehm, Demers & Donahue, "An informal description of Russell", Cornell
  CS department TR 80-430.

  Demers & Donahue, "Data Types are Values", Cornell CS dept TR 79-393
  (possibly also a Xerox PARC blue&white -- I'm misplaced my copy of
  this)

  Demers & Donahue, "The semantics of Russell, an exercise in abstract
  data types". Cornell CS TR 80-431.

  Hook, "Understanding Russell - A first attempt", Semantics of Data
  Types (Proceedings), Springer-Verlag Lectuer Notes in Computer Science
  #173.

  Boehm, Demers & Donahue, "A Programmer's Introduction to Russell",
  Rice CS tech report TR85-16.

  Boehm, "Type Inference in the Presence of Type Abstraction", SIGPLAN
  Notices, Jul7 1989 (PLDI Conference proceedings).

If you're serious about type systems and type theory, go read these
papers.  The authors put a lot of thought into them, and the authors
are not stupid people.  And no, I won't digest them for the net in ten
easy installments - I don't have the time, and that's not what I'm
paid to do.

David Chase
Sun

lerman@stpstn.UUCP (Ken Lerman) (04/18/91)

In article <LIBERTE.91Apr16164831@babbage.ncsa.uiuc.edu> liberte@ncsa.uiuc.edu (Daniel LaLiberte) writes:
|>
|>----- file A.h ----------
|>// interface of class A
|>
|>#include "B.h"   // A needs to include B's interface
|>
|>@interface A:Object
|>{
|>	B *b;   // instance variable b is a pointer to B
|>	...
|>}
|>
|>----- file B.h ----------
|>// interface of class B
|>
|>#include "A.h"   // B needs to include A's interface  <-- *ERROR*
|>
|>@interface B:Object
|>{
|>	A *a;   // instance variable a is a pointer to A
|>	...
|>}
|>
|>Dan LaLiberte
|>National Center for Supercomputing Applications
|>liberte@ncsa.uiuc.edu

A (not very nice) way of handling this in Stepstone's Objective-C
compiler is to write struct __objcA *a and struct __objcB *b to
declare the instance variables.

A better way would be for the compiler to permit @interface A {} @end
to be an incomplete definition of class F which could be completed
later on.

A further deficiency of Objective-C regarding automatic and static
instances of objects (those that are declared Object foo, instead of
Object *foo) is that the compiler doesn't automatically call an
initializer for them so that only the isa pointer is set.  This
implies that the writer of the class must know that it will be used
this way.

I personally don't use them and suggest that others avoid them, also.


Ken

Disclaimer: In case you haven't noticed, I work for Stepstone.  I
don't speak for them in this matter.

lerman@stepstone.com

sfk@otter.hpl.hp.com (Steve Knight) (04/18/91)

Dan writes:
> I wonder if we can raise the level of the dicussion 
> on static vs dynamic types to consider more generally where
> static typing fails to be sufficient or convenient, and why.

My view is that the key issues for convenience are arbitrary unions and,
by implication, arbitrary subtyping.  The advantage of dynamic typing is
essentially that any collection of types is itself a valid type & that
any collection of types can be extended so as to become a super-type or
restricted to form a sub-type.

He also asks:
> I am also interested in the various ways languages that support
> *optional* static typing limit the extent of static typing one can do,
> and why.

The obvious cost of this is that type inference is severely restricted.
Furthermore, it is not possible (in general) to find a most general type
for an expression in such a system.

Steve

anw@maths.nott.ac.uk (Dr A. N. Walker) (04/19/91)

In article <8493:Apr1719:30:0791@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>And, as a first sample problem to be solved by a typing system: How do
>we write a generic, type-safe sorting routine? What's the best that
>available languages can do to make this work?

	I quote [without permission, and somewhat edited] from Charles
Lindsey's paper on modals, Algol Bulletin #37, p28, July 1974:

[Start of quote.]

Example 1 -- A (very slow) sort routine.

    proc sort = (mode z, ref [] z vec, proc (ref z, ref z) bool swapped)
           void: for i from upb vec by -1 to lwb vec + 1
                  do for j from i to upb vec
                        while swapped (vec[j-1], vec[j]) do skip od
                  od;

    [For example ...]

	mode person = struct (string name, int age);
	proc ageswap = (ref person lo, hi) bool:
		if age of hi < age of lo
		  then person p := hi; hi := lo; lo := p; true
		  else false
		fi;

whereupon a row of persons could be sorted thus:

	[1000] person people; read (people);
	sort (person, people, ageswap)

[End of quote.]

Re-implementing "sort" as a Quicksort or similar is left as an exercise!
Apart from the actual sorting technique, this seems to me to be the same
as C's "qsort" library routine, but type-safe in Dan's sense.  Any
advances in the last 17 years?

	I don't think there is any theoretical objection to deleting the
"swapped" parameter to "sort", replacing the call "while swapped (...)"
by "while vec[j-1] < vec[j]", and defining

	op < = (ref person lo, ho) bool: (...)

in which case "sort" would compile for any "z" for which the operator
"<" was defined.

	C could do this if types were allowed to be parameters to
functions.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

wg@opal.cs.tu-berlin.de (Wolfgang Grieskamp) (04/20/91)

liberte@ncsa.uiuc.edu (Daniel LaLiberte) writes:

 >I am also interested in the various ways languages that support
 >*optional* static typing limit the extent of static typing one can do,
 >and why.  First of all which languages support optional static typing?

There is an experimental language called AMBER designed by Luca
Cardelli, AT&T. (Cardelli signs also for the MODULA-3 Type System).

AMBER supports this kind of record inclusion polymorphism, i.e.
having a record r={a: Int, b: Int} then both the records r1={a: Int}
and r2={b: Int} are implicitely supertypes of r, i.e. every function
accepting r1 resp. r2 accepts also objects of type r.

AMBER is weak functional like Standard ML, i.e. it supports assignments
but updatable objects have to be typed appropriate.

The point now according to your question is that AMBER has a type
"Dynamic". Every value can be made dynamic using the expression
"dynamic e", where e is an expression. The statically known type
information belonging to e is paired with the value of e, resulting
in an entity of type Dynamic. Dynamic values can be coerced back
to a known type using the expression "coerce e to t" which 
produces a run-time error if e has not type t. There are also
possibilities to inspect the type of dynamic values: there is
a type "Type" and a function "typeOf: Dynamic -> Type" for this
purpose. Objects of type Type are descriptions of the
type information usally known to a compiler.

The introduction of dynamic types of this kind seem to cause no
extraordinary problems in a primarily statically typed environment. 
Run-time type errors are limited to the construction "coerce e to t".
Its a matter of taste to call AMBER dynamically typed since such
run-time errors may occure at all.

AMBER has been implemented for the Mac, but dont ask me where to
get this implementation and the language description. Probably there 
is a FTP side anywhere.

--
Wolfgang Grieskamp 
wg@opal.cs.tu-berlin.de tub!tubopal!wg wg%opal@DB0TUI11.BITNET

liberte@ncsa.uiuc.edu (Daniel LaLiberte) (04/20/91)

There seems to be a body of literature out there on better static
typing systems (referenced by David Chase) so I am hesitant to say
much more before studying the issue.  (That hasnt stopped alot of
people, of course :-) But let me ask a couple more questions.

In article <2400044@otter.hpl.hp.com> sfk@otter.hpl.hp.com (Steve Knight) writes:
>   My view is that the key issues for convenience are arbitrary unions and,
>   by implication, arbitrary subtyping.  

Are statically declared, arbitrary (anonymous?) supertyping (same as
unions?) and subtyping sufficient?  They would certainly be helpful in
taking care of many static typing problems, but are they always
enough?  Suppose that for some data it is *impossible* for the
compiler or programmer to know what restrictions (i.e.  data type
constraints) may be placed on the data until run time.  Languages that
support reflection would be prone to this class of problems, I
imagine.  Debuggers (that typically reflect on some other process)
seem to require dynamic typing, no?

If you dont know what constraints may be placed on some data even at
run time, then you are out of luck - you dont know what you can safely
do with the data and may risk core dumps.  Maybe the data knows what
its limits are (the object oriented approach) and maybe exceptions are
handled gracefully, but obviously something has gone wrong.

Speaking of which, is there some study of using exceptions not as
errors but as a control mechanism.  Consider that when a message is
sent to an object and its class does not understand the message, this
is a kind of exception.  But rather than stopping there, the message
is forwarded to the super class, etc.  If no class all the way up the
hierarchy understands the message, then we give up.  But why stop
there?  Perhaps the original sender would like to try again with a
different message.

One final thought concerns another kind of convenience argument
sometimes given for using dynamic typing.  Even when a type is
statically known and the language supports its specification
conveniently, is it sometimes (or always?) better to not explicitly
specify the type?  If the type of a variable changes during the course
of development, then other uses (but not necessarily all other uses)
of the type will often need to be changed to maintain consistency -
and this is extra work, especially during early development when many
things are changing.

Part of this problem could be solved by a better programming
environment that helped the programmer mutate the program
consistently.  Another tack is languages which infer types statically.
But how about an explicit (programmer specified) static typing system
in which the types are unlikely to change, except for typos?  That
assumes that once an variable's type is declared, it is most likely
the correct type.  This doesnt seem possible.  Any ideas?

Dan LaLiberte
National Center for Supercomputing Applications
liberte@ncsa.uiuc.edu

schwartz@groucho.cs.psu.edu (Scott Schwartz) (04/21/91)

anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
| Re-implementing "sort" as a Quicksort or similar is left as an exercise!
| Apart from the actual sorting technique, this seems to me to be the same
| as C's "qsort" library routine, but type-safe in Dan's sense.  Any
| advances in the last 17 years?

This version of quicksort comes with the SML/NJ documentation:

% sml
- fun quick(le:'a*'a->bool) =
=   let fun sort [] = []
=	  | sort [x] = [x]
=	  | sort (a::rest) = (* the head "a" is the pivot *)
=	      let fun split(left,right,[]) = sort left @ (a::sort right)
=		    | split(left,right,x::l) =
=			if le(x,a) then split(x::left,right,l)
=				   else split(left,x::right,l)
=	      in split([],[],rest)
=	      end
=   in sort
= end;
val quick = fn : ('a * 'a -> bool) -> 'a list -> 'a list

- quick (op<) ["foo", "bar", "biff"];
val it = ["bar","biff","foo"] : string list
- quick (op>) [1,2,3,9,8,7];
val it = [9,8,7,3,2,1] : int list

The 17 year advance is type inference, I guess.

Chris.Holt@newcastle.ac.uk (Chris Holt) (04/22/91)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>And, as a first sample problem to be solved by a typing system: How do
>we write a generic, type-safe sorting routine? What's the best that
>available languages can do to make this work? Remember, type-safe means
>that the sort routine will never tell you ``can't apply < to glorps'' at
>run-time, as a dynamically typed system would.

What would be nice would be something of the form

Sort S <=
    where
    S : sequence of D
    D : domain
    <= : partial_order (D x D -> boolean)
        where
        partial_order : transitive & reflexive & antisymmetric

Of course, partial_order would be part of the standard toolbox of
the programmer; it maps the domain of dyadic D-to-boolean operations
to that subset satisfying the given properties.  I will dream on...
-----------------------------------------------------------------------------
 Chris.Holt@newcastle.ac.uk      Computing Lab, U of Newcastle upon Tyne, UK
-----------------------------------------------------------------------------
 "And when they die by thousands why, he laughs like anything." G Chesterton

cs450a03@uc780.umd.edu (04/23/91)

Chris Holt    >
Dan Bernstein >>

>>And, as a first sample problem to be solved by a typing system: How do
>>we write a generic, type-safe sorting routine?

>What would be nice would be something of the form
>Sort S <=
>    where S : sequence of D
>          D : domain
>         <= : partial_order (D x D -> boolean)
>            where partial_order : transitive & reflexive & antisymmetric
>Of course, partial_order would be part of the standard toolbox of the
>programmer; it maps the domain of dyadic D-to-boolean operations to
>that subset satisfying the given properties.

If you're going to sort, why not just go with sorting by a key?  If
you have some domain which is ordered, but which the builtin <=
doesn't understand, why not just map it into some type which the
builtin recognizes? (For example build a table of integers, where each
integer corresponds to an element in the unsorted sequence.)

You would then need to get at the permutation information which "sort"
generates, but that need not be difficult.  Also, sort had better deal
with sorting strings of some sort, but that need not be difficult
either.

If the mapping from sequence of type X to sequence of builtin type is
simple enough, there is no reason that a compiler couldn't optimize it
out as a distinct step.

I don't see any reason, by the way, to apply sort to a partial order
which is not a complete order.

Raul Rockwell

new@ee.udel.edu (Darren New) (04/24/91)

In article <3111@opal.cs.tu-berlin.de> wg@opal.cs.tu-berlin.de writes:
>The point now according to your question is that AMBER has a type
>"Dynamic". Every value can be made dynamic using the expression
>"dynamic e", where e is an expression. [...]

This sounds essentially like Hermes' "polymorph" type.  It
seems to do the job for everything I can think of for dynamic
typing except dynamic binding (in the OO sense).  -- Darren

-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
     +=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+

gudeman@cs.arizona.edu (David Gudeman) (04/24/91)

In article  <1991Apr22.153149.8082@newcastle.ac.uk> Chris Holt writes:
]What would be nice would be something of the form
]
]Sort S <=
]    where
]    S : sequence of D
]    D : domain
]    <= : partial_order (D x D -> boolean)
]        where
]        partial_order : transitive & reflexive & antisymmetric

I know that is the standard solution, but I wonder if a better
solution wouldn't be a function $ord : D -> Int$ that maps domain
objects into arbitrary integers with the property that if $d1 < d2$
by the partial order, then $ord(d1) < ord(d2)$.  ($d1 < d2$ means that
$d1 <= d2$ and $d1 =/= d2$.)

You would lose the partial order with this function (so you might want
to implement <= anyway), but it seems that $ord$ has more general
uses.  It works for a wider variety of sorting methods, for example,
and could also be used for a hash function.
--
					David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman

boehm@parc.xerox.com (Hans Boehm) (04/24/91)

gudeman@cs.arizona.edu (David Gudeman) writes:

>In article  <1991Apr22.153149.8082@newcastle.ac.uk> Chris Holt writes:
>]What would be nice would be something of the form
>]
>]Sort S <=
>]    where
>]    S : sequence of D
>]    D : domain
>]    <= : partial_order (D x D -> boolean)
>]        where
>]        partial_order : transitive & reflexive & antisymmetric

>I know that is the standard solution, but I wonder if a better
>solution wouldn't be a function $ord : D -> Int$ that maps domain
>objects into arbitrary integers with the property that if $d1 < d2$
>by the partial order, then $ord(d1) < ord(d2)$.  ($d1 < d2$ means that
>$d1 <= d2$ and $d1 =/= d2$.)

There is no such function mapping character strings to integers.  Note
that "a" < "aa" < "aaa" < ...infinitely many strings... < "b".  No subset
of the integers has this property.  See any set theory book for a
discussion of transfinite ordinals.

Hans
boehm@xerox.com

cs450a03@uc780.umd.edu (04/24/91)

Hans Boehm    >
David Gudeman >>
Chris Holt [sort taking args:  unsorted list and ordering predicate]

>>I know that is the standard solution, but I wonder if a better
>>solution wouldn't be a function $ord : D -> Int$ that maps domain
>>objects into [list which requires the same permutation as D to sort]

>There is no such function mapping character strings to integers.

I agree with Mr. Gudeman that mapping into some domain for which sort
is defined is a good idea.  And, I agree with Mr. Boehm that it is
important to be able to sort strings.  I disagree with the idea of
using a predicate to specify the order.

If "sort" can order a list of sequences (in addition to a list of
atomic numbers), that should be adequate.  Then if you have some
internal type that sort doesn't know how to deal with you can provide
a sort key which maps to sequences which sort can deal with.

Why provide sort keys, rather than a predicate?

(1) building a key takes O(n) steps, rather than O(n log n) where n is
the length of the unsorted sequence.  (And often, the predicate will
have to extract keys for two arguments each time it is evaluated).

(2) I believe key extraction reduces the communication costs on highly
parallel machines.  [somebody want to verify this or disagree?]

(3) It is simple to derive a predicate from the specification of how
to build a key (if you choose to have the compiler optimize for
minimum dynamic space requirements).  It is not so simple to derive a
key from the specification of how to build a predicate, even when that
is a desireable optimization.

(4) Mapping from one domain to another is a classic (and very general)
mathematical technique.  It is synergistically good to put effort into
supporting this sort of mapping.

Er.. in other words, you don't need to complicate the implementation
of "sort" by trying to generalize the sorting process to all cases.
Just optimize the hell out of a handful of cases and let other
features of the language deal with generalizing to "user-defined"
cases.

Finally, note that the function used for mapping data -> key could be
dynamically bound if the situation calls for it (providing the same
kind of generality as supplying a predicate to sort).

Raul Rockwell

Chris.Holt@newcastle.ac.uk (Chris Holt) (04/25/91)

boehm@parc.xerox.com (Hans Boehm) writes:

>gudeman@cs.arizona.edu (David Gudeman) writes:

>>In article  <1991Apr22.153149.8082@newcastle.ac.uk> Chris Holt writes:
[about sorting using a partial order relation as a parameter]

>>I know that is the standard solution, but I wonder if a better
>>solution wouldn't be a function $ord : D -> Int$ that maps domain
>>objects into arbitrary integers with the property that if $d1 < d2$
>>by the partial order, then $ord(d1) < ord(d2)$.  ($d1 < d2$ means that
>>$d1 <= d2$ and $d1 =/= d2$.)

>There is no such function mapping character strings to integers.  Note
>that "a" < "aa" < "aaa" < ...infinitely many strings... < "b".  No subset
>of the integers has this property.  See any set theory book for a
>discussion of transfinite ordinals.

There can be a such a function for the finite strings; you use
the lengths of the strings, so
  a < b < ... < z < aa < ab < ... az < ba < bb < (rest of 2-sequences)
    < (3-sequences) < (4-sequences) etc.
Agreed the ordering given by Hans Boehm requires a dense codomain,
but again the rationals are dense, and so can handle finite and
repeating strings, and they can be mapped to the integers (though
without the obvious ordering being preserved).

A partial order can always be strengthened to a total order, of
course; but one thing I wondered as I decided to make my example
partial is, must every algorithm for partial order sorting be
at least o(n**2)?
-----------------------------------------------------------------------------
 Chris.Holt@newcastle.ac.uk      Computing Lab, U of Newcastle upon Tyne, UK
-----------------------------------------------------------------------------
 "And when they die by thousands why, he laughs like anything." G Chesterton

boehm@parc.xerox.com (Hans Boehm) (04/26/91)

Chris.Holt@newcastle.ac.uk (Chris Holt) writes:

>boehm@parc.xerox.com (Hans Boehm) writes:

>>gudeman@cs.arizona.edu (David Gudeman) writes:

>>>In article  <1991Apr22.153149.8082@newcastle.ac.uk> Chris Holt writes:
>[about sorting using a partial order relation as a parameter]

>>>I know that is the standard solution, but I wonder if a better
>>>solution wouldn't be a function $ord : D -> Int$ that maps domain
>>>objects into arbitrary integers with the property that if $d1 < d2$
>>>by the partial order, then $ord(d1) < ord(d2)$.  ($d1 < d2$ means that
>>>$d1 <= d2$ and $d1 =/= d2$.)

>>There is no such function mapping character strings to integers.  Note
>>that "a" < "aa" < "aaa" < ...infinitely many strings... < "b".  No subset
>>of the integers has this property.  See any set theory book for a
>>discussion of transfinite ordinals.

>There can be a such a function for the finite strings; you use
>the lengths of the strings, so
>  a < b < ... < z < aa < ab < ... az < ba < bb < (rest of 2-sequences)
>    < (3-sequences) < (4-sequences) etc.
>Agreed the ordering given by Hans Boehm requires a dense codomain,
>but again the rationals are dense, and so can handle finite and
>repeating strings, and they can be mapped to the integers (though
>without the obvious ordering being preserved).

I thought the original point of this was to specify an order for sorting.
Certainly the set of character strings, rationals, and integers all have
the same cardinality.  Thus I can find 1-1 mappings between them.  But to
be useful, they have to preserve the desired ordering.  For character
strings, this usually means lexicographic ordering, and there's no way
to preserve that by mapping to the integers.

I agree that taking ord: D -> Rationals would work in all common cases, but it
seems a bit expensive in practice.

Hans
boehm@xerox.com