[comp.std.c] Typedef names vs. new types.

dwp@k.gp.cs.cmu.edu (Doug Philips) (10/20/88)

Regarding the recent discussion of typedef names and their intent and
use:  

I have found that it can be very useful to use typedef to create a
conceptual entity (this is probably an abuse of the mechanism).  The
problem with this (as mentioned before) is that the compiler doesn't
really consider this a new type.  Even lint is, at first glance, no
help.  However, you can somewhat easily get lint to help you out.  I'd
be interested in finding out what other solutions along these lines
people use.  My solution is as follows:

For convenience, I define all my new typedef names in one place.  The
rationale is that they are often used to indicate quantities whose
ranges are 'configurable', depending on how wide you want the type to
be.  This is not, however, a requirement of my solution.

Consider a two dimensional graph where the new typedef'ed names are
arranged along the X axis.  The Y axis is the "size" of the type.
By plotting the points associated with each type/size pair (size is
determined by the underlying type), a "type terrain" becomes visible.  What
lint can do is tell you, in certain cases, where usage involves
elements that are at different altitudes.  Particularly useful to me
with a 4.2BSD lint is the ability to detect mixture of longs with
smaller types.  The key point is that it is a difference in altitude
between types that makes lint/cc unhappy with their intermixture.  What
I do is walk the type terrain, modifying one typedef at a time: if the
type is already a long, I make it an int, and if it's not a long, I
make it one.  After each transformation I run lint to catch the cases
where I'm assigning between mixed types.  When I move on to the next
type in the terrain, I restore the previous type to its original
definition.

The obvious next step is to come up with some way of automating the
process so that I'll do it more frequently.
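A first cut at such automation might look something like the shell
sketch below.  The header name, the checker command, and the strict
one-typedef-per-line layout are all assumptions for illustration, not
anything from my actual setup:

```shell
# walk_terrain NAME...: for each typedef NAME in $HDR, swap its base type
# (long -> int, anything else -> long), run $CHECK, then restore the
# header before moving on.  $HDR, $CHECK, and the one-typedef-per-line,
# space-separated layout are assumptions of this sketch.
walk_terrain() {
    for name in "$@"; do
        cp "$HDR" "$HDR.orig"
        if grep "^typedef  *long  *$name;" "$HDR" >/dev/null; then
            sed "s/^typedef  *long  *\($name;\)/typedef int \1/" "$HDR.orig" > "$HDR"
        else
            sed "s/^typedef  *[a-z_]*  *\($name;\)/typedef long \1/" "$HDR.orig" > "$HDR"
        fi
        echo "=== checking with $name altered ==="
        eval "$CHECK"                  # e.g. CHECK='lint *.c'
        mv "$HDR.orig" "$HDR"          # put the terrain back the way it was
    done
}

# Typical use:
#   HDR=types.h CHECK='lint *.c' walk_terrain dist_t vel_t time_t
```

Each pass leaves the header exactly as it found it, so the lint output
for one altered type is never polluted by an earlier alteration.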

		-Doug

-- 
Doug Philips                  "All that is gold does not glitter,
Carnegie-Mellon University       Not all those who wander are lost..."
dwp@cs.cmu.edu                     -J.R.R. Tolkien

karl@haddock.ima.isc.com (Karl Heuer) (10/25/88)

In article <3355@pt.cs.cmu.edu> dwp@k.gp.cs.cmu.edu (Doug Philips) writes:
>I have found that it can be very useful to use typedef to create a conceptual
>entity (this is probably an abuse of the mechanism).  The problem with this
>(as mentioned before) is that the compiler doesn't really consider this a new
>type.  Even lint is, at first glance, no help. ...

This issue comes up every now and then, and each time it does, I threaten to
post my ideas on the subject, but I keep holding off because they're not
finished.  Well, some feedback from the net might make it easier, so I'm
posting what I've got.

disclaim {
    The ideas described below are incomplete.  I already know that.
    Criticism is okay, but let's keep it constructive.

    Don't followup to tell me how to do it in C++.  I already know that,
    too, and I don't think it's what I'm looking for.  There ought to be a
    way to do this without having to define a new class and operator set
    for each conceptually different type, and for each combination of them
    that I might want.

    I agree that the syntax used below is poor.  Once we have the
    semantics nailed down, we can pretty it up.

    This probably isn't the right newsgroup (I'm certainly not proposing
    that this change for ANSI C), but this is where the most recent
    discussion has been taking place.
}


Many people have noted that typedef only creates a synonym for an existing
type, so that after creating types `foo' and `bar' (both typedef'd to int) it
is still legal to write `f1 = b1'.  Let's try to design a consistent language
that would detect this, yet still be useful.
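To make the status quo concrete (foo, bar, f1, and b1 are the
hypothetical names from the paragraph above), today's C accepts all of
the following silently:

```c
typedef int foo;
typedef int bar;

/* Compiles without a murmur: foo and bar are mere spellings of int. */
foo mix(foo f1, bar b1)
{
    f1 = b1;          /* the assignment we would like to have flagged */
    return f1 + 1;    /* mixing with plain int is equally silent */
}
```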

First question: is `f1 = 1' legal?  How about `f1 + i' (where `i' is int)?
Someone suggested (last time this topic came up) that the new types should be
intermixable with the original, but not with each other.  Presumably this
means that the int is converted to a foo, and that this is the type of the
result.

I don't agree that this would be useful behavior, at least for my programs.
It would produce two incorrect warnings for `dist = vel * time', and silently
allow the (dimensionally incorrect) `time = i / time'.  Why bother to apply a
half-solution to the special case of mixed assignment if it isn't going to
behave properly for arithmetic?

A related problem is that of enums.  ANSI C makes them ints in disguise,
because nothing else would do The Right Thing.  Clearly one would like to be
able to distinguish enum from int, but it should also be possible to use an
enum as the subscript of an array.  (In fact, it should *not* be legal to use
a plain int if the array is expecting to be subscripted by the enum!)
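Both halves of that wish are easy to illustrate; in today's C the enum
decays to int, so the "wrong" subscripts below are accepted as readily
as the right one (the names here are made up for the example):

```c
enum color { RED, BLUE, GREEN };

/* Today this is fine only because the enum silently becomes an int... */
long get_count(const long table[3], enum color c)
{
    return table[c];
}
/* ...and nothing stops a caller from writing get_count(t, 7) either,
 * which is exactly the kind of thing the proposal wants to forbid. */
```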


Currently, a C expression has a type and a value.  Let us postulate a stricter
language, called CDA, in which an expression has a type, a value, and a
dimension.  The dimension of all existing C arithmetic types is "Scalar".
Each pointer type has its own additive dimension (see below for definition).

(I envision this as "a new language" only in the same sense that the language
acceptable to lint is different from that of cc.  In fact, I would imagine a
lint-like processor which recognizes a new set of pragmas embedded in what cc
would interpret as comments or as macros with empty expansion.  In the
examples below, assume `#define _CDA(ignore)' and you'll get the normal C
syntax; the macro's argument is visible only to the CDA program.)

Let's enhance the syntax of declarations to include a dimension-specifier
(defaulting to Scalar) in addition to a type-specifier (defaulting, as usual,
to int).

There are two types of nonscalar dimensions: Additive and Multiplicative.  An
additive dimension A satisfies the rule { Scalar + A --> A } and the other two
rules implied by this (viz. { A + Scalar --> A } and { A - A --> Scalar }),
whereas a multiplicative dimension M satisfies { Scalar * M --> M } (etc.),
and also { M + M --> M }.
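As a cross-check of the multiplicative rules, a dimension can be
modeled as a vector of exponents over base units: multiplication adds
exponents, and addition demands that they match.  The two-unit vector
and the function names below are my own toy model, not part of the CDA
proposal itself:

```c
/* Toy model: one exponent slot per base unit; {0,0} plays Scalar. */
struct dim { int len, tim; };

/* M1 * M2: exponents add, so Scalar ({0,0}) is the identity. */
struct dim dim_mul(struct dim a, struct dim b)
{
    struct dim r;
    r.len = a.len + b.len;
    r.tim = a.tim + b.tim;
    return r;
}

/* M + M --> M is legal only when the dimensions agree exactly. */
int dim_add_ok(struct dim a, struct dim b)
{
    return a.len == b.len && a.tim == b.tim;
}
```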


Example 1: Define types "length_t" and "area_t" with dimensional information.
  typedef _CDA(new_m_dim()) double length_t;
  typedef _CDA(length_t * length_t) double area_t;
  length_t x, y, z;
  area_t   surface = 2*(x*y + y*z + x*z);    /* ok */
  foo(x*y*z);
The expression x*y*z is acceptable and has dimensions of "cubic length_t" even
though no type has been declared yet to have such dimensions.  When "foo" is
defined it must expect an argument of such a type:
  void foo(_CDA(length_t * length_t * length_t) double v) { ... }
Note that it is not necessary to define a "volume_t".  (Neither was "area_t"
necessary; "_CDA(length_t * length_t) double" would have been equivalent.)

Example 2: Temperature conversions.
  typedef _CDA(new_m_dim()) double Kelvin;
  typedef _CDA(new_m_dim()) double Rankine;
  typedef _CDA(new_a_dim() * Kelvin) double Centigrade;
  typedef _CDA(new_a_dim() * Rankine) double Fahrenheit;
  #define MULTIPLIER ((_CDA(Rankine / Kelvin) double)1.8)
  #define ADDER ((_CDA(Fahrenheit-Centigrade*Rankine/Kelvin) double)32)
  Fahrenheit ctof(Centigrade c) { return (c * MULTIPLIER) + ADDER; }
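Since the annotations live inside a macro that plain cc never sees,
example 2 can be compiled and run today just by defining the macro
away, and the arithmetic comes out right:

```c
#define _CDA(ignore)   /* plain cc: erase the annotations entirely */

typedef _CDA(new_m_dim()) double Kelvin;
typedef _CDA(new_m_dim()) double Rankine;
typedef _CDA(new_a_dim() * Kelvin) double Centigrade;
typedef _CDA(new_a_dim() * Rankine) double Fahrenheit;
#define MULTIPLIER ((_CDA(Rankine / Kelvin) double)1.8)
#define ADDER ((_CDA(Fahrenheit-Centigrade*Rankine/Kelvin) double)32)
Fahrenheit ctof(Centigrade c) { return (c * MULTIPLIER) + ADDER; }
```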

Example 3: Make an array to be subscripted by enum.
  typedef _CDA(new_a_dim()) enum { RED, BLUE, GREEN } color;
  #define NCOLORS 3
  typedef long  array_by_int[NCOLORS];
  typedef _CDA(-color+) array_by_int array_by_color;
  typedef long *pointer_into_array_by_int;
  typedef _CDA(-color+) pointer_into_array_by_int pointer_into_array_by_color;
  array_by_color a;
  pointer_into_array_by_color p;
  long *q;
  p = &a[0];      /* ok; same as p = a */
  q = &a[RED];    /* ok; same value as p, but not identical */
  p[BLUE] = *q++; /* ok; p may be subscripted by a color, q by a scalar */
  p += BLUE-RED;  /* ok; BLUE-RED is a scalar int */
  *p = 0L;        /* illegal: p does not have pointer dimensions */
  q[GREEN] = 0L;  /* illegal: q+GREEN does not have pointer dimensions */
  a[0] = 0L;      /* illegal: a+0 does not have pointer dimensions */
  if (q-p == RED) /* ok */


That's as far as I've gotten...

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
Reread the disclaimers before you followup.

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/26/88)

In article <10006@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>Let's enhance the syntax of declarations to include a dimension-specifier...

Unfortunately this is too simplistic.  For example, there are several
ways to combine the Cartesian-projected elements of a tensor to produce
a quantity having different dimension, but the resulting dimension is
not ascertainable from a syntactic analysis of the combination.  Also,
units that people usually consider different turn out to be the same
when one has a sufficiently powerful physical theory.

Thus it is better to provide definitional facilities along the lines
of classes, whereby one can specify the allowed operations and the
properties of the results.  In general structured types are necessary
to encode all the possibly relevant dimensional attributes of a quantity.

karl@haddock.ima.isc.com (Karl Heuer) (10/29/88)

In article <8755@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>In article <10006@haddock.ima.isc.com> karl@haddock (Karl Heuer) writes:
>>Let's enhance the syntax of declarations to include a dimension-specifier...
>
>Unfortunately this is too simplistic.  For example, there are several
>ways to combine the Cartesian-projected elements of a tensor to produce
>a quantity having different dimension, but the resulting dimension is
>not ascertainable from a syntactic analysis of the combination.  Also,
>units that people usually consider different turn out to be the same
>when one has a sufficiently powerful physical theory.

I think my proposal could handle the latter situation; if you want to freely
mix (say) distance and time, you can declare them to be the same unit, while
if you prefer to treat them as independent, you can explicitly multiply by
speed_of_light (which would have value 1, so the compiler can optimize it out,
but dimensions of velocity, so CDA would sanction it).
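In the notation of my earlier post, that constant might be written as
below; meter and second are invented for the example, and the _CDA
macro again expands to nothing as far as cc is concerned:

```c
#define _CDA(ignore)   /* invisible to cc, meaningful to the CDA checker */

typedef _CDA(new_m_dim()) double meter;
typedef _CDA(new_m_dim()) double second;

/* Value 1, so cc can fold it away; dimension meter/second, so CDA
 * refuses to let a bare time masquerade as a distance. */
#define SPEED_OF_LIGHT ((_CDA(meter / second) double)1.0)

meter light_travel(second t) { return SPEED_OF_LIGHT * t; }
```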

If the tensor problem can't be resolved at compile-time, then of course CDA
can't do anything about it.  (But it can still pass it through with no
warnings; CDA includes C as a trivial case.)  I could live with that; it's no
worse than what we have now.  Similarly, I described the language as being
able to handle dimensions that were a composite of additive and multiplicative
units, but aside from the temperature-conversion example I can't think of any
applications for it, so I'd probably drop that feature (provided it's still
possible to get array-by-enum working).

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint