[comp.lang.misc] What assertions are needed to optimize high-level data structures?

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/03/91)

C supports several elementary data structuring mechanisms: pointers,
structs, unions, arrays. It's possible to implement any of the usual
higher-level mechanisms---lists, generalized Lists, sets, relations,
whatever---in terms of the lower-level mechanisms. But you shouldn't
forget that you started with the higher-level mechanism, because you
can often optimize computation or register use more effectively with
that knowledge than if you simply regarded the program as a big pile
of pointer, struct, and array manipulations.

So: What kinds of assertions should the compiler understand in order
to effectively optimize higher-level data structures expressed in C?
And what particular assertions are useful for specific operations on 
specific data structures?

For example, in doubly-linked list insertion it's useful to tell the
compiler that a node is not the same as either of its neighbors. For
this it's enough to have invariants like ``p != p->next''.

Note that I am asking these questions as a language designer: I want
constructive suggestions that I can use in Q, rather than complaints
about how difficult it is to handle assertions like this in C or Ada
or whatever language.

I'm a little worried about bringing this issue up after the last set
of pointer-versus-array flame wars, but I figured there was at least
a hope of some useful technical discussion that time, and it's worth
trying again. I hope that we can avoid Jim Giles's confusion between
syntax and semantics by asking purely semantic questions like ``What
assertions help optimization?''

One last meta-comment: Jim Giles implied in the last discussion that
a language that ``understands'' some high-level data structures will
be more efficient than C plus assertions. I find this position quite
silly, as all optimizations must eventually be expressed in terms of
machine language, which has only pointers and numeric types. I can't
believe that there are any optimizations for machine-language target
code which can't be expressed just as effectively as assertions in a
lower-level language. But I also realize that arguments about issues
like this are pointless. If I'm right, then I don't have to prove it
with some theoretical argument; all that matters is that every piece
of code Jim writes in his high-level language, the rest of the world
can write just as efficiently in an easy-to-use, low-level language.
And that's why I'm restarting this discussion.

---Dan