[comp.lang.c] Representation of unions on capability machines

barmar@think.COM (Barry Margolin) (09/16/89)

Jeff Rosenfeld's (jdr+@andrew.cmi.edu) posting in the "effect of
free()" chain got me to thinking about how unions containing pointers
and non-pointers would be implemented on a particular class of
machine.  The class of machines I'm wondering about are capability
machines.  On many of these machines, pointers are actually
capabilities; the access rights that a process has to some data are
encoded in the capability word.  A consequence of this is that
capabilities may only be created by secure code, although they
generally can be copied by user code; if user-mode code could create a
capability, it could fill in whatever address and access rights it
wanted and thus bypass the system security mechanism.  The hardware
implementation of this is usually a tagged memory, where the tag
indicates whether the word contains a capability or data; data and
capabilities may only be read into data and address registers,
respectively, while storing a register sets the tag from the type of
register.  It might also be implemented by marking pages or segments
as being able to hold data or capabilities, and then the register type
must agree with the page or segment type.
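
[A toy C model of the tagged-memory scheme described above may help make
it concrete.  This is only a sketch: the names tagged_word, store_data,
store_cap, load_data and load_cap are invented for illustration, the
"registers" are ordinary variables, and a failed load returns false where
real hardware would presumably trap.  It does not describe any
particular machine.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: every memory word carries a tag set by the last store. */
enum tag { TAG_DATA, TAG_CAP };

struct tagged_word {
    enum tag tag;
    unsigned long bits;
};

/* Storing from a data register always tags the word as data. */
static void store_data(struct tagged_word *w, unsigned long value)
{
    w->tag = TAG_DATA;
    w->bits = value;
}

/* Storing from an address register tags the word as a capability.
   In this model user code would only ever pass bits it obtained from
   load_cap (a copy); only trusted code would pass freshly built bits. */
static void store_cap(struct tagged_word *w, unsigned long cap_bits)
{
    w->tag = TAG_CAP;
    w->bits = cap_bits;
}

/* A load succeeds only when the word's tag matches the register class. */
static bool load_data(const struct tagged_word *w, unsigned long *data_reg)
{
    if (w->tag != TAG_DATA)
        return false;           /* assumed to be a trap on real hardware */
    *data_reg = w->bits;
    return true;
}

static bool load_cap(const struct tagged_word *w, unsigned long *addr_reg)
{
    if (w->tag != TAG_CAP)
        return false;           /* data cannot be turned into a capability */
    *addr_reg = w->bits;
    return true;
}
```

[In this model a word written with store_data can never be read back with
load_cap, which is exactly what makes overlapping a pointer and an integer
in one word awkward.]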

Without getting into debates over the reasonability of capability
architectures, I'd like to ask what the appropriate way to implement a
union like

union pi {
    char *ptr;
    unsigned long num;
} x;

is on a machine that has separate data and capability pages.  Does C
require that x.ptr and x.num actually occupy the same storage?  It
doesn't appear to require that assigning to one member actually affect
the other member, because it is invalid to reference the other member
after such an assignment (so how could you tell if it were affected?).
Does C require that &x.ptr == (char **)&x.num?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/17/89)

In article <29561@news.Think.COM> barmar@think.UUCP (Barry Margolin) writes:
-... I'd like to ask what the appropriate way to implement a union like
-union pi {
-    char *ptr;
-    unsigned long num;
-} x;
-is on a machine that has separate data and capability pages.

-Does C require that x.ptr and x.num actually occupy the same storage?

No.

-It doesn't appear to require that assigning to one member actually affect
-the other member, because it is invalid to reference the other member
-after such an assignment (so how could you tell if it were affected?).

Right.

-Does C require that &x.ptr == (char **)&x.num?

Yes.  However, note that the constraint ("a pointer to a union object,
suitably converted, points to each of its members") can be met by the
implementation in numerous ways, since the conversion operation provides
an opportunity for the implementation to recognize that unions need
special handling.
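
[On a conventional implementation the quoted constraint can be checked
directly.  The sketch below assumes nothing beyond the union from the
original question; the helper name members_share_address is made up for
illustration.  The casts to void * are the "suitable conversion"; on an
ordinary flat-memory machine they compile to nothing, whereas on a
capability machine they are the point at which an implementation could do
real work.]

```c
#include <assert.h>

union pi {
    char *ptr;
    unsigned long num;
};

/* Returns 1 when a pointer to the union, converted to void *, compares
   equal to the (converted) address of each of its members. */
static int members_share_address(union pi *u)
{
    return (void *)u == (void *)&u->ptr
        && (void *)u == (void *)&u->num;
}
```

[A call such as members_share_address(&x) should yield 1 on any conforming
implementation, whatever the underlying storage layout turns out to be.]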

dricejb@drilex.UUCP (Craig Jackson drilex1) (09/25/89)

In article <29561@news.Think.COM> barmar@think.UUCP (Barry Margolin) writes:
>Jeff Rosenfeld's (jdr+@andrew.cmi.edu) posting in the "effect of
>free()" chain got me to thinking about how unions containing pointers
>and non-pointers would be implemented on a particular class of
>machine.  The class of machines I'm wondering about are capability
>machines.  On many of these machines, pointers are actually
>capabilities; the access rights that a process has to some data are
>encoded in the capability word.  A consequence of this is that
>capabilities may only be created by secure code, although they
>generally can be copied by user code; if user-mode code could create a
>capability, it could fill in whatever address and access rights it
>wanted and thus bypass the system security mechanism.  ...

[More information about capability machines.]

>Without getting into debates over the reasonability of capability
>architectures, I'd like to ask what the appropriate way to implement a
>union like
>
>union pi {
>    char *ptr;
>    unsigned long num;
>} x;
>
>is on a machine that has separate data and capability pages.  Does C
>require that x.ptr and x.num actually occupy the same storage?  It
>doesn't appear to require that assigning to one member actually affect
>the other member, because it is invalid to reference the other member
>after such an assignment (so how could you tell if it were affected?).
>Does C require that &x.ptr == (char **)&x.num?
>
>Barry Margolin
>Thinking Machines Corp.
>barmar@think.com

Although they are not particularly designed as such, the Unisys A-Series
machines (descendants of the B6700) meet the description above.  That is,
security information is coded into the hardware pointer type, which can
only be constructed by secure hardware/software (the operating system).

The usage shown may not be Standard C (I don't have a draft copy nearby to
check) but it is certainly conventional C.  Therefore, I believe that
the only way that "pointers" on such machines can be implemented securely
is through simulation.  That is, C pointers must be implemented as integer
subscripts into a large array which is defined through the standard
capability mechanism.  Due to the fungibility of C pointers, and conventional
C coding practices, it will probably work best if all the pointers point
into the same large array, although the Standard specifically doesn't
require this (I believe).
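
[The "integer subscripts into a large array" idea can be sketched in
portable C.  This is a simulation under stated assumptions, not the
A-Series compiler's actual scheme: the names heap, sim_ptr, sim_alloc,
sim_load and sim_store are hypothetical, the array is an ordinary static
array standing in for one reached through a single capability, and the
allocator is a trivial bump allocator with no free.]

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_WORDS 1024

/* One large array; on the real machine it would be reached through a
   single capability.  Here it is just a static array. */
static unsigned long heap[HEAP_WORDS];
static size_t heap_top = 1;         /* index 0 is reserved as the null pointer */

typedef size_t sim_ptr;             /* a simulated C pointer is only a subscript */
#define SIM_NULL ((sim_ptr)0)

/* Bump allocator: returns SIM_NULL when the request cannot be satisfied. */
static sim_ptr sim_alloc(size_t nwords)
{
    sim_ptr p;

    if (nwords == 0 || nwords > HEAP_WORDS - heap_top)
        return SIM_NULL;
    p = heap_top;
    heap_top += nwords;
    return p;
}

/* Every dereference is a bounds-checked array access. */
static int sim_valid(sim_ptr p)
{
    return p != SIM_NULL && p < HEAP_WORDS;
}

static int sim_store(sim_ptr p, unsigned long v)
{
    if (!sim_valid(p))
        return 0;
    heap[p] = v;
    return 1;
}

static int sim_load(sim_ptr p, unsigned long *out)
{
    if (!sim_valid(p))
        return 0;
    *out = heap[p];
    return 1;
}

/* With pointers reduced to plain integers, both members of the union
   from the original question are data and can overlap safely. */
union sim_pi {
    sim_ptr ptr;
    unsigned long num;
};
```

[Because a sim_ptr is ordinary data, copying it, storing it in a union or
doing arithmetic on it never touches a hardware capability; the only
capability involved is the one for the array itself.]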

In any case, the new A-Series C compiler does implement pointers this way,
somewhat to the detriment of performance.  Not only are all malloc'ed items
placed in such a heap array, but all arrays which are pointed to, plus any
other automatics (including parameters) which are pointed to, are placed
in the heap array.  This requires management of a software-controlled stack
area in the heap array, in conjunction with the regular hardware-managed stack.
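
[A rough illustration of a software-controlled stack living in the heap
array follows.  Everything here is an assumption made for the sketch: the
names soft_push, soft_pop, set_to_seven and caller are invented, the
software stack simply grows down from the top of the array, and the
"hardware stack" is represented by ordinary C locals.  How the actual
compiler lays this out is not stated above.]

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_WORDS 256

static unsigned long heap[HEAP_WORDS];
static size_t soft_sp = HEAP_WORDS;   /* software stack grows downward */

typedef size_t sim_ptr;               /* simulated pointer: a subscript into heap */

/* Reserve nwords on the software stack.  Returns 0 on overflow; a valid
   frame never starts at index 0 because of the >= test. */
static sim_ptr soft_push(size_t nwords)
{
    if (nwords == 0 || nwords >= soft_sp)
        return 0;
    soft_sp -= nwords;
    return soft_sp;
}

/* Release nwords previously reserved with soft_push. */
static void soft_pop(size_t nwords)
{
    if (nwords <= HEAP_WORDS - soft_sp)
        soft_sp += nwords;
}

/* The callee writes through a simulated pointer. */
static void set_to_seven(sim_ptr out)
{
    heap[out] = 7;
}

/* 'n' has its address passed to another function, so it must live in the
   heap array.  'twice' is never pointed to and stays an ordinary local,
   which saves the extra level of indirection. */
static unsigned long caller(void)
{
    sim_ptr n = soft_push(1);
    unsigned long twice;

    if (n == 0)
        return 0;                     /* software stack exhausted */
    set_to_seven(n);
    twice = 2 * heap[n];
    soft_pop(1);
    return twice;
}
```

[The cost is visible even in this toy: every function with an
address-taken local pays for a push and a pop on the software stack in
addition to its normal call overhead.]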

One consequence of this is that arrays that are never pointed to are more
efficient, because they save one level of indirection.  Admittedly, such arrays
are rare in C, but I'm glad Unisys went to the trouble of implementing
this optimization.

I hope to write more about the A-Series C compiler in the future.  It is
not based on any existing C implementations, and was written from scratch
as an ANSI C compiler for an unusual architecture.  As such, it has many
constructive lessons for those who believe that "All-the-world's-a-Vax/Sun".
-- 
Craig Jackson
dricejb@drilex.dri.mgh.com
{bbn,ll-xn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}