[comp.software-eng] Writing upgradable data structures

daveb@geac.UUCP (David Collier-Brown) (02/21/88)

In article <2079@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>Sooner than you can say "UNIX is a Trademark of ...", 47000 AD will be
>here.  The greatest mistake a designer can make is to assume that a
>certain date and time will never come.  Such short-sightedness has
>caused problems over and over again, yet we see it again and again.
>
>The right way to do it is as follows:
[ example of linked list elided]

  The idea of a infinitely-extendable date is one solution, and is
particularly applicable to things like dates, which should be
monotonically increasing. (But a library system would *freak* at
the late starting date).

  A more general solution predates Unix, and has therefore been
forgotten by many members of the Unix community: version-numbered
structures.

  Let us consider a date structure as follows:
	struct date_t {
		int	v;
		unsigned d;
	} foo = { 0, 32766};
  This describes a range of dates and times, encoded into a single
unsigned value (which should be insufficient for anything but the
execution of a single filter). Let us assume that this is the date
format of our machine.
  As soon as 47000 AD comes around (or whatever date breaks the
date/time system), the library containing the date routines is
replaced [note 1] with one which checks the version number, "v".  If
it is non-zero, the structure is trashed [note 2] and replaced by
one with the following form:
	struct date_t {
		int	v;
		unsigned long d;
	} foo = { 1, 32766};
  The new code changes the structure to type "1", and then
continues. This can happen over and over again, and allow the data
structure to grow or shrink, as necessary.

  The disadvantages are as follows:

	1) You have to make version-number checks in your library
	   routines, and fault to updating code if they fail
	2) You can't have "structure constants", because they would
	   be the wrong size (you have to stick with c initializers)
	3) You can't take sizeof(struct time_t) unless you build the
	   version stuff into the compiler (vstruct time_t, for
	   example). The size changes on you, so you have to keep a
	   dope-vector.
	4) You CAN'T use a severely type-checked language like
	   Pascal because it knows that you don't want variable
	   sized structures. (Its hard in Ada[note 3] too. Source:
	   Paul Stachour).

  This technique is used in the ARPAnet and on Multics.  Ritchie and
Thompson didn't include it in Unix and C, probably for the good
reason that it approached overkill. (:-))
  Some days, though, you have to kill something repeatedly.  Dates
are a good example.

--dave c-b
[note 1] Dynamic linking is needed: I assume we'll have it by then:
	I had it in the late '70s.
[note 2] Probably by being realloc'd. Malloc usually has a
	dope-vector available for this.
[note 3] Ada is a trademark of the Gods of War (Ada Joint Program
	Office).
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

sommar@enea.se (Erland Sommarskog) (02/26/88)

David Collier-Brown (daveb@geac.UUCP) presents a data structure for date
handling which gets lager and larger as time goes by and then continues
with some disadvantages with the method:
>	4) You CAN'T use a severely type-checked language like
>	   Pascal because it knows that you don't want variable
>	   sized structures. (Its hard in Ada[note 3] too. Source:
>	   Paul Stachour).

Doesn't seem very diffcult doing it in Ada. What about:
    Type Date_type is array(integer range <>) of integer;
    Type Date_pointer is access Date_type;
To begin with you allocate a date with:
    Date_p := new Date_type'(1);
When time is come you get a new date with:
    Date_p := new Date_type'(2);
and so on. I have not included a version field here, since I think we
can trust the attribute 'SIZE of Date_p.all.

Note however that this method, as well in C as in Ada, will eventually
break down. Sooner or later theree will not be memory to store the dates.
And if we assume an infinite memory, there is still a limitation of the
number of elements in the array, at least in Ada. Could this be circum-
vented in C?
-- 
Erland Sommarskog       
ENEA Data, Stockholm        
sommar@enea.UUCP           "Souvent pour s'amuser les hommes d'equipages
                            and it's like talking to a stranger" -- H&C.

daveb@geac.UUCP (David Collier-Brown) (03/08/88)

Date: Fri, 4 Mar 88 23:33 EST
From: Barry Margolin <rutgers!think!barmar>
Subject: Re: Writing upgradable data structures
To: David Collier-Brown <uiucdcs!uiucuxc!unicus!geac!daveb>
In-Reply-To: <8803011228.AA18007@geac.UUCP>
Message-Id: <19880305043329.6.BARMAR@OCCAM.THINK.COM>

 While I was in the Multics group at Honeywell we did some
 investigation of implementing a Multics followon in Ada.  I thought
 the same thing regarding versioned structures, but I was convinced by
 some more experienced Ada people that we could do this using variant
 records.  When you upgrade the structure you add a new variant.  This
 does require recompiling users of the structure, but it doesn't force
 old, stable programs to be recoded unless they want to take advantage
 of the new structure's features.
 

>from dave:
>      Would you consider posting a variant of this? The requirement that
>    one update is an important one, and should be faced by the Ada
>    community.

Here's an example, but the idea is pretty simple.  If you want to post
it to comp.software-eng (I don't think it is appropriate for
comp.lang.c, and I don't subscribe to comp.software-eng), feel free.

type THING is
     record
       VERSION: constant (THING_VERSION_1, THING_VERSION_2, THING_VERSION_3, ...);
       -- first, the components common to all versions
       PART1: <type>;
       PART2: <type>;
       case VERSION of
	    -- No THING_VERSION_1 case, because it only has the common parts
	    when THING_VERSION_2 => PART3: <type>;
				    PART4: <type>;
	    when THING_VERSION_3 => PART3: <type>;
				    PART4: <type>;
				    PART5: <type>;
				    PART6: <type>;
	    when ...
       end case;
     end record;

Unfortunately, Ada requires you to repeat the common prefixes in each
version.  One thing I thought of was to use

type THING is
     record
       VERSION: constant (THING_VERSION_1, THING_VERSION_2, THING_VERSION_3, ...);
       -- first, the components common to all versions
       PART1: <type>;
       PART2: <type>;
       case VERSION of
	    -- No THING_VERSION_1 case, because it only has the common parts
	    when THING_VERSION_2 => PART3: <type>;
				    PART4: <type>;
	    when THING_VERSION_3 => COMMON: THING (VERSION => THING_VERSION_2);
				    PART5: <type>;
				    PART6: <type>;
	    ...
	    when THING_VERSION_n => COMMON: THING (VERSION => THING_VERSION_n-1);
				    PARTm-1: <type>;
				    PARTm: <type>;
       end case;
     end record;

First of all, I don't know whether Ada allows recursive type
declarations like this; if it doesn't, the above becomes:

type THING_V1 is
     record
       PART1: <type>;
       PART2: <type>;
     end record;

type THING_V2 is
     record
       COMMON: THING_V1;
       PART3: <type>;
       PART4: <type>;
     end record;

...

type THING_Vn is
     record
       COMMON: THING_Vn-1;
       PARTm-1: <type>;
       PARTm: <type>;

type THING is
     record
       VERSION: constant (THING_VERSION_1, THING_VERSION_2, THING_VERSION_3, ...);
       case VERSION of
	    when THING_VERSION_1 => CONTENTS: THING_V1;
	    when THING_VERSION_2 => CONTENTS: THING_V2;
	    ...
	    when THING_VERSION_n => CONTENTS: THING_Vn;
       end case;
     end record;

A problem with this, though, is that I think Ada requires the program to
specify all levels of a structure in a reference to a component, so a
program that wants to access PART1 of a THING must use X.CONTENTS.PART1
if it is a version 1 THING, but X.CONTENTS.CONTENTS.CONTENTS.PART1 if it
is a version 3 THING.  PL/I permits the program to leave out structure
qualifiers if the reference is unambiguous.

I don't subscribe to comp.software-eng, so if anyone has any comments on
this, reply via mail.

--
Barry Margolin
Thinking Machines Corp.

barmar@think.com
uunet!think!barmar

daveb@geac.UUCP (David Collier-Brown) (03/10/88)

In article <2417@geac.UUCP> Barry Margolin <rutgers!think!barmar>
writes:
> While I was in the Multics group at Honeywell we did some
> investigation of implementing a Multics followon in Ada.  I thought
> the same thing regarding versioned structures, but I was convinced by
> some more experienced Ada people that we could do this using variant
> records.  When you upgrade the structure you add a new variant.  This
> does require recompiling users of the structure, but it doesn't force
> old, stable programs to be recoded unless they want to take advantage
> of the new structure's features.

   The thing that is hard in Ada[tm] was dealing with a structure
which was *later* than the one you expected. Normal Ada
scope/recompilation rules keep versioned structures in a linked
executable from getting out of date with respect to themselves (with
due care, of course), but don't deal well with a structure which is
sent to it from something external.
   Since this technique came from the ARPAnaut world, you might
guess that the data structures can be packets...

   My work-around is to make sure I pass pointers to things, and not
depend on the `size attribute (which is not exactly what I'd call
good practice... I wonder if its even ethical?).

 --dave c-b
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.