[comp.databases] Self-describing host-independent data structures

william@kaula.keck.hawaii.edu (William Lupton) (06/18/91)

I am posting this to "comp.databases" because at first sight it appears to be
the most appropriate group. However, I have not subscribed to it in the past
(and am not a database expert) so if it is the wrong group, please let me know,
and suggest better groups.

I am involved in a project to produce a data acquisition software environment
for use at various telescopes around the world (please note that this project
is NOT associated with the Keck telescopes, for which I currently work). This
environment supports multiple processors of differing types and a central
aspect of it is a message system which allows a process on one machine to send
a message to a process on the same or another machine.

We have decided that we want the messages to be self-describing rather than to
require both ends of the communications link to have to "know" the structure of
each message type (the major motivation behind this is to reduce coupling
between system components, which may be being produced by separate groups).
Accordingly, we propose a "low-level data system" which will be able to take
structured data and encode it as a byte stream which will include all necessary
information about the structure of the data, the names and types of its
components, and details of necessary network type conversions (e.g., byte order
and floating point representation). The actual message will contain this byte
stream and the recipient will use low-level data system routines to find out
what is in it and to retrieve its contents.
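Here is a minimal Python sketch of such a byte stream. The wire format is entirely hypothetical. It is not the proposed system's format or any standard. Every field carries its name, a one-byte type code, and an element count. All numbers are written big-endian, with floating point as IEEE 754 doubles, so byte-order and float conversion happen once on the sending side. The recipient's decode_record() needs no prior knowledge of the message layout.

```python
import struct

# Hypothetical one-byte type codes for this sketch; not taken from any standard.
INT32, FLOAT64, STRING = 1, 2, 3

def encode_record(record):
    """Encode a dict of name -> value (int, float, str, or list of numbers)
    into bytes that carry each field's name, type code, and element count.
    All numbers are big-endian ("network order"); floats are IEEE 754."""
    out = struct.pack(">H", len(record))                 # number of fields
    for name, value in record.items():
        name_bytes = name.encode("ascii")
        out += struct.pack(">B", len(name_bytes)) + name_bytes
        if isinstance(value, str):
            data = value.encode("ascii")
            out += struct.pack(">BI", STRING, len(data)) + data
            continue
        items = value if isinstance(value, list) else [value]
        if all(isinstance(v, int) for v in items):
            out += struct.pack(">BI", INT32, len(items))
            out += struct.pack(">%di" % len(items), *items)
        else:
            out += struct.pack(">BI", FLOAT64, len(items))
            out += struct.pack(">%dd" % len(items), *items)
    return out

def decode_record(buf):
    """Rebuild the dict using only the descriptions carried in buf."""
    (nfields,) = struct.unpack_from(">H", buf, 0)
    pos, record = 2, {}
    for _ in range(nfields):
        (nlen,) = struct.unpack_from(">B", buf, pos)
        pos += 1
        name = buf[pos:pos + nlen].decode("ascii")
        pos += nlen
        typ, count = struct.unpack_from(">BI", buf, pos)
        pos += 5
        if typ == STRING:
            record[name] = buf[pos:pos + count].decode("ascii")
            pos += count
            continue
        if typ == INT32:
            code, size = "i", 4
        elif typ == FLOAT64:
            code, size = "d", 8
        else:
            raise ValueError("unknown type code %d" % typ)
        values = list(struct.unpack_from(">%d%s" % (count, code), buf, pos))
        pos += size * count
        # Simplification: a one-element array is returned as a scalar.
        record[name] = values[0] if count == 1 else values
    return record
```

For example, decode_record(encode_record({"exposure": 30.5, "filter": "R", "counts": [1, -2, 3]})) gives back the same dictionary. One simplification to note is that a one-element list comes back as a scalar, because the sketch does not record that distinction.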

So to the question: surely lots of people have already solved this problem? If
you have done so or can provide a pointer to further information, please mail
me and let me know. I will summarize findings and post them. Note that
something like the Sun XDR (eXternal Data Representation) approach used in RPCs
(Remote Procedure Calls) is NOT what we want, since XDR data carries no
description of itself and so does not satisfy our "self-describing" requirement.

		William Lupton (wlupton@keck.hawaii.edu)

cortesi@informix.com (David Cortesi) (06/18/91)

In article <13503@uhccux.uhcc.Hawaii.Edu> william@kaula.keck.hawaii.edu writes:
>I am involved in a project to produce a data acquisition software environment
>for use [with] ... multiple processors of differing types and a central
>aspect of it is a message system which allows a process on one machine to send
>a message to a process on the same or another machine.
>
>We have decided that we want the messages to be self-describing...
>... (the major motivation behind this is to reduce coupling
>between system components, which may be being produced by separate groups).

>So to the question: surely lots of people have already solved this problem?

This exact problem has been faced by the ISO in defining the telecomm
standard for Open Systems Interconnection, ISO-OSI.  The specific standards
you want to read are ISO-8824, Abstract Syntax Notation 1 (ASN.1) which
covers an abstract notation for data, and ISO-8825, Specification of
Basic Encoding Rules for ASN.1, which covers how the abstract data is
to be represented in a byte stream.
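ISO-8825 (BER) writes every value as a tag, a length, and the contents (TLV). Constructed types such as SEQUENCE simply nest further TLVs inside their contents. The Python sketch below is an illustration, not a conforming BER codec. It handles only three universal types (INTEGER, tag 0x02; OCTET STRING, 0x04; SEQUENCE, 0x30), single-byte tags, and definite lengths. BER carries types and nesting but not field names. Those live in the ASN.1 module, so a receiver can walk an unfamiliar message but needs the module to know what each element means.

```python
def encode_length(n):
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def tlv(tag, content):
    return bytes([tag]) + encode_length(len(content)) + content

def encode_integer(n):
    # Minimal two's-complement contents, as BER requires for INTEGER.
    size = ((n if n >= 0 else ~n).bit_length() + 8) // 8
    return tlv(0x02, n.to_bytes(size, "big", signed=True))

def encode_octet_string(b):
    return tlv(0x04, bytes(b))

def encode_sequence(*members):
    # SEQUENCE is constructed: its contents are just the members' TLVs.
    return tlv(0x30, b"".join(members))

def decode_tlv(buf, pos=0):
    """Decode one TLV starting at pos; return (value, next_pos)."""
    tag, first = buf[pos], buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    elif first == 0x80:
        raise ValueError("indefinite length not supported in this sketch")
    else:
        nbytes = first & 0x7F
        length = int.from_bytes(buf[pos:pos + nbytes], "big")
        pos += nbytes
    end = pos + length
    if tag == 0x02:
        return int.from_bytes(buf[pos:end], "big", signed=True), end
    if tag == 0x04:
        return bytes(buf[pos:end]), end
    if tag == 0x30:
        items = []
        while pos < end:
            item, pos = decode_tlv(buf, pos)
            items.append(item)
        return items, end
    raise ValueError("tag 0x%02x not handled in this sketch" % tag)
```

For instance, encode_sequence(encode_integer(42), encode_octet_string(b"abc")) yields the ten bytes 30 08 02 01 2a 04 03 61 62 63, and decode_tlv() turns them back into [42, b"abc"].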

cant@mrmarx.UUCP (Jim Cant) (06/19/91)

>We have decided that we want the messages to be self-describing rather than to
>require both ends of the communications link to have to "know" the structure of
>each message type (the major motivation behind this is to reduce coupling
>between system components, which may be being produced by separate groups).
>Accordingly, we propose a "low-level data system" which will be able to take
>structured data and encode it as a byte stream which will include all necessary
>information about the structure of the data, the names and types of its
>		William Lupton (wlupton@keck.hawaii.edu)

One type of 'self-describing' data that seems to be pretty flexible is the
TIFF (Tagged Image File Format) used by graphics types to move bitmapped
images around.  It appears you could put most anything in it and don't
need much smarts to read it.  The format is basically a header, then a
table of contents of tags (the Image File Directory); each entry gives the
tag number, a data type, a count, and the data itself or a pointer to it.
You can define your own tags to handle your own
(derived) data types (like structs);  of course, the reader will have to
know about the derived data type (unless you pass the definition in another
tag??).
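Here is a Python sketch of the tag directory described above. It assumes big-endian ("MM") byte order and supports only three of the TIFF 6.0 field types: ASCII = 2, SHORT = 3, and LONG = 4. Each 12-byte directory entry holds the tag number, type, and count. The last four bytes hold either the value itself, if it fits, or an offset to it. The tag numbers 65000 to 65002 in the test are hypothetical private tags chosen for illustration. A reader can recover typed values for tags it has never seen, but, as noted above, not what they mean.

```python
import struct

TYPE_SIZES = {2: 1, 3: 2, 4: 4}   # ASCII, SHORT, LONG (bytes per element)

def build_tiff(entries):
    """entries: list of (tag, type, values); values is bytes for ASCII,
    a list of ints for SHORT/LONG. Writes big-endian ("MM") byte order."""
    entries = sorted(entries, key=lambda e: e[0])   # TIFF wants ascending tags
    ifd_offset = 8
    data_offset = ifd_offset + 2 + 12 * len(entries) + 4
    ifd, extra = struct.pack(">H", len(entries)), b""
    for tag, typ, values in entries:
        if typ == 2:
            payload = values + b"\x00"              # ASCII count includes NUL
            count = len(payload)
        else:
            code = "H" if typ == 3 else "I"
            payload = struct.pack(">" + code * len(values), *values)
            count = len(values)
        if len(payload) <= 4:
            value_field = payload.ljust(4, b"\x00")  # small values stored inline
        else:
            value_field = struct.pack(">I", data_offset + len(extra))
            extra += payload
            if len(extra) % 2:
                extra += b"\x00"                     # keep offsets word-aligned
        ifd += struct.pack(">HHI", tag, typ, count) + value_field
    ifd += struct.pack(">I", 0)                      # no further directories
    return b"MM" + struct.pack(">HI", 42, ifd_offset) + ifd + extra

def read_tiff(buf):
    """Return {tag: value} for the first directory, skipping unknown types."""
    e = {b"MM": ">", b"II": "<"}.get(bytes(buf[:2]))
    if e is None:
        raise ValueError("not a TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(e + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (n,) = struct.unpack(e + "H", buf[ifd_offset:ifd_offset + 2])
    fields = {}
    for i in range(n):
        pos = ifd_offset + 2 + 12 * i
        tag, typ, count = struct.unpack(e + "HHI", buf[pos:pos + 8])
        if typ not in TYPE_SIZES:
            continue
        size = TYPE_SIZES[typ] * count
        if size <= 4:
            raw = buf[pos + 8:pos + 8 + size]
        else:
            (off,) = struct.unpack(e + "I", buf[pos + 8:pos + 12])
            raw = buf[off:off + size]
        if typ == 2:
            fields[tag] = raw.rstrip(b"\x00").decode("ascii")
        else:
            code = "H" if typ == 3 else "I"
            fields[tag] = list(struct.unpack(e + code * count, raw))
    return fields
```

Skipping fields of an unrecognized type follows the TIFF convention that lets older readers tolerate newer files.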

Might TIFF be readily adaptable to wlupton's problem?

Jim Cant, cant@mrmarx.msc.com   I don't care what the company thinks
Mainstream Software Corp.	about what I think or say nearly as much
411 Waverly Oaks Road		I care about how my boy responds to same.
Waltham MA 02154, 617-894-3399