[comp.lang.c] Variable-length messages.

ljz%fxgrp.fx.com@ames.arc.nasa.gov (Lloyd Zusman) (10/21/88)

We have a difference of opinion at our site as to what is the most
desirable way to handle variable-length messages in C.  This is
basically a difference in philosophies.  I'm showing you folks on the
net the two opposing approaches in question here, and I would like to
find out which of these each of you prefers and why.

First of all, the problem:

I am writing a library of routines to handle message passing.  These
messages consist of, among other things, a message type, a length, and
a message body.  The message body can be of any length, depending on
the message type.  The users of this library can define their own
message types, and hence the message-passing routines cannot determine
the message length from the context of the message.

One group of us here says it's OK to handle this case as follows:

	struct message {
		int msgType;		/* msg type code */
		int msgLength;		/* length of msg body */
		char msgBody[1];	/* variable length msg body */
	};


	/* routine for sending a message */
	int
	sendMessage(fd, msg)
	int fd;				/* file descriptor of message socket */
	struct message *msg;		/* pointer to message */
	{
		/*
		 * Send the message pointed to by 'msg' through the socket
		 * associated with file descriptor 'fd'.  The length is
		 * contained in the message structure.
		 */
	}

	/* routine for receiving a message */
	int
	receiveMessage(fd, msg, length)
	int fd;				/* file descriptor of message socket */
	struct message *msg;		/* pointer to message */
	int length;			/* length of message buffer */
	{
		/*
		 * Receive a message into the message buffer pointed to by
		 * 'msg'.  The 'length' parameter is the size of this buffer
		 * and it must be large enough to hold the largest message that
		 * might be received.  The message comes through the socket
		 * associated with the file descriptor 'fd'.
		 */
	}
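
A minimal sketch of how a caller might build a message under this first approach. The helper names messageSize and makeMessage are hypothetical and not part of the library described above. The size is computed with offsetof from <stddef.h> so that any padding in front of msgBody is counted, and the body is copied through a char pointer to the start of the allocation rather than through the one-element array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct message {
	int msgType;		/* msg type code */
	int msgLength;		/* length of msg body */
	char msgBody[1];	/* variable length msg body */
};

/* Bytes occupied by a message whose body is bodyLen bytes long. */
static size_t messageSize(size_t bodyLen)
{
	size_t need = offsetof(struct message, msgBody) + bodyLen;

	/* Never allocate less than the declared struct. */
	return need < sizeof(struct message) ? sizeof(struct message) : need;
}

/* Hypothetical helper: allocate a message and copy the body into it.
 * Returns NULL if malloc fails; the caller frees the result. */
static struct message *makeMessage(int type, const void *body, int bodyLen)
{
	struct message *msg;

	assert(body != NULL && bodyLen >= 0);
	msg = malloc(messageSize((size_t)bodyLen));
	if (msg == NULL)
		return NULL;
	msg->msgType = type;
	msg->msgLength = bodyLen;
	memcpy((char *)msg + offsetof(struct message, msgBody),
	       body, (size_t)bodyLen);
	return msg;
}
```

With this layout the whole message sits in one contiguous block, so a single write of messageSize(msg->msgLength) bytes would be enough to transmit it.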

Another group says that since the 'msgBody[1]' field really isn't one
byte long, its use is misleading and would confuse programmers and
debugging software, not to mention the fact that they feel it isn't
"pure".  Since C doesn't support this sort of variable-length
structure as part of the language, these people say we cannot do what
is outlined above, but must define our message-passing constructs as
follows:

	struct message_header {
		int msgType;			/* msg type code */
		int msgLength;			/* length of msg body */
	};


	/* routine for sending a message */
	int
	sendMessage(fd, header, body)
	int fd;				/* file descriptor of message socket */
	struct message_header *header;	/* pointer to message header */
	char *body;			/* pointer to message body */
	{
		/*
		 * Send the message whose header is pointed to by 'header'
		 * and whose body is pointed to by 'body' through the socket
		 * associated with file descriptor 'fd'.  The length is
		 * contained in the message header.
		 */
	}

	/* routine for receiving a message */
	int
	receiveMessage(fd, header, body, length)
	int fd;				/* file descriptor of message socket */
	struct message_header *header;	/* pointer to message header */
	char *body;			/* pointer to message body */
	int length;			/* length of message body buffer */
	{
		/*
		 * Receive a message into the message header pointed to by
		 * 'header' and the message body pointed to by 'body'.  The
		 * 'length' parameter is the size of the message body buffer
		 * and it must be large enough to hold the largest message body
		 * that might be received.  The message comes through the socket
		 * associated with the file descriptor 'fd'.
		 */
	}
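
A sketch of how the second approach could still send header and body in one system call, assuming a POSIX system where writev() is available. The function body is an illustration only, not the library's actual implementation, and it treats a short write as an error for simplicity.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

struct message_header {
	int msgType;		/* msg type code */
	int msgLength;		/* length of msg body */
};

/* Sketch only: returns 0 on success, -1 on error or short write. */
static int sendMessage(int fd, const struct message_header *header,
		       const char *body)
{
	struct iovec iov[2];
	ssize_t want, sent;

	assert(header != NULL && header->msgLength >= 0);

	/* First element: the fixed-size header. */
	iov[0].iov_base = (void *)header;
	iov[0].iov_len = sizeof *header;

	/* Second element: the separately stored body. */
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = (size_t)header->msgLength;

	want = (ssize_t)sizeof *header + header->msgLength;
	sent = writev(fd, iov, 2);
	return sent == want ? 0 : -1;
}
```

The gather write avoids copying the header and body into one buffer, which is the main cost usually attributed to keeping them apart.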

I realize that both constructs could be implemented.  I also realize that
the above examples are oversimplifications for the purpose of illustrating
these two opposing philosophies.  All I am interested in here is to find
out the preferences of the C gurus out there in netland as to which one
of the two variations they prefer.  Is there an "accepted practice" in this
area?  Is it really a no-no to use the 'msgBody[1]' construct as defined
above?

I assume this would be of general interest and could generate a
debate, so please post replies to the net.

Sincerely,

--
  Lloyd Zusman                  Internet:  ljz@fx.com
  Master Byte Software                  or ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California                 or fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."    uucp:  ...!ames!fxgrp!ljz
  [ our Internet connection is down: use uucp or mail to the entry above it ]

jagardner@watmath.waterloo.edu (Jim Gardner) (10/21/88)

From Waterloo Port comes another way to do it:

#define sendMessage(id,msg)  _sendMessage(id, &(msg), sizeof(msg))
#define receiveMessage(id,msg)	_receiveMessage(id, &(msg), sizeof(msg))

where msg is a struct. If you have a lot of data to pass, you pass a
buffer address and call another routine (Transfer_to or Transfer_from
in Port) to move the data between processes.
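
A small sketch of how such a macro behaves, with a hypothetical stand-in (sendMessageRaw) for the underlying routine; it only records its arguments. The stand-in is named without a leading underscore because file-scope identifiers beginning with an underscore are reserved in ANSI C. The struct position_msg type is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the transport routine: it only records
 * what it was handed so the macro can be demonstrated. */
static const void *lastAddr;
static size_t lastSize;

static int sendMessageRaw(int id, const void *addr, size_t size)
{
	assert(addr != NULL);
	(void)id;
	lastAddr = addr;
	lastSize = size;
	return 0;
}

/* The macro takes the struct itself, not a pointer to it, so that
 * sizeof sees the whole object. */
#define sendMessage(id, msg) sendMessageRaw((id), &(msg), sizeof(msg))

struct position_msg {
	int type;
	double x, y;
};
```

This only works when each message type is a complete struct of known size; handing the macro a pointer would send sizeof(pointer) bytes, which is why large or variable data goes through a separate transfer call.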

peter@ficc.uu.net (Peter da Silva) (10/21/88)

For what it's worth, the Amiga message routines are declared more or less
like so:

	struct message_header { ... };

	struct message {
		struct message_header header;
		body....;
	};

	send_message(message, port)
	struct message_header *message;
	struct message_port *port;
	{
	}

	struct message_header *recv_message(port)
	struct message_port *port;
	{
	}

And used more or less like:

	send_message((struct message_header *)message, port);

	message = (struct message *)recv_message(port);

This is a substantial multitasking operating system, with 'C' as the
effective systems programming language.

Disclaimer: names of functions and argument order have been changed to
	protect the guilty, and to simplify things for this forum.
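
A self-contained sketch of this header-first pattern, using an in-memory queue as the "port". The field names, the text_message type, and the queue itself are invented for illustration and are not the real Amiga structures. The casts rely on the C rule that a pointer to a struct can be converted to a pointer to its first member and back.

```c
#include <assert.h>
#include <stddef.h>

struct message_header {
	struct message_header *next;	/* link used while queued on a port */
	int length;			/* total size of the enclosing message */
};

struct message_port {
	struct message_header *head, *tail;
};

/* A concrete message type: the header must be the first member. */
struct text_message {
	struct message_header header;
	char text[32];
};

/* Append a message to the port's queue. */
static void send_message(struct message_header *message,
			 struct message_port *port)
{
	assert(message != NULL && port != NULL);
	message->next = NULL;
	if (port->tail != NULL)
		port->tail->next = message;
	else
		port->head = message;
	port->tail = message;
}

/* Remove and return the oldest message, or NULL if the port is empty. */
static struct message_header *recv_message(struct message_port *port)
{
	struct message_header *message = port->head;

	if (message != NULL) {
		port->head = message->next;
		if (port->head == NULL)
			port->tail = NULL;
	}
	return message;
}
```

The transport code only ever sees headers, so new message types can be added without touching it.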
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation
"Have you hugged  U  your wolf today?"     uunet.uu.net!ficc!peter
Disclaimer: My typos are my own damn business.   peter@ficc.uu.net

rkl1@hound.UUCP (K.LAUX) (10/22/88)

	I definitely would go with version 2 because in version 1, you most
certainly are being misleading.  I know both versions will work, but in the
long run (pardon the phrasing) don't get 'cute' with your code.

	It is well known that C programs can be written so as to be totally
illegible (remember the 'Most Obfuscated C Program Contest'?).  That isn't
the point of writing the code.  If, for example, you have a program that
is very large and complex, then, 6 months (years?) from now when it needs
to be upgraded, modified, fixed, whatever, even the original programmer can
be mightily confused as to exactly what was going on then and why (i.e. what
was the original intent).

	As a consultant, I spend a lot of time cleaning up code that was
written a la version 1.  (If you want to keep me in business, DO write it the
first way 8-).)  Even though there may be a tendency to say 'Aw, we can get
away with it', don't.

	Remember, the code should be flexible and easily maintainable by
*those who didn't have anything to do with originally writing it in the
first place*.  Therefore, please write your code in a straightforward manner,
without throwing up any obstacles to a clear understanding of how the code
works.

--rkl

djones@megatest.UUCP (Dave Jones) (10/22/88)

From article <LJZ.88Oct20131114%fxgrp.fx.com@ames.arc.nasa.gov>, by ljz%fxgrp.fx.com@ames.arc.nasa.gov (Lloyd Zusman):
> We have a difference of opinion at our site as to what is the most
> desirable way to handle variable-length messages in C.

...

> One group of us here says it's OK to handle this case as follows:
> 
> 	struct message {
> 		int msgType;		/* msg type code */
> 		int msgLength;		/* length of msg body */
> 		char msgBody[1];	/* variable length msg body */
> 	};
> 

This looks unsafe to me.  That one-char-long "msgBody" can be
packed into the structure at any of a number of different places,
depending on the compiler's alignment strategy.

The "msgBody" is not a one-char-long array. In general, it's not nice
to fool Mother Nature.




One thing you might consider is using Sun's public domain "XDR".
I think you can get it from anonymous ftp somewhere around the net.
It's got some defects, but it *is* already written, and it *is*
free.  Check it out.

pardo@june.cs.washington.edu (David Keppel) (10/22/88)

>[ want var-size structures:  struct foo { int siz; char c[1] }; ]

To make the code a little clearer, have a global #define:

	/*
	 * When this is used, it means that the structure it appears
	 * in can be allocated by malloc() as an arbitrary-size
	 * structure.
	 */
	#define VARSIZE 1

then you can do:

	struct foo {
		unsigned size;
		your_type storage[VARSIZE];
	};

Mallocing them is still a trifle weird:

	thing = malloc (sizeof(unsigned) + n * sizeof(your_type));

because the structure could have holes in it.  (Yecch.)  Slightly
more portable:

	thing = malloc (sizeof(struct foo)
			    + (n-1)*sizeof(your_type));

I'm not sure if this is portable.  I appeal to the net gods for
further light.  Anybody?
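
One way to sidestep the question of holes is the ANSI C offsetof macro from <stddef.h>, which gives the byte offset of the array within the struct, padding included. A minimal sketch, assuming your_type is double purely as a placeholder; foo_bytes and foo_alloc are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define VARSIZE 1

typedef double your_type;	/* placeholder element type for this sketch */

struct foo {
	unsigned size;
	your_type storage[VARSIZE];
};

/* Bytes needed for a struct foo holding n elements (n >= 1). */
static size_t foo_bytes(size_t n)
{
	size_t need;

	assert(n >= 1);
	/* offsetof() already includes any hole in front of storage. */
	need = offsetof(struct foo, storage) + n * sizeof(your_type);
	return need < sizeof(struct foo) ? sizeof(struct foo) : need;
}

/* Allocate a struct foo with room for n elements; NULL on failure. */
static struct foo *foo_alloc(unsigned n)
{
	struct foo *thing = malloc(foo_bytes(n));

	if (thing != NULL)
		thing->size = n;
	return thing;
}
```

Whether indexing storage past its declared bound is strictly conforming is a separate question; this only addresses getting the allocation size right.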

	;-D on  ( My ignorance knows no bounds-checking )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/23/88)

In article <LJZ.88Oct20131114%fxgrp.fx.com@ames.arc.nasa.gov> ljz%fxgrp.fx.com@ames.arc.nasa.gov (Lloyd Zusman) writes:
>Is it really a no-no to use the 'msgBody[1]' construct as defined above?

Well, it's like this.  It is not a practical problem on an architecture
that provides a single, uniform, flat data address space.  However, on
other architectures it could fail to work, because the compiler is
entitled to assume that there is no data to be referenced beyond the
bounds of the structure and it could therefore generate code that will
fail when such an access is attempted.  I won't bore you with details.

Most programmers would probably make that structure member a pointer to
malloc()ed storage, which introduces slightly more overhead but has the
virtue of being portable.

henry@utzoo.uucp (Henry Spencer) (10/23/88)

In article <919@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>> 	struct message {
>> 		int msgType;		/* msg type code */
>> 		int msgLength;		/* length of msg body */
>> 		char msgBody[1];	/* variable length msg body */
>> 	};
>
>This looks unsafe to me.  That one-char-long "msgBody" can be
>packed into the structure at any of a number of different places,
>depending on the compiler's alignment strategy.

No:  X3J11 specifically states that struct member addresses are in ascending
order, and enough existing code assumes this that there should be relatively
few compilers that violate it.  (Of course, this is little comfort if you
are constrained to use such a compiler...)
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

djones@megatest.UUCP (Dave Jones) (10/23/88)

#define SLM 1  /* silly little macro */

Shall we define a message packet as follows?

struct msg
  { int size;
    enum msg_type type;
    char contents[SLM];
  };

Or with a header?

struct hdr
  { int size;
    enum msg_type type;
  };

And then we package the contents separately?

It is quite likely that I would choose neither. It depends
partly on whether the solution is to be tuned for one program,
and thus can have "direct knowledge" of the data-types being
transmitted, or whether it is to be a general "library" solution.
Is it supposed to transfer messages between machines of different 
types or between programs compiled with different compilers? 
(Apparently not.)  Approximately how many different message types 
will there be?  Will they differ significantly in size?  Will they all
have a fixed-size format?  Etc.

But we still have the question as to whether or not the first declaration
is even feasible.  So, for now, let's go along with the gag, and look 
at the first declaration.

How does one malloc such a thing, and how do you write bytes into
it?  What you've got to watch out for is alignment restrictions
and conventions, and "holes" in structures.  Some of the "solutions"
that have been posted so far purport to solve the problem but don't.
To begin with, the compiler may not align that one-char array on a
boundary suitable for just any kind of data.
So let's back off and try again.

/* The following is supposed to have the most general
** alignment of all types.
*/
typedef union
 { void* ptr; char c; short s; int i; long l;
   float f; double d;
 }max_align;

struct msg
  { int size;
    enum msg_type type;
    max_align contents[1];
  };

Now the "contents" field should be aligned on a boundary suitable
for any data-type. (Did I miss something?)
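
For readers with a much later compiler: C11 added max_align_t to <stddef.h>, a type whose alignment is at least that of every scalar type, which removes the guesswork from a hand-built union. A sketch assuming a C11 compiler; the enum values and contents_offset are placeholders invented for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

enum msg_type { MSG_PING, MSG_DATA };	/* placeholder values for this sketch */

struct msg {
	int size;
	enum msg_type type;
	max_align_t contents[1];	/* C11: aligned for any scalar type */
};

/* Compile-time check that contents starts on a maximally aligned boundary. */
static_assert(offsetof(struct msg, contents) % alignof(max_align_t) == 0,
	      "contents is not maximally aligned");

/* Offset of the payload, i.e. the header size including any hole. */
static size_t contents_offset(void)
{
	return offsetof(struct msg, contents);
}
```

The check is evaluated at compile time, so a layout that violated it would fail to build rather than misbehave at run time.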

Now, I think the following will always work; but remember, I
didn't recommend this approach, anyway.

  extern int errno;
  extern int pipe_out;

  int
  send(size,  type, contents)
     int size;
     void* contents;
     enum msg_type type;
  {
      int packet_size = 

            /* header overhead (including any padding or "holes") */
            sizeof(struct msg) - sizeof(max_align) 
      
            /* and add to that... */
            + size;

      struct msg* msg;

      if((msg = (struct msg*)malloc(packet_size)) == 0)
         return errno;

      /* Fill out the header. */
      msg->size = size;
      msg->type = type;

      /* Copy the message. */
      bcopy(contents, (void*)(msg->contents), size);

      { int written = write(pipe_out, (void*)msg, packet_size);
        int err = errno;   /* save it: free() is allowed to change errno */
        free(msg);
        return (written == packet_size) ? 0 : err;
      }
  }
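
A sketch of what a matching receiver might look like, assuming both ends are built with the same compiler so that the header layout agrees. It reads the header first and then allocates exactly enough room for the body. The names read_full, receive, HDR_BYTES, MSG_TEXT, and the max_size limit are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

enum msg_type { MSG_TEXT = 1 };		/* placeholder value for this sketch */

typedef union
 { void* ptr; char c; short s; int i; long l;
   float f; double d;
 }max_align;

struct msg
  { int size;
    enum msg_type type;
    max_align contents[1];
  };

/* Header overhead, including any padding or "holes". */
#define HDR_BYTES (sizeof(struct msg) - sizeof(max_align))

/* Read exactly n bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or premature end of file. */
static int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;

    while (n > 0) {
        ssize_t got = read(fd, p, n);

        if (got < 0 && errno == EINTR)
            continue;
        if (got <= 0)
            return -1;
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/* Returns a malloc'd packet the caller must free, or NULL on failure. */
static struct msg *receive(int fd, int max_size)
{
    struct msg hdr, *msg;

    assert(max_size >= 0);
    if (read_full(fd, &hdr, HDR_BYTES) != 0)
        return NULL;
    if (hdr.size < 0 || hdr.size > max_size)
        return NULL;            /* refuse implausible lengths */

    msg = malloc(sizeof(struct msg) + (size_t)hdr.size);
    if (msg == NULL)
        return NULL;
    msg->size = hdr.size;
    msg->type = hdr.type;
    if (read_full(fd, (char *)msg + HDR_BYTES, (size_t)hdr.size) != 0) {
        free(msg);
        return NULL;
    }
    return msg;
}
```

Reading the length before the body means the receiver does not need a buffer sized for the largest possible message.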

   

dg@lakart.UUCP (David Goodenough) (10/24/88)

Lloyd sez:
> We have a difference of opinion at our site as to what is the most
> desirable way to handle variable-length messages in C.  This is
> basically a difference in philosophies.  I'm showing you folks on the
> net the two opposing approaches in question here, and I would like to
> find out which of these each of you prefers and why.
> 
> One group of us here says it's OK to handle this case as follows:
> 
> 	struct message {
> 		int msgType;		/* msg type code */
> 		int msgLength;		/* length of msg body */
> 		char msgBody[1];	/* variable length msg body */
> 	};
> 
> Another group says that since the 'msgBody[1]' field really isn't one
> byte long, its use is misleading and would confuse programmers and
> debugging software, not to mention the fact that they feel it isn't
> "pure". .....

As an alternative to the above, how useable would the following be:

 	struct message {
 		int msgType;		/* msg type code */
 		int msgLength;		/* length of msg body */
 		char *msgBody;		/* variable length msg body */
 	};

Advantages:

	Is portable, and doesn't fry the mind of dbx and friends.

Disadvantages:

	Requires an extra malloc(3) call every time you want to do anything.
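
A sketch of what the extra allocation looks like in practice. The helper names message_new and message_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct message {
	int msgType;		/* msg type code */
	int msgLength;		/* length of msg body */
	char *msgBody;		/* variable length msg body */
};

/* Hypothetical constructor: copies len bytes of body into fresh storage.
 * Returns NULL if either allocation fails. */
static struct message *message_new(int type, const char *body, int len)
{
	struct message *msg;

	assert(body != NULL && len >= 0);
	msg = malloc(sizeof *msg);
	if (msg == NULL)
		return NULL;
	/* Ask for at least one byte so an empty body still gets storage. */
	msg->msgBody = malloc(len > 0 ? (size_t)len : 1);
	if (msg->msgBody == NULL) {
		free(msg);
		return NULL;
	}
	memcpy(msg->msgBody, body, (size_t)len);
	msg->msgType = type;
	msg->msgLength = len;
	return msg;
}

/* Release both the body and the message itself. */
static void message_free(struct message *msg)
{
	if (msg != NULL) {
		free(msg->msgBody);
		free(msg);
	}
}
```

Note that the pointer is meaningless to another process, so a sender would have to write the two int fields and the body bytes rather than the struct as a whole.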
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@harvard.harvard.edu	  	  +---+

djones@megatest.UUCP (Dave Jones) (10/25/88)

From article <1988Oct22.231317.19640@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <919@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>>> 	struct message {
>>> 		int msgType;		/* msg type code */
>>> 		int msgLength;		/* length of msg body */
>>> 		char msgBody[1];	/* variable length msg body */
>>> 	};
>>
>>This looks unsafe to me.  That one-char-long "msgBody" can be
>>packed into the structure at any of a number of different places,
>>depending on the compiler's alignment strategy.
> 
> No:  X3J11 specifically states that struct member addresses are in 
> ascending order ...

Is it not possible that one compiler might put msgBody on an
even address and another one put it on an odd address, and another
one put it on a double-word boundary?

jones@ingr.UUCP (Mark Jones) (10/27/88)

[stuff deleted]

> As an alternative to the above, how useable would the following be:
> 
>  	struct message {
>  		int msgType;		/* msg type code */
>  		int msgLength;		/* length of msg body */
>  		char *msgBody;		/* variable length msg body */
>  	};
> 
> Advantages:
> 
> 	Is portable, and doesn't fry the mind of dbx and friends.
	You can create an array of these things.
	Lint won't gripe.
	The compiler won't gripe.

> Disadvantages:
> 
> 	Requires an extra malloc(3) call every time you want to do anything.

Not necessarily: you could malloc a big chunk of memory and fgets into it,
saving pointers to the start of each message.  This is a good thing if
you are on a machine of limited memory.  Allocate an array of message
structures, and allocate one chunk for all the text.  Since MSDOS allocates
memory with a 16 byte minimum, this scheme can save a lot of memory.
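
A minimal sketch of that pooling idea, assuming a fixed 4096-byte pool for simplicity. The body_pool type and pool_store function are hypothetical names, and a real program would size or grow the pool to suit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct message {
	int msgType;		/* msg type code */
	int msgLength;		/* length of msg body */
	char *msgBody;		/* points into the shared pool below */
};

/* Hypothetical fixed-size pool shared by all message bodies. */
struct body_pool {
	char bytes[4096];
	size_t used;
};

/* Copy a body into the pool and point msg at it.
 * Returns 0 on success, -1 if the pool has no room left. */
static int pool_store(struct body_pool *pool, struct message *msg,
		      int type, const char *body, int len)
{
	assert(pool != NULL && msg != NULL && body != NULL && len >= 0);
	if ((size_t)len > sizeof pool->bytes - pool->used)
		return -1;
	msg->msgType = type;
	msg->msgLength = len;
	msg->msgBody = pool->bytes + pool->used;
	memcpy(msg->msgBody, body, (size_t)len);
	pool->used += (size_t)len;
	return 0;
}
```

Bodies are packed back to back, so there is no per-message allocator overhead; the trade-off is that individual bodies cannot be freed on their own.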

Mark Jones

henry@utzoo.uucp (Henry Spencer) (10/27/88)

In article <926@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>Is it not possible that one compiler might put msgBody on an
>even address and another one put it on an odd address, and another
>one put it on a double-word boundary?

Yes, but that won't foul up the example proposed, provided that the code
is aware of the possibility of holes in the struct.  The key point is that
msgBody is guaranteed to be at the end of the struct (modulo any trailing
padding).
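
For readers on later compilers: C99 standardized this idiom as the "flexible array member", an array with no size written as the last member of a struct. A sketch assuming a C99 compiler; message_alloc is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct message {
	int msgType;		/* msg type code */
	int msgLength;		/* length of msg body */
	char msgBody[];		/* C99 flexible array member, must be last */
};

/* Allocate a message with room for len body bytes and copy them in.
 * Returns NULL if malloc fails; the caller frees the result. */
static struct message *message_alloc(int type, const char *body, int len)
{
	struct message *msg;

	assert(body != NULL && len >= 0);
	/* sizeof(struct message) covers the header and any padding but
	 * none of msgBody, so the body length is simply added. */
	msg = malloc(sizeof *msg + (size_t)len);
	if (msg == NULL)
		return NULL;
	msg->msgType = type;
	msg->msgLength = len;
	memcpy(msg->msgBody, body, (size_t)len);
	return msg;
}
```

With this form the declaration no longer claims a length of one, which addresses the "misleading" objection while keeping the single allocation.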
-- 
The dream *IS* alive...         |    Henry Spencer at U of Toronto Zoology
but not at NASA.                |uunet!attcan!utzoo!henry henry@zoo.toronto.edu

smryan@garth.UUCP (Steven Ryan) (11/01/88)

>The dream *IS* alive...         |    Henry Spencer at U of Toronto Zoology
>but not at NASA.                |uunet!attcan!utzoo!henry henry@zoo.toronto.edu

Obviously the Canadian technique, never to dream at all, is superior.

I don't know if Henry is an American working in Canada or a Canadian working in
Canada but it certainly looks like a Canadian poking at US internal affairs.

Americans do make nasty jokes about Canada, but I've never seen anybody
use this forum to make them.

cramer@optilink.UUCP (Clayton Cramer) (11/03/88)

In article <1695@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
> >The dream *IS* alive...         |    Henry Spencer at U of Toronto Zoology
> >but not at NASA.                |uunet!attcan!utzoo!henry henry@zoo.toronto.edu
> 
> Obviously the Canadian technique, never to dream at all, is superior.
> 
> I don't know if Henry is an American working in Canada or a Canadian working in
> Canada but it certainly looks like a Canadian poking at US internal affairs.
> 
> Americans do make nasty jokes about Canada, but I've never seen anybody
> use this forum to make them.

You haven't been reading this forum for very long.

Henry's objections are to NASA, not necessarily the U.S.  (Henry's other
comments on USENET demonstrate strong Americophile tendencies, so I 
don't mistake his criticisms of NASA for the sort of blatant anti-
American nonsense that sometimes crosses the border.  I do get irritated 
by the bumper sticker level of depth of those signature quotes, though).

-- 
Clayton E. Cramer
..!ames!pyramid!kontron!optilin!cramer