[comp.protocols.iso] Size distribution of ASN.1 octet and bit strings

auerbach@CSL.SRI.COM (Karl Auerbach) (06/12/89)

Hi -- I'm building a new ASN.1 tool and I am trying to optimize my
buffering strategies.  I run into some problems, however, with octet
and bit strings because they can sometimes be huge, in theory.

What I am wondering is whether any of you out there have any
practical measures of the distribution of octet and bit string sizes
that occur in real life.  (Just to make things easier, I don't mind
ignoring X.400 -- I know the strings in there are potentially
gigantic.)

				--karl--

kmont@hpindda.HP.COM (Kevin Montgomery) (06/13/89)

/ hpindda:comp.protocols.iso / auerbach@CSL.SRI.COM (Karl Auerbach) /  8:06 am  Jun 12, 1989 /
> What I am wondering is whether any of you out there have any
> practical measures of the distribution of octet and bit string sizes

nothing practical, just an okay guess  :-) ....  I believe octet
and bit string segmentation can begin occurring at 512 byte boundaries,
so if one was segmenting (or interoperating with someone that did)
one would like to avoid this costly segmentation business as much 
as possible.  So I'd place my bet on a max of 512 bytes or at least
a multiple thereof.  (how about looking at the protocols you want
to pass?)

			(no flames- it's JUST a guess!)
					kevin

adnan@sgtech.UUCP (Adnan Yaqub) (06/15/89)

In article <5560029@hpindda.HP.COM> kmont@hpindda.HP.COM (Kevin Montgomery) writes:

   / hpindda:comp.protocols.iso / auerbach@CSL.SRI.COM (Karl Auerbach) /  8:06 am  Jun 12, 1989 /
   > What I am wondering is whether any of you out there have any
   > practical measures of the distribution of octet and bit string sizes

   nothing practical, just an okay guess  :-) ....  I believe octet
   and bit string segmentation can begin occurring at 512 byte boundaries,
   so if one was segmenting (or interoperating with someone that did)
   one would like to avoid this costly segmentation business as much 
   as possible.  So I'd place my bet on a max of 512 bytes or at least
   a multiple thereof.  (how about looking at the protocols you want
   to pass?)

Actually, the use of a constructed octet or bit string can happen at
any time in ASN.1.  For instance, I can send the octet string `FOO' as
a primitive string, a constructed string containing the primitive
string `FOO', a constructed string containing three constructed
strings, one containing `F' and two containing `O', etc.
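
To make the three forms concrete, here is a minimal sketch in Python that
builds each encoding of `FOO', assuming BER with definite, short-form
lengths.  The helper name `tlv' is made up for this illustration; the tag
values 0x04 (primitive OCTET STRING) and 0x24 (the same tag with the
constructed bit set) come from the BER rules.

```python
PRIM = 0x04  # universal tag 4 (OCTET STRING), primitive form
CONS = 0x24  # same tag with the constructed bit (0x20) set


def tlv(tag: int, content: bytes) -> bytes:
    """Hypothetical helper: tag + short-form definite length + content."""
    if len(content) >= 128:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([tag, len(content)]) + content


# 1. A plain primitive string.
primitive = tlv(PRIM, b"FOO")

# 2. A constructed string containing the primitive string `FOO'.
wrapped = tlv(CONS, tlv(PRIM, b"FOO"))

# 3. A constructed string containing three constructed strings,
#    each holding a one-octet primitive string.
nested = tlv(CONS, b"".join(tlv(CONS, tlv(PRIM, bytes([c]))) for c in b"FOO"))

for name, enc in (("primitive", primitive), ("wrapped", wrapped), ("nested", nested)):
    print(f"{name:9} {len(enc):2} octets: {enc.hex(' ')}")
```

All three carry the same three octets of payload; only the framing
overhead differs (5, 7 and 17 octets in total).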

In an implementation of MMS we handled this problem as follows.  If a
routine which was expecting an octet or bit string received a
constructed string, it would invoke another routine which muddled
through the constructed string, building a primitive string which it
returned to its caller.  This prevented everyone from having to deal
with the possibilities of obscene peers sending constructed strings of
constructed strings of constructed strings of...
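
A minimal sketch of such a flattening routine, in Python and for octet
strings only.  This is not the MMS code described above; the names
`read_length' and `flatten' are hypothetical, and the sketch assumes the
whole encoding is already in one buffer.

```python
PRIM, CONS = 0x04, 0x24          # OCTET STRING: primitive / constructed
EOC = b"\x00\x00"                # end-of-contents marker (indefinite length)


def read_length(buf: bytes, pos: int):
    """Return (length, next_pos); length is None for the indefinite form."""
    first = buf[pos]
    pos += 1
    if first < 0x80:             # short form
        return first, pos
    if first == 0x80:            # indefinite form
        return None, pos
    n = first & 0x7F             # long form: n length octets follow
    return int.from_bytes(buf[pos:pos + n], "big"), pos + n


def flatten(buf: bytes, pos: int = 0):
    """Return (payload, next_pos) for the octet string starting at pos."""
    tag = buf[pos]
    length, pos = read_length(buf, pos + 1)
    if tag == PRIM:
        if length is None:
            raise ValueError("primitive string with indefinite length")
        if pos + length > len(buf):
            raise ValueError("truncated primitive string")
        return buf[pos:pos + length], pos + length
    if tag != CONS:
        raise ValueError(f"not an octet string (tag 0x{tag:02x})")

    # Constructed: walk the segments, recursing into nested constructions.
    end = None if length is None else pos + length
    parts = []
    while True:
        if end is None:
            if buf[pos:pos + 2] == EOC:
                pos += 2
                break
        elif pos >= end:
            break
        part, pos = flatten(buf, pos)
        parts.append(part)
    if end is not None and pos != end:
        raise ValueError("segments overrun the constructed length")
    return b"".join(parts), pos
```

The caller then sees only a primitive payload, however deeply the peer
nested it.  Bit strings need one extra step that is not shown here: each
segment starts with an unused-bits octet, which has to be stripped before
the segments are joined.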

As for any hints of the `typical' distribution, I don't have enough
experience to comment.
--
Adnan Yaqub
Star Gate Technologies, 29300 Aurora Rd., Solon, OH, USA, +1 216 349 1860
...uunet!abvax!sgtech!adnan

Christian.Huitema@MIRSA.INRIA.FR (Christian Huitema) (07/22/89)

Both Octet and Bit Strings have two different uses -- neither ``uncommon''.

Bit Strings will either:
* include a limited set of named configuration flags -- could be in
most cases mapped to (some of) the 32 bits of an Integer by a typical C program;
* be used to carry a bitmap: anything from a smallish icon to a full
screen, not excluding facsimile images.
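
The first case can be sketched as follows, in Python for brevity.  The
function name and the flag names are invented for the example.  It assumes
the BER content octets of a primitive BIT STRING (a leading unused-bits
count, then the bits, with bit 0 in the most significant position of the
first octet) and maps named bit n to the integer mask 1 << n.

```python
# Hypothetical named bits, as an ASN.1 module might declare them:
#   Flags ::= BIT STRING { read(0), write(1), execute(2) }
READ, WRITE, EXECUTE = 1 << 0, 1 << 1, 1 << 2


def flags_from_bit_string(content: bytes) -> int:
    """Map the content octets of a short BIT STRING to a 32-bit flag word."""
    if not content:
        raise ValueError("missing unused-bits octet")
    unused, data = content[0], content[1:]
    if unused > 7 or (unused and not data):
        raise ValueError("bad unused-bits count")
    nbits = len(data) * 8 - unused
    if nbits > 32:
        raise ValueError("too many bits for a 32-bit flag word")
    value = 0
    for i in range(nbits):
        # ASN.1 bit i is counted from the high-order end of each octet.
        if data[i // 8] & (0x80 >> (i % 8)):
            value |= 1 << i
    return value


flags = flags_from_bit_string(bytes([0x05, 0xA0]))   # bits 1 0 1, 5 unused
print(bool(flags & READ), bool(flags & WRITE), bool(flags & EXECUTE))
```

A buffer of five octets is therefore enough for any flag-style bit string
handled this way, whereas the bitmap case has no such bound.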

Octet Strings have much of the same split. They can either encode:
* some attribute, like name or password, which usually fits on a single line;
* a full ``file'', or ``document'', or whatever. That holds not only for
X.400, but also for FTAM and all document-related applications.

The other problem is the split between ``primitive'' and ``structured''
encodings. As far as I can tell, you find both in practice; the key is
whether the sender can predict the size of the string. When it is
predictable (e.g. a binary file), the primitive encoding is often
preferred. Several systems encode text files in the structured format,
because a lot of clean-up has to be done at the ends of lines, so the
final length is hard to predict.
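
A minimal sketch of that trade-off in Python.  All function names are
hypothetical, and the CRLF conversion merely stands in for whatever
line-end clean-up a real system performs.  When the length is known up
front, one primitive string suffices; when it is not, the sender can emit
a constructed string with indefinite length, one primitive segment per
line, without ever computing the total.

```python
from typing import Iterable, Iterator


def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_primitive(data: bytes) -> bytes:
    """Size known in advance (e.g. a binary file): one primitive string."""
    return b"\x04" + encode_length(len(data)) + data


def encode_streamed(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Size unknown: constructed, indefinite length, one segment per chunk."""
    yield b"\x24\x80"                 # constructed OCTET STRING, indefinite
    for chunk in chunks:
        yield encode_primitive(chunk)
    yield b"\x00\x00"                 # end-of-contents


def crlf_lines(text: bytes) -> Iterator[bytes]:
    """Stand-in for line-end clean-up: normalise every line to CRLF."""
    for line in text.splitlines():
        yield line + b"\r\n"


encoded = b"".join(encode_streamed(crlf_lines(b"a\nbc\n")))
print(encoded.hex(" "))
```

For a receiver, the consequence is that a streamed text file arrives as
many small segments, so per-segment sizes say little about the size of
the whole string.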

Christian Huitema