[comp.arch] String length

GQ.RLG@forsythe.stanford.edu (Dick Guertin) (02/07/89)

In article <28200268@mcdurb>,
aglew@mcdurb.Urbana.Gould.COM writes:
->
->Abstract from: comp.arch 8281,  6 Feb 89,  56 lines.
->
->May I encourage people implementing string libraries to use an extra
->level of indirection? Instead of length immediately preceding the string,
->let length be associated with a pointer to the string. Makes
->substringing operations much easier, and has the ability to reduce
->unnecessary copies (at the risk of increased aliasing).
->
->       +------+---+
->       |length|ptr|
->       +------+---+
->                |
->         +------+
->         |
->         V
->       +---+---+---+---+---+---+---+---+---+---+---+---+---+
->       | H | E | L | L | O | , |   | W | O | R | L | D | \n|
->       +---+---+---+---+---+---+---+---+---+---+---+---+---+

Such an implementation has adverse effects when the string is sent
to/from an external device, such as a file.  The 'length' must be
with the string, or the string needs a terminator character.
Furthermore, when a 'ptr' is changed to point to a new string,
what happens to the 'length' information for the old string?

jk3k+@andrew.cmu.edu (Joe Keane) (02/08/89)

Dick Guertin writes:
> Such an implementation has adverse effects when the string is sent to/from an
> external device, such as a file.  The 'length' must be with the string, or the
> string needs a terminator character.

Look what read() and write() use: length and pointer.  The DMA operations they
do probably use the same thing.  For these operations, you can't use a
terminator character, and I don't see why you'd want the length next to the
characters.
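
To make the point about read() and write() concrete, here is a minimal
sketch in C, assuming a POSIX system.  The struct strdesc type and the
sd_from_cstr() and sd_write() helpers are hypothetical names invented for
this illustration; only write() itself is a real interface.  The descriptor
holds exactly the (pointer, length) pair that write() wants, so no
terminator and no in-band length prefix is needed to send the string out.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical descriptor: the length travels with the pointer,
   not with the characters themselves. */
struct strdesc {
    size_t      len;
    const char *ptr;
};

/* Build a descriptor over an existing C string; no characters are copied. */
static struct strdesc sd_from_cstr(const char *s)
{
    assert(s != NULL);
    struct strdesc d = { strlen(s), s };
    return d;
}

/* Write the whole string to fd.  write() takes a pointer and a length,
   which is exactly what the descriptor holds.  The loop handles partial
   writes.  Returns 0 on success, -1 on any error. */
static int sd_write(int fd, struct strdesc d)
{
    while (d.len > 0) {
        ssize_t n = write(fd, d.ptr, d.len);
        if (n < 0)
            return -1;
        d.ptr += n;          /* advance past the bytes already written */
        d.len -= (size_t)n;
    }
    return 0;
}
```

A call such as sd_write(1, sd_from_cstr("HELLO, WORLD\n")) would send the
13 characters from the diagram to standard output.  Whether the file format
on the other end stores a length or a terminator is a separate decision
from how the string is described in memory.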

shapiro@rb-dc1.UUCP (Mike Shapiro) (02/10/89)

In article <1944@lindy.Stanford.EDU> GQ.RLG@forsythe.stanford.edu (Dick Guertin) writes:

  <<< deleted reference >>>

>Such an implementation has adverse effects when the string is sent
>to/from an external device, such as a file.  The 'length' must be
>with the string, or the string needs a terminator character.
>Furthermore, when a 'ptr' is changed to point to a new string,
>what happens to the 'length' information for the old string?

Note that in SNOBOL4, the Bell Labs language designed for string handling,
which appeared in the 1960s (before C, UNIX, et al.), string descriptors had
three fields for strings in storage:
    -- pointer to start of string in memory
    -- offset from start of string storage where string actually starts
    -- length of string

Because the language had many operations on strings, a substring was easy
to compute.  Copy the string descriptor to the substring descriptor and
then adjust the offset and length fields.
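
The copy-and-adjust step can be sketched in C as follows.  This is only an
illustration of the three-field layout described above, not SNOBOL4's
actual implementation; struct descr, substr(), descr_chars() and
descr_equal() are hypothetical names, and the field types are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-field descriptor, modeled on the description above. */
struct descr {
    const char *base;   /* start of the string's storage in memory */
    size_t      off;    /* offset from base where this string starts */
    size_t      len;    /* length of this string in characters */
};

/* Substring of s beginning `start` characters into s and `n` characters
   long.  No characters are copied; the result aliases s's storage. */
static struct descr substr(struct descr s, size_t start, size_t n)
{
    assert(start <= s.len && n <= s.len - start);
    struct descr t = s;     /* copy the descriptor */
    t.off += start;         /* adjust the offset */
    t.len  = n;             /* adjust the length */
    return t;
}

/* Pointer to the first character of the string a descriptor denotes. */
static const char *descr_chars(struct descr s)
{
    return s.base + s.off;
}

/* Compare two described strings by content, not by identity. */
static int descr_equal(struct descr a, struct descr b)
{
    return a.len == b.len &&
           memcmp(descr_chars(a), descr_chars(b), a.len) == 0;
}
```

Given a descriptor for "HELLO, WORLD\n" with offset 0 and length 13,
substr(whole, 7, 5) describes "WORLD" and shares the original storage.
That sharing is what makes the operation cheap, and it is also the source
of the aliasing risk mentioned earlier in the thread.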

For more information on how string operations are implemented, see Ralph
Griswold's book on the Macro Implementation of SNOBOL4.  Or see his later
work on the Icon language.

(If desperate for material on how this relates to computer architecture, dig
up a copy of my 1972 dissertation on the architecture of a machine for string
manipulation -- a SNOBOL machine.)

                                   Michael Shapiro