[comp.lang.c] Stretchy arrays

djones@megatest.UUCP (Dave Jones) (11/03/88)

From article <1709@garth.UUCP>, by smryan@garth.UUCP (Steven Ryan):
...[I missed the article to which Mr. Ryan is responding.]

>> Well, gee, how do you allocate an indefinite sized array?
> 
> Glad you asked. I stuff it down a stack, each element individually
> handcrafted with malloc, until I've reached the end and know exactly
> how big it is. Then, IF I need random access, I copy it to an array.
> Linear time.


A method I have used quite successfully for some time now is
to use realloc to increase the size of a buffer by some *percentage*
of its current size.  How big a percentage depends on how
much memory you can afford to be "wasted".  Usually doubling the
size of the array works just fine.

The array will occasionally be twice as big as it need be. But
the overhead is not as much as it might seem at first.  Notice that in
the stacking scheme described above, there is an overhead for
links and heap-structure headers.  On the machine I mostly use (Sun-3),
that would increase memory usage for character arrays by a factor of
ten!  And with that particular libc.a, malloc overhead can be devastating 
in the speed column. (I would not use malloc "raw" for linked lists, but
that's another issue.)

The algorithm is linear in time and space.

Here's a sample of an example. (I'm a poet, and didn't know it.)

From my personal library...


/* Vbuf.h */
#ifndef _VBUF_H_

typedef
struct Vbuf
{
  /* Public */
  unsigned char* buf; /* current buffer. Read-only. */
  int dot;   /* byte-displacement of next byte 
             ** to be put or bss'd. Read-write.
             */

  /* private */
  int bufsize; /* size, in bytes, of current buffer */
} Vbuf;

Vbuf* Vbuf_init(/* Vbuf* */);
void  Vbuf_clean(/* Vbuf* */);

/* These return the old dot. */
int   Vbuf_put_uchar(/* Vbuf*, unsigned  */);
int   Vbuf_put_ushort(/* Vbuf*, unsigned */);
int   Vbuf_put_ulong(/* Vbuf*, unsigned */);
int   Vbuf_put_block( /* Vbuf*, unsigned char*, int */)
int   Vbuf_bss_uchar(/* Vbuf* */);
int   Vbuf_bss_ushort(/* Vbuf* */);
int   Vbuf_bss_ulong(/* Vbuf* */);
int   Vbuf_bss(/* Vbuf*, int */);

/* These return the new dot */
int   Vbuf_align_short(/* Vbuf*/);
int   Vbuf_align_long(/* Vbuf */);

/* Returns value from system-call write() */
int   Vbuf_write(/* Vbuf*, int */);   /* int is file-descriptor */

#define _VBUF_H_
#endif  _VBUF_H_



You can probably guess what the code has to look like. Here's
a fragment...


#include "Vbuf.h"
#define INITIAL_SIZE 16

static void expand();

Vbuf_put_uchar(obj, val)
  register Vbuf* obj;
  unsigned val;
{
  while(obj->dot >= obj->bufsize) expand(obj);
  *(obj->buf + (obj->dot)++) = (unsigned char) val;
  return obj->dot - 1;
}


static void
expand(obj)
  register Vbuf* obj;
{
  if(obj->bufsize == 0 )
    { /* first allocation */
      obj->buf = (unsigned char*)smalloc(INITIAL_SIZE);
      obj->bufsize = INITIAL_SIZE;
    }
  else
    {
      obj->bufsize *= 2;
      obj->buf = (unsigned char*)realloc(obj->buf, obj->bufsize);
    }
}

smryan@garth.UUCP (Steven Ryan) (11/04/88)

The problem is that allocating by some multiple badly approximates the required
size after a number of iterations.

Getting back to the original point: Make sure the basic algorithm is good.

(1) Can you never have the case where the next realloc will require more
    memory than available even though the actual size needed will fit?

(2) Can you avoid storage fragmentation so that only the current allocation
    exists and not all previous incarnations?
-- 
                                                   -- s m ryan
+--------------------------+-----------------------------------------------+
|Canada -- where American  |  The world will have peace. The question is   |
|subsidies are unfair.     |  whether people will be there to enjoy it.    |
+--------------------------+-----------------------------------------------+

daveb@geaclib.UUCP (David Collier-Brown) (11/06/88)

 From article <1709@garth.UUCP>, by smryan@garth.UUCP (Steven Ryan):
>> Well, gee, how do you allocate an indefinite sized array?
> 
> Glad you asked. I stuff it down a stack, each element individually
> handcrafted with malloc, until I've reached the end and know exactly
> how big it is. Then, IF I need random access, I copy it to an array.
> Linear time.


  A variant on that is to write a recursive function which declares
a fixed-size buffer, fills that and then recurses. On the way "up" it
mallocs a single correct-size buffer and copues each chunk into it.
This mitigates some of the problems mentioned by djones @
megatest.UUCP (Dave Jones).
  I posted a getline function some months ago that did just that.

  Needless to say, this works best when 
	1) the variation of the maximum size is only an order of
	   magnitude or so, and
	2) you start with a largish size for the declared buffer.

--dave c-b
-- 
 David Collier-Brown.  | yunexus!lethe!dave
 Interleaf Canada Inc. |
 1550 Enterprise Rd.   | HE's so smart he's dumb.
 Mississauga, Ontario  |       --Joyce C-B