[net.lang.c] Help with malloc

LAVITSKY@RU-BLUE.ARPA (08/09/84)

From:  Eric <LAVITSKY@RU-BLUE.ARPA>

Hi,

 I'm quite a novice  to C, and  would like to  know if anyone  could
help me with the following:

  I need to allocate memory space for some buffers in the  following
manner: In the middle of my program I read in some parameters from a
file which  will  determine  the  size  of  my  buffer  space.   The
parameters are two  integers -  and the buffer  itself will  contain
integers.  So, if I read in a value of 8192 from my file header -  I
need to then allocate  my buffers to be  exactly 8192 bytes of  type
integer.  I will  also be reading  data into the  buffers from  this
file. How do I use malloc() to allocate my buffer space ? Everything
I read so far is very vague as to how this is used. I would like  to
read data  into  the array  and  allocate  each byte  of  the  array
dynamically as it is needed. Something like:

	get buffer size;
	loop:
		allocate buffer byte;
		read data into buffer;
	untill buffer size;

Thanx,
Eric
-------

bbanerje@sjuvax.UUCP (B. Banerjee) (08/18/84)

Hello,
	I have a similar question which I would appreciate
any solutions to.

I have a large (working) program involving a two dimensional array that
I would like to allocate storage for, dynamically (calloc).  I really
don't want to go through and change all the array references to do
pointer arithmetic.

What I would like to do is something like this -

datum **myarray;	/* datum is typedef'ed */

myarray = calloc(num_rows * num_cols, sizeof(datum));


The trouble is, what should I cast calloc to in order that I may
address myarray using indices?  I thought of doing something like
this :

myarray = (datum **) calloc ( ....)

However, the datum that "myarray[i][j]" would address is not
clear to me.  The situation is even murkier when using higher
dimensioned arrays.

I would be very grateful for any ( and all ) answers.  Please
reply by mail, as I doubt greatly that this is of general
interest.

Regards,

-- 
				Binayak Banerjee
		{allegra | astrovax | bpa | burdvax}!sjuvax!bbanerje
P.S.
	Send Flames, I love mail.

gwyn@BRL-VLD.ARPA (08/19/84)

From:      Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA>

malloc() is most efficient if you allocate the needed space all at
once, rather than a "byte" (actually, you seem to mean "integer
datum") at a time.  The simplest code to do what I think you want is:

#include	<stdio.h>
...
	typedef char	*generic;	/* generic pointer type, (void *)
					   after ANSI C committee is done */
	extern generic	calloc();	/* layer around malloc() */
	extern void	free();

	void		Fatal();	/* prints error message & aborts */

	typedef int	datum;		/* more often, a struct */
	int		ndata;		/* # data records to allocate */
	register datum	*dp;		/* -> allocated array */
	register int	i;		/* index data array */
	register FILE	*fp;		/* data file stream pointer */

	/* <get buffer size into `ndata' somehow> */

	if ( (dp = (datum *)calloc( ndata, sizeof(datum) )) == NULL )
		Fatal( "ran out of memory" );	/* always check! */

	/* <get data file open on stream `fp' somehow, e.g. fopen()> */

	for ( i = 0; i < ndata; ++i )
		if ( fread( (char *)&dp[i], (int)sizeof(datum), 1, fp ) != 1 )
			Fatal( "error reading data" );
	/* this is just an example; in this case you could have had:
	if ( fread( (char *)dp, (int)sizeof(datum), ndata, fp ) != ndata )
		Fatal( "error reading data" );
	... and avoided the loop altogether			    */

	/* <compute with data now in buffer, using dp[i] to access
	    the i-th element of the array>			   */

	free( (generic)dp );		/* "waste not, want not" */

calloc() is just a simple layer around malloc() to take care of
multiplying the number of items by the size of each; it also fills
the allocated storage full of 0s which makes debugging less messy.

chris@umcp-cs.UUCP (08/20/84)

(Warning:  what follows is a long explanation of various subscripting
phenomena.)

	From: bbanerje@sjuvax.UUCP (B. Banerjee)

	What I would like to do is something like this -

	datum **myarray;	/* datum is typedef'ed */

	myarray = calloc(num_rows * num_cols, sizeof(datum));

	The trouble is, what should I cast calloc to in order that I may
	address myarray using indices?  I thought of doing something like
	this:

	myarray = (datum **) calloc ( ....)

	However, the datum that "myarray[i][j]" would address is not
	clear to me.  The situation is even murkier when using higher
	dimensioned arrays.

(Tap tap tap)  Quiet in the back row!  There, that's better.  Can
everyone hear me?  What?  Is that better?  Ok, good.

Today we're going to talk about multi-dimensional arrays.  We all
understand single dimensional arrays, right?  One simply has a large
area of storage that is accessed by adding an offset (the subscript) to
the start of the memory area and poking around in there.  No problem.

Well, in many compiled languages, like C for example [of course!],
multi-dimensional arrays are really the same thing, only fancier.
Think of it this way:  suppose the array is declared as

	int a[20][50];

Then we can just multiply the two constants together and make a big
block of integers (in this case 1000 of them).  This is our storage
area.  To access one of these, we have two choices:  we can use what's
called ``row major'' order or ``column major order''.

Suppose we're asked to fetch a[6][4].  (Let's assume we have zero-based
arrays, so we go from a[0][0] to a[19][49].)  We can either multiply
the first subscript by the second subscript size and add the second
subscript.  This gives 6 times 50 plus 4 which is 304.  So we add 304
to the storage area for ``a'' and grab an integer from there.  This is
the one called ``row major'' (I think).

The other way we could do it is multiply the 4 times the 50 and add
6; that is, start with the second subscript and multiply it by the
second subscript size, then add the first subscript.  From here we
do the same thing as before.  So add 206 to ``a'' and fetch from there.
If I haven't got them reversed this is ``column major'' order.

Going back to the first, let's take a look at a three dimensional
array.  Given ``int a[4][6][5];'' as the declaration and ``a[2][3][4]''
as the reference, we take 2, multiply by six, add three, multiply the
result by five, and add four, giving 79.

We can generalize the first as follows:

	offset = 0;
	for (i = 0, j = num_dimens - 1; i < num_dimensions; i++, j--) {
		offset = offset + array_index[i];
		offset = offset * array_sub[j];
	}

where num_dimens is the number of dimensions in the array,
array_index[] contains the constants used in the declaration of the
array, array_sub[] contains the values used in the reference, and
offset computes the offset from the base of the storage area.

Comment?  What?  Yes, I know it's not optimized C code.  This is
just an illustration.  Also keep in mind that for a compiler the
subscripts (array_sub[] elements) might well be variables or
expressions, and the offset will have to be computed at RUN time.

Everyone got that?  Too bad, we're moving on anyway.

All those multiplications done at run time can really slow things
down.  Wouldn't it be nice if there were some other way to get to the
various array elements?  Also, wouldn't it be nice not to have to have
60 columns for every row, when only one or two of them really need all
of those?

I see some smiles.  You, you look like you want to say something.

Did everyone hear that?  No?  He's right; he said ``pointers''.  We can
use pointers to weasel out of multiplications.  Not only that but we
can make different ``subscripts'' have different numbers of
``sub-subscripts''.  Of course there's a price.  We have to pay for
this with storage and much head-scratching.

Let's look at a ``two-dimensional'' ``array'' using pointers.
Think of it this way:  instead of having a big rectangular area
of storage like this:

	a[0][0]		a[0][1]		a[0][2]
	a[1][0]		a[1][1]		a[1][2]

we set up one vector like this:

	a[0]
	a[1]

and make each point to another vector.  Now we have something like
this:

	a[0]===>[0] [1] [2]
	a[1]===>[0] [1] [2]

That is, one two-element vector of pointers, and two three-element
storage areas.  We're using more space now, because we have, in
addition to the six storage areas for values, two more for pointers.

But, we can now access ``a'' without multiplies.  If we need to
generate code for ``a[1][0]'', we just add 1 to the vector of pointers
and fetch that, then add 0 to that and fetch from there.  In other
words, follow the pointer for a[1] and take [0] from there.

Also, we can set up the vectors like this instead:

	a[0]===>[0] [1] [2]
	a[1]===>[0] [1] [2] [3] [4] [5] [6]

Since we aren't committed to ``rectangular'' storage areas, and in fact
can put the data in a[0][] ``far away'' from the data in a[1][], we
aren't limited to fixed sizes.  Of course, it gets confusing if a[0]
can access only [0] through [2] but a[1] can access [0] through [6].

We can make ``higher-dimensional'' ``arrays'' the same way.  If ``a''
is supposed to be a five by ten by four array, we can set up five
vectors pointing to ten vectors pointing to storage, like this:

	a[0]===>[0] [1] [2] ... [9]
	a[1]\	 v   v   v       v
	a[2]\\	[0] [0] [0] ... [0]
	a[3]\\\	[1] [1] [1] ... [1]
	a[4] |||[2] [2] [2] ... [2]
	  |  |||[3] [3] [3] ... [3]
	  |  |||
	  |  ||\[0] [1] [2] ... [9]
	  |  ||	 v   v   v       v
	  |  ||	[0] [0] [0] ... [0]
	  |  ||	[1] [1] [1] ... [1]
	  |  ||	[2] [2] [2] ... [2]
	  |  ||	[3] [3] [3] ... [3]
	  |  ||
	  |  |`=>.   .   .  ...  .
	  |  `==>.   .   .  ...  .
	  |
	  `====>[0] [1] [2] ... [9]
		 v   v   v       v
		[0] [0] [0] ... [0]
		[1] [1] [1] ... [1]
		[2] [2] [2] ... [2]
		[3] [3] [3] ... [3]

(Ok, so I'm no artist.)

This time, to access a[2][7][3], we add 2 to the a[] area, follow
that pointer, add 7 to that area, follow that pointer, and add 3
to that area.

Ok, so how to we go about building such a thing, in C?  Suppose
that you want to read two numbers and set up an array of that size.
The array is to hold things of type ``datum'' (a typedef, for what
we don't care).  Here's such a routine:

	datum **a;
	int a_rows, a_cols;

	build_a () {
		int i;

		if (scanf ("%d %d", &a_rows, &a_cols) < 2)
			return (-1);
		a = (datum **) calloc (a_rows, sizeof (datum *));
		if (a == 0)
			return (-1);
		for (i = 0; i < a_rows; i++) {
			a[i] = (datum *) alloc (a_cols, sizeof (datum));
			if (a[i] == 0)
				return (-1);
		}
		return (0);
	}

(Note the tests to make sure that ``calloc'' worked.)

Rewriting this for three, four, or more dimensions is left as an
exercise to the student.  (In other words, I sure don't want to
work that hard.)

Ok, that's all for today.  Quiz Thursday.  Good night.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

jack@vu44.UUCP (Jack Jansen) (08/28/84)

Chris Torek:
> Think of it this way:  suppose the array is declared as
> 
> 	int a[20][50];
> 
> Suppose we're asked to fetch a[6][4]. ....
> 
> The other way we could do it is multiply the 4 times the 50 and add
> 6; that is, start with the second subscript and multiply it by the
> second subscript size, then add the first subscript.  From here we
> do the same thing as before.  So add 206 to ``a'' and fetch from there.
> If I haven't got them reversed this is ``column major'' order.
Sorry Chris, but I think this is wrong. What you should do is
multiply the second subscript by the *first* subscript size, 
and add the first subscript, or, in this example use 4*20+6, or 86.

	Jack Jansen, {philabs|decvax}!mcvax!vu44!jack