[net.lang.c] help converting multi-dim arrays

physics@utcs.UUCP (David Harrison) (04/05/85)

From utfyzx!harrison Thu Apr  4 19:53:50 1985
Received: by utcs.UUCP (4.24/4.7) id AA12979; Thu, 4 Apr 85 19:53:46 est
Date: Thu, 4 Apr 85 19:53:46 est
From: utfyzx!harrison
Apparently-To: physics
[]
The problem: a large piece of code with a number of arrays, some 
2 dimensional and a few 3 dimensional, accessed with lines like:
    for(row=0; row<numrows; row++)
        for(col=0; col<numcols; col++)
            ...data[row][col] ...
The declaration of the matrices, such as
    static double data[MAXROWS][MAXCOLS]; /* just like FORTRAN */
is larger than any conceivable set of data it will be asked to 
deal with, and wastes scads of memory.  So, we want to convert 
the code to use malloc(3C) to dynamically allocate memory and 
save all that wasted space. But, we still want to access the 
matrices as:
           ...data[row][col] ...
because: 1) we don't want to change all the code, and 
2)  the application makes one think in matrices and so this way 
of writing the code will be much easier to maintain.  Thus, we 
are trying to avoid:
    static double *data;
with access via stuff like:
            ... *(data + row*numcols + col) ...
Any ideas?
Dave Harrison, Dept. of Physics, Univ. of Toronto: ..utzoo!utcs!physics

ksbszabo@wateng.UUCP (Kevin Szabo) (04/06/85)

In article <569@utcs.UUCP> you write:
>But, we still want to access the 
>matrices as:
>           ...data[row][col] ...
>Any ideas?
>Dave Harrison, Dept. of Physics, Univ. of Toronto: ..utzoo!utcs!physics

If you only have one matrix that you want to access, how
about:

static double *data;
#define	DATA(row,col)	(*(data + row*numcols + col))

Or if you only have a few, how about a couple of macro's?
You can even adopt the matrix format where the matrix is an array of
pointers.  It's all hidden in the macro.

The nice thing about this is that you could easily implement a sparse
matrix by redefining DATA to be a function or inline code.

#define	DATA(row,col) (*Find_address_of_sparse_array_element(row,col))

			Kevin
-- 
Kevin Szabo  watmath!wateng!ksbszabo (U of Waterloo VLSI Group, Waterloo Ont.)

friesen@psivax.UUCP (Stanley Friesen) (04/09/85)

In article <569@utcs.UUCP> physics@utcs.UUCP writes:
>From utfyzx!harrison Thu Apr  4 19:53:50 1985
>Received: by utcs.UUCP (4.24/4.7) id AA12979; Thu, 4 Apr 85 19:53:46 est
>Date: Thu, 4 Apr 85 19:53:46 est
>From: utfyzx!harrison
>Apparently-To: physics
>[]
>The problem: a large piece of code with a number of arrays, some 
>2 dimensional and a few 3 dimensional, accessed with lines like:
>
>    for(row=0; row<numrows; row++)
>        for(col=0; col<numcols; col++)
>            ...data[row][col] ...
>
>The declaration of the matrices, such as
>
>    static double data[MAXROWS][MAXCOLS]; /* just like FORTRAN */
>
>is larger than any conceivable set of data it will be asked to 
>deal with, and wastes scads of memory.  So, we want to convert 
>the code to use malloc(3C) to dynamically allocate memory and 
>save all that wasted space. But, we still want to access the 
>matrices as:
>           ...data[row][col] ...
>Any ideas?
>Dave Harrison, Dept. of Physics, Univ. of Toronto: ..utzoo!utcs!physics

	Fairly simple. given that you can can calculate the number
of rows and columns desired the following should work:

	char *calloc();
	double **vec;
	int rows, cols, i;

	rows = some stuff;
	cols = other stuff;

	/* Allocate a vector of row pointers of size rows	*/
	vec = (double **)calloc(rows, sizeof(double *));

	/* Allocate a vector of doubles for each row	*/
	for(i = 0; i < rows; i++)
	{
		vec[i] = (double *)calloc(cols, sizeof(double));
	}

And make sure all your routines that use it know how many rows
and columns actually exist and never access past the end.
With this proviso "vec" is usable *exactly* like "data" above.

If you do not have calloc() it is equivalent to malloc(num*size)
-- 

				Sarima (Stanley Friesen)

{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or {ttdica|quad1|bellcore|scgvaxd}!psivax!friesen

therneau@ur-msbvax.UUCP (04/09/85)

  A good solution is to use an array of vectors, i.e.,


	static double **data;
		.
		.
		.
		.

	data = calloc(nrows, sizeof(*data));  /*allocate a vector of pointers*/
	for (i=0; i<ncols; i++)
		data[i] = calloc(ncols, sizeof(**data));


The loop allocates storage for each row of the array, and initializes the
the vector of pointers to that storage.  Now the line

		whatever = data[i][j]

in the C program works just fine.  The operation done is two fetches from
memory rather than the (i*ncols + j) of an ordinary array reference; there
is some evidence that the (old) multiplication is slower than the (new)
double memory reference.  As an added bonus, 'data' can be passed to a
subroutine without dimensioning hassles.

    The allocation loop above could certainly be speeded up by doing
only two calloc calls--

	data = calloc(nrows, sizeof(*data));  /*allocate a vector of pointers*/
	dbl_ptr = calloc(nrows*ncols, sizeof(**data));
	for (i=0; i<ncols; i++)  {
		data[i] = dbl_ptr;
		dbl_ptr += ncols;
		}


			Terry Therneau
			Statistics Dept.
			University of Rochester

wildbill@ucbvax.ARPA (William J. Laubenheimer) (04/10/85)

>From utfyzx!harrison Thu Apr  4 19:53:50 1985
>Received: by utcs.UUCP (4.24/4.7) id AA12979; Thu, 4 Apr 85 19:53:46 est
>Date: Thu, 4 Apr 85 19:53:46 est
>From: utfyzx!harrison
>Apparently-To: physics
>[]
>The problem: a large piece of code with a number of arrays, some 
>2 dimensional and a few 3 dimensional, accessed with lines like:
>    for(row=0; row<numrows; row++)
>        for(col=0; col<numcols; col++)
>            ...data[row][col] ...
>The declaration of the matrices, such as
>    static double data[MAXROWS][MAXCOLS]; /* just like FORTRAN */
>is larger than any conceivable set of data it will be asked to 
>deal with, and wastes scads of memory.  So, we want to convert 
>the code to use malloc(3C) to dynamically allocate memory and 
>save all that wasted space. But, we still want to access the 
>matrices as:
>           ...data[row][col] ...
>because: 1) we don't want to change all the code, and 
>2)  the application makes one think in matrices and so this way 
>of writing the code will be much easier to maintain.  Thus, we 
>are trying to avoid:
>    static double *data;
>with access via stuff like:
>            ... *(data + row*numcols + col) ...
>Any ideas?
>Dave Harrison, Dept. of Physics, Univ. of Toronto: ..utzoo!utcs!physics

Yes. Essentially, you replace the "block of storage" concept with a
"array of arrays of smaller dimension" concept. Thus, a 3-d array
is an array of 2-d arrays, and a 2-d array is an array of 1-d arrays.
The interesting thing about C is that it lets you do this without having
to place the subarrays contiguously in memory or even knowing how big they
are. A good example of this is provided by the declaration of argv in

main (argc, argv) int a; char **argv; (or char *argv[])

You can access individual characters of the arguments by argv[argn][cpos],
and the individual elements are of different lengths. All this is brought
to you by the partial equivalence of pointers and arrays.

The general form for an n-dimensional array represented as a 1-dimensional
array of (n-1)-dimensional arrays is:

<object> ** ... * arrayName;
         (n *'s)

where <object> is the type being stored in the array.

Now for the details of implementation. A simple implementation which
isn't too bad if you have a small number of types and don't go too
high in the dimension hierarchy consists of the functions arr_2dOfObject,
... arr_ndOfObject, where "Object" is replaced by the particular type
you are generating arrays for, and n tops out at the highest dimension
you need for that type. The procedures are:

char	*calloc();

/*
	Getting things started - the 2-d case.
	Replace Object by the element type.
*/

Object
**arr_2dOfObject(dim1, dim2)
	int	dim1, dim2;
{
	Object	**array;
	int	index;

	array = (Object **)calloc(dim1, sizeof (Object *));
	for (index = 0; index < dim1; index++)
		array[index] = (Object *)calloc(dim2, sizeof (Object));
	return array;
}

/*
	Continuing along - the general case.
	Replace Object by the element type.
	Replace _n_ by the number of dimensions the function is being
	defined for. Similarly for _n-1_, etc.
*/

Object
** ... *arr__n_dOfObject(dim1, ..., dim_n_)
/* _n_ *'s,		 _n_ dim's */
	int	dim1, ..., dim_n_;
{
	Object	** ... *array;
/*		_n_ *'s */
	int	index;

	array = (Object ** ... *)calloc(dim1, sizeof (Object ** ... *));
/*			_n_ *'s				     _n-1_ *'s */
	for (index = 0; index < dim1; index++)
		array[index] = arr__n-1_dOfObject(dim2, ..., dim_n_);
	return array;
}

***** END OF SIMPLE-MINDED IMPLEMENTATION *****

The following changes need to be made to convert a program using Object[][]-
style arrays to **Object-style arrays:

For each type Object which will use the pointer format, the array
initializing functions arr_2dOfObject, ..., arr__n_dOfObject must be
defined up to the maximum n for which the pointer format will be used
on this object;

The declaration of each array using the pointer format is changed, with one
prefix * being added for each postfix [] which is removed;

*VERY IMPORTANT*: Each array using the pointer format *MUST BE INITIALIZED*
before it can be used. To initialize an array of size d1 x d2 x ... x dn,
use array__n_dOfObject(d1, ..., dn). The return value is an array of the
appropriate dimensions.


OK. That's how the simple model works. Now, here's a tricky one that
purports to do all of the above with just one procedure, but assumes that
you have a reasonable pointer mode (all pointers are the same size, pointer
conversion is the identity, calloc() handles potential alignment
problems (if they exist) satisfactorily) and you don't mind
weird-looking casts:

/*
	Initialize an n-dimensional array of size-byte objects.
	The array dim[] is an n-element array where dim[i] is the
	number of elements along dimension n.
*/

char	*calloc();

char
*arr_nD(n, dim, size)
	int	n;
	int	*dim;
	int	size;
{
	char	*array;

	if (n == 1)
		array = calloc(dim[0], size);
	else {
		array = calloc(dim[0], sizeof (char *));
		for(index = 0; index < dim[0]; index++)
			array[index] = arr_nD(n - 1, dim + 1, size);
	}
}

***** END OF BETTER VERSION FOR WELL-BEHAVED ARCHITECTURES *****

A sample initialization which shows how the initialization changes here
(everything else remains the same as the previous version) is for
a 3-dimensional length x width x depth array of doubles:

int	d[3];
double	***arr;

d[0] = length;
d[1] = width;
d[2] = depth;
arr = (double ***)arr_nD(3, d, sizeof(double));

--------------------------------------------------------------------

No guarantees on any of this stuff, unfortunately. I have used some
variants of the simple case, but have never had large numbers of these
beasts running around in the same program. If it doesn't work, show
it to your best approximation to a local C wizard and see if he can
get it to fly for you.

                                        Bill Laubenheimer
----------------------------------------UC-Berkeley Computer Science
     ...Killjoy went that-a-way--->     ucbvax!wildbill

gwyn@Brl-Vld.ARPA (VLD/VMB) (04/11/85)

Re: large matrix use in C

The closest approach meeting your stated requirements is to vector
matrix rows:

	static	*datap[MAXROWS];	/* init NULL */
	...
	/* access like */ datap[row][col]

With this approach, when setting up the array you have to test
for a null datap[row] and when necessary malloc() a whole row:

	if ( datap[row] == NULL )
		if ( (datap[row] = (double *)malloc( MAXCOLS *
				sizeof(double) )) == NULL )
			/* punt */;
		else
			for ( col = 0; col < MAXCOLS; ++col )
				datap[row][col] = 0.0;

Clearly the only space gain here is if a sizeable fraction of
the rows are never used.  However, there is an enormous speed
gain in some cases by using vectoring instead of having code
generated for index arithmetic (a multiply and add).

To do better with more general sparse arrays, you have to give
up the requirement that array element references look like

	data[row][col]

and instead use macros or functions:

	init_data();	/* sets up structures and fills with 0.0 */
	put_data( row, col, value );	/* stores value in "data" */
	value = get_data( row, col );	/* retrieves value */

Most data structures textbooks discuss algorithms for sparse
arrays.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (04/11/85)

> 	static double **data;
> 		whatever = data[i][j]

RONG!

smith@nrl-aic.ARPA (Russ Smith) (04/11/85)

I ran into a similar problem a number of years ago when I was
writing an "array relaxation" software package which had to handle
varying sized three dimensional arrays (and had to fit on a PDP-11/45).
I got around the problem by writing a command file which created an
"include" file on-the-fly and then recompiled the C program(s) (hack, hack!).
It worked, though...

The technique was similar (but not identical) to:

	setup 5 7 3 256 256 peleg	# Create and compile a package
					# of C programs to do "peleg"
					# relaxation using a 5x7
					# neighborhood on 3-labeled
					# probabilistic images of maximum
					# size 256x256...

which would create an include file with lines of the form:

	#define MAXNCOLS	256
	#define MAXNROWS	256
	#define NLABELS		3
	#define NBRHT		7
	#define NBRWDTH		5
	
which could then be included into the source file with, e.g.:

	#include "peleg.h"

and used as in:

	double neighborhood[NBRHT][NBRWDTH];
	double onerow[MAXNCOLS];

and so on. Kludgey, but it did allow many folks at the good old Computer
Vision Laboratory to test out various forms of algorithms on varying
sized three dimensional arrays most efficiently. (Note that a different
package of programs was created (and named appropriately) for each
importantly different set of parameters.)

Unless there are better ways now (for example, using a makefile with the
same functionality (which I, alas, either didn't know exist or didn't
have available at the time and mini-reinvented)) this method can be
used.

Russ <Smith@nrl-aic>
Navy Center for Applied Research in Artificial Intelligence

libes@nbs-amrf.UUCP (Don Libes) (04/12/85)

Please reference this month's issue of Micro/System's Journal.  The C Forum
discusses what you ask about and how to do it.

Don Libes {seismo,umcp-cs}!nbs-amrf!libes

donn@utah-cs.UUCP (Donn Seeley) (04/14/85)

It's not strictly necessary to have an array of pointers to provide a
multi-dimensional array that varies in size, provided the variation is
limited to one dimension.  The following example shows one way to
dynamically allocate a multi-dimensional array:

------------------------------------------------------------------------
# define	MAXROWS		10000
# define	MAXCOLS		1000

f( nrows )
	int		nrows;
{
	register double	(*dp)[MAXROWS][MAXCOLS];
	char		*malloc();

	dp		= (double (*)[MAXROWS][MAXCOLS])
			  malloc( nrows * MAXCOLS * sizeof (double) );
	if ( dp == NULL )
		bomb( "Out of memory" );
	(*dp)[1][0]	= 1.0;
	...
	free( (char *) dp );
	return;
}
------------------------------------------------------------------------

I believe this is portable code, going on the basis of section 14.3 of
the C reference manual, but I'd be happy to be corrected if wrong.
(Actually, the main reason I like this example is that it provides a
rare instance of a use for one of the more bizarre C type
declarations...)

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn