[comp.lang.misc] Dynamic array dimensioning

gsg0384@uxa.cso.uiuc.edu (01/05/89)

Hi,

Can someone in the net enlighten me about the following?

1.  What languages support the run-time array sizing in a particular module
as in FORTRAN 8x ?  Please list.

2  It seems (I'm not sure) impossible to dimension a local array 
dynamically even in FORTRAN 8x.  What languages support such dynamic 
array sizing in any place as can be done in Turing?  Please list.

3  Please list languages that cannot do either.

   FORTRAN 77, and ___________

4.  Is such dynamic array feature too costly to implement or what?

Thanks.
                         Hugh   gsg0384@uxa.cso.uiuc.edu

barmar@think.COM (Barry Margolin) (01/06/89)

PL/I allows automatic arrays to be sized dynamically.  The size of an
array can be any expression using variables from a containing lexical
context; for example, you can do:

	declare array_size fixed binary;

	array_size = compute_size();
	begin;
	     declare array (array_size) fixed binary (8);
	     ...
	end;

But you couldn't do this if array and array_size were declared in the
same block.  The one exception to this is arrays in the top-level
block of a procedure, which may be dimensioned using its parameter
variables, e.g.

	my_proc: procedure (array_size);
	     declare array_size fixed binary parameter;
             declare array (array_size) fixed binary (8);
             ...
	end my_proc;

In general, the rule is that the array dimension must be computable
before any local variables are initialized.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

pardo@june.cs.washington.edu (David Keppel) (01/06/89)

In article <117400002@uxa.cso.uiuc.edu> gsg0384@uxa.cso.uiuc.edu writes:
>[What languages have run-time local array sizing?]
>[Impossible to do in FORTRAN?]
>[Is dynamic sizing too costly to implement?]

Most of the ``very dynamic'' languages, e.g., Smalltalk have automatic
dimension resizing.  Others, such as Ada, Algol, Simula, and PL/1 have
array size bound and procedure entry time.  GNU cc also allows this as
a nonstandard extension.

[To the best of my knowledge] all standard FORTRANs must be able to
run on a machine that has no dynamic memory allocation.  Runtime size
binding definitely requires memory allocation.

Dynamic sizing is more expensive than static dimensioning for at least
the following reasons:

* The stack frame doesn't have a fixed size.  Thus you need two
  registers dedicated to the call frame.  Even if you don't care about
  using an extra register, you still have to load/reset them (time and
  code space).
* Fewer array indexing optimizations are possible.

Whether this is ``too expensive'' depends entirely on what you want.

Hope this helps.

	;-D on  ( A dynamic kind of guy )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

johnl@ima.ima.isc.com (John R. Levine) (01/06/89)

In article <117400002@uxa.cso.uiuc.edu> gsg0384@uxa.cso.uiuc.edu writes:
>1.  What languages support the run-time array sizing in a particular module
>as in FORTRAN 8x ?  ...

Algol 60 had dynamically sized arrays in 1960, both for arguments and
local arrays.  The techniques to implement them are well known;
consult the Dragon Book or any other compiler book for details.  There
is some loss in performance compared to fixed size arrays since stack
offsets cannot be precomputed at compile time, but you have to trade
that off against the increased flexibility.  Many of Algol's descendants
(other than Pascal) also have such dynamic sizing, the most notable being PL/I.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
You're never too old to have a happy childhood.

bcw@rti.UUCP (Bruce Wright) (01/06/89)

In article <35163@think.UUCP>, barmar@think.COM (Barry Margolin) writes:
> 
> PL/I allows automatic arrays to be sized dynamically. [...]
> But you couldn't do this if array and array_size were declared in the
> same block.

This is not _quite_ correct.  You can't dimension an AUTOMATIC (that is,
a stack-based) array by a size which is declared in the same block;  however
you can dimension a dynamically (non-stack) allocated array be a size
declared in the same block.  For example, in full PL/I:

	alloc_test:	procedure ;

	declare array_size fixed binary ;
	declare array (array_size) fixed binary controlled ;

		array_size = 10 ;
		allocate array ;

	/* array is now a generation of storage with an implied pointer	*/
	/* to it.  Further allocate statements will create a simple LIFO*/
	/* stack of storage objects;  only the top object on the stack	*/
	/* is visible at a time.  Earlier objects can be viewed again	*/
	/* by freeing the top object on the stack.			*/

	end alloc_test ;

This will not work in any of the PL/I Subset G (general purpose subset)
standards.  CONTROLLED is a deprecated feature (planned for removal)
because it is possible to do the same sort of thing with the more general
BASED storage attribute (more similar to other pointer or reference
objects in other languages).  This use of BASED is now possible in the
current Subset-G Standard (ANSI X3.74-1987):

	alloc_test:	procedure ;

	declare p pointer ;
	declare array_size fixed binary ;
	declare 1 dyn_array based (p),
		2 dyn_size fixed binary,
		2 array (array_size refer dyn_size) fixed binary ;

		array_size = 10 ;
		allocate dyn_array ;

		/* The member "array" in "dyn_array" now contains 10	*/
		/* elements;  the pointer p now points to the latest	*/
		/* allocation of dyn_array.  dyn_size is set to the	*/
		/* current size of the dynamic array.  This is a more	*/
		/* normal reference (or pointer) situation where earlier*/
		/* copies of the array can be viewed by using a pointer	*/
		/* without freeing the top object.			*/

	end alloc_test ;

Some other syntax rules have been used by various compilers created
before the standards were fully established (for example, the IBM
PL/I Optimizing compiler and the IBM F-level compiler, both popular on
the large IBM mainframes) to accomplish dynamic array sizes, but the
effect is similar.

> In general, the rule is that the array dimension must be computable
> before any local variables are initialized.

This _is_ correct - however there are more ways to elaborate the local
variables than just to enter blocks (PROCEDURE and BEGIN statements).

					Bruce C. Wright

ken@aiva.ed.ac.uk (Ken Johnson) (01/06/89)

In article <117400002@uxa.cso.uiuc.edu> gsg0384@uxa.cso.uiuc.edu writes:
>
>Hi,
>
>Can someone in the net enlighten me about the following?
>
>1.  What languages support the run-time array sizing in a particular module
>as in FORTRAN 8x ?  Please list.

Do you include the well established trick

foo()
{
	int	*x, *alloc();
	int	n, m;

	...requires an array n by m....

	x = alloc(n*m*sizeof(int));
	bar(x,n,...);
}

bar(a,n,...)
int *a;
int n;
{
	int	i,j;

	...wishes to set element a[i,j]...

	*(a+(i*n)+j) = expr;
}

You will have to do some type conversion in this last expression, I'm
afraid (I can't do it in my head).  It's easier in the now more or less
obsolete language BCPL, which provices a special tool APTOVEC which
makes this way of doing it easier. 

To give the appearance of dynamic dimensioning you could define a macro
to conceal the awkward expression.  This is all that a
dynamic-dimensioning compiler can do anyway. 

-- 
==============================================================================
From:       Ken Johnson
Address:    AI Applications Institute, The University, EDINBURGH, Scotland
Phone:      031-225 4464 ext 212
Email:	    k.johnson@ed.ac.uk
Quotation:  I've had a rotten day at work so far. My best friend didn't come.

hmm@laura.UUCP (Hans-Martin Mosner) (01/06/89)

In article <117400002@uxa.cso.uiuc.edu> gsg0384@uxa.cso.uiuc.edu writes:
>
>Hi,
>
>Can someone in the net enlighten me about the following?
>
>1.  What languages support the run-time array sizing in a particular module
>as in FORTRAN 8x ?  Please list.
These languages are quite unrelated to Fortran, but you asked for it:
Smalltalk, most (all ?) Lisps, some Basics & Pascals, C (if you accept a
pointer for an array and implement growing by copying)
>
>2  It seems (I'm not sure) impossible to dimension a local array 
>dynamically even in FORTRAN 8x.  What languages support such dynamic 
>array sizing in any place as can be done in Turing?  Please list.
All those listed above
>
>3  Please list languages that cannot do either.
>
>   FORTRAN 77, and ___________
I don't use languages that cannot do it :-)
>
>4.  Is such dynamic array feature too costly to implement or what?
This depends on what costly means to you.  In general, dynamic resizing
involves using a pointer to the array instead of direct addresses
(this part is cheap on all sensible machines) and copying the values on
each resize.  The latter can be expensive, but it costs nothings as long
as you don't use it, so it is exactly as expensive as not having it at all :-)
Of course you need some dynamic memory allocation mechanism to get the space
to copy into, but this is not very expensive if done right.
If you want bounds checking, the accesses can be a little bit more expensive
with variable sizes because some machines have specialized & fast boundcheck
instructions with immediate (constant) operands, and the compiler will use
them only if the array has fixed size.

Summary: the main cost incurred is that of an additional indirection
per access, which is in most cases negligible.  Don't ask me why many
languages don't support it.

	Cheers,
	Hans-Martin
-- 
Hans-Martin Mosner		| Don't tell Borland about Smalltalk - |
hmm@unido.{uucp,bitnet}		| they might invent Turbo Smalltalk !  |
------------------------------------------------------------------------
Disclaimer: Turbo Smalltalk may already be a trademark of Borland...

smryan@garth.UUCP (s m ryan) (01/07/89)

ALGOL.

60 and 68.
-- 
The tears of Hreithmar quickly dried.                               -- s m ryan
He claimed his proce: to fill the hide
with scarlet gold and cover well                               -- Andwari's Gem
and thus avoid a vengeance fell.                                         -- 1/8

pardo@june.cs.washington.edu (David Keppel) (01/08/89)

>In article <117400002@uxa.cso.uiuc.edu> gsg0384@uxa.cso.uiuc.edu writes:
>>[What languages have run-time array sizing in a particular module?]

In article <827@laura.UUCP> hmm@laura.UUCP (Hans-Martin Mosner) writes:
>These languages are quite unrelated to Fortran, but you asked for it:
>Smalltalk, most (all ?) Lisps, some Basics & Pascals, C (if you accept a
>pointer for an array and implement growing by copying)
>[...]
>[Summary: only added cost is an extra indirection]

I'd like to belabor a particular point, namely the difference between
array sizing at procedure invocation, and dynamic array sizing.

In dynamic sizing, a statement "a[i]:=val;" is meaningful for all
values of i (for which there is sufficient memory); the array "a" is
grown if necessary.  In arrays that are sized at procedure invocation,
the array has some fixed upper/lower bound for the duration of the
procedure call.

The first is certainly more flexible.  It is also more expensive, and
for several reasons.  In addition, *both* are (in general) less
efficient than statically-sized (even stack-allocated) arrays since it
is not generally possible to do as much constant folding or remove as
much computation via loop induction.  It is also usually necessary to
keep both a stack pointer and a frame pointer.


Here is a case to ponder:

    functin foo(x:integer);
	s[1..x], t[1..x] : character;
	i integer;

	for i := 1 to x do
	    ... use s[i] & t[i] ...
    }

In function-invocation sizing, the optimizer can determine (at compile
time) that no reference into s or t will ever be outside the range
0..x-1, so no bounds checking is needed.  Also, if the function call
frame looks like:

	+-----------+
	| call stuff|  <- fp
	+-----------+
	| i storage |
	+-----------+
	| s storage |
	|     :     |
	+-----------+
	| t storage |
	|     :     |
	+-----------+
			<- sp

Then access to "i" is always by a constant offset off of the stack
pointer

	movl	fp+4,r0

Access in to s is similarly cheap -- as cheap as a statically-sized
array:

	movl	(fp+4)[r0],r1

Since the size of s and t are bound at runtime, the location of t[i]
is a runtime value relative to both of fp and sp and must be computed
dynamically.  The cheapest (general) way to do this is keep a pointer
to the storage for t.  This pointer lives at a constant offset from
the frame pointer.

	+-----------+
	| call stuff|  <- fp
	+-----------+
	| i storage |
	+-----------+
	| tp storage|
	+-----------+
	| s storage |
	|     :     |

				; assume r0 holds "i"
	mov	(fp+8),r1	; r1 <- pointer to t storage
	mov	r1[r0],r1	; r1 <- t[i]

Thus reference in to an arbitrary array location takes an extra
reference.  [Although the "for" loop may be coded to keep track of
both "i" and "x-i" and thus keep track of t[i] as an offset from "sp",
this only works for two arrays and so does not generalize for the
worst-case cost for access.]

If we are stepping through the array, as in a "for" loop, we can
optimize accesses in memory references by, say, loading the pointer
"tp" into a register.  In certain cases, we need not even allocate "tp"
on the stack, we keep it in a register and/or recompute it on demand.

In the above description, the procedure must have both a frame pointer
and a stack pointer.  This adds an extra few instructions to
call/return, but only for those functions with variable-sized arrays
(probably acceptable).  For the extra price of copying the call frame
(and arguments), we can get rid of the frame pointer.


If the arrays are *dynamically* sized, we lose several things:

* It is impossible to perform loop induction to remove overhead on
  dynamically-sized arrays.  A big problem here is that we must check
  for overflow on every array access.  This is typically a lot of
  extra overhead, a significant portion of the runtime of a given
  function.
* The storage can be allocated on the stack only when there is only
  one dynamically-sized array in a given procedure.  Thus procudure
  entry/exit will generally be expensive (calls a memory allocator)
  even if the array is never resized.
* It's harder to get rid of the frame pointer.
* It is not as often possible to allocate a dynamically-sized array
  on the stack.

As a language designer, one choice is to make all three mechanisms
available and choose the most efficient implementation for each.
Those who rely on more general models will pay the price (code size,
runtime), while those who do not need it will not pay for it.

I hope that this is helpful.

  ;-D on  ( Now if I can just get rid of the array, it'll be *fast* )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

anw@nott-cs.UUCP (01/09/89)

In article <6847@june.cs.washington.edu> pardo@cs.washington.edu (David
Keppel) writes:
> [other languages], such as Ada, Algol, Simula, and PL/1 have
>array size bound and procedure entry time.

	Minor nit:  in Algol, new arrays can arise (more-or-less) anywhere,
and their sizes cannot, in general, be bound until the relevant code is
actually executed.

>Dynamic sizing is more expensive than static dimensioning for at least
>the following reasons:
>
>* The stack frame doesn't have a fixed size.  Thus you need two
>  registers dedicated to the call frame. [...]

	Yes.  Conventionally, these will be the "local base" and the "stack
pointer".  Obviously, with fixed-size frames, the SP can double as the LB;
however, this means that temporary variables must share the same base as
ordinary variables, perhaps moving some of them outside the convenient range
of "offsets", that (in some environments) interrupts may be harder to handle,
variadic procedures may be harder to handle, etc.  Coping with this is
certainly possible, but imposes extra rules on the storage layout that may
or may not increase code size or reduce speed.

>* Fewer array indexing optimizations are possible.

	True:  for example, initial array accesses must be channelled
through a dope vector instead of directly to a known location.  On the
other hand, the programmer is more likely to write code that is amenable
to optimisation:  eg, copy the whole of an array to another array, instead
of just the top left-hand corner.

Dynamic sizing is cheaper than static dimensioning for at least
the following reasons:

* Arrays are just the right size, instead of as big as they might possibly
  need to be, some day, some time.

* User convenience.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

skh@hpclskh.HP.COM (01/13/89)

To this point, no one has mentioned APL (and offshoots), which can dynamically
expand any array dimension at any time.
The procedure invocation dynamic arrays are much more common.  BASIC, PL/1,
and versions of ALGOL have had them for years.  They are more expensive,
especially if range checking code is produced (as even constant references
must be checked).