[comp.sys.mac.programmer] THINK C Array Design Flaw

pepke@loligo (Eric Pepke) (05/04/89)

First of all, THINK C is great.  It is a wonderful development environment,
the compiler is fast, the inline assembler works well, and the code is
adequate for most purposes.  However, there is a major design flaw regarding
the use of arrays larger than 32K in version 3.0.

In one of my programs I need to use an array of 32*32*32 short integers,
which is 64K long.  The segment loader problem doesn't bother me.  I am
perfectly happy allocating the memory, passing a pointer into a function,
and declaring it as I want it in the function.  So, I tried something like

AddHist(theHist, r, g, b)
short theHist[32][32][32];

which caused a compiler error, in spite of the fact that the function does
not have to define the array at all.  So, I thought that somebody had just
put the test in the wrong place in the compiler, and I tried

AddHist(theHist, r, g, b)
short theHist[][32][32];

which is perfectly legal C.  It compiled without a complaint.  Fortunately,
I am paranoid, so I checked to see what code it generated.  The code in this
case was correct, but it is generally incorrect when a dimension of the
array is other than a power of two. 

Consider these two functions:

short Access7(array, a, b)
short array[7][7];
short a, b;
{
    register short result;
    result = array[a][b];
}

Access4(array, a, b)
short array[4][4];
short a, b;
{
    register short result;
    result = array[a][b];
}

Here is the code produced:

  ACCESS7
     +0000  00196856   LINK       A6,#$0000          | 4E56 0000 
     +0004  0019685A   MOVE.L     D7,-(A7)           | 2F07 
     +0006  0019685C   MOVE.W     $000C(A6),D0       | 302E 000C 
     +000A  00196860   MULS.W     #$000E,D0          | C1FC 000E 
     +000E  00196864   ADD.L      $0008(A6),D0       | D0AE 0008 
     +0012  00196868   MOVE.W     $000E(A6),D1       | 322E 000E 
     +0016  0019686C   EXT.L      D1                 | 48C1 
     +0018  0019686E   ADD.L      D1,D1              | D281 
     +001A  00196870   ADD.L      D1,D0              | D081 
     +001C  00196872   MOVEA.L    D0,A0              | 2040 
     +001E  00196874   MOVE.W     (A0),D7            | 3E10 
     +0020  00196876   MOVE.L     (A7)+,D7           | 2E1F 
     +0022  00196878   UNLK       A6                 | 4E5E 
     +0024  0019687A   RTS                           | 4E75 

  ACCESS4
     +0000  00196884   LINK       A6,#$0000          | 4E56 0000 
     +0004  00196888   MOVE.L     D7,-(A7)           | 2F07 
     +0006  0019688A   MOVE.W     $000C(A6),D0       | 302E 000C 
     +000A  0019688E   EXT.L      D0                 | 48C0 
     +000C  00196890   ASL.L      #$3,D0             | E780 
     +000E  00196892   ADD.L      $0008(A6),D0       | D0AE 0008 
     +0012  00196896   MOVE.W     $000E(A6),D1       | 322E 000E 
     +0016  0019689A   EXT.L      D1                 | 48C1 
     +0018  0019689C   ADD.L      D1,D1              | D281 
     +001A  0019689E   ADD.L      D1,D0              | D081 
     +001C  001968A0   MOVEA.L    D0,A0              | 2040 
     +001E  001968A2   MOVE.W     (A0),D7            | 3E10 
     +0020  001968A4   MOVE.L     (A7)+,D7           | 2E1F 
     +0022  001968A6   UNLK       A6                 | 4E5E 
     +0024  001968A8   RTS                           | 4E75 

The techniques in both functions generalize to higher order and larger
arrays.  ACCESS4 does an arithmetic shift on a longword, so it can be
expected to work when the array is larger than 32K.  ACCESS7 only does a
single word multiply, so it can be expected to fail on large arrays.  It is
also a *signed* multiply, which makes little sense.

The code from MPW C is much more reasonable.  It does two unsigned word
multiplies, one on the high word and one on the low word of the cumulative
address, and adds the two together.

It would be *trivial* for the THINK people to fix the code generation.  If
they are worried about efficiency, they could install one more check box
into the options dialog.  It would be almost as trivial to fix the parser so
that large arrays could explicitly be declared as function parameters.

As it is, THINK C is useless for my application.  Just because I am
currently using a quantization that is a power of two does not mean that I
always will, and I cannot really trust future releases of THINK C to
generate correct code even for the power of two case.

Eric Pepke                                     ARPA:   pepke@gw.scri.fsu.edu
Supercomputer Computations Research Institute  MFENET: pepke@fsu
Florida State University                       SPAN:   pepke@scri
Tallahassee, FL 32306-4052                     BITNET: pepke@fsu

Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.

lippin@skippy.berkeley.edu (The Apathist) (05/06/89)

Recently pepke@gw.scri.fsu.edu (Eric Pepke) wrote:
[...]
]In one of my programs I need to use an array of 32*32*32 short integers,
]which is 64K long.  The segment loader problem doesn't bother me.  I am
]perfectly happy allocating the memory, passing a pointer into a function,
]and declaring it as I want it in the function.  So, I tried something like
]
]AddHist(theHist, r, g, b)
]short theHist[32][32][32];
]
]which caused a compiler error, in spite of the fact that the function does
]not have to define the array at all.

Looks like they blew that one.  I'm surprised it hasn't come up
before.

]short Access7(array, a, b)
]short array[7][7];
]short a, b;
]{
]    register short result;
]    result = array[a][b];
]}
[more code deleted]
]The techniques in both functions generalize to higher order and larger
]arrays.  ACCESS4 does an arithmetic shift on a longword, so it can be
]expected to work when the array is larger than 32K.  ACCESS7 only does a
]single word multiply, so it can be expected to fail on large arrays.  It is
]also a *signed* multiply, which makes little sense.

The code generated for Access7 looks correct to me.  The key point is
that the MULS.W instruction produces a long result.  Since the code
needs to multiply a, a *signed* word, by 14, MULS.W is the most
straightforward way to get the right result.  Problems with 32K will
only arise if a>=32K (which it can't be) or if the row size>=32K, in
which case I presume (or at least hope) different code is generated.

					--Tom Lippincott
					  lippin@math.berkeley.edu

		"Foo, you are nothing but a charlatan!"
					--Adventure

pepke@loligo.cc.fsu.edu (Eric Pepke) (05/07/89)

In article <24093@agate.BERKELEY.EDU> lippin@math.berkeley.edu writes:
>The code generated for Access7 looks correct to me.  The key point is
>that the MULS.W instruction produces a long result.  Since the code
>needs to multiply a, a *signed* word, by 14, MULS.W is the most
>straightforward way to get the right result.  Problems with 32K will
>only arise if a>=32K (which it can't be) or if the row size>=32K, in
>which case I presume (or at least hope) different code is generated.

The same code is produced if I define access7 as

short Access7(array, a, b)
short array[][7];

This is perfectly legal C, and it makes no assumptions on how big the
array is.  If you have enough memory, a call to Access7(40000, 2) should
be perfectly reasonable, but it will fail. 

Eric Pepke                                     ARPA:   pepke@gw.scri.fsu.edu
Supercomputer Computations Research Institute  MFENET: pepke@fsu
Florida State University                       SPAN:   pepke@scri
Tallahassee, FL 32306-4052                     BITNET: pepke@fsu
 
Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.

pepke@loligo.cc.fsu.edu (Eric Pepke) (05/08/89)

In article <696@loligo.cc.fsu.edu> I write:
>This is perfectly legal C, and it makes no assumptions on how big the
>array is.  If you have enough memory, a call to Access7(40000, 2) should
>be perfectly reasonable, but it will fail. 

This statement of mine is nonsense, of course.Such a call is only reasonable 
in a C where int is defined as bigger than 16 bits.  When you use Lightspeed
C, as you would have to declare a and b as long integers.  When you do that, 
you get what looks like correct code.  

Also, my statement that Lightspeed C would generate bad code in some cases
where a syntax check would fail is also wrong.  The multiplier for the first
subscript is determined by dimensions 2 through N, and cases where the
multiplier would be greater than 32767 would be rejected first by the
test for arrays with size greater than 32767.

What is will do is reject certain array declarations needlessly, as in the
case of hist[32][32][32];

Eric Pepke                                     ARPA:   pepke@gw.scri.fsu.edu
Supercomputer Computations Research Institute  MFENET: pepke@fsu
Florida State University                       SPAN:   pepke@scri
Tallahassee, FL 32306-4052                     BITNET: pepke@fsu
 
Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.

beard@ux3.lbl.gov (Patrick C Beard) (05/11/89)

In article <698@loligo.cc.fsu.edu> pepke@loligo.UUCP (Eric Pepke) writes:
>In article <696@loligo.cc.fsu.edu> I write:
>>This is perfectly legal C, and it makes no assumptions on how big the
>>array is.  If you have enough memory, a call to Access7(40000, 2) should
>>be perfectly reasonable, but it will fail. 
>
>This statement of mine is nonsense, of course.Such a call is only reasonable 
>in a C where int is defined as bigger than 16 bits.  When you use Lightspeed
>C, as you would have to declare a and b as long integers.  When you do that, 
>you get what looks like correct code.  
>
>Also, my statement that Lightspeed C would generate bad code in some cases
>where a syntax check would fail is also wrong.  The multiplier for the first
>subscript is determined by dimensions 2 through N, and cases where the
>multiplier would be greater than 32767 would be rejected first by the
>test for arrays with size greater than 32767.
>
>What is will do is reject certain array declarations needlessly, as in the
>case of hist[32][32][32];
>
>Eric Pepke                                     ARPA:   pepke@gw.scri.fsu.edu
>Supercomputer Computations Research Institute  MFENET: pepke@fsu
>Florida State University                       SPAN:   pepke@scri
>Tallahassee, FL 32306-4052                     BITNET: pepke@fsu
> 
>Disclaimer: My employers seldom even LISTEN to my opinions.
>Meta-disclaimer: Any society that needs disclaimers has too many lawyers.

I'm sorry, but I just couldn't let this discussion go by without making
a comment.  The problem you are discussing, passing static or automatic arrays,
such as foo[40][50] to subroutines with a declaration such as foo[][50] is
doomed to fail not because of a failing in THINK C (formerly LightSpeed C),
but because of an inherent flaw (feature?) in the C programming language.
Namely, the duality between arrays and pointers makes passing arrays of
dimension >= 2 impossible.  The reason is that the compiler assumes that
a declaration of foo[][50] really means *foo[50], an array of pointers.
The array foo[40][50] is really just one long chunk of memory, and an
index into it, such as foo[y][x] is really just foo+40*y+x.  The called
subroutine has no knowledge of the inner dimensions of the array, and so.
has to resort to foo[y][x] meaning *(*(foo+y)+x), or *(foo[y]+x).

The solution to all of this, is to always pass around arrays of pointers,
not true two-dimensional arrays.  Then you can have arrays of dimensions
of as large as you like.

I hope this sheds some light on this discussion.

__________________

   Patrick Beard
  PCBeard@lbl.gov
 BSI, Berkeley, CA
___________________

lippin@ronzoni.berkeley.edu (The Apathist) (05/13/89)

Recently beard@ux3.lbl.gov (Patrick C Beard) wrote:

] I'm sorry, but I just couldn't let this discussion go by without
] making a comment.  The problem you are discussing, passing static or
] automatic arrays, such as foo[40][50] to subroutines with a
] declaration such as foo[][50] is doomed to fail not because of a
] failing in THINK C (formerly LightSpeed C), but because of an inherent
] flaw (feature?) in the C programming language.  Namely, the duality
] between arrays and pointers makes passing arrays of dimension >= 2
] impossible.  The reason is that the compiler assumes that a
] declaration of foo[][50] really means *foo[50], an array of pointers.
] The array foo[40][50] is really just one long chunk of memory, and an
] index into it, such as foo[y][x] is really just foo+40*y+x.  The
] called subroutine has no knowledge of the inner dimensions of the
] array, and so.  has to resort to foo[y][x] meaning *(*(foo+y)+x), or
] *(foo[y]+x).

This isn't really a problem: First, a declaration of foo[][50] is
treated (in most ways) as (*foo)[50], not *foo[50].  That is, as a
pointer to an array rather than as an array of pointers.  Second, to
index into an array foo[40][50], one calculates &foo[y][x] as
&foo[0][0]+50*y+x.

The result is that addressing into an array depends on all dimensions
but the first, and this is recognized in C syntax, by allowing the
first index of a parameter array to be omitted.

Note that this has little bearing on (one of) the original complaints,
that LSC does not allow formal parameter declarations of the form
foo[32][32][32].  While I haven't verified that this is the case, I
have found that Sun cc, Gnu cc, and lint all agree with my belief that
this is a legal declaration.

					--Tom Lippincott
					  lippin@math.berkeley.edu

	"`The problem's all inside your head,' she said to me.
	 `The program's easy if it's done recursively.
	  I'd like to help you in your struggle for a B --
	  There must be fifty ways to write your program.
		Fifty ways to write your program.'"

pepke@loligo.cc.fsu.edu (Eric Pepke) (05/17/89)

In article <2595@helios.ee.lbl.gov> beard@ux3.lbl.gov (Patrick C Beard) writes:
>I'm sorry, but I just couldn't let this discussion go by without making
>a comment.  The problem you are discussing, passing static or automatic arrays,
>such as foo[40][50] to subroutines with a declaration such as foo[][50] is
>doomed to fail not because of a failing in THINK C (formerly LightSpeed C),
>but because of an inherent flaw (feature?) in the C programming language.
>Namely, the duality between arrays and pointers makes passing arrays of
>dimension >= 2 impossible.  The reason is that the compiler assumes that
>a declaration of foo[][50] really means *foo[50], an array of pointers.
>The array foo[40][50] is really just one long chunk of memory, and an
>index into it, such as foo[y][x] is really just foo+40*y+x.  The called
>subroutine has no knowledge of the inner dimensions of the array, and so.
>has to resort to foo[y][x] meaning *(*(foo+y)+x), or *(foo[y]+x).

Not true.  I just tried compiling

ArrayRef(foo, a, b, c)
int foo[][50][50];
int a, b, c;
{
	register int x;
	x = foo[a][b][c];
}

PointerRef(foo, a, b, c)
int *foo[50][50];
int a, b, c;
{
	register int x;
	x = foo[a][b][c];
}

in Lightspeed/THINK C.  They generate different code.  The code is correct
in each case.  ArrayRef indexes one long 3-D array, and PointerRef gets a
pointer to some integers from a 2-D array of pointers and then adds to the
pointer to get the address.  So, it although the function bodies look the
same, the compiler does the intuitively correct thing based on the declaration.

By the way, I have gotten the application working using Lightspeed C, and
it does include a big 3-D array which works fine.  Thank you once again,
Rich and company, for a wonderful development system.  Had I not realized my
my mistake and switched from MPW C I would probably still be waiting for
it to compile.  This was the first time I have used the Palette manager, and 
the speed of your C made exploring its behavior a lot less painful.

Eric Pepke                                     ARPA:   pepke@gw.scri.fsu.edu
Supercomputer Computations Research Institute  MFENET: pepke@fsu
Florida State University                       SPAN:   pepke@scri
Tallahassee, FL 32306-4052                     BITNET: pepke@fsu

Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.