[comp.lang.c] Novice C question

ajayshah@almaak.usc.edu (Ajay Shah) (04/16/91)

Consider this fragment of C code (from Numerical Recipes):

1	double *dvector(nl,nh)
2	int nl,nh;
3	{
4	        double *v;
5	
6	        v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
7	        if (!v) nrerror("allocation failure in dvector()");
8	        return v-nl;
9	}

It's supposed to be a function which allocates a vector of
doubles.  My interpretation of nl and nh is: they're array
indexes.  If you want to allocate an array going from 5 to 10,
you would say p = dvector(5, 10).

Question: what is happening on line 8?  Why is he not just
returning v (a pointer)?  What is the meaning of subtracting nl
(an int) from v without any casting?

Thanks!

	-ans.

-- 
_______________________________________________________________________________
Ajay Shah, (213)734-3930, ajayshah@usc.edu
                             The more things change, the more they stay insane.
_______________________________________________________________________________

neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) (04/16/91)

In article <31969@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
>
>1	double *dvector(nl,nh)
>2	int nl,nh;
>3	{
>4	        double *v;
>5	
>6	        v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
>7	        if (!v) nrerror("allocation failure in dvector()");
>8	        return v-nl;
>9	}
>
>It's supposed to be a function which allocates a vector of
>doubles.  My interpretation of nl and nh is: they're array
>indexes.  If you want to allocate an array going from 5 to 10,
>you would say p = dvector(5, 10).
>
   Yes, that's right. Memory is allocated for p[5] to p[10]. Memory
outside these points is not yours, and will usually crash the machine if
overwritten.

>Question: what is happening on line 8?  Why is he not just
>returning v (a pointer)?  What is the meaning of subtracting nl
>(an int) from v without any casting?
>
   Subtracting or adding an integer to a pointer has a well-defined
meaning. Consider:
  my_typedef *p1, *p2;
where 'my_typedef' is any arbitrary data structure.
If one defines:
p2 = p1 - n;    /* 'n' is an integer  */
Then the pointer 'p2' points in memory to a location earlier than 'p1'
in memory by an amount   n * sizeof(my_typedef)
OK, this looks pointless so far, since that memory might not be
reserved, or might be reserved for something completely different, like
the integer variable 'n'.
   Here's the fun part. When the compiler sees  p1[j] it evaluates it as
*(p1+j). The addition of the integer 'j' means the same thing it meant
above, it advances in memory by an amount   j * sizeof(*p1)

   So, an example. If you have a conventionally defined array, and a
pointer, such as:
float arr[3], *myvec;
and I assign:
myvec = arr - 1;

then the following are true:
myvec[0]  is undefined and dangerous to use, as you step on memory.
myvec[1] == arr[0]
myvec[2] == arr[1]
myvec[3] == arr[2]
arr[3]  is undefined and dangerous to use.

   To answer your question, the code fragment you asked about creates a
pointer 'v' to the beginning of an array whose indices lie between '0'
and 'nh-nl' inclusive. The 'return v-nl;' instruction returns a pointer
whose [nl] entry is the [0] entry of the 'v' just defined.

   I should mention that there is one exception that I know of, and
probably a few other people can provide. The rule is less simple for
multi-dimensional arrays defined in the conventional manner:
float arr[N1][N2][N3];
Now, when the compiler sees  arr[i][j][k] it evaluates:
*(arr + ((i * N2) + j) * N3 +k)
However, if I have:
float ***myarr;
then when the compiler sees myarr[i][j][k] it evaluates:
*(*(*(myarr+i)+j)+k)
Note that this definition takes less CPU usually, and eats a bit more
memory. It's also a bit easier to use in passing to functions because
the function doesn't have to be told the dimensions 'N2' and 'N3', which
it obviously needs for the first definition.

-- 
 Christopher Neufeld....Just a graduate student  | Flash: morning star seen
 neufeld@aurora.physics.utoronto.ca    Ad astra! | in evening! Baffled
 cneufeld@{pnet91,pro-cco}.cts.com               | astronomers: "could mean
 "Don't edit reality for the sake of simplicity" | second coming of Elvis!"

heinhuis@dri.nl (Gustaaf-Jan Heinhuis) (04/16/91)

In article <31969@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
>Consider this fragment of C code (from Numerical Recipes):
>
>1	double *dvector(nl,nh)
>2	int nl,nh;
>3	{
>4	        double *v;
>5	
>6	        v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
>7	        if (!v) nrerror("allocation failure in dvector()");
>8	        return v-nl;
>9	}
>
>It's supposed to be a function which allocates a vector of
>doubles.  My interpretation of nl and nh is: they're array
>indexes.  If you want to allocate an array going from 5 to 10,
>you would say p = dvector(5, 10).
>
>Question: what is happening on line 8?  Why is he not just
>returning v (a pointer)?  What is the meaning of subtracting nl
>(an int) from v without any casting?
>
>Thanks!
>
Well, I know why he substracts nl. The fact that you have an array from
5 to 10 implies you also have indexes 0 to 4. The lowest index your
going to use is 5 which is an offset to a base adress. To make this work
the base adress has to 5 units (= nl, unit depends on type of reserved 
memory; here it's double) lower as the adress of the allocated block 
of memory.

You can, however, not use p[0] to p[4] because you didn't alloc any
memory for this part of the array.

Hope this answers your question, Gustaaf-Jan.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++---===To all virgins, thanks for nothing.===---++++++++++++++
          
   ___              GJ Heinhuis student comp. sc. HIO Enschede (Holland)
  /  _)             Final Project at: DataCount, Wierden, Holland
 |  ____ ___        Mail: heinhuis@dri.nl
  \__/|   |         
      |___|         Everybody is entitled to his own	
  |   |   |         opinion, as long as it's mine!
   \_/    |         Not necessarely DataCount's opinion!!

mhcoffin@tolstoy.waterloo.edu (Michael Coffin) (04/16/91)

In article <31969@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
>Consider this fragment of C code (from Numerical Recipes):
>
>1	double *dvector(nl,nh)
>2	int nl,nh;
>3	{
>4	        double *v;
>5	
>6	        v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
>7	        if (!v) nrerror("allocation failure in dvector()");
>8	        return v-nl;
>9	}
>
>It's supposed to be a function which allocates a vector of
>doubles.  My interpretation of nl and nh is: they're array
>indexes.  If you want to allocate an array going from 5 to 10,
>you would say p = dvector(5, 10).
>
>Question: what is happening on line 8?  Why is he not just
>returning v (a pointer)?  What is the meaning of subtracting nl
>(an int) from v without any casting?

The intention is to return a pointer to a phantom array location nl
spaces to the "left" of the leftmost space actually allocated, so that
after calling, for example, 
    d = dvector(2,4)
d[2],d[3], and d[4] would be within the space actually allocated.

Although this will work with many C compilers, the ANSI standard does
not give its blessing.  A compiler is entitled to produce code
that issues an error message or dumps core when it evaluates the
expression "v-nl" for positive nl.

Michael Coffin				mhcoffin@watmsg.waterloo.edu
Dept. of Computer Science		office: (519) 885-1211
University of Waterloo			home:   (519) 725-5516
Waterloo, Ontario, Canada N2L 3G1

gwyn@smoke.brl.mil (Doug Gwyn) (04/17/91)

In article <31969@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
>Consider this fragment of C code (from Numerical Recipes):

Ugh, but okay.

>1	double *dvector(nl,nh)
>2	int nl,nh;
>3	{
>4	        double *v;
>6	        v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
>7	        if (!v) nrerror("allocation failure in dvector()");
>8	        return v-nl;
>9	}
>It's supposed to be a function which allocates a vector of doubles.
>My interpretation of nl and nh is: they're array indexes.

They're supposed to be the minimum and maximum indices that will be
used to access the vector after it is allocated.

>Question: what is happening on line 8?  Why is he not just
>returning v (a pointer)?  What is the meaning of subtracting nl
>(an int) from v without any casting?

The intention is to be able to use dvector() like this:
	double *v = dvector(5,10);
	v[5] = 42.0;
	/* ... */
	v[10] = 3.1416;
The returned pointer is supposed to be offset so that v[5] will
access the first location in the object allocated by malloc().
Pointer_to_thing plus or minus some integer is an expression that
has type pointer_to_thing and a value that points to that integer
number of "things" away from the original pointer.

Unfortunately, v-nl does not necessarily point to anything valid
(who knows what lurks below the address returned by malloc()?),
so this function may fail (most likely catastrophically) on many
C implementations.

The Numerical Recipes C code is terrible; this is just one example.

torek@elf.ee.lbl.gov (Chris Torek) (04/17/91)

>In article <31969@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
[quoting from Numerical Recipes in C]
>>1	double *dvector(nl,nh)
>>2	int nl,nh;
>>3	{
>>4	        double *v;
>>5	
>>6	        v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
>>7	        if (!v) nrerror("allocation failure in dvector()");
>>8	        return v-nl;
>>9	}
>>
>>It's supposed to be a function which allocates a vector of doubles.

(It *is* such a function.)

>>My interpretation of nl and nh is: they're array indexes.

A better word here is `bounds':

>>If you want to allocate an array going from 5 to 10, you would say
>>p = dvector(5, 10).

The result of this call is not portable (dvector is portable only when
nl is nonpositive, nh is nonnegative, and nl is less than or equal to nh).

One way to think about this is to imagine that the malloc() call
allocates a `true' array (as opposed to reality, in which malloc
merely obtains a suitably malleable `blob of memory'):

	extern double array[nh - nl + 1];
	v = &array[0];

This partially-equivalent (but not legal C) assignment makes `a' point
to the first of nh-nl+1 doubles.  The C language then allows you to
point to and use any of those doubles; it also allows you to point to
the (single) nonexistent double at the end (&v[nh-nl+1+1]).  Thus,
v[0]..v[nh-nl+1] are all legal `double's, and v[nh-nl+1+1] is not a
double but is nonetheless addressible.  NO OTHER ELEMENTS ARE ADDRESSIBLE.
You may not compute &v[-1] in portable code.

Thus, in the actual assignment, the same thing happens.  v points to
the first of nh-nl+1 `double's, and v[0]..v[nh-nl+1] are all legal
`double's, and in addition you may compute &v[nh-nl+1+1].

So what does the `return' expression `v - nl' compute?

If nl is 0, this is just `v'.  If nl is negative, this adds -nl
`double's to v (skips forward -nl doubles---e.g., if nl is -3, this
computes &v[3]).  But if nl is positive, this tries to compute &v[<some
negative number>].  This is illegal, and the result is undefined
(perhaps the computer turns into a frog).  On many machines the result
is what you would expect (v simply points somewhere `below' the first
double, such that v[nl] is the first one), which is why Numerical
Recipes in C gets away with it.  Portable code should not count on
this.

In article <1991Apr16.033331.5408@helios.physics.utoronto.ca>
neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) writes:
>... When the compiler sees  p1[j] it evaluates it as *(p1+j).
>The addition of the integer 'j' means the same thing it meant
>above, it advances in memory by an amount   j * sizeof(*p1)

More simply, it advances by `j' objects, each of which is whatever *p1
is.  On a typical byte-addressed machine, `j objects' and `j * sizeof
*p1 bytes' give the same machine-internal-offset-value, but not all
machines are byte-addressed.

>   I should mention that there is one exception that I know of, and
>probably a few other people can provide. The rule is less simple for
>multi-dimensional arrays defined in the conventional manner:
>float arr[N1][N2][N3];
>Now, when the compiler sees  arr[i][j][k] it evaluates:
>*(arr + ((i * N2) + j) * N3 +k)
>However, if I have:
>float ***myarr;
>then when the compiler sees myarr[i][j][k] it evaluates:
>*(*(*(myarr+i)+j)+k)

Actually, in the `virtual machine', the operation of both is the same:
arr[i][j][k] means `add i to arr, indirect, add j to that, indirect,
add k to that, indirect'.  This works because `arr' is a collection%
of `array N2 of array N3 of float's; adding i moves forward by i such
objects.  If you have a byte-addressed machine---you probably do,
and you probably know about it if you do not---this is indeed the
same as moving forward i*N2*N3*sizeof(float) bytes.  If not, it is
something else.
-----
% In fact, `arr' is a collection of exactly N1 objects, and i must be
  in the half-open interval [0..N1).  If not, all bets are off (better
  find someone who knows who to turn frogs back into computers).
-----

This repeats for j and k, but the objects differ: the result of
`arr[i]' is a collection of `array N3 of float's, and the result
of `arr[i][j]' is a collection of `float's.  Indeed, arr[i] is
a collection of N2 such arrays, and arr[i][j] is a collection of
N3 floats.

myarr[i][j][k] means the same thing: `add i to myarr, indirect, add j
to that, indirect, add k to that, indirect'.  This works because
`myarr' points to the first of a collection of `pointer to pointer to
float's; adding i moves forward by i such objects.  If you have a
byte-addressed machine, this is the same as moving forward
i*sizeof(float **) bytes.  If not, it is something else.

This repeats for j and k.  In each case the number of objects is not
specified (myarr may point to the first of zero or more `float **'s;
myarr[i] may point to the first of zero or more `float *'s; and
myarr[i][j] may point to the first of zero or more `float's), but
whatever each actual number is, i, j, and k had better be less than
each.

>Note that this definition takes less CPU usually, and eats a bit more
>memory. It's also a bit easier to use in passing to functions because
>the function doesn't have to be told the dimensions 'N2' and 'N3', which
>it obviously needs for the first definition.

Right, although the `less CPU' is hard to define; `usually' may be
overstating the case.  A great deal depends on both the compiler and
the machine.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov