[comp.lang.c] address of structure == address of first member?

carlp@iscuva.ISCS.COM (Carl Paukstis) (11/22/88)

In my H&S, 5.7.3 (p. 108) says "... C compilers are constrained to assign
components increasing memory addresses in a strict left-to-right order,
with the first component starting at the beginning address of the structure
itself."

Fine. I want to write a general-purpose search function.  I have several
arrays of (different) structures; each structure contains a pointer to a char
string identifier as its first member.  The search function takes a
character (string) pointer to a search argument, a pointer to one of the
arrays of structures, the size the structure entry, and the number of
entries in the table.  It finds the table entry containing the (pointer to)
string specified, and returns a pointer to the matching table entry.

By H&S, that part of the code which gets the pointer to the string to do
the compare must work.  And the following compiles and runs clean under
gcc (1.30) under Ultrix and another (semi-)ANSI(draft)compliant compiler
on a SYS V system.  What other portability problems will I run into with
this code?  How can I fix them?

 1	#include <stdio.h>
 2
 3	struct mumble {
 4	    char *dopey;
 5	    float sneezy;
 6	    int doc;
 7	    };
 8
 9	struct fumble {
10	    char *bashful;
11	    char grumpy;
12	    float sleepy;
13	    };
14
15	const struct mumble m[10] = {
16	    { "abc", 1.3, 4 },
17	    { "def", 2.5, 2 },
18	    { "ghi", 2.6, 1 },
19	    { "ghj", 2.67, 0 },
20	    { "zzzkl", 9.4, -2 }
21	    };
22
23	const struct fumble f[12] = {
24	    { "annapolis", 'b', 907.4 },
25	    { "brainerd", 'c', 821.0 },
26	    { "collinston", 'd', 1243.6},
27	    { "detroit", 'e', 3.2 },
28	    { "evanston", 'f', 1.1054 },
29	    { "fargo", 'g', 1204.0 },
30	    { "gonzaga", 'z', 3.1415926 },
31	    { "zulu", 'a', 92203.1 }
32	    };
33
34	main()
35	{
36	    void * mybsearch(const char *, void const *, const int, const int);
37	    struct mumble *pm;
38	    struct fumble *pf;
39
40	    pm = (struct mumble *)mybsearch ("ghj",
41	            (void *)m,
42	            sizeof(m)/sizeof(struct mumble),
43	            sizeof(struct mumble));
44
45	    if (0 == pm)
46	        {
47	        printf("Not found in m!\n");
48	        exit(1);
49	        }
50	    printf("Value for ghj is %f\n",pm->sneezy);
51
52	    pf = (struct fumble *)mybsearch ("evanston",
53	            (void *)f,
54	            sizeof(f)/sizeof(struct fumble),
55	            sizeof(struct fumble));
56
57	    if (0 == pf)
58	        {
59	        printf("Not found in f!\n");
60	        exit(1);
61	        }
62	    printf("Value for evanston is %f\n",pf->sleepy);
63
64	    printf("OK, found both!\n");
65	    exit(0);
66	}
67
68	/**
69	**  simple-minded binary search routine:
70	**      Find a table entry which matches the desired string.
71	**      Table entries are assumed to be structures, the first
72	**      member of which is a pointer to a character string key.
73	**/
74
75	void * mybsearch (const char * key,     /* value to match against */
76	                void const * table,     /* first table entry */
77	                const int count,        /* number of table entries */
78	                const int size)         /* size (char's) of each entry */
79	{
80	    int l, m, r;
81	    int code;
82	    for (l = 0, r = count - 1; l <= r;)
83	    {
84	        m = (l + r) / 2;
85
86	        code = strcmp (key, *(char **)((char *)table + (m * size)));
87
88	        if (code == 0)      /* found it! */
89	            return ((void *)((char *)table + (m * size)));
90
91	        if (code < 0)
92	            r = m - 1;      /* search left half */
93	        else
94	            l = m + 1;      /* search right half */
95	    }
96
97	    return ((void *) 0);    /* not found */
98	}
99

Line 86 bothers me.  Is it permissible to cast a void* into a char* and do
arithmetic on it like this?  Line 89 would have the same problem (if any).

On machines (environments?) with non-byte-sized chars, is there a better
way to do the necessary arithmetic?  Or what else have I missed?

-- 
Carl Paukstis    +1 509 927 5600 x5321  |"I met a girl who sang the blues
                                        | and asked her for some happy news
UUCP:     carlp@iscuvc.ISCS.COM         | but she just smiled and turned away"
          ...uunet!iscuvc!carlp         |                    - Don MacLean

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/22/88)

In article <2172@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
>What other portability problems will I run into with this code?
1.  Use of prototypes constrains you to a modern compiler.
2.  You implicitly declare exit() as returning int, which is incorrect.
3.  sizeof results in a size_t but mybsearch() wants an int.  Although the
    type is automatically converted in the scope of the prototype, if you
    tried to use a really large struct the size might not fit.
4.  You really ought to #include <string.h> to declare strcmp().

>86	        code = strcmp (key, *(char **)((char *)table + (m * size)));
>Line 86 bothers me.  Is it permissible to cast a void* into a char* and do
>arithmetic on it like this?  Line 89 would have the same problem (if any).

Line 86 is okay, but you really don't need to cast to a (char**) then
dereference to get the (char*) key.  That's extra work that amounts to
a no-op.

>On machines (environments?) with non-byte-sized chars, is there a better
>way to do the necessary arithmetic?  Or what else have I missed?

That seems about as good as any, if you don't want to tailor individual
versions to specific systems.

carlp@iscuva.ISCS.COM (Carl Paukstis) (11/24/88)

(Maybe I should take this to private mail, but it seems a couple of points
here may be of general interest..)

In article <8954@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <2172@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) [me]:
>>What other portability problems will I run into with this code?
...
>2.  You implicitly declare exit() as returning int, which is incorrect.

  Hmmm.  Since exit() _doesn't_ return, I never paid any attention to this.
  Call me sloppy, I guess.  I'll have to remember that.

...
>4.  You really ought to #include <string.h> to declare strcmp().

  Oops!  Actually, I thought of this after I nicely formatted the listing,
  and didn't want to mess up the line numbers.  (You believe that? :-)

>>... void * table ...
>
>>86	        code = strcmp (key, *(char **)((char *)table + (m * size)));
>
>Line 86 is okay, but you really don't need to cast to a (char**) then
>dereference to get the (char*) key.  That's extra work that amounts to
>a no-op.

  Huh?  The type of the expression ((char *)table + (m * size)) is
  "pointer to char", no?  And the usage on line 86 requires "pointer to
  pointer to char".  If I dereference the subexpression above, e.g.
  *((char *)table + (m * size)), the result type is "char", not "pointer
  to char" as required by strcmp().  Or am I just being dense?

Anyway, thanks to Doug (and Chris Torek in private mail) who both sort of
confirmed that this code is reasonably portable.
-- 
Carl Paukstis    +1 509 927 5600 x5321  |"The right to be heard does not
                                        | automatically include the right
UUCP:     carlp@iscuvc.ISCS.COM         | to be taken seriously."
          ...uunet!iscuva!carlp         |                  - H. H. Humphrey

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/24/88)

In article <2176@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
->>86	        code = strcmp (key, *(char **)((char *)table + (m * size)));
->Line 86 is okay, but you really don't need to cast to a (char**) then
->dereference to get the (char*) key.  That's extra work that amounts to
->a no-op.
-  Huh?  The type of the expression ((char *)table + (m * size)) is
-  "pointer to char", no?  And the usage on line 86 requires "pointer to
-  pointer to char".  If I dereference the subexpression above, e.g.
-  *((char *)table + (m * size)), the result type is "char", not "pointer
-  to char" as required by strcmp().  Or am I just being dense?

That's not what I said!

You already had a (char *).  You then cast it to (char **), and then
dereferenced that using the * operator.  There was no need for these
extra steps when the original (char *) was just what you needed.
(strcmp takes (char *) arguments.)

chris@mimsy.UUCP (Chris Torek) (11/27/88)

[Amazing that no one else has jumped on this yet.]
>In article <2176@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
>>86	        code = strcmp (key, *(char **)((char *)table + (m * size)));
[much edited]

In article <8976@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) suggests:
>You already had a (char *).  You then cast it to (char **), and then
>dereferenced that using the * operator.  There was no need for these
>extra steps when the original (char *) was just what you needed.

But the original `char *' is not what he needs, and this does not compile
into an effective no-op (as you suggested in <8954@smoke.brl.mil>).  If
we write:

	char *p;
	
	p = (char *)table + (m * size);

we have p pointing at (not `to': I am reserving that for a pointer of
the appropriate type) an object of `struct <something>'.  This structure
is defined as:

	struct <something> {
		char	*name;
		...
	};

To be completely type-correct, we should convert `p' into a pointer to
the correct structure type-name, and follow that to its `name' field.
The problem, of course, is that in this general-purpose search routine,
we have no idea *which* structure `p' is standing for [to end the
sentence a preposition with].  There are two alternatives: define a
structure that contains only a name; or cheat.

	struct name { char *name; };

	result = strcmp(key, ((struct name *)p)->name);

or

	result = strcmp(key, *(char **)p);

The former is in some sense superiour, particularly under K&R rules,
where any pair of structures that have an initial common set of names
and types are, in that common part, compatible.  (This rule is left
over from the days before unions, and does not appear in the dpANS.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/27/88)

In article <14725@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>To be completely type-correct, we should convert `p' into a pointer to
>the correct structure type-name, and follow that to its `name' field.
>The problem, of course, is that in this general-purpose search routine,
>we have no idea *which* structure `p' is standing for [to end the
>sentence a preposition with].  There are two alternatives: define a
>structure that contains only a name; or cheat.

Technically it is not necessary to do this via a purely "type correct"
method; there are sufficient requirements imposed by the very latest
proposed C standard to ensure that any addressable data object forms a
linear address subspace within itself, such that (char *)-arithmetic
suffices to correctly address its pieces.  This guarantee was necessary
to make memcpy() etc. usable for their intended purposes.

Cheating is guaranteed to suffice.  A pointer to a structure converted
to (char *) must compare equal to a pointer to the first member of the
structure converted to (char *), and in this particular example casting
that back to (char **) will produce a valid pointer to the first
structure whose type is (char *).  Then * will pick up the contents of
that (char *) for use with strcmp().  (I made a mistake about this in
my previous posting. strcmp() does need the (char *) itself, not just
its address, so the additional *(char**) is necessary.  Sorry.)

I admit to being surprised fairly recently when I found that this
"cheating" was supposed to be officially sanctioned, but it's useful.

The wording about "same representation and alignment" was also fixed
so that one can safely pass a nonnegative int to a function expecting
an unsigned int, or a non-qualified type to a function with parameter
declared as a qualified version of that type, etc.  This (along with
a fix to the definition for fprintf() %u format) seems to complete
the work of making C object representations behave as usefully as
possible without undue burden for implementors on segmented, ones-
complement, etc. architectures.

A further nice thing is that the K&R 2nd Ed. example where a pointer
to a function of a slightly different type than expected was passed
(for a sorting comparison function, I think) is now guaranteed to
actually work the way K&R used it.  This had been bothering me..

throopw@xyzzy.UUCP (Wayne A. Throop) (12/01/88)

>,>>> gwyn@smoke.BRL.MIL (Doug Gwyn )
>>,>>>> carlp@iscuva.ISCS.COM (Carl Paukstis)
>>>>86	        code = strcmp (key, *(char **)((char *)table + (m * size)));
>>>Line 86 is okay, but you really don't need to cast to a (char**) then
>>>dereference to get the (char*) key.  That's extra work that amounts to
>>>a no-op.
>> Huh? [... It sure seems necessary to me ...]
> You already had a (char *).  You then cast it to (char **), and then
> dereferenced that using the * operator.  There was no need for these
> extra steps when the original (char *) was just what you needed.
> (strcmp takes (char *) arguments.)

Oooooh, Doug, you ought to know better.  The stuff you count as
"extra work that amounts to a no-op" is not a no-op at all, but the
dereference of a crucial pointer.  Look at those structure definitions
again.  The first elements of those structures are not (char []), but
(char *).  What is gotten by ((char *)table + (m*size)) is a pointer
to this pointer (but of the wrong type (1)).  The fact that this
pointer to the desired pointer happens to be typed in the same way as
the desired pointer itself is just a coincidence, required for reasons
of doing byte-granular pointer arithmetic in C.

Hence, the cast and dereference is in NO WAY a no-op, and
is correct and necessary for Carl's code to work properly.

The alternate (and more portable) way to do what Carl is trying
to do is to pass in routines to do subscripting and comparison.
But I suspect the porting problems of Carl's code as it is are
miniscule.

(1) To pick nits, that's a pointer to a structure containing the
    desred pointer as its first element, which is what the subject
    of this discussion thread is all about.

--
In Loglan, time flies like an arrow precisely the way fruit flies
like a banana;  there isn't any other way of saying it.
                                --- D.E. Cortesi in Dr Dobbs Journal
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw