[net.lang.c] Arcane C hacks?

herndon@umn-cs.UUCP (02/18/86)

  I wrote a mild flame in another newsgroup about this, but
somebody out there might know another way.
  The problem is this:  I'd like to construct a jump table by
putting lots of labels into an array, and then issuing a
statement like

		goto jumptab[i];

Unfortunately, labels are no longer simple integers (and haven't
been for most of a decade now).  They can no longer be type coerced
to integer or pointer-to-integer, nor can other types be coerced
to type label.  Is there any way around this?  Admittedly, for
the static case with a dense list, a switch statement usually
constructs a jump table, but I'd like to construct one on the fly.
  If I just want to execute machine code out of an array ("goto
array_name;" used to be legal C) I can write a simple assembly
language routine of one argument to do it.  Is there a legal way
to do this in pure C any more?  (Yes, one can type coerce the
address of the array to pointer-to-function and call it, but
is it possible to jump to the array?)
  Both of these problems are admittedly things one would not often
do.  They are, however, things one might do in an interpreter.
The former operation may be considered to be "machine independent",
as there is one reasonable, consistent interpretation.  The latter
is much more open to question, as the contents of the array are not
machine independent.  However, if one is to write portable inter-
preters, being able to jump to an array without assembly language
help would be a plus.  (Individual pseudo-machine operations can
then be relegated to data tables, leaving the C code machine
independent.)


					Robert Herndon
					...!{ihnp4|stolaf}!umn-cs!herndon

hfavr@mtuxo.UUCP (a.reed) (02/19/86)

Robert Herndon writes:
>   I wrote a mild flame in another newsgroup about this, but
> somebody out there might know another way.
>   The problem is this:  I'd like to construct a jump table by
> putting lots of labels into an array, and then issuing a
> statement like
> 		goto jumptab[i];

Such a table is automatically built by the compiler when you do
	switch (i) {	case 1: goto label_a;
			case 2: goto label_b;	} /* etc. */
This can do everything you need, and gives you the benefit of
readable labels.
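A minimal sketch of the idiom Adam describes, assuming a dense index with two destinations. The function name jump_via_switch and the values stored at each label are hypothetical stand-ins for real work. Whether a given switch becomes an indexed jump table is the compiler's choice, and it usually makes that choice only when the cases are dense and numerous enough.

```c
#include <assert.h>

/* Emulate "goto jumptab[i];": the switch selects a label.  For a larger,
   dense set of cases, compilers commonly emit an indexed jump table. */
int jump_via_switch(int i)
{
    int result;

    assert(i == 1 || i == 2);   /* this sketch defines only two destinations */

    switch (i) {
    case 1: goto label_a;
    case 2: goto label_b;
    default: goto label_none;   /* reached only when assertions are disabled */
    }

label_a:
    result = 10;
    goto done;
label_b:
    result = 20;
    goto done;
label_none:
    result = -1;
done:
    return result;
}
```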
			Adam Reed (ihnp4!npois!adam)

bowles@cbosgd.UUCP (Jeff Bowles) (02/19/86)

(Gee, wasn't the article I'm replying to supposed to be here, too?)

Well, anyway, the article I'm following up is one where a programmer
asks why:
	int i;
	...
	goto labels[i];	/* "labels" contains pointers to, well,
			 * you get it...
			 */
went away from C. [I guess it used to be valid. Hmmmm.]

If you feel the need to do something like this, you probably want
to just use a structure:
	struct bletch {
		int	l_token;	/* If you see this, */
		void	(*l_whither)();	/* then call this routine */
	};
Then you can just do a table lookup, calling the appropriate
routines for each item.
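A sketch of Jeff's table lookup, using his struct bletch with a prototype added to l_whither. The handlers on_plus and on_minus, the global last_called, and dispatch_token are hypothetical. A linear search is the simplest choice; a sorted table searched with bsearch() would scale better.

```c
#include <assert.h>
#include <stddef.h>

struct bletch {
    int l_token;              /* If you see this, */
    void (*l_whither)(void);  /* then call this routine */
};

/* Hypothetical routines: each records which token it handled. */
int last_called = 0;
void on_plus(void)  { last_called = '+'; }
void on_minus(void) { last_called = '-'; }

static const struct bletch table[] = {
    { '+', on_plus },
    { '-', on_minus },
};

/* Find tok in the table and call its routine.
   Returns 1 if a routine was called, 0 if tok has no entry. */
int dispatch_token(int tok)
{
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].l_token == tok) {
            assert(table[i].l_whither != NULL);
            (*table[i].l_whither)();
            return 1;
        }
    }
    return 0;
}
```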

Or, :-), you can use the "computed goto" that C provides:
	switch (i) {
		case 0:
			...
			break;
		case 1:
		...
		}
The "table lookup" that C provides will usually do what you
need, with minimal effort. If you're translating an assembler
program and need to "jump to the location specified by the N'th
element of this array", I suggest finding a BCPL compiler.

	Jeff Bowles
	Lisle, IL

ark@alice.UucP (Andrew Koenig) (02/19/86)

In "pure C," the only thing you can do with a label is
use it as the subject of a goto.  The closest you can come
is to call an element of an array rather than jumping to
it; the array must then be an array of function pointers.
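A small sketch of what Andrew means, with hypothetical operations op_zero, op_double and op_negate standing in for the code that would have sat at each label. The important difference from a goto is that control comes back to the caller after each call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations; each stands in for the code at one "label". */
int op_zero(int x)   { (void)x; return 0; }
int op_double(int x) { return 2 * x; }
int op_negate(int x) { return -x; }

/* Where one might have written "goto jumptab[i];", call jumptab[i] instead. */
static int (*const jumptab[])(int) = { op_zero, op_double, op_negate };

#define NJUMPS (sizeof jumptab / sizeof jumptab[0])

int dispatch(size_t i, int x)
{
    assert(i < NJUMPS);        /* an out-of-range index would be a bug */
    return (*jumptab[i])(x);   /* control returns here afterwards */
}
```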

bzs@bu-cs.UUCP (Barry Shein) (02/20/86)

Re: what to do now that goto jumptab[x] is gone?

Well, the obvious solution is a switch statement but that does not
fulfill all your requirements and you probably rejected that.

The way I do such things is to use an array of pointers to functions.
In your example of jumping to on-the-fly generated code I suspect that
is really what you are saying: On-the-fly generated functions, having
an environment around the generated code is certainly not a bad thing
and we assume you would like to come back sometime. Generating the
prologue and epilogue to make the generated code a function should be
more or less trivial as, besides stack offsets for autos, it's a cliche.

So, the answer would be to declare a table something like:

int (*funtable[MAXFNS])() ;	/* did I get that right? array of pointers
				   to functions returning int */

and just malloc the storage for the generated code. Obviously the return
value needn't be int. I can't think of any reason off hand why this isn't
powerful enough for what you propose. It should be quite portable (code
generator aside) and is legal C.
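A sketch of the portable half of Barry's suggestion: his declaration (checked against a typedef spelling of the same type, to answer "did I get that right?") and entries that are filled in and replaced at run time. MAXFNS is given an assumed value, and answer, lucky, bind_slot and call_slot are hypothetical. Pointing an entry at malloc'ed machine code is deliberately left out, since that is the machine-dependent part.

```c
#include <assert.h>
#include <stddef.h>

#define MAXFNS 8   /* assumed size; the post leaves MAXFNS open */

/* Barry's declaration with a prototype added: an array of MAXFNS
   pointers to functions (taking no arguments) that return int. */
int (*funtable[MAXFNS])(void);

/* The same type in two steps.  If the two spellings disagreed, this
   initialization would draw a compiler diagnostic. */
typedef int (*intfn)(void);
static intfn *const slots = funtable;

/* Hypothetical functions to put in the table. */
int answer(void) { return 42; }
int lucky(void)  { return 7; }

/* Entries can be filled in, or replaced, while the program runs. */
void bind_slot(size_t n, intfn f)
{
    assert(n < MAXFNS);
    slots[n] = f;
}

int call_slot(size_t n)
{
    assert(n < MAXFNS && slots[n] != NULL);
    return (*slots[n])();
}
```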

	-Barry Shein, Boston University

kwh@bentley.UUCP (KW Heuer) (02/23/86)

In articles <184@bu-cs.UUCP> bu-cs!bzs (Barry Shein) writes:
>declare a table something like:   int (*funtable[MAXFNS])() ;
>and just malloc the storage for the generated code.  ...  It should be
>quite portable (code generator aside) and is legal C.

Well, some compilers will dislike the attempt to cast a (char *) into
a (int (*)()) ; in fact I think some will call it an outright error
(not just a warning).  But in any case it is _not_ portable to the 3b2,
because all programs are pure -- you can't goto/call data space, nor
can you read from the instruction stream.  Some sort of chastity belt
in the hardware, I think.

herndon@umn-cs.UUCP (02/23/86)

  Hmmm.  Apparently my original posting wasn't too clear.
Many responses were sent telling me that I should just use
a switch statement, or an array of pointers to functions.
Somebody else mailed me a note telling me I should not
put code into an array, but should use an assembler.
My original note explicitly mentioned the possibilities
of using both switch statements and pointers to functions,
and I've had to make do with these options.  Sigh.
  Let me restate my problem.  Suppose I have an interpreter,
which accepts input from a user.  Something like a BASIC
(Ugh!) or Lisp interpreter/compiler.  I wish to convert a
statement that the user enters into machine code, and be
able to execute that machine code, RIGHT THEN(!).  (I
certainly don't wish to have to call an assembler and a
loader.)  This is perhaps an iffy operation, since some
machines will not allow the execution of data.
  Now, it is certainly not too difficult to generate my
machine code and stick it into an array somewhere.  If I
could simply jump to it, I'd be very happy.  This I can
do by creating an assembly language procedure of one
argument which jumps to the address given as the argument.

1)  Can I do this without the assembly language help?

  As a second alternative, I can put my machine code into
an array, place the address of that array into a union as
an integer, and CALL (not jump to!) the array by pulling
the address out of the union as a pointer to a function.
This is somewhat ugly, since I don't know what size a
code address is, and C will NOT let me type cast an address
into a pointer to a function.  Therefore this CODE construct
is not portable.  (As I noted in my original article, I
can generate code from machine-dependent DATA tables by
using ifdefs and includes, but I'd like machine independent
CODE.)  Further, many machines (for instance, the VAX)
insist on particular prologues and epilogues for procedures
which I have no interest in and do not wish to generate
code for.

2)  Is there a machine independent way to coerce non-pointer-
    to-function values to pointer-to-function values?

  As a third alternative, definitely the least desirable
from my particular perspective, is to do the whole thing
a "proper way".  I should generate nice intermediate code,
stuff it into an array, and then write a routine to interpret
the intermediate code.  Presumably then I can use the switch
statement everyone recommends to generate the jump-tables to
get to the code to interpret my intermediate code.  Slow.
And I can't add new intermediate-opcodes without recompiling.
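A minimal sketch of this third alternative, assuming a hypothetical four-opcode stack machine: the intermediate code is an int array, and one switch decodes each opcode. The opcode set and the names interpret and STACK_MAX are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

/* Hypothetical opcodes for a tiny stack machine. */
enum op { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Interpret code[] and return the top of the stack at OP_HALT.
   OP_PUSH takes its operand from the following word of code[]. */
int interpret(const int *code)
{
    int stack[STACK_MAX];
    int sp = 0;        /* number of values on the stack */
    size_t pc = 0;     /* index of the next word to fetch */

    for (;;) {
        switch (code[pc++]) {  /* a dense switch like this often becomes a jump table */
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

For example, the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } computes (2 + 3) * 4.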
  The fourth alternative (another "proper way") is to generate
arrays of pointers to functions for code, where the pointers
point to real, live C functions.  Then, by stepping through
the arrays and calling each function pointed to, I can
indirectly interpret my code. (Something like a forth
interpreter.)  Again, I can't add new intermediate-opcodes
without recompiling.  Sigh.
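A sketch of the fourth alternative in the style of call-threaded code: the program is a NULL-terminated array of pointers to ordinary C functions, and the inner loop just calls each one. The struct vm layout, the primitive names and the separate literal array are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

/* Hypothetical machine state shared by every primitive. */
struct vm {
    int stack[STACK_MAX];
    int sp;              /* number of values on the stack */
    const int *lit;      /* next literal for prim_lit to push */
};

typedef void (*prim)(struct vm *);

void prim_lit(struct vm *m)
{
    assert(m->sp < STACK_MAX);
    m->stack[m->sp++] = *m->lit++;
}

void prim_add(struct vm *m)
{
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

void prim_mul(struct vm *m)
{
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] *= m->stack[m->sp];
}

/* Step through a NULL-terminated array of primitives, calling each in turn,
   and return the value left on top of the stack. */
int run_threaded(const prim *code, const int *literals)
{
    struct vm m;

    m.sp = 0;
    m.lit = literals;
    for (; *code != NULL; code++)
        (*code)(&m);
    assert(m.sp >= 1);
    return m.stack[m.sp - 1];
}
```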
  Oh, well, it was a hack anyhow.  It was something that
used to be possible and had occasional application, and then
was rudely snatched away by "improvements" to C.  I think it
predates the existence of K&R's book.


				Robert Herndon

guy@sun.uucp (Guy Harris) (02/23/86)

> >declare a table something like:   int (*funtable[MAXFNS])() ;
> >and just malloc the storage for the generated code.  ...  It should be
> >quite portable (code generator aside) and is legal C.
> 
> Well, some compilers will dislike the attempt to cast a (char *) into
> a (int (*)()) ; in fact I think some will call it an outright error
> (not just a warning).  But in any case it is _not_ portable to the 3b2,
> because all programs are pure -- you can't goto/call data space, nor
> can you read from the instruction stream.

3B2, hell, that goes all the way back to separate I&D space on the PDP-11.
It is quite unportable, and "lint" will justifiably complain about it,
warning of a "questionable conversion of function pointer" (even if you're
converting another kind of pointer *into* a function pointer, but that's
life during UNIX).

Then again, if they're generating code on the fly, it's not going to be very
portable anyway, so if you're doing this sort of thing worrying about
whether the pointer conversion is portable is kind of silly.  (If you really
MUST do this sort of thing, you can probably get the OS to help by providing
a call to convert a data segment into a code segment.)
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa	(yes, really)

jph@houxf.UUCP (J.HARKINS) (02/26/86)

> In articles <184@bu-cs.UUCP> bu-cs!bzs (Barry Shein) writes:
> >declare a table something like:   int (*funtable[MAXFNS])() ;
> >and just malloc the storage for the generated code.  ...  It should be
> >quite portable (code generator aside) and is legal C.
> 
> Well, some compilers will dislike the attempt to cast a (char *) into
> a (int (*)()) ; in fact I think some will call it an outright error

Huh???

This line of code DOES NOT cast a char * into an int. It is declaring
that funtable is an array of MAXFNS elements, each of which is a pointer to 
a function that returns an int value.

> (not just a warning).  But in any case it is _not_ portable to the 3b2,

BOLDERDASH!!!

I have programs that use pointers to functions, some that run on 3B2/5's.
The construct is totally(no flames, please) portable.  As a matter of fact
I have used this type of construct to allow emulation of UN*X signal
processing on a non UN*X operating system that only allowed one routine
to be specified for all signals. 
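A sketch of the kind of emulation Jack describes, assuming a host system that accepts exactly one routine for all signals and passes it the signal number. Everything here (NSIGS, emu_signal, host_trap, record_signal) is hypothetical; a handler running in a real asynchronous signal context would also face restrictions on what it may safely touch.

```c
#include <assert.h>
#include <stddef.h>

#define NSIGS 16   /* hypothetical number of signal codes */

typedef void (*handler_t)(int);

/* Per-signal handlers that the emulation layer keeps for the program. */
static handler_t handlers[NSIGS];

/* UNIX-style registration: install h for sig and return the previous
   handler, the way signal() does. */
handler_t emu_signal(int sig, handler_t h)
{
    handler_t old;

    assert(sig >= 0 && sig < NSIGS);
    old = handlers[sig];
    handlers[sig] = h;
    return old;
}

/* The single routine the host system lets us register for every signal:
   it indexes the table and calls through the pointer it finds there. */
void host_trap(int sig)
{
    if (sig >= 0 && sig < NSIGS && handlers[sig] != NULL)
        (*handlers[sig])(sig);
}

/* A sample handler that remembers the last signal delivered. */
int last_signal = -1;
void record_signal(int sig) { last_signal = sig; }
```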

> because all programs are pure -- you can't goto/call data space, nor
> can you read from the instruction stream.  Some sort of chastity belt
> in the hardware, I think.

Whazat??

MOST(not all) programs are pure in this environment, yes.  But that has
nothing to do with being able to use a pointer to a function.  The code
that is executed is actually in the shared text region; it is only the
pointer to the function that is in the data area.

-------
Disclaimer: I hereby disclaim all my debts.
------

Jack Harkins @ AT&T Bell Labs
Princeton Information
(201) 949-3618
(201) 561-3370
houxf!jph

kwh@bentley.UUCP (KW Heuer) (02/26/86)

[ bu-cs!bzs (Barry Shein) ]
>> >declare a table something like:   int (*funtable[MAXFNS])() ;
>> >and just malloc the storage for the generated code.  ...  It should be
>> >quite portable (code generator aside) and is legal C.

[ bentley!kwh (Karl Heuer) ]
>> Well, some compilers will dislike the attempt to cast a (char *) into
>> a (int (*)()) ; in fact I think some will call it an outright error

[ houxf!jph (Jack Harkins) ]
>Huh???  This line of code DOES NOT cast a char * into an int....
>I have programs that use pointers to functions, some that run on 3B2/5's.

Sorry, you seem to have lost the context.  The original poster wanted to
malloc space for the CODE ITSELF, not the pointer table; i.e. do something
like
	funtable[0] = (int (*)())malloc(codesize);
and this line _does_ cast a (char *) (which is what malloc() returns)
into a function pointer.  (Actually a more likely sequence is
	char *s = malloc(codesize);
	s[0] = CLRW;  s[1] = R0;
	funtable[0] = (int (*)())s;
	n = (*funtable[0])();
or something like that.)

[ bentley!kwh (Karl Heuer) ]
>> because all programs are pure -- you can't goto/call data space, nor
>> can you read from the instruction stream.  Some sort of chastity belt
>> in the hardware, I think.

[ houxf!jph (Jack Harkins) ]
>Whazat??
>
>MOST(not all) programs are pure in this environment, yes.  But that has
>nothing to do with being able to use a pointer to a function.  The code
>that is executed is actually in the shared text region; it is only the
>pointer to the function that is in the data area.

When I said "all programs are pure", I meant that on the 3b2 it is _not_
_possible_ to write an impure program (as far as I can determine).  The
code fragment above can be made to work on a VAX (even without "ld -N"),
but on the 3b2 it dies with a bus error.  I hope I've cleared this up.

To any would-be flamers: the alignment is appropriate; the bus error
occurs on the CALL instruction; don't flame me about what I'm "clearly"
doing wrong unless you can demonstrate a way to do it right.  On a 3b2.
I've already checked things pretty carefully, including the source code
in the kernel.

chris@umcp-cs.UUCP (Chris Torek) (02/27/86)

In article <1067@houxf.UUCP> jph@houxf.UUCP (Jack Harkins)

 . . . responds to the following from Karl Heuer, who was quoting Barry Shein:

>>declare a table something like:   int (*funtable[MAXFNS])() ;
>>and just malloc the storage for the generated code.  ...  It should be
>>quite portable (code generator aside) and is legal C.
> 
>Well, some compilers will dislike the attempt to cast a (char *) into
>a (int (*)()) ; in fact I think some will call it an outright error

Jack Harkins says:

>This line of code DOES NOT cast a char * into an int. It is declaring
>that funtable is an array of MAXFNS elements, each of which is a pointer
>to a function that returns a int value.

You are both right.  It is obvious that the line Jack refers to is:

	int (*funtable[MAXFNS])();

while the code Barry refers to is:

	char *malloc();

	funtable[n] = (int (*)()) malloc(codesize);

(which does not appear in the quoted text, but is implied nonetheless.)

So when Karl Heuer says:

> it is _not_ portable to the 3b2,

he is correct: you cannot invoke the allocated function without
turning it into `code' first, for the hardware will not execute
`data'; and when Jack says:

>I have programs that use pointers to functions, some that run on 3B2/5's.
>The construct is totally(no flames, please) portable.

he is also correct: pointers to functions are portable.  It is this
specific usage---allocate data, fill with code, call data area as
function---that is not.

Hoping this will forestall further confusion,
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

aaz@pucc-j (Marc Mengel) (02/27/86)

	As far as I know, the following code is legal, and it works on
    all the machines I have ever used.  It is not necessarily portable
    everywhere, since some machines may not like executing in the
    data segment, but then again, if you are putting machine code in
    an array and executing it, it isn't portable code in *any* case.

char foo[BIGNUM];
main()
{
        int result;

	/* code to put machine code into foo[] */

	result = (* (int (*)()) foo)();
}
-- 
					Marc Mengel
Uucp: { decvax, icalqa, ihnp4, inuxc, sequent, uiucdcs  }!pur-ee!pucc-j!aaz
     { decwrl, hplabs, icase, psuvax1, siemens, ucbvax }!purdue!pucc-j!aaz

USnail: 910 N. 9th street
	Lafayette IN 47904

larry@cca.UUCP (Laurence Schmitt) (02/28/86)

> 
> 2)  Is there a machine independent way to coerce non-pointer-
>     to-function values to pointer-to-function values?
> 

Considering that the program in question would be generating machine code
on the fly, an extremely machine *dependent* operation, it seems curious to
complain that the jump operation itself cannot be made machine independent! :-)

-- 
Larry Schmitt 			Computer Corporation of America
larry@cca 			4 Cambridge Center
decvax!cca!larry 		Cambridge, MA 02142
				(617)-492-8860

nather@utastro.UUCP (Ed Nather) (03/01/86)

In article <1067@houxf.UUCP>, jph@houxf.UUCP (J.HARKINS) writes:
> 
> BOLDERDASH!!!
> 
It is a thrilling thing to see a new and needed word enter the language.
I assume it means the dash has been overstruck ...

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.UTEXAS.EDU

henry@utzoo.UUCP (Henry Spencer) (03/02/86)

> 2)  Is there a machine independent way to coerce non-pointer-
>     to-function values to pointer-to-function values?

No, because on some machines they are very different animals:  a pointer
to a function is not necessarily a pointer to the bytes comprising its
code.  Some machines want a rather more elaborate structure, in which a
pointer to a function is a module identifier and a function-within-module
identifier, and there is extra information somewhere that allows the
machine to interpret this.  Which leads to...

> ...Further, many machines (for instance, the VAX)
> insist on particular prologues and epilogues for procedures
> which I have no interest in and do not wish to generate
> code for.

If you want to treat something as a function, you *must* observe the
conventions that your machine (and your compiler) want to see.  There
is no portable way around this.  In fact, there's no entirely portable
way to do what you want at all, because the basic nature of the conventions
(never mind the details of them!) is machine-dependent.  For example,
machines that use a module+function form of function pointer will need
some sort of module dictionary somewhere, which you're going to have to
build.  Some machines won't let you do what you want at all, in fact,
because on them, code is code and data is data and never the twain shall
meet.  (Examples:  a pdp11 running split-space; a segmented machine that
makes a distinction between code and data segments.)
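To make Henry's first point concrete, here is a purely software model of a descriptor-style function pointer: a module number plus an index, resolved through a module dictionary. The layout is hypothetical and does not describe any particular machine. The point is that a value like {1, 0} names a function without being the address of any code, so casting the address of a data array cannot produce one.

```c
#include <assert.h>
#include <stddef.h>

/* A software model (hypothetical, not any real machine's format) of a
   descriptor-style function "pointer": a module number plus an index
   within that module, rather than the address of the code's bytes. */
struct fdesc {
    unsigned module;
    unsigned index;
};

typedef int (*entry_fn)(int);

int fn_inc(int x) { return x + 1; }
int fn_dec(int x) { return x - 1; }
int fn_sq(int x)  { return x * x; }

/* The "module dictionary" consulted to resolve a descriptor. */
static const entry_fn module0[] = { fn_inc, fn_dec };
static const entry_fn module1[] = { fn_sq };

static const struct {
    const entry_fn *entries;
    size_t count;
} modules[] = {
    { module0, sizeof module0 / sizeof module0[0] },
    { module1, sizeof module1 / sizeof module1[0] },
};

/* Resolve d through the dictionary and call the function it names. */
int call_desc(struct fdesc d, int arg)
{
    assert(d.module < sizeof modules / sizeof modules[0]);
    assert(d.index < modules[d.module].count);
    return (*modules[d.module].entries[d.index])(arg);
}
```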
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

jph@houxf.UUCP (J.HARKINS) (03/02/86)

>> 
>> BOLDERDASH!!!
>> 
>It is a thrilling thing to see a new and needed word enter the language.
>I assume it means the dash has been overstruck ...

Actually, it refers to spur of the moment sporting events held regularly
in areas subject to rockslides:-)

jph@houxf.UUCP (J.HARKINS) (03/02/86)

Sorry for the delay, haven't read news in a week ...

>and just malloc the storage for the generated code.  ...  It should be

This is the line I missed in the original posting, thus the motivation
for my reply to Karl Heuer's posting was off the mark.  Sorry about that Karl,
and everyone else who has charred the Receive Data line on my modem for it.

>Huh???  This line of code DOES NOT cast a char * into an int....

I should have said (int (*)()) instead of int, and was referring to the
declaration, not the attempt to assign code to a variable.  As I said above, 
I conveniently missed the original point, trying to dynamically generate code,
which is indeed illegal for separate I&D.

So, in the words of Gilda Radner: "Never mind".

Consider this posting my reply to future flames on my original reply.

gwyn@BRL.ARPA (VLD/VMB) (03/03/86)

I can think of two legal ways to implement the
	goto jumptab[i];
idea in C, assuming you need the flexibility of reassigning
the destinations for each i:

(1)  Use an array of jmpbufs, do something to get them initialized
using setjmp (that's the hard part), and longjmp to the correct
jmpbuf array member.

(2)  Use an array of function pointers, initialize them as desired,
and call via the appropriate function array member (watch out that
you don't keep recursing deeper and deeper; it's probably best to
have a common return from the functions).
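A sketch of option (1), assuming both destinations live in the same function: each setjmp is executed once to fill in its jmp_buf, and longjmp then acts as the indexed goto. This only works while that function is still active, since a longjmp to a jmp_buf whose function has already returned is undefined behavior, which is part of why initialization is "the hard part". The names destinations and jump_via_longjmp are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

/* One jmp_buf per destination, standing in for an array of labels. */
static jmp_buf destinations[2];

/* Jump to destination i (0 or 1) and return the value computed there. */
int jump_via_longjmp(int i)
{
    assert(i == 0 || i == 1);

    /* Each setjmp returns 0 when first executed, which records the spot;
       it returns again, nonzero, when longjmp targets that jmp_buf. */
    if (setjmp(destinations[0]) != 0)
        return 100;                    /* destination 0 */
    if (setjmp(destinations[1]) != 0)
        return 101;                    /* destination 1 */

    longjmp(destinations[i], 1);       /* plays the role of "goto jumptab[i];" */
    return -1;                         /* not reached */
}
```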

I suspect that if we knew your intended application, better
solutions would be possible.  Just what do you think you need
a jump table for?