[comp.lang.c] Variable function names

throopw@xyzzy.UUCP (Wayne A. Throop) (12/08/87)

> rustcat@russell.STANFORD.EDU (Vallury Prabhakar)
> Is there an equivalent in C for the "funcall" utility (in Lisp)?

Not in general, but in the example below, there is hope.

Something to keep in mind if you already know LISP and are trying to
learn C: use pointers.  The things that LISP does naturally and easily
often correspond to the use of pointers.  The problem posed is a case
in point:

> I would like to have a portion of code, that compares a specific string
> with a predefined string, and if true, then call the corresponding function.
> {
>    static char *LIST[] = {"First", "Second", "Third"};
>    int First(), Second(), Third();
>    int i;
> 
> 	for (i = 0; i < 3; i ++)	{
> 	  if (strcmp (<input-string>, LIST[i]) == 0)  {
> 		(...statement that can call the corresponding routine...)
> 	  }
> 	}
> }

Easy enough.  You can either have a "parallel" array of pointers to your
functions, or (more to my taste) you can have a list of name-function
"dotted pairs" (well, not really dotted pairs, but a struct with two
"cells", much like a cons, but I digress), like so:

int invoke_named_function( name )
    char *name;
{
    int strcmp();
    int First(), Second(), Third();
    static struct { char *name; int (*function)(); } table[] = {
        { "First",      First },
        { "Second",     Second },
        { "Third",      Third }
    };
    int i;
    for( i=0; i<(sizeof(table)/sizeof(table[0])); ++i ){
        if( strcmp( name, table[i].name ) == 0 ){
            return( (*(table[i].function))() );
        }
    }
    return( -1 ); /* or some other error indication */
}

You can simply add as many entries to the table as you like.

Note that the (*(table[i].function))() used to invoke the function can,
in many compilers (and under the draft ANSI standard), be replaced by the
simpler table[i].function().
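For comparison, here is a minimal sketch of the same name/function table written in later ANSI/ISO C style, with prototypes and `const` strings. The bodies of `First`, `Second`, and `Third` are hypothetical stand-ins that just return distinct values, -1 is kept as the arbitrary "not found" value, and the call uses the simpler `table[i].function()` form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical target functions; each just returns a distinct value. */
static int First(void)  { return 1; }
static int Second(void) { return 2; }
static int Third(void)  { return 3; }

/* One "cell" pairs a name with the function to call for it. */
struct named_function {
    const char *name;
    int (*function)(void);
};

static const struct named_function table[] = {
    { "First",  First  },
    { "Second", Second },
    { "Third",  Third  },
};

/* Look NAME up in the table and call the matching function.
   Returns -1 (an arbitrary error value) if no entry matches. */
int invoke_named_function(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(name, table[i].name) == 0)
            return table[i].function();   /* same as (*table[i].function)() */
    }
    return -1;
}
```

Adding a command is still just adding one line to `table`.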

--
Everyone I know is having a more productive crisis than I am.
                                        --- Cathy
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

mcdonald@uxe.cso.uiuc.edu (12/14/87)

>> 
>> Is there an equivalent in C for the "funcall" utility (in Lisp)?
>> I wasn't able to find any information about variable functions that can
>> be called in any of the C manuals.  I'd appreciate any pointers/advice/
>> alternatives on how to go about doing this.
>> 
>> 						-- Vallury

>C will allow you to refer to previously-compiled functions indirectly
>(through function pointers), but the feature you're talking about can
>only be implemented in an interpretive language like Lisp.  Lisp will
>interpret commands as it encounters them, and hence a Lisp object can
>be used both as code and data.  For example, if you define a list in
      (section omitted)

>This will not work in C or in any compiled language such as Pascal,
>FORTRAN, COBOL, etc. etc., as these compile their code and keep their
>data separate from the code.

>Now, you could write, in C, a program that implements an interpreter
>that will take in character strings and execute them per some set of
>rules.  In fact, there are several implementations of Lisp that are
>written in C.  The unix shells are also examples of programs that are
>implemented in C and that are interpretive.


>So the point here is that in C, you can't do exactly what you described, and
>this is due to the fact that C is compiled, not interpreted.

Apparently C does not allow this as a general, required, feature. However,
I learned C with the explicit assumption that, since it allowed both
data and function pointers, one could take an array, write a program that
generated compiled code in that array, cast the address of (the first
byte of) the array to a function pointer, and call that function. 
     I have since learned that on some systems (e.g. 80286 Xenix in anything
except single-segment model) it won't work, and that ANSI C does not
require it to work (A FATAL FLAW). BUT, on all the machines that I regularly
use, including the VAX/VMS, PDP-11/Decus-C, IBM-PC/DOS, and IBM-PC/OS2,
it does work (although on OS2 it requires a little help from a system call.)
     Just because a language is normally compiled, as C is, does not mean
that you can't write an interpreter in it (for the language of your choice),
or, as I have done, an incremental compiler. It's just that on your
particular system, the system implementer has stupidly prevented you from
executing the code you generate. Any system in which it is impossible to
write something as obvious as an incremental compiler is terminally
brain-damaged. (Actually, in most cases it can be done with the aid of
assembly language.)
     Would someone explain to me why a system would prevent you from doing
this? You couldn't have a TurboPascal or even a Forth! (I wouldn't object
if you had to explicitly declare that a block of memory would be both
data and code.)

Doug McDonald

P.S. Please don't think I've done anything as ambitious as write a complete
language compiler. My little affair is a simple incremental compiler for
arithmetic expressions only. I first tried an interpreter, but the compiler
was faster by about a factor of 75.

kers@otter.HP.COM (Christopher Dollin) (12/15/87)

Duh ...

> This is an example of how, in Lisp, the "mylist" object can be used
> either as code or as data, and again, this is because Lisp is
> interpretive.
> Even in the case of Lisp "compilers", the compiled code has an interpreter
> built in.
>
> So the point here is that in C, you can't do exactly what you described, and
> this is due to the fact that C is compiled, not interpreted.

False. There exists at least one Lisp compiler, namely the Poplog Common Lisp
compiler, which has NO interpreter built in, not any, not even a bit.

[There are probably lots more, but that's the one I use the most].

The reason you can't do it (without work) in C is because C isn't designed
that way. That's all. Compilation has nowt to do with it, although 
INCREMENTALITY (being able to compile before all the text is available) may 
have.


Regards,
Kers                                    | "Why Lisp if you can talk Poperly?"

pardo@uw-june.UUCP (David Keppel) (12/16/87)

[ why don't all machines let us execute code stored in an array ]

Some machines make optimizations about where the executable code and
data live, and trying to execute code from the data region breaks some
of those optimizations.  [OK, not many, but a *few*]


    ;-D on  ("And, as a consolation prize, you get Rice-A-Roni")  Pardo

    "You can do anything you want, if you don't mind slow"

rwa@auvax.UUCP (Ross Alexander) (12/17/87)

In article <3826@uw-june.UUCP>, pardo@uw-june.UUCP (David Keppel) writes:
> [ why don't all machines let us execute code stored in an array ]

Ouch!  Are people _still_ trying to do this sort of stuff?  In my undergrad
days, it was a well-known kluge that Honeywell Fortran (on the 6050) wasn't
too fussy about how you referenced external objects.  This let one declare an
integer array, stuff 'magic constants' == 'machine instructions' into it,
and then call it.  It did have its uses, but the whole idea makes me cringe
in retrospect.  It was poor practise then, and inexcusable now.

--
Ross Alexander @ Athabasca University,
alberta!auvax!rwa

dag@chinet.UUCP (Daniel A. Glasser) (12/17/87)

In article <3826@uw-june.UUCP> pardo@uw-june.UUCP (David Keppel) writes:
>[ why don't all machines let us execute code stored in an array ]
>
>Some machines make optimizations about where the executable code and
>data live, and trying to execute code from the data region breaks some
>of those optimizations.  [OK, not many, but a *few*]
>
[trailing stuff removed.]

To be more specific, many machines have separate instruction and data
address spaces, and unless they are mapped together, you cannot directly
execute the contents of arrays, since all instruction fetches will be
done from the instruction space and the array contents are stored in
the data address space.  The M68000 is capable of this separation, some
PDP-11's support I/D separation, the Intel 8086, Zilog Z8001/3, Intel 8051,
and many other "current" machines have this, but on systems where it is
supported you usually have the choice not to use it.

This particular discussion should move over to comp.arch, since it is
not a language issue.
-- 
					Daniel A. Glasser
					...!ihnp4!chinet!dag
					...!ihnp4!mwc!dag
					...!ihnp4!mwc!gorgon!dag
	One of those things that goes "BUMP!!! (ouch!)" in the night.

karl@haddock.ISC.COM (Karl Heuer) (12/17/87)

In article <47000025@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>[Re generating code at runtime: (*(int (*)())&array[0])()]
>I have since learned that on some systems ... it won't work, and that ANSI C
>does not require it to work (A FATAL FLAW). BUT, on all the machines that I
>regularly use ... it does work

So, it's a FATAL FLAW that ANSI C has to work on machines other than the ones
you regularly use?  Gimme a break.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

mcdonald@uxe.cso.uiuc.edu (12/18/87)

me:
>[Re generating code at runtime: (*(int (*)())&array[0])()]
>I have since learned that on some systems ... it won't work, and that ANSI C
>does not require it to work (A FATAL FLAW). BUT, on all the machines that I
>regularly use ... it does work

Karl W. Z. Heuer:

So, it's a FATAL FLAW that ANSI C has to work on machines other than the ones
you regularly use?  Gimme a break.

Me again:

Yes, I really mean it. I can't see why it cannot be specified to be generally
useful. I realize that, somewhere, there might be a machine that CAN'T,
really, truly CAN'T, do what I want, but I have never seen one described to
me. There are machines where the architecture or the operating system
makes it hard, or not the default, but not impossible. The language 
specifiers should put it in the language. Then, if a particular machine
simply can't do it,  their C compiler would be sold with an asterisk *


* due to the stupid blunder we made when we decided on this machine's 
architecture, it is impossible for us to allow you to write an incremental
compiler. Therefore, we are unable to produce a completely implemented
compiler. So sorry.


By fatal flaw, I mean that it is fatal to my programs. A substantial
fraction of the programs I have written for the IBM-PC are in fact
dependent on being incremental compilers. It is also fatal to the claim
that C is a general purpose language. It wouldn't matter in special-
purpose languages like Fortran or Cobol. Again, I would be perfectly
happy if you had to declare specifically that you wanted code and
data to be co-accessible. Could you give me an example of a machine
where this would be impossible? (perhaps some hard-wired Lisp processor?)

Doug McDonald

gwyn@brl-smoke.ARPA (Doug Gwyn ) (12/20/87)

In article <47000027@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>I realize that, somewhere, there might be a machine that CAN'T,

Separation of Instruction and Data space is enforced by a large number of
modern computer systems.  It is generally considered to be a Good Thing,
since it prevents program errors from being quite as disastrous as they
might be were valid code to be overwritten with random data while running.

I won't give you a list of such systems, as there doesn't seem to be any
point in an enumeration.

It is possible to implement a fairly fast interpretive language within
this architectural constraint.  I have implemented a few of these, and
you can find an example in Kernighan & Pike's "The UNIX Programming
Environment" (the chapter on the "hoc" language).
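The following is a minimal sketch of that approach, loosely in the spirit of the hoc machine but not taken from the book: the compiled "program" is an ordinary data array of cells, each holding either a pointer to an already-compiled C function or an inline operand, and a small loop calls the functions in turn. Nothing is ever written into instruction space, and no data pointer is ever converted to a function pointer. All names here (`Cell`, `op_push`, `emit_op`, `run`, and so on) are hypothetical.

```c
#include <stddef.h>

typedef void (*Op)(void);

/* A program cell holds either an operation or an inline operand. */
typedef union {
    Op     op;
    double value;
} Cell;

/* No bounds checking anywhere in this sketch. */
static Cell        prog[100];   /* the "compiled" program: plain data */
static size_t      nprog;       /* cells emitted so far */
static double      stack[50];
static size_t      sp;          /* number of values on the stack */
static const Cell *pc;          /* next cell to execute */

/* Operations are ordinary compiled functions, selected at run time. */
static void op_push(void) { stack[sp++] = (pc++)->value; }  /* reads inline operand */
static void op_add(void)  { sp--; stack[sp - 1] += stack[sp]; }
static void op_mul(void)  { sp--; stack[sp - 1] *= stack[sp]; }

/* "Code generation" just appends cells to the data array. */
void emit_op(Op op)       { prog[nprog++].op = op; }
void emit_value(double v) { prog[nprog++].value = v; }

/* Inner interpreter: fetch an op, advance, call it; stop at a NULL op. */
double run(void)
{
    sp = 0;
    pc = prog;
    while (pc->op != NULL) {
        Op op = pc->op;
        pc++;              /* advance first so op_push can read its operand */
        op();
    }
    return stack[0];
}
```

Emitting `op_push, 2, op_push, 3, op_add, op_push, 4, op_mul` followed by a NULL terminator and then calling `run()` evaluates (2 + 3) * 4.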

chip@ateng.UUCP (Chip Salzenberg) (12/22/87)

In article <47000027@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>>[Re generating code at runtime: (*(int (*)())&array[0])()]
>Yes, I really mean it. I can't see why it cannot be specified to be generally
>useful. I realize that, somewhere, there might be a machine that CAN'T,
>really, truly CAN'T, do what I want, but I have never seen one described to 
>me.

The iAPX 286 can't do this in protected mode, which is the only useful
mode for multitasking OS's.  (Executable segments are not writable.)

Look, if you're on a machine where executing data is allowed, you can just
cast the data address to a function pointer and call the result:

	int (*func)();
	char array[100];
	...
	func = (int (*)()) array;
	(*func)();

And if the C compiler won't even do the cast, use a union.  As they say in
Rome:  Non perspiratum ("no sweat").

-- 
Chip Salzenberg         "chip@ateng.UUCP"  or  "{codas,uunet}!ateng!chip"
A T Engineering         My employer's opinions are not mine, but these are.
 "Gentlemen, your work today has been outstanding, and I intend to recommend
    you all for promotion -- in whatever fleet we end up serving."   - JTK

michael@stb.UUCP (Michael) (12/22/87)

(I came in late to this, so pardon any mis-understandings)
In article <47000025@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:

(a question about referring to data as code)

(A reply saying "Ok in lisp, (where code and data are interchangeable) but
not in C (where they are separate)")

(A reply saying "You can compile code into an array and then execute
from that array")

Unfortunately, this is not the same. In lisp, you can create a data
structure that looks like SOURCE code, and execute it. In C, you have
to write a machine dependent compiler subroutine to compile the code,
and then execute it.
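To make the contrast concrete, here is a hedged sketch of the nearest C analogue to evaluating a Lisp form: an expression tree built as ordinary data and walked by an evaluator that is itself compiled code. The `struct node` layout and `eval_node` are hypothetical. The tree is interpreted, not executed; turning it into real machine code would be the machine-dependent compiler step described above.

```c
#include <stddef.h>

/* A tiny S-expression-like node: either a number or (op left right). */
struct node {
    char op;                          /* '+', '-', '*', or 0 for a number */
    double value;                     /* used only when op == 0 */
    const struct node *left, *right;  /* operands when op != 0 */
};

/* Recursively evaluate the tree, much as a Lisp eval walks a list. */
double eval_node(const struct node *n)
{
    double a, b;

    if (n->op == 0)
        return n->value;
    a = eval_node(n->left);
    b = eval_node(n->right);
    switch (n->op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return 0.0;             /* unknown operator: arbitrary choice */
    }
}
```

Building `(* (- 7 2) 3)` out of five nodes and passing the root to `eval_node` yields 15.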

-- 
: Michael Gersten		ihnp4!hermix!ucla-an!remsit!stb!michael
:				sdcrdcf!trwrb!scgvaxd!stb!michael
: "Copy Protection? Just say 'Off site backup'. "

karl@haddock.ISC.COM (Karl Heuer) (12/22/87)

In article <47000027@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>[Re generating code at runtime: (*(int (*)())&array[0])()]
>Yes, I really mean it. I can't see why it cannot be specified to be generally
>useful. ... The language specifiers should put it in the language.  Then, if
>a particular machine simply can't do it, their C compiler would be sold with
>[a disclaimer that it isn't full ANSI].

Certainly it's a useful option when the architecture and the operating system
can support it.  It has already been mentioned that this isn't always the
case, so I won't harp on that.  But even if ANSI were only concerned with such
"reasonable" architectures, it is clearly beyond their jurisdiction to try to
specify anything about the result of converting a data pointer to a code
pointer or vice versa.

However, I really don't think you have anything to worry about.  Most likely,
the C compilers on "reasonable" architectures will continue to support such
intersegment casts as a Common Extension.

There is some analogy with the casting of pointer to integer.  Have you seen
what the dpANS says (and doesn't say) about that?
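For comparison, the later C99 standard settled the pointer-to-integer case this way: a `void *` converted to the optional type `uintptr_t` and back compares equal to the original, the integer value itself is implementation-defined, and no such guarantee is given for function pointers. A minimal sketch (the helper name `pointer_round_trips` is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an object pointer to an integer and back. For void * and the
   optional type uintptr_t, C99 guarantees the result compares equal to
   the original; function pointers are not covered by this guarantee. */
int pointer_round_trips(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* implementation-defined value */
    void *back = (void *)bits;

    return back == p;
}
```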

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

aglew@ccvaxa.UUCP (12/24/87)

..> Executing data.

There are many architectures where executing data cannot be done 
easily. But, since the loader (getxfile) has to read in data and
execute it at some point, a similar facility should be provided 
to the user (without having to play tricks with temporary files).

In this posting, I discuss why casting an array to a function pointer
is NOT the way, I discuss the main architectural impediments,
and I suggest an interface for converting data to
code that might be portable to a lot of architectures (UNIX would
have to be modified).

Casts are not the way
---------------------

Casts, however, are *NOT* the way. Using a cast to function makes it 
much too tempting to do little bit twiddles, like changing an ADD to
a MULTIPLY, while you are executing the code. Almost *NO* modern 
architectures permit this sort of thing to be done safely, without 
some sort of possibly expensive synchronization of the instruction
prefetch buffer and memory. This would create hell for an optimizing 
compiler.

Casts are also inappropriate because there are many architectures
where I and D are separate. You can't make D into I. You probably can,
however, copy between the two.

Finally, casts are inappropriate because they do not indicate HOW MUCH
data is going to be made into code. Knowing how much is important,
because, as mentioned above, systems with separate I and D, where 
movement is permitted between the two, have to do some sort of
synchronization - and the synchronization may be made more efficient
if the amount of data is known (page flush instead of entire cache flush).


Architectural Impediments to Data->Code
---------------------------------------

As discussed above, duty cycle - many architectures cannot execute 
writes into the instruction stream immediately. Some form of synchronization
must be done so that data written can be made into code.

Instructions and data may be truly in different spaces. However, it may
be possible to copy between them.

Entry point registry: Advanced architectures may register entry points
for security reasons.



Examples of Applications That Can Use Data->Code Conversion
-----------------------------------------------------------

Incremental compilers: although these are inherently machine dependent,
 	you can isolate much of the dependence in per-machine files.
	It is obviously desirable to be able to compile without going
	through headstands to read the compiled code from a file.
	(A standard routine "Compile converting string to format obviously
	is useful).

Numerical Work: many large numerical packages actually used to compile
	and load parts of their algorithms for efficiency. Interpretation
	isn't even in the ballpark, and even running compiled code 
	with ifs is too expensive...

Overlay Systems: there are still some systems with small address spaces.



Almost-Portable Interface for Data->Code Conversion
---------------------------------------------------

Completely separate I/D spaces
    Data Type
	Since some systems have truly disjoint I/D spaces, it is necessary
	to have a data type that is "uninitialized code".

	Suggested syntax:
		int f()[SIZE]
	where SIZE is in the same units as sizeof(). This is not to imply
	that code is measured in bytes; it is just to facilitate the 
	description of sizes.

	Providing a prototype at declaration time may be appropriate for
	architectures that do entry point control.

    Dynamic Allocation
		funcptr = codealloc(size);
	This loses in that C doesn't have a "mode" type; but, it'll handle
	most architectures.

    Movement Between Data and Code Spaces
		codecpy( (char *)frombuf, (int (*)())tofunc, SIZE)
	(i) Is legal only to correctly sized function buffers. Otherwise
	    undefined.
	(ii) Gives you a locus for doing all the sorts of synchronization
	     that your architecture requires.
	(iii) Identifies tofunc as an entry point to the machine.
	And, obviously, you would want a vice versa function.

    Simple interface
	The above lets you explicitly manage the code address space at a 
	high level. A simpler interface might be:

		funcptr = mkexecutable((char*)buf,size)

	where you basically say that it is not safe to modify buf while
	funcptr may be executing.
		In this case, it would be possible for funcptr to be
        simply a cast, but mkexecutable() might very well manage the I
	address space, allocate, and copy, returning a pointer to the
	newly allocated space.

I think that an interface like this would be portable to many systems.
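As one possible realization on a present-day POSIX-like system, here is a minimal sketch of the proposed `mkexecutable(buf, size)`. The name comes from the proposal above and is not a standard function. The sketch assumes `mmap` with anonymous mappings and `mprotect` are available and that the system permits executable mappings at all. It copies the bytes into fresh pages, switches them from writable to executable, and uses the GCC/Clang builtin `__builtin___clear_cache` as the synchronization point where that builtin exists. Converting the resulting `void *` to a function pointer is the same non-ISO conversion debated in this thread; POSIX requires it to work.

```c
#define _DEFAULT_SOURCE             /* expose MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON      /* older BSD spelling */
#endif

typedef int (*int_func)(void);

/* Hypothetical mkexecutable(): copy SIZE bytes of machine code from BUF
   into newly mapped pages, make them read+execute (not writable), and
   return an entry point, or NULL if the system refuses. BUF may be reused
   afterwards. The mapping is never released in this sketch. */
int_func mkexecutable(const void *buf, size_t size)
{
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, buf, size);

    /* Drop write permission before allowing execution. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }

#if defined(__GNUC__)
    /* Synchronize instruction and data caches for this range: a no-op
       on x86, but needed on architectures such as ARM. */
    __builtin___clear_cache((char *)mem, (char *)mem + size);
#endif

    /* Data pointer -> function pointer: not ISO C, but required by POSIX. */
    return (int_func)mem;
}
```

On x86 or x86-64, for example, the six bytes `B8 2A 00 00 00 C3` (`mov eax, 42` followed by `ret`) passed to this `mkexecutable` yield a function that returns 42. On other architectures the bytes differ, which is exactly the machine dependence discussed in this thread.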



Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa
   
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.

jimp@cognos.uucp (Jim Patterson) (12/28/87)

In article <47000025@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>I learned C with the explicit assumption that, since it allowed both
>data and function pointers, one could take an array, write a program that
>generated compiled code in that array, cast the address of (the first
>byte of) the array to a function pointer, and call that function. 
>     I have since learned that on some systems (i.e. 80286 Xenix in anything
>except single-segment model) it won't work, and that ANSI C does not
>require it to work (A FATAL FLAW).

This is hardly a FATAL FLAW, since it's an unusual application that would
depend on such an ability. It's also a stance that is consistent with
the ANSI C objective of standardizing a version of C that is widely
implementable. Executing code in data space is extremely difficult on
some architectures, for example the HP/3000 which has completely
segregated code and data spaces. 
-- 
Jim Patterson                              Cognos Incorporated
UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707    
PHONE:(613)738-1440                        3755 Riverside Drive
                                           Ottawa, Ont  K1G 3Z4

ljz@fxgrp.UUCP (Lloyd Zusman, Master Byte Software) (01/06/88)

In article <47000025@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
> ...
>I learned C with the explicit assumption that, since it allowed both
>data and function pointers, one could take an array, write a program that
>generated compiled code in that array, cast the address of (the first
>byte of) the array to a function pointer, and call that function. 
>     I have since learned that on some systems (i.e. 80286 Xenix in anything
>except single-segment model) it won't work, and that ANSI C does not
>require it to work (A FATAL FLAW).
> ...

I agree with some of the others here that this is anything but "A FATAL
FLAW".  It's hardly a flaw that a language designed to be compiled does
not guarantee machine-dependent side effects that happen, on some
architectures, to let it be used in an interpretive manner.  Since those
side effects don't work on all architectures, they are not required
features of the language.

Back in the days of FORTRAN, I wrote some code that would go through
memory and find the FORMAT statement strings, and that would alter
them at run-time.  This gave me execution-time-modifiable FORMAT
statements which, although useful, are not a feature of the language.
This code I wrote only worked for a particular FORTRAN implementation
on a particular operating system on a particular machine.  Putting
machine code into a block of memory at run-time and executing it is a
slick feature, but it's foolish to expect such a thing to be portable
...  or to even be possible at all on some architectures, as some
people here have already pointed out.  Making this a feature of a
language one of whose prime features is portability would be silly, in
my opinion.

Sure, if your operating system and machine architecture allow this to
be done, and if you don't care whether your code is portable, then by
all means do it if you must.  But don't expect someone to design C to
make it easy on you.

-------------------------------------------------------------------------
 Lloyd Zusman
 Master Byte Software
 Los Gatos, California	    	    	Internet:   fxgrp!ljz@ames.arpa
 "We take things well in hand."	    	UUCP:	    ...!ames!fxgrp!ljz