[comp.compilers] Multi-compilers

markh@csd4.csd.uwm.edu (Mark William Hopkins) (09/11/90)

   Recently, an interesting idea has come to mind for a new kind of compiler:
a Multi-Compiler.  What makes it different from your typical compiler is that
it accepts code from more than one source language: many source languages in
fact.

   However, it's an idea that is easier said than conceived.  What would it
look like?  The whole issue seems to revolve around this concept (which I
borrow from linguistics) of 'code-switching'.

   Code-switching is where a multi-lingual speaker switches from one language
to another, often in mid-sentence.  For instance, while waiting for a
departure from an airport in Budapest, I got in a conversation with an East
German traveller.  However, my German was weak, his English was nonexistent,
and our Hungarian was not very strong.  So we found it necessary to literally
sprinkle our conversations with almost random switching between German and
Hungarian.  Each language offered something which compensated for something
lacking in (our knowledge of) the other.

    A good programmer will also face the same kind of dilemma.  Different
languages are designed to do different things better.  An extreme example is
the case of writing a truly practical AI control program which would ideally
handle all the intelligent rule-based tasks in Prolog, and all the
event-driven tasks in assembly and C, and maybe even the recognition and
learning tasks in the assembly of a special purpose neural net chip.

   The question, naturally, is: when are you allowed to code-switch?
Depending on how you answer this, you get either a closely integrated set of
*distinct* compilers (like the Quick series marketed by Microsoft) or a
truly integrated programmer's utility.
   
   If you force the "one-language-per-module" constraint, which a lot of
people I talked to about this seem to arrive at as a first idea, then you
have nothing more than a series of disjoint compilers integrated by a common
object code format and single linker.  In this case, it's "all in the
linker".

   But in that situation, there would remain the question: when you define a
module in language A, and use it in language B, which language do you declare
it in?  Declaring it in B potentially means a lot of redundant header files,
and declaring it in A means having to resolve the issue of how to interface
data types of different languages.  This could become very complicated if
your languages range from the highly imperative C to the highly
declarative Prolog.
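To make the data-type interfacing problem concrete: strings are a classic mismatch.  Many Pascal implementations (Turbo Pascal's short strings, for instance) store a string with a leading length byte, while C uses a NUL-terminated byte array.  A routine declared in one language and called from the other needs a conversion along the lines below.  This is a minimal sketch in Python, purely for illustration; it assumes a one-byte length prefix (so at most 255 characters), and the function names are hypothetical.

```python
def pascal_to_c(buf: bytes) -> bytes:
    """Convert a length-prefixed (Pascal-style) string to a NUL-terminated C string."""
    if not buf:
        raise ValueError("empty buffer: no length byte")
    length = buf[0]                 # assumed: first byte holds the length (0..255)
    body = buf[1:1 + length]
    if len(body) != length:
        raise ValueError("buffer shorter than its length prefix")
    if b"\x00" in body:
        # A C string cannot carry an embedded NUL; C code would see it truncated.
        raise ValueError("embedded NUL cannot be represented in a C string")
    return body + b"\x00"


def c_to_pascal(buf: bytes) -> bytes:
    """Convert a NUL-terminated C string to a length-prefixed string."""
    end = buf.find(b"\x00")
    if end < 0:
        raise ValueError("missing NUL terminator")
    body = buf[:end]                # anything after the terminator is ignored
    if len(body) > 255:
        raise ValueError("too long for a one-byte length prefix")
    return bytes([len(body)]) + body
```

Each direction can fail in a way the other representation cannot express (embedded NULs one way, strings over 255 bytes the other), which is exactly the kind of decision someone has to make when "declaring it in A."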

   If you allow for interlanguage mixing within modules, you will face a more
extreme version of the data-type interfacing problem, and possibly even a
control statement interfacing problem.  Here, the ideal solution seems to be
the "one-language-per-function" rule.  But in this case, it's "all in the
compiler", not the linker.

   Syntax is not an issue.  We're not talking about actually merging the
syntaxes of the source languages into one horrific construct (though that
would be an interesting problem to solve).  When you want your compiler to do
C, you issue a #in c directive.  When you want it to switch to Pascal, you
likewise issue a #in pascal directive, and so on...
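One way a front end might act on such directives: split the file at each `#in <language>` line and hand every segment to the parser registered for that language.  The directive syntax is from the post; the function names, the default language, and the stub front ends are assumptions made for this sketch.  A real multi-compiler would still have to merge the results into one symbol table, which is where the interfacing problems above come back.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches a directive line such as "#in c" or "#in pascal" (syntax from the post).
DIRECTIVE = re.compile(r"^\s*#in\s+(\w+)\s*$")


def split_by_language(source: str, default: str = "c") -> List[Tuple[str, str]]:
    """Split a mixed-language source file into (language, code) segments."""
    segments: List[Tuple[str, str]] = []
    lang = default                      # assumed default before any directive
    lines: List[str] = []
    for line in source.splitlines():
        m = DIRECTIVE.match(line)
        if m:
            if lines:                   # close the current segment before switching
                segments.append((lang, "\n".join(lines)))
            lang, lines = m.group(1).lower(), []
        else:
            lines.append(line)
    if lines:
        segments.append((lang, "\n".join(lines)))
    return segments


def compile_mixed(source: str,
                  front_ends: Dict[str, Callable[[str], str]]) -> List[str]:
    """Hand each segment to the (hypothetical) front end registered for its language."""
    out = []
    for lang, code in split_by_language(source):
        if lang not in front_ends:
            raise ValueError(f"no front end for language '{lang}'")
        out.append(front_ends[lang](code))
    return out
```

For example, `split_by_language("#in c\nint f(int x) { return x + 1; }\n#in pascal\nfunction g(x: integer): integer;")` yields one `("c", ...)` segment and one `("pascal", ...)` segment.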

   With this latter strategy (more than one language per file), the question of
which language to write external declarations in becomes moot: since it's all
"going down the same stomach" anyhow, it doesn't matter.

   The best strategy to pursue to minimize these problems seems to be to
simultaneously develop extensions of each language that are upwardly
compatible with the latest standard and which make these languages as much
alike as possible.  This means adding C/Pascal-like data structures and
control structures to the likes of FORTRAN or BASIC, for instance.

   It seems to me, though, that the huge investment in this effort would be
very much worth it, since no matter where I talk and who I talk to about
this, the idea goes over extremely well: it seems that we're talking about
the ultimate programmer's workbench with this kind of utility.

   But there's this one nagging issue: what would this give us that using a
series of compilers, like Microsoft's Quick series, with a good linker won't
already give us?
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

cik@l.cc.purdue.edu (Herman Rubin) (09/18/90)

In article <9009110403.AA03158@csd4.csd.uwm.edu>, markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:
>
>   Recently, an interesting idea has come to mind for a new kind of compiler:
>a Multi-Compiler.  What makes it different from your typical compiler is that
>it accepts code from more than one source language: many source languages in
>fact.
> 
>   However, it's an idea that is easier said than conceived.  What would it
>look like?  The whole issue seems to revolve around this concept (which I
>borrow from linguistics) of 'code-switching'.
>
>   Code-switching is where a multi-lingual speaker switches from one language
>to another, often in mid-sentence.  For instance, while waiting for a
>departure from an airport in Budapest, I got in a conversation with an East
>German traveller.  However, my German was weak, his English was nonexistent,
>and our Hungarian was not very strong.  So we found it necessary to literally
>sprinkle our conversations with almost random switching between German and
>Hungarian.  Each language offered something which compensated for something
>lacking in (our knowledge of) the other.

But you were not using specialized languages.  Any one of the languages would
have been quite adequate for the task.  It was knowledge in one language which
compensated for lack of knowledge of the other.  If the two of you both knew
Swahili, you would undoubtedly have used that for conversation.  This is not
the problem in computer languages.

>    A good programmer will also face the same kind of dilemma.  Different
>languages are designed to do different things better.  An extreme example is
>the case of writing a truly practical AI control program which would ideally
>handle all the intelligent rule-based tasks in Prolog, and all the
>event-driven tasks in assembly and C, and maybe even the recognition and
>learning tasks in the assembly of a special purpose neural net chip.

I disagree.  Apart from machine language, there is no language designed to
use the power of the computer.  Mathematicians, physicists, chemists, etc.,
have specialized vocabularies added to their "normal" languages, and would
not think of writing an article exclusively in their jargon.  They have also
had to interact with other jargons, and, unlike the present computer
languages, they make no attempt to preclude constructs.

Even Basic English, an attempt to produce an easily learned limited version 
of English, was not really that limited.  The main limitation was to replace
verbs and tenses by compound expressions using participles and auxiliaries.
But nothing was really left out.  Algol failed in its goal as an algorithmic
language by omitting even well-known algorithmic ideas, the most notable being
a replacement (assignment) statement that delivers a list of outputs.  This has
been discussed somewhat; such a list is not a struct, or a class in an
object-oriented language, but a temporary labeling.
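Some later languages approximate what is described here.  Python, used purely for illustration, allows an assignment with several targets: the right-hand side is evaluated completely, then its values are bound to the names on the left, and no named record type has to be declared.  Strictly speaking Python does build a tuple behind the scenes, so this is close to, but not exactly, a pure "temporary labeling."

```python
# One replacement statement, two outputs: quotient and remainder together.
q, r = divmod(17, 5)   # q == 3, r == 2

# Because the right side is evaluated before any target is assigned,
# two variables can be exchanged without an explicit temporary.
x, y = 1, 2
x, y = y, x            # x == 2, y == 1
```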

We really need the approach taken by the users of natural language, namely,
if it isn't there, put it in!  If it is clumsy, do it differently, although
this can be very hard to achieve.  Why can we not have an overloaded, weakly
typed, "natural" notation assembler, with a macro processor which can expand
things in the user's fashion?
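The macro-processor half of that wish can be sketched briefly.  Below, users define their own mnemonics as parameterized templates and the expander rewrites them into plain instructions, recursively, so macros may use other macros.  Everything here is an assumption for illustration: the instruction syntax, the `swap` macro, and the scratch register `tmp` are hypothetical, the sketch covers only macro expansion (not overloading or weak typing), and it has no guard against a macro that expands into itself.

```python
import re
from typing import Dict, List, Tuple

# A user-defined macro: (parameter names, body template lines).
Macro = Tuple[List[str], List[str]]


def expand(lines: List[str], macros: Dict[str, Macro]) -> List[str]:
    """Expand user macros in assembler-like source; macro names are lowercase keys."""
    out: List[str] = []
    for line in lines:
        parts = line.split(None, 1)          # mnemonic, then the operand text
        name = parts[0].lower() if parts else ""
        if name not in macros:
            out.append(line)                 # ordinary instruction: pass through
            continue
        params, body = macros[name]
        args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
        if len(args) != len(params):
            raise ValueError(f"{name}: expected {len(params)} args, got {len(args)}")
        binding = dict(zip(params, args))
        for tmpl in body:
            # Substitute whole-word parameter names; other words are left alone.
            text = re.sub(r"\w+", lambda m: binding.get(m.group(0), m.group(0)), tmpl)
            out.extend(expand([text], macros))   # allow macros inside macros
    return out


MACROS: Dict[str, Macro] = {
    "swap": (["a", "b"], ["mov tmp, a", "mov a, b", "mov b, tmp"]),
}
```

With these definitions, `expand(["swap r1, r2", "add r1, 1"], MACROS)` produces the three `mov` instructions followed by the untouched `add r1, 1`.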

>   The question, naturally, is: when are you allowed to code-switch?
>Depending on how you answer this, you get either a closely integrated set of
>*distinct* compilers (like the Quick series marketed by Microsoft) or a
>truly integrated programmer's utility.

Only the latter will do.  Having part of the program produced by one
language and part by another requires someone who knows the syntaxes of
both, and even the machine representations of both, to combine them.  In
some cases, this can be done by a super-linker, but not by an ordinary linker.
But does the super-linker even have the information?  We would have to augment
the symbol table by having the parameter structure for each call listed,
and also for each return.  The code would likely be clumsy and slow.
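A minimal sketch of that augmented symbol table: next to each exported symbol, record the machine-level representation of every parameter and every returned value, and have the super-linker check each call site against it before binding.  The representation tags (`f64`, `i32`, `ptr`), the module names, and the class and function names are assumptions for illustration; a real object format would encode this differently, and this sketch only detects mismatches rather than generating the conversion code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Signature:
    params: Tuple[str, ...]    # representation of each argument, in order
    returns: Tuple[str, ...]   # representation of each returned value


def check_calls(exports: Dict[str, Signature],
                calls: List[Tuple[str, str, Signature]]) -> List[str]:
    """Compare each call site's expected signature with the callee's exported one."""
    errors = []
    for caller_module, callee, expected in calls:
        actual = exports.get(callee)
        if actual is None:
            errors.append(f"{caller_module}: undefined symbol {callee}")
        elif actual != expected:
            errors.append(f"{caller_module}: {callee} declared {actual}, "
                          f"called as {expected}")
    return errors


# Hypothetical example: a routine returning two values, called from two modules.
EXPORTS = {"solve": Signature(("f64", "f64"), ("f64", "i32"))}
CALLS = [
    ("main.c", "solve", Signature(("f64", "f64"), ("f64", "i32"))),
    ("ui.pas", "solve", Signature(("f64",), ("f64",))),    # wrong arity both ways
    ("rules.pl", "unify", Signature(("ptr",), ("i32",))),  # no such export
]
```

Running `check_calls(EXPORTS, CALLS)` reports the `ui.pas` mismatch and the undefined `unify`, while the `main.c` call passes; an ordinary linker, which sees only symbol names, would accept all three.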

The language can be produced, and presumably compilers can be written.
But it must take the approach that everything is possible, and that 
utility is more important than esthetics.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)