markh@csd4.csd.uwm.edu (Mark William Hopkins) (09/11/90)
Recently, an interesting idea has come to mind for a new kind of compiler: a Multi-Compiler. What makes it different from your typical compiler is that it accepts code from more than one source language: many source languages, in fact.

However, it's an idea that is easier said than conceived. What would it look like? The whole issue seems to revolve around this concept (which I borrow from linguistics) of 'code-switching'.

Code-switching is where a multi-lingual speaker switches from one language to another, often in mid-sentence. For instance, while waiting for a departure from an airport in Budapest, I got into a conversation with an East German traveller. However, my German was weak, his English was nonexistent, and our Hungarian was not very strong. So we found it necessary to literally sprinkle our conversation with almost random switching between German and Hungarian. Each language offered something which compensated for something lacking in (our knowledge of) the other.

A good programmer will also face the same kind of dilemma. Different languages are designed to do different things better. An extreme example is the case of writing a truly practical AI control program which would ideally handle all the intelligent rule-based tasks in Prolog, all the event-driven tasks in assembly and C, and maybe even the recognition and learning tasks in the assembly language of a special-purpose neural net chip.

The question, naturally, is: when are you allowed to code-switch? Depending on how you answer this, you get either a closely integrated set of *distinct* compilers (like the Quick series marketed by Microsoft), or a truly integrated programmer's utility.

If you enforce the "one-language-per-module" constraint, which a lot of people I talked to about this seem to arrive at as a first idea, then you have nothing more than a series of disjoint compilers integrated by a common object code format and a single linker. In this case, it's "all in the linker".
But in that situation, there would remain the question: when you define a module in language A and use it in language B, which language do you declare it in? Declaring it in B potentially means a lot of redundant header files, and declaring it in A means having to resolve the issue of how to interface the data types of different languages. This could be very complicated if your languages range from the highly imperative C to the highly declarative Prolog.

If you allow for interlanguage mixing within modules, you will face a more extreme version of the data-type interfacing problem, and possibly even a control statement interfacing problem. Here, the ideal solution seems to be the "one-language-per-function" rule. But in this case, it's "all in the compiler", not the linker.

Syntax is not an issue. We're not talking about actually merging the syntaxes of the source languages into one horrific construct (though that would be an interesting problem to solve). When you want your compiler to do C, you issue a #in c directive. When you want it to switch to Pascal, you likewise issue a #in pascal directive, and so on...

With this latter strategy (more than one language per file), the issue of which language you write external declarations in becomes moot: since it's all "going down the same stomach" anyhow, it doesn't matter.

The best strategy to pursue to minimize these problems seems to be to simultaneously develop extensions of each language that are upwardly compatible with the latest standard and which make these languages as much alike as possible. This means adding C/Pascal-like data structures and control structures to the likes of FORTRAN or BASIC, for instance.

It seems to me, though, that the huge investment in this effort would be very much worth it, since no matter where I talk and who I talk to about this, the idea goes over extremely well: it seems that we're talking about the ultimate programmer's workbench with this kind of utility.
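To make the #in directive idea above a little more concrete, here is a minimal sketch in Python of what the front end's very first step might look like: cutting one source file into per-language segments. Everything in it is an assumption made for illustration. The post gives only the directive names "#in c" and "#in pascal"; the function name split_by_language, the default language, and the rule that a directive must sit on a line of its own are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical directive form, modelled on the "#in c" / "#in pascal"
# examples in the post: the directive is assumed to occupy a whole line.
DIRECTIVE = re.compile(r"^\s*#in\s+(\w+)\s*$")


def split_by_language(source: str, default: str = "c") -> List[Tuple[str, str]]:
    """Return (language, code) segments in the order they appear.

    Text before the first directive is assigned to `default`
    (an assumption; the post does not say what should happen there).
    """
    segments: List[Tuple[str, str]] = []
    language = default
    buffer: List[str] = []
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            # A directive closes the current segment and opens a new one.
            if buffer:
                segments.append((language, "\n".join(buffer)))
                buffer = []
            language = match.group(1).lower()
        else:
            buffer.append(line)
    if buffer:
        segments.append((language, "\n".join(buffer)))
    return segments


example = """#in c
int add(int a, int b) { return a + b; }
#in pascal
function Twice(x: integer): integer;
begin
  Twice := 2 * x
end;
"""

for language, code in split_by_language(example):
    print(language, "->", len(code.splitlines()), "line(s)")
```

Each segment would then be handed to the parser for its own language. The sketch does nothing about the hard part discussed above, namely making the data types and declarations of one segment visible to the next.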
But there's this one nagging issue: what would this give us that using a series of compilers, like Microsoft's Quick series, with a good linker won't already give us?
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
cik@l.cc.purdue.edu (Herman Rubin) (09/18/90)
In article <9009110403.AA03158@csd4.csd.uwm.edu>, markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:

> Recently, an interesting idea has come to mind for a new kind of compiler:
> a Multi-Compiler. What makes it different from your typical compiler is that
> it accepts code from more than one source language: many source languages in
> fact.
>
> However, it's an idea that is easier said than conceived. What would it
> look like? The whole issue seems to revolve around this concept (which I
> borrow from linguistics) of 'code-switching'.
>
> Code-switching is where a multi-lingual speaker switches from one language
> to another, often in mid-sentence. For instance, while waiting for a
> departure from an airport in Budapest, I got into a conversation with an East
> German traveller. However, my German was weak, his English was nonexistent,
> and our Hungarian was not very strong. So we found it necessary to literally
> sprinkle our conversation with almost random switching between German and
> Hungarian. Each language offered something which compensated for something
> lacking in (our knowledge of) the other.

But you were not using specialized languages. Any one of the languages would have been quite adequate for the task. It was knowledge in one language which compensated for lack of knowledge of the other. If the two of you both knew Swahili, you would undoubtedly have used that for conversation. This is not the problem in computer languages.

> A good programmer will also face the same kind of dilemma. Different
> languages are designed to do different things better. An extreme example is
> the case of writing a truly practical AI control program which would ideally
> handle all the intelligent rule-based tasks in Prolog, and all the
> event-driven tasks in assembly and C, and maybe even the recognition and
> learning tasks in the assembly of a special purpose neural net chip.

I disagree.
Apart from machine language, there is no language designed to use the power of the computer. Mathematicians, physicists, chemists, etc., have specialized vocabularies added to their "normal" languages, and would not think of writing an article exclusively in their jargon. Also, they have had to interact with other jargons, and there is no attempt to preclude constructs, as the present computer languages do.

Even Basic English, an attempt to produce an easily learned limited version of English, was not really that limited. The main limitation was to replace verbs and tenses by compound expressions using participles and auxiliaries. But nothing was really left out.

Algol failed in its goal as an algorithmic language by omitting even well-known algorithmic ideas, the most notable being the existence of a list of outputs from a replacement statement. This has been discussed somewhat; a list is not a struct, or a class in an object-oriented language, but a temporary labeling.

We really need the approach taken by the users of natural language, namely, if it isn't there, put it in! If it is clumsy, do it differently, although this can be very hard to achieve. Why can we not have an overloaded, weakly typed, "natural" notation assembler, with a macro processor which can expand things in the user's fashion?

> The question, naturally, is: when are you allowed to code-switch?
> Depending on how you answer this, you get either a closely integrated set of
> *distinct* compilers (like the Quick series marketed by Microsoft), or a
> truly integrated programmer's utility.

Only the latter will do. Having part of the program produced by one language and part by another requires combining them by someone knowing the syntaxes of both, and even the machine representations of both. In some cases, this can be done by a super-linker, but not by a linker. But does the super-linker even have the information?
We would have to augment the symbol table by having the parameter structure for each call listed, and also for each return. The code would likely be clumsy and slow.

The language can be produced, and presumably compilers can be written. But it must take the approach that everything is possible, and that utility is more important than esthetics.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)
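As a rough illustration of the augmented symbol table mentioned above, the following Python sketch records a parameter structure for each call and each return, and lets a hypothetical "super-linker" compare a call site against it. None of this comes from an existing linker. The names Signature and check_call, and the abstract type names such as "int32", are assumptions chosen only to make the idea concrete. Note that the return side is itself a list, so a routine with several outputs fits without being wrapped in a struct.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    """One augmented symbol-table entry (hypothetical layout)."""
    language: str              # language the routine was written in
    params: Tuple[str, ...]    # parameter structure for the call
    returns: Tuple[str, ...]   # parameter structure for the return


def check_call(table: Dict[str, Signature], name: str,
               arg_types: Sequence[str],
               result_types: Sequence[str]) -> List[str]:
    """Return a list of mismatches between a call site and the table.

    An empty list means the call agrees with the recorded signature.
    """
    signature = table.get(name)
    if signature is None:
        return [f"{name}: undefined symbol"]
    problems: List[str] = []
    if tuple(arg_types) != signature.params:
        problems.append(
            f"{name}: arguments {tuple(arg_types)} do not match {signature.params}")
    if tuple(result_types) != signature.returns:
        problems.append(
            f"{name}: results {tuple(result_types)} do not match {signature.returns}")
    return problems


# A routine written in one language, with two outputs, called from another.
table = {
    "divide": Signature("pascal", ("int32", "int32"), ("int32", "int32")),
}

print(check_call(table, "divide", ["int32", "int32"], ["int32", "int32"]))
print(check_call(table, "divide", ["int32"], ["int32", "int32"]))
```

A real table would also need the machine representation of each type in each language, which is where the clumsiness and slowness noted above would come in.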