[comp.compilers] Questions, concerns about ANDF

rcd@ico.ISC.com (Dick Dunn) (05/06/89)

I'd like to see some discussion about the idea of ANDF.  I've got some
questions and concerns to try to get it going.  Naturally, I hope Doug
Hartman (or some other representative of OSF) will respond, but it would
also be interesting to see what other folks think.  Try these for starters:

=>1.  When I first skimmed the RFT, I had this uneasy feeling that it
sounded an awful lot like UNCOL.  I doubt that I'm alone in this reaction.
I assume that OSF is familiar with the history of UNCOL attempts...so the
first question is "How is the ANDF concept qualitatively different from the
UNCOL concept?"

=>2.  The RFT states the intention that...

> Hardware vendors can:
>... - rely on the availability of a rich applications base for new   
>   technologies.

This seems to imply that OSF believes ANDF will allow a program to be run
on new hardware on which it has not been tested or ported.

I think our general experience with moving software around in the UNIX
world is that it *is* often possible to write programs in a way that allows
them to be moved to new machines--for which they have not been tested--and
have them work.  However, what is "possible" and what actually happens in
the real world are rather different here.  Many programs have sensitivities
to particular architectural features--sometimes blatant, sometimes quite
subtle.  One must be careful about confusing "architecture-neutral" with
"portable", yet there is a strong connection between them.

Next, I submit that the source form of a program, whatever its quality
might be, is at the upper bound of portability and architecture-neutrality
for the program.  That is, a translation step, such as moving from source
to ANDF, cannot decrease the program's architecture sensitivity.  (If a
translation could remove architecture-sensitivity, then either the
translation somehow violates or substantively alters the semantics of the
program, or the architecture-sensitivity wasn't really there in the first
place.)  The implication from this line of reasoning is that programs
distributed in ANDF will be subject to at least as many problems of non-
portability as are existing programs which are distributed in source form.

Granted, one can detect certain types of architecture-sensitivity during
translation, but by no means is all of it detectable.

So here, "Is my understanding of the implication correct, that an existing
program in ANDF is expected to be able to run on new hardware?"

=>3.  With regard to characteristics of a solution, the RFT suggests:

> 1. A specification for providing architecture-neutral software 
> distribution.  Possible examples include  specification of an 
> intermediate compiler format, encrypted source, or tagged 
                                ^^^^^^^^^ ^^^^^^
I don't understand how this would meet the criteria.  If the format were
encrypted source, an ANDF compiler system would have to contain the
decryption algorithm.  Wide availability of ANDF compilers would make it
highly likely that a "decrypter" could be constructed and surreptitiously
passed around.

With regard to the goal of protecting the original source, then, "Is it not
the case that the translation from source to ANDF *must not* be an
information-preserving transformation?"

=>4.  Regarding the requirements for ANDF:

> Multiple architectures
> Support for at least two distinct machine architectures must be 
> demonstrated...
>...The technology must be extensible...to additional hardware 
> architectures...

I can understand that OSF might not want to require support for many
machine architectures, simply because this would place such a large burden
on the groups developing the technology, who would have to develop multiple
back ends as part of the submission.  However, experience in this area of portability has
shown that techniques which appear promising, and which work well for a
few machines, frequently either fail or collapse of their own weight when
extended to a larger number of machines.

This places a heavy burden on OSF to evaluate ANDF candidates for their
extensibility to all machines of potential interest.  I think they could
have made their job easier by requiring demonstration for two
architectures and a "paper solution" for a couple more.

=>5.  The same concern applies on the "front end" requirement:

> Languages
> The mechanism must support applications written in ANSI C...
>...The technology must be extensible...to additional programming languages.

"UNCOL-style" work in programming languages has probably failed more often
due to the diversity of programming languages than due to the diversity
of machine architectures.  I can't imagine how OSF is going to be able to
evaluate a submission for extensibility to additional languages based on
support for a single language.  It would have been helpful to have some set
of additional languages against which submissions could be evaluated.  I
would think that extensibility to at least FORTRAN, COBOL, Pascal (or a
derivative), and Ada would be necessary.

=>6.  A colleague of mine offered a conjecture:  ANDF will not remain
architecture-neutral.  If ANDF is, as seems likely, an intermediate
translation format, there could be serious commercial advantage to building
hardware which is attuned to the characteristics of ANDF.

It's an interesting conjecture.  The implications are far-reaching, and not
necessarily good.  Comments?

=>7.  One small matter I didn't understand:

> National Language Support
> Implementations must be capable of supporting a broad range of 
> national languages, including at least European, Semitic, and Asian 
> languages.

It was not clear to me how the ANDF would influence, or be influenced by,
the choice of national language.  Is this simply a piece of "boilerplate"
which is always stated for safety (even if compliance is trivial)?  Or is
there some anticipated problem--some way in which an ANDF might fail to
admit variation in national language?
---
Dick Dunn      UUCP: {ncar,nbires}!ico!rcd           (303)449-2870
   ...Relax...don't worry...have a homebrew.
[The moment I saw reports of the ANDF RFT it immediately impressed me as yet
another chapter in the quest for the UNCOL holy grail.  The fact that they
even put out the RFT suggests that (optimist) they know somebody who has made
a major breakthrough and has actually got such a thing or (pessimist) they know
somebody who thinks he's done it but will find like all previous attempts at
UNCOL that it's real hard, on a par with, say, automated translation of
poetry from English into Chinese.  But I suppose as moderator I should try to
avoid being too opinionated.  -John]
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

zs01+@andrew.cmu.edu (Zalman Stern) (05/09/89)

I don't know much about the history of UNCOL. Perhaps someone could post a
brief summary or a list of references.

On the subject of the RFT:

If you are willing to sacrifice enough performance, this goal seems attainable.
For example, software written in 80386 assembly language for an IBM PC
running MSDOS can now be executed on at least four different architectures
(IBM RT, Motorola 680x0, SPARC, and of course the Intel 80386; there are
doubtless others which I don't know about).  I wouldn't be surprised if one
of the submissions for this RFT comes from a company doing an 80386
"recompiler."

If nothing else, this should encourage some interesting proposals. If they
pull this off, it will make the OSF lots of money. If they don't, the
technical publicity is probably worth more than they would have paid in
advertising.

Sincerely,
Zalman Stern
Internet: zs01+@andrew.cmu.edu     Usenet: I'm soooo confused...
Information Technology Center, Carnegie Mellon, Pittsburgh, PA 15213-3890

henry@zoo.toronto.edu (05/10/89)

>> intermediate compiler format, encrypted source, or tagged 
>                                ^^^^^^^^^ ^^^^^^
>I don't understand how this would meet the criteria.  If the format were
>encrypted source, an ANDF compiler system would have to contain the
>decryption algorithm...

Think "obfuscated", as in "Obfuscated C Contest", not "encrypted" in the
traditional sense.  In practice, if ANDF is to be truly architecture-
neutral, it's going to have to be some form which is more or less source
code in disguise.  The disguise, however, can be made pretty thorough
even without resorting to binary tokenized formats.  As witness some of
the stuff that appears in the above-mentioned contest.  A prettyprinter
would still bring out control structure, but data structure could be
obscured pretty thoroughly, to the point where figuring the code out would
be almost as hard as disassembling binaries.  (Good disassemblers can pretty
much reconstruct the control structure, so you're not really losing much
by the availability of the prettyprinter route.)

                                     Henry Spencer at U of Toronto Zoology
                                 uunet!attcan!utzoo!henry henry@zoo.toronto.edu
[This point is well taken, though it is my impression that ANDF is supposed to
support multiple source languages.  -John]

rnovak@mips.com (Robert E. Novak) (05/12/89)

Why don't we just ship encrypted source code as the ANDF?

The main purpose of an ANDF is to allow vendors to ship some machine
independent form of their program that can be installed on a machine.  Thus
we can have true shrink wrap software.  The vendors have proprietary
algorithms locked up in their source code, hence the desire to not ship the
source.  Yet, almost any ANDF would be in such a form that uncompiling an
ANDF back to source code would be fairly simple.

Sooo... let's ship the source using public key encryption where the
compilers are built by 'trusted' manufacturers that will decrypt and
compile the source code and install it, without ever revealing the source
code.  Decrypting the source or intercepting the compiler halfway through is
guaranteed to be more difficult than uncompiling an ANDF.
-- 
Robert E. Novak                                     MIPS Computer Systems, Inc.
{ames,decwrl,pyramid}!mips!rnovak      928 E. Arques Ave.  Sunnyvale, CA  94086
rnovak@admin.mips.COM (rnovak%mips.COM@ames.arc.nasa.gov)       +1 408 991-0402
[Seems to me that it's unlikely that you could create such a scheme that wasn't
fairly easy to reverse engineer.  Public key encryption assumes that the
decryption key is not public, but in this case you'd be shipping it in every
copy of the compiler.  -John]

sri@osf.osf.org (Sri Vasudevan) (05/12/89)

Hello Everyone:

I am writing this in response to Article [117] from Dick Dunn. First of all
I want to thank Dick and all the others who have expressed their views so
far on ANDF and I'd like to encourage everyone to continue to discuss any
issues (positive AND negative!)  related to ANDF. It is extremely important
to have an open process and the more feedback we get, the higher are the
chances of converging on the best possible approach to the problem that we
are trying to solve!

Here are my responses to Dick's concerns and I'd like to qualify my responses
by mentioning that I reserve the right to change my views based on the
technologies that we are about to receive in response to the RFT.

1: UNCOL versus ANDF:

ONE of the three approaches to ANDF, viz. a compiler intermediate format, is
conceptually similar to UNCOL.  Perhaps the biggest difference between
UNCOL and this third of ANDF is not in concept but in timing in the history
of computing.  Computer Science has come a long way since the UNCOL times
in the area of architecture-neutrality.

New and powerful phenomena like UNIX and C were non-existent then.  Besides,
many, many computer vendors have proprietary compiler technology based
on a common intermediate language for several languages.  (I'd rather not
mention any names, but talk to someone who has worked in compiler development
in several computer companies.)  So a common intermediate language for multiple
languages and machines is hardly "a quest for the holy grail", but pretty much
a de facto standard for building compilers today in proprietary architectures.
(Whether or not it is a good idea is a different story and perhaps not
without controversy!)

2: portability versus architecture-neutral

The distinction between the two is very real and it is very important to
understand it.  ANDF will be an architecture-neutral program representation,
and that is not the same as an architecture-neutral program!  You can have
programs which are not architecture-neutral expressed in an ANDF
representation!  ANDF would be irrelevant without an emphasis on portability
during software development.  And this emphasis has grown stronger since
UNCOL times.

3: Can ANDF be an "encrypted source"?

I reserve my judgement on the issue until I see the technologies 
submitted to us.

4. A "paper solution" for a couple more architectures:

Good suggestion. Extensibility is a mandatory requirement that will be
given a lot of attention in the evaluation process.

5. Extensibility to languages

Same as (4).

6. Building Hardware attuned to ANDF:

I would be interested in hearing what people think about the validity of the
conjecture as well as the implications.

7. National Language Support:

One way an ANDF proposal might have difficulty supporting national
languages is, for example, a design assumption that characters are
8 bits, and only 8 bits, wide.

Thanks again!
[From sri@osf.osf.org (Sri Vasudevan)]