[comp.software-eng] optimizing compiler technology ==> help for SE?

clements@cs.utexas.edu (Paul C. Clements) (11/10/89)

About ten days ago, I posted a request for information that asked, roughly,
"How much can we rely on optimizing compiler technology to get back the
performance cost that usually comes with the use of sound software engineering
techniques?"  I promised to post a summary of responses.

I got over two dozen answers.  Over twenty responses were what I call
"sociological" or "anecdotal". These (a) told me why we ought to use good
techniques _anyway_; (b) told me how to coerce or fool "recalcitrant"
decision-makers into using the techniques _anyway_; (c) told me why performance
wasn't important; or (d) provided anecdotal evidence that the performance
penalty associated with sound engineering techniques is a myth, or is real
but not very big, or is real and is big.    

To these people: thank you for writing. You raised good points
(at least for your application areas), but the problem you addressed is not
the one I had in mind.   I explicitly stated that performance was VERY
important in my hypothetical system, and I was fascinated by the number of
people who either ignored that, or assumed that it didn't really matter
"in this day and age."  In the application I have in mind: (a) every byte
counts; (b) every millisecond counts; (c) performance requirements are
well-defined, tight, non-negotiable, and hard to meet; (d) a program failing
to meet its performance requirements is useless; and (e) getting new hardware
is not a possible alternative.  In this environment, people in charge will
CHOOSE -- a conscious, well-informed, reasoned decision -- high performance
over just about any other advantage you can name that derives from application
of what we would call sound software engineering practices.  They see no
choice.   For doubters, one example of this application domain is embedded
tactical avionics systems for military aircraft.

Well, of course many (most?) optimization problems are NP-complete, if not
undecidable.   However, there is a shocking (to me) lack of formality with
regard to optimization work.   I was hoping to find something like "to solve
optimization problem P, technique T will handle all cases of the form F,
and produce optimal code.   For a case not of form F, T will produce code
no worse than what was given, and maintain its correctness."

No such luck.   All the optimization literature I've seen is of the form
"Here's a technique to solve this problem.  It works like this.   We ran it,
and discovered that either (a) it produces better code than an optimizing
compiler we were competing against; or (b) it produces at least as good code
as a competitor, but does it a little faster."

Short summaries of four intriguing "technical" responses that I received
follow; you may inquire directly of the author for further information.

Thanks for everyone's help.

P. C. Clements
University of Texas at Austin                            (130 lines follow)

==============================================================================

From soi!chip@harvard.harvard.edu Wed Nov  1 09:04:22 1989

My company is currently working on a DARPA-funded 5-year project that
has as one of its main goals precisely your paragraph.  The project is
the joint design and prototyping of a wide-spectrum language and
programming environment.  One underlying idea is to design
the environment and the language together.

One of our design goals is to allow the programmer to use _very_ high
level constructs at no performance penalty...

We pay _no_ overhead for procedure calls unless it is required by
the program...

We allow the programmer to specify the relative cost of time and
space, and most of our optimization methods compute costs relative to
these parameters before choosing what to do....

==============================================================================

From mauney@cscadm.ncsu.edu Fri Nov  3 15:52:05 1989

...The additional problems posed by information-hiding techniques are likely to be:
a) Indirection
     The concrete representation of an abstract data type is likely
     to be based on a pointer... 
b) procedure calls
     Manipulating an object of abstract type requires a procedure call,
     which is expensive.  But procedure calls can be in-lined, even
     across modules... 
c) need for inter-module analysis
     Inter-module analysis for optimization is theoretically the
     same as any other inter-procedural analysis...
d) dynamic binding
     ...Lisp partisans claimed to have (this) under
     control, and many good languages don't have dynamic binding.
e) Generic code
     With parameterized types, such as generic packages in Ada,
     we have a problem: generate one piece of code that is general
     enough to be used in all instantiations, or macro-expand
     distinct pieces of code for each instantiation?  Here is the
     time-space tradeoff.  But I don't see a difficult compilation
     process, only a question of how to let the programmer specify
     which way to go.

It's really a matter of being willing to delay many of the optimizations
until the whole program is assembled... 

=============================================================================

From preston@rice.edu Tue Oct 31 15:09:21 1989

Your example, minimization of storage: Janet Fabri wrote a thesis on
the subject.  She later worked for IBM on the ECS compiler project...
She used ideas that were similar to later work done with graph coloring
for register allocation.

However, "minimal (not just improved)" is certainly NP-Complete
and probably undecidable.  Does it matter?  I don't expect an assembly
programmer to achieve minimal code in any large program.

Next subject, time/space efficiency:   Compilers can't do much here.
...the real tradeoffs are made during the program (algorithm) design.
(However, see Jack Schwartz's work on SETL,  e.g. "Automatic data structure
choice in a language of very high level", CACM, December 1975.)

...I can talk about what's up with optimizers, especially here at Rice.

Classically, optimizers have worried about one procedure at a time.
About 10 years ago, this area started to settle down, though there
is still room for improvement.  This attention to single procedures
has led a lot of people to worry about procedure call overhead
(and to build 2000 line procedures).  Of course, others began to study
how to optimize whole programs.  The family tree looks something like:

Spillman

  Barth - Allen - Burke (others?)
        \
	 Banning - Cooper, Kennedy (Torczon, Callahan, Balasundarum, ...)

A second area that's opening up is analysis of arrays.
Classically, optimizers make pretty bad assumptions about arrays.
Recently though, a lot of work has been done to understand patterns
of array access in loops (dependence analysis).  The initial 
motivation was vectorization, but people are beginning to apply 
the same techniques to scalar machines.  So, we're beginning
to understand arrays.

What about records, linked structures, persistent objects,
coroutines, first class functions, continuations, ...?

==============================================================================

From cline@cheetah.ece.clarkson.edu Fri Nov  3 14:13:29 1989

...A more powerful boost to SE is: "Can we use good SE to produce what is
actually *faster* code?"

The basic notion that I'll present is that "more high-level information
means more chance for high-level optimization."  I cite recent research
which Doug Lea (dl@oswego.oswego.edu) and I (cline@sun.soe.clarkson.edu) are
doing with annotating C++ code.  We call the system "A++" for "Annotated C++".

Benefits:  Correctness. Formal verification can be employed to check if
what you *said* (the code) is consistent with what you *meant* to say
(the annotations).  Thus *lots* more static (compile-time) analysis can be
done on annotated code.

Another benefit, although not immediately obvious, is Optimization.  It
turns out that the additional semantic information can be employed by
the compiler to help generate better code.  For example, many runtime
consistency and exception tests are redundant; formal verification can
be used to *prove* them to be redundant, so they can be (automatically)
removed.  This reduces actual runtime testing to a minimum (or *nearly*
minimal anyway) without changing the source code.  Thus A++ ("Annotated
C++") can be thought of as an "Exception Optimizer."

==============================================================================

(end of msg)

PAAAAAR@CALSTATE.BITNET (11/13/89)

In Software Engineering Digest v6n78
"Paul C. Clements"
<snorkelwacker!apple!usc!cs.utexas.edu!clements@think.com>
notes that many sample arguments for using HLL, modularization, and
other SE techniques are invalid in his environment.
Performance requirements sometimes dominate
requirements for clarity,  reusability, economy,
amendability, testability, elegance, manageability,...
SE has recently been aimed at
these qualities almost to the exclusion of <efficiency>.

Let me argue that SE plus compilers is doomed to fail:

(1) Typical manufacturers' compilers produce inefficient code.
Hardware manufacturers have an incentive to design
operating systems and compilers to be inefficient, because this increases
the demand for the more expensive pieces of hardware, and so increases
profits.  Further, adding optimization costs time and money and increases
production costs.
It is the task of a company to maximize profits...

Conclusion - Efficient software is DIY or from a software company.

(2) Optimisation is worse than NP-complete.  It's not computable!
Predicting the run time of an arbitrary program on arbitrary data
is a well-known unsolvable problem (a solution would decide the halting
problem).  For any claimed predictor there exist a program and some data
for which it fails to predict the correct time.  Hence there is no program
that guarantees optimal code...

(3) Structure and efficiency.
In the Journal of the ACM some years ago some theorists proved that
eliminating GOTOs from some programs leads to an exponential growth
in program text or in run time.

I've never met a real program with this property:-)

Solution :- KISS!
(1) Distinguish design from code. A blueprint is not the object!
(2) Use a simple blueprint notation that is one-one with assembler.
(3) Develop tools to edit and assemble the notation.

Sub-Problem

In software engineering we don't have a good popular design notation.
(Not just *my* opinion - Hoare, Sutherland, and others
have said the same thing)
However there do exist some working notations:
Tripp LL, "A Survey of Graphical Notations for Program Design - An Update",
   Software Engineering Notes Vol 13 No 4 pages 39-44 (Oct 1988)
Jonsson, Dan, "Graphical Program Notations: On Tripp's Survey, the Past,
and the Future",
   Software Engineering Notes Vol 14 No 5 pages 78-79 (July 1989)

Example 1

Rob Witty invented Dimensional Flowcharts some years ago
to allow him to produce efficient and yet well designed software.
They are easy to draw and edit. There is a minimum of decorative graphics.
They are designed for stepwise refinement/modular code.
They work with data directed methods as well.
Program structure and the diagram layout correspond one-one.
There is a simple way to map a Dimensional Chart into ASCII/EBCDIC...
Simple algorithms exist that
 (1) assemble the coded charts into efficient machine code
 (2) print, plot or display coded charts.

The tools ran on a specialised machine at the Rutherford
Computer Labs in the UK, before there was a mass market for software.
I don't think they are available for any modern machines...

The rules are:
1. No BOXES.
2. Sequences are Vertically down the page.
   Sequence
    \
     A
     :
     B
     :
     #
3. When there are several alternative branches they are spread out
   horizontally across the page with a line across the top.
   IF
    \_____________________
     :                    :
    (condition for A     (condition for B
     :                    :
     A                    B
     :                    :
     #                    #

4a. If a sequence can repeat (a loop) then it terminates with a star.
4b. Otherwise a sequence terminates with a hash sign
  (drawn, plotted and displayed as the electrical engineers' symbol for 'earth').
5. A loop can have several EXIT points (marked with a 'C' shape for
   Condition).
  LOOP
    \
     :
    ( WHILE A is next
     :
    Do something to A
     :
     *

6. A diagonal line indicates refinements/macro definitions/subprograms
   and connects the high level description (down and left) to the
   detailed form.
7. A "Devil's Tail" indicates a GOTO (just in case...).

References
Botting, Dick, "Comments on Tripp's Graphical Notations"
   Software Engineering Notes Vol 14, No. 1 p83 (Jan 1989)
Witty RW "Dimensional Flowcharting"
   Report RL-76-132/A, Rutherford Labs, Chilton, England, 1976

I can EMAil samples if you wish.

Example 2.

System Software for MU5 (Anecdote)
The 5th machine designed at Manchester University, England, had software
produced by two distinct teams of experts - the designers and the coders.

The designers drew and edited flow charts.
The coders used natural intelligence to convert the charts into code.
Design changes were made by the designers and implemented by coders.
The designers worked on paper, the coders worked with the code.
A program printed out the updated flow chart, from the code.
The coders ran the tests as well...

Has anyone got any info on how this worked out in the long run...

Example 3.

Use the Brackets(S) program on a cheap PC to do the design
in Warnier/Orr notation. There exist simple algorithms for
converting Warnier's notation into efficient code.
(Warning - I am convinced that the coding used in the text books
 tends to generate buggy loops....)



KISS technology like the above can get the benefits of
visible design, for a small investment in technology,
and no loss of performance.

-------------------------------------------------------------
Richard J. Botting,  Department of computer science,
California State University, San Bernardino,
5500 State University Parkway, San Bernardino CA 92407
PAAAAAR@CCS.CSUSCC.CALSTATE
paaaaar@calstate.bitnet
PAAAAAR%CALSTATE.BITNET@CUNYVM.CUNY.EDU

----------------------------------------------------------
What this country needs is a good $5M Ada Compiler!
----------------------------------------------------------