[comp.sw.components] Using components

tada@athena.mit.edu (Michael Zehr) (05/10/89)

Most of the discussion so far seems to be directed at retrieving
components, but there's been very little said about using them.

background: i haven't done any programming in Ada, so if there's a
definition of component that only applies to Ada, i don't know it.  i'm
basing this on my "feel" of component -- namely, a group of code that's
intended for reuse.

my experience so far has been that the more general a component is, the
slower it runs.  for example, given a routine to sort objects, you have
to give it a criterion.  somehow, the sorting routine has to make a
function call to compare two elements.  (in a recent article, someone
said that this is how they'd like to see a sorting routine
implemented.  that function call can DOUBLE the time it takes to
perform a sort!)

for another example, some languages have arrays as their basic data
unit. "+" becomes an array operator.  but if all you need to do is add
two scalar quantities, you're hit with a performance penalty.

so far, the trend has largely been towards larger and larger components:
software components have moved from machine language instructions to
assembly language instructions and macros to high-level-languages to
4th-generation languages; 

hardware trends have been similar: very simple machine instructions,
then simulating new features by subroutines, then hardware to perform
new features, continuing with complicated things like MMUs and so on.

but recently, the hardware trend has reversed -- use very simple
components, with a much more complicated "compile" phase; let the
compiler have access to all sorts of little details it didn't have
before. (i'm referring to RISC of course.)

there's still a lot of room for growth in compiler technology,
particularly with global optimization, but this is assuming a few
things: 

1) the compiler needs to have access to the source code for the
component you're using.  no problem if you built all the components
(although it might mean having to open several thousand files to look at all
the components!) but if you're using components produced by someone else
and sold as a package, you won't have this option.  (it also impinges
on the abstraction level which the component is supposed to be hiding.)

2) the additional time required for the compile phase isn't a problem.

3) ability to recompile everything that uses a component if the
implementation of a component changes, even if the interface remains the
same.

with the current trends in hardware and software (backlog of software
applications, steady and fairly rapid growth in processing capability
per cost), saving time building the application is more important than
reducing the running time of the application.  

but my experience on applications has led to the conclusion that a lot
of performance is being sacrificed for the decrease in application
building time.  there is a lot of work being done in decreasing the
time required to build an application (object-oriented languages, CASE
systems, etc.).  what current work is being done towards increasing
program performance?  what sorts of things are *your* company doing
towards performance of programs?

(at my company, we've built a lot of systems using a fourth generation
language, Stratagem (currently sold by Computer Associates).  the long
term path i'm seeing is coding a prototype in stratagem, finishing it
and making preliminary delivery in stratagem;  then, recoding critical
parts of the program in C, for rapid numerical calculations, etc.  so
far it's worked fairly well, but to get that performance there's the
penalty of having to write the same thing in two different languages,
plus testing/debugging it twice, etc...)

do others agree that this is a problem in the software industry, or is
it just me?  if it is a problem, how should we face it and fix it
before it becomes worse?

(sorry for the length of this)

-michael j zehr

rang@cpsin3.cps.msu.edu (Anton Rang) (05/11/89)

In article <11293@bloom-beacon.MIT.EDU> tada@athena.mit.edu (Michael Zehr) writes:

   my experience so far has been that the more general a component is, the
   slower it runs.  for example, given a routine to sort objects, you have
   to give it a criterion.  somehow, the sorting routine has to make a
   function call to compare two elements.

Not necessarily true.  Given a good enough development system, the
compiler can inline the function call (even to the point of it
becoming a single machine instruction!).  This leads to a slower
compile, so for test runs it's easier to leave the function call in
(at the expense of speed).

   for another example, some languages have arrays as their basic data
   unit. "+" becomes an array operator.  but if all you need to do is add
   two scalar quantities, you're hit with a performance penalty.

I don't know any languages with arrays as their basic data type (maybe
APL?).  In most languages where "+" can be used on arrays, it's
just overloading, and the compiler generates the appropriate code
based on the type of the objects.

   [ trends have been largely toward larger, more complex components, both
     in software and hardware ]

   but recently, the hardware trend has reversed -- use very simple
   components, with a much more complicated "compile" phase; let the
   compiler have access to all sorts of little details it didn't have
   before. (i'm referring to RISC of course.)

Well, there are problems with this.  Take an HP-9000, which doesn't
have an integer multiply instruction (DISCLAIMER: at least, it didn't
when I read a paper on it, if I'm thinking of the right machine).  Now
suppose that somebody comes up with an extremely fast (hardware)
integer multiplication unit.  No programs will take advantage of it,
because they're all compiled into code to do repeated additions.
  Similarly on the software end of things, it makes sense to abstract
the function of a component at as high a level as possible, so that
new algorithms can be effectively used with existing software.  (Of
course, it's also important to make the components flexible, which
complicates things somewhat.)

   there's still a lot of room for growth in compiler technology,
   particularly with global optimization, but this is assuming a few
   things: 

   1) the compiler needs to have access to the source code for the
   component you're using. [ thus purchased components are a problem ]

No, you don't need source code.  You need some kind of intermediate
form, though.  A generalized RTL, perhaps.

   2) the additional time required for the compile phase isn't a problem.

You only need to do global optimization when producing the final
version.  Development versions can probably get by with only local
optimization.

   3) ability to recompile everything that uses a component if the
   implementation of a component changes, even if the interface remains the
   same.

This is a problem.  However, if you're storing the intermediate code
(see my response to point 1), you don't need to do a full compilation:
just the optimization (admittedly slow).  Code generation can be done
ahead of time, essentially.

   with the current trends in hardware and software (backlog of software
   applications, steady and fairly rapid growth in processing capability
   per cost), saving time building the application is more important than
   reducing the running time of the application.  

I think this is true.  It can take two or three years to build a
fairly small microcomputer program, where the difference between (say)
5 minutes and 10 minutes to generate a report can be annoying, but not
critical (in many applications).

   but my experience on applications has led to the conclusion that a lot
   of performance is being sacrificed for the decrease in application
   building time.

Depends.  Using 4GL, object-oriented, etc. tends to hurt performance,
at least in my experience.  But I spent a few years working at a place
with a nice component library (based around VAX Pascal), and we never
had performance problems.  Going to new languages (4GL/OO) seems to
hurt performance if you don't have specialized hardware, especially
when the compilers aren't well-done.
  The first version of TeleSoft Ada I used took 15 minutes or so to
compile a small program, and execution time was over three minutes for
the 8-queens (if I'm remembering right).  The last version of it I've
used took about 10 seconds to compile, and executed in 3 seconds.  It
takes a while for compiler technology to catch up.

   do others agree that this is a problem in the software industry, or is
   it just me?  if it is a problem, how should we face it and fix it
   before it becomes worse?

Which problem?  Performance?  I don't think it's a serious problem yet
(then again, most companies haven't switched to 4GL or whatever yet).
I think people need to start spending a *lot* more time figuring out
what they need in environments and languages than they are right now.


		Anton

+---------------------------+------------------------+-------------------+
| Anton Rang (grad student) | "VMS Forever!"         | VOTE on	         |
| Michigan State University | rang@cpswh.cps.msu.edu | rec.music.newage! |
+---------------------------+------------------------+-------------------+
| Send votes for/against rec.music.newage to "rang@cpswh.cps.msu.edu".   |
+---------------------------+------------------------+-------------------+

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) (05/11/89)

From article <11293@bloom-beacon.MIT.EDU>, by tada@athena.mit.edu (Michael Zehr):
% my experience so far has been that the more general a component is, the
% slower it runs.  for example, given a routine to sort objects, you have
% to give it a criterion.  somehow, the sorting routine has to make a
% function call to compare two elements.  (in a recent article, someone
% said that this is how they'd like to see a sorting routine
% implemented.  that function call can DOUBLE the time it takes to
% perform a sort!)

    Depends on the quality of the compiler being used...

> for another example, some languages have arrays as their basic data
> unit. "+" becomes an array operator.  but if all you need to do is add
> two scalar quantities, you're hit with a performance penalty.

    Ada allows you to "overload" operators; thus, "+" can be defined
    both as an array operator and as an integer operator, as well as
    any other types you are capable of dreaming up.  The precise function
    to be applied is determined at compile time, through analysis of the
    types of the parameters involved.  Thus, algorithms optimized for one
    data type can coexist with algorithms optimized for other data types.

> 1) the compiler needs to have access to the source code for the
> component you're using.  no problem if you built all the components
> (although it might mean having to open several 1000 files to look at all
> the components!) but if you're using components produced by someone else
> and sold as a package, you won't have this option.  (it also impinges
> on the abstraction level which the component is supposed to be hiding.)

   Not necessarily.  The vendor can provide you with a sublibrary which
   contains the vended components in compiled form, compiled under global
   optimization mode.  The knowledge of the component is stored in the
   compiler's "notes" which were encoded into the sublibrary.  When it is
   time to globally optimize the final product, the compiler reads its notes
   in order to obtain the relevant information, and finally the binder binds
   everything together, and you're done.  No source code required; you just
   have to specify to the vendor which Ada compiler you're using.
 
> 3) ability to recompile everything that uses a component if the
> implementation of a component changes, even if the interface remains the
> same.

   This isn't necessary if the interface remains the same.  Just
   re-bind the executables; no recompilation required.  (In Ada, at least)


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

adamsf@cs.rpi.edu (Frank Adams) (05/12/89)

In article <2927@cps3xx.UUCP> rang@cpswh.cps.msu.edu (Anton Rang) writes:
>In article <11293@bloom-beacon.MIT.EDU> tada@athena.mit.edu (Michael Zehr) writes:
>   2) the additional time required for the compile phase isn't a problem.
>
>You only need to do global optimization when producing the final
>version.  Development versions can probably get by with only local
>optimization.

This isn't quite true.  You can get by with only local optimization for MOST
of the development cycle, but when you change over to the globally optimized
version, you WILL find more bugs.  You had better figure on spending some
time doing debugging on that version.

And, of course, if the compiler/global-optimizer is new, you will likely find
some bugs in *it* when you start using it, too.

Frank Adams				adamsf@turing.cs.rpi.edu

tada@athena.mit.edu (Michael Zehr) (05/14/89)

There have been a couple of good comments about using software
components, but so far most of them have been along the lines of
"such-and-such a language does this."  Could you describe
*how* the language provides both efficiency and quick building time?  

For example, one person mentioned Ada's "re-binding" of
executables.  What does this do?  

Someone also mentioned being able to reuse components in different
languages.  I agree that this is crucial in combining efficiency with
productivity.  So far, my experience with mixed-language coding is
that it is very easy on Vaxes (because of the forced procedure
interface of the CALLS instruction) but extremely difficult on IBM
mainframes (because there is no standard calling sequence).  Does
anyone have any experience in mixed-language coding on newer RISC
machines?  Do compiler writers stick to "standardized" calling
mechanisms?

-michael j zehr

thant@horus.SGI.COM (Thant Tessman) (05/16/89)

In article <11401@bloom-beacon.MIT.EDU>, tada@athena.mit.edu (Michael Zehr) writes:
> 
[...]

> 
> Someone also mentioned being able to reuse components in different
> languages.  I agree that this is crucial in combining efficiency with
> productivity.  So far, my experience with mixed-language coding is
> that it is very easy on Vaxes (because of the forced procedure
> interface of the CALLS instruction) but extremely difficult on IBM
> mainframes (because there is no standard calling sequence).  Does
> anyone have any experience in mixed-language coding on newer RISC
> machines?  Do compiler writers stick to "standardized" calling
> mechanisms?
> 
> -michael j zehr

Silicon Graphics uses RISC processors from MIPS (first workstation to do so).
Fortran can call C and C can call Fortran.  (and C++, and Ada, and Pascal)
It's done all the time.  Our Graphics Library works that way.  Not only that,
but the Graphics Library uses a shared library mechanism which allows
executables to run on machines with different graphics hardware without
recompiling (again, first (and only?) workstation to do so).

thant@sgi.com "don't quote me, they're not paying me for that"

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) (05/16/89)

From tada@athena.mit.edu (Michael Zehr):
> There have been a couple of good comments about using software
> components, but so far most of them have been along the lines of
> "such-and-such a language does this."  Could you describe
> *how* the language provides both efficiency and quick building time?  
> 
> For example, one person mentioned Ada's "re-binding" of
> executables.  What does this do?  

    Briefly, Ada provides a number of excellent facilities for
    partitioning a problem into modular subsystems, in order to
    facilitate the construction of the subunits in an independent
    manner.  One such facility is known as "separate compilation",
    whereby first the interface to a procedure or function is described:

      procedure Handle_This_Task (With_Parameter : in out Parameter_Type)
         is separate;

    Now we can compile the code which contains this specification 
    without ever actually implementing the Handle_This_Task procedure.
    Eventually, before we bind everything into an executable, a procedure
    must actually be coded to satisfy this specification, but we can 
    defer it until the rest of the code has been written and debugged.
    Moreover, if we wish to change something in that procedure's code,
    we can change and recompile only that procedure, without having to
    recompile the entire system.

    Another mechanism is the "package".  One use of packages is to
    describe specifications (*compilable* specifications) for subsystems
    which collectively interact to form a larger system.  Once the
    high-level specifications have been compiled together, the information
    flows across the subsystem interfaces have been verified for completeness
    and consistency.  The project can then be "farmed out" to individual
    programmers, who will implement each subsystem independently of what
    everyone else is doing.  The package specification forms a contract
    between the implementor and the other project members; if everyone
    fulfills their end of the deal, then the entire system is guaranteed
    to work as far as information flows are concerned.  The implementor
    constructs a package body, which is said to be "compiled against" the
    package specification, or "spec".  So everyone goes about their
    business, implementing and testing their individual subsystems, 
    (testing of subsystems can be accomplished by first developing them
    in temporary isolation from the system, and later compiling them 
    against the system specs) and finally when everyone is done, the 
    compiled units are *bound* together into an executable for integrated
    testing.  Now suppose that a problem is found in one subsystem.  That
    problem is corrected, and only that subsystem (and possibly only a
    small portion of that subsystem, if the subsystem has been internally
    partitioned as well) will require recompilation.  Once the affected
    area of the system is recompiled, then the system is bound once again
    into a new executable for further testing.  The technical mechanism by
    which this is accomplished is known as the "library" system; each program
    is a member of some program library or sublibrary, and when a program
    (or "program unit", or "unit") is compiled into a library, the library
    stores some intermediate form(s) of the program.  Upon recompilation,
    the library simply updates its intermediate form(s).  If there is an
    optimization flag turned on, then the library stores special information 
    which will be used by the optimizer during the binding phase.  Now when 
    binding occurs, the binder scans through the libraries for the desired
    program units, and binds them together into a piece of executable code,
    perhaps exploiting the knowledge accumulated during the compilation
    phase in order to optimize the executable code.  Although packages 
    present boundaries to the programmer, an optimizer can corrupt any and 
    all boundaries in pursuit of a more efficient executable, if global 
    optimization is desired.  Normally the work of optimization is not
    done until very near the end, during or after execution profiling. 

    The library system is also important in that it permits the reuse of
    code (a software component needs to be compiled into only one library,
    and many other users can then make use of it); it also facilitates
    the tracking of the impact of a modification, in that a change in the
    specification of a component which is used by other users will require
    the recompilation of the relevant areas of code which made use of the
    modified unit.  This enables the rapid identification of the impact of
    the modification, and allows the tracking of change as it spreads 
    throughout the system.  There is an automatic recompilation facility 
    which can be used to automatically recompile all dependent units if it is
    known that the change will not require changes to the dependent units;
    automatic recompilation will also identify (in the form of compilation
    errors) specific places in which the modified specification impacts the
    dependent units (e.g., a dependent unit's attempt to call a procedure
    which was just removed from the specification).

> Someone also mentioned being able to reuse components in different
> languages.  I agree that this is crucial in combining efficiency with
> productivity.  So far, my experience with mixed-language coding is
> that it is very easy on Vaxes (because of the forced procedure
> interface of the CALLS instruction) but extremely difficult on IBM
> mainframes (because there is no standard calling sequence).  Does
> anyone have any experience in mixed-language coding on newer RISC
> machines?  Do compiler writers stick to "standardized" calling
> mechanisms?

    Ada makes direct provision for interfacing to other languages; 
    the only implementation dependencies are whether or not the
    compiler you are using supports calls to that particular language.
    Typically, compilers support calls to at least several other 
    languages, and the means of use is via the standardized 
    interfacing mechanism defined by the Ada programming language.
    Special provision is also made for machine code insertions;
    as with interfacing to other languages, there is direct support 
    for machine code insertion in the Ada language, and the only
    question is whether the particular compiler you are using 
    provides the service.  Although the compilers are not required to
    provide machine code insertion either, many if not most of them do so. 

  
    Bill Wolfe, wtwolfe@hubcap.clemson.edu

rpw3@amdcad.AMD.COM (Rob Warnock) (05/16/89)

In article <11401@bloom-beacon.MIT.EDU> tada@athena.mit.edu writes:
+---------------
|            ... So far, my experience with mixed-language coding is
| that it is very easy on Vaxes (because of the forced procedure
| interface of the CALLS instruction) but extremely difficult on IBM
| mainframes (because there is no standard calling sequence).  Does
| anyone have any experience in mixed-language coding on newer RISC
| machines?  Do compiler writers stick to "standardized" calling
| mechanisms?
+---------------

One of the things AMD tried very hard to do with the Am29000 RISC CPU
was to try to define a single subroutine calling sequence that *at least*
C, Ada, Pascal, FORTRAN, & COBOL could live with. This required some
tradeoffs, since each language requires slightly different environments
around a subroutine call (especially Pascal/Ada "displays").

The compromise reached was this: The C calling sequence is the base set,
and the most efficient. For the C linkage, "live" values in the "local"
registers (gr128-gr255 == lr0-lr127) are callee-saved (by opening a new
register-window context), whereas values in the "global" registers (gr95-127)
are callee-destroyable. Specific additional inter-module resources are
allocated in the global regs for other languages, but any "live" values
in these resources are *caller*-saved, though passed to the callee.

So, as a specific example, if a Pascal routine makes a call, it puts the
"display" pointer in a known global, which is used by a Pascal sub-routine
but (probably) destroyed by a C subroutine (which could care less about
displays). Compilers which are smart enough to do minimal inter-procedural
analysis can avoid saving/restoring these additional resources if the calls
from a given routine are all to compatible sub-routines, and emit the save/
restore code if not.

It seems to have worked. There was a lot of grumbling from the Ada vendors
(they wanted *lots* more "preserved" globals!), but reasonably efficient
mechanisms were worked out. [AMD has an app note available on the "Am29000
Standard Subroutine Calling Sequence", if anyone cares for more details.
Call your salescritter, not me...]

To date, there are 5 C compilers [soon 6], 2 Ada's, a Pascal, and [I think]
a FORTRAN and a COBOL, from various 3rd-party vendors, all of which use the
same calling sequence and which can interoperate [within the restrictions of
the environments... that is, it's easy to call C from Ada, but to go the other
way you have to go through a library routine to set up the Ada environment].
They all use the same COFF file format, too.

So, yes, the benefits obtained from things like the VAX "CALLS" can
be attained in the RISC world, but the discipline has to come from the
software, not the hardware. [Which is not surprising, considering the
general RISC philosophy.] And I think most of the current RISC chip
vendors are aware of this...


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

ted@nmsu.edu (Ted Dunning) (05/19/89)

In article <5490@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) writes:

   From tada@athena.mit.edu (Michael Zehr):
     There have been a couple of good comments about using software
     components, but so far most of them have been along the lines of
     "such-and-such a language does this."  Could you describe *how*
     the language provides both efficiency and quick building time?
     For example, one person mentioned Ada's "re-binding" of
     executables.  What does this do?

	... many good descriptions of separate compilation ...

       Another mechanism is the "package".  One use of packages is to
       describe specifications (*compilable* specifications) for
       subsystems which collectively interact to form a larger system.
       Once the high-level specifications have been compiled together,
       the information flows across the subsystem interfaces have been
       verified for completeness and consistency.

this `checking for completeness and consistency' is actually nothing
more than a static type check.  since in a large application, the
constraints on the interfaces between packages are considerably more
complex than merely having the communication be of the correct type,
this should not be characterized as a full verification.

	... more good comments are modular development ...

       The package specification forms a contract between the
       implementor and the other project members; if everyone fulfills
       their end of the deal, then the entire system is guaranteed to
       work as far as information flows are concerned.

again, this checking is far from complete (as it must be at the
current state of the art).  satisfaction of the type constraints is a
BIG step along the way toward making things work smoothly, but there
are an enormous number of constraints in a real program that cannot
reasonably be expressed as type constraints.  some, such as `the sum
of the squares of the first two arguments is greater than the square
of the third', could be effectively checked by assertions.  other
aspects of the interface cannot even in theory be checked by a
compiler (such as `a function's value will be returned within 10 ms
of the call').

again, type checking is almost always a GOOD THING, but it is also
almost always not enough to verify that the communication between
parts of a program is correct.

	... more good comments about the benefits of libraries and
	separate compilation ...

   Someone also mentioned being able to reuse components in
   different languages.  I agree that this is crucial in combining
   efficiency with productivity.  So far, my experience with
   mixed-language coding is that it is very easy on Vaxes (because
   of the forced procedure interface of the CALLS instruction) but
   extremely difficult on IBM mainframes (because there is no
   standard calling sequence).  Does anyone have any experience in
   mixed-language coding on newer RISC machines?  Do compiler writers
   stick to "standardized" calling mechanisms?

	... good stuff about the ada party line ...


----------------------------------------------------------------

and now for the view from the rest of the world.

type checking is a good thing.  separate compilation is a good thing.
libraries and linkers that understand type checking are a good thing.

none of these are particular to ada.  some or all of these can be
found in ansi c, commercially available pascals, c++ and other
languages.  some of these solutions are partitioned differently from
the way specified by ada orthodoxy, often with the result of more
flexibility.  this is the case with make.

one place where most of these tools fall down at the current time is
with program generators and preprocessors.  the requirement that we
specify interfaces completely in terms of the base language leads to
very serious awkwardness when we try to replace the functionality of
yacc or make heavy use of precompilation, as do the internals of the
gnu c compiler.

the failure of tools in the face of language extension is an issue
that has been long recognized in lisp circles.  only recently are
conventional languages being used (and shown insufficient to the task)
in such advanced applications.  a serious case in point is the x
toolkit as implemented in c.

jesup@cbmvax.UUCP (Randell Jesup) (06/02/89)

In article <2927@cps3xx.UUCP> rang@cpswh.cps.msu.edu (Anton Rang) writes:
>In article <11293@bloom-beacon.MIT.EDU> tada@athena.mit.edu (Michael Zehr) writes:
>
>   my experience so far has been that the more general a component is, the
>   slower it runs.  for example, given a routine to sort objects, you have
>   to give it a criterion.  somehow, the sorting routine has to make a
>   function call to compare two elements.
>
>Not necessarily true.  Given a good enough development system, the
>compiler can inline the function call (even to the point of it
>becoming a single machine instruction!).  This leads to a slower
>compile, so for test runs it's easier to leave the function call in
>(at the expense of speed).

	You're both right.

	Inlining can help for very small functions, or larger ones at the
cost of a lot of program size (and even speed, in a paging or cached
hardware environment).

	The original poster is right though: as the hardware becomes faster,
and the programs we write become more complex, we are moving to higher
levels of abstraction to keep the complexity from overwhelming us (or making
it too expensive to do).

	Take a program like omega.  Massive unix game written in C, >1Meg
of source.  It would run much faster and be smaller if written (well) in
assembler.  However, it would never have come close to being completed.
We trade off efficiency of execution/size for cost of production (programming).
The same thing occurs in the ever-higher-level toolboxes we use in
constructing programs, like XWindows, databases, "software ICs" (still
not really well-understood or very useful yet, though they may well be
eventually), and things like NeXT's Interface Builder.

	This has happened from the very beginning of the computer industry,
and there have been no indications this will stop in the foreseeable future.

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup