tada@athena.mit.edu (Michael Zehr) (05/10/89)
Most of the discussion so far seems to be directed at retrieving components, but there's been very little said about using them.

background: i haven't done any programming in Ada, so if there's a definition of component that only applies to Ada, i don't know it. i'm basing this on my "feel" of a component -- namely a chunk of code that's intended for reuse.

my experience so far has been that the more general a component is, the slower it runs. for example, given a routine to sort objects, you have to give it a comparison criterion. somehow, the sorting routine has to make a function call to compare two elements. (in a recent article, someone said that this is how they'd like to see a sorting routine implemented. that function call can DOUBLE the time it takes to perform a sort!)

for another example, some languages have arrays as their basic data unit. "+" becomes an array operator. but if all you need to do is add two scalar quantities, you're hit with a performance penalty.

so far, the trend has largely been towards larger and larger components: software components have moved from machine language instructions to assembly language instructions and macros, to high-level languages, to 4th-generation languages. hardware trends have been similar: very simple machine instructions, then simulating new features with subroutines, then hardware to perform the new features, continuing with complicated things like MMUs and so on.

but recently, the hardware trend has reversed -- use very simple components, with a much more complicated "compile" phase; let the compiler have access to all sorts of little details it didn't have before. (i'm referring to RISC, of course.)

there's still a lot of room for growth in compiler technology, particularly with global optimization, but this assumes a few things:

1) the compiler needs to have access to the source code for the component you're using. no problem if you built all the components (although it might mean having to open several thousand files to look at them all!), but if you're using components produced by someone else and sold as a package, you won't have this option. (it also impinges on the abstraction the component is supposed to be hiding.)

2) the additional time required for the compile phase isn't a problem.

3) you need the ability to recompile everything that uses a component if the implementation of the component changes, even if the interface remains the same.

with the current trends in hardware and software (a backlog of software applications, steady and fairly rapid growth in processing capability per cost), saving time building the application is more important than reducing the running time of the application. but my experience on applications has led me to the conclusion that a lot of performance is being sacrificed for that decrease in application building time.

there is a lot of work being done on decreasing the time required to build an application (object-oriented languages, CASE systems, etc.). what current work is being done towards increasing program performance? what sorts of things is *your* company doing about the performance of its programs?

(at my company, we've built a lot of systems using a fourth-generation language, Stratagem (currently sold by Computer Associates). the long-term path i'm seeing is coding a prototype in stratagem, finishing it and making a preliminary delivery in stratagem, then recoding critical parts of the program in C, for rapid numerical calculations, etc. so far it's worked fairly well, but to get that performance there's the penalty of having to write the same thing in two different languages, plus testing/debugging it twice, etc...)

do others agree that this is a problem in the software industry, or is it just me? if it is a problem, how should we face it and fix it before it becomes worse?

(sorry for the length of this)

-michael j zehr
rang@cpsin3.cps.msu.edu (Anton Rang) (05/11/89)
In article <11293@bloom-beacon.MIT.EDU> tada@athena.mit.edu (Michael Zehr) writes:
>my experience so far has been that the more general a component is, the
>slower it runs. for example, given a routine to sort objects, you have
>to give it a criterion. somehow, the sorting routine has to make a
>function call to compare two elements.

Not necessarily true. Given a good enough development system, the compiler can inline the function call (even to the point of it becoming a single machine instruction!). This leads to a slower compile, so for test runs it's easier to leave the function call in (at the expense of speed).

>for another example, some languages have arrays as their basic data
>unit. "+" becomes an array operator. but if all you need to do is add
>two scalar quantities, you're hit with a performance penalty.

I don't know of any languages with arrays as their basic data type (maybe APL?). In most languages where "+" can be used on arrays, it's just overloading, and the compiler generates the appropriate code based on the types of the objects.

[ trends have been largely toward larger, more complex components, both in software and hardware ]

>but recently, the hardware trend has reversed -- use very simple
>components, with a much more complicated "compile" phase; let the
>compiler have access to all sorts of little details it didn't have
>before. (i'm referring to RISC of course.)

Well, there are problems with this. Take an HP-9000, which doesn't have an integer multiply instruction (DISCLAIMER: at least, it didn't when I read a paper on it, if I'm thinking of the right machine). Now suppose that somebody comes up with an extremely fast (hardware) integer multiplication unit. No programs will take advantage of it, because they're all compiled into code that does repeated additions. Similarly, on the software end of things, it makes sense to abstract the function of a component at as high a level as possible, so that new algorithms can be used effectively with existing software.
(Of course, it's also important to make the components flexible, which complicates things somewhat.)

>there's still a lot of room for growth in compiler technology,
>particularly with global optimization, but this is assuming a few
>things: 1) the compiler needs to have access to the source code for
>the component you're using.

[ thus purchased components are a problem ]

No, you don't need source code. You need some kind of intermediate form, though. A generalized RTL, perhaps.

>2) the additional time required for the compile phase isn't a problem.

You only need to do global optimization when producing the final version. Development versions can probably get by with only local optimization.

>3) the ability to recompile everything that uses a component if the
>implementation of a component changes, even if the interface remains
>the same.

This is a problem. However, if you're storing the intermediate code (see my response to point 1), you don't need to do a full compilation: just the optimization (admittedly slow). Code generation can be done ahead of time, essentially.

>with the current trends in hardware and software (backlog of software
>applications, steady and fairly rapid growth in processing capability
>per cost), saving time building the application is more important than
>reducing the running time of the application.

I think this is true. It can take two or three years to build a fairly small microcomputer program, where the difference between (say) 5 minutes and 10 minutes to generate a report can be annoying, but not critical (in many applications).

>but my experience on applications has led to the conclusion that a lot
>of performance is being sacrificed for the decrease in application
>building time.

It depends. Using 4GLs, object-oriented languages, etc. tends to hurt performance, at least in my experience. But I spent a few years working at a place with a nice component library (based around VAX Pascal), and we never had performance problems.
Going to new languages (4GL/OO) seems to hurt performance if you don't have specialized hardware, especially when the compilers aren't well done. The first version of TeleSoft Ada I used took 15 minutes or so to compile a small program, and execution time was over three minutes for the 8-queens (if I'm remembering right). The last version of it I've used took about 10 seconds to compile, and executed in 3 seconds. It takes a while for compiler technology to catch up.

>do others agree that this is a problem in the software industry, or is
>it just me? if it is a problem, how should we face it and fix it before
>it becomes worse?

Which problem? Performance? I don't think it's a serious problem yet (then again, most companies haven't switched to 4GLs or whatever yet). I think people need to start spending a *lot* more time figuring out what they need in environments and languages than they are right now.

	Anton

+---------------------------+------------------------+-------------------+
| Anton Rang (grad student) | "VMS Forever!"         | VOTE on           |
| Michigan State University | rang@cpswh.cps.msu.edu | rec.music.newage! |
+---------------------------+------------------------+-------------------+
| Send votes for/against rec.music.newage to "rang@cpswh.cps.msu.edu".  |
+---------------------------+------------------------+-------------------+
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) (05/11/89)
From article <11293@bloom-beacon.MIT.EDU>, by tada@athena.mit.edu (Michael Zehr):

> my experience so far has been that the more general a component is, the
> slower it runs. for example, given a routine to sort objects, you have
> to give it a criterion. somehow, the sorting routine has to make a
> function call to compare two elements. (in a recent article, someone
> said that this is how they'd like to see a sorting routine
> implemented. that function call can DOUBLE the time it takes to
> perform a sort!)

That depends on the quality of the compiler being used...

> for another example, some languages have arrays as their basic data
> unit. "+" becomes an array operator. but if all you need to do is add
> two scalar quantities, you're hit with a performance penalty.

Ada allows you to "overload" operators; thus, "+" can be defined both as an array operator and as an integer operator, as well as for any other types you are capable of dreaming up. The precise function to be applied is determined at compile time, through analysis of the types of the parameters involved. Thus, algorithms optimized for one data type can coexist with algorithms optimized for other data types.

> 1) the compiler needs to have access to the source code for the
> component you're using. no problem if you built all the components
> (although it might mean having to open several thousand files to look
> at all the components!) but if you're using components produced by
> someone else and sold as a package, you won't have this option. (it
> also impinges on the abstraction level which the component is supposed
> to be hiding.)

Not necessarily. The vendor can provide you with a sublibrary which contains the vended components in compiled form, compiled under global optimization mode. The knowledge of the component is stored in the compiler's "notes", which were encoded into the sublibrary.
When it is time to globally optimize the final product, the compiler reads its notes in order to obtain the relevant information, and finally the binder binds everything together, and you're done. No source code required; you just have to specify to the vendor which Ada compiler you're using.

> 3) ability to recompile everything that uses a component if the
> implementation of a component changes, even if the interface remains
> the same.

This isn't necessary if the interface remains the same. Just re-bind the executables; no recompilation required. (In Ada, at least.)

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
adamsf@cs.rpi.edu (Frank Adams) (05/12/89)
In article <2927@cps3xx.UUCP> rang@cpswh.cps.msu.edu (Anton Rang) writes:
>In article <11293@bloom-beacon.MIT.EDU> tada@athena.mit.edu (Michael Zehr) writes:
>> 2) the additional time required for the compile phase isn't a problem.
>
>You only need to do global optimization when producing the final
>version. Development versions can probably get by with only local
>optimization.

This isn't quite true. You can get by with only local optimization for MOST of the development cycle, but when you change over to the globally optimized version, you WILL find more bugs. You had better figure on spending some time debugging that version. And, of course, if the compiler/global optimizer is new, you will likely find some bugs in *it* when you start using it, too.

Frank Adams    adamsf@turing.cs.rpi.edu
tada@athena.mit.edu (Michael Zehr) (05/14/89)
There have been a couple of good comments about using software components, but so far most of them have been along the lines of "such-and-such a language does this." Could you describe *how* the language provides both efficiency and quick building time? For example, one person mentioned Ada's "re-binding" of executables. What does this do?

Someone also mentioned being able to reuse components in different languages. I agree that this is crucial in combining efficiency with productivity. So far, my experience with mixed-language coding is that it is very easy on Vaxes (because of the forced procedure interface of the CALLS instruction) but extremely difficult on IBM mainframes (because there is no standard calling sequence). Does anyone have any experience with mixed-language coding on newer RISC machines? Do compiler writers stick to "standardized" calling mechanisms?

-michael j zehr
thant@horus.SGI.COM (Thant Tessman) (05/16/89)
In article <11401@bloom-beacon.MIT.EDU>, tada@athena.mit.edu (Michael Zehr) writes:
> [...]
>
> Someone also mentioned being able to reuse components in different
> languages. I agree that this is crucial in combining efficiency with
> productivity. So far, my experience with mixed-language coding is
> that it is very easy on Vaxes (because of the forced procedure
> interface of the CALLS instruction) but extremely difficult on IBM
> mainframes (because there is no standard calling sequence). Does
> anyone have any experience in mixed-language coding on newer RISC
> machines? Do compiler writers stick to "standardized" calling
> mechanisms?
>
> -michael j zehr

Silicon Graphics uses RISC processors from MIPS (the first workstation to do so). Fortran can call C and C can call Fortran (and C++, and Ada, and Pascal). It's done all the time; our Graphics Library works that way.

Not only that, but the Graphics Library uses a shared library mechanism which allows executables to run on machines with different graphics hardware without recompiling (again, the first (and only?) workstation to do so).

thant@sgi.com    "don't quote me, they're not paying me for that"
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) (05/16/89)
From tada@athena.mit.edu (Michael Zehr):

> There have been a couple of good comments about using software
> components, but so far most of them have been along the lines of
> "such-and-such a language does this." Could you describe
> *how* the language provides both efficiency and quick building time?
>
> For example, one person mentioned Ada's "re-binding" of
> executables. What does this do?

Briefly, Ada provides a number of excellent facilities for partitioning a problem into modular subsystems, in order to facilitate the construction of the subunits in an independent manner. One such facility is known as "separate compilation", whereby first the interface to a procedure or function is described:

   procedure Handle_This_Task (With_Parameter : in out Parameter_Type)
      is separate;

Now we can compile the code which contains this specification without ever actually implementing the Handle_This_Task procedure. Eventually, before we bind everything into an executable, a procedure must actually be coded to satisfy this specification, but we can defer it until the rest of the code has been written and debugged. Moreover, if we wish to change something in that procedure's code, we can change and recompile only that procedure, without having to recompile the entire system.

Another mechanism is the "package". One use of packages is to describe specifications (*compilable* specifications) for subsystems which collectively interact to form a larger system. Once the high-level specifications have been compiled together, the information flows across the subsystem interfaces have been verified for completeness and consistency. The project can then be "farmed out" to individual programmers, who will implement each subsystem independently of what everyone else is doing.
The package specification forms a contract between the implementor and the other project members; if everyone fulfills their end of the deal, then the entire system is guaranteed to work as far as information flows are concerned. The implementor constructs a package body, which is said to be "compiled against" the package specification, or "spec".

So everyone goes about their business, implementing and testing their individual subsystems (testing of subsystems can be accomplished by first developing them in temporary isolation from the system, and later compiling them against the system specs), and finally, when everyone is done, the compiled units are *bound* together into an executable for integrated testing.

Now suppose that a problem is found in one subsystem. That problem is corrected, and only that subsystem (and possibly only a small portion of it, if the subsystem has been internally partitioned as well) will require recompilation. Once the affected area of the system is recompiled, the system is bound once again into a new executable for further testing.

The technical mechanism by which this is accomplished is known as the "library" system. Each program is a member of some program library or sublibrary, and when a program (or "program unit", or "unit") is compiled into a library, the library stores some intermediate form(s) of the program. Upon recompilation, the library simply updates its intermediate form(s). If an optimization flag is turned on, then the library stores special information which will be used by the optimizer during the binding phase. When binding occurs, the binder scans through the libraries for the desired program units and binds them together into a piece of executable code, perhaps exploiting the knowledge accumulated during the compilation phase in order to optimize the executable code.
Although packages present boundaries to the programmer, an optimizer can dissolve any and all boundaries in pursuit of a more efficient executable, if global optimization is desired. Normally the work of optimization is not done until very near the end, during or after execution profiling.

The library system is also important in that it permits the reuse of code (a software component needs to be compiled into only one library, and many other users can then make use of it). It also facilitates tracking the impact of a modification: a change in the specification of a component which is used by others will require the recompilation of the relevant areas of code that made use of the modified unit. This enables the rapid identification of the impact of the modification, and allows the tracking of change as it spreads throughout the system. There is an automatic recompilation facility which can be used to recompile all dependent units automatically if it is known that the change will not require changes to the dependent units; automatic recompilation will also identify (in the form of compilation errors) specific places in which the modified specification impacts the dependent units (e.g., a dependent unit's attempt to call a procedure which was just removed from the specification).

> Someone also mentioned being able to reuse components in different
> languages. I agree that this is crucial in combining efficiency with
> productivity. So far, my experience with mixed-language coding is
> that it is very easy on Vaxes (because of the forced procedure
> interface of the CALLS instruction) but extremely difficult on IBM
> mainframes (because there is no standard calling sequence). Does
> anyone have any experience in mixed-language coding on newer RISC
> machines? Do compiler writers stick to "standardized" calling
> mechanisms?
Ada makes direct provision for interfacing to other languages; the only implementation dependency is whether or not the compiler you are using supports calls to that particular language. Typically, compilers support calls to at least several other languages, and the means of use is the standardized interfacing mechanism defined by the Ada programming language.

Special provision is also made for machine code insertions; as with interfacing to other languages, there is direct support for machine code insertion in the Ada language, and the only question is whether the particular compiler you are using provides the service. Although compilers are not required to provide machine code insertion either, many if not most of them do so.

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
rpw3@amdcad.AMD.COM (Rob Warnock) (05/16/89)
In article <11401@bloom-beacon.MIT.EDU> tada@athena.mit.edu writes:
+---------------
| ... So far, my experience with mixed-language coding is
| that it is very easy on Vaxes (because of the forced procedure
| interface of the CALLS instruction) but extremely difficult on IBM
| mainframes (because there is no standard calling sequence). Does
| anyone have any experience in mixed-language coding on newer RISC
| machines? Do compiler writers stick to "standardized" calling
| mechanisms?
+---------------

One of the things AMD tried very hard to do with the Am29000 RISC CPU was to define a single subroutine calling sequence that *at least* C, Ada, Pascal, FORTRAN, and COBOL could live with. This required some tradeoffs, since each language requires a slightly different environment around a subroutine call (especially Pascal/Ada "displays").

The compromise reached was this: the C calling sequence is the base set, and the most efficient. For the C linkage, "live" values in the "local" registers (gr128-gr255 == lr0-lr127) are callee-saved (by opening a new register-window context), whereas values in the "global" registers (gr95-gr127) are callee-destroyable. Specific additional inter-module resources are allocated in the global regs for other languages, but any "live" values in these resources are *caller*-saved, though passed to the callee.

So, as a specific example, if a Pascal routine makes a call, it puts the "display" pointer in a known global, which is used by a Pascal subroutine but (probably) destroyed by a C subroutine (which couldn't care less about displays). Compilers which are smart enough to do minimal interprocedural analysis can avoid saving/restoring these additional resources if the calls from a given routine are all to compatible subroutines, and emit the save/restore code if not. It seems to have worked.
There was a lot of grumbling from the Ada vendors (they wanted *lots* more "preserved" globals!), but reasonably efficient mechanisms were worked out. [AMD has an app note available on the "Am29000 Standard Subroutine Calling Sequence", if anyone cares for more details. Call your salescritter, not me...]

To date, there are 5 C compilers [soon 6], 2 Adas, a Pascal, and [I think] a FORTRAN and a COBOL, from various 3rd-party vendors, all of which use the same calling sequence and can interoperate [within the restrictions of the environments... that is, it's easy to call C from Ada, but to go the other way you have to go through a library routine to set up the Ada environment]. They all use the same COFF file format, too.

So, yes, the benefits obtained from things like the VAX CALLS instruction can be attained in the RISC world, but the discipline has to come from the software, not the hardware. [Which is not surprising, considering the general RISC philosophy.] And I think most of the current RISC chip vendors are aware of this...

Rob Warnock
Systems Architecture Consultant

UUCP:  {amdcad,fortune,sun}!redwood!rpw3
DDD:   (415)572-2607
USPS:  627 26th Ave, San Mateo, CA  94403
ted@nmsu.edu (Ted Dunning) (05/19/89)
In article <5490@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) writes:
From tada@athena.mit.edu (Michael Zehr):
There have been a couple of good comments about using software
components, but so far most of them have been along the lines of
"such-and-such a language does this." Could you describe *how*
the language provides both efficiency and quick building time?
For example, one person mentioned Ada's "re-binding" of
executables. What does this do?
... many good descriptions of separate compilation ...
Another mechanism is the "package". One use of packages is to
describe specifications (*compilable* specifications) for
subsystems which collectively interact to form a larger system.
Once the high-level specifications have been compiled together,
the information flows across the subsystem interfaces have been
verified for completeness and consistency.
this `checking for completeness and consistency' is actually nothing
more than a static type check. since in a large application the
constraints on the interfaces between packages are considerably more
complex than merely having the communication be of the correct
type, this should not be characterized as a full verification.
... more good comments are modular development ...
The package specification forms a contract between the
implementor and the other project members; if everyone fulfills
their end of the deal, then the entire system is guaranteed to
work as far as information flows are concerned.
again, this checking is far from complete (as it must be at the
current state of the art). satisfaction of the type constraints is a
BIG step along the way toward making things work smoothly, but there
are an enormous number of constraints in a real program that cannot
reasonably be expressed as type constraints. some, such as "the sum
of the squares of the first two arguments is greater than the square
of the third", could be effectively checked by assertions. other
aspects of the interface cannot even theoretically be checked by a
compiler (such as "the value of a function will be returned within
10 ms of the call").
again, type checking is almost always a GOOD THING, but it is also
almost always not enough to verify that the communication between
parts of a program is correct.
... more good comments about the benefits of libraries and
separate compilation ...
Someone also mentioned being able to reuse components in
different languages. I agree that this is crucial in combining
efficiency with productivity. So far, my experience with
mixed-language coding is that it is very easy on Vaxes (becuase
of the forced procedure interface of the CALLS instruction) but
extremely difficult on IBM mainframes (because there is no
standard calling sequence). Does anyone have any experience in
mixed-language coding on newer RISC machines? Do compiler witers
stick to "standardized" calling mechanisms?
... good stuff about the ada party line ...
----------------------------------------------------------------
and now for the view from the rest of the world.
type checking is a good thing. separate compilation is a good thing.
libraries and linkers that understand type checking are a good thing.
none of these is particular to ada. some or all of them can be
found in ansi c, commercially available pascals, c++, and other
languages. some of these solutions are partitioned differently from
the way specified by ada orthodoxy, often with the result of more
flexibility. this is the case with make.
one place where most of these tools currently fall down is with
program generators and preprocessors. the requirement that we
specify interfaces completely in terms of the base language leads to
very serious awkwardness when we try to replace the functionality of
yacc, or make heavy use of precompilation as the internals of the
gnu c compiler do.
the failure of tools in the face of language extension is an issue
that has been long recognized in lisp circles. only recently are
conventional languages being used (and shown insufficient to the task)
in such advanced applications. a serious case in point is the x
toolkit as implemented in c.
jesup@cbmvax.UUCP (Randell Jesup) (06/02/89)
In article <2927@cps3xx.UUCP> rang@cpswh.cps.msu.edu (Anton Rang) writes:
>In article <11293@bloom-beacon.MIT.EDU> tada@athena.mit.edu (Michael Zehr) writes:
>> my experience so far has been that the more general a component is, the
>> slower it runs. for example, given a routine to sort objects, you have
>> to give it a criterion. somehow, the sorting routine has to make a
>> function call to compare two elements.
>
>Not necessarily true. Given a good enough development system, the
>compiler can inline the function call (even to the point of it
>becoming a single machine instruction!). This leads to a slower
>compile, so for test runs it's easier to leave the function call in
>(at the expense of speed).

You're both right. Inlining can help for very small functions, or for larger ones at the cost of a lot of program size (and even speed, in a paging or cached hardware environment).

The original poster is right, though: as the hardware becomes faster, and the programs we write become more complex, we are moving to higher levels of abstraction to keep the complexity from overwhelming us (or making the work too expensive to do). Take a program like omega: a massive unix game written in C, with over 1 MB of source. It would run much faster and be smaller if written (well) in assembler. However, it would never have come close to being completed. We trade off efficiency of execution/size for cost of production (programming).

The same thing occurs in the ever-higher-level toolboxes we use in constructing programs, like the X Window System, databases, "software ICs" (still not really well understood or very useful yet, though they may well be eventually), and things like NeXT's Interface Builder. This has happened from the very beginning of the computer industry, and there have been no indications that it will stop in the foreseeable future.

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup