masotti@usc.edu (Glauco Masotti) (10/19/89)
A TUTORIAL INTRODUCTION TO EC++: EXTENDED C++ by Glauco Masotti(*) University of Southern California, Los Angeles September, 1989. (*) The Author is also with Ferrari Engineering, Modena, Italy. SUMMARY. EC++ stands for "Extended C++" and is an attempt to overcome some of the difficulties encountered working with C++, to make programming large systems somewhat more enjoyable. EC++ implements some extensions to the C++ language and provides an environment to support: - assertion checking, in the form of function preconditions and postconditions, and class invariants; - parameterized classes; - exception handling; - garbage collection. The extensions to the language are translated by a preprocessor (which is itself called EC++) into plain C++. The overall environment, is set up with an additional small number of files that define some required macros and classes that will be used by the applications. MOTIVATIONS: PROBLEMS AND WEAKNESSES OF C++. Assertions. Assertions are essentially logic statements that should be be verified at certains points in the code. Examples are invariants that must be satisfied by objects of a class at all times; preconditions that must be satified whenever a routine (function of any type) is called; postconditions that must guaranteed to be true after routine completion. A good presentation of the use of assertions can be found in [2]. No support is provided in C++ to express assertions in a disciplined way, and no emphasis has been put on them by other languages, except Eiffel. However, judging on my past experience with large software projects, and the present experience with EC++, assertions are very useful to write correct and robust software. Assertions have to be considered as an integral part of the source code. First of all, they play the role of documentation of the contract that controls the relationship between a function and its users: the precondition binds the users, the postcondition binds the function. This make clear what are the assumed working conditions for a function and whose responsibility it is to provide or ascertain required conditions. Making software coherent with the specified assertions therefore will require the necessary checks, but also eliminates wildly diffuse checking tests thoughout the code, which is what happens when the contract is not specified. Second, they enable us to express formal properties of classes and routines. This use of assertions may help particularly in the high-level design phase of a system, in combination with the use of abstract or virtual functions. Details of implementation will be provided only later, but requirements and properties of fundamental classes and relevant effects of functions can be defined, through assertions, at earlier stages of design. Third, by monitoring assertions at run-time we can verify the correctness of our code. Many programming errors are pointed out immediately by violations of some assertion. Therefore we can save considerable time in testing and debugging. Fourth, assertions are a means for guaranteeing coherence among different modules of a large program. We can use them to ensure non contradictory results of different functions and proper working conditions for each module. Libraries, reusable modules and parameterized types. While it is relatively easy in C++ to write functions of classes that depend on a single underlying type, e.g. the case of class Triangle and class Rectangle that inherit from a class Polygon, it is, however difficult to write functions that depend only on the general concepts of a container (sets, vectors, lists, tables, etc.) rather than on properties of individual containers [3]. This is the basic weakness that prevents writing more general reusable libraries and modules in C++. If polymorphic parameterized classes were available, these classes could be written just once, and instantiated to the actual needed types to provide code sharing and reusability. C++ may fake parameterized types using macros ([1] par. 7.3.5), but this approach doesn't work well except on a small scale. In fact lines of thousands of characters are generated, by the standard C++ preprocessor, even for simple classes. Moreover use of macros is cumbersome in discovering compile errors and in debugging. Another approach suggested for use of the GNU C++ Library [5], still places some burden on the programmer who wants to use generic classes. It also fails to reuse code, as code is duplicated, instead of resorting to the technique of using generic interfaces to a common implementation as suggested in [1]. Exception handling. During program execution a function may encounter a situation where an error occurs or something strange happens. If the problem is detected, some attempt to recover a manageable situation may be done, or otherwise a failure should be reported. If the problem is not detected or ignored the program may either crash or report incorrect results. In the most lucky cases, things may be patched up locally, but in many cases the routine that detects the problem cannot cope with the abnormal situation locally; an exception detected in one context may require giving up the current activity and performing some action in another context. A number of ad hoc techniques have been until now devised to deal with exceptional conditions: use of status variables, returning error values back along the chain of function calls, use of longjmp to reach a context where things can be recovered directly. This may involve restoring certain conditions and the problem is often complicated by memory management. In particular in C++ we have the problem of calling the destructors for all created and no more needed objects [3]. In any case coping with a particular kind of exception is tedious but feasible, coping with all possible exceptions leads to an explosion in code size and complexity, and is itself an error-prone activity, that is moreover difficult to test and debug, for the intrinsic reason that exceptions are usually unlikely situations. Not coping with exceptions (as is too often done) leads to programs that crash. Most of the popular languages however, as well as C++, don't offer any exception handling mechanism or disciplined procedure. Automatic storage management. C++ inherits its memory management scheme from C. Storage for static and automatic objects is managed automatically, but object allocated in the free store must be explicitly deallocated, before the space can be reused. Manual memory management usually provides efficiency in time and space but is a burden for the programmer since it increases considerably the complexity of code and is a common source of tricky bugs. Automatic garbage collection has not been put as part of the language in the belief that it is too much expensive. Some schemes indeed, need to maintain additional information in order to do garbage collection. Therefore they place an overhead, both in time and space, in an executing program; the representation of an object in memory takes more space, and accessing the objects takes more time than in a language without automatic garbage collection. However as Boehm has demonstrated in his work [4], there are also conservative schemes that don't need any particular assumption, and that don't pose serious efficiency problems. The most pertinent observation here is that as long as we can tolerate virtual memory, we should tolerate automatic garbage collection as well. Run-time and memory utilization however are not the only considerations to have in mind. If done at a reasonable cost, automatic garbage collection can yield substantial benefits in design and development time, as well in debugging and maintenance effort. It can also be a necessary key feature, in the design of general reusable modules, and in simplifying the exception handling problem. In fact in the former case without a system taking care of memory management automatically it would not be possible to put responsibility for it in either the general modules or their users, and in the latter case it would not be possible to call all the necessary destructors for the objects created to cope with an abnormal situation, as seen before. However if freeing memory is all that destructors have to do (as is often the case), with automatic garbage collection we don't need to care about the problem. THE EC++ SOLUTION. Assertions. In EC++ we can express assertions in the form of class invariants and function pre and post conditions. A class invariant is defined like any other member function, but it uses the name "invariant", which is a reserved word in EC++. Function pre and post conditions must be specified after the declaration of a function and prior to its body. A function precondition is prefixed by >> (reminiscent of an input condition) and the conditional expression is enclosed in parentheses. Any valid conditional expression that may be written at the beginning of the body of the routine is allowed. A function postcondition is prefixed by << (reminiscent of an output condition) and enclosed in parentheses. Valid expressions are those that may be expressed prior to every possible return path from the function. Besides using the variables that are accessible in the scope of the function, an OLD specification may be used, to mean the value that a certain expression had upon entering the routine. We will refer to some examples describing some of the member functions of the following (simplified) class: typedef void* word; class WList { /* A general container with list-like capabilities, implemented as a dynamic array of elements of the same type. It can contain elements that fit in a word. */ friend class WList_iterator; public: WList(); void append(word, int); void insert(int, word, int); ... void remove(int, int); int length(); word& operator[](int); int invariant() {return len>=0 && len<=size);} protected: short int len; short int size; word* root; /* pointer to a vector of words in the free-store */ private: word* resize(int); void grow_or_create_maybe(int); void shrink_or_delete_maybe(int); void shift_forward_from(int); void shift_backward_after(int); }; Memory is allocated in chunks of fixed size, when needed. It is worth noting at this point the invariant of the class, that asserts the fundamental property that relates the length of an object (i.e. the number of elements stored) and its size (the allocated storage). If an "invariant" function is not defined for a class, and invariant checking is activated, a global "invariant" function (returning always 1) will be called. The following examples should explain the admissable syntax for function pre and post conditions: word* WList::resize(int newsize) >>(root != 0 && size > 0 && newsize > 0) <<(newptr != 0 && *newptr == OLD(*root)) { word* newptr; newptr = new word[newsize]; word* p; word* q; int min = (size<newsize) ? size : newsize; for(p=root, q=newptr; p-root < min; p++, q++) *q=*p; return newptr; } void WList::grow_or_create_maybe(int stp) >>( (root != 0 && size > 0) || (root == 0 && size == 0) ) <<( root != 0 && size >= OLD(size) ) { if( root == 0 ) { /* Create! */ root = new word[stp]; size=stp; } else if( len+1 > size ) { /* Space too small => resize */ root = resize(size+stp); size+=stp; } len++; } Given the previous definition of the function "resize" the preprocessor produces C++ code, that (omitting the #line directives and other details) looks like this: #define CONTEXT__ "word* WList::resize(int newsize)" word* WList::resize(int newsize) { PRECONDITION(root != 0 && size > 0 && newsize > 0) DECLARE_OLDS( \ typeof(*root) old__root = *root; \ ) word* newptr; newptr = new word[newsize]; word* p; word* q; int min = (size<newsize) ? size : newsize; for(p=root, q=newptr; p-root < min; p++, q++) *q=*p; { POSTCONDITION(newptr != 0 && *newptr == old__root) INVARIANT() return newptr; } } The macros are defined in an include file "globals/ASSERTIONS.h" in such a way to be expanded as follows if the corresponding condition checking is requested: #define PRECONDITION( CONDITION ) \ if( !(CONDITION) ) badNews(NOT_PRECOND, CONTEXT__); #define DECLARE_OLDS( OLDS ) \ OLDS #define INVARIANT() \ if( !invariant() ) badNews(NOT_INVAR, CONTEXT__); #define POSTCONDITION( CONDITION ) \ if( !(CONDITION) ) badNews(NOT_POSTCOND, CONTEXT__); \ INVARIANT(); One has separate control on each case defining the macro names: CHECK_PRECONDITION CHECK_POSTCONDITION CHECK_INVARIANT If these macros are not defined the corresponding condition checking macro is expanded in a blank, and therefore no code is generated. If some assertion checking is activated and an assertion fails to be verified, the function "badNews" is called. In the simplest case this function limits itself to printing some error information along with the name of the function where the failure has been detected (this is passed in the variable CONTEXT__), but in general it may do better if an exception handling procedure is defined, as we will see later. Support for parametric classes. Given the previous definition of the container class WList, a parametric interface to it may be provided to use the container for different kind of objects, following the scheme at par. 7.3.5 in [1]. The generic interface class may be written in dependence of a parameter type <T> as follows: #typedef #include "containers/WList.h" extern int step<T>; extern void initializeWList<T>(int); struct WList<T> : WList { /* A generic parameterized container with list-like capabilities, implemented as a dynamic array of elements of type <T>. Elements of type <T> must fit in a word. */ friend class WList_iterator<T>; WList<T>() : WList() {} void append(<T> w) {WList::append((word)w, step<T>);} void insert(int i, <T> w) {WList::insert(i, (word)w, step<T>);} ... ... void remove(int i) {WList::remove(i, step<T>);} int length() {return WList::length();} <T>& operator[](int i) {return (<T>&) WList::operator[](i);} }; Note the "#typedef" keyword, that will be used later in the instantiation of this parametric definition, and the use of generic identifiers, depending on the parameter <T>. This generic class may be used as follows: #include <String.h> #include "generics/WList<String*>.h" struct Sentence : WList<String*> { Sentence(); void input_sentence(); void print_sentence(); ... private: void print_current(String*); }; Sentence::Sentence() : WList<String*>() {}; void Sentence::input_sentence() { String* s; do { ... append(s); } while(*s != "."); remove(len-1); } Parametric classes may therefore be instantiated to a particular type and parametric identifiers may be used. An arbitrary type (given certain constraints) may be used and placed between "<...>" in all the places where a match with the corresponding generic parameter <T> in the generic class is possible. In order to get this working no manual operation is necessary, the preprocessor of EC++ will take care of all that is needed: 1) First of all the conventional expressions of a parametric identifier must be expanded in the source files that use parametric classes, in order to obtain valid C++ identifiers for the parametric names. In our case the source files are transformed this way: #include <String.h> #include "generics/WListString_.H" struct Sentence : WListString_ { Sentence(); void input_sentence(); void print_sentence(); ... private: void print_current(String*); }; Sentence::Sentence() : WListString_() {}; void Sentence::input_sentence() { String* s; do { ... append(s); } while(*s != "."); remove(len-1); } The substring "String_" take the place of "<String*>" in the parametric names, and is concatenated with the rest of the name. As is shown in the example, besides simple types, pointer types may constitute valid parameter types for EC++. In the actual implementation more complex parameter type will require a typedef statement, but these should be quite uncommon cases. 2) We have to generate the files that define the required instance of the generic class. In the previous example these files are "generics/WListString_.h" and "generics/WListString_.cc" (the suffixes .H and .C will then placed by the final preprocessing step). In our example the code of the instantiated interface class, generated by the preprocessor, is: // WListString_.h: typedef String* String_; #include "containers/WList.h" extern int stepString_; extern void initializeWListString_(int); struct WListString_ : WList { /* A container with list-like capabilities, implemented as a dynamic array of elements of type String_. Type String_ must fit in a word. */ friend class WList_iteratorString_; WListString_() : WList() {} void append(String_ w) {WList::append((word)w, stepString_);} void insert(int i, String_ w) {WList::insert(i, (word)w, stepString_);} ... ... void remove(int i) {WList::remove(i, stepString_);} int length() {return WList::length();} String_& operator[](int i) {return (String_&) WList::operator[](i);} }; // WListString_.cc: #include "WListString_.h" int stepString_; void initializeWListString_(int step) {stepString_ = step;} To automatically produce these files, the EC++ preprocessor edits the makefiles that perform the preprocessing and compilation steps in the directory where the generic class is defined. The user should initially provide only the basic pattern (which is part of the EC++ distibution) for the makefiles in this directory. Processing a file that uses a parametric class, in case a new instance is required, EC++ adds new targets to the makefiles, and instructions to produce the new files via text substitution from the original generic class definition. In order to make the process of getting an executable system coherent, it is assumed, using EC++, that the make process is split into two steps: a preprocess step that eventually generates some new instances and produces all the necessary transformations on the original source files, and a final compilation and linking step. Support for exception handling. EC++ provides a mechanism for dealing with exceptions. The following are considered exceptions: 1) failed assertions; 2) hardware or system generated signals; 3) detection, in the user code, of an abnormal condition that cause an explict call to the "Help!" facility. The construct "Help!" is reserved and it causes execution of the code of the active "rescue clause". In EC++ a rescue clause may be specified after the declaration and prior to the body of a function (like assertions). It consists of a block of statements that could be allowed at the beginning of the function body. Let us suppose we have a certain number of functions for which a rescue clause has been specifed. During program execution the active rescue clause is the last seen in the stack. If at a certain point an exception is detected, arising any of the exceptional conditions above, then control is transferred to the code of the actual active rescue clause. This code may specify also a program exit, if the situation is considered unrecoverable. If no tranfer of control is specified, normally the routine body is reentered from the beginning, and execution is retried. The active rescue clause may not belong to the current executing function, therefore an exception may involve unraveling the stack until the function where the rescue is specified. This may be dangerous, it involves a longjmp and is accompanied by the problem of the destructors that will not be called. The standard assumption of EC++, however, is that automatic garbage collection is active, and this virtually eliminate the destructors problem. In any case, very often the exception cannot be resolved locally in the function where it is detected, so this feature is necessary in many situations. The following are two very simple examples of the admissable syntax and use of the rescue clause and help invocation (they don't intend to suggest doing things this way, but are just illustrative): extern double inverse(double x); main() rescue:3 {} { double x; cin>>x; cout<<"\n"<<inverse(x)<<"\n"; } double inverse(double x) >>(x!=0) { return 1/x; } In this example an exception will be detected by the precondition assertion of the function "inverse" if 0 is given as input. The effect will be to unravel the function from the stack and execute the rescue clause of the main, which in this case does nothing besides reinitiating the program and asking for a new input value. The integer number after the colon following the rescue keyword is the number of times that rescue is allowed. If the program doesn't come out with a correct situation after retrying that number of times the program will exit reporting a failure (i.e. an error condition). This is (approximately) the code that we get in output from the translator: extern double inverse(double x); #define CONTEXT__ "main()" main() { jmp_buf rescue_code; Rescue(&rescue_code, 3); if(setjmp(rescue_code)) {} double x; cin>>x; cout<<"\n"<<inverse(x)<<"\n"; } #define CONTEXT__ "double inverse(double x)" double inverse(double x) { PRECONDITION(x!=0) return 1/x; } The class "Rescue" supports the required operations in conjunction with the function "badNews": 1) Pointers to jump locations of rescue instructions are maintained in a stack list. 2) A global variable "rescue__" points to the top of the stack and represent the active rescue clause at any given moment. 3) If a rescue clause is active (i.e. the Rescue stack is not empty) and an exception occurs, the function badNews calls for a longjmp to the rescue code. 4) The number of residual chances after a rescue is executed is updated. In the second example the rescue clause is local to the routine "inverse" and is invoked explicitly by the "Help!" instruction. main() { double x; cin>>x; cout<<"\n"<<inverse(x)<<"\n"; } double inverse(double x) rescue:1{x=1e-30;} { if(x==0) Help!; return 1/x; } An exception always involves a call to "badNews". It is worthwhile to point out that, besides being raised by assertion checking and explicit "Help!" invocation, an exception may be caused by an error detected by the hardware or the operating system. This usually causes a signal to be generated, e.g. floating point exceptions or segmentation violation signals. The default initialization phase of an EC++ application makes provisions to trap these signals at any time and therefore also in these cases an exception is detected and the function "badNews" is called. The method is relatively simple but with this basic framework more complex behaviors, when needed, can also be simulated. Extension of this mechanism, in order to make it more flexible, could be considered depending on practical experience indications. Support for garbage collection. EC++ may be used without garbage collection, but in the kind of application it is intended for, as well as for avoiding some of the difficulties encountered in exception handling and in writing rescue clauses, I suggest using it, and writing code making the assumption that garbage collection is active. The Boehm's garbage collector [4] is used and therefore the support provided by EC++ is just a couple of files to interface the Boehm's package with C++. It is worth noting that, using automatic garbage collection, a very different style of programming can be adopted. All the complications of memory management that we encounter when we need multiple reference to the same object and that usually require manual reference counting suddenly disappear. Also the difference in semantics of assignment and initialization, that took a chapter in the Stroustrup's book [1], virtually disappears and we can use pointer assignment without troubles. Some notes on the environment. EC++ assumes work is in a directory tree with a root that is passed to the EC++ preprocessor as a parameter. Files are searched in this tree. The following assumptions are also either necessary or recommended: 1) Original source files have suffixes ".cc" and ".h". 2) Each directory that contains source files has two separate makefiles: "preprocess.mk" and "compile.mk". The former applies the preprocessor to the original source files and produces the corresponding processed files with suffixes ".C" and ".H". These files contain #line directives to the original files. 3) To produce an executable file or a set of coherent object files, all the preprocess makefiles are executed first in all directories, in order to have all the include files preprocessed and updated, as well as all instances of generic classes generated, then the compilations are performed. EC++ uses the "typeof" feature, that I think is not yet implemented in AT&T 2.0, but that is available in GNU g++. Therefore use of g++, or of other C++ compilers (if any) with equivalent features, is assumed. The current implementation of EC++ also uses classes of the GNU g++ library. You should therefore have it available in order to compile and load the files of the EC++ distribution. POSSIBLE FUTURE DEVELOPMENTS. EC++ is currently being used and tested in this implementation. Some future development (as we can foresee at this time) may include support for: 1) debug instructions and assertion checking in any needed location; 2) atomic allocation [4] to improve garbage collection performance; 3) a more flexible exception handling mechanism; 4) automatic production of documentation. REFERENCES. [1] - B. Stroustrup, "The C++ Programming Language", Addison-Wesley, 1986. [2] - B. Meyer, "Object-Oriented Software Construction", Prentice-Hall, 1988. [3] - B. Stroustrup, "Possible Directions for C++", Proc. USENIX, Nov. 1987. [4] - H.J. Boehm, "Garbage Collection in an Uncooperative Environment", Software Practice and Experience, Vol. 18, pp. 807-820, Sept. 1988. [5] - D. Lea, "User's Guide to GNU C++ Library", May 1989.
jima@hplsla.HP.COM (Jim Adcock) (10/21/89)
Can someone explain "typeof" ?
bertrand@eiffel.UUCP (Bertrand Meyer) (10/25/89)
If imitation is flattery, I certainly have reason to be honored by the recent ``EC++'' posting. With the exception of the garbage collection strategy (for which I believe our approach to be superior), everything it adds to C++ is directly lifted from Eiffel. The promised ``future developments'' have been present in Eiffel for years. If we ever needed a confirmation that we have been doing the right thing all along (rather than following whims of fashion), this is it. Reinventing the wheel (especially if done with some of the proper acknowledgments, as here) is permitted. In the meantime, of course, we are working on the next steps. Good luck anyway. I am afraid, however, that the path followed by the posting - yet another set of semi-compatible extensions to an obsolete technology - holds little promise. Although few will listen, as few have listened in the past, I will once again repeat Hoare's and Wirth's admonition: you don't build programming languages by adding features ad libitum. Hoare was reacting to the orgies of the sixties, but the present circumstances are not much better. Like any other good engineering design (or any good design whatsoever), a good programming language is defined as much by what it excludes as by what it includes. (See also my article in ``Structured Programming'', Vol. 10 no. 1, 1989, pages 10-39.) It is also rather saddening, after more than twenty years of use of the catchphrase ``software engineering'', to see programming language features defined in terms of translator's output, ``unraveling the stack'', ``longjmps'' and so on. This is not an inappropriate place to repeat (for those readers who prefer originals to imitations) that anyone is free and welcome to produce an implementation of Eiffel - compiler, interpreter, ... - as well as libraries, tools and the like. The name is a trademark and must be acknowledged, but that's all. It is so much more fun to do the real thing. -- Bertrand Meyer bertrand@eiffel.com -- -- Bertrand Meyer bertrand@eiffel.com
ttwang@polyslo.CalPoly.EDU (Thomas Wang) (10/26/89)
The EC++ manual describes the exception handling mechanism as implemented by setjmp() and longjmp(). I had considered implementing such a mechanism a while ago, and the response from USENET unix gurus are that setjmp() and longjmp() are not portable. They are not even guaranteed to work in future compilers on traditional CISC machines. The problem is that longjmp() will restore the value of register variables, and keep the value of memory variables unchanged. As the 'register' declaration is only a hint to the compiler, all variables declared as 'register' will have undefined values after a longjmp(). If one uses longjmp(), then all local variables must be declared 'volatile'. This will have some performance impact. Also using longjmp() will leave the destructors not called. The assumption made by EC++ is that people will be using Boehm's garbage collector, so there will not be any destructors to call. I think generally this will not be true. Exception handling is useful independent of any garbage collector used. There are other garbage collection schemes that requires destructors to be called (such as my MM garbage collector). A way of using longjmp() but at the same time call all the destructors is presented in 1988 USENIX C++ Conference "Excepton Handling Without Language Extensions". Perhaps EC++ can use this scheme to improve the exception handling implementation. The long term solution I believe not to be any variant of setjmp()/longjmp() implementation, but one which is build into the language. I likes the idea that every function must have a default exception handler, and zero or more user defined exception handlers. This way, the program can directly branch to the exception handler without stack manipulation when an error occurred. I think there is two appropriate ways an exception handler can propagate the error down the exception handler chain. The exception handler can directly modify the return address on the stack to point to the next exception handler. This option requires the address of the next exception handler passed as a hidden parameter. A second option is just set an error variable, and return. The caller then test the error variable, and branch to its own exception handler. Both of these schemes require a small constant overhead. This disadvantage is offset by the ability to use register variables and non-volatile variables freely, not to mention the portability issues. The constant overhead of the first approach is the time to push the address of the exception handler on stack. The constant overhead of the second approach is the time to test the error variable and branch to exception handler. -Thomas Wang (Programming without exception handling is like driving without breaks!) ttwang@polyslo.edu