weide@elephant.cis.ohio-state.edu (Bruce Weide) (10/21/89)
Occasionally something goes by on this newsgroup that cries out for a response. Case in point: Recently Ralph Johnson made an argument that in order to reuse a software component, one must be able to see its source code. The essence of his case seemed to be: (1) Writing formal behavioral specifications for components is hard. (2) A complete formal specification IS a program. (3) Smalltalk programs he has written look a lot like specifications he has seen. Therefore, why not just stick with source code and forget formal specifications? Or, perhaps, why not write formal specifications but still provide source code just in case? (I couldn't tell for sure what the conclusion was. These seemed likely candidates.) (1) Writing formal behavioral specifications for components is hard. Here there is no disagreement. Fortunately, one of the nice things about reusable components is that relatively few people need to be able to WRITE specifications. Lots of people need to be able to read them. This suggests that, while writing specifications is presently difficult, we really ought to be working on specification methods and languages that also emphasize ease of understanding. Of course it will also help to improve the ease with which specifications are written. Both areas call for further work. But one should not conclude from the fact that specs are hard to write NOW that it must always be so. Nor should one conclude that they are necessarily hard to understand. In fact I think we already have the concepts necessary to write comprehensible formal specifications of non-trivial components, and plenty of anecdotal evidence that both CS students and practicing software engineers in industry can understand them; I have previously commented on such techniques in this forum. (By the way, I don't think Ralph was really claiming that formal behavioral specifications are INHERENTLY IMPOSSIBLE to write or INHERENTLY DIFFICULT to read and understand, which seems to be what is needed to support his argument. If that is the claim it's going to be difficult to defend.) (2) A complete formal specification IS a program. On this point there may be some grounds for disagreement. Formal specifications for a component will in general be abstract and implementation-neutral, i.e., they will hide both representation data structures and algorithms used in the component's implementation, explaining behavior in abstract terms, and admitting of multiple plug-compatible implementations. Specifications say WHAT the component does, not HOW it does it. Now there's a hell of a gap between such specifications and the source code for an implementation in an algorithmic language. To claim that this is "only" a compiler technology issue -- that a smart compiler can invent an acceptable (in performance) implementation from such abstract specifications -- is a remarkable leap of faith that flies in the face of reality. (3) Smalltalk programs Ralph has written look a lot like specifications he has seen. Perhaps Ralph's real claim in (2) is that one might simply consider the specifications to BE the source code. I can only suggest that such "specifications" are inappropriately named. If they reveal any representational or algorithmic details of HOW the component works they are not specifications by the above definition. I'm not fluent in Smalltalk, and maybe Ralph's programs look a lot different from Smalltalk code I've seen. But those samples don't look much like formal specifications to me. SUMMARY - Can we agree on the general definition, the nature/purpose of formal "specifications": to say WHAT the component does? Can we agree on the nature of "source code": to say HOW it does it? Can we agree that these are not the same thing? If so, I think we ought to be able to agree on something about the nature of reusable components in a (future?) software components industry: There is no reason in principle that a client programmer should need to examine component source code in order to understand what the component does. If we can't agree on the above definitions, we'd better move the discussion back to that issue. -Bruce
johnson@p.cs.uiuc.edu (10/25/89)
weide@elephant.cis.ohio-state.edu said >Occasionally something goes by on this newsgroup that cries out for a >response. Naturally, I was trying to get a response! I'm glad I succeeded. There are lots of things I could say in response, such as I disagree with the idea that it is irrelevant how hard it is to make reuseable code because the cost will be amortized later. However, the main point that I was trying to make was that in practice there is no real difference between a complete specification of what behavior a component has and a specification of how it is implemented. In other words, I think that the generally well accepted idea that "Formal specifications for a component will in general be abstract and implementation-neutral, i.e., they will hide both representation data structures and algorithms used in the component's implementation, explaining behavior in abstract terms, and admitting of multiple plug-compatible implementations" is wrong. It is only correct for specifying tricky algorithms, i.e. fast, complex algorithms for problems that also have slow, simple algorithms. It is certainly false for most real code, which consists of mostly user interface. Moreover, to the extent that it is true, it is also true of programs written in very-high-level languages. There is certainly a wide semantic gulf between formal specifications and Pascal programs. This is irrelevant to the claim that there is a wide semantic gulf between formal specification languages and ALL programming languages. For example, there are many formal specification languages that are executable. Thus, these specification languages are also programming languages. More to the point I am trying to make, real high-level languages (i.e. not Pascal or C) can look very much like specification languages. For example, compare these specifications of stacks. class stack sequence elements; push(elem) = {elements := elements, elem} pop() = {elements := elements(1:(elements.size-1)} top() = {return elements.last} empty() = {return elements.size = 0} or pop(push(elem,S)) = S top(push(elem,S)) = elem empty(push(elem,S)) = false empty(newstack()) = true The second (the "abstract" specification) is shorter, but not much easier to read. Most importantly, it is not easier to understand. Thus, I claim it is no better as a specification than the first. Stacks are about the easiest component for which one can write a formal specification. However, a better example to show the strengths of formal specifications would be sorting, since the specification is that the output sequence is a permutation of the input sequence and that the output sequence is sorted, while most fast sorting algorithms take a lot of more code to express. This is an example of a problem for which there exists both fast complex algorithms and slow simple algorithms. The specification I described corresponds to the sorting algorithm that takes each permutation and checks whether it is sorted. This simple specification could be represented just as well by a program. To raise a last point, we probably disagree on what a component is. I don't believe in the mythology that we will be able to reuse components without changing them. It is very hard to predict how people will want to change components, so it is very hard to know what information to hide. Components include abstract classes and frameworks, which are really just partial programs. This makes it even harder to make provide specifications that are different from the code, significantly simpler, and useful. Ralph Johnson
johnson@p.cs.uiuc.edu (10/25/89)
I argued that to reuse a software component, one must be able to see its source code, and that it was a mistake to think that formal specifications were going to solve the problem. Bruce Weide summarized my argument as (1) Writing formal behavioral specifications for components is hard. (2) A complete formal specification IS a program. (3) Smalltalk programs I has written look a lot like specifications I has seen. It is always interesting to see how others interpret my words. I don't think (1) is a crucial point. (2) is crucial, and (3) is just there to back up (2). In fact, I don't think that Smalltalk is at all unique in this regard. For example, the first Ada compiler was written in SETL and was really more of a specification for Ada, or a design of an Ada compiler, than it was a "real program". Although it was executable, it only compiled a few lines per minute. My last message elaborated on (2). However, I've been thinking about (1). I'd like to say some provocative things. First, to the best of my knowledge, nobody has written formal behavioral specifications for a real component library. There was a group at Tektronix that was using Z to specify a few Smalltalk classes, and there is a group here at Illinois that is using Z to specify a few classes from a library of operating system classes written in C++, but neither of these groups are dealing with more than a few classes. There is also the Larch group at MIT, though I haven't seen anything from them dealing with real libraries. It seems to me that the burden of proof is on the side of those who claim that it is reasonable to expect to have formal specifications for component libraries. I would love to be proved wrong. I am not one of those who think that formal specifications are useless. Quite the contrary, I demand that my students take courses in programming language semantics and program verification, and I studied both when I was in graduate school. However, so far the effect of these topics on computer science has been indirect. They have helped us invent better languages and learn to think about programming better, but they have had little use in commercial programming. Perhaps this will change soon, but I see little sign of it. There are too many unsolved problems, like how to reason about pointers. On the other hand, I am greatly in favor of more research being done on the problem and wish that the area was funded a lot better than it is. It is an important problem, and one worthy of the attention of many people, but it is an area that offers many interesting research topics, not one that offers ready-made solutions to current practical programming problems. Ralph Johnson -- University of Illinois at Urbana-Champaign
mitchell@community-chest.uucp (George Mitchell) (10/25/89)
Until your last posting you were only getting close, but now you have really "pushed my button". In article <130200016@p.cs.uiuc.edu> johnson@p.cs.uiuc.edu writes: > >To raise a last point, we probably disagree on what a component is. >I don't believe in the mythology that we will be able to reuse >components without changing them. It is very hard to predict how >people will want to change components, so it is very hard to know >what information to hide. Components include abstract classes and >frameworks, which are really just partial programs. This makes it >even harder to make provide specifications that are different from >the code, significantly simpler, and useful. > At OOPSLA '89 the statement was made (I wish that I had said it) that modifying [code/component] was not reuse but recycling. Since the cost of producing code is a small portion of the total lifecycle cost, the goal of reusing complete components should not be so quickly abandoned. The potential savings in testing and maintenance are well worth while. If this requires extensive domain analysis, system designs constrained by component availability, and use of languages supporting inheritance, the benefits should usually be well worth the sacrifices. -- /s/ George vmail: 703/883-6029 email: mitchell@community-chest.mitre.org [alt: gmitchel@mitre.arpa] snail: GB Mitchell, MITRE, MS Z676, 7525 Colshire Dr, McLean, VA 22102
weide@elephant.cis.ohio-state.edu (Bruce Weide) (10/26/89)
Ralph Johnson's recent comments about the differences between specification and implementation call for a few more comments and questions of clarification. (1) Ralph gives an example of a simple implementation of stacks and simple algebraic specification, and concludes: "The second (the "abstract" specification) is shorter, but not much easier to read. Most importantly, it is not easier to understand. Thus, I claim it is no better as a specification than the first." I'm not sure I understand the point of this example, except that it is POSSIBLE for an implementation to be about as short and as simple as its formal specification. No argument there. But the claim that the algebraic specification is "THE abstract specification" is too strong. There are other well-known specification methods that, we would both apparently agree, render specifications easier to comprehend than the algebraic approach. Such specifications remain abstract but do not rely on using equations, as in Ralph's example. Such alternative specification methods should result in easier-to-understand specifications of stacks. Admittedly these alternatives might look on the surface very much like Ralph's source code (for this example but not in general). But they are subtly and importantly different in meaning precisely BECAUSE they are abstract. One might, for example, use a mathematical sequence as a mathematical model of a stack. But there would be no implication that program sequences are used in the implementation, as apparently Ralph has done in his code. This is important for lots of reasons, upon which I can elaborate if the discussion continues along this line. (2) Ralph continues: "Stacks are about the easiest component for which one can write a formal specification. However, a better example to show the strengths of formal specifications would be sorting, since the specification is that the output sequence is a permutation of the input sequence and that the output sequence is sorted, while most fast sorting algorithms take a lot of more code to express. This is an example of a problem for which there exists both fast complex algorithms and slow simple algorithms. The specification I described corresponds to the sorting algorithm that takes each permutation and checks whether it is sorted. This simple specification could be represented just as well by a program." The general claim here seems to be that for a reusable component with a compact formal specification, there is also a simple but possibly inefficient ALGORITHM that is its implementation and that, to some people at least, would be easier to understand. This seems unlikely at best. Consider a simple example. Suppose you have a component that keeps track of a function from some domain type D to some range type R (both parameters to the component). You want to specify a boolean-valued operation that returns "true" given f and r iff function f maps some value of type D to the specific value r. This is easy to specify in the abstract using an existential quantifier; something like: there exists d:D such that f(d) = r Presumably the source code to explain this behavior consists of a program to search through all values of type D for one that f maps to r. Two problems here. First, D may not be finite, in which case this is not an algorithm at all because it may not terminate and could never return "false". Second, how do you write this program (except in pseudocode or something) without knowing what type D is? But Ralph has provided a good idea here: to challenge specification writers to design and specify a component that permits one to do sorting. I'm happy to oblige, and will post such specs to this forum if asked to do so. But we should also be able to see the "simple specification" mentioned in the last sentence above for comparison purposes relative to ease of understanding. A biased audience, readers of this forum, but we might get some info nonetheless. Of course, comparisons on other grounds (e.g., ease of verification that alternative implementations meet the specs; whether the spec method helps one design a good reusable component in the first place; lots of other questions) might be too difficult to carry out in this forum. But it would be a start at seeing whether people might prefer to see source code or formal specifications for their reusable components. -Bruce ------ Prof. Bruce W. Weide Dept. of Computer and Information Science The Ohio State University 2036 Neil Ave. Mall Columbus, Ohio 43210-1277 USA Phone: 614-292-1517 E-mail: weide@cis.ohio-state.edu
weide@elephant.cis.ohio-state.edu (Bruce Weide) (10/26/89)
Ralph Johnson's second reply to the question of the place of formal specification was very interesting. I'd like to add our group to his list of those who have written formal specifications for "real" components. Ralph, I'll send you a few reports if you wish; just drop me a note. I wish I could cite a bunch of papers we've had published but few people seem to want to hear what we're trying to say. That's one of the reasons we think this newsgroup is so important: at least the people here have not already abandoned the idea that reusable software components might someday become practical. Nor are they convinced that they already know the one-and-only way to do something. That leads to Ralph's claim that one of the deficiencies in relating formal specification, verification -- formal methods in general -- to real programming is the difficulty of dealing with pointers. How true!!! Again, all I can say is that we have done quite a bit of work on this, but it generally falls on deaf ears. Many people can't imagine programming without pointers sticking out of everything, so don't even bother suggesting that pointers spell trouble. But this is another important reason that formal specifications are important and that you don't want to look at source code to try to understand a component's behavior. By making specifications abstract, one can shield a client of a reusable component from all the pointer tricks that often need to be played in the implementation in order to implement it at all, and certainly to make it efficient. I guess the bottom line is that Ralph might grant that, if certain technical problems could be solved, there might be some hope for substituting formal specifications for source code as descriptions of component behavior. Ralph, if I've misrepresented your position on this, please advise. -Bruce
hogg@db.toronto.edu (John Hogg) (10/26/89)
Ralph Johnson has claimed that every specification is a program, and this statement has been challenged by other posters. Since I take it as an article of faith myself, I'll try to clarify it. No specification or program by itself does anything, except take up disk space. A program (as we all learned at a tender age) is a sequence of instructions. These instructions must be executed by (in cases of interest to us) a computer. And the code we see is seldom executed directly; it is normally processed by a compiler and friends. Therefore, when we speak about executing a program, we implicitly acknowledge the existence of large amounts of hardware, software and cleverness, which make up our program execution engine. Specifications are not generally executed, and therefore the market for specification language engines is thin. However, in principle, such an animal is quite possible. A specification defines the set of acceptable output states for a given input state. (I'll certainly accept other flavours of specification, but similar arguments can be made.) To implement the specification, simply enumerate the domain of output states until one satisfying the postcondition is found. Or, if you prefer, select states nondeterministically until satisfaction. If the specification is satisfiable, then the specification engine will terminate, albeit in unbounded time. Bruce Weide gave a description of a specification which he claims is unimplementable: >Suppose you have a component >that keeps track of a function from some domain type D to some range >type R (both parameters to the component). You want to specify a >boolean-valued operation that returns "true" given f and r iff >function f maps some value of type D to the specific value r. This is >easy to specify in the abstract using an existential quantifier; >something like: > > there exists d:D such that f(d) = r > >Presumably the source code to explain this behavior consists of a >program to search through all values of type D for one that f maps to >r. Two problems here. First, D may not be finite, in which case this >is not an algorithm at all because it may not terminate and could >never return "false". Second, how do you write this program (except >in pseudocode or something) without knowing what type D is? This isn't quite fair, because he's given a specification of a component or function g without specifying the function f that it uses. If f is specified, then we can enumerate through the range of g ({true, false}) in quite acceptable time. The specification of f is executed in the same manner. The only reason that implementations are somehow more ``concrete'' than specifications, then, is that we've gone to the trouble of building the program execution engines. Specification engines would actually be easier to build in some ways, because we can be totally uninterested in performance. So both specifications and implementations are descriptions of a computation, which merely differ in undescribed side effects. Formal verification is just the process of showing that the two are in fact equivalent. The advantage of the implementation is its efficiency. The advantage of the specification *should* be its clarity. If the specification is no clearer than the implementation, then it is of academic interest at most. My PhD thesis, by the way (Extremely Real Soon Now) is on the subject of specification and verification of object languages. This should make me the world's greatest booster of component selection by specification alone. Well, the work is worth doing so that giants can step on my shoulders. However, we're a long way from modular, sound, and complete object specifications, let alone ones that are easier to read than the source. (Those who wish to challenge this statement are invited to do so by mail. Note that the claim has four adjectives and a noun that require precise definitions.) Probably the main advantage of formal specifications is that they can be used to prove that if sub-components meet their specifications, their component sum will too. But that's a property of an associated proof system, not the specification language per se. -- John Hogg hogg@csri.utoronto.ca Department of Computer Science, University of Toronto
weide@elephant.cis.ohio-state.edu (Bruce Weide) (10/26/89)
I recently suggested an operation that I claimed could be more easily specified abstractly than by giving source code for an implementation: ">Suppose you have a component >that keeps track of a function from some domain type D to some range >type R (both parameters to the component). You want to specify a >boolean-valued operation that returns "true" given f and r iff >function f maps some value of type D to the specific value r. This is >easy to specify in the abstract using an existential quantifier; >something like: > > there exists d:D such that f(d) = r > >Presumably the source code to explain this behavior consists of a >program to search through all values of type D for one that f maps to >r. Two problems here. First, D may not be finite, in which case this >is not an algorithm at all because it may not terminate and could >never return "false". Second, how do you write this program (except >in pseudocode or something) without knowing what type D is?" John Hogg has subsequently suggested that I claimed that the specification "was unimplementable." Sorry for the confusion; this is not what I meant to imply. It is quite implementable. It's just that the "obvious," and therefore presumably easy-to-understand implementation that would make the "source code specification" of behavior simple, is not in fact a correct implementation. John goes on to say: "This isn't quite fair, because he's given a specification of a component or function g without specifying the function f that it uses. If f is specified, then we can enumerate through the range of g ({true, false}) in quite acceptable time. The specification of f is executed in the same manner." Perhaps I wasn't being clear enough, or I don't understand John's point. The reusable component I have in mind provides a type whose ABSTRACT MATHEMATICAL MODEL is a function from D to R; let's call it type "foo" so we don't confuse it with a program function. The component also provides operations that permit the client to construct a foo, and also provides an operation that asks the question above: for the foo variable f (which has been constructed in this way), does any value of its domain map to r? The specified doesn't know what mathematical function models f. All he/she knows about f is that it has been constructed by the client using the operations provided by the component to operate on variables of type foo. The fact that the abstract specification of this component may talk about f as though it were a mathematical function is what makes the abstract specifications simple. That source code doesn't have this abstract view of f is what, I claim, will make writing "source code specifications" more difficult and make them harder to understand. (Not impossible, just clumsier.) Overall I agree with most of John's points, and in particular with his view of what verification is all about and what properties it should have (modularity, soundness, etc.). This is another reason for having abstract specifications. If one wishes to reason about the correctness of programs -- whether they meet their specifications -- one should probably draw on the ready supply of techniques, notations, and concepts of mathematical logic to do this reasoning. Having abstract specifications, for both the component being implemented and the components its implementation uses, seems to support this approach directly. Reasoning about the relationships among concrete implementations of components would seem, by comparison, to be more likely to cause problems when issues of soundness, relative completeness, etc., are considered. Again, I'm not claiming that it would be impossible to do it that way, just that the abstract approach seems more promising from here because it "reuses" what mathematicians have already done. -Bruce
johnson@p.cs.uiuc.edu (10/27/89)
I guess I don't really believe that "if certain technical problems could be solved, there might be some hope for substituting formal specifications for source code as descriptions of component behavior." That might be true, but it remains to be seen. It is pretty hard to *prove* my opinion. Even if we haven't started using formal specifications a hundred years from now, the solution might lie right around the corner. So far, we have been debating my claim that there is not that big a difference between specifications and programs. My claim is that our programming languages will improve to the point where they let us write programs that are just like the specifications. For example, the specification f(x)=y, where y is in D and g(y)=x can be written in Smalltalk as D detect: [:y | y g = x]. I know the notation is a little wierd, but that's not the point. The point is that it is POSSIBLE to write programs on the same level as the specifications. If D is infinite then the program might not terminate, but I write infinite objects all the time (i.e. the stream of all prime numbers). Naturally, I tend to reimplement these inefficient programs as time goes on, though it would be possible to argue that what I am really doing is starting with specifications and incrementally implementing them. In fact, that's how I like to think about it. The second point that I think is important is that most of the code in a large system is not algorithmic, and is not amenable to this kind of specification. The specification of user interfaces, for example, tends to look an aweful lot like the implemenation unless you go to a lot of work to invent higher level concepts. For example, if you worry about exactly how the screen will look then the specification is just a slow implementation. On the other hand, if you introduce the idea of "choices" and say that a particular "choice" can be made by a "menu" then you can reason about the problem much more abstractly. However, modern user interface management systems are moving in this direction---they are making these specifications executable. The third point is specific to object-oriented programming, which I think is the best way to achieve code reuse. One of the several important ways to reuse software (some people seem to think this is the only one o-o languages provide, which is wrong) is to inherit a class and to override some of the procedures from it. This ability to override procedures is crucial---many classes are designed in such a way that they cannot be reused until procedures are overridden. It is possible to provide formal specifications of the procedures that you expect will be overridden, but it turns out that it is very hard to predict exactly what will be overridden. A class is most reusable when it is possible to override anything. This means that every procedure will be formally specified. Since the median procedure size in Smalltalk is 3 lines, this means that few procedures will have formal specifications that are anything other than their implementation. Thus, providing formal specifications is really the same thing as providing the implementation. I would like to advance a fourth argument, which is one that I have not used before. That argument is that almost all the design effort goes into making the formal specification. That is one of the reason that formal specifications are so important. They let you debug a program at a high level, where it is easy to think about the problem. Of course, I believe in programming at a high level, but lots of other people seem to be content with C, and it certainly is a good idea for them to design their system at a higher level and then translate it into the lower level. The effort in designing an object-oriented system is defining the classes and their interfaces. Once you give this away to people, it is easy for them to recreate the code. Thus, programs with formal specifications are almost as easy to recreate as programs with source provided are to understand. This argues that formal specifications are not going to be solutions to the problem of protecting the ownership rights of the authors of the components. Research in formal verification has had many influences on CS. One of the most important is that it has taught us what is difficult. For example, it showed that goto statements were hard to think about. It also shows that pointers are hard to think about. This is why lots of modern languages, especially the symbolic ones, don't have explicit pointers. However, the effect is for programming language designers to borrow good ideas from the specification people. I think that the future will lie in that direction. Ralph Johnson -- University of Illinois at Urbana-Champaign
steve@cs.qmc.ac.uk (Steve Cook) (10/30/89)
In article <130200018@p.cs.uiuc.edu> johnson@p.cs.uiuc.edu writes: > >Research in formal verification has had many influences on CS. One of >the most important is that it has taught us what is difficult. For >example, it showed that goto statements were hard to think about. >It also shows that pointers are hard to think about. This is why >lots of modern languages, especially the symbolic ones, don't have >explicit pointers. However, the effect is for programming language >designers to borrow good ideas from the specification people. I >think that the future will lie in that direction. I have been interested and rather surprised to read the articles by Ralph Johnson which seem to be saying there is no place for formal specification as distinct from programming in today's software technology. My own experience with Smalltalk-80 is that much of the time I need to ask the question "what assumptions, exactly, does this method make about the current state of affairs?". In many cases this question is hard to answer. Nevertheless, if for example I wish to alter a method, in the hope that my modification will be a proper generalization and thus not break any current clients of the method, I must get the answer exactly right. Therefore I would greatly value some formal assertions which would enable me to know the assumptions exactly. A system of verifying the assumptions statically would be very helpful too. A first step towards this would be a type-checking scheme, and of course Ralph himself has done a great deal of work in this area. An obvious further step would be a more powerful system of static assertions, such as the preconditions, postconditions and invariants provided in the Eiffel system. A step from there would be automated support for verification of these assumptions, and a range of more expressive constructs. The notion of software contracting surely requires this kind of support to make sense. An effective contract says exactly what is agreed without unnecessary verbiage: just like a formal specification. Surely Ralph Johnson is not saying that the contract between the supplier and user of a software component must be based on the source code? Once the need for a language, or sub-language, of assertions has been acknowledged, the possibility of reusability in this domain itself emerges. In summary I claim that formal declarations of program properties (i.e. specifications) do play a very important role, as statements of contract between suppliers and consumers of software components. Smalltalk itself in my experience is not well-suited to this task. -- Mr Steve Cook JANET: steve@cs.qmc.ac.uk Research Fellow ARPA: steve%cs.qmc.ac.uk@nsfnet-relay.ac.uk Dept of Computer Science Queen Mary and Westfield College, London E1 4NS. 01 975 5236