orville@weyrich.UUCP (Orville R. Weyrich) (05/25/91)
Please e-mail responses to orville%weyrich@uunet.uu.net or else post to comp.software-eng (note the followup line). > My questions: > > I understand that there are several program restructuring tools > currently available which help turn old GO TO laden COBOL and FORTRAN > code into more structured code. > > What products are available now, and what (if any) research is > presently being done in this field? > > I have heard rumors to the effect that the present tools are less > than satisfactory. I would be interested in hearing of the experiences > (good and bad) of anyone who has used these tools. Bad experiences > are particularly useful, since they help clarify where improvements > need to be made. > > References to any articles on the subject would also be appreciated. > > What companies are currently using/developing program restructuring > tools that I might offer my services to as either a consultant or an > employee? Does anyone know of names of people withing these > organizations that I could contact? > > >"Reverse Engineering" can mean a lot of things to different people, including >the following: > 1) Support for extracting "business rules" from an existing program. > This becomes necessary when dealing with dusty-deck code > when the people in an organization that were responsible for > creating the program have disappeared, along with those > that knew how the business was run prior to the installation > of the program. Here, we are talking about "knowledge > acquisition" as is done in building expert systems, > but in this case, there are no human experts left, only a > bunch of undocumented source code (or worse!). > 2) Support for "Program Archeology" or "Design Recovery" on an > existing program. This is done to recover the program design > documents which have been lost (or never made). > This becomes necessary in order to maintain the code. > 3) Producing a "work-alike" product by working backwards from the > external behavior to figure out how to code a new application > which is a clone, but which does not use the original code. > An example would be the "NewWord" product which cloned the > "WordStar" word processor. This technique is under legal attack, > and is not what I have in mind. > >I am aware of rumors that companies such as ViaSoft and Transform Logic >Corporation are preparing products to support Reverse Engineering, but >would like any additional information which the net knows of. In particular, >I know that Transform Logic Corporation demonstrated at their "Transform User's >Group" last year a product under development which converts dirty old COBOL >decks into a hypertext format for browsing by the programmer. > >I suspect that many consulting/contract job shops have tool suites that they >use in-house. > >My questions: > > Does anyone know of companies interested in this kind of stuff that > I might offer my services to (as either a consultant or an employee?) > > What products are available now, and what (if any) research is > presently being done in this field? > > I have heard rumors to the effect that the present tools are less > than satisfactory. I would be interested in hearing of the experiences > (good and bad) of anyone who has used these tools. Bad experiences > are particularly useful, since they help clarify where improvements > need to be made. > > References to any articles on the subject would also be appreciated. > I am aware of the issue of IEEE Software on Reverse Engineering > (January 1990). > >Thanks, > >Orville. -------------------------------------- ****************************** Orville R. Weyrich, Jr., Ph.D. Certified Systems Professional Internet: orville%weyrich@uunet.uu.net Weyrich Computer Consulting Voice: (602) 391-0821 POB 5782, Scottsdale, AZ 85261 Fax: (602) 391-0023 (Yes! I'm available) -------------------------------------- ****************************** Thanks to all who replied to my postings regarding program restructuring and reverse engineering. I would be happy to get any additional comments and/or feedback on this information. Note that I also posted a question regarding program translation, the responses to which I will summarize later [when responses stop coming in]. ------------------------------------------------------------------------------- I SUMMARIZE BELOW COMMENTS BY: straub@cs.UMD.EDU (Pablo A. Straub) johnson@natasha.ics.hawaii.edu (Philip Johnson) lamson@el1.crd.ge.com (Scott Lamson) coxs@dg-rtp.dg.com (Stan Cox) hen@bucsf.bu.edu (Wm. H. Henneman) colbrook@concerto.lcs.mit.edu (Dr Adrian Colbrook) Keith.Bierman@Eng.Sun.com (Keith Bierman fpgroup) cassel@sce.carleton.ca (Ron Casselman) jls@netcom.com (Jim Showalter) hartman@cs.utexas.edu (John Hartman) Kevin.Lano@prg.oxford.ac.uk (Kevin Lano) oneel@heawk1.rosserv.gsfc.nasa.gov (Bruce Oneel) render@zeppo.Colorado.EDU (Hal Render) franklin@enuxha.eas.asu.edu (Malcolm L. Franklin) metcalf@crnvma.cern.ch (Mike Metcalf) dalamb@umiacs.UMD.EDU (David Lamb) dent@DIALix.oz.au (Andrew Dent) agnew@ast.saic.com (Robert Allen Agnew) paulg@tplrd.tpl.oz.au (Paul Gittings) nol2105@dsacg2.dsac.dla.mil (Robert E. Zabloudil) iwen2@c3po.d.umn.edu (ichang wen) alan@mel.dms.csiro.au (Alan Miller) map@cadillac.siemens.com (Michael Platoff) ------------------------------------------------------------------------------- MY CONCLUSIONS: I found no comments on successful reverse engineering products beyond simple cross-reference generators and flow-chart generators. Restructuring is widely done; the problems reported have to do with obfuscation of the code when a "forbidden" control structure (such as an exit in the middle of a loop) is translated into a much less obvious WHILE loop with less obvious nested if statements in the body. Other problems include ignoring the program semantics and data dimension of the program. Work continues at a number of research and commercial institutions in the area of restructuring and reverse engineering. ------------------------------------------------------------------------------- PUBLICATIONS TO LOOK AT: "From Code to Z Specifications" P. Breuer, K. Lano, in the 1989 Z User Workshop Proceedings: Springer-Verlag Workshops in Computer Science. "Reverse-Engineering and Validating COBOL" by K. Lano. Ward M., Calliss F.W., Munro M. "The Maintainer's Assistant", Proceedings of Conference on Software Maintenance 1989, Miami, Florida, IEEE Computer Society Press, Oct 1989. pp307-315. The Proceedings of the Conference on Software Maintenance (1986-1990) published by IEEE have various articles on reverse engineering and related topics. Look at a few articles in Girish Parikh's Tutorial on Software Maintenance (IEEE publication). Keep you eye open for the "Proceedings of the 13th International Conference on Software Engineering" that was held in Austin earlier this month [May 1991]. Names to search for in the literature are: Harry Sneed Mitchell D. Lubars Balasubramaniam Ramesh Ted J. Biggerstaff. Ted J. Biggerstaff, "Design Recovery for Maintenance and Reuse," IEEE Computer, Vol. 22, No. 7, July 1989, pp 36-50. "Extracting and Restructuring the Design of Large System " IEEE S.E. Jan 90 by Song C. Choi and Walt Scacchi in USC. "An Ada Restructuring Assistant", P. Johnson, D. Hildum, A. Kaplan, C. Kay, J. Wileden. In the Proceedings of the Fourth Annual Conference on Artificial Intelligence and Ada, Fairfax, VA. 1988. The Programmer's Apprentice work done at MIT. Look through SIGPLAN and Communications of the ACM mid-seventies. A big name in the was Amir Pnueli (sp?) from the University of Texas at Austin. John Hartman. Understanding Natural Programs Using Proper Decomposition. 13th International Conference On Software Engineering - ICSE-13, May 13-16, 1991. John Hartman. AUTOMATIC CONTROL UNDERSTANDING FOR NATURAL PROGRAMS. PhD thesis, The University of Texas at Austin, Dept. of Computer Sciences. May 1991. John Hartman. Pragmatic Program Understanding For Automated Re-engineering. In International Computer Software and Applications Conference - COMPSAC '90. IEEE Computer Society Press, October 31-November 2, 1990. Position paper for panel: ``Industrial Experience in Automating Software Re-engineering''. John Hartman. Automatic Program Understanding and the Maintenance Interface. In Second Annual User-System Interface Conference, Austin, Texas, February 20-21, 1987. John Hartman. Automatic Program Understanding For Cobol Restructuring. In Computer Science Research Review, Dept. of Computer Sciences, November 1986. The University of Texas at Austin. John Hartman. Restructuring Cobol Programs Into Abstract Data Type Modules. Technical Report SDBEG-21, Dept. of Computer Sciences, The University of Texas at Austin, 1980. Implementation is described in ``XTC User's Guide'', SDBEG-18. ------------------------------------------------------------------------------- PRODUCTS AVAILABLE: SPAG program marketed by Polyhedron Software. Try their toll-free fax no.: 1-800-777-5519. It is my understanding that OTG takes care of normal orders: OTG voice 717 222 9100 fax 717 229 9103 Alternatively, one may wish to go to the source Polyhedron Software Ltd Magdalen House 98 Abingdon Road Standlake Witney Oxon OX8 7RN Tel 0865 300 579 [See review below]. PRISM OTG Systems [same as above?] Mentor Graphics has a CASE tool called CASEStation. One of the support tools for this product will take "C" source and produce a structure chart. Support for CASEStation has recently moved to the UK. CADRE Technology, the makers of TEAMWORK, also have a similar product called C-REV. This again takes "C" source and produces a structure chart. CADRE unlike Mentor also have products to aid in the development of Cobol programs I don't know if they have any reverse engineering products for Cobol. RECODER tool from Language Technology, Inc. [See review below]. Toolpack has tools for restructuring code. [See review below]. FORGE Pacific Sierra Research analysis/restructurer/vectorizer/parallelizer FAUST from U of Illinois/U of Indiana dennis gannon Commercial tool is available from Bachman. Commercial tool is available from XA Systems. FORWARN/etc Quibus Enterprises Fortran Development Tools Quibus. Pretty sort of like SPAG. Seems less powerful. Outline: indentation creator with bonus graphic symbols to highlight logic. PREP : another conditional compilation, includization tool. Not exciting as such, but prices start at $500, so it is at least cheap. FOR_STRUCT from Cobalt-Blue. Like SPAG. Restructures Fortran particularly old Fortran (Fortran 66/ Fortran IV) into Fortran 77. The same company produces FOR_C which translates Fortran -> C. The company really pushes automated fortran->c|c++ translation which [some folks think] is a very, very bad idea. May be a great implementation though. 408 723 0474. ------------------------------------------------------------------------------- OTHER COMMENTS BELOW ------------------------------------------------------------------------------- [This was posted independently rather than in reply to my posting, but is relevant] Subject: Experience with Ovum Ltd. reports? Summary: Any experiences with Reverse Engineering report by Ovum Ltd.? Message-ID: <map.674776195@honda> Date: 20 May 91 21:49:55 GMT Sender: news@siemens.siemens.com We're considering purchasing a research report from Ovum Ltd. entitled "Reverse Engineering: Markets Supplier and Products." The report is fairly expensive, so I'd like to make sure it's worth the money. I asked Ovum for references or excerpts from a section of the report, but they seem to have overlooked my request. Has anyone purchased an Ovum report? What did you think of it? In particular, has anyone read the Reverse Engineering report? Michael Platoff email: map@cadillac.siemens.com Siemens Corporate Research phone: (609) 734-3354 755 College Road East Princeton, NJ 08540-6668 ------------------------------------------------------------------------------- Although [Dent's] FORTRAN-Pascal translation currently performs minimal restructuring, there is much potential. This has come about through the extensive use of Object-oriented programming techniques and decomposition of programs into chained and nested objects. If possible, could you please append a note to your posting suggesting OOP can play an important part in writing restructuring tools, particularly if the programmer wants to stay away from languages such as LISP. It would be very interesting to see performance comparisons between a similar tool in some of the more exotic languages vs Object Pascal or C++. ------------------------------------------------------------------------------- As part of the REDO ESPRIT project, on the maintenance of systems, we are doing work on restructuring COBOL code by means of abstracting these codes to a functional language, and then performing transformations on this representation, to eliminate recursions that are not in WHILE-IF form. We are interested in applying machine-learning techniques to determine useful restructuring strategies, but not much work has yet been done by us on this. Tools that transform code to the functional language, and support the user in re-writing this code, have been developed in Prolog. Work at Durham University by Martin Ward also involves program transformation using formal methods. Their REFORM Project, to reverse-engineer assembly language code to Z, is described in his paper cited above. ------------------------------------------------------------------------------- There is a guy at MCC (Consortium in Austin TX) that is trying to apply neural network technology to reverse engineering; it's called the TAO subsystem of the DESIRE project. A lot of the stuff they publish is in MCC Proprietary publications (you gotta pay MEGA$ to get access to these). ------------------------------------------------------------------------------- I am currently working on a translation of MS-DOS FORTRAN programs to THINK Pascal on the Macintosh, for a local government department. As part of the job I've produced (well, nearly finished) a translator which will do most of the job, leaving only minor issues such as filename strings to be completed by hand. ------------------------------------------------------------------------------- We have developed these tools for Fortran and Cobol and are developing tools for Jovial. Contact John Hampton, 619-456-6017. ------------------------------------------------------------------------------- I've seen the Mentor Graphics tool called CASEStation and it worked rather well with the code we supplied, it could use some more work in the actual layout of the structured chart. ------------------------------------------------------------------------------- Our organization has the RECODER tool from Language Technology, Inc. I used it on a program that "followed" me when I transferred up from DESC in Dayton, and was fairly pleased with the results. My only "complaint", if you will, is that I had to go back in and "fix up" the result. I have a certain way I like my programs to look, and RECODER didn't do it the way I usually do. The biggest tangible difference was the complete lack of GO TOs; we are allowed to GO TO an EXIT, which helps hold down your program size and complexity. ------------------------------------------------------------------------------- I am working on my Master's project concerning updating a graphics package. Part of the project lies in restructuring. There are several folds of restructuring. Deleting GO_TO's is certainly one of the main streams in business. My preference is in modules and variables. The program I am working on is a one_shot program. i.e. all codes in one file and no separate compilation. For module restructuring I classify the routines then put them in several files which I think each one has it's own major data structure and consistent functionality. About the variables they were declared excessively globally. I try to minimize their scopes and confine the global declaration. ------------------------------------------------------------------------------ You did not mention a very important use of program restructuring: discovering parallelism in programs (yeah, they usually talk about the old dusty deck of FORTRAN code, but they also mean new programs). I have not a particular reference to give you on this, but any book, conference or journal on parallel programming will have plenty of articles. On the topic of program restructuring for parallelization there is mixed success: On one hand experienced programmers can write code (in a special version of FORTRAN) that will parallelize very well, and on the other hand if you write code unaware of the parallelizer (is that a word?) you achieve poor performance. ------------------------------------------------------------------------------ I helped develop a system called ARA for module-level restructuring in Ada using AI techniques. ARA is implemented in Common Lisp on a TI Explorer using GBB, a blackboard architecture development environment. Our research was concerned with supporting evolution in the composition of Ada packages over time, rather than removing GOTOs, but the techniques might be interesting to you. ------------------------------------------------------------------------------ Toolpack has tools for restructuring code. Good points: toolpack is not just an isolated tool, but has modular tools for doing many things: restructuring, redoing declarations, rolling/unrolling loops, checking for portability. Bad points: pretty strict F-77, with few extensions supported. If you cannot get toolpack to "compile" your code, none of the other tools will be usable. ------------------------------------------------------------------------------ [Indirect reference to] Eric Bush at Language Technology. Their main product was a Cobol reformatter, and I think it initially was written in PL/I and ran on an IBM mainframe. The last time I heard, they were working on a prolog version, and the company was doing pretty well. The The company is located in Salem, MA. ------------------------------------------------------------------------------ I wrote a program to translate 4.5 Megalines of a very strange FORTRAN dialect into idiomatic C. By idiomatic, I mean that the resulting code should look like it had been written by fluent C programmers, not translated FORTRAN. I used an ATN as the basic translating engine, and still think it was a reasonable choice. ------------------------------------------------------------------------------ As far as AI based systems goes you might want to look at the Programmer's Apprentice work done at MIT. Those papers mention the reversal of their techniques to perform reverse engineering. I did some work in this area when I was at Surrey University and most of the results were published at the IEEE conference. Simply source code restructuring is algorithmic and probably does not require the application of AI techniques. The real win for AI would be in reverse engineering. ------------------------------------------------------------------------------ My experiences with such tools are usually quite good. Here is a review I did of one. Subject: SPAG: a Fortran restructuring tool (preliminary notes) <long> Executive Summary ====================== If neither you nor your customers ever use Fortran, skip this review. If all your Fortran code is brand new, this product is of some use, but not wonderful. If much of you (and your customers) Fortran code dates back to FORTRAN II ... you (they) probably need this product or one like it. Has some rough edges, but worth serious consideration. I've just obtained a demo copy of Polyhedron's SPAG tool (and friends). I've run a few 10K's lines of code through it and have formed some initial impressions. A full review will come later. It has two major components: ==== CRAM Provides a report of who calls who. Checks arg lists for consistent number of arguments. Provides a potential overlay map (really handy on Virtual Memory systems :>). ===== SPAG SPAG provides several functions which: 0) Can make your life easier as you try to port huge hunks of code for benchmarking. Can also make optimizations easier to spot ... also handy for you benchmarkers out there. 1) Simple "pretty printing" mode, can be used to help encourage (enforce) coding style guidelines. Especially helpful when you start out with a Million Lines of raw sewage. 2) Can be configured to replace all DO 1 J=1, N ... 1 x=10 with DO j=1, n .... x=10 ; END DO Which, as it happens can help to expose good candidates for loop fusion, viz. DO 4 I=1,20 4 DPDTT (I) =INI DO 5 I=1,20 5 TDPDT (I) =INI DO 6 I=1,20 6 DBSOR (I) =INI DO 7 I=1,20 7 TSOR (I) =INI becomes DO i = 1 , 20 dpdtt(i) = ini ENDDO DO i = 1 , 20 tdpdt(i) = ini ENDDO DO i = 1 , 20 dbsor(i) = ini ENDDO which is trivally changed to DO i = 1 , 20 dpdtt(i) = ini tdpdt(i) = ini dbsor(i) = ini ENDDO Which executes faster. When we have run out of bigger automatic optimization wins, and other scheduled work, the compiler will probably do this for you. But for now, SPAG can help you spot this sort of thing. Especially probable is that the original code had assigned goto's and such ... and only after SPAG is this sort of thing obvious. IF(Z6.GT.ZMAX)GOTO 50 IF(Z6.LT.ZMIN)GOTO 45 I6FT=I J6FT=J 45 IF(Z8.GT.ZMAX)GOTO 50 I8FT=I J8FT=J GOTO 55 50 CONTINUE 55 CONTINUE becomes IF ( z6.LE.zmax ) THEN IF ( z6.GE.zmin ) THEN i6ft = i j6ft = j ENDIF IF ( z8.LE.zmax ) THEN i8ft = i j8ft = j GOTO 100 .... etc. Don't get unreasonable expectations, the revised code is likely to run a few %faster, but not a lot faster ... and can be modestly slower too. The point is that after SPAGization (pity they didn't call it DESPAG :>) transformations are likely to be easier. 3) About 50 user configurable transformations. I suspect that most users will pick a transformation and stick to it. As I said in the opening, I've run a few tens of thousands of lines through SPAG and it works, and aside from the MVS/PC style of interaction it does quite well. The code produced worked correctly (there were some warnings which the dedicated would have fixed up) and produced identical answers (although FP arithmetic being what it is, this could vary a bit ... just as optimization levels can impact 'em). With the nattering redirected to a file, on a 4/260 operating on NFS mounted files (so it's very like a 4/60), the conversion rate was a respectable 20Klines/min. ------------------------------------------------------------------------------ There have been such tools around for many years, typically site specific, and maintained by the local gurus (remember that most of the C tools we take for granted, cxref, flow, etc. are based on Fortran tools which predate them ... but never made it to Unix). Alas, much of the good culture got left behind when we (as an industry) unplugged the mainframes and shot all the support staff. This is the first commercial tool which I've seen running on Sun boxes. ------------------------------------------------------------------------------- I looked at this sort of thing several years ago when contracted to create a "smart" FORTRAN to Ada translator (it was never built). Anyway, we were able to track down many papers on program restructuring. This was a really hot topic in the 70s. When it proved to not solve the bigger problems "software engineering" came into the picture. I remember contacting the Army about tools for restructuring (ROME Defense Establishment I think) and most were currently not in use. ------------------------------------------------------------------------------- My feeling about program restructuring tools: GIGO. ------------------------------------------------------------------------------- Summary of Doctoral Research John Hartman Artificial Intelligence Laboratory* Department of Computer Sciences The University of Texas at Austin Austin, Texas 78712-1188 hartman@cs.utexas.edu (512) 471-9586 Program understanding involves recognizing abstract concepts like ``read-process loop'' in existing programs. Programmers spend much of their time understanding programs, so studying and automating the process has many benefits. For example, program understanding plays a major role in software maintenance, and progress in program understanding can make an important contribution to maintenance tools and methods. Programming plans are units of programming knowledge connecting abstract concepts and their implementations. Existing research assumes that plan instances can be recognized to recover the programmer's abstract concepts and intentions. This approach has not been confirmed empirically. Because of input restrictions and performance limitations, current program understanders have only been demonstrated with small example programs. A program understander that runs on a significant sample of real-world programs is needed to investigate the feasibility, limits, and applications of plan instance recognition. This dissertation describes a general and useful program understanding method and its implementation, evaluation and applications. Control concepts are abstract notions about interactions between control flow, data flow, and computation, eg. ``do loop'', ``read-process loop'', and ``bounded linear search''. We present a practical method for bottom-up control concept recognition in large, unstructured imperative programs . The method is implemented in the UNPROG program understander and tested with Cobol and Lisp source programs. UNPROG's value is demonstrated in Cobol restructuring and the empirical study of programs. Contributions include: A recognition method that is robust, efficient and scalable Implementation, testing, and evaluation Improved Cobol restructuring using recognized concepts Empirical study of programs at the conceptual level. Recognition Method Plan instances and concepts are recognized by comparing a program representation against a library of standard plans. The program representation is an abstract program model given by a hierarchical control flow/data flow graph. The model is decomposed into a tree of sub-models using propers (single entry/exit control flow sub-graphs). Plans are represented by similar graphs with added qualifications. Recognition is based on simple matching between sub-models and plans. It produces model-plan-concept bindings. Implementation, Testing and Evaluation The method was implemented in the UNPROG program understander and tested using Lisp and Cobol analyzers and a plan library. The method's design and analysis were confirmed in tests where efficient control and data flow checking procedures produced low recognition costs. The method was evaluated for performance and generality over languages, programs, and concepts. Comparing sub-models and plans is efficient because sub-models are small and have restricted, canonical control flow. Their abstraction structure focuses recognition on criterial program features and hides non-plan material. The model can be formed for all language constructs which permit static determination of control and data flow. The number of sub-models and comparisons increases linearly with program size. These performance and scaling advantages decrease as plans and concepts correspond less well to the sub-models produced by proper decomposition. Cobol Restructuring To show how our method can raise the level of reverse engineering and re-engineering tools, we demonstrated its application to automatic Cobol restructuring. The RESTRUC restructurer uses the model-plan-concept bindings found by UNPROG. Knowledge associated with plans and concepts interprets the cause of unstructured program parts and permits more specific and insightful transformation, code generation, and documentation than is possible with existing syntactic methods. Empirical Study of Programs Our method can also be used for empirical study of programs at the conceptual level. We described a methodology using UNPROG to improve recognizer performance, acquire plans, catalog natural plans and concepts, and characterize program populations with respect to aspects like recognizable and unrecognizable concepts, plan use, and domain features. For example, the distribution of plan recognition rates is a measure of the population's planfulness and can be used to test the hypothesis that programs are planful. Future Work and Implications Future work includes empirical studies using UNPROG, method extensions, and expansion to other program understanding domains. Larger studies with UNPROG will characterize program populations and method performance, and aid plan library acquisition and development. Method extensions include changes in the program representation, plan representation, and the recognition method. For example, abstract branches and other decomposition principles will be investigated. Expansion beyond bottom-up control plan instance recognition includes non-control concepts, hierarchical recognition, use of additional information, and advanced program understander architectures. Besides restructuring, our method can be applied to reverse engineering and re-engineering tools for applications like analysis, documentation, and translation. Recognizing and cataloging control concepts and plans has implications for future maintenance and development systems that represent and manipulate programs using programming concepts, including programming assistants, environments, and interfaces. Empirical studies using program understanders can confirm and bound program understanding theories. Conceptual data and program understanding models have many uses in software psychology. For example, they can identify natural concepts and programming knowledge, and describe their role in program comprehension and programming. Implications extend to complexity measures, language design, structured programming, and education. ---- * This work has taken place in the Qualitative Reasoning Group at the Artificial Intelligence Laboratory, The University of Texas at Austin. Research of the Qualitative Reasoning Group is supported in part by NSF grants IRI-8602665, IRI-8905494, and IRI-8904454, by NASA grants NAG 2-507 and NAG 9-200, and by the Texas Advanced Research Program under grant no. 003658175.