Orville.R..Weyrich@sunbrk.FidoNet.Org (Orville R. Weyrich) (05/25/91)
[The tool was] RECODER from Language Technology, Inc. I used it on a program that "followed" me when I transferred up from DESC in Dayton, and I was fairly pleased with the results. My only "complaint", if you will, is that I had to go back in and "fix up" the result. I have a certain way I like my programs to look, and RECODER didn't do it the way I usually do. The biggest tangible difference was the complete lack of GO TOs; we are allowed to GO TO an EXIT, which helps hold down program size and complexity.

------------------------------------------------------------------------------

I am working on my Master's project, which concerns updating a graphics package. Part of the project lies in restructuring, and restructuring takes several forms. Removing GO TOs is certainly the mainstream concern in business code; my own interest is in modules and variables. The program I am working on is a one-shot program, i.e. all the code is in one file with no separate compilation. For module restructuring I classify the routines and then divide them among several files, each of which, I think, has its own major data structure and a consistent functionality. As for the variables, they were declared globally to excess; I try to minimize their scopes and confine the global declarations. (A sketch of this kind of scope narrowing appears at the end of this segment.)

------------------------------------------------------------------------------

You did not mention a very important use of program restructuring: discovering parallelism in programs (yes, people usually talk about the old dusty deck of FORTRAN code, but they also mean new programs). I don't have a particular reference to give you on this, but any book, conference, or journal on parallel programming will have plenty of articles. On the topic of program restructuring for parallelization there is mixed success: on one hand, experienced programmers can write code (in a special version of FORTRAN) that will parallelize very well; on the other hand, if you write code unaware of the parallelizer (is that a word?), you achieve poor performance. (The second sketch at the end of this segment shows the kind of dependence that defeats a parallelizer.)

------------------------------------------------------------------------------

I helped develop a system called ARA for module-level restructuring in Ada using AI techniques. ARA is implemented in Common Lisp on a TI Explorer using GBB, a blackboard-architecture development environment. Our research was concerned with supporting evolution in the composition of Ada packages over time, rather than with removing GOTOs, but the techniques might be interesting to you.

------------------------------------------------------------------------------

Toolpack has tools for restructuring code. Good points: Toolpack is not just an isolated tool, but has modular tools for doing many things: restructuring, redoing declarations, rolling/unrolling loops, checking for portability. Bad points: it accepts pretty strict FORTRAN 77, with few extensions supported. If you cannot get Toolpack to "compile" your code, none of the other tools will be usable.

------------------------------------------------------------------------------

[Indirect reference to] Eric Bush at Language Technology. Their main product was a COBOL reformatter; I think it was initially written in PL/I and ran on an IBM mainframe. The last time I heard, they were working on a Prolog version, and the company was doing pretty well. The company is located in Salem, MA.
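
[A sketch for the Master's-project note above: a minimal, hypothetical Fortran example of scope narrowing. The routine and variable names are invented, not taken from the poster's package. The idea is to pull a variable out of a globally visible COMMON block and declare it locally in the one routine that actually uses it.]

      SUBROUTINE SCALEV(X, N, FACTOR)
C     TEMP formerly sat in a COMMON block visible to every routine,
C     e.g.  COMMON /SCRATCH/ TEMP.  Declared locally instead, no
C     other routine can read or clobber it.
      INTEGER N, I
      REAL X(N), FACTOR, TEMP
      DO 10 I = 1, N
         TEMP = X(I) * FACTOR
         X(I) = TEMP
   10 CONTINUE
      END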
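
[And a sketch for the parallelism note above, also hypothetical: why parallelizer-unaware code performs poorly. The first loop's iterations are independent and can run concurrently; the second loop carries a dependence from each iteration to the next, so a parallelizer must run it serially.]

      SUBROUTINE PARDEMO(A, B, N)
      INTEGER N, I
      REAL A(N), B(N)
C     Independent iterations: each A(I) depends only on B(I), so
C     the iterations may execute in parallel.
      DO 10 I = 1, N
         A(I) = B(I) * 2.0
   10 CONTINUE
C     Loop-carried dependence: A(I) needs the A(I-1) computed by
C     the previous iteration, so these iterations must run in order.
      DO 20 I = 2, N
         A(I) = A(I-1) + B(I)
   20 CONTINUE
      END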
------------------------------------------------------------------------------

I wrote a program to translate 4.5 megalines of a very strange FORTRAN dialect into idiomatic C. By idiomatic, I mean that the resulting code should look as though it had been written by fluent C programmers, not translated from FORTRAN. I used an ATN (augmented transition network) as the basic translating engine, and I still think it was a reasonable choice.

------------------------------------------------------------------------------

As far as AI-based systems go, you might want to look at the Programmer's Apprentice work done at MIT. Those papers mention reversing their techniques to perform reverse engineering. I did some work in this area when I was at Surrey University, and most of the results were published at the IEEE conference. Plain source-code restructuring is algorithmic and probably does not require the application of AI techniques; the real win for AI would be in reverse engineering.

------------------------------------------------------------------------------

My experiences with such tools are usually quite good. Here is a review I did of one.

Executive Summary
======================

If neither you nor your customers ever use Fortran, skip this review. If all your Fortran code is brand new, this product is of some use, but not wonderful. If much of your (and your customers') Fortran code dates back to FORTRAN II ... you (they) probably need this product or one like it. Has some rough edges, but worth serious consideration.

I've just obtained a demo copy of Polyhedron's SPAG tool (and friends). I've run a few tens of thousands of lines of code through it and have formed some initial impressions. A full review will come later. It has two major components:

==== CRAM

Provides a report of who calls whom. Checks argument lists for a consistent number of arguments. Provides a potential overlay map (really handy on virtual-memory systems :>).

===== SPAG

SPAG provides several functions which:

0) Can make your life easier as you try to port huge hunks of code for benchmarking. Can also make optimizations easier to spot ... also handy for you benchmarkers out there.

1) Simple "pretty printing" mode; can be used to help encourage (enforce) coding-style guidelines. Especially helpful when you start out with a million lines of raw sewage.

2) Can be configured to replace all

      DO 1 J=1, N
      ...
    1 x=10

with

      DO j=1, n
      ...
      x=10
      END DO

which, as it happens, can help to expose good candidates for loop fusion, viz.

      DO 4 I=1,20
    4 DPDTT (I) =INI
      DO 5 I=1,20
    5 TDPDT (I) =INI
      DO 6 I=1,20
    6 DBSOR (I) =INI
      DO 7 I=1,20
    7 TSOR (I) =INI

becomes

      DO i = 1 , 20
         dpdtt(i) = ini
      ENDDO
      DO i = 1 , 20
         tdpdt(i) = ini
      ENDDO
      DO i = 1 , 20
         dbsor(i) = ini
      ENDDO
      DO i = 1 , 20
         tsor(i) = ini
      ENDDO

which is trivially changed to

      DO i = 1 , 20
         dpdtt(i) = ini
         tdpdt(i) = ini
         dbsor(i) = ini
         tsor(i) = ini
      ENDDO

which executes faster: one pass over the index range and one loop's worth of overhead instead of four. When we have run out of bigger automatic-optimization wins, and other scheduled work, the compiler will probably do this for you. But for now, SPAG can help you spot this sort of thing. Quite probably the original code had assigned GOTOs and such ... and only after SPAG is this sort of thing obvious.

      IF(Z6.GT.ZMAX)GOTO 50
      IF(Z6.LT.ZMIN)GOTO 45
      I6FT=I
      J6FT=J
   45 IF(Z8.GT.ZMAX)GOTO 50
      I8FT=I
      J8FT=J
      GOTO 55
   50 CONTINUE
   55 CONTINUE

becomes

      IF ( z6.LE.zmax ) THEN
         IF ( z6.GE.zmin ) THEN
            i6ft = i
            j6ft = j
         ENDIF
         IF ( z8.LE.zmax ) THEN
            i8ft = i
            j8ft = j
            GOTO 100
      .... etc.

Don't get unreasonable expectations: the revised code is likely to run a few percent faster, but not a lot faster ... and it can be modestly slower, too. The point is that after SPAGization (pity they didn't call it DESPAG :>) transformations are likely to be easier.

3) About 50 user-configurable transformations.
I suspect that most users will pick one set of transformations and stick to it. As I said in the opening, I've run a few tens of thousands of lines through SPAG and it works; aside from the MVS/PC style of interaction, it does quite well. The code produced worked correctly (there were some warnings which the dedicated would have fixed up) and produced identical answers (although, floating-point arithmetic being what it is, this could vary a bit ... just as optimization levels can affect them). With the nattering redirected to a file, on a 4/260 operating on NFS-mounted files (so it's very like a 4/60), the conversion rate was a respectable 20K lines/min.

------------------------------------------------------------------------------

There have been such tools around for many years, typically site-specific and maintained by the local gurus (remember that most of the C tools we take for granted, cxref, flow, etc., are based on Fortran tools which predate them ... but never made it to Unix). Alas, much of that good culture got left behind when we (as an industry) unplugged the mainframes and shot all the support staff. This is the first commercial tool of the kind I've seen running on Sun boxes.

------------------------------------------------------------------------------

I looked at this sort of thing several years ago when I was contracted to create a "smart" FORTRAN-to-Ada translator (it was never built). Anyway, we were able to track down many papers on program restructuring. This was a really hot topic in the 70s; when it proved not to solve the bigger problems, "software engineering" came into the picture. I remember contacting the Army about tools for restructuring (ROME Defense Establishment, I think), and most were no longer in use.

------------------------------------------------------------------------------

My feeling about program restructuring tools: GIGO.

------------------------------------------------------------------------------

Summary of Doctoral Research

John Hartman
Artificial Intelligence Laboratory*
Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712-1188
hartman@cs.utexas.edu
(512) 471-9586

Program understanding involves recognizing abstract concepts like ``read-process loop'' in existing programs. Programmers spend much of their time understanding programs, so studying and automating the process has many benefits. For example, program understanding plays a major role in software maintenance, and progress in program understanding can make an important contribution to maintenance tools and methods.

Programming plans are units of programming knowledge connecting abstract concepts and their implementations. Existing research assumes that plan instances can be recognized in order to recover the programmer's abstract concepts and intentions. This approach has not been confirmed empirically. Because of input restrictions and performance limitations, current program understanders have been demonstrated only on small example programs. A program understander that runs on a significant sample of real-world programs is needed to investigate the feasibility, limits, and applications of plan-instance recognition.

This dissertation describes a general and useful program-understanding method and its implementation, evaluation, and applications. Control concepts are abstract notions about interactions between control flow, data flow, and computation, e.g. ``do loop'', ``read-process loop'', and ``bounded linear search''.
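
[To make the notion concrete, a hedged illustration that is not from the dissertation: two of the control concepts named above, written out in Fortran with invented names. A plan-instance recognizer would match code like this against library plans for ``read-process loop'' and ``bounded linear search''.]

      PROGRAM PLANS
      INTEGER V, TARGET, FOUND, I, N
      INTEGER BUF(100)
C     Read-process loop: read values until end of input (or until
C     the buffer fills), processing each one as it arrives.
      N = 0
   10 IF (N .GE. 100) GOTO 20
      READ (5, *, END=20) V
      N = N + 1
      BUF(N) = V
      GOTO 10
   20 CONTINUE
C     Bounded linear search: scan at most N elements for TARGET,
C     stopping at the first match.
      TARGET = 42
      FOUND = 0
      DO 30 I = 1, N
         IF (BUF(I) .EQ. TARGET) THEN
            FOUND = I
            GOTO 40
         ENDIF
   30 CONTINUE
   40 CONTINUE
      IF (FOUND .NE. 0) WRITE (6, *) 'FOUND AT INDEX ', FOUND
      END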
We present a practical method for bottom-up control-concept recognition in large, unstructured imperative programs. The method is implemented in the UNPROG program understander and tested with Cobol and Lisp source programs. UNPROG's value is demonstrated in Cobol restructuring and in the empirical study of programs. Contributions include:

- A recognition method that is robust, efficient, and scalable
- Implementation, testing, and evaluation
- Improved Cobol restructuring using recognized concepts
- Empirical study of programs at the conceptual level

Recognition Method

Plan instances and concepts are recognized by comparing a program representation against a library of standard plans. The program representation is an abstract program model given by a hierarchical control-flow/data-flow graph. The model is decomposed into a tree of sub-models using propers (single-entry/single-exit control-flow sub-graphs). Plans are represented by similar graphs with added qualifications. Recognition is based on simple matching between sub-models and plans. It produces model-plan-concept bindings.

Implementation, Testing and Evaluation

The method was implemented in the UNPROG program understander and tested using Lisp and Cobol analyzers and a plan library. The method's design and analysis were confirmed in tests where efficient control- and data-flow checking procedures produced low recognition costs. The method was evaluated for performance and for generality over languages, programs, and concepts. Comparing sub-models and plans is efficient because sub-models are small and have restricted, canonical control flow. Their abstraction structure focuses recognition on criterial program features and hides non-plan material. The model can be formed for all language constructs which permit static determination of control and data flow. The number of sub-models and comparisons increases linearly with program size. These performance and scaling advantages decrease as plans and concepts correspond less well to the sub-models produced by proper decomposition.

Cobol Restructuring

To show how our method can raise the level of reverse-engineering and re-engineering tools, we demonstrated its application to automatic Cobol restructuring. The RESTRUC restructurer uses the model-plan-concept bindings found by UNPROG. Knowledge associated with plans and concepts interprets the cause of unstructured program parts and permits more specific and insightful transformation, code generation, and documentation than is possible with existing syntactic methods.

Empirical Study of Programs

Our method can also be used for empirical study of programs at the conceptual level. We described a methodology using UNPROG to improve recognizer performance, acquire plans, catalog natural plans and concepts, and characterize program populations with respect to aspects like recognizable and unrecognizable concepts, plan use, and domain features. For example, the distribution of plan recognition rates is a measure of the population's planfulness and can be used to test the hypothesis that programs are planful.

Future Work and Implications

Future work includes empirical studies using UNPROG, method extensions, and expansion to other program-understanding domains. Larger studies with UNPROG will characterize program populations and method performance, and will aid plan-library acquisition and development. Method extensions include changes in the program representation, the plan representation, and the recognition method.
For example, abstract branches and other decomposition principles will be investigated. Expansion beyond bottom-up control-plan-instance recognition includes non-control concepts, hierarchical recognition, the use of additional information, and advanced program-understander architectures.

Besides restructuring, our method can be applied to reverse-engineering and re-engineering tools for applications like analysis, documentation, and translation. Recognizing and cataloging control concepts and plans has implications for future maintenance and development systems that represent and manipulate programs using programming concepts, including programming assistants, environments, and interfaces. Empirical studies using program understanders can confirm and bound program-understanding theories. Conceptual data and program-understanding models have many uses in software psychology: for example, they can identify natural concepts and programming knowledge, and describe their role in program comprehension and programming. Implications extend to complexity measures, language design, structured programming, and education.

----
* This work has taken place in the Qualitative Reasoning Group at the Artificial Intelligence Laboratory, The University of Texas at Austin.

 * Origin: Seaeast - Fidonet<->Usenet Gateway - sunbrk (1:343/15.0)