[comp.lang.c] Summary: Restructuring and Reverse Engineering.

orville@weyrich.UUCP (Orville R. Weyrich) (05/25/91)

Please e-mail responses to orville%weyrich@uunet.uu.net or else post to
comp.software-eng (note the followup line).

> My questions:
> 
> 	I understand that there are several program restructuring tools 
> 	currently available which help turn old GO TO laden COBOL and FORTRAN
> 	code into more structured code.
> 
> 	What products are available now, and what (if any) research is
> 	presently being done in this field?
> 
> 	I have heard rumors to the effect that the present tools are less
> 	than satisfactory. I would be interested in hearing of the experiences
> 	(good and bad) of anyone who has used these tools. Bad experiences
> 	are particularly useful, since they help clarify where improvements
> 	need to be made.
> 
> 	References to any articles on the subject would also be appreciated.
> 
>	What companies are currently using/developing program restructuring
>	tools that I might offer my services to as either a consultant or an
>	employee? Does anyone know of names of people withing these
>	organizations that I could contact?
>
>
>"Reverse Engineering" can mean a lot of things to different people, including
>the following:
>	1) Support for extracting "business rules" from an existing program.
>		This becomes necessary when dealing with dusty-deck code
>		when the people in an organization that were responsible for 
>		creating the program have disappeared, along with those 
>		that knew how the business was run prior to the installation 
>		of the program. Here, we are talking about "knowledge 
>		acquisition" as is done in building expert systems,
>		but in this case, there are no human experts left, only a
>		bunch of undocumented source code (or worse!). 
>	2) Support for "Program Archeology" or "Design Recovery" on an 
>		existing program. This is done to recover the program design 
>		documents which have been lost (or never made). 
>		This becomes necessary in order to maintain the code.
>	3) Producing a "work-alike" product by working backwards from the
>		external behavior to figure out how to code a new application
>		which is a clone, but which does not use the original code.
>		An example would be the "NewWord" product which cloned the
>		"WordStar" word processor. This technique is under legal attack,
>		and is not what I have in mind.
>
>I am aware of rumors that companies such as ViaSoft and Transform Logic
>Corporation are preparing products to support Reverse Engineering, but
>would like any additional information which the net knows of. In particular,
>I know that Transform Logic Corporation demonstrated at their "Transform User's
>Group" last year a product under development which converts dirty old COBOL
>decks into a hypertext format for browsing by the programmer.
>
>I suspect that many consulting/contract job shops have tool suites that they 
>use in-house.
>
>My questions:
>
>	Does anyone know of companies interested in this kind of stuff that
>	I might offer my services to (as either a consultant or an employee?)
>
>	What products are available now, and what (if any) research is
>	presently being done in this field?
>
>	I have heard rumors to the effect that the present tools are less
>	than satisfactory. I would be interested in hearing of the experiences
>	(good and bad) of anyone who has used these tools. Bad experiences
>	are particularly useful, since they help clarify where improvements
>	need to be made.
>
>	References to any articles on the subject would also be appreciated.
>	I am aware of the issue of IEEE Software on Reverse Engineering
>	(January 1990).
>
>Thanks,
>
>Orville.

--------------------------------------           ******************************
Orville R. Weyrich, Jr., Ph.D.                   Certified Systems Professional
Internet: orville%weyrich@uunet.uu.net             Weyrich Computer Consulting
Voice:    (602) 391-0821                         POB 5782, Scottsdale, AZ 85261
Fax:      (602) 391-0023                              (Yes! I'm available)
--------------------------------------           ******************************

Thanks to all who replied to my postings regarding program restructuring and 
reverse engineering. I would be happy to get any additional comments and/or 
feedback on this information.

Note that I also posted a question regarding program translation, the
responses to which I will summarize later [when responses stop coming in].

-------------------------------------------------------------------------------
I SUMMARIZE BELOW COMMENTS BY:

	straub@cs.UMD.EDU (Pablo A. Straub)
	johnson@natasha.ics.hawaii.edu (Philip Johnson)
	lamson@el1.crd.ge.com (Scott Lamson)
	coxs@dg-rtp.dg.com (Stan Cox)
	hen@bucsf.bu.edu (Wm. H. Henneman)
	colbrook@concerto.lcs.mit.edu (Dr Adrian Colbrook)
	Keith.Bierman@Eng.Sun.com (Keith Bierman fpgroup)
	cassel@sce.carleton.ca (Ron Casselman)
	jls@netcom.com (Jim Showalter)
	hartman@cs.utexas.edu (John Hartman)
	Kevin.Lano@prg.oxford.ac.uk (Kevin Lano)
	oneel@heawk1.rosserv.gsfc.nasa.gov (Bruce Oneel)
	render@zeppo.Colorado.EDU (Hal Render)
	franklin@enuxha.eas.asu.edu (Malcolm L. Franklin)
	metcalf@crnvma.cern.ch (Mike Metcalf)
	dalamb@umiacs.UMD.EDU (David Lamb)
	dent@DIALix.oz.au (Andrew Dent)
	agnew@ast.saic.com (Robert Allen Agnew)
	paulg@tplrd.tpl.oz.au (Paul Gittings)
	nol2105@dsacg2.dsac.dla.mil (Robert E. Zabloudil)
	iwen2@c3po.d.umn.edu (ichang wen)
	alan@mel.dms.csiro.au (Alan Miller)
	map@cadillac.siemens.com (Michael Platoff)

-------------------------------------------------------------------------------
MY CONCLUSIONS:
	I found no comments on successful reverse engineering products beyond
	simple cross-reference generators and flow-chart generators.

	Restructuring is widely done; the problems reported have to do with
	obfuscation of the code when a "forbidden" control structure (such as
	an exit in the middle of a loop) is translated into a much less obvious
	WHILE loop with less obvious nested if statements in the body.
	Other problems include ignoring the program semantics and data
	dimension of the program.

	Work continues at a number of research and commercial institutions
	in the area of restructuring and reverse engineering.

-------------------------------------------------------------------------------
PUBLICATIONS TO LOOK AT:

	"From Code to Z Specifications" P. Breuer, K. Lano, in the  
	1989 Z User Workshop Proceedings: Springer-Verlag Workshops 
	in Computer Science. 

	"Reverse-Engineering and Validating COBOL" by K. Lano.  

	Ward M., Calliss F.W., Munro M. "The Maintainer's Assistant",
 	Proceedings of Conference on Software Maintenance 1989, Miami, 
	Florida, IEEE Computer Society Press, Oct 1989. pp307-315.

	The Proceedings of the Conference on Software Maintenance (1986-1990)
	published by IEEE have various articles on reverse engineering and 
	related topics. 

	Look at a few articles in Girish Parikh's Tutorial on Software 
	Maintenance (IEEE publication).

	Keep you eye open for the "Proceedings of the 13th International 
	Conference on Software Engineering" that was held in Austin earlier 
	this month [May 1991].  

	Names to search for in the literature are:
		Harry Sneed 
		Mitchell D. Lubars
		Balasubramaniam Ramesh
		Ted J. Biggerstaff.

	Ted J. Biggerstaff, "Design Recovery for Maintenance and Reuse,"
	IEEE Computer, Vol. 22, No. 7, July 1989, pp 36-50.

	"Extracting and Restructuring the Design of Large System " 
	IEEE S.E. Jan 90 by Song C. Choi and Walt Scacchi in USC.

	"An Ada Restructuring Assistant", P. Johnson, 	D. Hildum, A. Kaplan, 
	C. Kay, J. Wileden. In 	the Proceedings of the Fourth Annual 
	Conference on Artificial Intelligence and Ada, Fairfax, VA. 1988.

	The Programmer's Apprentice work done at MIT.

	Look through SIGPLAN and Communications of the ACM mid-seventies. 

	A big name in the was Amir Pnueli (sp?) from the University of Texas 
	at Austin.

	John Hartman.  Understanding Natural Programs Using Proper
	Decomposition.  13th International Conference On Software Engineering
	- ICSE-13, May 13-16, 1991.

	John Hartman.  AUTOMATIC CONTROL UNDERSTANDING FOR NATURAL PROGRAMS.
	PhD thesis, The University of Texas at Austin, Dept. of Computer
	Sciences.  May 1991.

	John Hartman.  Pragmatic Program Understanding For Automated
	Re-engineering.  In International Computer Software and Applications
	Conference - COMPSAC '90. IEEE Computer Society Press, October
	31-November 2, 1990.  Position paper for panel: ``Industrial Experience
	in Automating Software Re-engineering''.

	John Hartman.  Automatic Program Understanding and the Maintenance
	Interface.  In Second Annual User-System Interface Conference, Austin,
	Texas, February 20-21, 1987.

	John Hartman.  Automatic Program Understanding For Cobol
	Restructuring.  In Computer Science Research Review, Dept. of Computer
	Sciences, November 1986. The University of Texas at Austin.

	John Hartman.  Restructuring Cobol Programs Into Abstract Data Type
	Modules.  Technical Report SDBEG-21, Dept. of Computer Sciences, The
	University of Texas at Austin, 1980.  Implementation is described in
	``XTC User's Guide'', SDBEG-18.

-------------------------------------------------------------------------------
PRODUCTS AVAILABLE:

	SPAG program marketed by Polyhedron Software. Try their toll-free 
	fax no.:  1-800-777-5519. It is my understanding that OTG takes 
	care of normal orders:
	      OTG
	      voice 717 222 9100
	      fax   717 229 9103
	Alternatively, one may wish to go to the source
	       Polyhedron Software Ltd
	       Magdalen House
	       98 Abingdon Road
	       Standlake
	       Witney
	       Oxon OX8 7RN

	       Tel 0865 300 579
	[See review below].

	PRISM OTG Systems [same as above?]

	Mentor Graphics  has a CASE tool called CASEStation.  One of the 
	support tools for this product will take "C" source and produce a 
	structure chart. Support for CASEStation has recently moved to the UK.

	CADRE Technology, the makers of TEAMWORK, also have a similar product 
	called C-REV.  This again takes "C" source and produces a structure 
	chart. CADRE unlike Mentor also have products to aid in the 
	development of Cobol programs I don't know if they have any reverse 
	engineering products for Cobol.

	RECODER tool from Language Technology, Inc.  [See review below].

	Toolpack has tools for restructuring code.  [See review below].

	FORGE  Pacific Sierra Research   
		analysis/restructurer/vectorizer/parallelizer

	FAUST from U of Illinois/U of Indiana  dennis gannon

	Commercial tool is available from Bachman.

	Commercial tool is available from XA Systems.

	FORWARN/etc  Quibus Enterprises

	Fortran Development Tools Quibus. Pretty sort of like SPAG. 
		Seems less powerful.

		Outline: indentation creator with bonus graphic symbols 
		to highlight logic. 

		PREP   : another conditional compilation, includization tool.

		Not exciting as such, but prices start at $500, so it is at 
		least cheap. 

	FOR_STRUCT from Cobalt-Blue. Like SPAG. Restructures Fortran 
		particularly old Fortran (Fortran 66/ Fortran IV) into 
		Fortran 77.   The same company produces FOR_C which 
		translates Fortran -> C. The company really pushes automated 
		fortran->c|c++ translation which [some folks think] is a very, 
		very bad idea. May be a great implementation though. 
		408 723 0474. 


-------------------------------------------------------------------------------
OTHER COMMENTS BELOW

-------------------------------------------------------------------------------
[This was posted independently rather than in reply to my posting, 
but is relevant]

Subject: Experience with Ovum Ltd. reports?
Summary: Any experiences with Reverse Engineering report by Ovum Ltd.?
Message-ID: <map.674776195@honda>
Date: 20 May 91 21:49:55 GMT
Sender: news@siemens.siemens.com

We're considering purchasing a research report from Ovum Ltd.
entitled "Reverse Engineering: Markets Supplier and Products."
The report is fairly expensive, so I'd like to make sure it's
worth the money.  I asked Ovum for references or excerpts from a
section of the report, but they seem to have overlooked my
request.  Has anyone purchased an Ovum report?  What did you
think of it?  In particular, has anyone read the Reverse
Engineering report?

Michael Platoff               email: map@cadillac.siemens.com
Siemens Corporate Research    phone: (609) 734-3354
755 College Road East
Princeton, NJ 08540-6668

-------------------------------------------------------------------------------
Although [Dent's] FORTRAN-Pascal translation currently performs 
minimal restructuring, there is much potential. This has come
about through the extensive use of Object-oriented programming
techniques and decomposition of programs into chained and nested
objects.

If possible, could you please append a note to your posting
suggesting OOP can play an important part in writing restructuring
tools, particularly if the programmer wants to stay away from
languages such as LISP.

It would be very interesting to see performance comparisons between
a similar tool in some of the more exotic languages vs Object Pascal
or C++.

-------------------------------------------------------------------------------
As part of the REDO ESPRIT project, on the maintenance of systems, we  
are doing work on restructuring COBOL code by means of abstracting these  
codes to a functional language, and then performing transformations on  
this representation, to eliminate recursions that are not in WHILE-IF  
form.  
 
We are interested in applying machine-learning techniques to determine  
useful restructuring strategies, but not much work has yet been done by  
us on this. Tools that transform code to the functional language, and  
support the user in re-writing this code, have been developed in Prolog.  
 
Work at Durham University by Martin Ward also involves program 
transformation using formal methods. Their REFORM Project, to 
reverse-engineer assembly language code to Z, is described in his
paper cited above.
 
-------------------------------------------------------------------------------
There is a guy at MCC (Consortium in Austin TX) that is trying to apply
neural network technology to reverse engineering; it's called the TAO 
subsystem of the DESIRE project. A lot of the stuff they publish is in 
MCC Proprietary publications (you gotta pay MEGA$ to get access to these).

-------------------------------------------------------------------------------
I am currently working on a translation of MS-DOS FORTRAN programs to
THINK Pascal on the Macintosh, for a local government department.

As part of the job I've produced  (well, nearly finished) a translator which
will do most of the job, leaving only minor issues such as filename strings
to be completed by hand.

-------------------------------------------------------------------------------
We have developed these tools for Fortran and Cobol and are developing
tools for Jovial. Contact John Hampton, 619-456-6017.

-------------------------------------------------------------------------------
I've seen the Mentor Graphics tool called CASEStation and it worked rather 
well with the code we supplied, it could use some more work in the actual 
layout of the structured chart.

-------------------------------------------------------------------------------
Our organization has the RECODER tool from Language Technology, Inc.  I used it
on a program that "followed" me when I transferred up from DESC in Dayton, and
was fairly pleased with the results.  My only "complaint", if you will, is that
I had to go back in and "fix up" the result.  I have a certain way I like my
programs to look, and RECODER didn't do it the way I usually do.  The biggest
tangible difference was the complete lack of GO TOs; we are allowed to GO TO
an EXIT, which helps hold down your program size and complexity.

-------------------------------------------------------------------------------
I am working on my Master's project concerning updating a graphics package. 
Part of the project lies in restructuring.  There are several folds 
of restructuring. Deleting GO_TO's is certainly one of the main streams in
business. My preference is in modules and variables.

The program I am working on is a one_shot program. i.e. all codes in one file 
and no separate compilation. 

For module restructuring I classify the routines then put them in several 
files which I think each one has it's own major data structure and consistent 
functionality. About the variables they were declared excessively globally. 
I try to minimize their scopes and confine the global declaration. 
------------------------------------------------------------------------------
You did not mention a  very  important  use  of  program  restructuring:
discovering  parallelism  in programs (yeah, they usually talk about the
old dusty deck of FORTRAN code, but they also mean new programs). I have
not a particular reference to give you on this, but any book, conference
or journal on parallel programming will have plenty of articles.

On the topic of program restructuring for parallelization there is mixed
success: On one hand  experienced  programmers  can  write  code  (in  a
special  version of FORTRAN) that will parallelize very well, and on the
other hand if you write code unaware of  the  parallelizer  (is  that  a
word?) you achieve poor performance.

------------------------------------------------------------------------------
I helped develop a system called ARA for module-level restructuring in Ada 
using AI techniques.  

ARA is implemented in Common Lisp on a TI Explorer using GBB, a blackboard 
architecture development environment.

Our research was concerned with supporting evolution in the composition of 
Ada packages over time, rather than removing GOTOs, but the techniques might 
be interesting to you.

------------------------------------------------------------------------------
Toolpack has tools for restructuring code.  Good points:  toolpack is
not just an isolated tool, but has modular tools for doing many
things: restructuring, redoing declarations, rolling/unrolling loops,
checking for portability.  Bad points:  pretty strict F-77, with few
extensions supported.  If you cannot get toolpack to "compile" your
code, none of the other tools will be usable.

------------------------------------------------------------------------------
[Indirect reference to] Eric Bush at Language Technology.  Their
main product was a Cobol reformatter, and I think it initially was
written in PL/I and ran on an IBM mainframe.  The last time I heard, 
they were working on a prolog version, and the company was doing pretty well.  The
The company is located in Salem, MA.

------------------------------------------------------------------------------
I wrote a program to translate 4.5 Megalines of a very strange FORTRAN
dialect into idiomatic C.  By idiomatic, I mean that the resulting code
should look like it had been written by fluent C programmers, not
translated FORTRAN.

I used an ATN as the basic translating engine, and still think it was a 
reasonable choice.

------------------------------------------------------------------------------
As far as AI based systems goes you might want to look at the Programmer's 
Apprentice work done at MIT.  Those papers mention the reversal of their 
techniques to perform reverse engineering.  I did some work in this area 
when I was at Surrey University and most of the results were published at 
the IEEE conference. Simply source code restructuring is algorithmic and 
probably does not require the application of AI techniques.  The real win 
for AI would be in reverse engineering.

------------------------------------------------------------------------------
My experiences with such tools are usually quite good. Here is a
review I did of one.

Subject: SPAG: a Fortran restructuring tool (preliminary notes) <long>


		    Executive Summary
		   ======================

	If neither you nor your customers ever use Fortran, skip this
	review. 

	If all your Fortran code is brand new, this product is of some
	use, but not wonderful.

	If much of you (and your customers) Fortran code dates back to
	FORTRAN II ... you (they) probably need this product or one
	like it.

	Has some rough edges, but worth serious consideration.


I've just obtained a demo copy of Polyhedron's SPAG tool (and
friends). I've run a few 10K's lines of code through it and have
formed some initial impressions. A full review will come later. 

It has two major components:
==== CRAM
	Provides a report of who calls who. Checks arg lists for
	consistent number of arguments. Provides a potential overlay
	map (really handy on Virtual Memory systems :>).

===== SPAG

SPAG provides several functions which:

0)   Can make your life easier as you try to port huge hunks of code
     for benchmarking. Can also make optimizations easier to spot ...
     also handy for you benchmarkers out there.

1)   Simple "pretty printing" mode, can be used to help encourage
     (enforce) coding style guidelines. Especially helpful when you
     start out with a Million Lines of raw sewage.

2)   Can be configured to replace all 

	 DO 1 J=1, N ... 1 x=10

     with

         DO j=1, n .... x=10 ; END DO

     Which, as it happens can help to expose good candidates for loop
     fusion, viz.

      DO     4 I=1,20
    4 DPDTT  (I)  =INI
      DO     5 I=1,20
    5 TDPDT  (I)  =INI
      DO     6 I=1,20
    6 DBSOR  (I)  =INI
      DO     7 I=1,20
    7 TSOR   (I)  =INI
becomes
      DO i = 1 , 20
         dpdtt(i) = ini
      ENDDO
      DO i = 1 , 20
         tdpdt(i) = ini
      ENDDO
      DO i = 1 , 20
         dbsor(i) = ini
      ENDDO
which is trivally changed to 

      DO i = 1 , 20
         dpdtt(i) = ini
         tdpdt(i) = ini
         dbsor(i) = ini
      ENDDO

Which executes faster. When we have run out of bigger automatic
optimization wins, and other scheduled work, the compiler will
probably do this for you. But for now, SPAG can help you spot this
sort of thing. Especially probable is that the original code had
assigned goto's and such ... and only after SPAG is this sort of thing
obvious. 

      IF(Z6.GT.ZMAX)GOTO 50
      IF(Z6.LT.ZMIN)GOTO 45
      I6FT=I
      J6FT=J
  45  IF(Z8.GT.ZMAX)GOTO 50
      I8FT=I
      J8FT=J
      GOTO 55
   50 CONTINUE
   55 CONTINUE

becomes

            IF ( z6.LE.zmax ) THEN
               IF ( z6.GE.zmin ) THEN
                  i6ft = i
                  j6ft = j
               ENDIF
               IF ( z8.LE.zmax ) THEN
                  i8ft = i
                  j8ft = j
                  GOTO 100
.... etc.

     Don't get unreasonable expectations, the revised code is likely
     to run a few %faster, but not a lot faster ... and can be
     modestly slower too. The point is that after SPAGization (pity
     they didn't call it DESPAG :>) transformations are likely to be
     easier. 

3)   About 50 user configurable transformations. I suspect that most
     users will pick a transformation and stick to it.

As I said in the opening, I've run a few tens of thousands of lines
through SPAG and it works,  and aside from the
MVS/PC style of interaction it does quite well. The code produced
worked correctly (there were some warnings which the dedicated would
have fixed up) and produced identical answers (although FP arithmetic
being what it is, this could vary a bit ... just as optimization
levels can impact 'em). With the nattering redirected to a file, on a
4/260 operating on NFS mounted files (so it's very like a 4/60), the
conversion rate was a respectable 20Klines/min.

------------------------------------------------------------------------------
There have been such tools around for many years, typically site
specific, and maintained by the local gurus (remember that most of the
C tools we take for granted, cxref, flow, etc. are based on Fortran
tools which predate them ... but never made it to Unix). Alas, much of
the good culture got left behind when we (as an industry) unplugged
the mainframes and shot all the support staff. This is the first
commercial tool which I've seen running on Sun boxes.

-------------------------------------------------------------------------------
I looked at this sort of thing several years ago when contracted to
create a "smart" FORTRAN to Ada translator (it was never built).
Anyway, we were able to track down many papers on program
restructuring. This was a really hot topic in the 70s. When it proved
to not solve the bigger problems "software engineering" came into the
picture. 

I remember contacting the Army about tools for restructuring (ROME
Defense Establishment I think) and most were currently not in use. 

-------------------------------------------------------------------------------
My feeling about program restructuring tools: GIGO.

-------------------------------------------------------------------------------
		     Summary of Doctoral Research

			     John Hartman

		 Artificial Intelligence Laboratory*
		   Department of Computer Sciences
		  The University of Texas at Austin
		       Austin, Texas 78712-1188
	       hartman@cs.utexas.edu     (512) 471-9586


Program understanding involves recognizing abstract concepts like
``read-process loop'' in existing programs.  Programmers spend much of
their time understanding programs, so studying and automating the
process has many benefits.  For example, program understanding plays a
major role in software maintenance, and progress in program
understanding can make an important contribution to maintenance tools
and methods.

Programming plans are units of programming knowledge connecting
abstract concepts and their implementations. Existing research assumes
that plan instances can be recognized to recover the programmer's
abstract concepts and intentions.  This approach has not been
confirmed empirically.  Because of input restrictions and performance
limitations, current program understanders have only been demonstrated
with small example programs.  A program understander that runs on a
significant sample of real-world programs is needed to investigate the
feasibility, limits, and applications of plan instance recognition.

This dissertation describes a general and useful program understanding
method and its implementation, evaluation and applications. Control
concepts are abstract notions about interactions between control flow,
data flow, and computation, eg. ``do loop'', ``read-process loop'',
and ``bounded linear search''. We present a practical method for
bottom-up control concept recognition in large, unstructured
imperative programs . The method is implemented in the UNPROG program
understander and tested with Cobol and Lisp source programs. UNPROG's
value is demonstrated in Cobol restructuring and the empirical study
of programs. Contributions include:
  
  A recognition method that is robust, efficient and scalable  

  Implementation, testing, and evaluation 

  Improved Cobol restructuring using recognized concepts

  Empirical study of programs at the conceptual level.


Recognition Method 
  
Plan instances and concepts are recognized by comparing a program
representation against a library of standard plans.  The program
representation is an abstract program model given by a hierarchical
control flow/data flow graph. The model is decomposed into a tree of
sub-models using propers (single entry/exit control flow sub-graphs).
Plans are represented by similar graphs with added qualifications.
Recognition is based on simple matching between sub-models and plans.
It produces model-plan-concept bindings.


Implementation, Testing and Evaluation 
  
The method was implemented in the UNPROG program understander and
tested using Lisp and Cobol analyzers and a plan library.  The
method's design and analysis were confirmed in tests where efficient
control and data flow checking procedures produced low recognition
costs.
 
The method was evaluated for performance and generality over
languages, programs, and concepts.  Comparing sub-models and plans is
efficient because sub-models are small and have restricted, canonical
control flow.  Their abstraction structure focuses recognition on
criterial program features and hides non-plan material.  The model can
be formed for all language constructs which permit static
determination of control and data flow. The number of sub-models and
comparisons increases linearly with program size. These performance
and scaling advantages decrease as plans and concepts correspond less
well to the sub-models produced by proper decomposition.


Cobol Restructuring 

To show how our method can raise the level of reverse engineering and
re-engineering tools, we demonstrated its application to automatic
Cobol restructuring. The RESTRUC restructurer uses the
model-plan-concept bindings found by UNPROG.  Knowledge associated
with plans and concepts interprets the cause of unstructured program
parts and permits more specific and insightful transformation, code
generation, and documentation than is possible with existing syntactic
methods.


Empirical Study of Programs  

Our method can also be used for empirical study of programs at the
conceptual level.  We described a methodology using UNPROG to improve
recognizer performance, acquire plans, catalog natural plans and
concepts, and characterize program populations with respect to aspects
like recognizable and unrecognizable concepts, plan use, and domain
features.  For example, the distribution of plan recognition rates is
a measure of the population's planfulness and can be used to test the
hypothesis that programs are planful.


Future Work and Implications  

Future work includes empirical studies using UNPROG, method
extensions, and expansion to other program understanding domains.
Larger studies with UNPROG will characterize program populations and
method performance, and aid plan library acquisition and development.
Method extensions include changes in the program representation, plan
representation, and the recognition method. For example, abstract
branches and other decomposition principles will be investigated.
Expansion beyond bottom-up control plan instance recognition includes
non-control concepts, hierarchical recognition, use of additional
information, and advanced program understander architectures.

Besides restructuring, our method can be applied to reverse
engineering and re-engineering tools for applications like analysis,
documentation, and translation.  Recognizing and cataloging control
concepts and plans has implications for future maintenance and
development systems that represent and manipulate programs using
programming concepts, including programming assistants, environments,
and interfaces.

Empirical studies using program understanders can confirm and bound
program understanding theories.  Conceptual data and program
understanding models have many uses in software psychology.  For
example, they can identify natural concepts and programming knowledge,
and describe their role in program comprehension and programming.
Implications extend to complexity measures, language design,
structured programming, and education.

----
*   This work has taken place in the Qualitative Reasoning Group at
the Artificial Intelligence Laboratory, The University of Texas at
Austin.  Research of the Qualitative Reasoning Group is supported in
part by NSF grants IRI-8602665, IRI-8905494, and IRI-8904454, by NASA
grants NAG 2-507 and NAG 9-200, and by the Texas Advanced Research
Program under grant no. 003658175.