abbott@aero.ARPA (Russell J. Abbott) (06/03/88)
Can anyone refer me to a system that does data structure transformations? The general problem is to define interfaces between pre-exising tools in an environment. A particular example is to use P-Nut, a petri net analyzer, on a system produced by Teamwork, a data flow diagrammer. Besides allowing one to draw data flow diagrams, Teamwork also allows one to draw process activation tables. For example, the following are two rows in a process activation table. 1. The first row says that if condition C1 holds and condition C4 does not hold, the processes P1 and P4 should be started. Once they finish, process P5 should be started. 2. The second row says that if conditions C1 and C2 hold and condition C3 does not hold, processes P1 and P3 should be activated. When P1 and P3 complete, process P2 should be activated. When it completes processes P1 and P3 should be reactivated. When they complete a second time, processes P4 and P5 should be activated. A complete process activation table would consist of any number of such rows, each row having a unique distribution of +'s and -'s among the conditions. C1 | C2 | C3 | C4 || P1 | P2 | P3 | P4 | P5 ----+----+----+----++----+----+----+----+---- + | | | - || | 1 | | 1 | 2 + | + | - | || 1,3| 2 | 1,3| 4 | 4 .... The petri net input to say the same thing looks like the following. C1, not C4 -> row(1,1), P2s, P4s row(1,1), P2f, P4f -> row(1,2), P5s row(1,2), P5f -> <empty> C1, C2, not C3 -> row(2,1), P1s, P3s row(2,1), P1f, P3f -> row(2,2), P2s row(2,2), P2f -> row(2,3), P1s, P3s row(2,3), P1f, P3f -> row(2,2), P4s, P5s row(2,3), P4f, P5f -> <empty> P1s -> P1f P2s -> P2f P3s -> P3f P4s -> P4f P5s -> P5f where (a) P1s, P2s, P3s, P4s, and P5s stand for the start of P1, P2, P3, P4, and P5; (b) P1f, P2f, P3f, P4f, and P5f stand for the finish of the same processes; and (c) row(<row_number>, <level>) identifies a row executing at a particular level. So the problem is to transform the information from table form into petri net form. The question, though, is not how to write a program for this particular transformation, but to develop a more general system in which transformations of this and similar kinds can be described and implemented. I have already found the following work. 1. IDL, an Interface Description Language by David Alex Lamb of Queens University. IDL was developed as part of CMU's Production Quality Compiler Compiler project. It allows one to describe abstractly the data structures that two system components agree on. (Since the components agree on a data structure, this problem is not the same as the one I'm asking about.) IDL then generates reader and writer code for that abstract data structure in whatever concrete terms the two components require. IDL is described in "IDL: Sharing Intermediate Representations,", TOPLAS, July, 1987. 2. FORMAL a system for transforming hierarchical databases. In FORMAL, one provides a description of an input database along with a sketch of the desired output structure expressed in terms of the input structure. Basically, this is what I'm asking for, but FORMAL is limited to hierarchical databases. This work reminded me of Query By Example except that the output may be a (hierarchical) database structure and not just a list of tuples satisfying the sketched conditions. FORMAL is described in "Automatic Data Transformation and Restructuring," by Nan C. Shu, Proc. 3rd Intl. Conf. on Data Engr., 1987. 3. Stage, a system for generating application generators by J. Craig Cleaveland. The application generators Stage generates are, in effect, data transformation systems. The input to Stage is (a) a grammar describing the input to the desired application and (b) a sketch-like description of the desired output of the application expressed in terms of the parse tree of the input. Stage then generates a program to perform such transformations. Stage is described in a forthcoming article "Building Application Generators," in IEEE Software, July, 1988. It is clear that the problem as posed it is not well formed. That is, any program could be described as a data structure transformer, so asking for a system in which to describe more or less arbitrary transformations is asking for too much. One approach would be to limit the problem to a system for describing transformations that operate on the structure of the data and that are not dependent on the content. That is not satisfactory for a couple of reasons. - For one thing the Teamwork/P-Nut example given above does not fit this description since the content of the data structure must be examined at least to the extent of determining concurrency and process sequencing. - Even worse, since a two counter machine is Turing equivalent, a system that deals generally with a pair of lists whose lengths (but not content) matter is equivalent to a general purpose programming language. All that notwithstanding, there does seem to be some value in developing a notation in which data structures and transformations on them can be described--and then automatically implemented. In addition, it seems that prolog would be a suitable language in which to develop such a system. As an experiment, a program has been written in C-Prolog that transforms data transformation descriptions into corresponding data transformation programs. The input to be transformed is assumed to be given as a set. For this example, each set element is a pair, where the first component is a set of positive and negative conditions and the second is a set of processes and their execution levels. Thus the original table is assumed to be available in the following form on the file teamwork_to_pnut. [c1/(+), c4/(-)], [p2/1, p4/1, p5/2]. [c1/(+), c2/(+), c3/(-)], [p1/1, p1/3, p2/2, p3/1, p3/3, p4/4, p5/4]. In specifying the output, the transformation description (see below) includes new symbols: start(Process) and finish(Process), corresponding to Ps and Pf, for each Process P; and row(<Conditions>, <Level>) for each row and level. The actual (although manually formatted) output produced by the generated program is as follows. /* The processes */ start(p1) -> finish(p1) start(p2) -> finish(p2) start(p3) -> finish(p3) start(p4) -> finish(p4) start(p5) -> finish(p5) /* The first row */ [c1, c2, not c3] -> [start(p1), start(p3), row([c1, c2, not c3], 1)] [finish(p1), finish(p3), row([c1, c2, not c3], 1)] -> [start(p2), row([c1, c2, not c3], 2)] [finish(p2), row([c1, c2, not c3], 2)] -> [start(p1), start(p3), row([c1, c2, not c3], 3)] [finish(p1), finish(p3), row([c1, c2, not c3], 3)] -> [start(p4), start(p5), row([c1, c2, not c3], 4)] [finish(p4), finish(p5), row([c1, c2, not c3], 4)] -> [] /* The second row */ [c1, not c4] -> [start(p2), start(p4), row([c1, not c4], 1)] [finish(p2), finish(p4), row([c1, not c4], 1)] -> [start(p5), row([c1, not c4], 2)] [finish(p5), row([c1, not c4], 2)] -> [] Here is the actual data transformaton specification. The input file and its structure is specified as follows. input = file teamwork_to_pnut. structure row = (conditions, processes). The output is also described as a set--the set of Petri net transformations. Four different kinds of transformations are generated. Letting + stand for set union: output(Input) = output1(Input) + output2(Input) + output3(Input) + output4(Input). The individual transformations may be described in traditional set notation as follows. (The notation {Term | Predicate} refers to the set of Terms that satisfy Predicate.) 1. Each row has a unique starting Petri Net transition. Its LHS is the set of conditions that characterize the row. Its RHS is the set of processes at level 1. output1(Input) = {conditions(Row) -> rhs(Row, 1) | Row in Input}. where conditions(Row) = pos_conds(Row.conditions) + neg_conds(Row.conditions). pos_conds(Conditions) = {Condition | Condition/(+) in Conditions}. neg_conds(Conditions) = {not Condition | Condition/(-) in Conditions}. rhs(Row, Level) = {start(Process) | Process/Level in Row.processes} + {row(conditions(Row), Level)}. 2. The second kind of transformation appears once for each Process. Each Process has an associated transformation from its start to its finish. output2(Input) = {start(Process) -> finish(Process) | Row in Input, Process/_ in Row.processes}. where lhs(Row, Level) = {finish(Process) | Process/Level in Row.processes} + {row(conditions(Row), Level)}. 3. The third kind of transformation links the termination of Processes at one level to the start of Processes at the next level. output3(Input) = {lhs(Row, Level) -> rhs(Row, Level+1) | Row in Input, _/Level in Row.processes, _/(Level+1) in Row.processes}. 4. The fourth kind of transformation terminates the processing of the rows. There is one for each row. output4(Input) = {lhs(Row, Level) -> [] | Row in Input, _/Level in Row.processes, _/(Level+1) not in Row.processes}. The preceding does the required transformations, but it has limitations. - The transformations are done abstractly. In practice, the concrete representations of the input and output must be dealt with also. - The generated program is not optimized. - In this example, sets suffice. For some other problem some data structure other than sets may be required. But since any data structure can be mapped onto relations, perhaps this is all we need. All in all, it doesn't seem like a bad start. My question is whether anyone knows of existing work along these or similar lines? Thanks, -- Russ Abbott