[comp.software-eng] Data Transformation

abbott@aero.ARPA (Russell J. Abbott) (06/03/88)
Can anyone refer me to a system that does data structure transformations?
The general problem is to define interfaces between pre-exising tools in
an environment.

A particular example is to use P-Nut, a petri net analyzer, on a system
produced by Teamwork, a data flow diagrammer.  Besides allowing one to draw
data flow diagrams, Teamwork also allows one to draw process activation 
tables.  For example, the following are two rows in a process activation
table.

   1. The first row says that if condition C1 holds and condition C4 does
      not hold, the processes P1 and P4 should be started.  Once they
      finish, process P5 should be started.

   2. The second row says that if conditions C1 and C2 hold and condition
      C3 does not hold, processes P1 and P3 should be activated.  When P1
      and P3 complete, process P2 should be activated.  When it completes
      processes P1 and P3 should be reactivated.  When they complete a
      second time, processes P4 and P5 should be activated.

A complete process activation table would consist of any number of such rows,
each row having a unique distribution of +'s and -'s among the conditions.

                C1 | C2 | C3 | C4 || P1 | P2 | P3 | P4 | P5
               ----+----+----+----++----+----+----+----+----
                 + |    |    |  - ||    |  1 |    |  1 |  2
                 + |  + |  - |    || 1,3|  2 | 1,3|  4 |  4
                                 ....


The petri net input to say the same thing looks like the following.

    C1, not C4                        -> row(1,1), P2s, P4s
    row(1,1), P2f, P4f    	      -> row(1,2), P5s
    row(1,2), P5f	              -> <empty>

    C1, C2, not C3                    -> row(2,1), P1s, P3s
    row(2,1), P1f, P3f		      -> row(2,2), P2s
    row(2,2), P2f      		      -> row(2,3), P1s, P3s
    row(2,3), P1f, P3f 		      -> row(2,2), P4s, P5s
    row(2,3), P4f, P5f		      -> <empty>

    P1s                               -> P1f
    P2s                               -> P2f
    P3s                               -> P3f
    P4s                               -> P4f
    P5s                               -> P5f


where (a) P1s, P2s, P3s, P4s, and P5s stand for the start of P1, P2, P3,
P4, and P5; (b) P1f, P2f, P3f, P4f, and P5f stand for the finish of the
same processes; and (c) row(<row_number>, <level>) identifies a row
executing at a particular level.

So the problem is to transform the information from table form into petri net
form.

The question, though, is not how to write a program for this particular
transformation, but to develop a more general system in which transformations
of this and similar kinds can be described and implemented.

I have already found the following work.

   1. IDL, an Interface Description Language by David Alex Lamb of Queens
      University.  IDL was developed as part of CMU's Production Quality
      Compiler Compiler project.  It allows one to describe abstractly the
      data structures that two system components agree on.  (Since the
      components agree on a data structure, this problem is not the same
      as the one I'm asking about.)  IDL then generates reader and writer
      code for that abstract data structure in whatever concrete terms the
      two components require.  IDL is described in "IDL: Sharing
      Intermediate Representations,", TOPLAS, July, 1987.

   2. FORMAL a system for transforming hierarchical databases.  In FORMAL,
      one provides a description of an input database along with a sketch
      of the desired output structure expressed in terms of the input
      structure.  Basically, this is what I'm asking for, but FORMAL is
      limited to hierarchical databases.  This work reminded me of Query
      By Example except that the output may be a (hierarchical) database
      structure and not just a list of tuples satisfying the sketched
      conditions.  FORMAL is described in "Automatic Data Transformation
      and Restructuring," by Nan C. Shu, Proc. 3rd Intl. Conf. on Data
      Engr., 1987.

   3. Stage, a system for generating application generators by J. Craig
      Cleaveland.  The application generators Stage generates are, in
      effect, data transformation systems.  The input to Stage is (a) a
      grammar describing the input to the desired application and (b) a
      sketch-like description of the desired output of the application
      expressed in terms of the parse tree of the input.  Stage then
      generates a program to perform such transformations.  Stage is
      described in a forthcoming article "Building Application
      Generators," in IEEE Software, July, 1988.

It is clear that the problem as posed it is not well formed.  That is, any
program could be described as a data structure transformer, so asking for a
system in which to describe more or less arbitrary transformations is asking
for too much.

One approach would be to limit the problem to a system for describing
transformations that operate on the structure of the data and that are not
dependent on the content.  That is not satisfactory for a couple of reasons.

   - For one thing the Teamwork/P-Nut example given above does not fit
     this description since the content of the data structure must be
     examined at least to the extent of determining concurrency and
     process sequencing.

   - Even worse, since a two counter machine is Turing equivalent, a
     system that deals generally with a pair of lists whose lengths (but
     not content) matter is equivalent to a general purpose programming
     language.

All that notwithstanding, there does seem to be some value in developing a
notation in which data structures and transformations on them can be
described--and then automatically implemented.  In addition, it seems that
prolog would be a suitable language in which to develop such a system.

As an experiment, a program has been written in C-Prolog that transforms data
transformation descriptions into corresponding data transformation programs.

The input to be transformed is assumed to be given as a set.  For this
example, each set element is a pair, where the first component is a set
of positive and negative conditions and the second is a set of processes
and their execution levels.  Thus the original table is assumed to be
available in the following form on the file teamwork_to_pnut.

    [c1/(+), c4/(-)],         [p2/1, p4/1, p5/2].

    [c1/(+), c2/(+), c3/(-)], [p1/1, p1/3, p2/2, p3/1, p3/3, p4/4, p5/4].


In specifying the output, the transformation description (see below)
includes new symbols: start(Process) and finish(Process), corresponding
to Ps and Pf, for each Process P; and row(<Conditions>, <Level>) for
each row and level.  The actual (although manually formatted) output
produced by the generated program is as follows.


                            /* The processes */

                           start(p1) -> finish(p1)
                           start(p2) -> finish(p2)
                           start(p3) -> finish(p3)
                           start(p4) -> finish(p4)
                           start(p5) -> finish(p5)


                             /* The first row */

    [c1, c2, not c3] -> [start(p1), start(p3), row([c1, c2, not c3], 1)]

    [finish(p1), finish(p3), row([c1, c2, not c3], 1)] ->
                         [start(p2), row([c1, c2, not c3], 2)]

    [finish(p2), row([c1, c2, not c3], 2)] ->
                         [start(p1), start(p3), row([c1, c2, not c3], 3)]

    [finish(p1), finish(p3), row([c1, c2, not c3], 3)] ->
                         [start(p4), start(p5), row([c1, c2, not c3], 4)]

    [finish(p4), finish(p5), row([c1, c2, not c3], 4)] -> []


                             /* The second row */
    [c1, not c4] -> [start(p2), start(p4), row([c1, not c4], 1)]

    [finish(p2), finish(p4), row([c1, not c4], 1)] ->
                                 [start(p5), row([c1, not c4], 2)]

    [finish(p5), row([c1, not c4], 2)] -> []



Here is the actual data transformaton specification.  The input file and its
structure is specified as follows.

    input = file teamwork_to_pnut.

    structure row = (conditions, processes).


The output is also described as a set--the set of Petri net
transformations.  Four different kinds of transformations are generated.
Letting + stand for set union:

    output(Input) =   output1(Input) + output2(Input) +
                      output3(Input) + output4(Input).


The individual transformations may be described in traditional set
notation as follows.    (The notation {Term | Predicate} refers to the
set of Terms that satisfy Predicate.)

   1. Each row has a unique starting Petri Net transition.  Its LHS is the
      set of conditions that characterize the row.  Its RHS is the set of
      processes at level 1.

          output1(Input) =
                  {conditions(Row) -> rhs(Row, 1) | Row in Input}.


      where

          conditions(Row) = pos_conds(Row.conditions) +
                            neg_conds(Row.conditions).

          pos_conds(Conditions) =
                  {Condition     | Condition/(+) in Conditions}.
          neg_conds(Conditions) =
                  {not Condition | Condition/(-) in Conditions}.

          rhs(Row, Level) =
                  {start(Process) | Process/Level in Row.processes} +
                  {row(conditions(Row), Level)}.


   2. The second kind of transformation appears once for each Process.
      Each Process has an associated transformation from its start to its
      finish.

          output2(Input) =
           {start(Process) -> finish(Process) |
                                    Row in Input,
                                    Process/_ in Row.processes}.


      where

          lhs(Row, Level) =
                  {finish(Process) | Process/Level in Row.processes} +
                  {row(conditions(Row), Level)}.


   3. The third kind of transformation links the termination of Processes
      at one level to the start of Processes at the next level.

          output3(Input) =
           {lhs(Row, Level) -> rhs(Row, Level+1) |
                                      Row in Input,
                                      _/Level in Row.processes,
                                      _/(Level+1) in Row.processes}.


   4. The fourth kind of transformation terminates the processing of the
      rows.  There is one for each row.

          output4(Input) =
           {lhs(Row, Level) -> [] | Row in Input,
                                    _/Level in Row.processes,
                                    _/(Level+1) not in Row.processes}.


The preceding does the required transformations, but it has limitations.

   - The transformations are done abstractly.  In practice, the concrete
     representations of the input and output must be dealt with also.

   - The generated program is not optimized.

   - In this example, sets suffice.  For some other problem some data
     structure other than sets may be required.  But since any data
     structure can be mapped onto relations, perhaps this is all we need.

All in all, it doesn't seem like a bad start.  My question is whether anyone
knows of existing work along these or similar lines?

Thanks,

-- Russ Abbott