[mod.std.mumps] std-mumps Digest V1 #12

Hokey (The Moderator) <hokey@plus5.uucp> (03/31/85)

std-mumps Digest            Sat, 30 Mar 85       Volume  1 : Issue  12

Today's Topics:
                           New $QUERY paper
----------------------------------------------------------------------

Date: Wed, 27 Mar 85 17:13:10 est
From: maryland!ihnp4!seismo!osiris!ocsplx!pete
Subject: New $QUERY paper
To: maryland!osiris!aplvax!umcp-cs!seismo!ihnp4!plus5!std-mumps

Hokey,

Here is the latest copy of the $QUERY proposal.

Pete Kuzmak	[I had to edit out the boldfacing and underlining - HMS]

                           $QUERY MUMPS Function Proposal



     INTRODUCTION

          MUMPS combines the power of a simple language with the flexibility  of
     a  hierarchical  database,  and  is  successful  for  a  large  variety  of
     applications.  It  has  been  described  as  a  "linguistically  integrated
     database  system".   Yet,  for all its merits, there are some rather simple
     database tasks that are not  easily  performed  in  MUMPS.   Two  examples,
     copying  a  subtree  and ad-hoc data retrieval, illustrate this point.  The
     following example shows how hard it is to write code to copy a node and its
     descendants.

     Example -- Copy a subtree (Modified from File Manager routine ^%RCR)

     "X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
     "Y" contains the destination reference in the same format

     ENTRY   SET Z=1,A="",C(0)=0,B="" GO NEXT

     UP      SET Z=Z-1,@("B="_$PIECE(A,",",Z+C(Z-1),Z+C(Z)))
             SET A=$PIECE(A,",",1,Z-1+C(Z-1))_$EXTRACT(",",Z>1)

     NEXT    SET @("B=$ORDER("_X_A_"B))"),C(Z)=C(Z-1)
             IF B="" GO EXIT:Z=1,UP
             IF @("$DATA("_X_A_"B))#10=1") SET @(Y_A_"B)="_X_A_"B)")
             IF @("$DATA("_X_A_"B))<9") GO NEXT

             ;Descend the tree -- Special processing for non-numeric subscripts
             IF +B'=B DO COMMA:B[",",QUOTE:B["""" SET B=""""_B_""""
             SET A=A_B_",",Z=Z+1,B="" GO NEXT

     COMMA   ;Count number of embedded commas for $PIECEing of subscripts
             FOR C=0:0 SET C=$FIND(B,",",C) QUIT:'C  SET C(Z)=C(Z)+1
             QUIT

     QUOTE   ;Double each embedded quote character in a string subscript
             FOR C=0:0 SET C=$FIND(B,"""",C) QUIT:'C  SET B=$EXTRACT(B,1,C-1)_""""_$EXTRACT(B,C,999),C=C+1
             QUIT

     EXIT    KILL A,B,C,Z QUIT

          The above algorithm requires concatenating subscripts and $PIECEing
     them apart, and a considerable amount of indirection.  Special code is
     needed for non-numeric subscripts to add the enclosing quotes and to handle
     embedded commas and quotes.  The algorithm must keep track of the level,
     and ascend and descend accordingly.

          The ad-hoc data retrieval task consists of traversing a global and
     finding the nodes that satisfy an arbitrary search criterion.  The code
     for this task is quite complicated: not only does the tree have to be
     traversed, but every node has to be checked against the search
     criterion.

          These examples illustrate several reasons why some database tasks  are
     hard in MUMPS.  Both subtree copy and ad-hoc data retrieval are essentially
     sequential tasks.  The hierarchical structure of MUMPS  actually  "gets  in
     the  way" when trying to perform tasks that are not hierarchical in nature.
     MUMPS  lacks  facilities  for  manipulating  subscripts.   As   a   result,
     subscripts  must  be  treated  as  character  strings that are concatenated
     together.  Special care then must be given to embedded quotes, commas, and
     to differentiating numeric from non-numeric subscripts.  Finally, MUMPS can
     access only a single node at a time.  It has no facilities for handling
     multi-node operations (except to kill subtrees).

          All of these things make some tasks more difficult than necessary.
     A new set of capabilities for MUMPS could greatly simplify the
     programming while at the same time improving performance.  What is needed
     is a new way to look at a hierarchical database, new facilities for
     traversing the database and manipulating subscripts, and functions that
     operate on more than one node at a time.

     Name Value Strings

          A hierarchical database can  be  viewed  as  a  flat  sequential  file
     containing  two  items per record, a name value and a data value.  The name
     values and data values together form a relationship.

          A name value is a global or local variable name (glvn)  in  which  all
     the  subscript values are represented by either numeric or string literals.
     Subscript values which are non-numeric are represented by string literals
     (i.e.,  they are bounded by quotes and contain pairs of adjacent quotes for
     each embedded quote).

          A variable or an expression may be used for a subscript in a glvn.
     Each such subscript is fully evaluated to a numeric or a string literal to
     obtain the name value.  A naked reference glvn has its subscripts
     similarly evaluated to numeric or string literals and then is expanded to
     obtain a global name value.

          Every node in the hierarchical database that has data can be viewed as
     a  name  value/data  value  pair  in the relationship.  The name values are
     unique, they form an ordered set, and each name value can have one and only
     one  data value.  The data value of a node can always be referenced via its
     corresponding "name value" by indirection.


     Example
     SET X=1,Y="ALPHA",Z="BETA"
     SET ^ABC(X,Y,Z)="TEST"

     The name value of this node is the string ^ABC(1,"ALPHA","BETA").
     The data value is "TEST".
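          The quoting rule can be modeled in a few lines.  The sketch below
     is Python rather than MUMPS, because none of the proposed functions exist
     yet.  The helpers literal and name_value are hypothetical.  For
     simplicity, the sketch treats only integers (Python ints, or canonic
     integer strings such as "12") as numeric subscripts.

```python
import re

# Sketch assumption: canonic integers such as "12" or "-3" are numeric
# subscripts; "012" or "+3" are not, so they stay strings.  Decimal
# subscripts are not handled here.
CANONIC_INT = re.compile(r"0|-?[1-9][0-9]*")


def literal(sub):
    """Render one evaluated subscript as it appears in a name value."""
    if isinstance(sub, int):
        return str(sub)
    if CANONIC_INT.fullmatch(sub):
        return sub
    # Non-numeric: bound with quotes and double every embedded quote.
    return '"' + sub.replace('"', '""') + '"'


def name_value(name, *subs):
    """Build the name value string for name(subs...)."""
    if not subs:
        return name
    return name + "(" + ",".join(literal(s) for s in subs) + ")"


X, Y, Z = 1, "ALPHA", "BETA"
print(name_value("^ABC", X, Y, Z))             # ^ABC(1,"ALPHA","BETA")
print(name_value("^ABC", X, 'SAY "HI", BOB'))  # ^ABC(1,"SAY ""HI"", BOB")
```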



          It is possible to define a "name value string" in MUMPS  as  a  string
     which  has  the form of a name value.  New functions and operators can then
     be defined to allow MUMPS  to  work  with  name  values.   The  ability  to
     sequentially  traverse  multi-level  global  or  local arrays could then be
     provided.  The new functions would allow search criteria to be used in
     accessing the database.  The new functions and operators would return,
     compare, or manipulate name value strings, making it as easy to work with
     the hierarchical keys as with the data.  Such capabilities would provide
     new ways to access the database, making the copying and movement of
     subtrees trivial, and greatly simplifying ad-hoc data retrieval.

     New Functions and Operators

          The new $QNAME function would return the "name value" of a glvn.


     Syntax:   $QNAME(glvn)          Abbreviation: $QN

     Function: Return the "name value" of the glvn.

     Example
     SET X=1,Y="ALPHA",Z="BETA"
     SET A=$QNAME(^ABC(X,Y,Z))
     SET ^ABC(X,Y,Z)="TEST"
     SET B=$QNAME(^(Z))

     The variables "A" and  "B"  both  have  the  same  value,  the  name  value
     ^ABC(1,"ALPHA","BETA").


          The $QNAME function would also have a two-argument form.  The second
     argument, an integer expression (intexpr), would limit the number of
     subscripts of the "name value" returned by $QNAME.  If "n" were the value
     of the limit, only the first n subscripts of the name value would be
     returned; any further subscripts would be ignored.  This form of the
     $QNAME function would be useful in checking whether one node descends
     from another.

     Syntax:   $QNAME(glvn1,intexpr2)

     Function: Return "name value" containing a numerically limited number
               of subscripts.

         Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having
         "n" subscripts, and let "m" be the value of intexpr.
         Then $QNAME(glvn,intexpr) is defined as follows:

             If m is less than 0, it is an error.

             If m=0, return only "Name".

             If 0<m<n, return the partial name value "Name(S1,S2,...,Sm)",
               containing the first m subscripts.

             Otherwise return the complete name value "Name(S1,S2,...,Sn)".


     Example
     SET X=1,Y="ALPHA",Z="BETA"
     SET A=$QNAME(^ABC(X,Y,Z))
     SET B=$QNAME(^ABC(X,Y,Z),0)
     SET C=$QNAME(^ABC(X,Y,Z),1)
     SET D=$QNAME(^ABC(X,Y,Z),2)
     SET E=$QNAME(^ABC(X,Y,Z),3)
     SET F=$QNAME(^ABC(X,Y,Z),4)

     After executing, the values of variables A-F are as follows:
             "A" = ^ABC(1,"ALPHA","BETA")
             "B" = ^ABC
             "C" = ^ABC(1)
             "D" = ^ABC(1,"ALPHA")
             "E" = ^ABC(1,"ALPHA","BETA")
             "F" = ^ABC(1,"ALPHA","BETA")

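          The limiting rule can be sketched in Python on a name value string,
     which is what $QNAME(@A,m) would see.  The function names are
     hypothetical.  Subscripts must be split only at commas outside string
     literals.  The loop at the end reproduces the values of B through F
     above.

```python
def split_name_value(nv):
    """Split a name value into (name, [subscript literals]).

    Commas inside a quoted literal do not separate subscripts; a doubled
    quote toggles the in-string flag twice, so it stays inside the literal.
    """
    if "(" not in nv:
        return nv, []
    name, body = nv.split("(", 1)
    body = body[:-1]                    # drop the closing parenthesis
    parts, current, in_string = [], "", False
    for ch in body:
        if ch == '"':
            in_string = not in_string
        if ch == "," and not in_string:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return name, parts


def qname(nv, limit=None):
    """Model of $QNAME(@nv) and $QNAME(@nv,limit)."""
    if limit is None:
        return nv
    if limit < 0:
        raise ValueError("$QNAME: a negative limit is an error")
    name, parts = split_name_value(nv)
    kept = parts[:limit]                # extra subscripts are ignored
    return name + "(" + ",".join(kept) + ")" if kept else name


A = '^ABC(1,"ALPHA","BETA")'
for m in range(5):
    print(m, qname(A, m))
```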
          The $QLENGTH function is used to determine the number of subscripts in
     a glvn.


     Syntax:   $QLENGTH(glvn)                        Abbreviation: $QL

     Function: Return number of subscripts in glvn.

        Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having "n"
        subscripts.  Then, $QLENGTH(glvn) returns the integer "n".
        If there are no subscripts, then "n" is zero.

     Example
     SET X=1,Y="ALPHA",Z="BETA"
     SET N1=$QLENGTH(^ABC(X,Y,Z))
     SET A=$QNAME(^ABC(X,Y,Z))
     SET N2=$QLENGTH(@A)

     The variables N1 and N2 will both contain the value 3 after execution.
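          Subscripts cannot be counted by counting commas alone, because a
     string subscript may contain embedded commas and doubled quotes.  The
     following is a minimal Python sketch of the counting rule.  The function
     name is hypothetical, and the function takes the name value string, so
     it models $QLENGTH(@A).

```python
def qlength(nv):
    """Model of $QLENGTH(@nv): count top-level subscripts in a name value."""
    if "(" not in nv:
        return 0                        # no subscripts
    count, in_string = 1, False
    for ch in nv[nv.index("(") + 1:-1]:
        if ch == '"':
            in_string = not in_string   # "" toggles twice: still inside
        elif ch == "," and not in_string:
            count += 1                  # a separator, not an embedded comma
    return count


print(qlength('^ABC(1,"ALPHA","BETA")'))   # 3
print(qlength('^X("A,B","C""D")'))         # 2
```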

          The $QSUBSCRIPT function is used to obtain the value of a subscript.

     Syntax:   $QSUBSCRIPT(glvn1,intexpr2)

     Function: Returns the value of a designated subscript from glvn.

        Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having "n"
        subscripts, and let "m" be the value of intexpr.
        Then $QSUBSCRIPT(glvn,intexpr) is defined as follows:

             If m is less than 0, it is an error.

             If m=0, return only Name.

             If 0<m and m is not greater than n, return the value of
               subscript Sm.

             Otherwise (m>n), return the empty string.

     Note: The value of a subscript is either a numeric data value or
           a string data value.  The $QSUBSCRIPT function returns the
           actual value of the subscript, and not its external literal
           representation.  The value is not bounded by quotes and may
           have embedded quotes.


     Example
     SET X=1,Y="ALPHA",Z="BETA"
     SET A=$QSUBSCRIPT(^ABC(X,Y,Z),0)
     SET B=$QSUBSCRIPT(^ABC(X,Y,Z),1)
     SET C=$QSUBSCRIPT(^ABC(X,Y,Z),2)
     SET D=$QSUBSCRIPT(^ABC(X,Y,Z),3)
     SET E=$QSUBSCRIPT(^ABC(X,Y,Z),4)

     After executing, the values of variables A-E are as follows:
             "A" = ^ABC
             "B" = 1
             "C" = ALPHA
             "D" = BETA
             "E" = the empty string

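          Returning the value rather than the literal means undoing the
     quoting rule: the bounding quotes are dropped and each doubled quote
     becomes a single one.  The Python sketch below models $QSUBSCRIPT(@A,m)
     on a name value string.  The function names are hypothetical.  Returning
     Decimal for subscripts that contain a decimal point is a choice of the
     sketch, not of the proposal.

```python
from decimal import Decimal


def parse_name_value(nv):
    """Decode a name value into (name, [subscript values])."""
    if "(" not in nv:
        return nv, []
    i = nv.index("(")
    name, values = nv[:i], []
    i += 1
    while True:
        if nv[i] == '"':                    # string literal
            i += 1
            chars = []
            while True:
                if nv[i] == '"':
                    if nv[i + 1] == '"':    # doubled quote = one quote
                        chars.append('"')
                        i += 2
                        continue
                    i += 1                  # closing quote
                    break
                chars.append(nv[i])
                i += 1
            values.append("".join(chars))
        else:                               # numeric literal
            j = i
            while nv[j] not in ",)":
                j += 1
            text = nv[i:j]
            values.append(Decimal(text) if "." in text else int(text))
            i = j
        if nv[i] == ")":
            return name, values
        i += 1                              # skip the separating comma


def qsubscript(nv, m):
    """Model of $QSUBSCRIPT(@nv,m)."""
    if m < 0:
        raise ValueError("$QSUBSCRIPT: a negative position is an error")
    name, values = parse_name_value(nv)
    if m == 0:
        return name
    return values[m - 1] if m <= len(values) else ""


A = '^ABC(1,"ALPHA","BETA")'
print([qsubscript(A, m) for m in range(5)])  # ['^ABC', 1, 'ALPHA', 'BETA', '']
```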
          The $QUERY function permits the multiple levels of hierarchical
     storage to be traversed sequentially.  It also allows hierarchical
     storage to be searched sequentially.  There are one-, two-, and
     three-argument forms of the $QUERY function.  The one-argument form
     returns the name value of the next node following a glvn in the ordering
     sequence.  The two- and three-argument forms include truth-value
     expressions (tvexpr's) that test the name value and/or the data value to
     select nodes according to search criteria.


     Syntax:   $QUERY(glvn)          Abbreviation: $QU

     Function: Traverse multiple levels of a hierarchical file in ordering
               sequence.  Return name value of next node following the
               designated glvn.  If there is none, return the empty
               string.

               If $QUERY returns a non-empty name value string, then the data
               value of the node can be obtained by applying indirection.

               The truth value switch $T will be set to 1 upon return.

               Note: If the name value is a global, then the $QUERY function
                     will update the naked reference.


     Example
     SET X=1,Y="ALPHA",Z="BETA"
     SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
     SET A="^ABC"
     SET B=$QUERY(@A)
     SET C=$QUERY(@B)
     SET D=$QUERY(@C)
     SET E=$QUERY(@D)

     After executing, the values of variables A-E are as follows:
             "A" = ^ABC
             "B" = ^ABC(1)
             "C" = ^ABC(1,"ALPHA")
             "D" = ^ABC(1,"ALPHA","BETA")
             "E" = the empty string

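          The ordering that one-argument $QUERY walks can be modeled by
     sorting subscript tuples with a key that puts numbers before strings.
     The Python sketch below is illustrative only.  It assumes integer and
     string subscripts, represents a global as a dictionary of the nodes that
     have data, and uses hypothetical names.  A real implementation would
     walk the storage structure instead of scanning every key.

```python
def collate(sub):
    """Sort key for one subscript: numbers first by value, then strings.

    Sketch assumption: numeric subscripts are Python ints, strings are str.
    """
    if isinstance(sub, int):
        return (0, sub, "")
    return (1, 0, sub)


def node_key(subs):
    # A tuple that is a prefix of a longer one sorts first, so a node
    # precedes its descendants, as in the $QUERY ordering sequence.
    return tuple(collate(s) for s in subs)


def query(glob, subs):
    """Model of one-argument $QUERY: the next node with data after subs.

    glob maps subscript tuples to data values (only nodes with data).
    Returns None where $QUERY would return the empty string.
    """
    start = node_key(subs)
    later = [k for k in glob if node_key(k) > start]
    return min(later, key=node_key) if later else None


ABC = {(1,): 1, (1, "ALPHA"): 2, (1, "ALPHA", "BETA"): 3}
node = ()                               # plays the part of A="^ABC"
while (node := query(ABC, node)) is not None:
    print(node)
```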
          The second argument of $QUERY is a tvexpr that selects the nodes  that
     are returned by the function.  The tvexpr is evaluated for each node as the
     storage is traversed.  If the tvexpr is true, the name value of the node is
     returned.  If it is false, the node is skipped and the search continues.

          In order to reference the node in the tvexpr during the search, a  new
     special  variable  $Q[UERY] is necessary.  The $Q special variable contains
     the name value of the node that is being tested.

          The third argument of $QUERY is a second tvexpr that causes the search
     to terminate abnormally.  This tvexpr is evaluated for each node after the
     second argument, as the storage is traversed.  If it is true, the truth
     value switch $T is set to 0 and the name value of the node is returned.

          In order to make the abnormal termination of $QUERY more useful, two
     additional special variables are proposed.  $QC[OUNT] would be a count of
     the number of nodes traversed by the current invocation of the function.
     $QT[IME] would be the elapsed time (in seconds) for that invocation.  The
     abnormal termination can then be triggered by exceeding a designated
     number of nodes or a time limit.

     Syntax:   $QUERY(glvn1,tvexpr2)
               $QUERY(glvn1,tvexpr2,tvexpr3)

     Function: Search multiple levels of a hierarchical file in ordering
               sequence.  Return the name value of the next node following
               the glvn1 that satisfies the designated search criteria.
               If there is none, return the empty string.

               The second argument, tvexpr2, selects nodes that
               satisfy the search criteria.  If tvexpr2 is true
               when evaluated for a node, the search stops and the
               name value of the node is returned by the function.

               The optional third argument, tvexpr3, controls
               abnormal termination.  If tvexpr3 (evaluated after
               tvexpr2) is true for a node, stop the search, set
               $T to 0, and return.  (The name value of the last node
               examined is returned by the function.)

               If $QUERY returns a non-empty name value string, then the data
               value of the node can be obtained by applying indirection.

               Normally, the truth value switch $T will be set to 1 upon
               return.  Only in the event of an abnormal termination is it
               set to 0.  A normal termination includes returning an empty
               string.

               Note: If the name value is a global, then the $QUERY function
                     will update the naked reference.
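          The two- and three-argument forms can be sketched the same way,
     with Python callables standing in for tvexpr2 and tvexpr3.  In this
     hypothetical model, search receives the node and its data (the roles of
     $Q and @$Q).  abort receives the node, a node count, and the elapsed
     seconds (the roles of $Q, $QC, and $QT).  The function returns the node
     together with the value $T would take.  The usage lines replay the
     1000-node abnormal termination example given below.

```python
import time


def collate(sub):
    # Numbers first by value, then strings (sketch: ints and strs only).
    return (0, sub, "") if isinstance(sub, int) else (1, 0, sub)


def node_key(subs):
    return tuple(collate(s) for s in subs)


def query_search(glob, subs, search, abort=None):
    """Model of $QUERY(glvn1,tvexpr2[,tvexpr3]).

    Returns (node or None, t), where t is the value $T would be set to.
    """
    began = time.monotonic()
    start = node_key(subs)
    count = 0
    for q in sorted(glob, key=node_key):
        if node_key(q) <= start:
            continue                    # only nodes after the starting glvn
        count += 1                      # $QC: nodes examined so far
        elapsed = time.monotonic() - began   # $QT, in seconds
        if search(q, glob[q]):
            return q, 1                 # normal termination: node found
        if abort is not None and abort(q, count, elapsed):
            return q, 0                 # abnormal termination
    return None, 1                      # normal: no more nodes ("" result)


XYZ = {(i,): "ABCDEF" for i in range(1, 2001)}
node, t = query_search(XYZ, (), lambda q, d: "XYZ" in d,
                       lambda q, count, elapsed: count > 1000)
print("Search Succeeded" if t else "Search Failed", "--", node)
# Search Failed -- (1001,)
```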

     Example
     ;Find subscripts containing the letter "A"
     SET X=1,Y="ALPHA",Z="BETA"
     SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3

     SET TEST="$PIECE($Q,""("",2,999)[""A"""

     SET A="^ABC"
     SET B=$QUERY(@A,@TEST)
     SET C=$QUERY(@B,@TEST)
     SET D=$QUERY(@C,@TEST)

     After executing, the values of variables A-D are as follows:
             "A" = ^ABC
             "B" = ^ABC(1,"ALPHA")
             "C" = ^ABC(1,"ALPHA","BETA")
             "D" = the empty string




     Example
     ;Find third level nodes whose data values are greater than one
     SET X=1,Y="ALPHA",Z="BETA"
     SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3

     SET TEST="($QLENGTH(@$Q)=3)&(@$Q>1)"

     SET A="^ABC"
     SET B=$QUERY(@A,@TEST)
     SET C=$QUERY(@B,@TEST)

     After executing, the values of variables A-C are as follows:
             "A" = ^ABC
             "B" = ^ABC(1,"ALPHA","BETA")
             "C" = the empty string

     Example
     ;Do a long search -- abnormally terminate after passing 1000 nodes
     FOR I=1:1:2000 SET ^XYZ(I)="ABCDEF"

     SET TEST="@$Q[""XYZ""" ;Find a data value containing "XYZ"
     SET ABORT="$QC>1000" ;Give up after more than 1000 nodes

     SET A="^XYZ"
     SET B=$QUERY(@A,@TEST,@ABORT)
     IF  WRITE !,"Search Succeeded"
     ELSE  WRITE !,"Search Failed"
     WRITE " -- ",B

     After executing, the values of variables A and B are as follows:
             "A" = ^XYZ
             "B" = ^XYZ(1001)
     The message "Search Failed -- ^XYZ(1001)" will be displayed.

          A MUMPS implementation of the three-argument $QUERY function is  given
     below.   The  first  argument "START" is the starting point for the search.
     The second  argument  "SEARCH"  contains  the  expression  for  the  search
     criteria.   The  third argument "ABORT" contains the expression controlling
     abnormal termination.  In this example the single-argument $QUERY function
     and the special variables "$Q", "$QC", and "$QT" are presumed to exist.


     QUERY   ;PMK@JHH -- Three-argument $QUERY function

     ENTRY(START,SEARCH,ABORT)

             NEW ITIME,HIT

             SET ITIME=$PIECE($H,",",2) ;Initial time (seconds since midnight)
             SET $Q=START,HIT=0
             FOR $QC=1:1 SET $Q=$QUERY(@$Q) QUIT:$Q=""  DO TEST QUIT:HIT
             IF HIT'=-1 ;Set $T for normal/abnormal termination
             QUIT $Q

     TEST    SET $QT=$PIECE($H,",",2)-ITIME
             IF @SEARCH SET HIT=1 QUIT
             IF @ABORT SET HIT=-1 QUIT
             QUIT

     A New Relational Operator for Comparing Name Values

          The relation [[ is called "descends from".  If A and B are name
     values, then A[[B is true if and only if A is B itself or a descendant
     of B.

          The relation A[[B has the same value as $QNAME(@A,$QLENGTH(@B))=B.
     Intuitively, A begins with all of the subscripts of B.  The value of A[[B
     is false if A is the empty string and B contains a name value.
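          Because the equivalence above is stated in terms of $QNAME and
     $QLENGTH, [[ reduces to a prefix comparison.  Here is a Python sketch.
     It assumes that name values are represented as (name, subscripts) pairs
     and that the empty string is represented as None.  Both representations
     are hypothetical choices for illustration.

```python
def descends_from(a, b):
    """Model of A[[B on (name, subscripts) pairs.

    As with $QNAME(@A,$QLENGTH(@B))=B, a node counts as descending from
    itself, and an empty A (None here) descends from nothing.
    """
    if a is None:
        return False
    a_name, a_subs = a
    b_name, b_subs = b
    # A's first len(B) subscripts must equal all of B's subscripts.
    return a_name == b_name and a_subs[:len(b_subs)] == b_subs


ROOT = ("^ABC", (1,))
print(descends_from(("^ABC", (1, "ALPHA", "BETA")), ROOT))  # True
print(descends_from(("^ABC", (2,)), ROOT))                  # False
```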


     A New Relational Operator for Comparing Subscripts

          The relation ]] is called "sorts after".  A]]B is true if and only  if
     A  follows  B  in  the  subscript  ordering  sequence defined by the $ORDER
     function.

          This operator would remove a common source of complexity.  The
     standard subscript ordering sequence is the empty string first, followed
     by numeric subscripts, and then string subscripts.  The numeric
     subscripts are ordered by increasing value, while the string subscripts
     are ordered by the ASCII collating sequence.  Currently, in order to
     determine which subscript sorts first, one must determine whether the
     subscripts are numeric or string, and then use the numeric (">" or "<")
     operators or the string "follows" operator ("]").
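          Here is a Python sketch of ]].  It treats a subscript as numeric
     only when it is a canonic number, so "01" and "1.50" are strings, as in
     MUMPS.  It uses Decimal to compare numbers exactly.  The regular
     expression and the helper names are assumptions of this sketch.
     Comparing Python strings by code point matches the ASCII collating
     sequence only for ASCII text.

```python
import re
from decimal import Decimal

# Canonic numbers: no leading zeros, no trailing fractional zeros, no "+",
# no "-0".  Anything else, e.g. "01" or "1.50", is a string subscript.
CANONIC = re.compile(r"-?(?:[1-9][0-9]*(?:\.[0-9]*[1-9])?|\.[0-9]*[1-9])|0")


def order_key(sub):
    """Position of a subscript value in the $ORDER sequence (sketch)."""
    if sub == "":
        return (0, Decimal(0), "")          # empty string first
    if CANONIC.fullmatch(sub):
        return (1, Decimal(sub), "")        # then numbers, by value
    return (2, Decimal(0), sub)             # then strings, by code point


def sorts_after(a, b):
    """Model of A]]B for subscript values given as strings."""
    return order_key(a) > order_key(b)


print(sorts_after("10", "9"))   # True: compared as numbers
print("10" > "9")               # False: plain string collation
```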

     Examples

     The first example is subtree copy.

     "X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
     "Y" contains the destination reference in the same format

     ENTRY   SET A=$E(X,1,$L(X)-1)_")",A=$QNAME(@A)
             SET C=$E(Y,1,$L(Y)-1)_")",C=$QNAME(@C)
             SET C=$E(C,1,$L(C)-1)
             SET A0=A,L=$LENGTH(A)

             FOR I=0:0 SET A=$QUERY(@A) QUIT:A'[[A0  SET B=C_$E(A,L,999),@B=@A

     EXIT    KILL A,A0,B,C,I,L QUIT

          The variable "A" runs through the source subtree nodes.  The variable
     "A0" is used for the [[ descent check; it contains the original source
     reference's name value.  All of the subtree nodes' name values begin with
     this value, less its closing parenthesis.  The remainder of each name
     value contains the subscripts to be copied to the destination.  The
     variable "L" is the length of "A0", so $E(A,L,999) extracts that
     remainder, starting with the comma that follows the common subscripts.

          The variable "C" contains the original destination reference's name
     value with its closing parenthesis removed, in the form
     "^GLO(SS1,SS2,...,SSn".  Each destination node's name value is formed by
     concatenating this value and the additional descendant subscripts.  The
     destination node's name value is stored in the variable "B".  The copy
     is performed by simple indirection.

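          The name value arithmetic in this loop can be checked outside MUMPS.
     The following Python sketch is a hypothetical model, not part of the
     proposal.  A dictionary keyed by name value strings stands in for the
     global.  A prefix test on A0 (less its closing parenthesis, plus a comma)
     stands in for A[[A0 restricted to proper descendants.  Like the MUMPS
     loop, the sketch copies only descendants, not the root node itself, and
     it assumes the source root has at least one subscript.

```python
def copy_subtree(nodes, a0, c):
    """Sketch of the subtree-copy loop.

    nodes: dict of source name value -> data (stands in for the global).
    a0:    source root name value, e.g. '^GLO(1,2)'.
    c:     destination name value with its closing ')' removed, e.g. '^DST(3'.
    Returns a dict of destination name value -> data.
    """
    prefix = a0[:-1] + ","          # every proper descendant starts with this
    l = len(a0)                     # L in the MUMPS code
    copied = {}
    for a, data in nodes.items():
        if a.startswith(prefix):           # stands in for A[[A0
            copied[c + a[l - 1:]] = data   # B=C_$E(A,L,999),@B=@A
    return copied


SRC = {'^GLO(1,2)': 'root', '^GLO(1,2,5)': 'a',
       '^GLO(1,2,"X,Y",7)': 'b', '^GLO(1,23)': 'c'}
print(copy_subtree(SRC, '^GLO(1,2)', '^DST(3'))
```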
          The second example prints every node of a global from node "X"
     through node "Y".

     "X" and "Y" contain arbitrary nodes of a global in glvn format
     Note:  "X" and "Y" can specify different numbers of subscripts

     ENTRY   SET A=$QNAME(@X),B=$QNAME(@Y),N=$QLENGTH(@B)
             DO TEST IF 'OK WRITE !!,"The first is after the second!" GOTO EXIT
             IF $DATA(@A)#2 DO PRINT
             FOR I=0:0 SET A=$QUERY(@A) QUIT:A=""  DO TEST QUIT:'OK  DO PRINT
     EXIT    KILL A,B,I,J,N,OK,SA,SB QUIT

     PRINT   WRITE !,A,"=",@A QUIT

     TEST    ;Set OK=0 if A sorts after B within B's first N subscripts
             SET OK=1
             FOR J=1:1:$QLENGTH(@A) QUIT:J>N  SET SA=$QSUBSCRIPT(@A,J),SB=$QSUBSCRIPT(@B,J) QUIT:SB]]SA  IF SA]]SB SET OK=0 QUIT
             QUIT
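          The end-point test is easier to see in a Python model.  The
     per-level comparison must stop as soon as a subscript of A sorts before
     the corresponding subscript of B.  Otherwise, a node such as ^G(1,5)
     would be rejected against the end point ^G(2,1), because 5 sorts after 1
     at the second level.  The sketch below uses hypothetical names and
     handles integer and string subscripts only.

```python
def collate(sub):
    # Numbers first by value, then strings (sketch: ints and strs only).
    return (0, sub, "") if isinstance(sub, int) else (1, 0, sub)


def in_range(a, b):
    """Model of the TEST subroutine: False only if node a lies beyond node b.

    a and b are subscript tuples.  zip() stops at the shorter tuple, which
    mirrors QUIT:J>N, so descendants of b still count as inside the range.
    """
    for sa, sb in zip(a, b):
        if collate(sb) > collate(sa):
            return True          # a already sorts before b at this level
        if collate(sa) > collate(sb):
            return False         # a sorts after b
    return True                  # equal on every compared level


print(in_range((1, 5), (2, 1)))  # True
print(in_range((3,), (2, 1)))    # False
```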




     CONCLUSIONS

------------------------------

End of std-mumps Digest
******************************
-- 
Hokey           ..ihnp4!plus5!hokey
		  314-725-9492