Hokey (The Moderator) <hokey@plus5.uucp> (03/31/85)
std-mumps Digest        Sat, 30 Mar 85        Volume 1 : Issue 12

Today's Topics:
                          New $QUERY paper
----------------------------------------------------------------------

Date: Wed, 27 Mar 85 17:13:10 est
From: maryland!ihnp4!seismo!osiris!ocsplx!pete
Subject: New $QUERY paper
To: maryland!osiris!aplvax!umcp-cs!seismo!ihnp4!plus5!std-mumps

Hokey,

Here is the latest copy of the $QUERY proposal.

Pete Kuzmak

[I had to edit out the boldfacing and underlining - HMS]


                   $QUERY MUMPS Function Proposal


INTRODUCTION

MUMPS combines the power of a simple language with the flexibility of
a hierarchical database, and is successful for a large variety of
applications. It has been described as a "linguistically integrated
database system". Yet, for all its merits, there are some rather
simple database tasks that are not easily performed in MUMPS. Two
examples, copying a subtree and ad-hoc data retrieval, illustrate this
point.

The following example shows how hard it is to write code to copy a
node and its descendants.

Example -- Copy a subtree (Modified from File Manager routine ^%RCR)

    "X" contains the source reference in the form
        "^GLO(SS1,SS2,...,SSn,"
    "Y" contains the destination reference in the same format

ENTRY   SET Z=1,A="",C(0)=0,B="" GO NEXT
UP      SET Z=Z-1,@("B="_$PIECE(A,",",Z+C(Z-1),Z+C(Z)))
        SET A=$PIECE(A,",",1,Z-1+C(Z-1))_$EXTRACT(",",Z>1)
NEXT    SET @("B=$ORDER("_X_A_"B))"),C(Z)=C(Z-1)
        IF B="" GO EXIT:Z=1,UP
        IF @("$DATA("_X_A_"B))#10=1") SET @(Y_A_"B)="_X_A_"B)")
        IF @("$DATA("_X_A_"B))<9") GO NEXT
        ;Descend the tree -- Special processing for non-numeric subscripts
        IF +B'=B DO COMMA:B[",",QUOTE:B["""" SET B=""""_B_""""
        SET A=A_B_",",Z=Z+1,B="" GO NEXT
COMMA   ;Count number of embedded commas for $PIECEing of subscripts
        FOR C=0:0 SET C=$FIND(B,",",C) QUIT:'C  SET C(Z)=C(Z)+1
        QUIT
QUOTE   ;Double each embedded quote in a subscript
        F C=0:0 S C=$F(B,"""",C) Q:'C  S B=$E(B,1,C-1)_""""_$E(B,C,999),C=C+1
        Q
        QUIT
EXIT    KILL A,B,C,Z QUIT

The above algorithm requires concatenating
and $PIECEing apart subscripts, along with a considerable amount of
indirection. Special code is needed for non-numeric subscripts to add
the enclosing quotes and to handle embedded commas and quotes. The
algorithm must keep track of the level, and ascend and descend
accordingly.

The ad-hoc data retrieval task consists of traversing a global and
finding specific nodes that satisfy arbitrary search criteria. The
code for this task is quite complicated. Not only does the tree have
to be traversed, but all of the nodes have to be checked against the
search criteria.

These examples illustrate several reasons why some database tasks are
hard in MUMPS.

Both subtree copy and ad-hoc data retrieval are essentially sequential
tasks. The hierarchical structure of MUMPS actually "gets in the way"
when trying to perform tasks that are not hierarchical in nature.

MUMPS lacks facilities for manipulating subscripts. As a result,
subscripts must be treated as character strings that are concatenated
together. Special care must then be given to embedded quotes, to
commas, and to differentiating numeric from non-numeric subscripts.

Finally, MUMPS can access only a single node at a time. It has no
facilities for handling multi-node operations (except to kill
subtrees).

All of these things make dealing with some tasks more difficult than
necessary. A new set of capabilities for MUMPS can greatly simplify
the programming while at the same time improving performance. What is
needed is a new way to look at a hierarchical database, new facilities
for traversing the database and manipulating subscripts, and functions
that operate on more than one node at a time.


Name Value Strings

A hierarchical database can be viewed as a flat sequential file
containing two items per record, a name value and a data value. The
name values and data values together form a relationship.

A name value is a global or local variable name (glvn) in which all
the subscript values are represented by either numeric or string
literals. Subscript values that are non-numeric are represented by
string literals (i.e., they are bounded by quotes and contain a pair
of adjacent quotes for each embedded quote).

A variable or an expression may be used for a subscript in a glvn. It
is fully evaluated to a numeric or a string literal to obtain a name
value. A naked reference glvn has its subscripts similarly evaluated
to numeric or string literals and is then expanded to obtain a global
name value.

Every node in the hierarchical database that has data can be viewed as
a name value/data value pair in the relationship. The name values are
unique, they form an ordered set, and each name value has exactly one
data value. The data value of a node can always be referenced via its
corresponding "name value" by indirection.

Example

        SET X=1,Y="ALPHA",Z="BETA"
        SET ^ABC(X,Y,Z)="TEST"

The name value of this node is the string ^ABC(1,"ALPHA","BETA"). The
data value is "TEST".

It is possible to define a "name value string" in MUMPS as a string
which has the form of a name value. New functions and operators can
then be defined to allow MUMPS to work with name values. The ability
to sequentially traverse multi-level global or local arrays could then
be provided. The new functions would allow search criteria to be used
in accessing the database. The new functions and operators would
return, compare, or manipulate name value strings, making it as easy
to work with the hierarchical keys as with the data. Such capabilities
would provide new ways to access the database, making global copy
tasks and the movement of subtrees trivial, and greatly simplifying
ad-hoc data retrieval.


New Functions and Operators

The new $QNAME function would return the "name value" of a glvn.

Syntax: $QNAME(glvn)
Abbreviation: $QN
Function: Return the "name value" of the glvn.
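
The rule for forming a name value (numeric subscripts written as
numeric literals, string subscripts quoted with each embedded quote
doubled) can be modelled outside MUMPS. The following Python sketch is
only an illustration and is not part of the proposal. The helper names
format_subscript and name_value are hypothetical, and the sketch
assumes that Python ints stand in for numeric subscripts while every
str is treated as a non-numeric subscript.

```python
def format_subscript(value):
    """Render one subscript as a MUMPS literal.

    Assumption of this sketch: a Python int is a numeric subscript and
    any str is a non-numeric (string) subscript.
    """
    if isinstance(value, int):
        return str(value)
    # String literal: bound by quotes, each embedded quote doubled.
    return '"' + value.replace('"', '""') + '"'


def name_value(name, subscripts=()):
    """Build the name value string for a variable name and subscripts."""
    if not subscripts:
        return name
    return name + "(" + ",".join(format_subscript(s) for s in subscripts) + ")"


x, y, z = 1, "ALPHA", "BETA"
print(name_value("^ABC", (x, y, z)))      # ^ABC(1,"ALPHA","BETA")
print(name_value("^ABC", ('SAY "HI"',)))  # ^ABC("SAY ""HI""")
```

The second call shows the case that makes hand-written subtree code
awkward: a subscript with embedded quotes.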

Example

        SET X=1,Y="ALPHA",Z="BETA"
        SET A=$QNAME(^ABC(X,Y,Z))
        SET ^ABC(X,Y,Z)="TEST"
        SET B=$QNAME(^(Z))

The variables "A" and "B" both have the same value, the name value
^ABC(1,"ALPHA","BETA").

The $QNAME function would also have a two-argument form. The second
argument, an integer expression (intexpr), would limit the number of
subscripts of the "name value" returned by $QNAME. If "n" were the
value of the limit, only the first n subscripts of the name value
would be returned. Any subscripts of the name value beyond the limit
would be ignored. This form of the $QNAME function would be useful in
checking descendency in subtrees.

Syntax: $QNAME(glvn1,intexpr2)
Function: Return a "name value" containing a numerically limited
number of subscripts.

Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having "n"
subscripts, and let "m" be the value of intexpr. Then
$QNAME(glvn,intexpr) is defined as follows:

    If m is less than 0, it is an error.
    If m=0, return only "Name".
    If m>0 and m<n, return the partial name value
        "Name(S1,S2,...,Sm)", containing the first m subscripts.
    Otherwise return the complete name value "Name(S1,S2,...,Sn)".

Example

        SET X=1,Y="ALPHA",Z="BETA"
        SET A=$QNAME(^ABC(X,Y,Z))
        SET B=$QNAME(^ABC(X,Y,Z),0)
        SET C=$QNAME(^ABC(X,Y,Z),1)
        SET D=$QNAME(^ABC(X,Y,Z),2)
        SET E=$QNAME(^ABC(X,Y,Z),3)
        SET F=$QNAME(^ABC(X,Y,Z),4)

After executing, the values of variables A-F are as follows:

        "A" = ^ABC(1,"ALPHA","BETA")
        "B" = ^ABC
        "C" = ^ABC(1)
        "D" = ^ABC(1,"ALPHA")
        "E" = ^ABC(1,"ALPHA","BETA")
        "F" = ^ABC(1,"ALPHA","BETA")

The $QLENGTH function is used to determine the number of subscripts in
a glvn.

Syntax: $QLENGTH(glvn)
Abbreviation: $QL
Function: Return the number of subscripts in glvn.

Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having "n"
subscripts. Then $QLENGTH(glvn) returns the integer "n". If there are
no subscripts, then "n" is zero.
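
The case analysis for the two-argument $QNAME and the definition of
$QLENGTH given above can be checked with a small model. The Python
sketch below is only an illustration and is not part of the proposal.
The names qname and qlength are hypothetical, a node is represented by
a variable name plus a tuple of subscripts, and Python ints are assumed
to stand in for numeric subscripts.

```python
def format_subscript(value):
    # Assumption: int = numeric subscript, str = string subscript.
    if isinstance(value, int):
        return str(value)
    return '"' + value.replace('"', '""') + '"'


def qname(name, subscripts, limit=None):
    """Model of $QNAME(glvn) and $QNAME(glvn,intexpr)."""
    if limit is not None:
        if limit < 0:
            raise ValueError("a negative subscript limit is an error")
        # Slicing keeps the first `limit` subscripts; a limit larger
        # than the number of subscripts keeps all of them.
        subscripts = subscripts[:limit]
    if not subscripts:
        return name
    return name + "(" + ",".join(format_subscript(s) for s in subscripts) + ")"


def qlength(subscripts):
    """Model of $QLENGTH(glvn): the number of subscripts."""
    return len(subscripts)


subs = (1, "ALPHA", "BETA")
for m in (None, 0, 1, 2, 3, 4):
    print(m, qname("^ABC", subs, m))
print(qlength(subs))
```

Run against the subscripts 1, "ALPHA", "BETA", the loop produces the
same six values as variables A-F in the $QNAME example.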

Example

        SET X=1,Y="ALPHA",Z="BETA"
        SET N1=$QLENGTH(^ABC(X,Y,Z))
        SET A=$QNAME(^ABC(X,Y,Z))
        SET N2=$QLENGTH(@A)

The variables N1 and N2 will both contain the value 3 after execution.

The $QSUBSCRIPT function is used to obtain the value of a subscript.

Syntax: $QSUBSCRIPT(glvn1,intexpr2)
Function: Return the value of a designated subscript from glvn.

Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having "n"
subscripts, and let "m" be the value of intexpr. Then
$QSUBSCRIPT(glvn,intexpr) is defined as follows:

    If m is less than 0, it is an error.
    If m=0, return only Name.
    If m>0 and m<=n, return the value of subscript Sm.
    Otherwise m>n, and the empty string is returned.

Note: The value of a subscript is either a numeric data value or a
string data value. The $QSUBSCRIPT function returns the actual value
of the subscript, and not its external literal representation. The
value is not bounded by quotes and may have embedded quotes.

Example

        SET X=1,Y="ALPHA",Z="BETA"
        SET A=$QSUBSCRIPT(^ABC(X,Y,Z),0)
        SET B=$QSUBSCRIPT(^ABC(X,Y,Z),1)
        SET C=$QSUBSCRIPT(^ABC(X,Y,Z),2)
        SET D=$QSUBSCRIPT(^ABC(X,Y,Z),3)
        SET E=$QSUBSCRIPT(^ABC(X,Y,Z),4)

After executing, the values of variables A-E are as follows:

        "A" = ^ABC
        "B" = 1
        "C" = ALPHA
        "D" = BETA
        "E" = the empty string

The $QUERY function permits the multiple levels of hierarchical
storage to be sequentially traversed. It also allows hierarchical
storage to be sequentially searched.

There are one-, two-, and three-argument forms of the $QUERY function.
The one-argument form returns the name value of the next node
following a glvn in the ordering sequence. The two- and three-argument
forms include truth-value expressions (tvexprs) that test the name
value and/or the data value, to control the selection of nodes
according to search criteria.

Syntax: $QUERY(glvn)
Abbreviation: $QU
Function: Traverse multiple levels of a hierarchical file in ordering
sequence. Return the name value of the next node following the
designated glvn.
If there is none, return the empty string.

If $QUERY returns a non-empty name value string, then the data value
of the node can be obtained by applying indirection. The truth value
switch $T will be set to 1 upon return.

Note: If the name value is a global, then the $QUERY function will
update the naked reference.

Example

        SET X=1,Y="ALPHA",Z="BETA"
        SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
        SET A="^ABC"
        SET B=$QUERY(@A)
        SET C=$QUERY(@B)
        SET D=$QUERY(@C)
        SET E=$QUERY(@D)

After executing, the values of variables A-E are as follows:

        "A" = ^ABC
        "B" = ^ABC(1)
        "C" = ^ABC(1,"ALPHA")
        "D" = ^ABC(1,"ALPHA","BETA")
        "E" = the empty string

The second argument of $QUERY is a tvexpr that selects the nodes that
are returned by the function. The tvexpr is evaluated for each node as
the storage is traversed. If the tvexpr is true, the name value of the
node is returned. If it is false, the node is skipped and the search
continues.

In order to reference the node in the tvexpr during the search, a new
special variable $Q[UERY] is necessary. The $Q special variable
contains the name value of the node that is being tested.

The third argument of $QUERY is a second tvexpr that causes the search
to terminate abnormally. This tvexpr is evaluated second for each node
as the storage is traversed. If this tvexpr is true, the truth value
switch $T is set to 0 and the name value of the node is returned.

In order to make the abnormal termination of $QUERY more useful, two
additional special variables are proposed. $QC[OUNT] would be a count
of the number of nodes traversed for the invocation of the function.
$QT[IME] would be the elapsed time (in seconds) for the invocation of
the function. The abnormal termination can then be triggered by
exceeding a designated number of nodes or a time limit.

Syntax: $QUERY(glvn1,tvexpr2)
        $QUERY(glvn1,tvexpr2,tvexpr3)
Function: Search multiple levels of a hierarchical file in ordering
sequence.
Return the name value of the next node following glvn1 that satisfies
the designated search criteria. If there is none, return the empty
string.

The second argument, tvexpr2, selects nodes that satisfy the search
criteria. If tvexpr2 is true when evaluated for a node, the search
stops and the name value of the node is returned by the function.

The optional third argument, tvexpr3, controls abnormal termination.
If tvexpr3 (evaluated after tvexpr2) is true for a node, stop the
search, set $T to 0, and return. (The name value of the last node is
returned by the function.)

If $QUERY returns a non-empty name value string, then the data value
of the node can be obtained by applying indirection. Normally, the
truth value switch $T will be set to 1 upon return. Only in the event
of an abnormal termination is it set to 0. A normal termination
includes returning an empty string.

Note: If the name value is a global, then the $QUERY function will
update the naked reference.

Example

        ;Find subscripts containing the letter "A"
        SET X=1,Y="ALPHA",Z="BETA"
        SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
        SET TEST="$PIECE($Q,""("",2,999)[""A"""
        SET A="^ABC"
        SET B=$QUERY(@A,@TEST)
        SET C=$QUERY(@B,@TEST)
        SET D=$QUERY(@C,@TEST)

After executing, the values of variables A-D are as follows:

        "A" = ^ABC
        "B" = ^ABC(1,"ALPHA")
        "C" = ^ABC(1,"ALPHA","BETA")
        "D" = the empty string

Example

        ;Find third level nodes that are greater than one
        SET X=1,Y="ALPHA",Z="BETA"
        SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
        SET TEST="($QLENGTH($Q)=3)&(@$Q>1)"
        SET A="^ABC"
        SET B=$QUERY(@A,@TEST)
        SET C=$QUERY(@B,@TEST)

After executing, the values of variables A-C are as follows:

        "A" = ^ABC
        "B" = ^ABC(1,"ALPHA","BETA")
        "C" = the empty string

Example

        ;Do a long search -- abnormally terminate after passing 1000 nodes
        FOR I=1:1:2000 SET ^XYZ(I)="ABCDEF"
        SET TEST="@$Q[""XYZ""" ; Find a data value containing "XYZ"
        SET ABORT="$QC>1000"
        SET A="^XYZ"
        SET B=$QUERY(@A,@TEST,@ABORT)
        IF  W !,"Search Succeeded"
        ELSE  W !,"Search Failed"
        WRITE " -- ",B

After executing, the values of variables A and B are as follows:

        "A" = ^XYZ
        "B" = ^XYZ(1001)

The message "Search Failed -- ^XYZ(1001)" will be displayed.

A MUMPS implementation of the three-argument $QUERY function is given
below. The first argument "START" is the starting point for the
search. The second argument "SEARCH" contains the expression for the
search criteria. The third argument "ABORT" contains the expression
controlling abnormal termination. In this example the single-argument
$QUERY function and the system variables "$Q", "$QC", and "$QT" are
presumed to exist.

QUERY   ;PMK@JHH -- Three-argument $QUERY function
ENTRY(START,SEARCH,ABORT)
        NEW ITIME,HIT
        SET ITIME=$PIECE($H,",",2) ; Initial time
        SET $Q=START,HIT=0
        FOR $QC=0:0 SET $Q=$QUERY(@$Q) QUIT:$Q=""  DO TEST QUIT:HIT
        IF HIT'=-1 ; Set $T for normal/abnormal termination
        QUIT $Q
TEST    SET $QT=$PIECE($H,",",2)-ITIME
        IF @SEARCH SET HIT=1 QUIT
        IF @ABORT SET HIT=-1 QUIT
        QUIT


A New Relational Operator for Comparing Name Values

The relation [[ is called "descends from". If A and B are name values,
then A[[B is true if and only if A is a descendant of B. The relation
A[[B has the same value as $QNAME(@A,$QLENGTH(@B))=B. Intuitively, A
"contains" the subscripts of B. The value of A[[B is false if A is the
empty string and B contains a name value.


A New Relational Operator for Comparing Subscripts

The relation ]] is called "sorts after". A]]B is true if and only if A
follows B in the subscript ordering sequence defined by the $ORDER
function.

This operator is very useful. The standard subscript ordering sequence
is the empty string first, followed by the numeric subscripts, and
then the string subscripts. The numeric subscripts are ordered by
increasing value, while the string subscripts are ordered by the ASCII
collating sequence.
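
The two relations just defined can be modelled in a few lines. The
Python sketch below is only an illustration and is not part of the
proposal. The names sort_key, sorts_after and descends_from are
hypothetical. The sketch assumes that Python ints stand in for numeric
subscripts, that strs are string subscripts, and that a name value is
represented as a (name, subscripts) pair, with None for the empty
string.

```python
def sort_key(subscript):
    """Key giving the ordering described above: empty string first,
    then numeric subscripts by value, then strings in ASCII order."""
    if subscript == "":
        return (0, 0, "")
    if isinstance(subscript, int):
        return (1, subscript, "")
    # Python orders str by code point, which matches ASCII order for
    # ASCII-only text.
    return (2, 0, subscript)


def sorts_after(a, b):
    """Model of A]]B for two subscript values."""
    return sort_key(a) > sort_key(b)


def descends_from(a, b):
    """Model of A[[B: A's leading subscripts equal all of B's."""
    if a is None:          # the empty string descends from nothing
        return False
    (a_name, a_subs), (b_name, b_subs) = a, b
    return a_name == b_name and tuple(a_subs[:len(b_subs)]) == tuple(b_subs)


print(sorted(["B", 10, "", 9, "A"], key=sort_key))
print(sorts_after(10, 9), "10" > "9")
print(descends_from(("^ABC", (1, "ALPHA", "BETA")), ("^ABC", (1,))))
```

The middle line contrasts the numeric comparison of 10 and 9 with a
plain string comparison of "10" and "9", which disagree. Under this
definition a name value also descends from itself.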

Currently, in order to determine which subscript sorts first, one must
determine whether the subscripts are numeric or string, and then use
either the numeric operators (">" or "<") or the string "follows"
operator ("]").


Examples

The first example is subtree copy.

    "X" contains the source reference in the form
        "^GLO(SS1,SS2,...,SSn,"
    "Y" contains the destination reference in the same format

ENTRY   SET A=$E(X,1,$L(X)-1)_")",A=$QNAME(@A)
        SET C=$E(Y,1,$L(Y)-1)_")",C=$QNAME(@C)
        SET C=$E(C,1,$L(C)-1)
        SET A0=A,L=$LENGTH(A)
        FOR I=0:0 SET A=$QUERY(@A) QUIT:A'[[A0  SET B=C_$E(A,L,999),@B=@A
EXIT    KILL A,A0,B,C,I,L QUIT

The variable "A" runs through the source subtree nodes.

The variable "A0" is used for descendency checking. It contains the
original source reference's name value. All of the subtree nodes' name
values begin with this value. The remainder of the name value contains
the subscripts to be copied to the destination. The variable "L" is a
pointer to this substring.

The variable "C" contains the original destination reference's name
value with its closing parenthesis removed, i.e. in the form
"^GLO(SS1,SS2,...,SSn". Each destination node's name value is formed
by concatenating this value and the additional descendant subscripts,
which begin with a comma. The destination node's name value is stored
in the variable "B". The copy is performed by simple indirection.

The second example is to print out all of a global from node "X" to
node "Y".

    "X" and "Y" contain arbitrary nodes of a global in glvn format
    Note: "X" and "Y" can specify different numbers of subscripts

ENTRY   SET A=$QNAME(@X),B=$QNAME(@Y),N=$QLENGTH(@B)
        DO TEST IF 'OK W !!,"The first is after the second!" GO EXIT
        IF $D(@A)#2 DO PRINT
        FOR I=0:0 SET A=$QUERY(@A) DO TEST QUIT:'OK  D PRINT
EXIT    KILL A,B,I,N QUIT
PRINT   W !,A,"=",@A QUIT
TEST    SET OK=1
        FOR I=1:1:$QLENGTH(@A) Q:I>N  IF $QS(@A,I)]]$QS(@B,I) S OK=0 Q
        QUIT


CONCLUSIONS

------------------------------

End of std-mumps Digest
******************************
--
Hokey           ..ihnp4!plus5!hokey
                  314-725-9492