Hokey (The Moderator) <hokey@plus5.uucp> (04/15/85)
std-mumps Digest          Sun, 14 Apr 85       Volume 1 : Issue 13

Today's Topics:
                  Administrivia: $Query proposal
                  Block Mode Terminals and MUMPS
                      Revised $QUERY Proposal
----------------------------------------------------------------------

Date: 14 Apr 85 17:23:46 CST (Sun)
From: hokey@plus5.uucp
Subject: Administrivia: $Query proposal
To: std-mumps

Rather than format the $query proposal, I left it unformatted.

Hokey

------------------------------

Date: 7 Apr 85 22:10:53 CST (Sun)
From: hokey@plus5.uucp
Subject: Block Mode Terminals and MUMPS
To: std-mumps

Does anybody have any ideas on what it would take to make MUMPS work well
with block mode devices?  I suspect several areas would have to be changed,
and furthermore, that many existing software packages would not work
really well on block mode terminals.

------------------------------

Date: Mon, 8 Apr 85 17:07:06 est
From: maryland!ihnp4!seismo!osiris!ocsplx!pete
Subject: Revised $QUERY Proposal
To: std-mumps@plus5.uucp

Hokey,

Here is the revised $QUERY proposal.  It includes source for nroff
(use "nroff -mm -rN2 ...").  The major revision is in the area of the
two- and three-argument $QUERY function.  This function now has "qexpr"
arguments which evaluate to "tvexpr's" for testing the nodes.
Please make yourself a copy of the paper and send it along.

Pete Kuzmak
(301) 955-6185

.SA 1
.ce 7
.B
Proposed MUMPS Functions for Extending Database Capabilities
.R
.sp 2
Peter M. Kuzmak
The Johns Hopkins Hospital
and
Kevin O'Gorman
Independent Consultant
.sp 5
ABSTRACT
.sp
MUMPS is a simple language with a built-in hierarchical
database access method.
Although it is successful for a large variety of applications,
there are some database tasks for which MUMPS is not well suited.
Two examples are sequentially accessing hierarchically stored
data and performing ad-hoc data retrievals.
.P 1
This paper describes several new proposed functions that would
improve the ability of MUMPS to handle complex database tasks.
In particular, four new capabilities would be provided:
.AL 1
.LI
Sequentially traverse hierarchical storage
.LI
Access the name and data values of a storage reference
.LI
Operate on the name of storage references and subscripts
.LI
Perform ad-hoc retrievals
.LE
.bp
\fBINTRODUCTION\fR
.P 1
MUMPS combines the power of a simple language with the
flexibility of a hierarchical database, and is successful
for a large variety of applications.
It has been described as a "linguistically integrated database system".
Yet, for all its merits, there are some rather simple database
tasks that are not easily performed in MUMPS.
Two examples, copying a subtree and ad-hoc data retrieval,
illustrate this point.
The following example shows how hard it is to write code
to copy a node and its descendents.
.DS
\fIExample\fR -- Copy a subtree (Modified from File Manager routine ^%RCR)

"X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
"Y" contains the destination reference in the same format

ENTRY   SET Z=1,A="",C(0)=0,B="" GO NEXT
UP      SET Z=Z-1,@("B="_$PIECE(A,",",Z+C(Z-1),Z+C(Z)))
        SET A=$PIECE(A,",",1,Z-1+C(Z-1))_$EXTRACT(",",Z>1)
NEXT    SET @("B=$ORDER("_X_A_"B))"),C(Z)=C(Z-1)
        IF B="" GO EXIT:Z=1,UP
        IF @("$DATA("_X_A_"B))#10=1") SET @(Y_A_"B)="_X_A_"B)")
        IF @("$DATA("_X_A_"B))<9") GO NEXT
        ;Descend the tree -- Special processing for non-numeric subscripts
        IF +B'=B DO COMMA:B[",",QUOTE:B["""" SET B=""""_B_""""
        SET A=A_B_",",Z=Z+1,B="" GO NEXT
COMMA   ;Count number of embedded commas for $PIECEing of subscripts
        FOR C=0:0 SET C=$FIND(B,",",C) QUIT:'C  SET C(Z)=C(Z)+1
        QUIT
QUOTE   ;Double each quote embedded in a subscript
        F C=0:0 S C=$F(B,"""",C) Q:'C  S B=$E(B,1,C-1)_""""_$E(B,C,999),C=C+1
        QUIT
EXIT    KILL A,B,C,Z QUIT
.DE
.P 1
The above algorithm requires concatenating and $PIECEing apart
of subscripts and a considerable amount of indirection.
Special code is needed for non-numeric subscripts to add the
enclosing quotes and to handle embedded commas and quotes.
The algorithm must keep track of the level, and ascend and
descend accordingly.
.P 1
The ad-hoc data retrieval task consists of traversing
a global and finding specific nodes that satisfy
arbitrary search criteria.
The code for this task is quite complicated.
Not only does the tree have to be traversed, but all of the
nodes have to be checked against the search criteria.
.P 1
These examples illustrate several reasons why some database
tasks are hard in MUMPS.
Both subtree copy and ad-hoc data retrieval are essentially
sequential tasks.
The hierarchical structure of MUMPS actually "gets in the way"
when trying to perform tasks that are not hierarchical in nature.
MUMPS lacks facilities for manipulating subscripts.
As a result, subscripts must be treated as character strings
that are concatenated together.
Special care then must be given to embedded quotes, commas,
and differentiating numeric from non-numeric subscripts.
Finally, MUMPS can access only a single node at a time.
It has no facilities for handling multi-node operations
(except to kill subtrees).
.P 1
All of these things make dealing with some tasks more
difficult than necessary.
A new set of capabilities for MUMPS can greatly simplify
the programming while at the same time improving performance.
What is needed is a new way to look at a hierarchical database,
new facilities for traversing the database and manipulating
subscripts, and functions that operate on more than one node
at a time.
.bp
\fBName Value Strings\fR
.P 1
A hierarchical database can be viewed as a flat sequential file
containing two items per record, a name value and a data value.
The name values and data values together form a relationship.
.P 1
A \fBname value\fR is a global or local variable name (\fIglvn\fR)
in which all the subscript values are represented by either
\fInumeric data values\fR or \fIstring literals\fR.
Subscripts that are numeric are represented by their unique
numeric data values.
Subscript values that are non-numeric are represented by
string literals (i.e., they are bounded by quotes and contain
pairs of adjacent quotes for each embedded quote).
.P 1
A variable or an expression may be used for a subscript
in a \fIglvn\fR.
This is fully evaluated to a numeric data value or a string
literal to obtain a name value.
A naked reference \fIglvn\fR has its subscripts similarly
evaluated to numeric or string literals and then is
expanded to obtain a global name value.
.P 1
Every node in the hierarchical database that has data can be
viewed as a name value/data value pair in the relationship.
The name values are unique, they form an ordered set, and each
name value can have one and only one data value.
The data value of a node can always be referenced via its
corresponding "name value" by indirection.
.DS
.sp 1
.ul 1
Example

SET X=1,Y="ALPHA",Z="BETA"
SET ^ABC(X,Y,Z)="TEST"

The name value of this node is the string ^ABC(1,"ALPHA","BETA").
The data value is "TEST".
.DE
.sp 1
.P 1
It is possible to define a "name value string" in MUMPS
as a string which has the form of a name value.
New functions and operators can then be defined to allow
MUMPS to work with name values.
The ability to sequentially traverse multi-level global or
local arrays could then be provided.
The new functions would allow search criteria to be used
in accessing the database.
The new functions and operators would return, compare, or
manipulate name value strings, making it as easy to work
with the hierarchical keys as with the data.
Such capabilities would provide new ways to access the database,
making the global copy tasks and movement of subtrees trivial,
and greatly simplifying ad-hoc data retrieval.
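.P 1
As an illustration only, the rule for forming a name value can be
sketched with functions MUMPS already has.
The routine below is not part of the proposal: the labels NAMEVAL and
LIT and the array SUBS are hypothetical names chosen for this sketch,
and it assumes an implementation that supports parameter passing and
QUIT with an argument.
A canonical number stands for itself; any other subscript value is
enclosed in quotes, with each embedded quote doubled.
.DS
```mumps
NAMEVAL(NAME,SUBS) ;Hypothetical: build a name value from NAME and SUBS(1..n)
        NEW I,R
        SET R=NAME
        FOR I=1:1 QUIT:'$DATA(SUBS(I))  SET R=R_$SELECT(I=1:"(",1:",")_$$LIT(SUBS(I))
        QUIT R_$SELECT(I>1:")",1:"")
LIT(V)  ;Hypothetical: return the literal form of one subscript value
        NEW P
        IF +V=V QUIT V  ;a canonical number represents itself
        ;Double each embedded quote, then add the enclosing quotes
        FOR P=0:0 SET P=$FIND(V,"""",P) QUIT:'P  SET V=$EXTRACT(V,1,P-1)_""""_$EXTRACT(V,P,$LENGTH(V)),P=P+1
        QUIT """"_V_""""
```
.DE
.P 1
With SUBS(1)=1, SUBS(2)="ALPHA", and SUBS(3)="BETA", the call
$$NAMEVAL("^ABC",.SUBS) would yield the string ^ABC(1,"ALPHA","BETA").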
.sp 3
\fBNew Functions and Operators\fR
.P 1
The new $QNAME function would return the "name value" of a \fIglvn\fR.
.DS
Syntax:         $QNAME(\fIglvn\fR)
Abbreviation:   $QN
Function:       Return the "name value" of the \fIglvn\fR.
Note:           The name value is unique and is defined for every \fIglvn\fR.
.sp 1
.ul 1
Example

SET X=1,Y="ALPHA",Z="BETA"
SET A=$QNAME(^ABC(X,Y,Z))
SET ^ABC(X,Y,Z)="TEST"
SET B=$QNAME(^(Z))
.DE
The variables A and B both have the same value,
the name value ^ABC(1,"ALPHA","BETA").
.P 1
The $QNAME function would also have a two-argument form.
The second argument, an integer expression (\fIintexpr\fR),
would limit the number of subscripts of the "name value"
returned by $QNAME.
If n were the value of the limit, only the first n subscripts
of the name value would be returned.
If the name value contained more subscripts than the limit,
the extra subscripts would be dropped.
This form of the $QNAME function would be useful in
checking descendency in sub-trees.
.DS
Syntax:         $QNAME(\fIglvn1\fR,\fIintexpr2\fR)
Function:       Return "name value" containing a numerically limited
                number of subscripts.

Let $QNAME(\fIglvn1\fR) be of the form Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR),
having n subscripts, and let m be the value of \fIintexpr2\fR.
Then $QNAME(\fIglvn1\fR,\fIintexpr2\fR) is defined as follows:

If m is less than 0, it is an error.
If m=0, return only "Name".
If m>0 and m<n, return the partial name value
"Name(s\fI1\fR,s\fI2\fR,...,s\fIm\fR)", containing the first m subscripts.
Otherwise return the complete name value "Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR)".
.DE
.DS
.ul 1
Example

SET X=1,Y="ALPHA",Z="BETA"
SET A=$QNAME(^ABC(X,Y,Z))
SET B=$QNAME(^ABC(X,Y,Z),0)
SET C=$QNAME(^ABC(X,Y,Z),1)
SET D=$QNAME(^ABC(X,Y,Z),2)
SET E=$QNAME(^ABC(X,Y,Z),3)
SET F=$QNAME(^ABC(X,Y,Z),4)

After executing, the values of variables A-F are as follows:

A = ^ABC(1,"ALPHA","BETA")
B = ^ABC
C = ^ABC(1)
D = ^ABC(1,"ALPHA")
E = ^ABC(1,"ALPHA","BETA")
F = ^ABC(1,"ALPHA","BETA")
.DE
.bp
.P 1
The $QLENGTH function is used to determine the number of
subscripts in a \fIglvn\fR.
.sp 1
.DS
Syntax:         $QLENGTH(\fIglvn\fR)
Abbreviation:   $QL
Function:       Return number of subscripts in \fIglvn\fR.

Let $QNAME(\fIglvn\fR) be of the form Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR),
having n subscripts.
If there are no subscripts, then consider n to be zero.
Then, $QLENGTH(\fIglvn\fR) returns the integer n.

.ul 1
Example

SET X=1,Y="ALPHA",Z="BETA"
SET N1=$QLENGTH(^ABC(X,Y,Z))
SET A=$QNAME(^ABC(X,Y,Z))
SET N2=$QLENGTH(@A)

The variables N1 and N2 will both contain the value 3 after execution.
.DE
.bp
.P 1
The $QSUBSCRIPT function is used to obtain the value of a subscript.
.DS
Syntax:         $QSUBSCRIPT(\fIglvn1\fR,\fIintexpr2\fR)
Abbreviation:   $QS
Function:       Return the value of a designated subscript from \fIglvn1\fR.
Note:           This function may also be used to return the name
                of the global or the local array.

Let $QNAME(\fIglvn1\fR) be of the form Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR),
having n subscripts, and let m be the value of \fIintexpr2\fR.
Then $QSUBSCRIPT(\fIglvn1\fR,\fIintexpr2\fR) is defined as follows:

If m is less than 0, it is an error.
If m=0, return only Name.
If m is from 1 through n, return the value of subscript s\fIm\fR.
Otherwise (m>n), return the empty string.

Note:           The value of a subscript is either a numeric data value
                or a string data value.
                The $QSUBSCRIPT function returns the actual value of the
                subscript, and not its external literal representation.
.DE
.DS
.ul 1
Example

SET X=1,Y="ALPHA",Z="BETA"
SET A=$QSUBSCRIPT(^ABC(X,Y,Z),0)
SET B=$QSUBSCRIPT(^ABC(X,Y,Z),1)
SET C=$QSUBSCRIPT(^ABC(X,Y,Z),2)
SET D=$QSUBSCRIPT(^ABC(X,Y,Z),3)
SET E=$QSUBSCRIPT(^ABC(X,Y,Z),4)

After executing, the values of variables A-E are as follows:

A = ^ABC
B = 1
C = ALPHA
D = BETA
E = the empty string
.DE
.bp
.P 1
The $QUERY function permits the multiple levels of hierarchical
storage to be sequentially traversed.
It also allows hierarchical storage to be sequentially searched.
There are one-, two-, and three-argument forms of the $QUERY function.
The one-argument form returns the name value of the next node
following a \fIglvn\fR in the ordering sequence.
.sp
.DS
Syntax:         $QUERY(\fIglvn\fR)
Abbreviation:   $QU
Function:       Traverse multiple levels of a hierarchical file
                in ordering sequence.
                Return name value of next node following the
                designated \fIglvn\fR.
                If there is none, return the empty string.

                If $QUERY returns a non-empty name value string, then the
                data value of the node can be obtained by applying
                indirection.
Note:           If the name value is a global, then the $QUERY function
                will update the naked reference.

.ul 1
Example

SET X=1,Y="ALPHA",Z="BETA"
KILL ^ABC
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET A="^ABC"
SET B=$QUERY(@A)
SET C=$QUERY(@B)
SET D=$QUERY(@C)
SET E=$QUERY(@D)
.DE
.DS
After executing, the values of variables A-E are as follows:

A = ^ABC
B = ^ABC(1)
C = ^ABC(1,"ALPHA")
D = ^ABC(1,"ALPHA","BETA")
E = the empty string
.DE
.sp 2
.P 1
The two- and three-argument forms of the $QUERY function
provide ad-hoc data retrieval capabilities.
The second and third arguments contain "truth-value
expressions" (\fItvexpr\fR's) which test the name value and/or
the data value of each node and control selection.
.P 1
The second and third arguments evaluate to "truth-value expressions".
Syntactically, the second and third arguments are defined as
"query expressions which evaluate to truth-value expressions".
.sp 1
.DS
The "query expression" \fIqexpr\fR is defined as an expression that
evaluates to a "truth-value expression".

        \fIqexpr ::= expr V tvexpr\fR

A \fIqexpr\fR argument of $QUERY is evaluated for each node.
It is used to control selection of nodes according to search criteria.
.DE
.sp 1
.P 1
The second argument of $QUERY is a \fIqexpr\fR that selects
the nodes that are returned by the function.
The \fIqexpr\fR is evaluated for each node as the storage is traversed.
If the \fIqexpr\fR is true, the name value of the node is returned.
If it is false, the node is skipped and the search continues.
.P 1
In order to reference the node in the \fIqexpr\fR during the search,
a new special variable \fB$Q[UERY]\fR is necessary.
The $Q special variable contains the name value of the node
that is being tested.
.P 1
The third argument of $QUERY is a second \fIqexpr\fR that
causes the search to abnormally terminate.
This \fIqexpr\fR is evaluated second for each node as the
storage is traversed.
If this \fIqexpr\fR is true, the truth value switch $T is set to 0,
the search stops, and the empty string is returned;
the name value of the node at which the search stopped remains in $Q.
.P 1
In order to make the abnormal termination of $QUERY more useful,
two additional special variables are proposed.
\fB$QC[OUNT]\fR would be a count of the number of nodes
traversed for the invocation of the function.
\fB$QT[IME]\fR would be the elapsed time (in seconds) for
the invocation of the function.
The abnormal termination can then be triggered by exceeding
a designated number of nodes or a time limit.
.DS
Syntax:         $QUERY(\fIglvn1\fR,\fIqexpr2\fR)
                $QUERY(\fIglvn1\fR,\fIqexpr2\fR,\fIqexpr3\fR)
Function:       Search multiple levels of a hierarchical file
                in ordering sequence.
                Return the name value of the next node following
                \fIglvn1\fR that satisfies the designated search criteria.
                If there is none, return the empty string.

                The second argument, \fIqexpr2\fR, selects nodes that
                satisfy the search criteria.
                If \fIqexpr2\fR is true when evaluated for a node,
                the search stops and the name value of the node is
                returned by the function.

                The optional third argument, \fIqexpr3\fR, is only
                evaluated when the second argument is false and
                controls abnormal termination.
                If \fIqexpr3\fR is true for a node, stop the search and
                return the empty string.
                (The name value of the last node remains in $Q.)

                If $QUERY returns a non-empty name value string, then the
                data value of the node can be obtained by applying
                indirection.
Note:           If the name value is a global, then the $QUERY function
                will update the naked reference.
.DE
.bp
.DS
.ul 1
Example

;Find subscripts containing the letter "A"
SET X=1,Y="ALPHA",Z="BETA"
KILL ^ABC
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET TEST="$PIECE($Q,""("",2,999)[""A"""
SET A="^ABC"
SET B=$QUERY(@A,TEST)
SET C=$QUERY(@B,TEST)
SET D=$QUERY(@C,TEST)
.DE
.DS
After executing, the values of variables A-D are as follows:

A = ^ABC
B = ^ABC(1,"ALPHA")
C = ^ABC(1,"ALPHA","BETA")
D = the empty string
.DE
.sp 2
.DS
.ul 1
Example

;Find third level nodes that are greater than one
SET X=1,Y="ALPHA",Z="BETA"
KILL ^ABC
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET TEST="($QLENGTH(@$Q)=3)&(@$Q>1)"
SET A="^ABC"
SET B=$QUERY(@A,TEST)
SET C=$QUERY(@B,TEST)
.DE
.DS
After executing, the values of variables A-C are as follows:

A = ^ABC
B = ^ABC(1,"ALPHA","BETA")
C = the empty string
.DE
.DS
.ul 1
Example

;Do a long search -- abnormally terminate after passing 1000 nodes
FOR I=1:1:2000 SET ^XYZ(I)="ABCDEF"
SET TEST="@$Q[""XYZ""" ; Find a data value containing "XYZ"
SET ABORT="$QC>1000"
SET A="^XYZ"
SET B=$QUERY(@A,TEST,ABORT)
IF B'="" WRITE !,"Search Succeeded -- Found ",B,"=",@B QUIT
IF $Q="" WRITE !,"Search Completed" QUIT
WRITE !,"Search Failed -- Aborted at ",$Q QUIT
.DE
.DS
After executing, the values of variables A and B and of $Q are as follows:

A = ^XYZ
B = the empty string
$Q = ^XYZ(1001)

The message "Search Failed -- Aborted at ^XYZ(1001)" will be displayed.
.DE
.bp
.P 1
A MUMPS implementation of the three-argument $QUERY function
is given below.
The first argument "START" is the starting point for the search.
The second argument "SEARCH" contains the expression for the
search criteria.
The third argument "ABORT" contains the expression controlling
abnormal termination.
In this example the single-argument $QUERY function and the system
variables "$Q", "$QC", and "$QT" are presumed to exist.
For the purpose of this example, we assume that
$Q, $QC, and $QT can be set from MUMPS.
(To simplify the time interval testing, we also assume that
the date does not change.)
.DS
QUERY   ;PMK@JHH -- Three-argument $QUERY function
ENTRY(START,SEARCH,ABORT) NEW ITIME,HIT
        SET ITIME=$PIECE($H,",",2) ; Initial time
        SET $Q=START,HIT=0
        ;$QC counts the nodes visited, so it must step by 1
        FOR $QC=1:1 SET $Q=$QUERY(@$Q) QUIT:$Q=""  DO TEST QUIT:HIT
        IF HIT=-1 QUIT "" ; Abnormal termination -- return empty string
        QUIT $Q ; Normal termination -- return name value
TEST    SET $QT=$PIECE($H,",",2)-ITIME
        IF @SEARCH SET HIT=1 QUIT
        IF @ABORT SET HIT=-1 QUIT
        QUIT
.DE
.bp
.ul 1
A New Relational Operator for Comparing Name Values
.P 1
The relation [[ is called "descends from".
If A and B are name values, then A[[B is true if and only if
A is a descendent of B.
.P 1
The relation A[[B has the same value as $QNAME(@A,$QLENGTH(@B))=B.
Intuitively, A "contains" the subscripts of B.
The value of A[[B is false if A is the empty string and
B contains a name value.
If either A or B is not a name value, the test fails.
.sp 2
.ul 1
A New Relational Operator for Comparing Subscripts
.P 1
The relation ]] is called "sorts after".
A]]B is true if and only if A follows B in the subscript
ordering sequence defined by the $ORDER function.
.P 1
This operator is very useful.
The standard subscript ordering sequence is empty string first,
followed by numeric subscripts, and then the string subscripts.
The numeric subscripts are ordered in increasing value, while
the string subscripts are ordered by the ASCII collating sequence.
Currently, in order to determine which subscript sorts first,
one must determine whether the subscripts are numeric or string,
and then use the numeric operators (">" or "<") or the
string follows operator ("]").
.bp
\fBExamples\fR
.P 1
The first example is subtree copy.
.DS
"X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
"Y" contains the destination reference in the same format

ENTRY   NEW A,A0,B,B0,I,L
        SET A=$E(X,1,$L(X)-1) SET:A["(" A=A_")" SET A=$QNAME(@A)
        SET B=$E(Y,1,$L(Y)-1) SET:B["(" B=B_")" SET B=$QNAME(@B)
        IF $D(@A)#2 SET @B=@A
        IF B'["(" SET B0=B_"("
        ELSE  SET B0=$E(B,1,$L(B)-1)_","
        ;Skip the "(" as well when the source has no subscripts
        SET A0=A,L=$LENGTH(A)+1+(A'["(")
        FOR I=0:0 SET A=$QUERY(@A) QUIT:A'[[A0  SET B=B0_$E(A,L,999),@B=@A
EXIT    QUIT
.DE
.P 1
The variable A runs through the source subtree nodes.
The variable A0 is used for descendency checking.
It contains the original source reference's name value.
All of the subtree nodes' name values begin with this value.
The remainder of the name value contains subscripts to be
copied to the destination.
The variable L is a pointer to this substring.
.P 1
The variable B0 contains the original destination reference's
name value in the "^GLO(SS1,SS2,...,SSn," format.
Each destination node's name value is formed by concatenating
this value and the additional descendent subscripts.
The destination node's name value is stored in the variable B.
The copy is performed by simple indirection.
.sp 3
.P 1
The second example is to print out all of a global from
node "X" to node "Y".
.DS
"X" and "Y" contain arbitrary nodes of a global in \fIglvn\fR format
Note: "X" and "Y" can specify subscript levels

ENTRY   NEW A,B,I,N,OK
        SET A=$QNAME(@X),B=$QNAME(@Y),N=$QLENGTH(@B)
        DO TEST IF 'OK WRITE !!,"The first is after the second!" GOTO EXIT
        IF $D(@A)#2 DO PRINT
        FOR I=0:0 SET A=$QUERY(@A) QUIT:A=""  DO TEST QUIT:'OK  DO PRINT
EXIT    QUIT
PRINT   WRITE !,A,"=",@A QUIT
TEST    ;Compare A with B subscript by subscript; stop at the first difference
        SET OK=1
        FOR I=1:1:$QLENGTH(@A) Q:I>N  Q:$QS(@B,I)]]$QS(@A,I)  IF $QS(@A,I)]]$QS(@B,I) SET OK=0 Q
        QUIT
.DE
.sp 3
\fBCONCLUSIONS\fR
.P 1
The new proposed functions would provide MUMPS with the
capability to sequentially access a hierarchical database and
perform ad-hoc data retrievals.
They would provide additional capabilities for manipulating
name values and subscripts.
These are important improvements that would be very useful in
supporting more complex database tasks.

------------------------------

End of std-mumps Digest
******************************
--
Hokey           ..ihnp4!plus5!hokey
                314-725-9492