Hokey (The Moderator) <hokey@plus5.uucp> (03/31/85)
std-mumps Digest Sat, 30 Mar 85 Volume 1 : Issue 12
Today's Topics:
New $QUERY paper
----------------------------------------------------------------------
Date: Wed, 27 Mar 85 17:13:10 est
From: maryland!ihnp4!seismo!osiris!ocsplx!pete
Subject: New $QUERY paper
To: maryland!osiris!aplvax!umcp-cs!seismo!ihnp4!plus5!std-mumps
Hokey,
Here is the latest copy of the $QUERY proposal.
Pete Kuzmak [I had to edit out the boldfacing and underlining - HMS]
$QUERY MUMPS Function Proposal
INTRODUCTION
MUMPS combines the power of a simple language with the flexibility of
a hierarchical database, and is successful for a large variety of
applications. It has been described as a "linguistically integrated
database system". Yet, for all its merits, there are some rather simple
database tasks that are not easily performed in MUMPS. Two examples,
copying a subtree and ad-hoc data retrieval, illustrate this point. The
following example shows how hard it is to write code to copy a node and its
descendants.
Example -- Copy a subtree (Modified from File Manager routine ^%RCR)
"X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
"Y" contains the destination reference in the same format
ENTRY   SET Z=1,A="",C(0)=0,B="" GO NEXT
UP      SET Z=Z-1,@("B="_$PIECE(A,",",Z+C(Z-1),Z+C(Z)))
        SET A=$PIECE(A,",",1,Z-1+C(Z-1))_$EXTRACT(",",Z>1)
NEXT    SET @("B=$ORDER("_X_A_"B))"),C(Z)=C(Z-1)
        IF B="" GO EXIT:Z=1,UP
        IF @("$DATA("_X_A_"B))#10=1") SET @(Y_A_"B)="_X_A_"B)")
        IF @("$DATA("_X_A_"B))<9") GO NEXT
        ;Descend the tree -- Special processing for non-numeric subscripts
        IF +B'=B DO COMMA:B[",",QUOTE:B["""" SET B=""""_B_""""
        SET A=A_B_",",Z=Z+1,B="" GO NEXT
COMMA   ;Count number of embedded commas for $PIECEing of subscripts
        FOR C=0:0 SET C=$FIND(B,",",C) QUIT:'C  SET C(Z)=C(Z)+1
        QUIT
QUOTE   ;Replace each embedded quote in a subscript with a pair of quotes
        F C=0:0 S C=$F(B,"""",C) Q:'C  S B=$E(B,1,C-1)_""""_$E(B,C,999),C=C+1
        QUIT
EXIT    KILL A,B,C,Z QUIT
The above algorithm requires concatenating and $PIECEing apart of
subscripts and a considerable amount of indirection. Special code is
needed for non-numeric subscripts to add the enclosing quotes and to handle
embedded commas and quotes. The algorithm must keep track of the level,
and ascend and descend accordingly.
The ad-hoc data retrieval task consists of traversing a global and
finding specific nodes that satisfy arbitrary search criteria. The code
for this task is quite complicated. Not only does the tree have to be
traversed, but all of the nodes have to be checked against the search
criteria.
These examples illustrate several reasons why some database tasks are
hard in MUMPS. Both subtree copy and ad-hoc data retrieval are essentially
sequential tasks. The hierarchical structure of MUMPS actually "gets in
the way" when trying to perform tasks that are not hierarchical in nature.
MUMPS lacks facilities for manipulating subscripts. As a result,
subscripts must be treated as character strings that are concatenated
together. Special care then must be given to embedded quotes, commas, and
differentiating numeric from non-numeric subscripts. Finally, MUMPS can
access only a single node at a time. It has no facilities for
handling multi-node operations (except to kill subtrees).
All of these things make dealing with some tasks more difficult than
necessary. A new set of capabilities for MUMPS can greatly simplify the
programming while at the same time improving performance. What is needed
is a new way to look at a hierarchical database, new facilities for
traversing the database and manipulating subscripts, and functions that
operate on more than one node at a time.
Name Value Strings
A hierarchical database can be viewed as a flat sequential file
containing two items per record, a name value and a data value. The name
values and data values together form a relationship.
A name value is a global or local variable name (glvn) in which all
the subscript values are represented by either numeric or string literals.
Subscript values that are non-numeric are represented by string literals
(i.e., they are bounded by quotes and contain pairs of adjacent quotes for
each embedded quote).
A variable or an expression may be used for a subscript in a glvn.
This is fully evaluated to a numeric or a string literal to obtain a name
value. A naked reference glvn has its subscripts similarly evaluated to
numeric or string literals and then is expanded to obtain a global name
value.
Every node in the hierarchical database that has data can be viewed as
a name value/data value pair in the relationship. The name values are
unique, they form an ordered set, and each name value can have one and only
one data value. The data value of a node can always be referenced via its
corresponding "name value" by indirection.
Example
SET X=1,Y="ALPHA",Z="BETA"
SET ^ABC(X,Y,Z)="TEST"
The name value of this node is the string ^ABC(1,"ALPHA","BETA").
The data value is "TEST".
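The construction of a name value from evaluated subscripts can be sketched
outside MUMPS. The following Python model is an illustration only and is
not part of the proposal. The function names "literal" and "name_value" are
hypothetical, and numeric subscripts are assumed to be held as Python
integers.

```python
def literal(sub):
    """Render one evaluated subscript as a numeric or string literal."""
    if isinstance(sub, int):
        return str(sub)                        # numeric: no quotes
    # string: bound by quotes, each embedded quote becomes a pair of quotes
    return '"' + sub.replace('"', '""') + '"'


def name_value(name, subs):
    """Build the name value string for a variable name and its subscripts."""
    if not subs:
        return name
    return name + "(" + ",".join(literal(s) for s in subs) + ")"


print(name_value("^ABC", [1, "ALPHA", "BETA"]))
print(name_value("^ABC", ['SAY "HI"', "A,B"]))
```

The second call shows why the literal form matters: the embedded quotes are
doubled and the embedded comma stays inside the quoted subscript.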
It is possible to define a "name value string" in MUMPS as a string
which has the form of a name value. New functions and operators can then
be defined to allow MUMPS to work with name values. The ability to
sequentially traverse multi-level global or local arrays could then be
provided. The new functions would allow search criteria to be used in
accessing the database. The new functions and operators would return,
compare, or manipulate name value strings, making it as easy to work with
the hierarchical keys as with the data. Such capabilities would provide
new ways to access the database, making the global copy tasks and movement
of subtrees trivial, and greatly simplifying ad-hoc data retrieval.
New Functions and Operators
The new $QNAME function would return the "name value" of a glvn.
Syntax: $QNAME(glvn) Abbreviation: $QN
Function: Return the "name value" of the glvn.
Example
SET X=1,Y="ALPHA",Z="BETA"
SET A=$QNAME(^ABC(X,Y,Z))
SET ^ABC(X,Y,Z)="TEST"
SET B=$QNAME(^(Z))
The variables "A" and "B" both have the same value, the name value
^ABC(1,"ALPHA","BETA").
The $QNAME function would also have a two-argument form. The second
argument, an integer expression (intexpr) would limit the number of
subscripts of the "name value" returned by $QNAME. If "n" were the value
of the limit, only the first n subscripts of the name value would be
returned; any subscripts beyond the limit would be dropped. This form of
the $QNAME function would be useful in checking descendency in sub-trees.
Syntax: $QNAME(glvn1,intexpr2)
Function: Return "name value" containing a numerically limited number
of subscripts.
Let $QNAME(glvn1) be of the form Name(S1,S2,...,Sn), having
"n" subscripts, and let "m" be the value of intexpr2.
Then $QNAME(glvn1,intexpr2) is defined as follows:
If m is less than 0, it is an error.
If m=0, return only "Name".
If m>0 and m<n, return the partial name value "Name(S1,S2,...,Sm)",
containing the first m subscripts.
Otherwise return the complete name value "Name(S1,S2,...,Sn)".
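The limit rules above can be modelled in a few lines. This is a Python
sketch for illustration, not part of the proposal. The function name
"qname" is hypothetical, and the subscripts are assumed to be supplied
already rendered as literals.

```python
def qname(name, literals, limit=None):
    """Model of $QNAME(glvn) and $QNAME(glvn,intexpr).

    `literals` holds the subscripts already rendered as literals.
    """
    n = len(literals)
    m = n if limit is None else limit
    if m < 0:
        raise ValueError("a negative limit is an error")
    kept = literals[:m]          # slicing past the end keeps all n subscripts
    if not kept:
        return name              # m=0, or no subscripts at all
    return name + "(" + ",".join(kept) + ")"


subs = ["1", '"ALPHA"', '"BETA"']
for m in (None, 0, 1, 2, 3, 4):
    print(m, qname("^ABC", subs, m))
```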
Example
SET X=1,Y="ALPHA",Z="BETA"
SET A=$QNAME(^ABC(X,Y,Z))
SET B=$QNAME(^ABC(X,Y,Z),0)
SET C=$QNAME(^ABC(X,Y,Z),1)
SET D=$QNAME(^ABC(X,Y,Z),2)
SET E=$QNAME(^ABC(X,Y,Z),3)
SET F=$QNAME(^ABC(X,Y,Z),4)
After executing, the values of variables A-F are as follows:
"A" = ^ABC(1,"ALPHA","BETA")
"B" = ^ABC
"C" = ^ABC(1)
"D" = ^ABC(1,"ALPHA")
"E" = ^ABC(1,"ALPHA","BETA")
"F" = ^ABC(1,"ALPHA","BETA")
The $QLENGTH function is used to determine the number of subscripts in
a glvn.
Syntax: $QLENGTH(glvn) Abbreviation: $QL
Function: Return number of subscripts in glvn.
Let $QNAME(glvn) be of the form Name(S1,S2,...,Sn), having "n"
subscripts. Then, $QLENGTH(glvn) returns the integer "n".
If there are no subscripts, then "n" is zero.
Example
SET X=1,Y="ALPHA",Z="BETA"
SET N1=$QLENGTH(^ABC(X,Y,Z))
SET A=$QNAME(^ABC(X,Y,Z))
SET N2=$QLENGTH(@A)
The variables N1 and N2 will both contain the value 3 after execution.
The $QSUBSCRIPT function is used to obtain the value of a subscript.
Syntax: $QSUBSCRIPT(glvn1,intexpr2)
Function: Returns the value of a designated subscript from glvn.
Let $QNAME(glvn1) be of the form Name(S1,S2,...,Sn), having "n"
subscripts, and let "m" be the value of intexpr2.
Then $QSUBSCRIPT(glvn1,intexpr2) is defined as follows:
If m is less than 0, it is an error.
If m=0, return only Name.
If m>0 and m is not greater than n, return the value of subscript Sm.
Otherwise (m>n), return the empty string.
Note: The value of a subscript is either a numeric data value or
a string data value. The $QSUBSCRIPT function returns the
actual value of the subscript, and not its external literal
representation. The value is not bounded by quotes and may
have embedded quotes.
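The definitions of $QLENGTH and $QSUBSCRIPT can be modelled by parsing a
name value string back into its parts. The Python sketch below is an
illustration only. The function names are hypothetical, the input is
assumed to be a well-formed name value, and numeric subscripts are returned
as strings, since MUMPS values are strings in any case.

```python
def parse_name_value(nv):
    """Split a name value string into (name, [subscript values])."""
    if "(" not in nv:
        return nv, []
    name, rest = nv.split("(", 1)
    body = rest[:-1]                       # drop the closing parenthesis
    subs, i = [], 0
    while i < len(body):
        if body[i] == '"':                 # string literal
            i += 1
            chars = []
            while True:
                if body[i] != '"':
                    chars.append(body[i])
                    i += 1
                elif body[i + 1:i + 2] == '"':   # pair of quotes = one quote
                    chars.append('"')
                    i += 2
                else:                      # closing quote
                    i += 1
                    break
            subs.append("".join(chars))
        else:                              # numeric literal runs to next comma
            j = body.find(",", i)
            j = len(body) if j == -1 else j
            subs.append(body[i:j])
            i = j
        i += 1                             # step over the separating comma
    return name, subs


def qlength(nv):
    """Model of $QLENGTH: the number of subscripts."""
    return len(parse_name_value(nv)[1])


def qsubscript(nv, m):
    """Model of $QSUBSCRIPT: the name for m=0, else the value of subscript m."""
    name, subs = parse_name_value(nv)
    if m < 0:
        raise ValueError("a negative subscript position is an error")
    if m == 0:
        return name
    return subs[m - 1] if m <= len(subs) else ""


nv = '^ABC(1,"ALPHA","BETA")'
print(qlength(nv), [qsubscript(nv, m) for m in range(5)])
```

Note that the returned subscript values carry no bounding quotes, and a
pair of embedded quotes comes back as a single quote.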
Example
SET X=1,Y="ALPHA",Z="BETA"
SET A=$QSUBSCRIPT(^ABC(X,Y,Z),0)
SET B=$QSUBSCRIPT(^ABC(X,Y,Z),1)
SET C=$QSUBSCRIPT(^ABC(X,Y,Z),2)
SET D=$QSUBSCRIPT(^ABC(X,Y,Z),3)
SET E=$QSUBSCRIPT(^ABC(X,Y,Z),4)
After executing, the values of variables A-E are as follows:
"A" = ^ABC
"B" = 1
"C" = ALPHA
"D" = BETA
"E" = the empty string
The $QUERY function permits the multiple levels of hierarchical
storage to be sequentially traversed. It also allows hierarchical storage
to be sequentially searched. There are one-, two-, and three-argument forms
of the $QUERY function. The one-argument form returns the name value of
the next node following a glvn in the ordering sequence. The two- and
three-argument forms include truth-value expressions (tvexprs) for testing
the name value and/or the data value, so that nodes are selected according
to search criteria.
Syntax: $QUERY(glvn) Abbreviation: $QU
Function: Traverse multiple levels of a hierarchical file in ordering
sequence. Return name value of next node following the
designated glvn. If there is none, return the empty
string.
If $QUERY returns a non-empty name value string, then the data
value of the node can be obtained by applying indirection.
The truth value switch $T will be set to 1 upon return.
Note: If the name value is a global, then the $QUERY function
will update the naked reference.
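The traversal order of the one-argument form can be modelled as follows.
This Python sketch is an illustration, not part of the proposal. It assumes
a global is held as a dictionary keyed by tuples of subscripts, with
numeric subscripts as Python numbers, and it scans every node on each call
for simplicity. The names "collate_key", "node_key" and "query" are
hypothetical.

```python
def collate_key(sub):
    """Sort key for one subscript: empty string, then numbers, then strings."""
    if sub == "":
        return (0, 0, "")
    if isinstance(sub, (int, float)):
        return (1, sub, "")
    return (2, 0, sub)


def node_key(subs):
    """Sort key for a whole node; a parent sorts before its descendants."""
    return tuple(collate_key(s) for s in subs)


def query(tree, subs):
    """Return the subscripts of the next node with data after `subs`, or None."""
    here = node_key(subs)
    later = [k for k in tree if node_key(k) > here]
    return min(later, key=node_key) if later else None


tree = {(1,): 1, (1, "ALPHA"): 2, (1, "ALPHA", "BETA"): 3}
node = ()                      # the unsubscripted name, like A="^ABC"
while (node := query(tree, node)) is not None:
    print(node, "=", tree[node])
```

Returning None here plays the role of the empty string returned by $QUERY
when no further node exists.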
Example
SET X=1,Y="ALPHA",Z="BETA"
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET A="^ABC"
SET B=$QUERY(@A)
SET C=$QUERY(@B)
SET D=$QUERY(@C)
SET E=$QUERY(@D)
After executing, the values of variables A-E are as follows:
"A" = ^ABC
"B" = ^ABC(1)
"C" = ^ABC(1,"ALPHA")
"D" = ^ABC(1,"ALPHA","BETA")
"E" = the empty string
The second argument of $QUERY is a tvexpr that selects the nodes that
are returned by the function. The tvexpr is evaluated for each node as the
storage is traversed. If the tvexpr is true, the name value of the node is
returned. If it is false, the node is skipped and the search continues.
In order to reference the node in the tvexpr during the search, a new
special variable $Q[UERY] is necessary. The $Q special variable contains
the name value of the node that is being tested.
The third argument of $QUERY is a second tvexpr that causes the search
to terminate abnormally. This tvexpr is evaluated second for each node as
the storage is traversed. If this tvexpr is true, the truth value switch
$T is set to 0 and the name value of the node is returned.
In order to make the abnormal termination of $QUERY more useful, two
additional special variables are proposed. $QC[OUNT] would be a count
of the number of nodes traversed for the invocation of the function.
$QT[IME] would be the elapsed time (in seconds) for the invocation of the
function. The abnormal termination can then be triggered by exceeding a
designated number of nodes or a time limit.
Syntax: $QUERY(glvn1,tvexpr2)
$QUERY(glvn1,tvexpr2,tvexpr3)
Function: Search multiple levels of a hierarchical file in ordering
sequence. Return the name value of the next node following
the glvn1 that satisfies the designated search criteria.
If there is none, return the empty string.
The second argument, tvexpr2, selects nodes that
satisfy the search criteria. If tvexpr2 is true
when evaluated for a node, the search stops and the
name value of the node is returned by the function.
The optional third argument, tvexpr3, controls
abnormal termination. If tvexpr3 (evaluated after
tvexpr2) is true for a node, stop the search, set
$T to 0, and return. (The name value of the last node
is returned by the function.)
If $QUERY returns a non-empty name value string, then the data
value of the node can be obtained by applying indirection.
Normally, the truth value switch $T will be set to 1 upon
return. Only in the event of an abnormal termination is it
set to 0. A normal termination includes returning an empty
string.
Note: If the name value is a global, then the $QUERY function
will update the naked reference.
Example
;Find subscripts containing the letter "A"
SET X=1,Y="ALPHA",Z="BETA"
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET TEST="$PIECE($Q,""("",2,999)[""A"""
SET A="^ABC"
SET B=$QUERY(@A,@TEST)
SET C=$QUERY(@B,@TEST)
SET D=$QUERY(@C,@TEST)
After executing, the values of variables A-D are as follows:
"A" = ^ABC
"B" = ^ABC(1,"ALPHA")
"C" = ^ABC(1,"ALPHA","BETA")
"D" = the empty string
Example
;Find third level nodes that are greater than one
SET X=1,Y="ALPHA",Z="BETA"
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET TEST="($QLENGTH(@$Q)=3)&(@$Q>1)"
SET A="^ABC"
SET B=$QUERY(@A,@TEST)
SET C=$QUERY(@B,@TEST)
After executing, the values of variables A-C are as follows:
"A" = ^ABC
"B" = ^ABC(1,"ALPHA","BETA")
"C" = the empty string
Example
;Do a long search -- abnormally terminate after passing 1000 nodes
FOR I=1:1:2000 SET ^XYZ(I)="ABCDEF"
SET TEST="@$Q[""XYZ""" ; Find a data value containing "XYZ"
SET ABORT="$QC>1000"
SET A="^XYZ"
SET B=$QUERY(@A,@TEST,@ABORT)
IF  WRITE !,"Search Succeeded"
ELSE  WRITE !,"Search Failed"
WRITE " -- ",B
After executing, the values of variables A and B are as follows:
"A" = ^XYZ
"B" = ^XYZ(1001)
The message "Search Failed -- ^XYZ(1001)" will be displayed.
A MUMPS implementation of the three-argument $QUERY function is given
below. The first argument "START" is the starting point for the search.
The second argument "SEARCH" contains the expression for the search
criteria. The third argument "ABORT" contains the expression controlling
abnormal termination. In this example the single-argument $QUERY function
and the system variables "$Q", "$QC", and "$QT" are presumed to exist.
QUERY   ;PMK@JHH -- Three-argument $QUERY function
ENTRY(START,SEARCH,ABORT)
        NEW ITIME,HIT
        SET ITIME=$PIECE($H,",",2) ; Initial time
        SET $Q=START,HIT=0
        ;$QC counts the nodes traversed, including the one being tested
        FOR $QC=1:1 SET $Q=$QUERY(@$Q) QUIT:$Q=""  DO TEST QUIT:HIT
        IF HIT'=-1 ; Set $T for normal/abnormal termination
        QUIT $Q
TEST    SET $QT=$PIECE($H,",",2)-ITIME
        IF @SEARCH SET HIT=1 QUIT
        IF @ABORT SET HIT=-1 QUIT
        QUIT
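The control flow of the three-argument form can also be modelled outside
MUMPS. The Python sketch below is an illustration only and is not the
author's implementation. The name "query3" is hypothetical. The nodes that
follow the starting point are assumed to be supplied in ordering sequence
as (name value, data value) pairs. The "count" and "elapsed" arguments
stand in for $QC and $QT, and the returned flag stands in for $T.

```python
import time


def query3(following, search, abort):
    """Model of $QUERY(glvn,tvexpr2,tvexpr3).

    `following` yields (name_value, data) for the nodes after the start.
    Returns (name_value, truth), where truth plays the role of $T.
    """
    started = time.monotonic()
    count = 0
    for q, data in following:
        count += 1                           # nodes traversed so far
        elapsed = time.monotonic() - started
        if search(q, data):                  # tested first
            return q, True                   # normal termination
        if abort(count, elapsed):            # tested second
            return q, False                  # abnormal termination
    return "", True                          # no node found: still normal


nodes = [(f"^XYZ({i})", "ABCDEF") for i in range(1, 2001)]
result = query3(nodes,
                lambda q, data: "XYZ" in data,
                lambda count, elapsed: count > 1000)
print(result)
```

With no data value containing "XYZ", the search is abandoned at the first
node for which the count exceeds 1000, which is ^XYZ(1001).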
A New Relational Operator for Comparing Name Values
The relation [[ is called "descends from". If A and B are name
values, then A[[B is true if and only if A is a descendant of B.
The relation A[[B has the same value as $QNAME(@A,$QLENGTH(@B))=B.
Intuitively, A "contains" the subscripts of B. The value of A[[B is false
if A is the empty string and B contains a name value.
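The "descends from" relation amounts to a prefix test on subscripts. The
following Python sketch is an illustration only. The name "descends_from"
is hypothetical, a name value is assumed to be held as a (name, subscripts)
pair, and None stands for the empty string. Returning false when B is empty
is an assumption of this sketch; the proposal does not specify that case.

```python
def descends_from(a, b):
    """Model of A[[B for name values held as (name, subscripts) pairs."""
    if a is None or b is None:
        return False                 # an empty A is false by definition
    name_a, subs_a = a
    name_b, subs_b = b
    # Same as comparing $QNAME(@A,$QLENGTH(@B)) with B
    return name_a == name_b and subs_a[:len(subs_b)] == subs_b


parent = ("^ABC", (1, "ALPHA"))
child = ("^ABC", (1, "ALPHA", "BETA"))
print(descends_from(child, parent), descends_from(parent, child))
```

Under the $QNAME equivalence, a name value also "descends from" itself,
and the sketch follows that reading.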
A New Relational Operator for Comparing Subscripts
The relation ]] is called "sorts after". A]]B is true if and only if
A follows B in the subscript ordering sequence defined by the $ORDER
function.
This operator is very useful. The standard subscript ordering
sequence is empty string first, followed by numeric subscripts, and then
the string subscripts. The numeric subscripts are ordered in increasing
value, while the string subscripts are ordered by the ASCII collating
sequence. Currently, in order to determine which subscript sorts first,
one must determine whether the subscripts are numeric or string, and then
use the numeric (">" or "<") operators or the string "follows" operator ("]").
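The ordering that ]] would test can be expressed as a single sort key. The
Python sketch below is an illustration only. The names "collate_key" and
"sorts_after" are hypothetical, numeric subscripts are assumed to be held
as Python numbers, and string subscripts are assumed to be ASCII.

```python
def collate_key(sub):
    """Sort key: the empty string first, then numbers, then strings."""
    if sub == "":
        return (0, 0, "")
    if isinstance(sub, (int, float)):
        return (1, sub, "")          # numbers in increasing value
    return (2, 0, sub)               # strings in character-code order


def sorts_after(a, b):
    """Model of A]]B: true if A follows B in the subscript ordering."""
    return collate_key(a) > collate_key(b)


print(sorted(["B", 10, "", 2, "A", -1], key=collate_key))
print(sorts_after(10, 9), sorts_after("A", 10), sorts_after(2, "A"))
```

A single comparison replaces the test for numeric versus string subscripts
followed by a choice between two different operators.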
Examples
The first example is subtree copy.
"X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
"Y" contains the destination reference in the same format
ENTRY   SET A=$E(X,1,$L(X)-1)_")",A=$QNAME(@A)
        SET C=$E(Y,1,$L(Y)-1)_")",C=$QNAME(@C)
        SET C=$E(C,1,$L(C)-1)
        SET A0=A,L=$LENGTH(A)
        FOR I=0:0 SET A=$QUERY(@A) QUIT:A'[[A0  SET B=C_$E(A,L,999),@B=@A
EXIT    KILL A,A0,B,C,I,L QUIT
The variable "A" runs through the source subtree nodes. The variable
"A0" is used for descendency checking. It contains the original source
reference's name value. Every subtree node's name value begins with
this value, less its closing parenthesis. The remainder of the name value
contains the subscripts to be copied to the destination. The variable "L"
holds the position in the name value at which this remainder begins.
The variable "C" contains the original destination reference's name
value with its closing parenthesis removed, i.e. in the form
"^GLO(SS1,SS2,...,SSn". Each destination node's name value is formed by
concatenating this value and the additional descendent subscripts. The
destination node's name value is stored in the variable "B". The copy is
performed by simple indirection.
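The effect of the subtree copy can be modelled compactly. The Python sketch
below is an illustration only. The name "copy_subtree" is hypothetical, and
the database is assumed to be a dictionary keyed by tuples whose first
element is the global name. As in the MUMPS routine, only the nodes that
descend from the source reference are copied, not the source node itself.

```python
def copy_subtree(db, src, dst):
    """Copy every descendant of `src` to the same relative place under `dst`.

    Returns the number of nodes copied.
    """
    n = len(src)
    copies = {dst + key[n:]: value           # destination prefix + remainder
              for key, value in db.items()
              if len(key) > n and key[:n] == src}
    db.update(copies)
    return len(copies)


db = {
    ("^SRC", 1): "a",
    ("^SRC", 1, "X"): "b",
    ("^SRC", 1, "X", 2): "c",
    ("^SRC", 2): "d",
}
print(copy_subtree(db, ("^SRC", 1), ("^DST", 5)))
print(sorted(k for k in db if k[0] == "^DST"))
```

The slice key[n:] plays the part of the remainder substring in the MUMPS
version, and the prefix comparison plays the part of the descendency check.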
The second example is to print out all of a global from node "X"
to node "Y".
"X" and "Y" contain arbitrary nodes of a global in glvn format
Note: "X" and "Y" can specify different numbers of subscripts
ENTRY   SET A=$QNAME(@X),B=$QNAME(@Y),N=$QLENGTH(@B)
        DO TEST IF 'OK W !!,"The first is after the second!" GOTO EXIT
        IF $D(@A)#2 DO PRINT
        FOR I=0:0 SET A=$QUERY(@A) QUIT:A=""  DO TEST QUIT:'OK  DO PRINT
EXIT    KILL A,B,I,N,OK QUIT
PRINT   W !,A,"=",@A QUIT
TEST    SET OK=1 ;Decide on the first level at which the subscripts differ
        FOR I=1:1:$QLENGTH(@A) Q:I>N  IF $QS(@A,I)'=$QS(@B,I) DO DIFF Q
        QUIT
DIFF    SET OK=$QS(@B,I)]]$QS(@A,I) QUIT
CONCLUSIONS
------------------------------
End of std-mumps Digest
******************************
--
Hokey ..ihnp4!plus5!hokey
314-725-9492