[mod.std.mumps] std-mumps Digest V1 #13

Hokey (The Moderator) <hokey@plus5.uucp> (04/15/85)
std-mumps Digest            Sun, 14 Apr 85       Volume  1 : Issue  13

Today's Topics:
                    Administrivia: $Query proposal
                    Block Mode Terminals and MUMPS
                       Revised $QUERY Proposal
----------------------------------------------------------------------

Date: 14 Apr 85 17:23:46 CST (Sun)
From: hokey@plus5.uucp
Subject: Administrivia: $Query proposal
To: std-mumps

Rather than format the $query proposal, I left it unformatted.

Hokey

------------------------------

Date: 7 Apr 85 22:10:53 CST (Sun)
From: hokey@plus5.uucp
Subject: Block Mode Terminals and MUMPS
To: std-mumps

Does anybody have any ideas on what it would take to make MUMPS work well
with block mode devices?  I suspect several areas would have to be changed,
and furthermore, that many existing software packages would not work really
well on block mode terminals.

------------------------------

Date: Mon, 8 Apr 85 17:07:06 est
From: maryland!ihnp4!seismo!osiris!ocsplx!pete
Subject: Revised $QUERY Proposal
To: std-mumps@plus5.uucp

Hokey,

Here is the revised $QUERY proposal.  It includes the nroff source
(use "nroff -mm -rN2 ...").  The major revision is in the two- and
three-argument forms of the $QUERY function, which now take "qexpr"
arguments that evaluate to "tvexpr"s for testing the nodes.

Please make yourself a copy of the paper and send it along.

					Pete Kuzmak (301) 955-6185


.SA 1 
.ce 7
.B
Proposed MUMPS Functions for Extending Database Capabilities
.R
.sp 2
Peter M. Kuzmak
The Johns Hopkins Hospital
and
Kevin O'Gorman
Independent Consultant
.sp 5
ABSTRACT
.sp
MUMPS is a simple language with a built-in hierarchical database
access method.  Although it is successful for a large variety of
applications, there are some database tasks for which MUMPS is not
well suited.  Two examples are sequentially accessing hierarchically
stored data and performing ad-hoc data retrievals.
.P 1
This paper describes several new proposed functions that would improve
the ability of MUMPS to handle complex database tasks.  In particular,
four new capabilities would be provided:
.AL 1
.LI
Sequentially traverse hierarchical storage
.LI
Access the name and data values of a storage reference
.LI
Operate on the name of storage references and subscripts
.LI
Perform ad-hoc retrievals
.LE
.bp
\fBINTRODUCTION\fR
.P 1
MUMPS combines the power of a simple language with the flexibility
of a hierarchical database, and is successful for a large variety of
applications.  It has been described as a "linguistically integrated
database system".  Yet, for all its merits, there are some rather
simple database tasks that are not easily performed in MUMPS.  Two
examples, copying a subtree and ad-hoc data retrieval, illustrate
this point.  The following example shows how hard it is to write code
to copy a node and its descendents.
.DS
\fIExample\fR -- Copy a subtree (Modified from File Manager routine ^%RCR)

"X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
"Y" contains the destination reference in the same format

ENTRY	SET Z=1,A="",C(0)=0,B="" GO NEXT

UP	SET Z=Z-1,@("B="_$PIECE(A,",",Z+C(Z-1),Z+C(Z)))
	SET A=$PIECE(A,",",1,Z-1+C(Z-1))_$EXTRACT(",",Z>1)

NEXT	SET @("B=$ORDER("_X_A_"B))"),C(Z)=C(Z-1)
	IF B="" GO EXIT:Z=1,UP
	IF @("$DATA("_X_A_"B))#10=1") SET @(Y_A_"B)="_X_A_"B)")
	IF @("$DATA("_X_A_"B))<9") GO NEXT

	;Descend the tree -- Special processing for non-numeric subscripts
	IF +B'=B DO COMMA:B[",",QUOTE:B["""" SET B=""""_B_""""
	SET A=A_B_",",Z=Z+1,B="" GO NEXT

COMMA	;Count number of embedded commas for $PIECEing of subscripts
	FOR C=0:0 SET C=$FIND(B,",",C) QUIT:'C  SET C(Z)=C(Z)+1
	QUIT

QUOTE	;Double each embedded quote in a subscript
	F C=0:0 S C=$F(B,"""",C) Q:'C  S B=$E(B,1,C-1)_""""_$E(B,C,999),C=C+1
	QUIT

EXIT	KILL A,B,C,Z QUIT
.DE
.P 1
The above algorithm requires concatenating and $PIECEing apart of
subscripts and a considerable amount of indirection.  Special code is
needed for non-numeric subscripts to add the enclosing quotes and to
handle embedded commas and quotes.  The algorithm must keep track of
the level, and ascend and descend accordingly.
.P 1
The ad-hoc data retrieval task consists of traversing a global and
finding specific nodes that satisfy arbitrary search criteria.
The code for this task is quite complicated.  Not only does the tree
have to be traversed, but all of the nodes have to be checked
against the search criteria.
.P 1
These examples illustrate several reasons why some database tasks are
hard in MUMPS.  Both subtree copy and ad-hoc data retrieval are
essentially sequential tasks.  The hierarchical structure of MUMPS
actually "gets in the way" when trying to perform tasks that are not
hierarchical in nature.  MUMPS lacks facilities for manipulating
subscripts.  As a result, subscripts must be treated as character
strings that are concatenated together.  Special care then must be
given to embedded quotes, commas, and differentiating numeric from
non-numeric subscripts.  Finally, MUMPS can access only a single
node at a time.  It has no facilities for handling multi-node
operations (except to kill subtrees).
.P 1
All of these things make dealing with some tasks more difficult than
necessary.  A new set of capabilities for MUMPS can greatly simplify
the programming while at the same time improving performance.  What
is needed is a new way to look at a hierarchical database, new
facilities for traversing the database and manipulating subscripts,
and functions that operate on more than one node at a time.
.bp
\fBName Value Strings\fR
.P 1
A hierarchical database can be viewed as a flat sequential file
containing two items per record, a name value and a data value.
The name values and data values together form a relationship.
.P 1
A \fBname value\fR is a global or local variable name (\fIglvn\fR)
in which all the subscript values are represented by either
\fInumeric data values\fR or \fIstring literals\fR.  Subscripts that
are numeric are represented by their unique numeric data values.
Subscript values that are non-numeric are represented by string
literals (i.e., they are bounded by quotes and contain pairs of
adjacent quotes for each embedded quote).
.P 1
A variable or an expression may be used for a subscript in a \fIglvn\fR.
This is fully evaluated to a numeric data value or a string literal to
obtain a name value.  A naked reference \fIglvn\fR has its subscripts
similarly evaluated to numeric or string literals and then is expanded
to obtain a global name value.
.P 1
Every node in the hierarchical database that has data can be viewed
as a name value/data value pair in the relationship.  The name values
are unique, they form an ordered set, and each name value can have one
and only one data value.  The data value of a node can always be
referenced via its corresponding "name value" by indirection.
.DS
.sp 1
.ul 1
Example
SET X=1,Y="ALPHA",Z="BETA"
SET ^ABC(X,Y,Z)="TEST"

The name value of this node is the string ^ABC(1,"ALPHA","BETA").
The data value is "TEST".
.DE
.sp 1
.P 1
It is possible to define a "name value string" in MUMPS as a string
which has the form of a name value.  New functions and operators can
then be defined to allow MUMPS to work with name values.  The ability
to sequentially traverse multi-level global or local arrays could then
be provided.  The new functions would allow search criteria to be
used in accessing the database.  The new functions and operators
would return, compare, or manipulate name value strings, making it as
easy to work with the hierarchical keys as with the data.  Such
capabilities would provide new ways to access the database, making the
global copy tasks and movement of subtrees trivial, and greatly
simplifying ad-hoc data retrieval.
.sp 3
\fBNew Functions and Operators\fR
.P 1
The new $QNAME function would return the "name value" of a \fIglvn\fR.

.DS
Syntax:   $QNAME(\fIglvn\fR)		Abbreviation: $QN

Function: Return the "name value" of the \fIglvn\fR.

	  Note: The name value is unique and is defined
		for every \fIglvn\fR.
.sp 1
.ul 1
Example
SET X=1,Y="ALPHA",Z="BETA"
SET A=$QNAME(^ABC(X,Y,Z))
SET ^ABC(X,Y,Z)="TEST"
SET B=$QNAME(^(Z))
.DE
The variables A and B both have the same value, the name value
^ABC(1,"ALPHA","BETA").

.P 1
The $QNAME function would also have a two-argument form. The second
argument, an integer expression (\fIintexpr\fR) would limit the number
of subscripts of the "name value" returned by $QNAME.  If n were the
value of the limit, only the first n subscripts of the name value
would be returned; any subscripts beyond the limit would be omitted.
This form of the $QNAME function would be useful in checking
descendency in sub-trees.
.DS
Syntax:   $QNAME(\fIglvn1\fR,\fIintexpr2\fR)

Function: Return "name value" containing a numerically limited number
	  of subscripts.

    Let $QNAME(\fIglvn1\fR) be of the form Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR), having
    n subscripts, and let m be the value of \fIintexpr2\fR.
    Then $QNAME(\fIglvn1\fR,\fIintexpr2\fR) is defined as follows:

	If m is less than 0, it is an error.

	If m=0, return only "Name".

	If m>0, m<n, return the partial name value "Name(s\fI1\fR,s\fI2\fR,...,s\fIm\fR)",
	  containing the first m subscripts.

	Otherwise return the complete name value "Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR)".
.DE
.DS

.ul 1
Example
SET X=1,Y="ALPHA",Z="BETA"
SET A=$QNAME(^ABC(X,Y,Z))
SET B=$QNAME(^ABC(X,Y,Z),0)
SET C=$QNAME(^ABC(X,Y,Z),1)
SET D=$QNAME(^ABC(X,Y,Z),2)
SET E=$QNAME(^ABC(X,Y,Z),3)
SET F=$QNAME(^ABC(X,Y,Z),4)

After executing, the values of variables A-F are as follows:
	A = ^ABC(1,"ALPHA","BETA")
	B = ^ABC
	C = ^ABC(1)
	D = ^ABC(1,"ALPHA")
	E = ^ABC(1,"ALPHA","BETA")
	F = ^ABC(1,"ALPHA","BETA")
.DE 
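.P 1
As an illustration only, the formation and truncation of a name value
can be modeled outside MUMPS.  The Python sketch below is not part of
the proposal: the helper names literal and qname, and the simplified
test for a canonical number, are assumptions made for this example.

```python
import re

# Simplified test for a MUMPS canonical number (assumption: no length limits).
_CANONICAL = re.compile(r"^(0|-?([1-9]\d*(\.\d*[1-9])?|\.\d*[1-9]))$")


def literal(subscript):
    """Render one subscript as it would appear inside a name value."""
    text = str(subscript)
    if _CANONICAL.match(text):
        return text                                # numeric data value
    return '"' + text.replace('"', '""') + '"'     # quoted string literal


def qname(name, subscripts=(), limit=None):
    """Model of $QNAME: the name value, keeping at most `limit` subscripts."""
    if limit is not None and limit < 0:
        raise ValueError("a negative limit is an error")
    kept = list(subscripts) if limit is None else list(subscripts)[:limit]
    if not kept:
        return name
    return name + "(" + ",".join(literal(s) for s in kept) + ")"


print(qname("^ABC", [1, "ALPHA", "BETA"], 2))
```

Under these assumptions the call shown mirrors the variable D in the
example above; a limit larger than the number of subscripts simply
yields the complete name value.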
.bp
.P 1
The $QLENGTH function is used to determine the number of subscripts
in a \fIglvn\fR.
.sp 1
.DS
Syntax:   $QLENGTH(\fIglvn\fR)			Abbreviation: $QL

Function: Return number of subscripts in \fIglvn\fR.

   Let $QNAME(\fIglvn\fR) be of the form Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR), having n
   subscripts.  If there are no subscripts, then consider n to be zero.
   Then, $QLENGTH(\fIglvn\fR) returns the integer n.

.ul 1
Example
SET X=1,Y="ALPHA",Z="BETA"
SET N1=$QLENGTH(^ABC(X,Y,Z))
SET A=$QNAME(^ABC(X,Y,Z))
SET N2=$QLENGTH(@A)

The variables N1 and N2 will both contain the value 3 after execution.
.DE
.bp
.P 1
The $QSUBSCRIPT function is used to obtain the value of a subscript.
.DS
Syntax:   $QSUBSCRIPT(\fIglvn1\fR,\fIintexpr2\fR)

Function: Returns the value of a designated subscript from \fIglvn1\fR.

	  Note: This function may also be used to return the
		name of the global or the local array.

   Let $QNAME(\fIglvn1\fR) be of the form Name(s\fI1\fR,s\fI2\fR,...,s\fIn\fR), having n
   subscripts, and let m be the value of \fIintexpr2\fR.
   Then $QSUBSCRIPT(\fIglvn1\fR,\fIintexpr2\fR) is defined as follows:

	If m is less than 0, it is an error.

	If m=0, return only Name.

	If m>0, m<n, or m=n, return the value of subscript s\fIm\fR.

	Otherwise m>n and return the empty string.

Note: The value of a subscript is either a numeric data value or
      a string data value.  The $QSUBSCRIPT function returns the
      actual value of the subscript, and not its external literal
      representation.
.DE
.DS

.ul 1
Example
SET X=1,Y="ALPHA",Z="BETA"
SET A=$QSUBSCRIPT(^ABC(X,Y,Z),0)
SET B=$QSUBSCRIPT(^ABC(X,Y,Z),1)
SET C=$QSUBSCRIPT(^ABC(X,Y,Z),2)
SET D=$QSUBSCRIPT(^ABC(X,Y,Z),3)
SET E=$QSUBSCRIPT(^ABC(X,Y,Z),4)

After executing, the values of variables A-E are as follows:
	A = ^ABC
	B = 1
	C = ALPHA
	D = BETA
	E = the empty string
.DE
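.P 1
The rules for $QLENGTH and $QSUBSCRIPT can be checked against a small
model.  The Python sketch below is an assumption-laden illustration,
not the proposal's implementation: it works on name value strings
rather than on a \fIglvn\fR, and the function names parse_name_value,
qlength, and qsubscript are invented for this example.

```python
def parse_name_value(nv):
    """Split a name value string into (name, [subscript values])."""
    if "(" not in nv:
        return nv, []
    name, rest = nv.split("(", 1)
    body = rest[:-1]                    # drop the closing ")"
    subs, cur, in_quotes, i = [], [], False, 0
    while i < len(body):
        ch = body[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(body) and body[i + 1] == '"':
                    cur.append('"')     # doubled quote is one literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote of the literal
            else:
                cur.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ",":                 # a comma outside quotes separates subscripts
            subs.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    subs.append("".join(cur))
    return name, subs


def qlength(nv):
    """Model of $QLENGTH: the number of subscripts."""
    return len(parse_name_value(nv)[1])


def qsubscript(nv, m):
    """Model of $QSUBSCRIPT: name for m=0, subscript m, or '' past the end."""
    if m < 0:
        raise ValueError("a negative subscript position is an error")
    name, subs = parse_name_value(nv)
    if m == 0:
        return name
    return subs[m - 1] if m <= len(subs) else ""
```

Note that the model returns the actual subscript value (ALPHA), not its
quoted literal form, as the note above requires.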
.bp
.P 1
The $QUERY function permits the multiple levels of hierarchical storage
to be sequentially traversed.  It also allows hierarchical storage to be
sequentially searched.  There are one-, two-, and three-argument forms of
the $QUERY function.  The one-argument form returns the name value of
the next node following a \fIglvn\fR in the ordering sequence.
.sp
.DS
Syntax:   $QUERY(\fIglvn\fR)		Abbreviation: $QU

Function: Traverse multiple levels of a hierarchical file in ordering
	  sequence.  Return name value of next node following the
	  designated \fIglvn\fR.  If there is none, return the empty
	  string.

	  If $QUERY returns a non-empty name value string, then the data
	  value of the node can be obtained by applying indirection.

	  Note: If the name value is a global, then the $QUERY function
		will update the naked reference.
		

.ul 1
Example
SET X=1,Y="ALPHA",Z="BETA"
KILL ^ABC
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3
SET A="^ABC"
SET B=$QUERY(@A)
SET C=$QUERY(@B)
SET D=$QUERY(@C)
SET E=$QUERY(@D)
.DE
.DS
After executing, the values of variables A-E are as follows:
	A = ^ABC
	B = ^ABC(1)
	C = ^ABC(1,"ALPHA")
	D = ^ABC(1,"ALPHA","BETA")
	E = the empty string
.DE
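.P 1
The traversal order of the one-argument form can be sketched with a
small model.  In the Python below, which is only an illustration, one
array is a dictionary keyed by tuples of subscripts; the names
collate_key and query, and the simplified canonical-number test, are
assumptions of this example and not part of the proposal.

```python
import re
from decimal import Decimal

# Simplified test for a MUMPS canonical number (assumption: no length limits).
_CANONICAL = re.compile(r"^(0|-?([1-9]\d*(\.\d*[1-9])?|\.\d*[1-9]))$")


def collate_key(subscript):
    """Numeric subscripts sort first by value, then strings in ASCII order."""
    text = str(subscript)
    if _CANONICAL.match(text):
        return (0, Decimal(text))
    return (1, text)


def node_key(subscripts):
    """Tuple comparison puts a parent before its descendants."""
    return tuple(collate_key(s) for s in subscripts)


def query(nodes, subscripts=()):
    """Model of $QUERY: subscripts of the next node with data, or None."""
    here = node_key(subscripts)
    later = [k for k in nodes if node_key(k) > here]
    return min(later, key=node_key, default=None)


nodes = {(1,): 1, (1, "ALPHA"): 2, (1, "ALPHA", "BETA"): 3}
ref = ()
while True:
    ref = query(nodes, ref)
    if ref is None:
        break
    print(ref, "=", nodes[ref])
```

The loop visits the three nodes in the same sequence as the variables
B, C, and D in the example above, with None standing in for the empty
string.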
.sp 2
.P 1
The two- and three-argument forms of the $QUERY function provide ad-hoc
data retrieval capabilities.  The second and third arguments yield
"truth-value expressions" (\fItvexpr\fR's) which test the name value
and/or the data value of each node and control selection.
.P 1
Syntactically, the second and third arguments are defined as "query
expressions", which evaluate to truth-value expressions.
.sp 1
.DS
	The "query expression" \fIqexpr\fR is defined as an
	expression that evaluates to a "truth-value expression".

		\fIqexpr ::= expr V tvexpr\fR

	A \fIqexpr\fR argument of $QUERY is evaluated for each
	node.  It is used to control selection of nodes according
	to the search criteria.
.DE
.sp 1
.P 1
The second argument of $QUERY is a \fIqexpr\fR that selects the
nodes that are returned by the function.  The \fIqexpr\fR is
evaluated for each node as the storage is traversed.  If the
\fIqexpr\fR is true, the name value of the node is returned.
If it is false, the node is skipped and the search continues.
.P 1
In order to reference the node in the \fIqexpr\fR during the search,
a new special variable \fB$Q[UERY]\fR is necessary.  The $Q special
variable contains the name value of the node that is being tested.
.P 1
The third argument of $QUERY is a second \fIqexpr\fR that causes the
search to terminate abnormally.  This \fIqexpr\fR is evaluated for a
node only after the first \fIqexpr\fR has proved false for that node.
If it is true, the search stops and the empty string is returned; the
name value of the node at which the search stopped remains in $Q.
.P 1
In order to make the abnormal termination of $QUERY more useful, two
additional special variables are proposed.  \fB$QC[OUNT]\fR would
be a count of the number of nodes traversed for the invocation of the
function.  \fB$QT[IME]\fR would be the elapsed time (in seconds) for
the invocation of the function.  The abnormal termination can then
be triggered by exceeding a designated number of nodes or a time limit.
.DS

Syntax:   $QUERY(\fIglvn1\fR,\fIqexpr2\fR)
	  $QUERY(\fIglvn1\fR,\fIqexpr2\fR,\fIqexpr3\fR)

Function: Search multiple levels of a hierarchical file in ordering
	  sequence.  Return the name value of the next node following
	  the \fIglvn1\fR that satisfies the designated search criteria.
	  If there is none, return the empty string.

	  The second argument, \fIqexpr2\fR, selects nodes that
	  satisfy the search criteria.  If \fIqexpr2\fR is true
	  when evaluated for a node, the search stops and the
	  name value of the node is returned by the function.

	  The optional third argument, \fIqexpr3\fR, is only evaluated
	  when the second argument is false and controls abnormal
	  termination.  If \fIqexpr3\fR is true for a node, stop the
	  search and return the empty string.  (The name value of the
	  last node remains in $Q.)

	  If $QUERY returns a non-empty name value string, then the data
	  value of the node can be obtained by applying indirection.

	  Note: If the name value is a global, then the $QUERY function
		will update the naked reference.
.DE
.bp
.DS
.ul 1
Example
;Find subscripts containing the letter "A"
SET X=1,Y="ALPHA",Z="BETA"
KILL ^ABC
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3

SET TEST="$PIECE($Q,""("",2,999)[""A"""

SET A="^ABC"
SET B=$QUERY(@A,TEST)
SET C=$QUERY(@B,TEST)
SET D=$QUERY(@C,TEST)
.DE
.DS
After executing, the values of variables A-D are as follows:
	A = ^ABC
	B = ^ABC(1,"ALPHA")
	C = ^ABC(1,"ALPHA","BETA")
	D = the empty string
.DE
.sp 2
.DS
.ul 1
Example
;Find third level nodes that are greater than one
SET X=1,Y="ALPHA",Z="BETA"
KILL ^ABC
SET ^ABC(X)=1,^ABC(X,Y)=2,^ABC(X,Y,Z)=3

SET TEST="($QLENGTH(@$Q)=3)&(@$Q>1)"

SET A="^ABC"
SET B=$QUERY(@A,TEST)
SET C=$QUERY(@B,TEST)
.DE
.DS
After executing, the values of variables A-C are as follows:
	A = ^ABC
	B = ^ABC(1,"ALPHA","BETA")
	C = the empty string
.DE
.DS
.ul 1
Example
;Do a long search -- abnormally terminate after passing 1000 nodes
FOR I=1:1:2000 SET ^XYZ(I)="ABCDEF"

SET TEST="@$Q[""XYZ""" ; Find a data value containing "XYZ"
SET ABORT="$QC>1000"

SET A="^XYZ"
SET B=$QUERY(@A,TEST,ABORT)
IF B'="" WRITE !,"Search Succeeded -- Found ",B,"=",@B QUIT
IF $Q="" WRITE !,"Search Completed" QUIT
WRITE !,"Search Failed -- Aborted at ",$Q
QUIT
.DE
.DS
After executing, the values of A, B, and $Q are as follows:
	A = ^XYZ
	B = the empty string
	$Q = ^XYZ(1001)
The message "Search Failed -- Aborted at ^XYZ(1001)" will be displayed.
.DE
.bp
.P 1
A MUMPS implementation of the three-argument $QUERY function is given
below.  The first argument "START" is the starting point for the search.
The second argument "SEARCH" contains the expression for the search
criteria.  The third argument "ABORT" contains the expression
controlling abnormal termination.  In this example the single-argument
$QUERY function and the system variables "$Q", "$QC", and "$QT"
are presumed to exist.  For the purpose of this example, we allow
the assumption that $Q, $QC, and $QT can be set from MUMPS.  (We also
assume that the date does not change to simplify the time interval
testing.)

.DS
QUERY	;PMK@JHH -- Three-argument $QUERY function

ENTRY(START,SEARCH,ABORT)

	NEW ITIME,HIT

	SET ITIME=$PIECE($H,",",2) ; Initial time
	SET $Q=START,HIT=0
	FOR $QC=1:1 SET $Q=$QUERY(@$Q) QUIT:$Q=""  DO TEST QUIT:HIT
	IF HIT=-1 QUIT "" ; Abnormal termination -- return empty string
	QUIT $Q ; Normal termination -- return name value

TEST	SET $QT=$PIECE($H,",",2)-ITIME
	IF @SEARCH SET HIT=1 QUIT
	IF @ABORT SET HIT=-1 QUIT
	QUIT
.DE
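.P 1
The control flow of the three-argument form can also be modeled in a
few lines.  The Python sketch below is an illustration under stated
assumptions: the nodes are supplied as a list already in $QUERY order,
Python callables stand in for the two \fIqexpr\fR arguments, elapsed
time ($QT) is left out, and the name query3 is invented here.

```python
def query3(ordered_nodes, search, abort=None):
    """Model of the three-argument $QUERY.

    ordered_nodes: list of (name_value, data) pairs in traversal order.
    search(name, data): selects a node (stands in for the second argument).
    abort(name, data, count): stops the search (stands in for the third).
    Returns (result, q, count), where q models $Q and count models $QC.
    """
    count = 0
    for name, data in ordered_nodes:
        count += 1
        if search(name, data):
            return name, name, count      # normal termination
        if abort is not None and abort(name, data, count):
            return "", name, count        # abnormal termination
    return "", "", count                  # traversal exhausted


nodes = [("^XYZ(%d)" % i, "ABCDEF") for i in range(1, 2001)]
result, q, count = query3(
    nodes,
    search=lambda name, data: "XYZ" in data,
    abort=lambda name, data, count: count > 1000,
)
if result:
    print("Search Succeeded -- Found", result)
elif q == "":
    print("Search Completed")
else:
    print("Search Failed -- Aborted at", q)
```

As in the MUMPS routine, the abort test is consulted only for nodes
that the search test has rejected.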
.bp
.ul 1
A New Relational Operator for Comparing Name Values
.P 1
The relation [[ is called "descends from".  If A and B are name values,
then A[[B is true if and only if A is a descendent of B.
.P 1
The relation A[[B has the same value as $QNAME(@A,$QLENGTH(@B))=B.
Intuitively, A "contains" the subscripts of B.  The value of A[[B
is false if A is the empty string and B contains a name value.
If either A or B is not a name value, the relation is false.
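.P 1
Read literally, the defining identity makes "descends from" a prefix
test on subscripts.  The Python sketch below illustrates that reading
only; it represents a name value as a (name, subscripts) pair, and the
function name descends_from is an assumption of this example.

```python
def descends_from(a, b):
    """Model of A[[B: same name, and A's leading subscripts equal B's."""
    if a is None or b is None:
        return False                      # not a name value: the relation is false
    a_name, a_subs = a
    b_name, b_subs = b
    return a_name == b_name and tuple(a_subs[:len(b_subs)]) == tuple(b_subs)


parent = ("^ABC", ("1", "ALPHA"))
child = ("^ABC", ("1", "ALPHA", "BETA"))
print(descends_from(child, parent))   # the child lies in the parent's subtree
print(descends_from(parent, child))   # the parent does not descend from the child
```

Under this reading a name value also satisfies the relation with
itself, since truncating it to its own length leaves it unchanged.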
.sp 2
.ul 1
A New Relational Operator for Comparing Subscripts
.P 1
The relation ]] is called "sorts after".  A]]B is true if and only if
A follows B in the subscript ordering sequence defined by the $ORDER
function.
.P 1
This operator removes the need for type-dependent comparisons.  The
standard subscript ordering sequence is empty string first, followed by
numeric subscripts, and then the string subscripts.  The numeric
subscripts are ordered in increasing value, while the string subscripts
are ordered by the ASCII collating sequence.  Currently, in order to
determine which subscript sorts first, one must determine whether the
subscripts are numeric or string, and then use the numeric (">" or "<")
operators or the string "follows" operator ("]").
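.P 1
The ordering just described can be written down as a single comparison
key.  The Python sketch below is an illustration, not the proposal's
definition: the names collate_key and sorts_after are invented, and the
canonical-number test is a simplified assumption.

```python
import re
from decimal import Decimal

# Simplified test for a MUMPS canonical number (assumption: no length limits).
_CANONICAL = re.compile(r"^(0|-?([1-9]\d*(\.\d*[1-9])?|\.\d*[1-9]))$")


def collate_key(subscript):
    """Empty string first, then numbers by value, then strings in ASCII order."""
    text = str(subscript)
    if text == "":
        return (-1, "")
    if _CANONICAL.match(text):
        return (0, Decimal(text))
    return (1, text)


def sorts_after(a, b):
    """Model of A]]B: true when A follows B in subscript order."""
    return collate_key(a) > collate_key(b)


print(sorts_after("10", "9"))   # numeric comparison: 10 follows 9
print("10" > "9")               # plain string comparison gives the opposite answer
```

The two printed results differ, which is the case the single operator
is meant to handle without a separate numeric or string test.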
.bp
\fBExamples\fR
.P 1
.DS
The first example is subtree copy.

"X" contains the source reference in the form "^GLO(SS1,SS2,...,SSn,"
"Y" contains the destination reference in the same format

ENTRY	NEW A,A0,B,B0,I,L

	SET A=$E(X,1,$L(X)-1) SET:A["(" A=A_")" SET A=$QNAME(@A)
	SET B=$E(Y,1,$L(Y)-1) SET:B["(" B=B_")" SET B=$QNAME(@B)

	IF $D(@A)#2 SET @B=@A

	IF B'["(" SET B0=B_"("
	ELSE  SET B0=$E(B,1,$L(B)-1)_","

	;L points past the "(" or "," that follows the source prefix
	SET A0=A,L=$LENGTH(A)+1+(A'["(")


	FOR I=0:0 SET A=$QUERY(@A) QUIT:A'[[A0  SET B=B0_$E(A,L,999),@B=@A 

EXIT	QUIT
.DE
.P 1
The variable A runs through the source subtree nodes.  The variable
A0 is used for descendency checking.  It contains the original source
reference's name value.  All of the subtree nodes' name values begin
with this value.  The remainder of the name value contains subscripts
to be copied to the destination.  The variable L is a pointer to this
substring.
.P 1
The variable B0 contains the original destination reference's name
value in the "^GLO(SS1,SS2,...,SSn" format.  Each destination node's
name value is formed by concatenating this value and the additional
descendent subscripts.  The destination node's name value is stored
in the variable B.  The copy is performed by simple indirection.
.sp 3
.DS
.P 1
The second example is to print out all of a global from node "X"
to node "Y".

"X" and "Y" contain arbitrary nodes of a global in \fIglvn\fR format
Note:  "X" and "Y" can specify subscript levels

ENTRY	NEW A,B,I,N,OK
	SET A=$QNAME(@X),B=$QNAME(@Y),N=$QLENGTH(@B)
	DO TEST IF 'OK WRITE !!,"The first is after the second!" GOTO EXIT
	IF $D(@A)#2 DO PRINT
	FOR I=0:0 SET A=$QUERY(@A) QUIT:A=""  DO TEST QUIT:'OK  DO PRINT
EXIT	QUIT

PRINT	WRITE !,A,"=",@A QUIT

TEST	;OK=0 if A sorts after B at the first subscript where they differ
	SET OK=1
	FOR I=1:1:$QLENGTH(@A) Q:I>N  IF $QS(@A,I)'=$QS(@B,I) SET OK=$QS(@B,I)]]$QS(@A,I) Q
	QUIT
.DE
.sp 3
\fBCONCLUSIONS\fR
.P 1
The new proposed functions would provide MUMPS with the capability to
sequentially access a hierarchical database and perform ad-hoc data
retrievals.  They would provide additional capabilities for manipulating
name values and subscripts.  These are important improvements that
would be very useful in supporting more complex database tasks.

------------------------------

End of std-mumps Digest
******************************
-- 
Hokey           ..ihnp4!plus5!hokey
		  314-725-9492