DIAMOND.JON%forum.va.gov (04/11/91)
Pattern Definition Proposal|TAB||TAB|X11/SC1/TG1/91-2
Thursday, April 4, 1991
1. |TAB|IDENTIFICATION
1.1 |TAB|Title
Pattern definitions
1.2|TAB|MDC proposer and sponsor
Proposer:|TAB|Jon Diamond, Hoskyns Group, 130 Shaftesbury Avenue,
2. |TAB|JUSTIFICATION
2.1|TAB|Needs
The major need is to be able to define (or redefine) pattern
codes for processing text in non-English languages. Currently
there is no mechanism for specifying or extending pattern
matches. See the separate document for further information.
2.2|TAB|Existing practice in the area of the proposed change
As far as is known, no current implementation allows users to
define patterns, or to extend them beyond those defined by
the implementation.
3.|TAB|DESCRIPTION
3.1|TAB|General description of the proposed change
There are three parts to this proposal. The first gives
access to extended pattern capabilities by using structured
system variables to define the meaning of patcodes
explicitly.
The second extends the definition of patatom in the same way
that glvn has been extended to allow an environment
specification. This will allow applications to select between
different sets of pattern match tables, much as a global is
selected from a different environment in networking, and so
to switch between different languages' definitions of the
same patcodes.
The third allows an application to force usage of the
patcodes in the base ASCII set that we currently use,
overriding the current default set, without having to place a
reserved name in an environment specification. This will
allow programs to verify that, for example, entered
characters are alphabetic in all possible character sets
used.
3.2|TAB|Annotated examples of use
The current pattern codes can be set up in the following
fashion (for 7-bit characters), subject to access control:-
KILL ^$PATTERN
FOR I=0:1:31,127 SET ^$PATTERN("C","MEMBER",I)=""
FOR I=32:1:47 SET ^$PATTERN("P","MEMBER",I)=""
FOR I=48:1:57 SET ^$PATTERN("N","MEMBER",I)=""
FOR I=58:1:64 SET ^$PATTERN("P","MEMBER",I)=""
FOR I=65:1:90 SET ^$PATTERN("U","MEMBER",I)=""
FOR I=91:1:96 SET ^$PATTERN("P","MEMBER",I)=""
FOR I=97:1:122 SET ^$PATTERN("L","MEMBER",I)=""
FOR I=123:1:126 SET ^$PATTERN("P","MEMBER",I)=""
MERGE ^$PATTERN("A")=^$PATTERN("U"),^$PATTERN("A")=^$PATTERN("L")
MERGE ^$PATTERN("E")=^$PATTERN("C"),^$PATTERN("E")=^$PATTERN("P")
MERGE ^$PATTERN("E")=^$PATTERN("A"),^$PATTERN("E")=^$PATTERN("N")
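Since no implementation of ^$PATTERN exists to run against, the
table built above can be modelled roughly in Python as a mapping
from patcode to a set of character codes. This is only a sketch:
the names build_ascii_tables and matches are hypothetical, and
members are keyed by integer code, following the $ASCII form used
in section 3.3.

```python
def build_ascii_tables():
    """Return {patcode: set of 7-bit character codes} mirroring the SETs above."""
    ranges = {
        "C": [(0, 31), (127, 127)],
        "P": [(32, 47), (58, 64), (91, 96), (123, 126)],
        "N": [(48, 57)],
        "U": [(65, 90)],
        "L": [(97, 122)],
    }
    tables = {
        code: {i for lo, hi in spans for i in range(lo, hi + 1)}
        for code, spans in ranges.items()
    }
    # The MERGE steps: A is U plus L, and E is every class combined.
    tables["A"] = tables["U"] | tables["L"]
    tables["E"] = tables["C"] | tables["P"] | tables["A"] | tables["N"]
    return tables


def matches(tables, patcode, char):
    """Model of $DATA(^$PATTERN(patcode,"MEMBER",$ASCII(char)))."""
    return ord(char) in tables.get(patcode, set())


tables = build_ascii_tables()
print(matches(tables, "U", "Q"), matches(tables, "U", "q"), len(tables["E"]))
```

In this model the E class ends up covering all 128 seven-bit
codes, which is what the three MERGE commands are intended to
achieve.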
Adding characters to these pattern codes for other languages
would require coding like:-
SET ^$PATTERN("U","MEMBER",$A(""))=""
Programs would then perform pattern matches in exactly the
same way as they do now and get the expected results, e.g. the
following:-
SET A=""
WRITE A?1U
would produce the result 0 currently and 1 with the above SET
having taken place.
In an environment which normally runs in English, but also
has pattern code tables set up for German, the previous
example would need modifying, since the SET would apply only
to the ^$PATTERN for German.
SET A=""
WRITE A?1U
produces 0 in the normal (English) case, but
SET A=""
WRITE A?1|"GERMAN"|U
would produce the value 1, as expected.
Given the logical extension to environments, the German
table might be set up by, say,
KILL ^|"GERMAN"|$PATTERN
MERGE ^|"GERMAN"|$PATTERN=^$PATTERN
SET ^|"GERMAN"|$PATTERN("U","MEMBER",$A(""))=""
SET ^|"GERMAN"|$PATTERN("L","MEMBER",$A(""))=""
...
MERGE ^|"GERMAN"|$PATTERN("A")=^|"GERMAN"|$PATTERN("L")
MERGE ^|"GERMAN"|$PATTERN("A")=^|"GERMAN"|$PATTERN("U")
and a complete switch to German from English by
SET ^$JOB($J,"PATTERN")="GERMAN"
This last action is analogous to changing the environment for
globals in networking, which according to current proposals
would be achieved by
SET ^$JOB($J,"GLOBAL")="XYZ"
NOTE Whether any change of environment is possible is an
access security issue. See X11/SC7/TG1/91-3 for more details.
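The relationship between an explicit environment on a pattern
match and the job-wide default can be sketched in Python as
follows. This is a model under stated assumptions, not proposed
text: the names environments, job and matches are hypothetical,
and the two accented characters are chosen only for illustration.

```python
import copy


def ascii_tables():
    """Minimal English tables: only the U, L and A classes are needed here."""
    upper = set(range(65, 91))
    lower = set(range(97, 123))
    return {"U": upper, "L": lower, "A": upper | lower}


# Named pattern environments; "" stands for the default (English) one.
environments = {"": ascii_tables()}

# Like MERGE ^|"GERMAN"|$PATTERN=^$PATTERN, then extending U, L and A.
german = copy.deepcopy(environments[""])
german["U"].add(ord("\u00c4"))  # assumed example: A with umlaut
german["L"].add(ord("\u00e4"))  # assumed example: a with umlaut
german["A"] = german["U"] | german["L"]
environments["GERMAN"] = german

job = {"PATTERN": ""}  # models ^$JOB($J,"PATTERN")


def matches(patcode, char, env=None):
    """Use the explicit environment if given, else the job default."""
    name = job["PATTERN"] if env is None else env
    return ord(char) in environments[name].get(patcode, set())


a = "\u00c4"
before = matches("U", a)              # English default
explicit = matches("U", a, "GERMAN")  # like A?1|"GERMAN"|U
job["PATTERN"] = "GERMAN"             # like SET ^$JOB($J,"PATTERN")="GERMAN"
after = matches("U", a)
print(before, explicit, after)
```

The deep copy matters in this model: extending the German tables
must leave the default tables untouched, just as the MERGE into
^|"GERMAN"|$PATTERN copies rather than shares nodes.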
The final change proposed would be used to check whether a
character belongs to a patcode in the base ASCII set. If an
application were running in the German environment, but
needed to know whether a character was portable to a
non-German environment, the coding would be:-
IF A?1~A
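A small Python model may make the intent of the ~ form clearer:
the base ASCII definition is fixed, while the current definition
can be extended. The names BASE_ALPHA, current, matches and
is_portable_alpha are hypothetical, and the extra code 228 is an
assumed example of a locally added letter.

```python
# Fixed ASCII meaning of patcode A: upper and lower case letters.
BASE_ALPHA = set(range(65, 91)) | set(range(97, 123))

# Current (redefinable) table for A; code 228 is an assumed local addition.
current = {"A": BASE_ALPHA | {228}}


def matches(patcode, char, force_base=False):
    """force_base=True models the proposed ~patcode form."""
    table = {"A": BASE_ALPHA} if force_base else current
    return ord(char) in table.get(patcode, set())


def is_portable_alpha(char):
    """Model of IF A?1~A: alphabetic under the base ASCII definition."""
    return matches("A", char, force_base=True)


print(matches("A", chr(228)), is_portable_alpha(chr(228)), is_portable_alpha("x"))
```

A character can therefore be alphabetic in the current
environment without being portable, which is the case the ~ form
is meant to detect.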
3.3|TAB|Formalization
1. Extended pattern code definition
In part I section 2.2 add the definition of ^$PATTERN after
the ^$LOCK structured system variable:
^$P[ATTERN]
|TAB|will provide information regarding pattern codes and
their definition
In section 2.3.3 replace the definition of patcode with
patcode ::= | alpha | ...
At the end of the following paragraph replace
", as follows."
with
". A character "char" belongs to a patcode class if
$DATA(^$PATTERN(patcode,"MEMBER",$ASCII(char))) is true. The
initial values for the following patcodes are defined:"
Add another section to Part II
x. Pattern codes
The only pattern codes that are required to be provided are
A, C, E, L, N, P, U with the definitions as per section 2.3.3
of Part I. Portable programs cannot rely on changing the
default environment specification.
Add another section (not sure where to)
x. ssvn semantics
The following ssvns are defined:-
^$PATTERN(patcode,"MEMBER",intexpr) = ""
The SET command can be used to assign a value to this ssvn.
The KILL command can be used to delete individual nodes,
subtrees or the entire ssvn. The meaning is that implied by
section 2.3.3.
^$JOB($J,"PATTERN") = default pattern environment
specification
(The remaining text from X11/SC7/TG1/91-3 section 3.3 also
applies to this entry in ^$JOB.)
2. Alternate pattern code access
In part I section 2.3.3 replace the definition of patatom
with
                     | [ environment ] patcode |
patatom ::= repcount |                         |
                     | strlit                  |
See section 3.2.2.2 for a definition of environment.
3. ASCII pattern access
In part I section 2.3.3 add to the definition of patatom
another option (after the repcount)
                     | ~ patcode               |
making the definition of patatom
                     | [ environment ] patcode |
patatom ::= repcount | ~ patcode               |
                     | strlit                  |
and add a new paragraph
Where the form ~ patcode is used, the characters which match
the patcode are defined to be those described below,
irrespective of the current definition of the patcode.
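To illustrate how the three alternatives of the extended patatom
differ syntactically, here is a hedged Python sketch of a
recogniser. It is not part of the formalization: parse_patatom is
a hypothetical name, the environment is assumed to be a string
literal between vertical bars as in the examples of section 3.2,
and a patcode is assumed to be one or more letters.

```python
import re

PATATOM = re.compile(
    r'(?P<repcount>\d*\.\d*|\d+)'                       # n, n.m, n., .m or .
    r'(?:'
    r'(?:\|"(?P<env>[^"]*)"\|)?(?P<patcode>[A-Za-z]+)'  # [ environment ] patcode
    r'|~(?P<base>[A-Za-z]+)'                            # ~ patcode
    r'|"(?P<strlit>(?:[^"]|"")*)"'                      # strlit
    r')'
)


def parse_patatom(text):
    """Return (repcount, kind, environment, value) for one patatom."""
    m = PATATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a patatom: {text!r}")
    if m["strlit"] is not None:
        # A doubled quote inside a strlit stands for one quote character.
        return (m["repcount"], "strlit", None, m["strlit"].replace('""', '"'))
    if m["base"] is not None:
        return (m["repcount"], "base", None, m["base"])
    return (m["repcount"], "patcode", m["env"], m["patcode"])


for atom in ['1U', '1|"GERMAN"|U', '1~A', '1"U"']:
    print(atom, parse_patatom(atom))
```

The sketch keeps 1U and 1"U" distinct: the first tests membership
of the U patcode, the second matches the literal character U.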
4.|TAB|IMPLEMENTATION IMPACTS
4.1|TAB|Impact on existing user practices and investments
Existing applications written for the English language would
be easier to apply to other languages. No existing
applications should be affected.
4.2|TAB|Impact on Existing Vendor Practices and Investments
The impact on vendors is not insignificant, although some
vendors have experience with the problems of different
languages. A new table mechanism will need to be set up
within implementations to allow for the variability of
pattern match codes. This will need to be definable in a
similar way to the existing UCI/directory/namespace concepts
for globals, modifiable by a (restricted class of) user etc.