DIAMOND.JON%forum.va.gov (04/11/91)
Pattern Definition Proposal        X11/SC1/TG1/91-2
Thursday, April 4, 1991

1.  IDENTIFICATION

1.1  Title

Pattern definitions

1.2  MDC proposer and sponsor

Proposer:  Jon Diamond, Hoskyns Group, 130 Shaftesbury Avenue,

2.  JUSTIFICATION

2.1  Needs

The major need is to be able to define (or re-define) pattern codes for processing text in non-English languages. Currently there is no mechanism for specifying or extending pattern matches. See separate document for further information.

2.2  Existing practice in the area of the proposed change

As far as is known, no current implementation allows user-defined patterns, or any extension beyond the patterns defined by the implementation.

3.  DESCRIPTION

3.1  General description of the proposed change

There are three parts to this proposal.

The first gives access to extended pattern capabilities by using structured system variables to define the meaning of patcodes explicitly.

The second extends the definition of patatom in the same way that glvn has been extended to allow an environment specification. This will allow applications to select between different sets of pattern match tables, much as a global is selected from a different environment in networking, and so to switch between different languages' definitions of the same patcodes.

The third allows an application to force usage of the patcodes in the base ASCII set that we currently use, overriding the current default set, without having to place a reserved name in an environment specification. This will allow programs to verify that, for example, entered characters are alphabetic in all possible character sets used.
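
As a rough illustration only, the three parts can be emulated in standard M with an ordinary global. Nothing below is part of the proposal: the routine name PATDEMO, the global ^PATTAB, the labels and the table name "ASCII" are all assumptions made for this sketch, and character code 220 (U-umlaut in ISO 8859-1) is picked purely as an example of a non-ASCII letter.

```mumps
PATDEMO ; hypothetical routine, not part of the proposal
 ; ^PATTAB(env,patcode,"MEMBER",code) is an ordinary global standing in
 ; for the proposed ^|env|$PATTERN; every name here is an assumption.
 QUIT
 ;
SETUP ; build a base table and a German copy (upper case only, for brevity)
 NEW I
 KILL ^PATTAB
 FOR I=65:1:90 SET ^PATTAB("ASCII","U","MEMBER",I)=""
 MERGE ^PATTAB("GERMAN")=^PATTAB("ASCII")
 ; 220 is U-umlaut in ISO 8859-1, chosen only as an illustration
 SET ^PATTAB("GERMAN","U","MEMBER",220)=""
 QUIT
 ;
INCLASS(CH,CODE,ENV) ; parts 1 and 2: 1 if CH is in patcode CODE of table ENV
 QUIT $DATA(^PATTAB(ENV,CODE,"MEMBER",$ASCII(CH)))>0
 ;
INBASE(CH,CODE) ; part 3, like ~patcode: always consult the base table
 QUIT $$INCLASS(CH,CODE,"ASCII")
```

After DO SETUP^PATDEMO, on an implementation with 8-bit characters, $$INCLASS^PATDEMO($CHAR(220),"U","GERMAN") should give 1, while $$INBASE^PATDEMO($CHAR(220),"U") should give 0, mirroring the difference between an environment-qualified patcode and the ~ form. The proposal itself would make these lookups part of the pattern match operator rather than function calls.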

3.2  Annotated examples of use

The current pattern codes can be set up in the following fashion (for 7-bit characters), subject to access control:-

    KILL ^$PATTERN
    FOR I=0:1:31,127 SET ^$PATTERN("C","MEMBER",I)=""
    FOR I=32:1:47 SET ^$PATTERN("P","MEMBER",I)=""
    FOR I=48:1:57 SET ^$PATTERN("N","MEMBER",I)=""
    FOR I=58:1:64 SET ^$PATTERN("P","MEMBER",I)=""
    FOR I=65:1:90 SET ^$PATTERN("U","MEMBER",I)=""
    FOR I=91:1:96 SET ^$PATTERN("P","MEMBER",I)=""
    FOR I=97:1:122 SET ^$PATTERN("L","MEMBER",I)=""
    FOR I=123:1:126 SET ^$PATTERN("P","MEMBER",I)=""
    MERGE ^$PATTERN("A")=^$PATTERN("U"),^$PATTERN("A")=^$PATTERN("L")
    MERGE ^$PATTERN("E")=^$PATTERN("C"),^$PATTERN("E")=^$PATTERN("P")
    MERGE ^$PATTERN("E")=^$PATTERN("A"),^$PATTERN("E")=^$PATTERN("N")

The third subscript is the character code, as in the formal definition in section 3.3.

To add additional characters to these pattern codes for other languages would require coding like:-

    SET ^$PATTERN("U","MEMBER",$A(""))=""

Programs would then perform pattern matches in exactly the same way as they do now and get the expected results, e.g. the following:-

    SET A=""
    WRITE A?1U

would produce the result 0 currently and 1 with the above SET having taken place.

In an environment which normally runs in English, but also has pattern code tables set up for German, the previous example would need modifying, since the SET would only apply to the ^$PATTERN for German.

    SET A=""
    WRITE A?1U

produces 0 in the normal (English) case, but

    SET A=""
    WRITE A?1|"GERMAN"|U

would produce the value 1, as expected.

Given the logical extension to environments, the German table might be set up by, say,

    KILL ^|"GERMAN"|$PATTERN
    MERGE ^|"GERMAN"|$PATTERN=^$PATTERN
    SET ^|"GERMAN"|$PATTERN("U","MEMBER",$A(""))=""
    SET ^|"GERMAN"|$PATTERN("L","MEMBER",$A(""))=""
    ...
    MERGE ^|"GERMAN"|$PATTERN("A")=^|"GERMAN"|$PATTERN("L")
    MERGE ^|"GERMAN"|$PATTERN("A")=^|"GERMAN"|$PATTERN("U")

and a complete switch to German from English by

    SET ^$JOB($J,"PATTERN")="GERMAN"

This last action is analogous to changing the environment for networking for globals, which according to current proposals would be achieved by

    SET ^$JOB($J,"GLOBAL")="XYZ"

NOTE  Whether any change of environment is possible is an access security issue. See X11/SC7/TG1/91-3 for more details.

The final change proposed would be used to check whether a character was ASCII etc. Therefore if an application was running in the German environment, but needed to know whether the character was portable to a non-German environment, the coding would be:-

    IF A?1~A

3.3  Formalization

1.  Extended pattern code definition

In part I section 2.2 add the definition of ^$PATTERN after the $LOCK structured system variable:

    ^$P[ATTERN]    will provide information regarding pattern codes and their definition

In section 2.3.3 replace the definition of patcode with

    patcode ::= | alpha | ...

At the end of the following paragraph replace ", as follows." with ". A character "char" belongs to a patcode class if $DATA(^$PATTERN(patcode,"MEMBER",$ASCII(char))) is true. The initial values for the following patcodes are defined:"

Add another section to Part II

x.  Pattern codes

The only pattern codes that are required to be provided are A, C, E, L, N, P, U with the definitions as per section 2.3.3 of Part I. Portable programs cannot rely on changing the default environment specification.

Add another section (not sure where to)

x.  ssvn semantics

The following ssvns are defined:-

^$PATTERN(patcode,"MEMBER",intexpr) = ""

The SET command can be used to assign a value to this ssvn. The KILL command can be used to delete individual nodes, sub-trees or the entire ssvn. The meaning is that implied by section 2.3.3.

^$JOB($J,"PATTERN") = default pattern environment specification

(The remaining text from X11/SC7/TG1/91-3 section 3.3 also applies to this entry in ^$JOB.)

2.  Alternate pattern code access

In part I section 2.3.3 replace the definition of patatom with

                         | [ environment ] patcode |
    patatom ::= repcount |                         |
                         | strlit                  |

See section 3.2.2.2 for a definition of environment.

3.  ASCII pattern access

In part I section 2.3.3 add to the definition of patatom another option (after the repcount)

    | ~ patcode |

making the definition of patatom

                         | [ environment ] patcode |
    patatom ::= repcount | ~ patcode               |
                         | strlit                  |

and add a new paragraph

Where the form ~ patcode is used then the characters which match the patcode are defined to be those described below, irrespective of the current definition of the patcode.

4.  IMPLEMENTATION IMPACTS

4.1  Impact on existing user practices and investments

Existing applications written for the English language would be easier to apply to other languages. No existing applications should be affected.

4.2  Impact on Existing Vendor Practices and Investments

The impact on vendors is not insignificant, although some vendors have experience with the problems of different languages. A new table mechanism will need to be set up within implementations to allow for the variability of pattern match codes. This will need to be definable in a similar way to the existing UCI/directory/namespace concepts for globals, and modifiable by a (restricted class of) user etc.

-- Hokey
We are Space Guys. We know what we are doing.