thomson@utah-cs.UUCP (Richard A Thomson) (03/08/88)
Amiga FORTH Newsletter
Volume 1
Number 2
March 8th, 1988
[ I'm posting just the contents of AFN V1, #2 along with the first document
since the total length is 157K. If you wish to obtain the whole thing,
you can contact me (Rich Thomson), or Marcus Gabriel. Comments to AFN V1
numbers 1 and 2 will appear in AFN V1, #3 -- RT ]
Contents: Introduction
Forth-83.Txt (Offered but not included, 200114 bytes)
FStrings Package
Files: FStrings.Txt (56796 bytes)
FStrings.Scr (26432 bytes)
FSTest.Scr (28543 bytes)
MFStrings.Scr (27381 bytes)
GStrings Proposal
File: GStrings.Pro (10981 bytes)
WARNING CONCERNING THE MULTI-FORTH ASSEMBLER AND MFSTRINGS.SCR
THANK YOU Richard Thomson
Sources: marcus@newton.physics.purdue.edu (Marcus D. Gabriel)
East Coast Forth Board (ECFB)
(703-442-8695)(SYSOP: Jerry Shifrin, nice PCBoard)
___________________________________________________________________________
Subject: FStrings Package by George T. Hawkins
From: marcus@newton.physics.purdue.edu (Marcus D. Gabriel)
To One and All:
Allow me to begin by explaining the one offered file which is not
included here. The file Forth-83.Txt is the 89 page "FORTH-83 STANDARD,"
a publication of the FORTH STANDARDS TEAM, and it is reproducible
in whole or in part. Their address is
FORTH STANDARDS TEAM
P.O. BOX 4545
MOUNTAIN VIEW, CA 94040
USA
If you would like this document, please e-mail a note to this effect to
marcus@newton.physics.purdue.edu (Marcus D. Gabriel), and I will respond either
by sending it to you e-mail, or if there is sufficient response and when
Richard Thomson has time to create a completed mailing list, I will
propagate it to the group.
Last January I downloaded from the ECFB George T. Hawkins' FSTRINGS
package, adapted it to Multi-Forth in January, and put some more work into
it in late February. The file MFStrings.Scr is the result, more on this
file later. The file FStrings.Txt is George Hawkins' original narrative
on his FSTRINGS package, and this file is highly recommended reading.
The File FStrings.Scr is George Hawkins' original source file (FORTH-83
Standard) for the implementation of the abstract string operators of
FSTRINGS, although I have converted it from a block file to a stream file.
This file is included for three reasons:
i) George Hawkins' code is enjoyable and, I believe, instructive
to read.
ii) For those with JFORTH or MVP Forth, this will give additional
aid in adapting FSTRINGS.
iii) You may wish to adapt this package differently than myself,
and thus you have the original source.
The file FSTest.Scr is George Hawkins' Test/Validation file for
FStrings.Scr or, in this case, for MFStrings.Scr also. It has been
minimally modified in order to compile under Multi-Forth, and you should
have no difficulty in seeing where I have made my changes, there documented
or obvious. This file will help if you are adapting or re-adapting
FSTRINGS. The file MFStrings.Scr is my adaptation of FSTRINGS to
Multi-Forth.
I have coded all of the words that George Hawkins recommened
and many others besides these. I have defined two words for "dynamically"
allocating a temporary string storage buffer, eliminated some internal
words through the use of local variables, factored some words of generic
interest beyond this package, etc. If you do not like the way I did
something, please feel free to change it or bring it up for discussion.
I must admit that upon review, in some instances, but not all, I may have
gotten carried away with the use of local variables. In these cases, I
believe I was trying to make myself understand each and every word without
skimming over them by simply reworking them, even to excess.
C'est la vie :-).
If you have questions or problems, you can e-mail me a note if you do
not think it is of generic interest to the Amiga FORTH Newsletter,
otherwise please feel free to participate.
The file GStrings.Pro (I gather the pun was intended) was posted to the
ECFB by Rj Brown, and it is a proposed incremental improvement over
FSTRINGS. The abstract string operators would stay the same, it is the
implementation of these that would change. In essence, at the ECFB,
they are discussing lists and list processing as applied to strings.
See THE ART OF COMPUTER PROGRAMMING, FUNDAMENTAL ALGORIHMS, by Donald
Knuth. I am sure that Richard Thomson or others of the Amiga FORTH
Newsletter could supply additional or better references. I include this
file as an example of where we might take this discussion of strings,
as a group, if we wished, although it is not necessarily specific to the
Amiga. On the other hand, one does use strings with the Amiga.
WARNING AND SOLUTION
--------------------
The Multi-Forth Assembler has an interesting "feature." On page
43 of Chapter 21, see
CMPM Aa Ab size CMPM, test:<(Ab)-(Aa)>
whereas in fact the test <(Aa)-(Ab)> is performed, opposite of CMP, ,
CMPA, , or the Motorola convention. However note that on page 35 of
Chapter 21, -TEXT is coded correctly, that is, given that CMPM, functions
oppositely from the documentation. Note that I am refering to an old
manuel, but after calling CSI, I gather that the new manuel has the same
discrepency. At least they, and you, have been notified.
Find CMPM, in the the Multi-Forth Assembler and change the
original CMPM,
: CMPM, ( Aa\Ab\sz -- MEMORY MEMORY COMPARE)
B108 ,OP 3 AND !SIZE 6 SCALE
SWAP ?AREG OR SWAP ?AREG
SWAP 9 SCALE OR <OROP -OS ;
to
: CMPM, ( Aa\Ab\sz -- MEMORY MEMORY COMPARE)
( Modified by MDG Feb 1988 )
>R 2SWAP R>
B108 ,OP 3 AND !SIZE 6 SCALE
SWAP ?AREG OR SWAP ?AREG
SWAP 9 SCALE OR <OROP -OS ;
Remember that an Effective Address (ea) places TWO items on the Top Of
the Stack (TOS), hence the 2SWAP .
I made this change so that CMPM, functioned as CMP, , CMPA, , and
respected the Motorala convention as did CMP, and CMPA, . YOU MUST
DO THIS IN ORDER FOR COMPARE , ?SAME , AND _$$COMPARE OF MFSTRINGS.SCR
TO WORK AS ADVERTISED, OTHERWISE YOU MUST RECODE THEM.
END WARNING AND SOLUTION
------------------------
Finally, THANK YOU Richard Thomson for such a nice idea!
Enjoy
Marcus D. Gabriel
marcus@newton.physics.purdue.edu
==[ File: FStrings.Txt (56796 bytes) ]==CUT=HERE=========================
FORTH STRINGS
TABLE OF CONTENTS:
1. OVERVIEW
2. SYNTAX
3. PARAMETERS/DATA STACK CONVENTIONS
4. INTERNAL REPRESENTATION/STRUCTURE OF STRINGS
5. DATA ABSTRACTION
6. MAJOR STRING FUNCTIONS
6.0. A NOTE ON THE SYMBOLOGY USED HEREIN
6.1. STRING LITERALS
6.2. STRING DEFINITION
6.3. STRING CONSTANTS
6.4. STRING VARIABLES
6.5. STRING REFERENCE
6.6. BASIC STRING MANIPULATION
6.7. STRING/CHARACTER FETCHES/STORES
6.8. STRING INSERTIONS
6.9. STRING DELETIONS
6.10. STRING REPLACEMENTS
6.11. STRING ROTATIONS
6.12. STRING COMPARISONS
6.13. STRING PATTERN MATCHING
6.14. STRING SET OPERATIONS
6.15. STRING TRANSLATION
6.16. STRING ENCODING/DECODING
7. AREAS NOT COVERED
7.1. INPUT/OUTPUT
7.2. STRING STACK
7.3. PARSING STRINGS
8. THE FORTH SOURCE FILE
9. THE VALIDATION FILE
10. FORTH-83 DEVIATIONS
11. INTERNALS
11.1. INTERNAL STRING STRUCTURE
11.2. VISIBILITY
11.3. ERROR CHECKING
11.4. MEMORY OPERATORS
11.5. THE ISSUE OF STANDARDS
11.6. NULL STRINGS
11.7. CONVERSION PROBLEMS
12. GLOSSARY
1. OVERVIEW
This file discusses the concept of string manipulation primitives
within the Forth language.
The formulation attempts to be as general as possible in both problem
approach and solution, that is, in general, no a priori assumptions are
made about what is "best" nor about any imposed constraints, and in
general, I strive for the most complete, minimal solution.
2. SYNTAX
Following the general Forth philosophy of making word names short yet
meaningful, the convention of using the dollar sign "$" to represent
string operations is used throughout. All string words begin with an
initial "$" with two exceptions. Additionally, where possible, other
Forth naming conventions are "blended in".
For example, the Forth word chosen to copy a string is merely "$$!"
(pronounced "s-s-store"). Which means to copy/store (!) one string ($)
into another string ($).
In a like fashion, the word for string concatenation is "$$+" and the
word for typing a string is "$.". This seems, to me, more meaningful (in
the context of Forth) than, say, the words "STRCAT" or "STRPRINT".
As a caveat, I would like to add that I am NOT an advocate of BASIC
and the use of the "$" prefix I do not regard as giving the string
operators presented here a "BASIC-like" flavor. In the Forth source code
provided (and in the this documentation as well) a "pronunciation guide"
is given as a suggestion for pronouncing the string manipulation
primitives provided herein. It can be found (in double quotes) to the far
right of the block on the line which begins the colon definition.
Two successive dollar signs "$$" as a word prefix indicate that two
string references are needed (e.g., the concatenation word $$+ previously
given). In general there is no restriction on the two string references so
long as they are validly defined strings. The two references may, in fact,
be the same. For example, if the string variable "STV" contains the value
"Hello there!", then the code: "STV STV $$+" would give STV the value
"Hello there!Hello there!". (Of course the "insert string in string" word
($$INS) and the "replace string in string" word ($$REP) - to be introduced
later - should not contain the same reference for both strings.) Whenever
a string primitive begins with "$$" the first string is always understood
to be the "source" string and the second string is always the "target"
string. So, using the concatenation example again, the code:
"STV1 STV2 $$+" would add/concatenate STV1 to the end of STV2.
A prefix of "$C" indicates that a string and a character reference are
needed/returned. For example, the string operator "$C!" (pronounced
"s-c-store") takes a character, a string, and an index (which starts at 0)
into the string and stores the character in the string at the designated
index. Using the previous variable STV, then the code: "65 STV 1 $C!"
would give STV the value "HAllo there!Hello there!".
A single "$" indicates that only a single string reference is needed.
For example, the string operator "$NULL" (pronounced "string-null") sets a
given string to null.
Hopefully the pattern and logic behind the syntax chosen for Forth
string operators will become more apparent as additional examples are
given.
3. PARAMETERS/DATA STACK CONVENTIONS
The method selected to pass parameters to/from the string operators
provided herein follows a regular, consistent form. Operands (e.g.,
strings and characters) always precede operators (e.g., indices, lengths,
and counts). The general form of parameter passing is:
{operands} {operators}
where {operands} =
{addr,length|char|string} {string}
and {operators} =
{index} {length} {number} {status} {flag}
The enclosing brackets (i.e., "{" and "}") indicate the optional
occurrence of a parameter and the parameters themselves are:
addr = a machine address,
char = an 8 bit ASCII character,
string = an abstract string reference which may be further
qualified if needed (e.g., "string1"),
index = an index into a string (0..n),
length = the length of the (sub)string in question,
number = a number or character count,
status = a set of values indicating possible outcomes, and
flag = a TRUE/FALSE flag (a returned value, designated
as "t | f").
Finally, the general Forth usage of "consuming" a data stack item is
followed.
4. INTERNAL REPRESENTATION/STRUCTURE OF STRINGS
As a start, I feel one should clearly separate the actual internal
representation of Forth strings from their external appearance.
That is, the string data type (just as any other data type) should be
abstract. The Forth programmer should be given a rich enough set of
string primitives so that he/she need not have to be concerned with the
internal representation.
The implementation of the string manipulation capability itself must,
however, be closely concerned with the internal representation/structure.
In general, two major classes of representation for strings have
tended to evolve: counted strings and (null) terminated strings.
Terminated strings (also called ASCIIZ strings) are simply strings
which are terminated by a special symbol (generally the ASCII zero, or nul
character). This is the representation chosen, for example, by the C
language.
Counted strings are strings which reserve some initial portion of the
data structure itself to contain the following physical string length.
There are, of course, advantages to both string types.
The chief advantage of the terminated string type is that the string
may be arbitrarily long.
The chief advantage of the counted string is that it is fast (i.e.,
many manipulations on strings require knowing the string length and this
can be determined immediately in the case of counted strings).
A disadvantage with the terminated string is that the special
terminating symbol cannot be used within the body of the string and a
disadvantage with the counted string is that there is always some fixed
(system imposed) limitation on the maximum string size.
All in all, I feel the counted string representation is the superior
choice for Forth. In Forth, speed is always a consideration and the
problem of having a fixed string size limitation is resolved to a large
extent by ensuring that the implementation treat strings as an abstract
data type.
As a consequence, the string implementation given here uses counted
strings. I did not however select the current Forth usage (e.g., as
reflected in the word COUNT) of representing a counted string with a
preceding count byte. My feeling was that in our (current) era of 32-bit
micros, a single byte limiting the string length to 255 characters is
simply too restrictive. I have, rather, used an initial 16-bit word to
represent the current string length. Note also that this internal
representation can be easily changed without, in general, requiring
application program modifications since the string is implemented as an
abstract data type.
Let me make the emphatic point that the string operators defined here
are *totally* independent of implementation considerations. That is, there
is nothing in the definition of the string primitives provided which, in
any way, requires any assumptions about the underlying string structure.
This point may seem minor, but it is an extremely important one.
5. DATA ABSTRACTION
Having just discussed some of the considerations involved in the
choice of the internal representation/structure of strings, I would now
ask that the reader (from this point forward) ignore these considerations
entirely! The reason is that such considerations should only be a factor
for implementation (i.e., a vendor concern or a concern if you implement a
string package yourself). The string type, from the programmer's
perspective, should be an abstract data type.
The power and advantage of data abstraction, in general, is well
known, so I shall not go into further detail here. The key issue is,
rather, how one reflects the abstraction at the programming level.
A natural choice would seem that of implementing strings as an
address/pointer type. That is, any reference to a string merely leaves an
address upon the data stack. This address is meaningless (per se) to the
programmer, but is the "hook" used by all of the underlying string
primitives to perform string operations.
Of course, this implies, that the set of string primitives must be
sufficiently powerful not to require the direct programming access of the
abstract data type.
Not only does the use of a pointer type keep the underlying
implementation hidden, but it (as well) allows string references to be
treated as single data elements (since the FORTH-83 standard defines the
address type to be the same as an unsigned number which is represented as
a single stack element). Thus, if two string references are on TOS they
may be swapped via "SWAP" rather than having to introduce a new word such
as "$SWAP". I should add that I do not agree with any number of the
particulars of the FORTH-83 definition in this regard, but (at least for
this subcase) it serves the purpose well.
Although the definition and implementation of string types as abstract
data structures provides immeasurable benefit in the ability to write
clean, clear code using strings - it provides another major benefit as
well. Portability. That is, if strings (or any other data type) are
abstract, then they are prohibited (at the programming/application level)
from assuming anything about the underlying system. The underlying
details must, of course, be handled by the Forth system - but, they more
properly belong at this "hidden" level anyway. Since the code/application
can assume nothing about the underlying implementation, then - by
definition - the code is portable.
The only "wrinkle" in the above logic is that, unfortunately, some
sort of "interface" may be needed between strings and some other vendor
supplied package/subsystem (e.g., I/O) which is non-portable. To this end,
an "import" and "export" capability has been added which "translates"
between (to/from) the internal/abstract string representation and an
external/memory sequential one (which seems to be a common format across
vendors for the data content of strings). Note that (ideally) this
import/export capability should only be temporary. That is, once other
extensions to the Forth language have been defined which achieve true
portability/abstraction, then the need for importing and exporting will go
away.
6. MAJOR STRING FUNCTIONS
The major string functions proposed are discussed next. In general
they fall into the categories of:
- String literals
- String constants
- String variables
- String reference
- Basic string manipulation
- String/character fetches/stores
- String insertions
- String deletions
- String replacements
- String rotations
- String comparisons
- String pattern matching
- String set operators
- String translation
- String encoding/decoding
Admittedly the taxonomy is arbitrary but it should, hopefully,
represent a good presentation framework and it should, at least, be
exhaustive. All reasonable string functions I can think of are either
represented within the primitives given or capable of being built from
these primitives.
6.0. A NOTE ON THE SYMBOLOGY USED HEREIN
In discussing the major string functions to follow, excerpts from the
code in "FSTRINGS.SCR" will be used. Section 3 (i.e., PARAMETERS/DATA
STACK CONVENTIONS) explains the symbology used here.
6.1. STRING LITERALS
As a way of introducing the string data type into a program, a string
literal capability must be provided.
Since a string may be any arbitrary sequence of characters, some way
of delimiting the string must be found which does not interfere with the
capability to define an arbitrary sequence of characters.
The string literal (interpretative) is introduced via:
$LIT "This is a string"
It's action is to read the input stream until a non-blank character
(i.e., a "delimiter") is encountered and to return the read string (until
a subsequent delimiter is encountered) as an abstract string reference
upon the stack. The delimiters are not included as part of the returned
string. However, embedded delimiters may be included in the string by
simply using two consecutive occurrences of the delimiter.
The definition of $LIT (as it occurs within the code) is:
\ : $LIT ( -- string ) \ "s-lit"
\ Two string literal primitives are provided - "$LIT" for
\ interpretation and "[$LIT]" for colon definitions - otherwise
\ their actions are the same.
\ The string in the input stream following $LIT (or [$LIT]) is
\ delimited by the first following non-blank character. The
\ string is defined by all characters immediately following the
\ delimiting character until a subsequent delimiter is
\ encountered in the input stream. The delimiter, itself, may
\ also be included in the string by using two successive
\ delimiters to represent a single delimiter. For example:
\ $LIT "The double quote character is """
Within a colon definition "[$LIT]" at run time is equivalent to $LIT
interpretatively. Thus the word "X", defined
as:
: X [$LIT] ' This is a string' ;
leaves the string " This is a string" as a reference upon TOS when
executed, and is equivalent to the interpretative sequence:
$LIT | This is a string|
The use of "$LIT" leaves the abstract string data type on TOS. This
data type is used in the remainder of the string primitives provided
herein.
The above use of [$LIT] within a compiled word cannot be used when run
time interactive input is needed. For this reason, two primitives are
provided which read typed keyboard characters directly into a string until
a <RETURN> character (i.e., ASCII decimal value 13) is encountered. The
two primitives are:
: $KEY ( string -- ) \ "s-key"
\ Reads keyboard characters into "string" until <RETURN>.
and
: $$KEY ( string -- ) \ "s-s-key"
\ Same as $KEY except input echoed back to console.
The string data type may be compiled directly into the current
dictionary via the use of the "$," word once it is placed upon the stack
via the $LIT word. Alternatively, the ",$" word may be used to read a
string from the Forth input stream and place it into the current
dictionary. Thus the two segments of code:
$LIT "This is a string" $,
and
,$ *This is a string*
are equivalent.
The source code description of the above discussed words follows:
: $, ( string -- ) \ "s-comma"
\ Compiles a string into the dictionary.
: ,$ ( -- ) \ "comma-s"
\ Compiles the following word string into the dictionary.
A point to keep in mind with $LIT is that only one string literal may
be defined concurrently. That is, $LIT returns a "standard" string
reference address, so if, for example, "$LIT 'hello' $LIT 'there'" is
typed in, then the two string references left on TOS both "point to" the
string "there". In general, this type of situation should not occur.
6.2. STRING DEFINITION
Once a string literal capability is defined, then string data types
may be defined. The following subsections discusses the two possible
cases of string constants and string variables.
6.3. STRING CONSTANTS
No provision is made for implementing string constants. The primary
reason is that since the string is implemented as an abstract data type,
then a string constant is no different in reference from a string
variable. Thus the only meaning of a string constant is that assignments
to string constants would be caught at compile (or run) time and the
corresponding compilation (or run) aborted. This would have the negative
effect of introducing additional complexity in the package with a possible
run time speed penalty as well. Since the author felt this direction to
be counter to his perceived "philosophy" for Forth, the capability was not
implemented. In a like fashion, no error checking has been provided
either. The user can, of course, modify this package as he/she desires to
introduce a constant and/or error checking capability if so desired.
In some Forth systems I've seen, it appears that the only difference
between a string constant and a string variable is that the former allows
initialization (i.e., preassigning a value) and the latter does not. In
such systems a reference to a string constant or a string variable *both*
leave identical stack data types. Thus in these systems there is nothing
(save the programmer's own good judgment) to prevent the run time
alteration of string constants. (Note that this is entirely different
from the use of standard Forth scalar constants and variables where the
former leave a value and the latter leave an address on the stack.)
Such usage of the terms "constant" and "variable" seem, to the author
at least, to be artificial, contrived, and misleading. That is, a string
constant, in such systems, is not a constant but, rather, an initialized
variable - and this is merely a minor syntactical distinction.
The implementation given here allows allocating a string variable in
either an initialized or a NULL state with equal facility and avoids the
use of string constants altogether, since (as previously mentioned) this
would invariably extract a performance penalty.
For example, if a string constant were provided, then the appropriate
definition might be something like:
$LIT "This is a string constant" $CON STR-CON
However this same capability is provided with an initialized variable
via:
CREATE STR-CON ,$ "This is a string constant"
I fail to see much difference between the two nor any advantage to the
use of a "constant".
6.4. STRING VARIABLES
String variables are "CREATE"d just as are regular Forth variables.
This is due to the implementation of the abstract string data type as a
pointer/address.
For example, to create the string variable SV1 with the initial value:
" Hello world!" - one could code:
$LIT "Hello world!" CREATE SV1 $,
or
CREATE SV1 ,$ "Hello world!"
or
CREATE SV1 12 $ALLOT $LIT "Hello world!" SV1 $$!
As shown, if a string variable is needed, but its initial contents are
not known, then the variable (plus the dictionary space required) can be
defined via the CREATE and $ALLOT words.
For example, to reserve a string variable of 97 characters, one could
code:
CREATE SV1 97 $ALLOT
or
97 CREATE SV1 $ALLOT
The use of $ALLOT not only reserves the necessary dictionary space but
initializes the string to NULL (i.e., zero length) as well. This is
important since a null string is different from an unintialized string (and
the string package provided here incorporates null strings). All strings
are initialized with this package.
The source code definition of $ALLOT is:
: $ALLOT ( number -- ) \ "s-allot"
\ Reserves "number" characters in the dictionary for a string
\ and sets the string to NULL.
Although initialization is provided, error checking is not. This
means that it is the responsibility of the Forth programmer to ensure that
the dynamic (i.e., run time) size of strings never exceed their initially
(implicitly or explicitly) allocated sizes. (Of course, error checking
can be added if so desired, at a performance cost.)
6.5. STRING REFERENCE
Once strings are defined as constants or immediately available as
literals, then they may be referenced. "Referencing", in this sense,
means an immediate operation with a string without altering its value.
Four basic reference operators are provided: 1) determining the length
of a string, 2) printing a string, 3) "importing" a string from another
package (e.g., I/O), and 4) "exporting" a string to another package.
String length is determined via:
: $LEN ( string -- length ) \ "s-length"
\ Returns the length of string.
Printing a string is accomplished via:
: $. ( string -- ) \ "s-dot"
\ Prints a string.
Their actions should be self-explanatory.
Since almost all Forth packages of which I am aware treat the data
portion of a string via contiguous memory locations and since it may be
necessary to "send/export" a string to some other vendor provided Forth
package (e.g., I/O) and/or to "receive/import" a string from some other
vendor provided package - an import/export ability is provided.
Note that exporting and importing strings are necessary (but hopefully
temporary) evils. That is, once the "string data type" has been
established within the framework of the Forth standard and once all
packages recognize (and act appropriately upon) this data type - then the
export/import artifact will no longer be needed.
The definitions of import and export are:
: $IMPORT ( addr length string -- ) \ "s-import"
\ Imports a string from addr, length.
and
: $EXPORT ( string addr -- ) \ "s-export"
\ Exports a string to addr.
Note that is is the user's responsibility when using $EXPORT to ensure
that enough contiguous memory is available/reserved at "addr" to hold the
data portion of the string.
6.6. BASIC STRING MANIPULATION
The use of the definition "basic string manipulation" (as was the
previous case of string reference) is somewhat arbitrary. What I am trying
to capture here are those essential actions necessary for *any* string
package.
Two basic string manipulation operators are provided; they are:
: $NULL ( string -- ) \ "s-null"
\ Forces a string to NULL.
and
: $$+ ( string1 string2 -- ) \ "s-s-plus"
\ Adds (concatenates) string1 onto string2.
Again, the actions should be self-explanatory.
6.7. STRING/CHARACTER FETCHES/STORES
Following the Forth fetch/store convention, a number of operators are
provided which perform fetches/stores across strings/characters. They are:
: $C! ( char string index -- ) \ "s-c-store"
\ Stores "char" in "string" at position "index".
: $C@ ( string index -- char ) \ "s-c-fetch"
\ Fetches "char" from position "index" in "string".
: $$! ( string1 string2 -- ) \ "s-s-store"
\ Stores string1 in string2. Original contents of string2 are
\ lost.
and
: $$@ ( string1 string2 index length -- ) \ "s-s-fetch"
\ "string2" is built/fetched using the "string1" substring
\ starting at position "index" for "length" characters.
The $C@ and $LEN primitives provide, essentially, the capability to
"examine" any string in terms of size/content, and the $C! primitive
provides the primary method of altering a string.
The $$@ operator is, in effect, the "substring" operator used in other
languages. When using $$@, the original contents of string2 are lost.
6.8. STRING INSERTIONS
Often it is necessary to dynamically "add" information into a string;
the string insertion operators provide this capability, and they are:
: $CINS ( char string index -- ) \ "s-c-ins"
\ Inserts "char" in "string" with "char" at position "index".
\ Remaining characters, if any, are moved right.
and
: $$INS ( string1 string2 index -- ) \ "s-s-ins"
\ Inserts "string1" into "string2" starting at position "index"
\ of "string2". Remaining characters, if any, are moved right.
When a character, or string, is inserted into another string, the
original character(s) - if any - beginning at the point of insertion are
right shifted past the inserted character(s).
Note that:
"string1 string2 DUP $LEN $$INS"
is equivalent to:
"string1 string2 $$+"
The latter usage is to be preferred since it is simpler to understand
and faster in execution.
6.9. STRING DELETIONS
The converse of string insertion (just presented) is string deletion.
One general purpose and several special purpose string deletion primitives
are provided.
The general purpose string deletion primitive is:
: $DEL ( string index number -- ) \ "s-del"
\ Deletes "number" characters from "string" starting at
\ position "index".
Although any of the special purpose string deletion primitives
presented next can be built using $DEL, $C@ and $LEN, they are provided
based on both the perceived frequency of use and in order to standardize
naming conventions. They are:
: $|TRIM ( string number -- ) \ "s-left-trim"
\ Deletes "number" characters from the left/start of "string".
: $TRIM| ( string number -- ) \ "s-trim-right"
\ Deletes "number" characters from the right/end of "string".
: $|SPACES ( string -- ) \ "s-left-spaces"
\ Trims leading spaces from string.
and
: $SPACES| ( string -- ) \ "s-spaces-right"
\ Trims trailing spaces from string.
6.10. STRING REPLACEMENTS
Having now considered string insertions and deletions, string
replacements are now addressed. A single, general purpose string
replacement primitive is provided; it is:
: $$REP ( string1 string2 index -- ) \ "s-s-rep"
\ Replaces current substring in "string2" with "string1"
\ starting at position "index" of "string2".
6.11. STRING ROTATIONS
The ability to circularly rotate strings left/right is also provided
via four primitives. The single shift string primitives are:
: $<ROT ( string -- ) \ "s-left-rote"
\ Rotates a string left one character.
and
: $>ROT ( string -- ) \ "s-right-rote"
\ Rotates a string right one character.
The multiple shift string primitives are:
: $<<ROT ( string number -- ) \ "s-many-left-rote"
\ Rotates "string" left "number" characters.
and
: $>>ROT ( string number -- ) \ "s-many-right-rote"
\ Rotates "string" right "number" characters.
Note that neither $<<ROT and $>>ROT are necessary since $<<ROT is
merely:
: $<<ROT 0 DO $<ROT LOOP ;
And, of course, $>>ROT can be similarly defined.
The rationale behind providing $<<ROT and $>>ROT was simply that of
execution time efficiency.
6.12. STRING COMPARISONS
The determination of string ordinality (i.e., the relative
lexicographic order of two strings) can be satisfied with a single, all
purpose, string comparator (i.e., "$$COMPARE") which compares two strings
and returns -1, 0, or +1 depending on whether the first string is less
than, equal to, or greater than the second string, respectively.
This implementation treats the null string as having the lowest
possible ordinality (and there is valid mathematical reason for doing
so). Further, the ordinality of a string is based upon the ASCII
collating sequence.
Although the "$$COMPARE" primitive is adequate to resolve all
questions concerning string ordinality, from a coding/programming
perspective, it is often more expedient to ask a specific question (such
as: Is string1 less than string2?). For this reason, and also from the
standpoint of standardizing string comparator terminology, a number of
additional, special purpose string comparators are introduced (e.g., $$=,
$$<, $$<=, etc.). As might be expected, these comparators are all simple
words invoking the $$COMPARE comparator.
The single general purpose string comparator is:
: $$COMPARE ( string1 string2 -- status ) \ "s-s-compare"
\ Returns -1, 0, or +1 depending on whether string1 is
\ lexicographically less than, equal to, or greater than
\ string2, respectively.
And the six special purpose string comparators are:
: $$= ( string1 string2 -- t | f ) \ "s-s-equal"
\ Returns -1 if string1 = string2, else returns 0.
: $$< ( string1 string2 -- t | f ) \ "s-s-less-than"
\ Returns -1 if string1 < string2, else returns 0.
: $$<= ( string1 string2 -- t | f ) \ "s-s-less-than-or-equal"
\ Returns -1 if string1 <= string2, else returns 0.
: $$> ( string1 string2 -- t | f ) \ "s-s-greater-than"
\ Returns -1 if string1 > string2, else returns 0.
: $$>= ( string1 string2 -- t | f) \ "s-s-greater-than-or-equal"
\ Returns -1 if string1 >= string2, else returns 0.
and
: $$<> ( string1 string2 -- t | f ) \ "s-s-not-equal"
\ Returns -1 if string1 <> string2, else returns 0.
6.13. STRING PATTERN MATCHING
The ability to locate arbitrary characters and substrings within a
string is provided. The two primitives for locating an arbitrary character
within a string are:
: $CFIND ( char string -- index | -1 ) \ "s-c-find"
\ Searches for leftmost occurrence of "char" in "string".
\ Returns "index" if found, else returns -1.
and
: $CFIND< ( char string -- index | -1 ) \ "s-c-find-back"
\ Searches for rightmost occurrence of "char" in "string".
\ Returns "index" if found, else returns -1.
The primitive which searches for a substring (to include the searched
string itself) within a string is:
: $$FIND ( string1 string2 index length -- index | -1 )
\ "s-s-find"
\ Searches for the first occurrence of the string1 substring
\ starting at "index" for "length" characters in string2.
\ Returns index if found, else returns -1.
Note that "index" and "length" identify the substring within "string2"
to be used for matching purposes.
6.14. STRING SET OPERATIONS
Two set theoretic string primitives are provided. In both cases a
character is considered a set element and a string is considered a set.
The two set theoretic string primitives are:
For single element set membership:
: $CMEM ( char string -- t | f ) \ "s-c-mem"
\ Returns -1 if "char" is in "string", else returns 0.
Note that the $CMEM primitive is sufficient (along with other primitives
introduced here such as $NULL, $C@, $$+, etc.) to build any necessary
higher order set theoretic string operators.
For multiple element set membership:
: $$VER ( string1 string2 -- index | -1 ) \ "s-s-ver"
\ Verifies that string2 contains only those characters in
\ string1 by returning a -1. Otherwise the index of the first
\ character in string2 not contained in string1 is returned.
The $$VER (verify) operator is borrowed from PL/I and it is more
powerful, and practical, that might at first be expected.
As an example if we define:
CREATE NUMBERS ,$ "0123456789"
Then any string can be tested to see if it contains only numbers with
the word:
\ Returns -1 if "string" contains only the ASCII characters 0..9,
\ else returns 0.
: NUMBERS? ( string -- flag )
NUMBERS SWAP $$VER 0< ;
6.15. STRING TRANSLATION
Two straightforward string translation primitives are provided based,
primarily, on frequency of use. They translate lower-case characters in a
string to upper-case, and vice-versa. They are:
: $>UPPER ( string -- ) \ "s-to-upper"
\ Converts any lower-case characters in "string" to upper-case.
and
: $>LOWER ( string -- ) \ "s-to-lower"
\ Converts any upper-case characters in "string" to lower-case.
6.16. STRING ENCODING/DECODING
String encoding (i.e., translating a number to its ASCII string
representation) and string decoding (i.e., translating an ASCII string
number representation to a number) are handled by three primitives which
are modeled after the currently defined FORTH-83 word set which performs
double number conversion (i.e., "<#", "#", "#>", etc.).
String encoding, or decoding, is initiated via the use of the
"$CONVERT" word. Its definition is:
: $CONVERT ( string index -- ) \ "s-convert"
\ Defines a string and index within the string to be used for
\ subsequent string-to-number or number-to-string conversions.
An initial call to $CONVERT, in effect, defines a specific string and
a specific place/index within that string to be used for subsequent
encoding or decoding.
String encoding is accomplished via successive calls to the word
"$<N". Its definition is:
: $<N ( number -- ) \ "s-from-n"
\ Converts "number" to the appropriate ASCII representation
\ character and inserts it into the string and index defined
\ by the last call to $CONVERT. Note that successive calls to
\ $<N "builds" the number string in reverse sequence.
\ For example, the code:
\
\ CREATE SNUMBER 10 $ALLOT
\ SNUMBER 0 $CONVERT 1 $<N 2 $<N 3 $<N
\
\ would build the image "321" in string SNUMBER.
Note that no "editing" is done on the number passed to $<N. It is
assumed that the user has already written some sort of "single digit from
number" extraction routine (e.g., the standard "divide number by current
base" passing remainder and saving quotient) and will "feed" this number
sequence to $<N. Any alphabetic characters added to a string by $<N are
always upper-case.
Finally, string decoding is accomplished via successive calls to the
word "$>N". Its definition is:
\ : $>N ( -- number | -1 ) \ "s-to-n"
\ Converts the character at the string/index position defined
\ by the last call to $CONVERT to a number if possible. An
\ error (i.e., -1) is returned if the character is non-numeric
\ (i.e., does not lie between 0..BASE-1). The index position
\ established in the call to $CONVERT is always incremented
\ after a call to $>N.
The fact that $>N always increments the index position is important
since this allows $>N to be used to "search" a string for the first valid
numeric character (and then proceed with the decode operations).
The primitives given here are quite simple (more so than the "<#",
"#>", and associated word set) and it is assumed that the user will tailor
the "higher levels" of string encoding/decoding to his/her taste.
7. AREAS NOT COVERED
Since this paper tries to concentrate exclusively on Forth strings, a
number of related topics were purposely excluded. In all cases the primary
reason for exclusion was that the topic area would necessarily introduce
other major considerations which would lay well beyond the area of strings
and thereby cause a considerable loss of focus.
7.1. INPUT/OUTPUT
Input/output, per se, has not been addressed in this file. The
reasoning is straightforward. Forth, itself, provides minimal input/output
facilities. The Forth machine (upon which the Forth language itself is
based) is a virtual machine with an on-line console (for interactive
input/output) and a block file system for everything else. This obviously
falls far short of an input/output system in the traditional sense.
Correspondingly, the string functions given will work with the console
and the block file (i.e., the Forth input stream). No attempt has been
made to extend this concept. The reasoning is simply that input/output
considerations for Forth are a separate issue, of which, string structures
are a subcomponent (e.g., any binary read capability could easily input
Forth strings - along with any other internal Forth data type for a given
system).
In truth, as long as string primitives are provided to read string
literals from a standard Forth block file (as they are in this
implementation) - then this, in itself, handles the issue of portability.
So, in essence, there is no need to augment string primitives beyond this
point (at least from the perspective of portability).
Note: this is not to say that many Forth systems do not have excellent
input/output facilities (well beyond the FORTH-83 standard) - it is rather
to say that this is an independent subject and more properly treated as
input/output instead of strings.
7.2. STRING STACK
The idea and use of a string stack will also be considered outside the
bounds of this discussion. This, in no way, indicates that I feel the
idea of a string stack is bad (I think it is an excellent idea). It is
rather because it introduces additional issues and the line has to be
drawn somewhere. For example, if a string stack were introduced then
string operations could proceed independently of data stack operations
with no confusion between the two. This is excellent except that it
completely changes the semantics/effect of the same code when string and
data stack operations are intermixed.
For example, if "SR1" is a string reference, then the code:
"3 SR1 7 + $."
Could be expected to produce entirely different results depending on
whether or not a string stack were implemented.
Of course this same logic applies if a floating stack (etc.) is
introduced. The point, however, is that this is really more of a stack
(and semantics) issue than one of a string manipulation issue. I have long
argued for multiple stacks in Forth (and even *God forbid* for a type
stack), but the arguments have fallen upon deaf ears. I'm getting very
tired of arguing at this point.
7.3. PARSING STRINGS
Most C language implementations (for example) provide a healthy
serving of string parsing functions. This is nice except that often
underlying assumptions must be made (e.g., what characters constitute a
"whitespace") which may not apply in all/most situations - and, of course,
if you don't need it - why have it? I can think of no parsing functions
which cannot be built upon the primitives provided here if needed - so
I've left them out.
Additionally, since the string package provided herein is primarily
targeted toward writing application code (rather than, for example,
parsing Forth code) - no provisions have been made for parsing functions
as they relate to the Forth language itself.
8. THE FORTH SOURCE FILE
The file "FSTRINGS.SCR" contains all of the Forth string primitives
discussed in this document.
9. THE VALIDATION FILE
The file "FSTEST.SCR" contains a set of validation tests to ensure
that the string primitives provided in FSTRINGS.SCR function properly.
Although not exhaustive, the validation tests should give reasonable
confidence that this package will work on your particular Forth system.
(The validation package will also attempt to isolate malfunctions down to
a specific primitive if possible.)
10. FORTH-83 DEVIATIONS
As far as I am able to determine FSTRINGS.SCR is all kosher FORTH-83
Standard. I use words from the double number extension word set (i.e.,
2DUP, 2DROP, 2SWAP, etc.). Also I use the BRANCH word from the system
extension word set in [$LIT].
I would suspect any reasonable FORTH-83 implementation would provide
these words (along with the "\" word which is also used).
I have tested the system (although not extensively) under both
MasterFORTH and Laxen & Perry's F83 - no problems.
I would be glad to assist anyone who encounters difficulty in getting
this system to run - but, remember, the whole idea is just to provide
working FORTH-83 code to demonstrate the functions themselves so that they
can then be rewritten consistent with the underlying Forth system and
extended to your taste.
11. INTERNALS
Some of the key points which may be of assistance in understanding the
code/logic in file FSTRINGS.SCR follows:
11.1. INTERNAL STRING STRUCTURE
Although nothing in the use of the primitives provided here makes any
assumptions about the underlying internal string structure used, this
information is required for the actual implementation itself.
The internal representation selected for strings in this
implementation was that of counted strings. The initial string *word*
contains the count (which may be 0 or null). Thus the limit on the
working size of a string word is 0..32,767 bytes due to the use of signed
number arithmetic.
The reserve scratch area "_BUFFER" has been allocated at 1024+2 bytes.
The only words which reference _BUFFER are $LIT, $<<ROT, and $>>ROT.
Other than this everything is done "core-to-core" via CMOVEs.
_BUFFER's initial allocation (and use) should be thought out carefully
when implementing this system since it represents a (potentially) large
permanent allocation of space. For example, it may be more expedient to
temporarily "grab" unused dictionary space for this purpose and avoid any
permanent loss of space. How one does this is dependent upon the
particular Forth system being used.
Another point to keep in mind is that only one _BUFFER is provided.
Thus one cannot have two string literals simultaneously defined. (Also,
for example, a call to $<<ROT would destroy the contents of a string
literal if it is also active.) Normally, these considerations would not
occur.
11.2. VISIBILITY
All routines meant to be "hidden" begin with an initial underscore "_"
character. Ideally these words should be headerless if your Forth system
supports this capability.
11.3. ERROR CHECKING
Simple. There isn't any!! (Well, $<<ROT and $>>ROT will translate
the "number" to the modulus of the length of the string given, but that's
hardly editing!)
11.4. MEMORY OPERATORS
The string primitives provided herein do most of their real "work"
through the underlying "memory" primitives (i.e, CMOVE, CMOVE>, COMPARE,
CFIND, CFIND<, and -optionally- SAME?).
For this reason alone it would be advisable to code these memory
operators in assembler. (Still another reason is that these memory
operators should also be useful independent of this string package.)
The "interface" operators (i.e., _CF+, _LEN, and _$>AL) provide the
connection/interface between the high-level string operators and the
low-level memory operators. They also make it easier to modify this
package for stack widths of other than 16 bits and/or allow using other
internal representations than the one selected here. Since the use of the
interface operators is pervasive as well, it is recommended that they be
coded in assembler also (in addition they are very short words).
So, in general, the basic modus operandi of these string operators is
to call upon the interface operators; to perform the appropriate "stack
juggling"; and to call upon the appropriate memory operators.
In the file FSTRINGS.SCR, block 1 is the load block for a full
FORTH-83 implementation. Block 2 is the load block for a MasterFORTH
system (i.e., the words: COMPARE, CFIND, CFIND<, SAME?, _CF+, _LEN, and
_$>AL) are written in assembler for a MasterFORTH system (80XX chip).
Even if you don't have a MasterFORTH system (in fact does *anyone*
besides me!?), I would strongly recommend you examine the assembler
implementation of these words and rewrite them (in assembler) for your own
particular Forth system (they really make a rather dramatic difference in
performance).
The words are documented and the underlying algorithmic approach
should be valid regardless of the system you're running. As an aid the
understanding the MasterFORTH implementation for the 80XX series chip (so
that you may convert these words):
- The MasterFORTH assembler is based upon the Laxen/Perry F83 public
domain model with the following major differences:
- The mode #) is simply );
- The mode S#) has disappeared;
- Local labels are available for unconditional jumps;
- REPZ and REPNZ are REPE and REPNE instead;
- JZ, JNZ, JC, and JNC are added;
- The string instructions use BYTE and WORD rather than AL and
AX.
- The MasterFORTH implementation uses the following machine
resources:
- The four segment register must be maintained at the same value
(i.e., CS = DS = SS = ES);
- The following three registers are used internally by
MasterFORTH and must be restored prior to returning to Forth:
- SI -- the Forth instruction pointer (IP)
- SP -- the Forth parameter stack pointer
- BP -- the Forth return stack pointer.
The definitions of the memory operators are:
: COMPARE ( a1 a2 n -- status )
\ Compares a1-a2, (a1+1)-(a2+1), ... (a1+n-1)-(a2+n-1)
\ as required returning -1, 0, or +1 depending on lexicographic
\ order. If n=0, then 0 (i.e., =) is returned.
: SAME? ( a1 a2 n -- t | f )
\ Compares a1-a2, (a1+1)-(a2+1), ... (a1+n-1)-(a2+n-1)
\ as required returning -1 if all "n" bytes match, else returns
\ 0. Note that this operator is not strictly required since one
\ may define SAME? as
\ : SAME? COMPARE 0= ;
\ It is included primarily for reasons of speed.
\ Memory character/byte FIND GTH 08/22/87
: CFIND ( c a1 n -- a2 | 0 ) \ "c-find"
\ Searches a1, a1+1, ..., a1+n-1 for c returning first
\ address of match or 0 if no match. NOTE: n=0 returns 0.
: CFIND< ( c a1 n -- a2 | 0 ) \ "c-find-back"
\ Same as CFIND except search is high-to-low memory.
\ NOTE: Memory search begins at address: a1+n-1.
The words "CFIND" and "CFIND<" were selected to match the current
FORTH-83 words "CMOVE" and "CMOVE>".
Also note that no attempt was made to code a general-purpose assembler
word which searches memory attempting to locate a substring. The primary
reason for this is that this operation, although frequently used, is far
more complex (when done efficiently) than is generally realized. The
reader is referred to the excellent article "Searching for Strings with
Boyer-Moore" by Richard Wiggins and Paul Walberg in the November, 1986
(Volume 3, Number 11) edition of Computer Language (pp. 28-42) just to see
how harry things can get!
11.5. THE ISSUE OF STANDARDS
The string operators provided herein will hopefully be of benefit to
some of you in string processing applications. The true test, however, of
the utility of any programming language extension is its proven utility
over time. There is thus no good measure at present of what string
operators, as provided in FSTRINGS, are needed, are defined well, or are
missing.
It is more reasonable to expect that FSTRINGS will serve as a catalyst
(or perhaps seed) to encourage members of the Forth community to
investigate the fundamental nature of and need for string operations in
Forth.
In addition to the issue of functionality addressed above, there are
as well the matters of stack/data conventions and naming conventions. All
of these issues need to be throughly investigated and an informal standard
developed and used for an extended period before any sort of string
standards proposal is warranted (at least in this author's opinion).
This has been a rather long-winded way of stating that I am in no way
proposing FSTRINGS as a standards extension.
The development of FSTRINGS did, however, provide reasonably good
assurance that some more fundamental, underlying operators (upon which any
contiguous memory string package could easily be built) could be good
candidates as standards extensions. These are just the "memory operators"
earlier mentioned. Thus I would like to propose that the words: COMPARE,
CFIND, CFIND<, and -optionally- SAME? be considered as standards
extensions to augment the CMOVE and CMOVE> words.
11.6. NULL STRINGS
This package allows the use of null strings. You may wish to ignore
them altogether, but I would caution against this. The null string
processing introduces very little overhead for the capability gained.
Null string handling is greatly simplified by the fact that all of the
underlying memory operators are defined to accept null (or zero) counts.
For example, in the null case:
CMOVE and CMOVE> do not move anything;
COMPARE returns 0 (i.e., "equal");
CFIND and CFIND< return 0 (i.e., "not found");
and -optionally-
SAME? returns -1 (i.e., "equal").
Thus the majority of the string primitives herein can be written
without concern for the special case of null (or zero length) strings.
For those primitives which must explicitly test for the null string
case, this action is always performed initially. If the null string case
applies, then the return stack is set appropriately and the primitive is
immediately EXITed. In all cases, this amounts to a single line of
(commented) code at the beginning of the word definition.
11.7. CONVERSION PROBLEMS
As mentioned, this system has been tested with both MasterFORTH and
F83. Jerry Shifrin was kind enough to allow me/us to test this package
under LMI's Forth as well.
The primary difficulty encountered was the underlying implementation
of the "BRANCH" word.
Both MasterFORTH and F83 use absolute addresses for BRANCH. LMI seems
to use a relative offset instead.
BRANCH is only used in the definition of [$LIT] as follows:
: [$LIT] ( -- string ) \ "bracket-s-lit"
\ Compiled string literal
COMPILE BRANCH HERE 0 , \ forward branch around string
HERE ,$ HERE ROT ! [COMPILE] LITERAL ; IMMEDIATE
It may be necessary to modify [$LIT] prior to the "!" in order to
adjust the "address/offset" which follows BRANCH in the compiled word as
necessary. It may also be necessary to perform "aligns" as well.
Alternatively, if your Forth system supports the ">MARK" and
">RESOLVE" words from the system extension word set, then it may be more
expedient to simply recode [$LIT] as:
: [$LIT] ( -- string ) \ "bracket-s-lit"
\ Compiled string literal
COMPILE BRANCH >MARK
HERE ,$ SWAP >RESOLVE [COMPILE] LITERAL ; IMMEDIATE
Finally, it may be necessary to modify the "_CF+" and "_$>AL" words to
use some value other than "2+" if your Forth system's word size is other
than 16 bits.
12. GLOSSARY
A short glossary of all string primitives follows:
: COMPARE ( a1 a2 n -- status )
: SAME? ( a1 a2 n -- t | f )
: CFIND ( c a1 n -- a2 | 0 ) \ "c-find"
: CFIND< ( c a1 n -- a2 | 0 ) \ "c-find-back"
: $LIT ( -- string ) \ "s-lit"
: $, ( string -- ) \ "s-comma"
: ,$ ( -- ) \ "comma-s"
: [$LIT] ( -- string ) \ "bracket-s-lit"
: $KEY ( string -- ) \ "s-key"
: $$KEY ( string -- ) \ "s-s-key"
: $ALLOT ( number -- ) \ "s-allot"
: $IMPORT ( addr length string -- ) \ "s-import"
: $EXPORT ( string addr -- ) \ "s-export"
: $LEN ( string -- length ) \ "s-length"
: $. ( string -- ) \ "s-dot"
: $NULL ( string -- ) \ "s-null"
: $$+ ( string1 string2 -- ) \ "s-s-plus"
: $C! ( char string index -- ) \ "s-c-store"
: $C@ ( string index -- char ) \ "s-c-fetch"
: $$! ( string1 string2 -- ) \ "s-s-store"
: $$@ ( string1 string2 index length --) \ "s-s-fetch"
: $CINS ( char string index -- ) \ "s-c-ins"
: $$INS ( string1 string2 index -- ) \ "s-s-ins"
: $|TRIM ( string number -- ) \ "s-left-trim"
: $TRIM| ( string number -- ) \ "s-trim-right"
: $|SPACES ( string -- ) \ "s-left-spaces"
: $SPACES| ( string -- ) \ "s-spaces-right"
: $DEL ( string index number -- ) \ "s-del"
: $$REP ( string1 string2 index -- ) \ "s-s-rep"
: $<ROT ( string -- ) \ "s-left-rote"
: $>ROT ( string -- ) \ "s-right-rote"
: $<<ROT ( string number -- ) \ "s-many-left-rote"
: $>>ROT ( string number -- ) \ "s-many-right-rote"
: $$COMPARE ( string1 string2 -- status ) \ "s-s-compare"
: $$= ( string1 string2 -- t | f ) \ "s-s-equal"
: $$< ( string1 string2 -- t | f ) \ "s-s-less-than"
: $$<= ( string1 string2 -- t | f ) \ "s-s-less-than-or-equal"
: $$> ( string1 string2 -- t | f ) \ "s-s-greater-than"
: $$>= ( string1 string2 -- t | f) \ "s-s-greater-than-or-equal"
: $$<> ( string1 string2 -- t | f ) \ "s-s-not-equal"
: $CFIND ( char string -- index | -1 ) \ "s-c-find"
: $CFIND< ( char string -- index | -1 ) \ "s-c-find-back"
: $$FIND ( string1 string2 index length -- index | -1 )
\ "s-s-find"
: $CMEM ( char string -- t | f ) \ "s-c-mem"
: $$VER ( string1 string2 -- index | -1 ) \ "s-s-ver"
: $>UPPER ( string -- ) \ "s-to-upper"
: $>LOWER ( string -- ) \ "s-to-lower"
: $CONVERT ( string index -- ) \ "s-convert"
: $<N ( number -- ) \ "s-from-n"
: $>N ( -- number | -1 ) \ "s-to-n"