kemp@noscvax.UUCP (Stephen P. Kemp) (09/28/84)
The following appeared in the Naval Ocean Systems Center
newsletter COMPUTING HIGHLIGHTS. I thought it would be
useful to USENETers. If you have comments, please mail
them to ME and NOT to Mike.
--------------------------- * ------------------------------
by Mike Bloomberg
*** Regular expressions in Unix ***
Theme and Variations
A regular expression can be considered to be a string of
characters with certain characters having special meanings
as defined below. These special characters enable patterns
to be defined in an efficient and general manner.
The grep family (consisting of grep, egrep for extended
expressions, and fgrep for fixed strings), as well as awk,
ed and sed make heavy use of regular expressions. The greps
search the input text file for the defined regular
expression and report every line on which it occurs.
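The difference between grep and fgrep can be sketched as
follows. This is only an illustration: it assumes a POSIX
shell and a grep that accepts -F (the modern spelling of
fgrep), and the sample words are made up.

```shell
# Made-up sample input, one word per line.
sample=$(printf 'unix\nun.x\nunax\n')

# grep treats the period as a wildcard, so all three lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c 'un.x')

# grep -F (fgrep) takes the pattern as a fixed string, so only
# the line with a literal period matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -F -c 'un.x')

echo "regex: $regex_hits  fixed: $fixed_hits"
```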
Awk is a pattern processing language and is a generalization
of grep. Awk uses regular expressions, which are enclosed
in slashes (/ /), to select the lines of the text file on
which to perform actions.
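A minimal sketch of an awk pattern between slashes, assuming
a POSIX shell and awk. The input lines and the variable name
awk_out are made up for illustration.

```shell
# For every line matching the regular expression un.x,
# print the second field of that line.
awk_out=$(printf 'unix 1\nvms 2\nxenix 3\n' | awk '/un.x/ { print $2 }')

echo "$awk_out"
```

Only the first line contains a string that un.x matches, so
only its second field is printed.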
Ed and sed are line editors. Ed interacts with the
user while sed (stream editor) works in a "non-interactive"
mode. Both processors use regular expressions as context
addresses. A context address selects a line by its content
rather than by its line number: in ed it stands for the
next line containing a character string that the regular
expression matches, and in sed it selects every such line.
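Context addresses in sed can be sketched as follows, assuming
a POSIX shell and sed. The input text and variable names are
made up for illustration.

```shell
# One context address: delete every line that begins with '#'.
del_out=$(printf '# comment\ncode\n# more\nend\n' | sed '/^#/d')

# Two context addresses: print from the first line matching
# "begin" through the next line matching "end".
range_out=$(printf 'a\nbegin\nb\nend\nc\n' | sed -n '/begin/,/end/p')

printf 'after delete:\n%s\nrange:\n%s\n' "$del_out" "$range_out"
```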
These special characters are:
-----------
. (period)
matches any single character (wildcard character)
EXAMPLE: the expression un.x will match any string in which
un is followed by ANY single character and then x. So, unax,
unbx, .... unzx, un3x, un&x, un;x etc. will all match.
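As a quick check of the wildcard, here is a small sketch
assuming a POSIX shell and grep; the sample words are made up.

```shell
# unx has no character between "un" and "x", and unaax has
# two, so neither contains a match for un.x.
dot_out=$(printf 'unax\nun3x\nun;x\nunx\nunaax\n' | grep 'un.x')

echo "$dot_out"
```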
-----------
[] any one of the characters or range of characters within
these square brackets will match.
EXAMPLE: un[a3z]x will match unax or un3x or unzx only.
EXAMPLE: un[a-z]x will match unax, unbx ... unzx.
However, un3x will NOT match.
NOTE: placing a ^ (caret) as the FIRST character inside the
brackets will match the COMPLEMENT of those characters.
EXAMPLE: un[^abcde]x will match unfx ... unzx un1x ...
un9x un!x un$x etc.
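The three bracket forms above can be compared side by side.
This is a sketch assuming a POSIX shell and grep; the sample
words and variable names are made up.

```shell
# Made-up sample words, one per line.
words=$(printf 'unax\nun3x\nunzx\nunfx\n')

# A list: only a, 3 or z may appear between "un" and "x".
list_out=$(printf '%s\n' "$words" | grep 'un[a3z]x')

# A range: any lowercase letter, so un3x drops out.
range_out=$(printf '%s\n' "$words" | grep 'un[a-z]x')

# A complement: anything except a through e, so unax drops out.
comp_out=$(printf '%s\n' "$words" | grep 'un[^abcde]x')

printf 'list:\n%s\nrange:\n%s\ncomplement:\n%s\n' \
    "$list_out" "$range_out" "$comp_out"
```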
-----------
() Used for grouping of an expression. The expression
enclosed in the parentheses can be operated on as a unit
by operators such as * or +.
EXAMPLE: (unix)+ would match unix or unixunix or
unixunixunix etc.
NB: Only for egrep,awk
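A small sketch of grouping, assuming a POSIX shell and a grep
that accepts -E (the modern spelling of egrep). The -x option
is used here only to require that the whole line match; the
sample words are made up.

```shell
# (unix)+ repeats the whole group, not just the final x.
group_out=$(printf 'unix\nunixunix\nunixun\nlinux\n' | grep -E -x '(unix)+')

echo "$group_out"
```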
-----------
| an "or" conditional. Matches EITHER of the expressions
to the left or right of the | (vertical bar) symbol.
EXAMPLE: unix|grep will find every line that contains
unix or grep (or both).
NB: Only for egrep,awk
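The "or" operator can be tried out as below. The sketch
assumes a POSIX shell and a grep that accepts -E (the modern
spelling of egrep); the sample lines are made up.

```shell
# A line is reported if it contains unix, grep, or both.
alt_out=$(printf 'unix rules\nuse grep\nawk only\nunix has grep\n' |
    grep -E 'unix|grep')

echo "$alt_out"
```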
-----------
^ (caret)
placed at the beginning of the expression, means to
match the expression ONLY if it is at the beginning of
the line (column 1). Otherwise, the ^ is taken as a
literal.
EXAMPLE: ^unix will match only if the line begins with
the word unix
-----------
$ (dollar sign)
placed at the end of the expression means to match the
expression ONLY if it is at the end of the line.
Otherwise, the $ is taken as a literal.
EXAMPLE: unix$ will match only if the line ends with the
word unix.
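The two anchors can be sketched together, assuming a POSIX
shell and grep. The sample lines and variable names are made
up for illustration.

```shell
# Made-up sample lines.
lines=$(printf 'unix is here\nhere is unix\nunix\nthe unix way\n')

# ^ anchors the match to the beginning of the line.
start_out=$(printf '%s\n' "$lines" | grep '^unix')

# $ anchors the match to the end of the line.
end_out=$(printf '%s\n' "$lines" | grep 'unix$')

# Both anchors together: the line must be exactly "unix".
both_out=$(printf '%s\n' "$lines" | grep '^unix$')

printf 'start:\n%s\nend:\n%s\nboth:\n%s\n' "$start_out" "$end_out" "$both_out"
```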
-----------
\ disables the special characters.
EXAMPLE: \. would look for a period in the text. \\
would look for a backslash in the text.
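A sketch of the backslash at work, assuming a POSIX shell and
grep. The patterns are kept in single quotes so that the
shell passes the backslashes through to grep untouched; the
sample strings are made up.

```shell
# Three made-up lines: a.b, axb and a\b
# (printf %s prints each argument literally).
text=$(printf '%s\n' 'a.b' 'axb' 'a\b')

# Unescaped, the period is a wildcard and all three lines match.
any_hits=$(printf '%s\n' "$text" | grep -c 'a.b')

# \. matches only a literal period.
dot_out=$(printf '%s\n' "$text" | grep 'a\.b')

# \\ matches only a literal backslash.
slash_out=$(printf '%s\n' "$text" | grep 'a\\b')

printf '%s\n' "wildcard hits: $any_hits" "period: $dot_out" "backslash: $slash_out"
```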
============================
= The following symbols =
= apply to the character =
= immediately PRECEDING =
= the symbol. =
============================
* (asterisk)
Matches any number (including zero) of occurrences of
the character immediately preceding.
EXAMPLE: un*x would match ux, unx, unnx, unnnx etc.
-----------
+ Similar to * but matches one or more occurrences.
EXAMPLE: un+x would match unx, unnx, unnnx, unnnnx etc.
NB: Only for egrep,awk
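The difference between * and + shows up on the zero-occurrence
case. This sketch assumes a POSIX shell and a grep that
accepts -E (the modern spelling of egrep); the sample words
are made up.

```shell
# Made-up sample words.
words=$(printf 'ux\nunx\nunnx\nuax\n')

# * allows zero n's, so ux matches.
star_out=$(printf '%s\n' "$words" | grep 'un*x')

# + requires at least one n, so ux does not match.
plus_out=$(printf '%s\n' "$words" | grep -E 'un+x')

printf 'star:\n%s\nplus:\n%s\n' "$star_out" "$plus_out"
```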
-----------
\< matches expression that follows anything but a letter,
digit or underscore. Normally used to find expressions
at the front of the word.
EXAMPLE: \<abc will find all words beginning with the
letters abc
NB: Only for grep,egrep (fgrep searches for fixed strings)
-----------
\> matches expression that precedes anything but a letter,
digit or underscore. Normally used to find expressions
at the end of the word.
EXAMPLE: abc\> will find all words ending with the
letters abc
NB: Only for grep,egrep (fgrep searches for fixed strings)
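A sketch of the two word anchors, assuming a POSIX shell and
a grep that supports \< and \>. These are extensions found in
GNU and BSD grep rather than part of the POSIX standard, so
other greps may reject them. The sample words are made up.

```shell
# Made-up sample words.
words=$(printf 'abcdef\nxabc\nabc\nmy_abc\n')

# \<abc: abc must start a word. In xabc and my_abc it follows
# a letter or an underscore, so those lines do not match.
front_out=$(printf '%s\n' "$words" | grep '\<abc')

# abc\>: abc must end a word. In abcdef it is followed by a
# letter, so that line does not match.
back_out=$(printf '%s\n' "$words" | grep 'abc\>')

printf 'front:\n%s\nback:\n%s\n' "$front_out" "$back_out"
```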
-----------
\( and \)
Enclosing an expression with a "\(" on the left and a
"\)" on the right makes it referable later in the
expression by the syntax "\n". "n" is the numeric order
of the enclosed expression.
EXAMPLE: \(abc\)def\1 will match the string abcdefabc
EXAMPLE: \(abc\)\(def\)ghi\2\1 would match
abcdefghidefabc
NB: Only for ed,sed
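The two examples above can be tried with sed, using each
expression as a context address. This is a sketch assuming a
POSIX shell and sed; the sample strings and variable names
are made up.

```shell
# Made-up sample lines.
text=$(printf 'abcdefabc\nabcdefabd\nabcdefghidefabc\n')

# \1 must repeat whatever the first \( \) pair matched.
one_out=$(printf '%s\n' "$text" | sed -n '/\(abc\)def\1/p')

# \2 and \1 refer to the second and first pairs, in that order.
two_out=$(printf '%s\n' "$text" | sed -n '/\(abc\)\(def\)ghi\2\1/p')

printf 'one:\n%s\ntwo:\n%s\n' "$one_out" "$two_out"
```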
======================
For more documentation about regular expressions, see
the "Unix Programmer's Manual, Seventh Edition, November
1980, Computer Science Division, Univ. of California at
Berkeley", which contains hardcopy versions of the "man"
descriptions of these processors.
Articles of Interest:
1. Tutorial Introduction to the Unix Text Editor (Ed)
2. Advanced Editing on Unix (Ed)
3. SED - non-interactive text editor.
4. AWK - A pattern scanning and processing language
--------------------------- * ------------------------------
Steve Kemp {ihnp4, decvax, akgua, dcdwest, ucbvax}!sdcsvax!noscvax!kemp
Computer Sciences Corp. kemp@nosc
Naval Ocean Systems Center
San Diego, CA