[gnu.emacs] Matching multiple lines with regex

jkh@pcsbst.UUCP (jkh) (03/12/89)

[ something *is* actually eating this line these days.. ]


I'm using the regex code out of 18.52 and would like to construct a
regular expression that will match *all* of the characters in a string
of text containing newlines. I tried ".*" (the most obvious), but it
just matches up to the first newline. I haven't got newlines being treated
specially in any way (I.E. as "or"s) so I can't figure out why this is
happening.

Any suggestions?

					Jordan Hubbard
					PCS Computer Systems
					pyramid!pcsbst!jkh

pinkas@hobbit.intel.com (Israel Pinkas ~) (03/16/89)

In article <768@pcsbst.UUCP> jkh@pcsbst.UUCP (jkh) writes:

> I'm using the regex code out of 18.52 and would like to construct a
> regular expression that will match *all* of the characters in a string
> of text containing newlines. I tried ".*" (the most obvious), but it
> just matches up to the first newline. I haven't got newlines being treated
> specially in any way (I.E. as "or"s) so I can't figure out why this is
> happening.

The regular expression "." is defined to match any single character
except a newline.  See the GNU Emacs manual, section 13.5, "Syntax of
Regular Expressions."

Also see the man page for grep, vi, sed, or any other Unix program that
uses regular expressions.

-Israel Pinkas

--------------------------------------
Disclaimer: The above are my personal opinions, and in no way represent
the opinions of Intel Corporation.  In no way should the above be taken
to be a statement of Intel.

UUCP:	{amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!cadev4!pinkas
ARPA:	pinkas%cadev4.intel.com@relay.cs.net
CSNET:	pinkas@cadev4.intel.com

gaynor@athos.rutgers.edu (Silver) (03/16/89)

"[^]" fails with an error (it would be nice if this were fixed, even if just
for this purpose, and "[]" for completeness), but "[\0-\255]" did the trick.  The
same could have been done at extra cost with "\(\n\|.\)".
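
For anyone who wants to try the character-class approach from C, here is a
rough sketch.  It is only an approximation: it uses POSIX regcomp() with
REG_NEWLINE as a stand-in for the 18.52 regex.c (an assumption, not the code
under discussion), and the class covers only \001-\177, because a C string
cannot carry a NUL into regcomp().  The names class_match_length and
ANY_CHARS are made up for illustration.

```c
#include <regex.h>

/* Hypothetical helper, not part of any regex library: returns the length
   of the match that starts at offset 0 of text, or -1 if the pattern does
   not compile or does not match there.  REG_NEWLINE is set so that "."
   and "[^...]" refuse to match a newline; an ordinary (non-negated)
   class is not affected by that flag. */
static long class_match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;                      /* pattern did not compile */
    if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (long)m.rm_eo;            /* match begins at the start */
    regfree(&re);
    return len;
}

/* Assumed stand-in for "[\0-\255]": every 7-bit character except NUL,
   repeated zero or more times. */
static const char ANY_CHARS[] = "[\001-\177]*";
```

With these definitions, class_match_length(ANY_CHARS, "ab\ncd") should give
5, since the newline falls inside the range, while a narrower class such as
"[a-z]*" stops after 2 characters.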

Regards, [Ag] gaynor@rutgers.edu

piet@ruuinf (Piet van Oostrum) (03/16/89)

In article <768@pcsbst.UUCP>, jkh@pcsbst (jkh) writes:
 `
 `I'm using the regex code out of 18.52 and would like to construct a
 `regular expression that will match *all* of the characters in a string
 `of text containing newlines. I tried ".*" (the most obvious), but it
 `just matches up to the first newline.

The definition of '.' in regular expressions is any character except newline:

`. (Period)'     
     is a special character that matches any single character except a
     newline.  Using concatenation, we can make regular expressions like
     `a.b' which matches any three-character string which begins with `a'
     and ends with `b'.

So use (in Lisp syntax):

	"\\(.\\|\n\\)*"
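
The difference between the two patterns can be checked from C with a small
sketch like the one below.  It assumes POSIX regcomp() in extended syntax
with REG_NEWLINE set, which makes "." skip newlines much as described above;
it is not the 18.52 regex.c interface.  In that syntax the Lisp string above
is spelled "(.|\n)*".  The helper name match_length is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: length of the match beginning at offset 0 of
   text, or -1 if the pattern fails to compile or does not match there. */
static long match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    /* REG_NEWLINE: "." no longer matches '\n' (assumed to mirror the
       behaviour of "." discussed in this thread). */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (long)m.rm_eo;
    regfree(&re);
    return len;
}
```

On the 5-character text "ab\ncd", match_length(".*", ...) should return 2
(the match stops at the newline), and match_length("(.|\n)*", ...) should
return 5 (the whole text).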
-- 
Piet van Oostrum, Dept of Computer Science, University of Utrecht
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Telephone: +31-30-531806. piet@cs.ruu.nl (mcvax!hp4nl!ruuinf!piet)

worley@EDDIE.MIT.EDU (Dale Worley) (03/16/89)

   I'm using the regex code out of 18.52 and would like to construct a
   regular expression that will match *all* of the characters in a string
   of text containing newlines. I tried ".*" (the most obvious), but it
   just matches up to the first newline. I haven't got newlines being treated
   specially in any way (I.E. as "or"s) so I can't figure out why this is
   happening.

This happens because '.' is defined not to match
newlines.  (See Info node "Regexps", or search for 'anychar' (the
internal code for the '.'  regexp) in regex.c and examine the code in
those areas.)  If you want to match newline also, you will have to say
'\(.\|^J\)', or in C, "\\(.\\|\n\\)".

Dale

merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge) (03/17/89)

In article <Mar.15.20.52.02.1989.11741@athos.rutgers.edu>, gaynor@athos (Silver) writes:
| "[^]" fails with an error (it would be nice if this were fixed, even if just
| for this purpose, and "[]" for completeness), but "[\0-\255]" did the trick.  The
| same could have been done at extra cost with "\(\n\|.\)".

Ahh, but "[^]]" is a valid reg-ex, and matches any single character
*but* the right bracket.  Similarly, "[]]" matches *just* the right
bracket.  So, there is *nothing* to fix.  True, there is no trivial
way to say "everything" and "nothing", but so what.
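
The two bracket forms can be tried out with a short C sketch.  It assumes
POSIX regcomp(), whose bracket syntax treats a leading "]" as a literal in
the same way; it is not the Emacs regex.c itself.  The helper name
matches_whole is made up for this example.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: returns 1 if pattern matches the whole of text,
   and 0 otherwise (including when the pattern does not compile). */
static int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, text, 1, &m, 0) == 0)
        ok = (m.rm_so == 0 && (size_t)m.rm_eo == strlen(text));
    regfree(&re);
    return ok;
}
```

Here matches_whole("[]]", "]") and matches_whole("[^]]", "a") should both be
1, while matches_whole("[]]", "a") and matches_whole("[^]]", "]") should
both be 0.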

Now, how to match just a "["? :-)
-- 
Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095
on contract to BiiN (for now :-), Hillsboro, Oregon, USA.
ARPA: <@intel-iwarp.arpa:merlyn@intelob> (fastest!)
MX-Internet: <merlyn@intelob.intel.com> UUCP: ...[!uunet]!tektronix!biin!merlyn
Standard disclaimer: I *am* my employer!
Cute quote: "Welcome to Oregon... home of the California Raisins!"