[comp.unix.questions] Regular Expression delimiters

lwv@n8emr.UUCP (Larry W. Virden) (04/11/89)

Using sed as a simple example of a complex situation, how would one code
a shell program to invoke sed so that the arguments passed would always be
valid?  That is, let's say that I wanted to write a script which would read
the arguments to sed in from the keyboard.  Also, assume that I was dealing
with users who wouldn't know sed from a hole in the ground ;-0.  I want
to use delimiters around the shell variable containing their input which
won't clash:

sed -e /$ans/ filename

doesn't work - what if there are multiple /'s in the user's answer?

Using ^, !, etc - basically a hard-coded printable character - has the
same problem.

Do folks just ignore this possibility, or is there a standard trick to this?
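
A minimal sketch of the failure being described, assuming a POSIX sh and a
hypothetical answer of "and/or". The first slash inside the answer ends the
address early, so sed tries to read the remainder as a command and gives up:

```shell
#!/bin/sh
# Hypothetical user input that happens to contain the delimiter.
ans='and/or'

# sed receives the script /and/or/d : the address is /and/ and the next
# character, "o", is not a sed command, so sed exits with an error.
if printf '%s\n' 'this and/or that' 'unrelated line' |
    sed -e "/$ans/d" 2>/dev/null
then
    status=ok
else
    status=failed
fi
printf 'sed exited: %s\n' "$status"
```

Swapping / for another fixed delimiter only moves the problem to answers
that contain that other character.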

-- 
Larry W. Virden	 674 Falls Place, Reynoldsburg, OH 43068 (614) 864-8817
75046,606 (CIS) ; LVirden (ALPE) ; osu-cis!n8emr!lwv (UUCP) 
osu-cis!n8emr!lwv@TUT.CIS.OHIO-STATE.EDU (INTERNET)
The world's not inherited from our parents, but borrowed from our children.

mchinni@pica.army.mil (Michael J. Chinni, SMCAR-CCS-E) (04/11/89)

Larry,

	I don't know if there is a "standard" trick for this, but I had to
write a shell script where this was going to be a problem. I got around this by
using "tr" to translate all the possible characters that could mess up sed into
characters that are OK for sed.
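
The post does not show the script, so the following is only one guess at how
a tr-based approach might look. It assumes a POSIX sh, a hypothetical answer
"and/or", and that the control-A byte (octal 001) never occurs in the data.
It hides only the slash; other regular-expression characters stay special.

```shell
#!/bin/sh
# Hypothetical user input containing the sed delimiter.
ans='and/or'

# Hide every slash in the answer behind control-A (assumed unused).
pattern=$(printf '%s\n' "$ans" | tr '/' '\001')

# Translate the data the same way, delete matching lines, translate back.
result=$(printf '%s\n' 'this and/or that' 'unrelated line' |
    tr '/' '\001' | sed -e "/$pattern/d" | tr '\001' '/')

printf '%s\n' "$result"
```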

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
			    Michael J. Chinni
	US Army Armament Research, Development, and Engineering Center
 User to skeleton sitting at cobweb    () Picatinny Arsenal, New Jersey  
   and dust covered terminal and desk  () ARPA: mchinni@pica.army.mil
    "System been down long?"           () UUCP: ...!uunet!pica.army.mil!mchinni
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

dhesi@bsu-cs.bsu.edu (Rahul Dhesi) (04/12/89)

In article <993@n8emr.UUCP> lwv@n8emr.UUCP (Larry W. Virden) writes:
>
>Using sed as a simple example of a complex situation, how would one code
>a shell program to invoke sed so that the arguments passed would always be
>valid?...
>sed -e /$ans/ filename


First quote those slashes:

   # works from /bin/sh
   pattern="`echo "$ans" | sed -e 's/\//\\\\\\//g'`"
   sed -e "/$pattern/d" filename

Count those backslashes carefully.

What UNIX badly needs is a way of specifying out-of-band characters.
The current quoting scheme causes problems because a level of quoting
is removed at each level of interpretation.
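
To illustrate the point about quoting levels, here is a small sketch that
assumes a POSIX sh, where the $(...) form of command substitution is
available. Backquotes strip one layer of backslashes before sed ever runs,
while $(...) does not, so the same sed command needs a different backslash
count in each form. The sample value 'a/b' is hypothetical.

```shell
#!/bin/sh
# Backquotes: each \\ becomes \ before sed is run, so six backslashes
# reach sed as three.
a=`printf '%s\n' 'a/b' | sed 's/\//\\\\\\//g'`

# $(...): the text is passed through untouched, so three are written.
b=$(printf '%s\n' 'a/b' | sed 's/\//\\\//g')

# Both hold the same escaped string, with a backslash before the slash.
printf '%s\n' "$a" "$b"
```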
-- 
Rahul Dhesi <dhesi@bsu-cs.bsu.edu>
UUCP:    ...!{iuvax,pur-ee}!bsu-cs!dhesi

tale@pawl.rpi.edu (David C Lawrence) (04/12/89)

In article <993@n8emr.UUCP> lwv@n8emr.UUCP (Larry W. Virden) writes:
Larry> Using sed as a simple example of a complex situation, how would
Larry> one code a shell program to invoke sed so that the arguments
Larry> passed would always be valid?...
Larry> sed -e /$ans/ filename

In article <6710@bsu-cs.bsu.edu> dhesi@bsu-cs.bsu.edu (Rahul Dhesi) writes:
Rahul> First quote those slashes:
Rahul>    # works from /bin/sh
Rahul>    pattern="`echo "$ans" | sed -e 's/\//\\\\\\//g'`"
Rahul>    sed -e "/$pattern/d" filename
Rahul> Count those backslashes carefully.

pattern="`echo $ans | sed 'sX/X\\\/Xg'`"

a) quoting $ans is only necessary if "-n" could lead it.  If you want
to allow for that you need to quote it as "`echo \"$ans\" ...`"

b) it is generally better not to delimit sed script lines with
characters that also appear inside the delimited strings; choosing a
different delimiter (such as "X", above) means you don't need to add
additional quoting levels inside the strings.  The above example should
make $pattern parseable by sed using / as a delimiter.
--
      tale@rpitsmts.bitnet, tale%mts@itsgw.rpi.edu, tale@pawl.rpi.edu

chris@mimsy.UUCP (Chris Torek) (04/12/89)

>In article <6710@bsu-cs.bsu.edu> dhesi@bsu-cs.bsu.edu (Rahul Dhesi)
>suggests something like
>>    pattern="`echo "$ans" | sed -e 's/\//\\\\\\//g'`"

In article <TALE.89Apr11192859@imagine.pawl.rpi.edu> tale@pawl.rpi.edu
(David C Lawrence) writes:
>pattern="`echo $ans | sed 'sX/X\\\/Xg'`"

Actually, sed 'sX/X\\/Xg' suffices (two backslashes, not three).
(I also prefer a comma as the separator: sed 's,/,\\/,g'.)
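
A short sketch of the comma-delimited form in use, assuming a POSIX sh with
printf(1) and $(...), and a hypothetical answer "usr/lib". Because $(...)
leaves backslashes alone, the two-backslash form is written exactly as given:

```shell
#!/bin/sh
# Hypothetical user input containing slashes.
ans='usr/lib'

# With a comma as delimiter the slash needs no protection in the pattern;
# \\/ in the replacement produces a backslash followed by a slash.
pattern=$(printf '%s\n' "$ans" | sed 's,/,\\/,g')

# The escaped answer can now sit between / delimiters.
kept=$(printf '%s\n' 'usr/lib/foo' 'etc/passwd' | sed -e "/$pattern/d")

printf '%s\n' "$pattern" "$kept"
```

This protects only the delimiter; characters such as . and * in the answer
still act as regular-expression operators.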

>a) quoting $ans is only necessary if "-n" could lead it.  If you want
>to allow for that you need to quote it as "`echo \"$ans\" ...`"

This makes little difference.  Quoting $ans is necessary if and only
if it contains special characters (space, tab, newline, and globbing
characters).  If the string is "-n", echo will still break; worse, if it
contains backslashes, it will break on SysV, where echo does escape
interpretation.  (echo should never have acquired a -n flag either; that
job should have been left to printf(1).)

It is true that if "$ans" is "-n foo", then

	echo $ans

produces

	foo

without a newline, while

	echo "$ans"

produces

	-n foo

(with a newline).  But

	echo "-n"

produces nothing.  There is no workaround for this.
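
On systems that do provide printf(1) (an assumption; it was not universal
when this was posted), a sketch like the following sidesteps both the -n
problem and SysV-style escape interpretation, because the value is passed
as data for a %s conversion rather than as the first argument:

```shell
#!/bin/sh
# Hypothetical troublesome values: an option look-alike and a backslash.
ans1='-n'
ans2='a\tb'

# %s copies the argument verbatim; only the format string is interpreted.
out1=$(printf '%s\n' "$ans1")
out2=$(printf '%s\n' "$ans2")

printf '%s\n' "$out1" "$out2"
```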
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

bernsten@phoenix.Princeton.EDU (Dan Bernstein) (04/13/89)

So far, nobody has given an answer to this question that handles more
than a few special cases. Now Chris Torek gives up completely.

Here are some solutions, all thoroughly tested, from the devious mind
that brought you the aliases `quote' and `makealias'. Each example
munges $ans into $pattern, so that sed "s/$pattern/whatever/g" acts
as if it had a literal $ans in the first position.

sh, if you have printenv:
  export ans ; pattern="`printenv ans | sed 's-\([\.\*\[\\\^\$\/]\)-\\\\\1-g'`"

sh, if you have a working echo (whose only caveat is -n):
  pattern="`(echo -n \"$ans\";echo '') | sed 's-\([\.\*\[\\\^\$\/]\)-\\\\\1-g'`"
  NOTE: Chris, want to take back that ``no workaround''?

csh, if you have a working /bin/echo (whose only caveat is -n):
  set pattern="`(echo -n "\"\$ans\"";echo '')
	          | sed 's-\([\.\*\[\\\^\"\$"\/]\)-\\\1-g'`"

csh, if you have a working builtin echo (whose only caveat is -n):
  set pattern="`echo "\"\$ans\"" | sed 's-\([\.\*\[\\\^\"\$"\/]\)-\\\1-g'`"
  NOTE: csh parses builtins strangely, so this works even if ans is "-n ...".

sh, on any machine (put it all on one line):
  pattern="`sed \"^A$ans\" 2>&1 | sed 's/^Unrecognized command: ^A//'
             | sed 's-\([\.\*\[\\\^\$\/]\)-\\\\\1-g'`"
  CAVEAT: Does not work if $ans contains newlines.

csh, on any machine (put it all on one line):
  set pattern="`sed "\"^A\$ans\"" |& sed 's/^Unrecognized command: ^A//'
	     | sed 's-\([\.\*\[\\\^\$\/]\)-\\\1-g'`"
  ANTI-CAVEAT: Because csh is csh, this one works if $ans contains newlines.

Some notes about the last two: The sequence ^A (appearing twice in
each---but not the ^U) can be any (identical) string upon which sed
will choke; I use the control character. The general idea of

  sed "^A$ans" 2>&1 | sed 's/^Unrecognized command: ^A//'

is to somehow manage to get that environment variable into the
input-output stream, which is difficult if both echo and printenv
are screwed. Other similar replacements include using ls imaginatively
and then stripping off the `file not found', etc.
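
As a different route to the same goal of getting the variable into the
stream without echo or printenv, a here-document can be used. This is a
swapped-in technique, not the one described above, and the sketch assumes a
POSIX sh; the sample value is hypothetical. The expanded text is not
re-scanned for options, globs, or backslash escapes:

```shell
#!/bin/sh
# Hypothetical value with an option look-alike, a backslash and a glob.
ans='-n \t *'

# The here-document expands $ans once and hands the text to cat as input.
literal_text=$(cat <<EOF
$ans
EOF
)

printf '%s\n' "$literal_text"
```

Like any command substitution, this drops trailing newlines from the value.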

It wouldn't take much work to make a `literal' alias, by feeding the
above ideas through `makealias', so that all you'd have to do for this
problem is type

  sed "s/`literal ans`/whatever/g"

Those who have seen my csh aliases `quote' and `makealias' know both
that I have a masochistic enjoyment of these problems and that my
solutions work. So unless I've screwed up, let's cut the discussion.

---Dan Bernstein, bernsten@phoenix.princeton.edu

bernsten@phoenix.Princeton.EDU (Dan Bernstein) (04/14/89)

In article <7702@phoenix.Princeton.EDU> I write:
> It wouldn't take much work to make a `literal' alias,

so I'll now put my money where my mouth is.

  alias literal 'sed '\''s-\([\.\*\[\\\^\$\/]\)-\\\1-g'\'
  alias show 'echo "$\!:1"'

The acid test, of course, is

  show ans | sed "s/`show ans | literal`/xxx/g"

which, no matter what $ans is, will output xxx.
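
For readers using sh rather than csh, a rough equivalent can be sketched as
a shell function. This assumes a POSIX sh with printf(1); the function name
literal follows the alias above, but the function form and the sample value
are illustrative only. It escapes the same characters as the alias
(. * [ \ ^ $ /) and does not handle newlines in the value.

```shell
#!/bin/sh
# literal: print $1 with a backslash in front of each character that is
# special in a basic regular expression delimited by slashes.
literal() {
    printf '%s\n' "$1" | sed 's,[[\.*^$/],\\&,g'
}

# Hypothetical answer packed with special characters.
ans='a.b*c/[d]^$\e'
pattern=$(literal "$ans")

# The acid test: the escaped pattern should match the raw text exactly.
result=$(printf '%s\n' "$ans" | sed "s/$pattern/xxx/g")
printf '%s\n' "$result"
```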

If your csh doesn't have a working builtin echo, you'll have to rewrite
show along the lines of my previous article.

---Dan Bernstein, bernsten@phoenix.princeton.edu