[comp.unix.questions] sed behaves differently when run in backquotes/subshell

stever@tree.UUCP (Steve Rudek) (08/07/89)

I needed a shell script which would take strings of letters and alphabetize
them into a single line and sed + sort seemed like the best choice.  But I find 
that sed performs differently when part of a pipeline not explicitly in a
subshell than it does when run in backquotes.  What's the problem?

Given a file ".TST" containing:
e
s
m
l
o09
ux123
35
y8
anrt467
iwhp5z
h

and the following script:
DATA_DIR=.TST
#----------------------------#
echo "exploding and alphabetizing without subshell works fine"
cat .TST | sed 's/\(.\)/\1\
/g'|sort|paste -s -d"\0" -
#----------------------------#
echo "can't explode the strings when the same pipeline is run in backquotes"
GUESSES=`cat .TST | sed 's/\(.\)/\1\
/g'|sort|paste -s -d"\0" -`
echo "GUESSES==$GUESSES"

I get the output:
exploding and alphabetizing without subshell works fine
012334556789aehhilmnoprstuwxyz
can't explode the strings when the same pipeline is run in backquotes
GUESSES==35anrt467ehiwhp5zlmo09sux123y8

What is happening here is that strings such as "abc" are being properly
split into
a
b
c
in the first case while they pass through unchanged in the second case.
-- 
----------
Steve Rudek  {ucbvax!ucdavis!csusac OR ames!pacbell!sactoh0} !tree!stever

maart@cs.vu.nl (Maarten Litmaath) (08/08/89)

stever@tree.UUCP (Steve Rudek) writes:
\...
\GUESSES=`cat .TST | sed 's/\(.\)/\1\
\/g'|sort|paste -s -d"\0" -`

Inside backquotes, escaped newlines (a backslash followed by a newline) are
removed, even within single quotes.
So sed `sees' the following argument:

	s/\(.\)/\1/g

To protect the escaped newline:

	... | sed 's/\(.\)/\1\\
	/g' | ...

That's right: ONE extra backslash suffices - newlines are left undisturbed
inside single quotes.
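
A minimal sketch of the two forms side by side, assuming a shell whose
backquotes behave as described above. The scratch file path and the variable
names BROKEN and FIXED are made up for illustration; LC_ALL=C is set only to
make the sort order predictable.

```shell
#!/bin/sh
# Sample data from the original post, written to a scratch file.
# (The scratch path is an assumption for illustration.)
LC_ALL=C; export LC_ALL
TST=${TMPDIR:-/tmp}/tst.$$
printf '%s\n' e s m l o09 ux123 35 y8 anrt467 iwhp5z h > "$TST"

# One backslash: the backquote scan drops the backslash-newline pair,
# so sed is handed s/\(.\)/\1/g and leaves every line unchanged.
BROKEN=`cat "$TST" | sed 's/\(.\)/\1\
/g' | sort | paste -s -d"\0" -`

# Two backslashes: the backquote scan reduces \\ to \ and keeps the
# newline, so sed is handed the backslash-newline it needs.
FIXED=`cat "$TST" | sed 's/\(.\)/\1\\
/g' | sort | paste -s -d"\0" -`

rm -f "$TST"
echo "BROKEN==$BROKEN"
echo "FIXED==$FIXED"
```

With the sample data, FIXED should come out as the fully alphabetized string
012334556789aehhilmnoprstuwxyz. In shells that offer the $(...) form of
command substitution, the single-backslash version should also work there,
since that form does not treat backslashes specially.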
In general, to debug pipelines, try something like:

	GUESSES=`cat .TST | echo sed 's/\(.\)/\1\
	/g' > /dev/tty | sort | ...`

i.e. just insert "echo" and "> /dev/tty" to see what the funny stuff expands
to.
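
A variation on the same trick, sketched under a few assumptions: show_args is
a hypothetical helper (not from the post) that prints each argument in
brackets, printf is used instead of echo because some echo implementations
interpret sequences such as \1, and the result is captured in a variable
rather than sent to /dev/tty so that it also runs without a terminal.

```shell
#!/bin/sh
# Hypothetical helper: print every argument on its own line, in brackets,
# so embedded newlines and backslashes are easy to spot.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# What sed would be handed with a single backslash before the newline.
ONE=`show_args sed 's/\(.\)/\1\
/g'`

# What sed would be handed with the extra backslash.
TWO=`show_args sed 's/\(.\)/\1\\
/g'`

printf 'one backslash:\n%s\n' "$ONE"
printf 'two backslashes:\n%s\n' "$TWO"
```

If the shell removes escaped newlines inside backquotes, the first listing
shows the sed script on a single line, while the second shows the backslash
and the line break still in place.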
-- 
"Mom! Eric Newton broke the day! In 24   |Maarten Litmaath @ VU Amsterdam:
  parts!" (Mike Schmitt in misc.misc)    |maart@cs.vu.nl, mcvax!botter!maart

cpcahil@virtech.UUCP (Conor P. Cahill) (08/08/89)

In article <350@tree.UUCP>, stever@tree.UUCP (Steve Rudek) writes:
> I needed a shell script which would take strings of letters and alphabetize
> them into a single line and sed + sort seemed like the best choice.  But I find 
> that sed performs differently when part of a pipeline not explicitly in a
> subshell than it does when run in backquotes.  What's the problem?

  It is not sed which misbehaves, but the shell.  When you run the pipeline
  in backquotes, the escaped newline ("^J") is eaten by the shell's
  command-line processing.  An alternative is to have sed insert a tilde
  instead of the newline and then translate each tilde to a newline with tr.
  For example:

	GUESSES=`cat .TST | sed 's/\(.\)/\1~/g' | tr "~" "\012" | sort....
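
A complete version of the tilde approach might look like the sketch below.
The tail of the pipeline is elided in the post; finishing it with the paste
command from the original script is an assumption, as are the scratch file
path and LC_ALL=C. It also assumes the data never contains a tilde.

```shell
#!/bin/sh
# Sample data from the original post, written to a scratch file.
# (The scratch path is an assumption for illustration.)
LC_ALL=C; export LC_ALL
TST=${TMPDIR:-/tmp}/tst.$$
printf '%s\n' e s m l o09 ux123 35 y8 anrt467 iwhp5z h > "$TST"

# sed appends a tilde to every character; tr then turns each tilde into
# a newline (octal 012), so no escaped newline appears inside the backquotes.
GUESSES=`cat "$TST" | sed 's/\(.\)/\1~/g' | tr "~" "\012" | sort | paste -s -d"\0" -`

rm -f "$TST"
echo "GUESSES==$GUESSES"
```

Because the sed script stays on one line, this form does not depend on how
the shell treats a backslash-newline pair inside backquotes.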