[comp.unix.shell] protecting whitespace from the Bourne "for" command

rodgers@maxwell.mmwb.ucsf.edu (ROOT) (12/08/90)

Dear Netlanders,

Does anyone know how to protect whitespace in items to be passed to the
"for" operator of the Bourne shell?  Consider the script:

#! /bin/sh
#
# Define list
#
list="'a b' c"
#
# Use list
#
for item in $list
do
   grep $item inputfile
done
#
# Script complete

where "inputfile" might contain, for example:

a b
c
d

The idea is to grep for each of the regular expressions appearing in $list,
one at a time, in the file "inputfile".  In the above example,
"a b" is meant to comprise one such pattern, and "c" another.
I have tried all sorts of combinations of \, ', and " in the definition
of "list" and in the appearance of "$list" on the "for" command line,
in an attempt to prevent the shell from parsing arguments on the whitespace
contained within the expression "a b", all to no avail.  One such combination
of failed quoting mechanisms is displayed above.
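
To see why the quotes stored in "list" give no protection, here is a minimal sketch, assuming a POSIX-style sh.  The variables "count" and "words" are only there for illustration.  The shell splits the expanded text of $list on whitespace but never re-parses the quote characters, so they travel along as ordinary data:

```shell
#!/bin/sh
# Quotes stored inside a variable are ordinary characters: the shell
# splits the expanded text on IFS whitespace but does not re-parse quotes.
list="'a b' c"

count=0
words=
for item in $list
do
	count=`expr $count + 1`
	words="$words($item)"
done

# Three words come out, and two of them still carry a quote character.
echo "$count words: $words"
```

Under those assumptions the loop body runs three times, once each for 'a and b' and c, which is why grep never sees "a b" as a single pattern.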

Please, no responses of the form "why do you want to do this," "use perl,"
"use awk," etc.  The above boils down the essence of a problem which appears
in quite a different context.

Any ideas????

Thanks and Cheerio, Rick Rodgers

mcgrew@ichthous.Eng.Sun.COM (Darin McGrew) (12/08/90)

In article <16570@cgl.ucsf.EDU> rodgers@maxwell.mmwb.ucsf.edu (ROOT) writes:
>Does anyone know how to protect whitespace in items to be passed to the
>"for" operator of the Bourne shell?  Consider the script:

Use `eval` so that the quotes are evaluated as such.  Here's the
revised script--

	#! /bin/sh
	#
	# Define list
	#
	list="'a b' c"
	#
	# Use list
	#
	eval	for item in "$list" \; \
		do \
			grep \"\$item\" inputfile \; \
		done
	#
	# Script complete

Yes, getting the quoting right can be difficult if the body of
the loop is large.  Another option might be to have a small loop
that feeds a `while read foo` loop--

	eval	for item in "$list" \; \
		do \
			echo \"\$item\" \; \
		done |
	while read item
	do
		grep "$item" inputfile
		# More big, hairy, loop that would be too
		# confusing with '\' characters everywhere
	done
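
One way to check the backslashes is to substitute `echo` for `eval` and look at the text that would be parsed the second time.  This is only a sketch, assuming a shell with POSIX `$(...)` command substitution; the variable "cmd" is introduced for illustration:

```shell
#!/bin/sh
list="'a b' c"

# echo stands in for eval here, so the command is printed instead of run.
# After the first parse the backslashes are gone and "$list" is expanded;
# the quotes that came from $list are now part of the command text.
cmd=$(echo for item in "$list" \; do grep \"\$item\" inputfile \; done)
echo "$cmd"
```

The printed line is what `eval` would hand back to the parser, and in it 'a b' is a properly quoted word.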

Darin McGrew			mcgrew@Eng.Sun.COM
Affiliation stated for identification purposes only.

lml@cbnews.att.com (L. Mark Larsen) (12/08/90)

In article <16570@cgl.ucsf.EDU>, rodgers@maxwell.mmwb.ucsf.edu (ROOT) writes:
# Does anyone know how to protect whitespace in items to be passed to the
# "for" operator of the Bourne shell?  Consider the script:
# 
# #! /bin/sh
# #
# # Define list
# #
# list="'a b' c"
# #
# # Use list
# #
# for item in $list
# do
#    grep $item inputfile
# done
# #
# # Script complete
# 
# where "inputfile" might contain, for example:
# 
# a b
# c
# d
# 
One way to do what you want is to set the positional parameters and loop
through them:

set -- 'a b' c
for item
do
	grep "$item" inputfile
done

Of course, if your script was called with arguments, you have a small
problem to get around - especially if any of the original arguments had
embedded white space.  Possibly the safest and easiest way to avoid this
sort of problem is to use a function:

doit()
{
	for item
	do
		grep "$item" inputfile
	done
}

doit 'a b' c	# once

for arg
do
	# process original args differently
	echo "$arg"
done

doit 'd e' f	# again
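
A small sketch of why the function is safe, assuming a POSIX-style sh in which a function gets its own positional parameters for the duration of the call.  The function "collect" and the variables "seen", "argc" and "first" are hypothetical names used only here:

```shell
#!/bin/sh
# Hypothetical helper: appends each of its own arguments to $seen.
collect()
{
	for item
	do
		seen="$seen$item:"
	done
}

set -- 'orig 1' orig2	# stand-in for the script's own arguments
seen=
collect 'a b' c

# The caller's arguments are untouched after the call.
argc=$#
first=$1
echo "seen=$seen argc=$argc first=$first"
```

The function sees 'a b' and c as its arguments, while the script's own $1 and $# are the same after the call as before it.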

The function idea is quite useful in other situations.  For example, suppose 
you want to change the value of some variable in a script but the change is
taking place inside of a loop where the output is redirected.  With the Bourne
shell (fixed in the Korn shell) such a loop is run in a subshell which means
the change to the variable in the script's environment is lost:

# with /bin/sh, foo is not changed
foo=bar
for i
do
	foo=$i
	echo "loop: foo = $foo"
done >/dev/tty
echo "final = $foo"

However, by putting the loop in a function, the change does take place to
the script's environment:

# in this case, foo *is* changed
doit()
{
	for i
	do
		foo=$i
		echo "loop: foo = $foo"
	done
}

foo=bar
doit $* >/dev/tty
echo "foo = $foo"

cheers,
L. Mark Larsen
lml@atlas.att.com

maart@cs.vu.nl (Maarten Litmaath) (12/11/90)

In article <4198@exodus.Eng.Sun.COM>,
	mcgrew@ichthous.Eng.Sun.COM (Darin McGrew) writes:
)In article <16570@cgl.ucsf.EDU> rodgers@maxwell.mmwb.ucsf.edu (ROOT) writes:
)>Does anyone know how to protect whitespace in items to be passed to the
)>"for" operator of the Bourne shell?  Consider the script:
)
)Use `eval` so that the quotes are evaluated as such.  Here's the
)revised script--
)
)	#! /bin/sh
)	#
)	# Define list
)	#
)	list="'a b' c"
)	#
)	# Use list
)	#
)	eval	for item in "$list" \; \
)		do \
)			grep \"\$item\" inputfile \; \
)		done
)	#
)	# Script complete
)
)Yes, getting the quoting right can be difficult if the body of
)the loop is large.  [...]

Another option is to use the ``set'' command, if the original $* arguments
aren't needed:

	# First remember the original args.

	argc=0
	argv=

	for i
	do
		argc=`expr $argc + 1`
		eval argv$argc='"$i"'
		argv="$argv \"\$argv$argc\""
	done

	# Now set the stuff we want to process.
	# The initial `x' is there to make sure the first argument of the
	# ``set'' command does not start with a `-'.  This method is more
	# portable than ``set - ...''.

	eval set x "$list"
	# Now get rid of the dummy arg.
	shift

	for item
	do
		# The `-e' option `protects' the pattern.
		grep -e "$item" inputfile
	done

	# Reset the args.
	eval set x $argv
	shift

If the loop can be executed in a subshell, we don't need to remember the
args:
	(
		eval set x "$list"
		shift

		for item
		do
			...
		done
	)
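
The save-and-restore scheme can be tried in isolation with something like the following sketch.  It assumes a POSIX-style sh; the two arguments set at the top and the variables "items", "second" and "restored" are made up for the demonstration:

```shell
#!/bin/sh
list="'a b' c"
set -- 'orig one' two	# stand-in for the script's original arguments

# Save each argument in argv1, argv2, ... and build a list of quoted
# references to them in $argv.
argc=0
argv=
for i
do
	argc=`expr $argc + 1`
	eval argv$argc='"$i"'
	argv="$argv \"\$argv$argc\""
done

# Borrow the positional parameters for the list.
eval set x "$list"
shift
items=$#
second=$2

# Restore the original arguments, embedded space included.
eval set x $argv
shift
restored="$#:$1:$2"
echo "items=$items second=$second restored=$restored"
```

After the last `shift` the script again has two arguments, and the first one still contains its space.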
--
In the Bourne shell syntax tabs and spaces are equivalent almost everywhere.
The exception: _indented_ here documents.  :-(
Does anyone remember the famous mistake Makefile-novices often make?

martin@mwtech.UUCP (Martin Weitzel) (12/11/90)

In article <16570@cgl.ucsf.EDU> rodgers@maxwell.mmwb.ucsf.edu (ROOT) writes:
>Dear Netlanders,
>
>Does anyone know how to protect whitespace in items to be passed to the
>"for" operator of the Bourne shell?  Consider the script:
>
>#! /bin/sh
>#
># Define list
>#
>list="'a b' c"
>#
># Use list
>#
>for item in $list
>do
>   grep $item inputfile
>done
>#
># Script complete
>
>where "inputfile" might contain, for example:
>
>a b
>c
>d

If you have any character that will never appear in the items of your
list, you can use this character as delimiter for the items and change
IFS (in most cases it is wise to restore IFS for the rest of the script):

	list="a b:c"
	CIFS=$IFS			# save IFS
	IFS=:
	for item in $list
	do
		IFS=$CIFS		# restore IFS here for the loop
		grep "$item" inputfile
	done
	IFS=$CIFS	# or restore it here for the rest of the script
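
One caveat when the items are grep patterns: an unquoted $list is also subject to filename generation, so an item such as c* could be replaced by matching file names.  The sketch below assumes a shell that supports `set -f` (POSIX shells do) and uses a made-up list plus the illustrative variables "old_ifs" and "items":

```shell
#!/bin/sh
list="a b:c*:[xy]"

old_ifs=$IFS
set -f			# no filename generation while $list is split
IFS=:
items=
for item in $list
do
	IFS=$old_ifs	# restore IFS for the body of the loop
	items="$items<$item>"
done
IFS=$old_ifs		# also restore it in case the list was empty
set +f

echo "$items"
```

With filename generation switched off, each item reaches the loop exactly as written in the list.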


In my example I used `:' as delimiter character; if you need all the
printing characters you can use some control character (e.g. BEL,
aka ^G), and, if you do it right, you can even use a newline character:

	list="a b
	c"
	IFS="
	"  # ^-- no white space between double quote and newline!!!
	for item in $list
	do
		grep $item inputfile
	done

In this example I left out saving and restoring IFS. 

Now, since we have just touched on this topic (and for all who don't know):
IFS contains the characters that are used as separators for the
command name and its parameters. Back when I had less experience
with the (Bourne) shell, I thought the above (second example)
couldn't work, because: How does the shell separate the parts of
the command line in the body of the loop, when no blank (space character)
occurs within IFS?

The answer is that the space character is *always* a valid separator,
no matter what is specified in IFS. So the line

	grep $item inputfile

is correctly tokenized into three parts. Then, after several other
shell constructs have been recognized, there is a step which replaces the
construct `$item' with the contents of the variable `item'. And finally,
there comes the step where IFS is obeyed and the result of that
substitution is separated further.

For this reason you need no double quotes around `$item' in the second
example, because IFS doesn't contain a space then, but you absolutely
need them in the first example!! (Think about it, if this is not clear
to you - and then try it.)
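
For those who would rather try it than think about it, here is a sketch that counts arguments under both IFS settings.  It assumes a POSIX-style sh; the function "nargs" and the result variables are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
nargs()
{
	echo $#
}

item="a b"
old_ifs=$IFS

default_unquoted=`nargs $item x`	# space is in IFS: $item splits in two
default_quoted=`nargs "$item" x`	# quotes keep $item in one piece

IFS="
"
newline_unquoted=`nargs $item x`	# IFS is newline only: $item stays whole
IFS=$old_ifs

echo "$default_unquoted $default_quoted $newline_unquoted"
```

In all three calls the command line itself is tokenized on the blanks; only the text substituted for $item is split according to IFS.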
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

martin@mwtech.UUCP (Martin Weitzel) (12/11/90)

In article <4198@exodus.Eng.Sun.COM> mcgrew@ichthous.Eng.Sun.COM (Darin McGrew) writes:
>In article <16570@cgl.ucsf.EDU> rodgers@maxwell.mmwb.ucsf.edu (ROOT) writes:
>>Does anyone know how to protect whitespace in items to be passed to the
>>"for" operator of the Bourne shell?  Consider the script:
>
>Use `eval` so that the quotes are evaluated as such.  Here's the
>revised script--
>
>	#! /bin/sh
>	#
>	# Define list
>	#
>	list="'a b' c"
>	#
>	# Use list
>	#
>	eval	for item in "$list" \; \
>		do \
>			grep \"\$item\" inputfile \; \
>		done
>	#
>	# Script complete
>
>Yes, getting the quoting right can be difficult if the body of
>the loop is large. 

Yes, getting the quoting right can be difficult :-( .... but I have
found a simple trick that makes it much easier :-).

Quoting is necessary as the shell essentially parses the arguments of
the `eval'-command two times and the programmer must take care that
some parts are evaluated in the first parse, others in the second.

Most people quote (only) the parts that must *not* be evaluated in
the first parse. Do it the other way round and quote everything *except*
what must be evaluated in the first parse.

	eval '	for item in '"$list"';
		do
			grep "$item" inputfile;
		done
	'

Looks a little nicer, doesn't it? If you have a hardcopy of this, there
is another trick to see what's going on: Take one of those yellow marker
pens and highlight everything from one single quote to the next. Leave
out the unquoted parts. I'll try to show it here with capitals:

	eval '	FOR ITEM IN '"$list"';
		DO
			GREP "$ITEM" INPUTFILE;
		DONE
	'

Everything that is highlighted on your hardcopy (or capitalized above)
is taken literally during the first parse. It is easy to see that only the
contents of the variable `list' will be substituted during this parse.
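
The same quoting style can be tested without an input file by letting the loop body collect the items.  This is a sketch only, assuming a POSIX-style sh; the variable "found" is introduced for the demonstration and replaces the grep call:

```shell
#!/bin/sh
list="'a b' c"
found=

# Everything between single quotes reaches eval untouched; only "$list"
# is expanded during the first parse.
eval '	for item in '"$list"';
	do
		found="$found$item:";
	done
'

echo "$found"
```

The loop runs twice, once with "a b" and once with "c", so the quotes taken from $list have done their job in the second parse.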
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83