[comp.unix.wizards] This is strange...

maart@cs.vu.nl (Maarten Litmaath) (12/22/88)

mcapron@ektools.UUCP (M. Capron) writes:
\#!/bin/sh
\for i in *.c
\do
\#Place a list of include files in $incs separated by spaces.
\#CODE A or CODE B goes here.
\	echo "$i : $incs"
\done

\CODE A: This works.
\incs=`egrep '^#[ 	]*include[ 	]*"' $i | awk '{printf "%s ", $2}'`
\incs=`echo "$incs" | sed 's/"//g'`

\CODE B: This does not work.
\incs=`egrep '^#[ 	]*include[ 	]*"' $i | awk '{printf "%s ", $2}' |
	sed 's/"//g'`

Compare your example with the following:

	% echo -n 'merry Xmas' | sed 's/.*/&, happy new year/'
	%

Now get rid of the `-n' and suddenly everything works! The problem: sed won't
do anything with unfinished lines! You explicitly didn't append a newline in
the awk script. See how far that got you! :-)
Solution:

	incs=`egrep '^#[ 	]*include[ 	]*"' $i |
		awk '       {printf "%s ", $2}
			END {printf "\n"}' |
		sed 's/"//g'`
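
For anyone who wants to try this outside the original script, here is a
minimal sketch of the same fix wrapped in a function. The name
quoted_includes is made up for the example. Plain grep with [[:space:]]
stands in for egrep with a literal space and tab; that is my substitution,
not part of the posted script. Like the original, it only handles lines
with nothing between the # and the word include.

```shell
# quoted_includes FILE: print the "..." headers named in FILE on one line.
# (hypothetical helper, for illustration only)
# The END rule terminates the line, so sed gets a complete line on any
# implementation, including ones that drop an unterminated last line.
quoted_includes() {
    grep '^#[[:space:]]*include[[:space:]]*"' "$1" |
        awk '    {printf "%s ", $2}
             END {printf "\n"}' |
        sed 's/"//g'
}
```

Call it as incs=`quoted_includes "$i"` inside the loop. The list keeps the
trailing blank that the awk printf adds after the last name.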

BTW, it's not forbidden to use newlines between backquotes!
Another interesting case:

	$ cat > merry_Xmas
	happy
	1989
	$ card=`cat merry_Xmas`
	$ echo $card
	happy 1989
	$ echo "$card"
	happy
	1989
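
The same effect can be reproduced without a scratch file. This is only a
sketch, assuming a Bourne-compatible shell with the default IFS; printf
stands in for the cat of merry_Xmas.

```shell
card=`printf 'happy\n1989\n'`

# Unquoted: the shell splits $card at white space, so echo receives two
# arguments and joins them with a single blank.
echo $card

# Quoted: $card reaches echo as one argument with its newline intact.
echo "$card"
```

So the newline is stored in the variable all along; it is the unquoted
expansion that turns it into a word separator.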

Csh hasn't got this anomaly.
-- 
if (fcntl(merry, X_MAS, &a))          |Maarten Litmaath @ VU Amsterdam:
        perror("happy new year!");    |maart@cs.vu.nl, mcvax!botter!maart

mcapron@ektools.UUCP (M. Capron) (12/23/88)

Here is some bizarreness I found.  Below is a subset of a Bourne Shell script I
am writing on a Sun 3/60 running SunOS 4.0.  This segment generates dependency
lists for makefiles.  Note that the egrep brackets should contain a space and
a tab.

#!/bin/sh
for i in *.c
do
#Place a list of include files in $incs separated by spaces.
#CODE A or CODE B goes here.
	echo "$i : $incs"
done

CODE A: This works.
incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}'`
incs=`echo "$incs" | sed 's/"//g'`

CODE B: This does not work.
incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}' | sed 's/"//g'`

With CODE B, $incs comes out to be nil.  I can't figure out what the difference
is, nor do I have the patience to play with it any further.  I present it as an
oddity to any interested parties. 

					Sincerely,
					Mike Capron

capron@chiron.UUCP

ditto@cbmvax.UUCP (Michael "Ford" Ditto) (12/23/88)

In article <1652@ektools.UUCP> mcapron@ektools.UUCP (M. Capron) writes:
>CODE A: This works.
>incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}'`
>incs=`echo "$incs" | sed 's/"//g'`
>
>CODE B: This does not work.
>incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}' | sed 's/"//g'`
>
>With CODE B, $incs comes out to be nil.

echo outputs a newline after its arguments, while your awk program won't.
sed only processes lines that are properly newline-terminated.
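
A quick way to see the difference is to test the last byte each command
writes. The helper below is a sketch; ends_with_newline is a name made up
for this example, and it assumes a tail that accepts -c.

```shell
# ends_with_newline: succeed if the last byte on standard input is a newline.
# (hypothetical helper, for illustration only)
ends_with_newline() {
    # wc -l counts newline characters, so the count for one byte is 0 or 1.
    n=`tail -c 1 | wc -l`
    [ $n -eq 1 ]
}

echo '"a.h" ' | ends_with_newline &&
    echo "echo: output is newline-terminated"
awk 'BEGIN {printf "%s ", "a.h"}' | ends_with_newline ||
    echo "awk printf: no trailing newline"
```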
-- 
					-=] Ford [=-

"The number of Unix installations	(In Real Life:  Mike Ditto)
has grown to 10, with more expected."	ford@kenobi.cts.com
- The Unix Programmer's Manual,		...!sdcsvax!crash!elgar!ford
  2nd Edition, June, 1972.		ditto@cbmvax.commodore.com

leo@philmds.UUCP (Leo de Wit) (12/23/88)

In article <1652@ektools.UUCP> mcapron@ektools.UUCP (M. Capron) writes:
|
|Here is some bizarreness I found.  Below is a subset of a Bourne Shell script I
|am writing on a Sun 3/60 running SunOS 4.0.  This segment generates dependency
|lists for makefiles.  Note that the egrep brackets should contain a space and
|a tab.
|
|#!/bin/sh
|for i in *.c
|do
|#Place a list of include files in $incs separated by spaces.
|#CODE A or CODE B goes here.
|	echo "$i : $incs"
|done
|
|CODE A: This works.
|incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}'`
|incs=`echo "$incs" | sed 's/"//g'`
|
|CODE B: This does not work.
|incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}' | sed 's/"//g'`
|
|With CODE B, $incs comes out to be nil.  I can't figure out what the difference
|is, nor do I have the patience to play with it any further.  I present it as an
|oddity to any interested parties. 

There certainly is a difference (although it may not be very obvious).
The awk script does not append a newline to the header file list it is
generating. In the case of CODE A that is not a problem: echo will send
one down the pipe to sed. In the case of CODE B sed is attached
directly to awk's output, so it will never get a newline. And since sed
needs a newline as 'input record marker', it will exit without having
recognized a valid input record - and hence not supply any output.

The solution is simple: add a trailing print statement to the awk script,
as follows:
CODE C: This also works.
incs=`egrep '^#[ 	]*include[ 	]*"' $i |
      awk '{printf "%s ", $2} END {print}' | sed 's/"//g'`

Furthermore I would like to make some remarks about the script; maybe they
are of some use to someone.

1) The use of a 3 process pipeline for such a simple task seems a
little bit overdone; it all lies well within the capabilities of one,
e.g. with sed:

CODE D: This also works.
incs=`sed -n '
/^[ 	]*#[ 	]*include[ 	]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /gp
}' $i`

It is even possible to avoid the echo, the `` and incs, since sed can
handle that as well:

CODE E: This also works (omit the echo in this case).
sed -n '
/^[ 	]*#[ 	]*include[ 	]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /g
    s/^/'$i' : /p
}' $i
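
CODE D and CODE E both rely on sed's hold space: H appends each
interesting line to it, and on the last line g copies the collection back
so that the embedded newlines can be turned into blanks. The sketch below
isolates just that joining step. The function name join_lines is made up
for the example. One detail it makes visible: H always puts a newline in
front of what it appends, so the collected text starts with a newline,
which the extra s/^\n// removes before the join.

```shell
# join_lines: read lines on standard input, print them as one line
# separated by single blanks. (hypothetical helper, for illustration only)
join_lines() {
    sed -n '
    H
    ${
        g
        s/^\n//
        s/\n/ /g
        p
    }'
}

printf 'a.h\nb.h\nc.h\n' | join_lines
```

Without the s/^\n// step the result would begin with a blank, which is
harmless in the echo of the original script but worth knowing about.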

The other points are more of a C issue, but I will present them here
since the script deals with C sources:

2) When searching for '#include' lines one should allow leading white space.
There is nothing that I could find that forbids white space before the #.
Some programmers even use it to clarify nested conditionals (with #ifdef).
The CODE D,E examples allow leading white space.

3) Source files do not depend on the header files they name. This
is a commonly made mistake. To understand this, you must realize that
the source file will not change due to a modification in a header file.
The object file however will, since code is generated from the expanded
source file (the output of the preprocessor phase).
So the dependencies should contain lines like:

    file.o : incl.h   (or perhaps: file.o : file.c incl.h)

instead of

    file.c : incl.h

The easiest way is to strip off the .c, and use the filename without
extension:

for i in `echo *.c | sed -e 's/\.c / /g' -e 's/\.c$//'`
do
#CODE X goes here, using file $i.c
	echo "$i.o : $incs"
done
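
Another way to get the stem is basename, which removes only a trailing
.c and leaves a ".c" in the middle of a name alone. A minimal sketch; the
function name object_rule is made up for the example, and note that
basename also drops any leading directory part.

```shell
# object_rule FILE.c HEADER...: print a make dependency line for the
# object file built from FILE.c. (hypothetical helper, for illustration)
object_rule() {
    src=$1
    shift
    stem=`basename "$src" .c`
    echo "$stem.o : $src $*"
}

object_rule foo.cat.c incl.h
```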

4) Be aware that the script does not handle header files containing
header files.  Note that an object (amongst others) depends upon all
(nested) included files.  To handle this well, you may perhaps also
want to detect illegal recursion; this is not easy in case of
conditional inclusion, since it depends on preprocessor expressions.
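
A rough sketch of how nested headers could be followed, under several
simplifying assumptions of mine: a POSIX shell, all headers live in the
current directory, names contain no white space, and every quoted
#include counts regardless of any surrounding #ifdef. The name list_deps
is made up for the example. Each header is remembered once it has been
seen, so a header that (directly or indirectly) includes itself does not
cause an endless loop.

```shell
# list_deps FILE: print every "..." header reachable from FILE, once each.
# (hypothetical helper, for illustration only)
list_deps() {
    todo=$1
    seen=
    while [ -n "$todo" ]; do
        # Take the first file off the work list.
        set -- $todo
        f=$1
        shift
        todo=$*
        [ -f "$f" ] || continue
        for h in $(sed -n 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*"\([^"]*\)".*/\1/p' "$f"); do
            case " $seen " in
                *" $h "*) ;;    # already visited: this breaks include cycles
                *) seen="$seen $h"; todo="$todo $h"; echo "$h" ;;
            esac
        done
    done
}
```

Used as incs=`list_deps $i.c`, this gives one header per line, so the
result should be expanded unquoted (or piped through tr) to get a single
dependency line.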

Hope this helps -
                    Leo.

logan@vsedev.VSE.COM (James Logan III) (12/24/88)

In article <1652@ektools.UUCP> mcapron@ektools.UUCP (M. Capron) writes:
# 
# CODE A: This works.
# incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}'`
# incs=`echo "$incs" | sed 's/"//g'`
# 
# CODE B: This does not work.
# incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}' | sed 's/"//g'`
# 

Someone else already answered your question correctly, but I have another
version for you that handles all of the following cases:

	#include <file>
	#include "file"
	#  include <file>
	#  include "file"

and runs a little faster, since it is all contained in one awk script
and does not require additional processing by sed.

incs=`
	awk '
		/^#[ 	]*include[ 	]*/ {
			if ($1 == "#") {
				# line is like "# include <file>"
				INCFILE=$3;
			} else {
				# line is like "#include <file>"
				INCFILE=$2;
			}
			print substr(INCFILE, 2, length(INCFILE) - 2);
		}
	' <$i;
`;
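
For checking the four cases listed above, here is a variation on the same
idea as a sketch. It is not Jim's script as posted: the function name
list_includes is made up, the field is chosen by looking at whether the #
stands alone as the first field (so a trailing comment does not matter),
and the names are joined on one line as in the original CODE A.

```shell
# list_includes FILE: print the headers named in FILE on one line, with
# the <> or "" delimiters removed. (hypothetical helper, for illustration)
list_includes() {
    awk '
        /^#[ \t]*include/ {
            # When "#" is a field of its own, the file name is field 3.
            name = ($1 == "#") ? $3 : $2
            # Drop the first and last character (the delimiters).
            printf "%s ", substr(name, 2, length(name) - 2)
        }
        END { printf "\n" }
    ' "$1"
}
```

Lines written as #include<file>, with no blank before the delimiter, are
still not handled; neither version splits those into separate fields.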

			-Jim
-- 
Jim Logan		logan@vsedev.vse.com
(703) 892-0002		uucp:	..!uunet!vsedev!logan
			inet:	logan%vsedev.vse.com@uunet.uu.net

pme@umb.umb.edu (Paul English) (12/29/88)

In article <1652@ektools.UUCP> you write:

>Here is some bizarreness I found.  Below is a subset of a Bourne Shell
>script I am writing on a Sun 3/60 running SunOS 4.0.  This segment
>generates dependency lists for makefiles. ...

Why don't you just use the -M option of the C compiler to produce your
dependency lists? (Or, if you have GNU gcc, use -MM, which is better
because it leaves system header files out of the list.)
See the cc man page for details on -M.

My (generic) application Makefile ends like this:

-----
    dependencies:   $(sources)
                    @echo "# make dependencies for $(program)" > dependencies
                    $(compile) -M $(csrc) >> dependencies

    include dependencies
-----
Of course you first have to create an empty dependencies file (``touch
dependencies''), or comment out the include, until you do the ``make
dependencies''.