[comp.lang.c] Scrunch blank lines

peckham@svax.cs.cornell.edu (Stephen Peckham) (03/25/89)

In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
>perl, c, c++ are all acceptable.  This will be an amusing exercise in
>typical text massaging that can be enlightening for many people.
>
Here's an awk program that will do the trick.  Single blank lines are left as
is.  Runs of two or more blank lines are removed and replaced by a single
"# lineno file" directive giving the source line number of the next line.

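{
	if (NF == 0)			# blank or whitespace-only line
		blanks++
	else if ($1 == "#") {		# cpp directive: # lineno "file"
		l_no = $2 - 1; f = $3
		blanks = 2		# force a directive before the next text
	} else {
		if (blanks > 1)		# run of blanks: resynchronize
			print "#", l_no, f
		else if (blanks == 1)	# single blank line: keep it
			print ""
		blanks = 0
		print $0
	}
	l_no++				# source line number of the next line
}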
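
For comparison with the other entries, here is a hedged C sketch of the same rules as the awk program above. It works on a string instead of stdin so the result can be checked directly. The names scrunch_awk, emit and outbuf are hypothetical, and the 1024-byte line buffer and 256-byte file name limit are assumptions of this sketch, not of the awk version.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append printf-style text to a fixed buffer, truncating if it is full. */
typedef struct { char *buf; size_t cap, len; } outbuf;

static void emit(outbuf *o, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (n > 0)
        o->len += (size_t)n;
    if (o->len >= o->cap)
        o->len = o->cap - 1;    /* output was truncated; keep len valid */
}

/*
 * Same rules as the awk program: "# N file" sets the position, a single
 * blank line is kept, and a longer run of blank lines becomes "# N file".
 * Lines longer than 1023 bytes are truncated (assumption of this sketch).
 * Returns strlen(out).
 */
size_t scrunch_awk(const char *in, char *out, size_t cap)
{
    outbuf o;
    char line[1024], file[256] = "";
    long lno = 0, n;
    int blanks = 0;

    assert(cap > 0);
    o.buf = out; o.cap = cap; o.len = 0;
    out[0] = '\0';
    while (*in) {
        size_t full = strcspn(in, "\n"), len = full;
        const char *p;

        if (len >= sizeof line)
            len = sizeof line - 1;
        memcpy(line, in, len);
        line[len] = '\0';
        p = line + strspn(line, " \t");     /* skip leading blanks/tabs */

        if (*p == '\0') {                   /* NF == 0 */
            blanks++;
        } else if (p[0] == '#' && (p[1] == '\0' || p[1] == ' ' || p[1] == '\t')) {
            n = 0;                          /* $1 == "#" */
            file[0] = '\0';
            (void)sscanf(p + 1, "%ld %255s", &n, file);
            lno = n - 1;                    /* l_no = $2-1; f = $3 */
            blanks = 2;                     /* force a directive */
        } else {
            if (blanks > 1)
                emit(&o, "# %ld %s\n", lno, file);
            else if (blanks == 1)
                emit(&o, "\n");
            blanks = 0;
            emit(&o, "%s\n", line);
        }
        lno++;
        in += full;
        if (*in == '\n')
            in++;
    }
    return o.len;
}
```

A filter version would read all of stdin into a buffer, call scrunch_awk, and fwrite the result to stdout.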

Steve Peckham

duane@cg-atla.UUCP (Andrew Duane) (03/27/89)

> In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
>perl, c, c++ are all acceptable.

If shell scripts are acceptable, how about:

	#!/bin/sh
	cat -s

You may have to use "more -s" rather than cat, since on some systems
cat -s does something else. The moral: please
don't reinvent the wheel [1/2 ;-)]

Andrew L. Duane (JOT-7)  w:(508)-658-5600 X5993  h:(603)-434-7934
Compugraphic Corp.			 decvax!cg-atla!duane
200 Ballardvale St.		       ulowell/ \laidback
Wilmington, Mass. 01887		   cbosgd!ima/   \cgeuro
Mail Stop 200II-3-5S		     ism780c/     \wizvax

Only my cat shares my opinions, and she has no blank lines.

daveb@gonzo.UUCP (Dave Brower) (03/29/89)

In article <6839@cg-atla.UUCP> duane@cg-atla.UUCP (Andrew Duane) writes:
>> In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>>So, I offer this week's challenge:  Smallest program that will take
>>"blank line" style cpp output on stdin and send to stdout a scrunched
>>version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
>>perl, c, c++ are all acceptable.
>
>If shell scripts are acceptable, how about:
>
>	#!/bin/sh
>	cat -s
>
>You may have to use "more" rather than cat. The moral: please
>don't reinvent the wheel [1/2 ;-)]

Sorry, you leapt at the naive and incorrect solution.  Please see "with
appropriate #line directives."  cat -s makes it impossible to match output
lines to input lines, and preserving that match is the point of the challenge.
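
To make the objection concrete, here is a hedged C sketch of a cat -s style squeezer (the name squeeze_map is hypothetical, and only zero-length lines are treated as blank). Instead of printing text, it records which input line each output line came from. cat -s prints only the text, so once a run of blank lines has been dropped, output line k no longer corresponds to input line k, and nothing in the output says so.

```c
#include <assert.h>
#include <string.h>

/*
 * Squeeze runs of empty lines to one, as cat -s does, but record for each
 * kept line its input line number: in_line[k] is the input line number of
 * output line k+1. Returns the number of output lines (at most max).
 */
size_t squeeze_map(const char *in, long *in_line, size_t max)
{
    long lineno = 0;
    size_t k = 0;
    int prev_blank = 0;

    assert(in != NULL && (in_line != NULL || max == 0));
    while (*in && k < max) {
        size_t len = strcspn(in, "\n");
        int blank = (len == 0);

        lineno++;
        if (!(blank && prev_blank))     /* drop repeated empty lines only */
            in_line[k++] = lineno;
        prev_blank = blank;
        in += len;
        if (*in == '\n')
            in++;
    }
    return k;
}
```

For the input a, three empty lines, b, the table is {1, 2, 5}: b is the third output line but the fifth input line, which is exactly what a #line directive would have to say.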

I have two entries so far, one in "lex" and another in "awk".  Both are
less than 20 lines.  It will be interesting to compare timings between
awk, gawk, nawk, lex and flex.

-dB
-- 
"I came here for an argument." "Oh.  This is getting hit on the head"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb@gonzo.uucp

bernsten@phoenix.Princeton.EDU (Dan Bernstein) (03/30/89)

Dave Brower asks for a filter ``that will take "blank line" style cpp
output on stdin and send to stdout a scrunched version with appropriate
#line directives.'' If we may combine built-in utilities to handle the
problem, then this 9-line shell script will do it (combine the last
two lines to make it 8):

  #!/bin/sh
  ( tr XY '\375\376' | sed 's/^\(.\)\(.*\)/X\1\2Y/
  tend
  i\
  X#line
  d
  :end
  =' | uniq | tr '\012X' ' \012'; echo ''; ) |
  sed 's/Y.*//' | tr '\375\376' XY | sed -n '1!p'

The idea is reasonably simple; one could use, e.g., grep -n '.' to
obtain a similar solution. This particular version destroys any \375 and
\376 you may have in your source, and because it's based on sed, it omits
the final line if it has no newline. It has been tested successfully on
a wide variety of sources, and I must say the next time I feel compelled
to look at cpp output, I'll definitely use it.
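
Reading the pipeline: tr first hides any real X and Y as \375 and \376. sed then prints each non-empty line's number (the = command) followed by the line wrapped as X...Y, and turns each empty line into X#line; uniq collapses runs of X#line. The tr that swaps newlines for spaces and X for newlines joins each line number onto the end of the record before it: after a marker this yields "#line N ", and after ordinary text the number lands past the Y and is deleted by s/Y.*//. The final sed -n '1!p' discards the fragment in front of the first X. The net effect, as a hedged C sketch on a string (scrunch_sed and emit are hypothetical names), is that every run of empty lines becomes "#line N ", where N is the cpp-output line number of the next non-empty line.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append printf-style text to a fixed buffer, truncating if it is full. */
typedef struct { char *buf; size_t cap, len; } outbuf;

static void emit(outbuf *o, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (n > 0)
        o->len += (size_t)n;
    if (o->len >= o->cap)
        o->len = o->cap - 1;    /* output was truncated; keep len valid */
}

/*
 * Every run of zero-length lines (a line holding only blanks counts as
 * text, as with sed's ^\(.\) pattern) becomes "#line N " with N the
 * cpp-output line number of the next non-empty line. A run at the very
 * end of the input is simply dropped in this sketch. Returns strlen(out).
 */
size_t scrunch_sed(const char *in, char *out, size_t cap)
{
    outbuf o;
    long lineno = 0;    /* line number in the cpp output */
    int pending = 0;    /* seen empty lines since the last text line? */

    assert(cap > 0);
    o.buf = out; o.cap = cap; o.len = 0;
    out[0] = '\0';
    while (*in) {
        size_t len = strcspn(in, "\n");

        lineno++;
        if (len == 0) {
            pending = 1;
        } else {
            if (pending)    /* trailing space kept, as in the script */
                emit(&o, "#line %ld \n", lineno);
            pending = 0;
            emit(&o, "%.*s\n", (int)len, in);
        }
        in += len;
        if (*in == '\n')
            in++;
    }
    return o.len;
}
```

Note that these numbers count lines of cpp output, not lines of the original source file.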

> I have two entries so far, one in "lex" and another in "awk".  Both are
> less than 20 lines.  It will be interesting to compare timings between
> awk, gawk, nawk, lex and flex.

Ahem? Are we forgetting sed here? (Then again, I hate awk, love sed,
and prefer C to lex. I'd rather have a sed script twice as slow as an
awk script. But that's just personal bias.)

If you time them, make sure to test on really long sources too. I'd hate
to see my script penalized just because it totals eight+sh execs :-).

---Dan Bernstein, bernsten@phoenix.princeton.edu

rupley@arizona.edu (John Rupley) (03/31/89)

In article <7472@phoenix.Princeton.EDU>, bernsten@phoenix.Princeton.EDU (Dan
Bernstein) writes:
> Dave Brower asks for a filter ``that will take "blank line" style cpp
> output on stdin and send to stdout a scrunched version with appropriate
                                                              ^^^^^^^^^^^
> #line directives.'' If we may combine built-in utilities to handle the
> problem, then this 9-line shell script will do it (combine the last
> two lines to make it 8):
> 
>   #!/bin/sh
>   ( tr XY '\375\376' | sed 's/^\(.\)\(.*\)/X\1\2Y/
>   tend
>   i\
>   X#line
>   d
>   :end
>   =' | uniq | tr '\012X' ' \012'; echo ''; )
>   | sed 's/Y.*//' | tr '\375\376' XY | sed -n '1!p'

I am not sure this is what the original poster wanted: ``appropriate''
may refer to #line directives whose line numbers refer to the
source file, not to the cpp output.  Regardless, what the above script
does is truly trivial in Lex:

%%
\n\n+	printf("\n#line %d \n", yylineno);
.|\n	ECHO;

> Ahem? Are we forgetting sed here? (Then again, I hate awk, love sed,
> and prefer C to lex. I'd rather have a sed script twice as slow as an
> awk script. But that's just personal bias.)

How could one forget sed (:-)?  But for matching patterns that cross
line boundaries, Lex is a natural, because it sees a file as a stream of
characters rather than as a stream of records. Sed and awk are record-based
and thus seem forced for multi-line matching.  Prefer C to Lex? Hmmm... Lex
is just the machinery for a pattern-based switch statement, with the user
supplying ``case'' statements written in C.
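
The "pattern-based switch" description can be taken literally. Below is a hedged C sketch (scrunch_lex and emit are hypothetical names) of what the two Lex rules above do, written as a switch over a character stream: at a newline followed by another newline, the whole run of newlines is consumed (the longest match for \n\n+) and replaced by "\n#line N \n"; every other character is echoed. The variable lineno plays the role of yylineno. With flex, yylineno is only maintained when %option yylineno is given.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append printf-style text to a fixed buffer, truncating if it is full. */
typedef struct { char *buf; size_t cap, len; } outbuf;

static void emit(outbuf *o, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (n > 0)
        o->len += (size_t)n;
    if (o->len >= o->cap)
        o->len = o->cap - 1;    /* output was truncated; keep len valid */
}

/* Hand-written equivalent of:  \n\n+ -> "\n#line N \n";  .|\n -> ECHO */
size_t scrunch_lex(const char *in, char *out, size_t cap)
{
    outbuf o;
    long lineno = 1;            /* number of the line being scanned */

    assert(cap > 0);
    o.buf = out; o.cap = cap; o.len = 0;
    out[0] = '\0';
    while (*in) {
        switch (*in) {
        case '\n':
            if (in[1] == '\n') {            /* rule \n\n+ : take the whole run */
                while (*in == '\n') {
                    lineno++;
                    in++;
                }
                emit(&o, "\n#line %ld \n", lineno);
                break;
            }
            lineno++;                       /* lone newline: ECHO */
            emit(&o, "\n");
            in++;
            break;
        default:                            /* rule . : ECHO */
            emit(&o, "%c", *in);
            in++;
            break;
        }
    }
    return o.len;
}
```

A stdin filter would use getchar() and ungetc() for the one character of lookahead; the shape of the switch stays the same.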

John Rupley
rupley!local@megaron.arizona.edu

rupley@arizona.edu (John Rupley) (04/01/89)

From rupley!local Fri Mar 31 13:43:14 1989
In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.

The following Lex source is somewhat shorter than a previous Lex version.
Specifications assumed:  single blank lines, as well as runs of blank
lines with or without intervening <#> line directives, are to be replaced
by <# lineno "filename">; only truly empty lines (no spaces or tabs) are
considered blank.
------------------------------------------------------------------------
	char f[80];
%S P
%%
#.+\n	{sscanf(yytext,"#%d%s",&yylineno,f);BEGIN P;}
<P>.+\n	{printf("# %d %s\n",yylineno-1,f);ECHO;BEGIN 0;}
\n	BEGIN P;
.+\n	ECHO;
------------------------------------------------------------------------
John Rupley
rupley!local@megaron.arizona.edu

rupley@arizona.edu (John Rupley) (04/01/89)

In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.

Yet another Lex version:
------------------------------------------------------------------------
	char f[80]; int x;
%%
#.+\n	{sscanf(yytext,"#%d%s",&yylineno,f); x++;}
.+\n	{if(x)printf("# %d %s\n",yylineno-1,f); ECHO; x=0;}
\n	x++;
------------------------------------------------------------------------
John Rupley
rupley!local@megaron.arizona.edu