[comp.lang.c] C preprocessing

bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) (09/22/90)

This is a philosophical/religious/semi-rhetorical question.  Wouldn't it
be nice, if in the following code the compiler would do what I meant
instead of what I said?  If I (and I'm sure everyone reading this) can
see exactly what is wanted, why can't a compiler?  Code is written two-
dimensionally--it has visual structure (at least mine does)--so why does
it have to be processed as a one-dimensional string of tokens?  My personal
preprocessor (when I get around to writing it) will know exactly what to
do with this:

    if (bool_expression)
        do_something();
        do_something_else();
        if (another_bool_expression)
            do_anything();
    else
        do_this_when_bool_expression_is_false();

karl@haddock.ima.isc.com (Karl Heuer) (09/22/90)

In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) writes:
>Wouldn't it be nice, if ... the compiler would do what I meant [by
>interpreting the indentation] instead of what I said?

I personally like the idea, but it's not C (nor any other language I use,
though I understand Occam has it).  So, being realistic, let's say instead:

Wouldn't it be nice if the compiler (or some related tool) would provide
(optional!) warnings for possibly misindented code?

Things to worry about: (a) whitespace caused by macro expansion; (b) how to
count tabs vs. spaces; (c) accepting all plausible personal indentation styles
(or making it user-configurable).
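
On (b), a minimal sketch of how a checker might turn leading whitespace
into a display column.  The function name indent_column and the tab_width
parameter are invented here for illustration, and the tab-stop rule (a tab
advances to the next multiple of tab_width) is an assumption that a real
tool would presumably make configurable:

```c
#include <assert.h>
#include <stddef.h>

/* Return the 0-based display column of the first non-blank character in
 * `line`.  A space advances one column; a tab advances to the next
 * multiple of tab_width.  Hypothetical helper, not taken from any lint. */
size_t indent_column(const char *line, size_t tab_width)
{
    size_t col = 0;

    assert(line != NULL && tab_width > 0);
    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == ' ')
            col++;
        else
            col += tab_width - (col % tab_width);
    }
    return col;
}
```

With tab_width set to 8, a line that starts with two spaces, a tab, and two
more spaces lands in column 10, the same as ten plain spaces.  With
tab_width set to 4 the same line lands in column 6, so the two no longer
line up, which is the ambiguity in (b).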

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

martin@mwtech.UUCP (Martin Weitzel) (09/24/90)

In article <18102@haddock.ima.isc.com> karl@kelp.ima.isc.com (Karl Heuer) writes:
>In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) writes:
>>Wouldn't it be nice, if ... the compiler would do what I meant [by
>>interpreting the indentation] instead of what I said?
[...]
>Wouldn't it be nice if the compiler (or some related tool) would provide
>(optional!) warnings for possibly misindented code?

If you have an indentation tool (e.g. "cb") that satisfies your personal
style, you can reformat the source with it and then look for diffs wrt
the original version. This could also be done automatically as part of
"lint"-ing the source.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

rmj@tcom.stc.co.uk (Rhodri James) (09/24/90)

In article <18102@haddock.ima.isc.com> karl@kelp.ima.isc.com (Karl Heuer) writes:
>In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) writes:
>>Wouldn't it be nice, if ... the compiler would do what I meant [by
>>interpreting the indentation] instead of what I said?
>
>I personally like the idea, but it's not C (nor any other language I use,
>though I understand Occam has it).  So, being realistic, let's say instead:

I understand (from a friend who uses Occam) that it's a complete pain.
Counting spaces, even on a fixed character width screen like nearly all
are, is at times a non-trivial operation.

>Wouldn't it be nice if the compiler (or some related tool) would provide
>(optional!) warnings for possibly misindented code?

Mmmm. OK, I'll buy that.

>Things to worry about: (a) whitespace caused by macro expansion; (b) how to
>count tabs vs. spaces; (c) accepting all plausible personal indentation styles
>(or making it user-configurable).

My main reason for not wanting to use such a thing is that my
indentation habits vary between 3, 4, 5 and 8 spaces depending on how
I felt when I woke up and whose code I'm working on at the time. I can't
say as I would enjoy having to keep adding switches to my compiles just
to get it to shut up from time to time.

>Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

Rhodri
-- 
* Windsinger                 * "But soft, what light through yonder
* rmj@islay.tcom.stc.co.uk   *      airlock breaks?"
* rmj@tcom.stc.co.uk         *    --RETURN TO THE FORBIDDEN PLANET
* rmj10@phx.cam.ac.uk        *  You've gotta be cruel to be Khund!

johnb@srchtec.UUCP (John T. Baldwin) (09/24/90)

[ellipsis indicates deletion for brevity --jtb]

In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu
  (Tim Bomgardner) writes:
>Wouldn't it be nice ... if the compiler would do what I meant
>instead of what I said?
>...
>Code is written two-dimensionally ... why does it have to be processed as
> a one-dimensional string of tokens?
>...
>    if (bool_expression)
>        do_something();
>        do_something_else();
>        if (another_bool_expression)
>            do_anything();
>    else
>        do_this_when_bool_expression_is_false();

This begs several questions.

#1  Why does the program have to be processed as a one-dimensional string
    of tokens?

It doesn't.

At least, programs in general do not have to be processed this
way.  Your C programs are, because the language has been defined that way.
The reason why language designers like to do this is because it makes it
easier to write the compiler (i.e. the lexer and parser are easier).

If this is not complete enough for you, I'd suggest learning something
about compilers.  I'm *not* a compiler expert, but a good book to read is
the so-called "Dragon Book" by Aho, Sethi, and Ullman (sp?).
At least it is a widely-known text, and I liked it. :-)

#2  Why can't the compiler "catch" the fact that what was meant (above)
    isn't what was said?

Because a compiler's job is to translate what was *said*.  The only
analysis the compiler is required to perform is whatever is germane
to that task.
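
To make "what was said" concrete, here is a sketch of the two readings of
the posted fragment.  The single-letter trace and the wrapper names
as_written and as_meant are made up for the illustration.  as_written
holds the same statements in the same order, reindented to match the
grouping that C's grammar gives them; as_meant adds the braces that the
original indentation implied:

```c
#include <assert.h>
#include <string.h>

/* Each stub appends one letter to a trace so the readings can be compared. */
static char trace[16];

static void note(char c)
{
    size_t n = strlen(trace);

    assert(n + 1 < sizeof trace);
    trace[n] = c;
    trace[n + 1] = '\0';
}

static void do_something(void)                          { note('S'); }
static void do_something_else(void)                     { note('E'); }
static void do_anything(void)                           { note('A'); }
static void do_this_when_bool_expression_is_false(void) { note('F'); }

/* The token stream as posted: no braces, so each `if` controls exactly
 * one statement and the `else` pairs with the nearest `if`. */
const char *as_written(int bool_expression, int another_bool_expression)
{
    trace[0] = '\0';
    if (bool_expression)
        do_something();
    do_something_else();
    if (another_bool_expression)
        do_anything();
    else
        do_this_when_bool_expression_is_false();
    return trace;
}

/* The reading suggested by the original indentation. */
const char *as_meant(int bool_expression, int another_bool_expression)
{
    trace[0] = '\0';
    if (bool_expression) {
        do_something();
        do_something_else();
        if (another_bool_expression)
            do_anything();
    } else {
        do_this_when_bool_expression_is_false();
    }
    return trace;
}
```

With both conditions false, as_written(0, 0) leaves "EF" in the trace while
as_meant(0, 0) leaves "F": do_something_else() runs unconditionally and the
else belongs to the inner if.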

If you want a critique, use LINT or a lint-like analyser.  At least
one LINT that I know of "understands" general indentation rules and
will flag the probable error above, i.e.

   "line xxx: do you know you *outdented* the 'else' that goes with
    'if (another_bool_expression)' ???"


#3   Wouldn't it be nice if the compiler would do what I meant, instead
     of what I said?

Yes.

When you (or anyone else) manages to do this, be *sure* to publish.  Quickly.
You'll probably win a Nobel Prize.    :-)   :-)   :-)   :-)   :-)



-- 
John T. Baldwin                      |  johnb%srchtec.uucp@mathcs.emory.edu
Search Technology, Inc.              | 
                                     | "... I had an infinite loop,
My opinions; not my employers'.      |  but it was only for a little while..."

rob@raksha.eng.ohio-state.edu (Rob Carriere) (09/25/90)

In article <223@srchtec.UUCP> johnb@srchtec.UUCP (John T. Baldwin) writes:
>#1  Why does the program have to be processed as a one-dimensional string
>    of tokens?
>
>It doesn't.
>
>At least, programs in general do not have to be processed this
>way.  Your C programs are, because the language has been defined that way.
>The reason why language designers like to do this is because it makes it
>easier to write the compiler (i.e. the lexer and parser are easier).

There is at least one other reason.  Many people (especially those who have
been exposed to FORTRAN) consider formatted languages to be A Bad Thing.  The
reason being that my idea of legible formatting need not coincide with the
language designer's.  In a free-form language like C or Pascal, this doesn't
matter, I can format the program any way I want.  Similarly, if you have to
maintain my code and you don't like my formatting, a beautifying tool will
take care of the problem.  With a 2-D language, we would all have to use Big
Brother's Approved Format.

Of course, a lint (or a linting option on a compiler) that takes formatting
into account is a different matter altogether.  I have nothing against
warnings, as long as they follow Heuer's Law.

SR
---

jensting@skinfaxe.diku.dk (Jens Tingleff) (09/25/90)

rob@raksha.eng.ohio-state.edu (Rob Carriere) writes:

>In article <223@srchtec.UUCP> johnb@srchtec.UUCP (John T. Baldwin) writes:
>>#1  Why does the program have to be processed as a one-dimensional string
>>    of tokens?
>>
>>It doesn't.
>>
>>At least, programs in general do not have to be processed this
>>way.  Your C programs are, because the language has been defined that way.
>>The reason why language designers like to do this is because it makes it
>>easier to write the compiler (i.e. the lexer and parser are easier).

And, way outside comp.lang.c, there is OCCAM. This English-designed
language, built specifically for parallel processing, has indentation as a
block delimiter. E.g. (the SEQ means SEQuence)

      SEQ
        SEQ                     -- Some operations in sequence
          i := 7
etc etc
          do.some.thing.or.other(arg.1, arg.2)
        PAR                     -- Two operations in parallel
          do.one.thing()
          do.another()

etc. (the dot `.' is a legal part of a name, weird, huh?)

The only thing that makes this bearable is the idea of folding text
editors. In a folding text editor, a group of lines can be made to appear
as one line (usually a descriptive caption), so the above becomes

      SEQ
        SEQ
          ... Some operations in sequence
        PAR
          ... Two operations in parallel

This makes the thing readable (slightly..).

>There is at least one other reason.  Many people (especially those who have
>been exposed to FORTRAN) consider formatted languages to be A Bad Thing.  The
>reason being that my idea of legible formatting need not coincide with the
>language designer's. 

Amen to that. In OCCAM for instance you can't use one of my favourite 
constructions (but then again, why would you if you weren't me ;^)

	IF .. THEN  init_statements;
	  real_work_statements;
	END;

in Modula-2, where I don't need the BEGIN from Pascal...

Back to your C sufferings ;^)

	Jens
Jens Tingleff MSc EE, Institute of Computer Science, Copenhagen University
Snail mail: DIKU Universitetsparken 1, DK2100 KBH O
"It never runs around here; it just comes crashing down"
	apologies to  Dire Straits 

bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) (09/25/90)

In article <223@srchtec.UUCP> johnb@srchtec.UUCP (John T. Baldwin) writes:
}[ellipsis indicates deletion for brevity --jtb] --tab, too.
}
}In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu
}  (Tim Bomgardner) writes:
}>Wouldn't it be nice ... if the compiler would do what I meant
}>instead of what I said?
}>...
}>Code is written two-dimensionally ... why does it have to be processed as
}> a one-dimensional string of tokens?
}>...
}>    if (bool_expression)
}>        do_something();
}>        do_something_else();
}>        if (another_bool_expression)
}>            do_anything();
}>    else
}>        do_this_when_bool_expression_is_false();
}
}This begs several questions.
}
}#1  Why does the program have to be processed as a one-dimensional string
}    of tokens?
}
}It doesn't.

But it is.  And it has to be, "because the language has been defined that
way."  Assuming of course you still consider what I wrote to be C.

}At least, programs in general do not have to be processed this
}way.  Your C programs are, because the language has been defined that way.
}The reason why language designers like to do this is because it makes it
}easier to write the compiler (i.e. the lexer and parser are easier).
}
}[...suggestion that I read up on compiler theory]

Boy, give 'em lex and yacc and tell 'em about LALR(1) and they're ready
for the big time.  I've produced a compiler or two.  I don't know about
the "language designers" here, but my parsers have no difficulty at all
with this sort of grammar.  You wanna know what's hard?  The parser I'm
working on right now recognizes graphics.

}#2  Why can't the compiler "catch" the fact that what was meant (above)
}    isn't what was said?
}
}Because a compiler's job is to translate what was *said*.  The only
}analysis the compiler is required to perform is whatever is germane
}to that task.

Says you.  In the case of C, I agree.  But I'm not really talking about
C anymore.  Maybe I shouldn't call it a preprocessor, although that's
how I think of it.  Perhaps translator would be a better word.  My
intention is to translate into C the way AT&T does with C++.

}If you want a critique, use LINT or a lint-like analyser.  At least
}one LINT that I know of "understands" general indentation rules and
}will flag the probable error above, i.e.
}
}   "line xxx: do you know you *outdented* the 'else' that goes with
}    'if (another_bool_expression)' ???"

You see!  It isn't really all that hard at all to recognize what's
going on.  But I don't want a critique; I want it to just do it.  That's
why I use a compiler in the first place instead of an assembler.

}#3   Wouldn't it be nice if the compiler would do what I meant, instead
}     of what I said?
}
}Yes.
}
}When you (or anyone else) manages to do this, be *sure* to publish.  Quickly.
}You'll probably win a Nobel Prize.    :-)   :-)   :-)   :-)   :-)

Fortran gave us data abstraction.  Block structured languages gave us
control abstraction.  What I'm looking for now I'll call structure
abstraction.  When I design something, I use little diagrams and pictures
and sometimes raw C code as well.  I then take all that and translate it
into C.  My goal is to have the computer do that for me.  I really don't
care about a Nobel Prize that much (my ideas aren't really that original),
but I WILL be accepting VISA and Mastercard.

}
}-- 
}John T. Baldwin                      |  johnb%srchtec.uucp@mathcs.emory.edu
}Search Technology, Inc.              | 
}                                     | "... I had an infinite loop,
}My opinions; not my employers'.      |  but it was only for a little while..."

To all who have responded, thank you.  I appreciate the feedback.  This is
helping me refine the concepts.  More to come.

Tim

asylvain@felix.UUCP (Alvin E. Sylvain) (09/26/90)

In article <18102@haddock.ima.isc.com> karl@kelp.ima.isc.com (Karl Heuer) writes:
>In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) writes:
>>Wouldn't it be nice, if ... the compiler would do what I meant [by
>>interpreting the indentation] instead of what I said?
>
>I personally like the idea, but it's not C (nor any other language I use,
>though I understand Occam has it).  So, being realistic, let's say instead:
>Wouldn't it be nice if the compiler (or some related tool) would provide
>(optional!) warnings for possibly misindented code?
>
>Things to worry about: (a) whitespace caused by macro expansion; (b) how to
>count tabs vs. spaces; (c) accepting all plausible personal indentation styles
>(or making it user-configurable).

Just write a PRE-pre-processor.  If the original C code is processed
before the C pre-processor gets it, you'll truly have "what you mean",
in terms of the actual code written, BEFORE any macro expansion.  Tabs
vs. spaces would have to be configured, similarly to your editor.
Indentation styles would have to be standardized by the tool ...
there is an infinite number of styles, and it would be difficult to
try to implement more than 1 or 2.

Since indentation is used for nearly all structured languages, the
tool could produce different constructs for different languages,
e.g., for *.c files, produce {} around indented code, for *.p (*.PAS)
produce begin-end, etc.  There may be some problem in deciding
when to include semi-colons.
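
A rough sketch of the brace-producing half of such a tool for *.c input.
Everything here is hypothetical, including the name braces_from_indent,
and the restrictions are severe: space-only indentation, one statement per
line with its semicolon already present, and no continuation lines,
comments, or preprocessor directives:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32

/* Append n bytes of s to out, keeping it NUL-terminated.
 * Returns 0 on success, -1 if the result would not fit in cap bytes. */
static int put(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Emit a line holding a single brace, indented col spaces. */
static int put_brace(char *out, size_t cap, size_t *len, size_t col, char brace)
{
    char b[2];

    b[0] = brace;
    b[1] = '\n';
    for (; col > 0; col--)
        if (put(out, cap, len, " ", 1) != 0)
            return -1;
    return put(out, cap, len, b, 2);
}

/* Copy src to out, adding "{" where the indentation deepens and "}" where
 * it returns to an outer level.  Returns 0 on success, -1 on overflow,
 * excessive nesting, or a dedent to a column that was never opened. */
int braces_from_indent(const char *src, char *out, size_t cap)
{
    size_t stack[MAX_DEPTH] = { 0 };   /* indentation of each open block */
    size_t depth = 0, len = 0;
    int seen_code = 0;

    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*src != '\0') {
        size_t n = strcspn(src, "\n");  /* length of this line */
        size_t col = strspn(src, " ");  /* its leading spaces */

        if (col < n) {                  /* blank lines pass through as-is */
            if (!seen_code) {           /* first code line sets the base */
                stack[0] = col;
                seen_code = 1;
            }
            if (col > stack[depth]) {   /* deeper: open a block */
                if (depth + 1 == MAX_DEPTH ||
                    put_brace(out, cap, &len, stack[depth], '{') != 0)
                    return -1;
                stack[++depth] = col;
            }
            while (depth > 0 && col < stack[depth])  /* shallower: close */
                if (put_brace(out, cap, &len, stack[--depth], '}') != 0)
                    return -1;
            if (col != stack[depth])
                return -1;
        }
        if (put(out, cap, &len, src, n) != 0 ||
            put(out, cap, &len, "\n", 1) != 0)
            return -1;
        src += n;
        if (*src == '\n')
            src++;
    }
    while (depth > 0)                   /* close whatever is still open */
        if (put_brace(out, cap, &len, stack[--depth], '}') != 0)
            return -1;
    return 0;
}
```

Fed the fragment that started this thread, the sketch would put one pair of
braces around everything indented under the first if, a nested pair around
do_anything(), and a third pair around the else branch.  Deciding where
semicolons go is left out entirely.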

This is definitely creating a new 'language', much the same way the C
pre-processor allows 'extensions' to C.  If you REALLY want one, it
should be a relatively trivial task to build using 'lex' and 'yacc'.
It can have options for code-generation, or simple verification.

Otherwise, just suffer along with the rest of us, and BE CAREFUL!
(Note:  I always use the '%' command in 'vi', which verifies the
matching of () and {}.  This should be standard practice for all
programmers, along with other standard desk-checking procedures.
Too bad you can't use '%' in Pascal.)
--
------------------------------------------------------------------------
"I got protection for my    |               Alvin "the Chipmunk" Sylvain
affections, so swing your   |   Natch, nobody'd be *fool* enough to have
bootie in my direction!"    |   *my* opinions, 'ceptin' *me*, of course!

miller@GEM.cam.nist.gov (Bruce R. Miller) (09/28/90)

In article <151583@felix.UUCP>, Alvin E. Sylvain writes: 
> In article <18102@haddock.ima.isc.com> karl@kelp.ima.isc.com (Karl Heuer) writes:
>>In article <59770@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu (Tim Bomgardner) writes:
>>>Wouldn't it be nice, if ... the compiler would do what I meant [by
>>>interpreting the indentation] instead of what I said?
>>
>>I personally like the idea, but it's not C (nor any other language I use,
>>though I understand Occam has it).  So, being realistic, let's say instead:
>>Wouldn't it be nice if the compiler (or some related tool) would provide
>>(optional!) warnings for possibly misindented code?
>> ...
> Just write a PRE-pre-processor.  ...

At the risk of being flamed from multiple directions [ 1)I'm from the Lisp
camp, 2) I'm from the Emacs camp and 3) There's currently a debate on
comp.lang.lisp on the `virtues (or lack thereof) of lisp syntax'] 

`Good' editors like Emacs have a lisp mode which does a variety of
helpful things for you:

  1) TAB moves you to the appropriate indentation according to the
nesting level where you are typing. [The amount of indentation each kind
of form gets is often customizable. Also there is often a single-stroke
`newline and tab' command.]  If the cursor moves to an odd place you've
got an immediate clue that you've messed up the nesting.
  2) Depending on the capabilities of the console/terminal, the
balancing parenthesis (open or close) will be blinked when the cursor is
on a parenthesis. [On some terminals, perhaps only when you type the
paren.]
  3) There is a command to re-indent a whole expression or definition.
     (c-m-Q)
  4) By typing the comment start character the cursor is moved to an
appropriate (customizable) position to start the comment.
  5) Saving a file will check for balanced parentheses first.
 ...and so on...

Now granted, not all of this has an exact correspondence in C, and Lisp's
uniform syntax makes it particularly simple to implement these things.
To handle the `nesting' of C you've got to handle, not only {}, () and
[], but also understand some forms like if..else.. switch, etc.
But C does have its syntax rules too. 
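
As a rough sketch of the simplest of these checks carried over to C, here
is item 5's balance test extended to all three bracket kinds.  The name
brackets_balanced is invented for the illustration, and the sketch is
deliberately naive: it also counts brackets inside string literals,
character constants, and comments, which a real C mode would have to skip:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEST 256

/* Map a closing bracket to the opening bracket it must match. */
static char opener_for(char closer)
{
    return closer == ')' ? '(' : closer == ']' ? '[' : '{';
}

/* Return 1 if every (, [ and { in text is closed in the right order,
 * 0 otherwise (including nesting deeper than MAX_NEST). */
int brackets_balanced(const char *text)
{
    char stack[MAX_NEST];   /* currently open brackets, innermost last */
    size_t depth = 0;

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '(': case '[': case '{':
            if (depth == MAX_NEST)
                return 0;
            stack[depth++] = *text;
            break;
        case ')': case ']': case '}':
            if (depth == 0 || stack[depth - 1] != opener_for(*text))
                return 0;
            depth--;
            break;
        default:
            break;
        }
    }
    return depth == 0;
}
```

This says nothing about if..else or switch, which is where the real work in
a C mode would lie.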

I would assume that GNU Emacs would have a pretty decent C mode.  Am I
wrong? Is this less commonly used/known than the equivalents in the Lisp
world?

Ok, have fun...I can take it...
bruce

johnb@srchtec.UUCP (John Baldwin) (09/28/90)

In article <60134@iuvax.cs.indiana.edu> bomgard@iuvax.cs.indiana.edu
 (Tim Bomgardner) writes [in response to my posting]:

>
>Boy, give 'em lex and yacc and tell 'em about LALR(1) and they're ready
>for the big time.  I've produced a compiler or two.  I don't know about
>the "language designers" here, but my parsers have no difficulty at all
>with this sort of grammar.  You wanna know what's hard?  The parser I'm
>working on right now recognizes graphics.
>

Please remember (as I step into the flame-retardant suit), that not all
of us on the net are familiar with each other's proficiency levels
with respect to different aspects of computer science.  I *said* I
was no compiler expert!  (I've got "parts of a compiler" lying about
my den at home; right now it doesn't compile *anything*.  Label me 'novice'.)

Your previous postings [that is, the subset I am familiar with]
would lead me to believe that you were very proficient in programming
in general, but might not have much/any exposure to compilers.

I hope you haven't taken my (original) comments as an affront.
They were never meant to be.


>}#2  Why can't the compiler "catch" the fact that what was meant (above)
>}    isn't what was said?
>}
>}Because a compiler's job is to translate what was *said*.  The only
>}analysis the compiler is required to perform is whatever is germane
>}to that task.
>
>Says you.  In the case of C, I agree.  But I'm not really talking about
>C anymore.

Yes, says me.  In the case of C and other block-structured, procedurally-
oriented languages only.  At least I *thought* that's what we were
originally talking about.         :-}


>
>}#3   Wouldn't it be nice if the compiler would do what I meant, instead
>}     of what I said?
>}

BTW, I'm toying with a language/concept I call DWIRM...
      Do
      What
      I
      Really
      Meant!     :-)



>Fortran gave us data abstraction.  Block structured languages gave us
>control abstraction.  What I'm looking for now I'll call structure
>abstraction.  When I design something, I use little diagrams and pictures
>and sometimes raw C code as well.  I then take all that and translate it
>into C.  My goal is to have the computer do that for me.  I really don't
>care about a Nobel Prize that much (my ideas aren't really that original),
>but I WILL be accepting VISA and Mastercard.

Now THAT sounds interesting.  Do it right, and you WILL be accepting VISA
and MasterCard.

How about AMEX?

-- 
John T. Baldwin                     | "Pereant qui ante nos nostra dixerunt!"
Search Technology, Inc.             | (A plague on those who said our good
johnb%srchtec.uucp@mathcs.emory.edu |  things before we did!)