[comp.lang.icon] icon #comments

goer%sophist@GARGOYLE.UCHICAGO.EDU (Richard Goerwitz) (09/27/90)

Has anyone worked up an algorithm for stripping comments out of Icon
source files without having to parse every expression?  As far as I
can tell, the # occurs only inside string or cset literals, or as the
comment delimiter.  If so, stripping comments shouldn't be too hard.
Is this correct?

-Richard

ralph@CS.ARIZONA.EDU (Ralph Griswold) (09/27/90)

The character # can occur in string and cset literals.

  Ralph Griswold / Dept of Computer Science / Univ of Arizona / Tucson, AZ 85721
  +1 602 621 6609   ralph@cs.arizona.edu  uunet!arizona!ralph
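
A small illustration of the point above (not from the original posts; the
variable names and values are made up for the example).  Only the # that
begins each trailing remark acts as a comment delimiter:

```icon
procedure main()
  s := "item #3"     # the # inside the string literal is ordinary data
  c := '#$'          # likewise for the # inside the cset literal
  write(s, " ", *c)  # *c is the number of characters in the cset
end
```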

cjeffery@CS.ARIZONA.EDU (Clinton Jeffery) (09/27/90)

(Richard Goerwitz asks about code to strip out comments)

There may be other programs in the Icon Program Library to do this, but
one comment-removal technique is exemplified in the Idol object-oriented
preprocessor.  It is not perfect, and I am always soliciting improvements.
It is written as a predicate that tells whether a position lies
within a string or cset literal (which is useful for more than comments).

One interesting triviality is that it appears to use no csets.  On purpose.
Here is an extract, with an untested sample procedure main.
--
procedure main()
  local line, x
  while line := read() do {
    # cut from the first # that notquote() says lies outside a literal
    line[ 1(x<-find("#",line),notquote(line[1:x])) : 0] := ""
    write(trim(line))
  }
end
#
# succeed if the character *following* s lies outside any string or
# cset literal; fail if it is within one
#
procedure notquote(s)
  local outs
  outs := ""
  #
  # eliminate escaped quotes.
  # this is a bug for people who write code like \"hello"...
  s ? {
    while outs ||:= tab(find("\\")+1) do move(1)
    outs ||:= tab(0)
  }
  # see if every quote has a matching endquote
  outs ? {
    while s := tab(find("\""|"'")+1) do {
      if not tab(find(s[-1])+1) then fail
    }
  }
  return
end
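
For comparison, here is a minimal sketch of a single left-to-right pass
that does use csets.  It is not part of Idol; the procedure name
stripcomment is hypothetical and the code is as untested as the extract
above.  The motivation: find("\""|"'") looks for a double quote before it
ever considers a single quote, so a line such as  c := '"'  # quote
appears to defeat notquote.  The sketch instead handles whichever of
# " ' comes first.  It still works a line at a time, so a literal
continued onto the next line with a trailing underscore is, I believe,
not handled.

```icon
procedure main()
  # read() fails at end of input, which ends the loop
  while write(trim(stripcomment(read())))
end
#
# return line with any trailing comment removed (hypothetical helper)
#
procedure stripcomment(line)
  local cut, quote
  line ? {
    # stop at the next #, " or ', whichever comes first
    while /cut & tab(upto('#"\'')) do {
      if match("#") then cut := &pos    # a # outside a literal
      else {
        quote := move(1)                # the opening " or '
        repeat {
          # advance to the closing quote or to a backslash;
          # an unterminated literal just runs to the end of the line
          tab(upto(quote ++ '\\')) | { tab(0); break }
          if ="\\" then move(1)         # skip the escaped character
          else { move(1); break }       # step over the closing quote
        }
      }
    }
  }
  if \cut then return line[1:cut]
  return line
end
```

Only the one character after each backslash is skipped, which should be
enough to step over \" and \' but makes no attempt to validate longer
escape sequences.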