[comp.lang.fortran] Fxref & Flink -- deficiencies?

silvert@cs.dal.ca (Bill Silvert) (07/03/89)

Mike Fischbein has pointed out to me that the fxref and flink utilities
which I have written and distributed do not handle suitably obfuscated,
but perfectly legal, Fortran code.  For example, they cannot find
the correct variable names in the following lines of code:

	DO 100 I = 1.5
	DO100I = 1,5

and Mike wonders whether lex-based analyzers can handle Fortran syntax.

As the above example shows, lexical analysis of Fortran requires
complete analysis of each line.  Ignoring blanks is easy (just modify
input.h as distributed with flink and fxref to skip blanks), but a
complete analytical tool would have to include the complete Fortran
parser.  Therefore I always use white space to separate tokens in my
code, and the tools I develop use this to simplify the task.  If you
don't insert white space, my tools won't help you.
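
The "ignoring blanks is easy" step can be illustrated with a minimal
sketch.  This is not the input.h code distributed with flink and fxref;
the function name strip_blanks is hypothetical, and the sketch assumes
the statement contains no character or Hollerith constants, where blanks
are significant.

```python
def strip_blanks(stmt: str) -> str:
    """Drop blanks and tabs, which fixed-form Fortran 77 ignores
    outside character constants (constants are NOT handled here)."""
    return "".join(ch for ch in stmt if ch not in " \t")


# The two statements from the post, before and after blank removal.
assignment = strip_blanks("DO 100 I = 1.5")
do_header = strip_blanks("DO100I = 1,5")

print(assignment)  # DO100I=1.5
print(do_header)   # DO100I=1,5
```

Once the blanks are gone the two statements differ in a single
character, which is why removing blanks alone does not tell a tool
whether DO100I is a variable name.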

Granted, it is cute to include code like MY MAN = NO GOOD in one's code,
but flink and fxref don't know from cute.  I am sure that lex could
handle this sort of thing if one worked at it (BASIC has the same
problem, and I have seen a BASIC written with lex), but where I work we
write very simple, straightforward, and (I hope) clean code, and these
tools do the job for us.

By the way, if anyone has a Fortran beautifier available, we would like
to get one.
-- 
Bill Silvert, Habitat Ecology Division.
Bedford Institute of Oceanography, Dartmouth, NS, Canada B2Y 4A2
	UUCP: ...!{uunet,watmath}!dalcs!biomel!bill
	Internet: biomel@cs.dal.CA	BITNET: bs%dalcs@dalac.BITNET

hankd@pur-ee.UUCP (Hank Dietz) (07/05/89)

In article <1989Jul3.125106.27708@cs.dal.ca> silvert@cs.dal.ca (Bill Silvert) writes:
>Mike Fischbein has pointed out to me that the fxref and flink utilities
>which I have written and distributed do not handle suitably obfuscated,
>but perfectly legal, Fortran code.  For example, they cannot find
>the correct variable names in the following lines of code:
>
>	DO 100 I = 1.5
>	DO100I = 1,5
>
>and Mike wonders whether lex-based analyzers can handle Fortran syntax.
>
>As the above example shows, lexical analysis of Fortran requires
>complete analysis of each line.  Ignoring blanks is easy (just modify
>input.h as distributed with flink and fxref to skip blanks), but a
>complete analytical tool would have to include the complete Fortran
>parser.  Therefore I always use white space to separate tokens in my
>code, and the tools I develop use this to simplify the task.  If you
>don't insert white space, my tools won't help you.

It is not possible to use lex without a wee bit of help, despite the
suggestion (in Aho & Ullman, Principles of Compiler Design, page 108) that
simple lookahead scanning for a "," would suffice.  The reason is simple;
one must count nesting level for parens to determine if commas are enclosed
within parens.  This can't be done by a "pure" DFA recognizer.  For example:

	DO 10 I=A(1,10)
	DO 10 I=B(C(1),10)
	DO 10 I=D(A(1,((10))), C(B(1,10)), 5)

are all assignments to the variable "DO10I", whereas:

	DO10I = C(1),10
	DO10I = A(1,10),((B))
	DO10I = (E+A(1,((10)))), (C(B(1,10))- 5)

are all DO loop headers.  Yuck.  By the way, even fairly reasonable folk
sometimes rely on spaces not being separators:

	GO TO 10

instead of:

	GOTO 10

and the variable:

	I LIKE FORTRAN

seems much friendlier than:

	ILIKEFORTRAN

Remember, Fortran doesn't allow "_" in variable names.

Personally, despite the above examples, I think this rule of Fortran 77 does
great harm to the readability of one's code because it encourages
inconsistent use of spaces, as well as making the compiler noticeably more
ad-hoc.  I'd like to see this "feature" go away...  is it still in 8x?

								-hankd

PS: I know removing this feature would break old code, but it is not all that
    difficult to write an ad-hoc program which will automatically "clean up"
    the spacing of old code.
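
The paren-counting test described earlier in this post can be sketched
as follows.  This is only an illustration under stated assumptions: the
names DO_PREFIX, has_top_level_comma and is_do_loop are hypothetical,
the pattern covers only the simple "DO label name =" form, and commas or
parens inside character constants are not handled.

```python
import re

# Blank-stripped start of a possible DO statement:
# "DO", a numeric label, a variable name, then "=".
DO_PREFIX = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=")


def has_top_level_comma(text: str) -> bool:
    """Return True if a comma appears outside all parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False


def is_do_loop(stmt: str) -> bool:
    """Classify a statement as a DO loop header (True) or not (False)."""
    s = stmt.replace(" ", "").replace("\t", "").upper()
    m = DO_PREFIX.match(s)
    # A DO header needs a comma at nesting depth zero after the "=".
    return bool(m) and has_top_level_comma(s[m.end():])


print(is_do_loop("DO 10 I=B(C(1),10)"))   # False: assignment to DO10I
print(is_do_loop("DO10I = C(1),10"))      # True: DO loop header
```

The depth counter is the "wee bit of help" a pure DFA lacks: a regular
expression can find the prefix, but deciding whether a comma is enclosed
needs a count that can grow without bound.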

silvert@cs.dal.ca (Bill Silvert) (07/05/89)

In article <12112@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>are all DO loop headers.  Yuck.  By the way, even fairly reasonable folk
>sometimes rely on spaces not being separators:
>
>	GO TO 10
>
>instead of:
>
>	GOTO 10

I found Dietz's comments very interesting, and I wanted to note that flink
and fxref do recognize some of the common variants, such as both GOTO and
GO TO, and both ENDIF and END IF.  They do not recognize unusual variants like
EN DIF, and spaces in variable names (e.g. HOT DOG) are not acceptable.
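
A tool that accepts only the common spellings might use patterns along
these lines.  This is a hedged sketch, not the actual flink or fxref
source; the table KEYWORDS and the function find_keyword are
hypothetical, and the sketch assumes tokens are otherwise separated by
white space.

```python
import re

# Blanks are tolerated only at the conventional word break,
# so "GO TO" and "END IF" match but "EN DIF" does not.
KEYWORDS = {
    "GOTO": re.compile(r"GO[ \t]*TO\b"),
    "ENDIF": re.compile(r"END[ \t]*IF\b"),
}


def find_keyword(stmt):
    """Return the canonical keyword starting the statement, or None."""
    text = stmt.strip().upper()
    for name, pattern in KEYWORDS.items():
        if pattern.match(text):
            return name
    return None


print(find_keyword("GO TO 10"))  # GOTO
print(find_keyword("EN DIF"))    # None
```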

It would be nice if Fortran 8x accepted the _ as a character, which would 
also facilitate interfacing code in C and other languages.  I wish I had some
easy way to identify some subroutine names as being non-Fortran, since I often
have to call C and assembler routines to handle graphics, etc., and the ability
to use a leading _ (just as C uses a trailing _) would help.


msf@sneezy.crd.ge.com (Mike Fischbein) (07/05/89)

Just to forestall a flood of "Oh, here's how to fix the DO loop parsing"
type messages, let me point out that there are NO reserved words in FORTRAN
and that any spelled-out keyword, such as IF, THEN, STOP, or END, poses
identical problems.  This is not just a case of clean coding vs. someone
trying to be clever; this pops up in real, useful FORTRAN as well.
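
To illustrate with IF: if a program declares an array named IF, then
"IF(I) = 1" is an assignment, while "IF (I .EQ. 1) J = 2" is a logical
IF statement.  The sketch below shows one way to tell them apart.  The
function names are hypothetical, and the sketch assumes no character
constants or substring references appear in the statement.

```python
def closing_paren(s, start):
    """Index of the paren matching the "(" at s[start], or -1."""
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def if_is_assignment(stmt):
    """True if the statement assigns to an array element named IF."""
    s = stmt.replace(" ", "").replace("\t", "").upper()
    if not s.startswith("IF("):
        return False
    end = closing_paren(s, 2)
    # An assignment has "=" right after the balanced parens; an IF
    # statement has a statement or a list of labels there instead.
    return end != -1 and s[end + 1:end + 2] == "="


print(if_is_assignment("IF(I) = 1"))            # True
print(if_is_assignment("IF (I .EQ. 1) J = 2"))  # False
```

As with DO, the decision cannot be made at the keyword itself; the whole
parenthesized part has to be scanned first.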

		mike

Michael Fischbein, Technical Consultant
Sun Professional Services
Sun Albany 518-783-9613     sunbow!msf or mfischbein@sun.com
These are my opinions and not necessarily those of any other person or

morice@prlhp1.prl.philips.co.uk (morice) (07/17/89)

I keep seeing requests for Fortran 77 tools.  The best set I know of is
called TOOLPACK, the result of collaborative research between several
institutions in the USA and the UK.  The UK contact is:

   NAG Ltd.
   Wilkinson House
   Jordan Hill Road
   OXFORD
   OX2 8DR
   England

It is essentially free, being distributed for only about 100 pounds.