[comp.lang.fortran] What's a legitimate floating-point number?

dgh%dgh@Sun.COM (David Hough) (12/02/89)

It's an interesting exercise trying to implement standard Fortran I/O,
given the complexity of list-directed, formatted with BLANK=ZERO,
formatted with BLANK=NULL.  Add namelist I/O if you want to be
commercially successful.  Add VAX VMS compatibility too if you want
to steal DEC's market share.

The latter is tricky.   It turns out that under F3.0, for instance,
VAX VMS Fortran will accept ".e0" and "++1" as legitimate floating-point
numbers, both returning the value 0, which is just what you expected,
of course, if you understood that part of the Fortran-77 standard
which says

13.5.9.2.1
	The input field consists of an optional sign, followed by a string of
	digits optionally containing a decimal point.

Well, if the "string of digits" can be an EMPTY string of digits, then
you'll understand that .e0 is equivalent to 0.e0, and ++1 is equivalent
to +0+1 (the standard also lets an exponent be written as a signed
integer with no E or D), which is equivalent to +0e+1.
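Here is a minimal Fortran sketch of that literal reading. The module f77_syntax and the function f77_syntax_ok are hypothetical names, not part of any compiler's I/O library. The sketch assumes blanks have already been removed from the field. It checks for an optional sign, a digit string optionally containing one point, and an optional exponent. The exponent may be written as E or D followed by an optionally signed integer, or as a bare signed integer. The flag allow_empty says whether the digit string may be empty.

```fortran
module f77_syntax
  implicit none
contains

  ! Character i of s, or a blank past the end (avoids out-of-range access).
  pure function ch(s, i) result(c)
    character(len=*), intent(in) :: s
    integer, intent(in) :: i
    character(len=1) :: c
    c = ' '
    if (i >= 1 .and. i <= len(s)) c = s(i:i)
  end function ch

  ! Literal reading of the F77 input-field form, for a field whose
  ! blanks have already been removed:
  !   [sign] digit-string-with-optional-point [exponent]
  ! exponent = (E or D) [sign] digits, or sign digits.
  ! allow_empty = .true. lets the digit string before the exponent be empty.
  pure logical function f77_syntax_ok(field, allow_empty) result(ok)
    character(len=*), intent(in) :: field
    logical, intent(in) :: allow_empty
    integer :: i, n, nmant, nexp

    ok = .false.
    n = len_trim(field)
    i = 1
    if (index('+-', ch(field, i)) > 0) i = i + 1     ! optional sign

    ! Digit string, optionally containing one decimal point.
    nmant = 0
    do while (index('0123456789', ch(field, i)) > 0)
      nmant = nmant + 1
      i = i + 1
    end do
    if (ch(field, i) == '.') then
      i = i + 1
      do while (index('0123456789', ch(field, i)) > 0)
        nmant = nmant + 1
        i = i + 1
      end do
    end if
    if (nmant == 0 .and. .not. allow_empty) return

    ! Optional exponent.
    if (i <= n) then
      if (index('EeDd', ch(field, i)) > 0) then
        i = i + 1
        if (index('+-', ch(field, i)) > 0) i = i + 1
      else if (index('+-', ch(field, i)) > 0) then
        i = i + 1
      else
        return                        ! some other character
      end if
      nexp = 0
      do while (index('0123456789', ch(field, i)) > 0)
        nexp = nexp + 1
        i = i + 1
      end do
      if (nexp == 0) return
    end if

    ok = (i > n)                      ! nothing may follow
  end function f77_syntax_ok

end module f77_syntax
```

With allow_empty set, f77_syntax_ok('.e0', .true.) and f77_syntax_ok('++1', .true.) are both .true.; with it clear, both are .false. The strings '0.e0' and '+0+1' are accepted either way.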

Many people would consider that stretching the intent of the standard,
but it turns out that there are a lot of data files containing just a point
that programs expect to convert to a number, zero presumably.  And we've
had bug reports because we didn't accept ".e0" in list-directed input.
Bob Corbett thought of the "++1" example; we haven't yet had a request
for VAX VMS compatibility in that respect.

I decided to investigate what it was we did accept.  It turns out that
at least Sun Fortran 1.1 and 1.2 produce the following results from
the test program reproduced at the end of this note (x is set to -1
before each READ; an x of -1.00000 means the READ left it unchanged):

error?	input string			format

 no error  input > . e0x    < ZERO fmt   (BZ,e5.1)  x   0.
 no error  input > . e0x    < NULL fmt   (BN,e5.1)  x   0.
    ERROR  input > . e0x    <    * fmt              x    -1.00000
 no error  input > .e0x     < ZERO fmt   (BZ,e5.1)  x   0.
 no error  input > .e0x     < NULL fmt   (BN,e5.1)  x   0.
    ERROR  input > .e0x     <    * fmt              x    -1.00000
 no error  input >. e0x     < ZERO fmt   (BZ,e5.1)  x   0.
 no error  input >. e0x     < NULL fmt   (BN,e5.1)  x   0.
    ERROR  input >. e0x     <    * fmt              x    -1.00000
    ERROR  input >.e0x      < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >.e0x      < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >.e0x      <    * fmt              x    -1.00000
 no error  input >.e0       < ZERO fmt   (BZ,e5.1)  x   0.
 no error  input >.e0       < NULL fmt   (BN,e5.1)  x   0.
    ERROR  input >.e0       <    * fmt              x    -1.00000
    ERROR  input >++1x      < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >++1x      < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >++1x      <    * fmt              x    -1.00000
    ERROR  input >++1       < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >++1       < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >++1       <    * fmt              x    -1.00000

The difference between .e0 and .e0x is a little hard to fathom,
but otherwise the rule seems to be that .e0 is OK for formatted input,
even when blanks are supposed to be ignored, but is not OK for
list-directed input.
This doesn't make a lot of sense.  As an experiment I changed the
floating-point number scanner so that it produced the following output:

 no error  input > . e0x    < ZERO fmt   (BZ,e5.1)  x   0.
    ERROR  input > . e0x    < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input > . e0x    <    * fmt              x    -1.00000
    ERROR  input > .e0x     < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input > .e0x     < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input > .e0x     <    * fmt              x    -1.00000
 no error  input >. e0x     < ZERO fmt   (BZ,e5.1)  x   0.
    ERROR  input >. e0x     < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >. e0x     <    * fmt              x    -1.00000
    ERROR  input >.e0x      < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >.e0x      < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >.e0x      <    * fmt              x    -1.00000
    ERROR  input >.e0       < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >.e0       < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >.e0       <    * fmt              x    -1.00000
    ERROR  input >++1x      < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >++1x      < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >++1x      <    * fmt              x    -1.00000
    ERROR  input >++1       < ZERO fmt   (BZ,e5.1)  x    -1.00000
    ERROR  input >++1       < NULL fmt   (BN,e5.1)  x    -1.00000
    ERROR  input >++1       <    * fmt              x    -1.00000

Working this way, .e0 is never accepted and ". e0" is accepted
only when blanks are to be interpreted as zeros.  (Leading blanks
are perhaps not supposed to count.)  Is this an improvement?
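Here is a rough, syntax-only sketch of the experimental rule as stated. It skips leading blanks and turns each later blank into a 0 under BZ or drops it under BN. It then requires at least one digit in the mantissa. The names experimental_scan and accepts are hypothetical. The sketch models only those two stated rules, not every row of the table. For instance, it rejects a stray x inside the field.

```fortran
module experimental_scan
  implicit none
contains

  pure function ch(s, i) result(c)       ! s(i:i), or a blank past the end
    character(len=*), intent(in) :: s
    integer, intent(in) :: i
    character(len=1) :: c
    c = ' '
    if (i >= 1 .and. i <= len(s)) c = s(i:i)
  end function ch

  ! Hypothetical, syntax-only sketch of the experimental rule.
  ! Step 1: skip leading blanks; a later blank becomes '0' under BZ
  !         (blank_zero = .true.) and is dropped under BN.
  ! Step 2: the result must be [sign] digits[.digits] [exponent] with
  !         at least one mantissa digit; exponent = (E|D)[sign]digits
  !         or sign digits.
  pure logical function accepts(field, blank_zero) result(ok)
    character(len=*), intent(in) :: field
    logical, intent(in) :: blank_zero
    character(len=len(field)) :: t
    character(len=1) :: c
    integer :: i, n, nmant, nexp

    ! Step 1: build t(1:n).
    t = ' '
    n = 0
    do i = 1, len(field)
      c = field(i:i)
      if (c == ' ') then
        if (n == 0 .or. .not. blank_zero) cycle  ! leading blank, or BN
        c = '0'                                   ! BZ: blank is a zero
      end if
      n = n + 1
      t(n:n) = c
    end do
    ok = (n == 0)                  ! an all-blank field reads as zero
    if (n == 0) return

    ! Step 2: mantissa, requiring at least one digit.
    i = 1
    if (index('+-', t(1:1)) > 0) i = 2
    nmant = 0
    do while (index('0123456789', ch(t(1:n), i)) > 0)
      nmant = nmant + 1
      i = i + 1
    end do
    if (ch(t(1:n), i) == '.') then
      i = i + 1
      do while (index('0123456789', ch(t(1:n), i)) > 0)
        nmant = nmant + 1
        i = i + 1
      end do
    end if
    if (nmant == 0) return

    ! Optional exponent; nothing may follow it.
    if (i <= n) then
      if (index('EeDd', t(i:i)) > 0) then
        i = i + 1
        if (index('+-', ch(t(1:n), i)) > 0) i = i + 1
      else if (index('+-', t(i:i)) > 0) then
        i = i + 1
      else
        return
      end if
      nexp = 0
      do while (index('0123456789', ch(t(1:n), i)) > 0)
        nexp = nexp + 1
        i = i + 1
      end do
      if (nexp == 0) return
    end if
    ok = (i > n)
  end function accepts

end module experimental_scan
```

For example, accepts('. e0 ', .true.) is .true. but accepts('. e0 ', .false.) is .false. The field '.e0  ' fails in both modes.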

For those who are curious, the test program is

        subroutine test(d)
C       Read the string d three ways and print whether each READ
C       succeeds.  x is preset to -1 before each READ.
        real x
        character*10 s,d
C       Formatted read, blanks treated as zeros (BLANK=ZERO)
        s = "(BZ,e5.1)"
        x=-1
        read (d,s, ERR = 3) x
        print *," no error "," input >",d,"< ZERO fmt   ",s," x ",x
        goto 31
 3      continue
        print *,"    ERROR "," input >",d,"< ZERO fmt   ",s," x ",x
 31     continue
C       Formatted read, blanks ignored (BLANK=NULL)
        s = "(BN,e5.1)"
        x=-1
        read (d,s, ERR = 2) x
        print *," no error "," input >",d,"< NULL fmt   ",s," x ",x
        goto 21
 2      continue
        print *,"    ERROR "," input >",d,"< NULL fmt   ",s," x ",x
 21     continue
C       List-directed read
        s = "         "
        x=-1
        read (d,*, ERR = 1) x
        print *," no error "," input >",d,"<    * fmt   ",s," x ",x
        goto 11
 1      continue
        print *,"    ERROR "," input >",d,"<    * fmt   ",s," x ",x
 11     continue
        end

        character*10 d
        d = " . e0x"
        call test(d)
        d = " .e0x"
        call test(d)
        d = ". e0x"
        call test(d)
        d = ".e0x"
        call test(d)
        d = ".e0"
        call test(d)
        d = "++1x"
        call test(d)
        d = "++1"
        call test(d)
        end

David Hough

na.hough@na-net.stanford.edu

dgh%dgh@Sun.COM (David Hough) (12/05/89)

It turns out I needn't have sweated the issue of ".e0" so much.  Whether
or not you want to grab some of DEC's market, you'll need to pass the FCVS
(Fortran Compiler Validation System) to obtain significant commercial
success.  And it turns out that to pass FCVS 110 you'll need to be able
to recognize the following four 15-character fields as zero:

"               "
"+              "
"+ .        D+00"
"    .        D0"

This is in the default (BLANK=NULL or BN) mode.

Now is this consistent with the Fortran-77 standard?  You can read about
the properties of BLANK=NULL and BLANK=ZERO, or BN and BZ, in the standard.
You might then take exception to the foregoing tests, but it hardly matters.
You won't get a lot of satisfaction out of complaining to the 
Federal Software Testing Center.  
The last time I did that (1986, about some other matters)
I got a report dated 1 April 1983 listing the things I'd complained about
as "under study".  Presumably they're still being studied.

Anyway, I updated the test program to properly distinguish these
cases and a few others and reran it.  After making the required
modifications to the floating-point number syntax scanner, the
new results (formatted reads now use f5.1 on a 5-character field) are

 no error  input >     < ZERO fmt    x   0.
 no error  input >     < NULL fmt    x   0.		fcvs case
    ERROR  input >     <    * fmt    x    -1.00000
 no error  input >+    < ZERO fmt    x   0.
 no error  input >+    < NULL fmt    x   0.		fcvs case
    ERROR  input >+    <    * fmt    x    -1.00000
    ERROR  input >++1  < ZERO fmt    x    -1.00000
    ERROR  input >++1  < NULL fmt    x    -1.00000
    ERROR  input >++1  <    * fmt    x    -1.00000
 no error  input >+ +1 < ZERO fmt    x   0.
 no error  input >+ +1 < NULL fmt    x   0.
    ERROR  input >+ +1 <    * fmt    x    -1.00000
    ERROR  input >.e0  < ZERO fmt    x    -1.00000
    ERROR  input >.e0  < NULL fmt    x    -1.00000
    ERROR  input >.e0  <    * fmt    x    -1.00000
    ERROR  input >+.e0 < ZERO fmt    x    -1.00000
    ERROR  input >+.e0 < NULL fmt    x    -1.00000
    ERROR  input >+.e0 <    * fmt    x    -1.00000
    ERROR  input > .e0 < ZERO fmt    x    -1.00000
    ERROR  input > .e0 < NULL fmt    x    -1.00000
    ERROR  input > .e0 <    * fmt    x    -1.00000
 no error  input >+ .e0< ZERO fmt    x   0.
 no error  input >+ .e0< NULL fmt    x   0.
    ERROR  input >+ .e0<    * fmt    x    -1.00000
 no error  input >. e0 < ZERO fmt    x   0.
 no error  input >. e0 < NULL fmt    x   0.
    ERROR  input >. e0 <    * fmt    x    -1.00000
 no error  input >+. e0< ZERO fmt    x   0.
 no error  input >+. e0< NULL fmt    x   0.
    ERROR  input >+. e0<    * fmt    x    -1.00000
 no error  input > . e0< ZERO fmt    x   0.
 no error  input > . e0< NULL fmt    x   0.		fcvs case
    ERROR  input > . e0<    * fmt    x    -1.00000
 no error  input >1 .0 < ZERO fmt    x    10.00000
 no error  input >1 .0 < NULL fmt    x     1.00000
 no error  input >1 .0 <    * fmt    x     1.00000
 no error  input >. 1  < ZERO fmt    x     1.00000E-02
 no error  input >. 1  < NULL fmt    x     1.00000E-01
    ERROR  input >. 1  <    * fmt    x    -1.00000
 no error  input >.1 2 < ZERO fmt    x     1.02000E-01
 no error  input >.1 2 < NULL fmt    x    0.120000
 no error  input >.1 2 <    * fmt    x     1.00000E-01

Diffs against the current Fortran release (lines marked < are from the
current release, lines marked > from the new scanner) are just

<  no error  input >.e0  < ZERO fmt    x   0.
<  no error  input >.e0  < NULL fmt    x   0.
>     ERROR  input >.e0  < ZERO fmt    x    -1.00000
>     ERROR  input >.e0  < NULL fmt    x    -1.00000
<  no error  input >+.e0 < ZERO fmt    x   0.
<  no error  input >+.e0 < NULL fmt    x   0.
>     ERROR  input >+.e0 < ZERO fmt    x    -1.00000
>     ERROR  input >+.e0 < NULL fmt    x    -1.00000
<  no error  input > .e0 < ZERO fmt    x   0.
<  no error  input > .e0 < NULL fmt    x   0.
>     ERROR  input > .e0 < ZERO fmt    x    -1.00000
>     ERROR  input > .e0 < NULL fmt    x    -1.00000

So I have succeeded in distinguishing this degenerate case from
others which I know must pass.

By the way, the Fortran-77 standard is quite clear that an all-blank
field must be recognized as zero under NULL mode, but it doesn't say
so explicitly under ZERO mode, so that's one of the inferences I
drew.  In general, both NULL and ZERO modes are supposed to ignore
leading blanks.  Subsequent blanks are interpreted as zeros (ZERO mode)
or ignored (NULL mode), but if you simply ignore them then "+ " won't be
recognized (all that's left is a sign with no digits) and you'll fail
FCVS 110.  So in effect you discover that under NULL
mode, blanks don't count for anything in terms of the value of the number,
but any non-leading blank legitimizes an input if a zero in the same
place would have legitimized it.
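The rule above can be sketched as a scanner for Fw.d input. In this sketch, every blank after the first nonblank character may stand wherever a digit may. Under BZ that blank is the digit 0; under BN it satisfies the syntax but adds nothing to the value. This is a hypothetical reconstruction, not the Sun runtime's code; the module blank_scan and subroutine read_f are made-up names. It handles formatted F input only, not list-directed input. It applies the d implied fraction digits only when the field has no decimal point, and it does no overflow checking.

```fortran
module blank_scan
  implicit none
contains

  ! Hypothetical Fw.d input scanner, not the Sun runtime's code.
  ! Leading blanks are skipped; an all-blank field reads as zero.
  ! Any later blank may stand wherever a digit may:
  !   blank_zero = .true.  (BZ): the blank is the digit 0;
  !   blank_zero = .false. (BN): it satisfies the syntax but adds nothing.
  ! Exponent: E or D, optional sign, digits; or a sign and digits.
  ! d is used only when the field has no decimal point.  No overflow checks.
  pure subroutine read_f(field, d, blank_zero, x, ok)
    character(len=*), intent(in) :: field
    integer, intent(in) :: d
    logical, intent(in) :: blank_zero
    double precision, intent(out) :: x
    logical, intent(out) :: ok
    character(len=1) :: c
    integer :: i, n, k, frac, expo, esign, scale
    logical :: point, mant_seen, exp_seen
    double precision :: m, msign

    x = 0d0
    ok = .false.
    n = len(field)
    i = verify(field, ' ')               ! first nonblank, 0 if none
    if (i == 0) then
      ok = .true.                        ! an all-blank field is zero
      return
    end if

    ! Optional mantissa sign.
    msign = 1d0
    if (field(i:i) == '-') msign = -1d0
    if (index('+-', field(i:i)) > 0) i = i + 1

    ! Mantissa: digits and blanks, with at most one decimal point.
    m = 0d0
    frac = 0
    point = .false.
    mant_seen = .false.
    do while (i <= n)
      c = field(i:i)
      k = index('0123456789', c)         ! digit value + 1, or 0
      if (k > 0 .or. c == ' ') then
        mant_seen = .true.
        if (k > 0 .or. blank_zero) then  ! a BN blank adds nothing
          m = 10d0*m + max(k - 1, 0)     ! a BZ blank is the digit 0
          if (point) frac = frac + 1
        end if
      else if (c == '.' .and. .not. point) then
        point = .true.
      else
        exit
      end if
      i = i + 1
    end do
    if (.not. mant_seen) return          ! e.g. ".e0" or "++1"

    ! Optional exponent.
    expo = 0
    esign = 1
    if (i <= n) then
      if (index('EeDd', field(i:i)) > 0) then
        i = i + 1                        ! letter form; sign optional
      else if (index('+-', field(i:i)) == 0) then
        return                           ! stray character
      end if
      if (i <= n) then
        if (field(i:i) == '-') esign = -1
        if (index('+-', field(i:i)) > 0) i = i + 1
      end if
      exp_seen = .false.
      do while (i <= n)
        c = field(i:i)
        k = index('0123456789', c)
        if (k == 0 .and. c /= ' ') return      ! junk after the exponent
        if (k > 0 .or. blank_zero) expo = 10*expo + max(k - 1, 0)
        exp_seen = .true.
        i = i + 1
      end do
      if (.not. exp_seen) return
    end if

    scale = d
    if (point) scale = frac
    x = msign * m * 10d0**(esign*expo - scale)
    ok = .true.
  end subroutine read_f

end module blank_scan
```

Called with d = 1 on the 5-character inputs from the results table, it gives the same accept/reject pattern and values as the ZERO and NULL rows. For example, '1 .0 ' reads as 10.0 under BZ and as 1.0 under BN. It also reads the four 15-character FCVS 110 fields as zero under BN.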

Revised source for my test program is

        subroutine test(d)
C       Read the 5-character field d with f5.1 under BZ, with f5.1
C       under BN, and list-directed.  x is preset to -1 before each
C       READ.  d is character*5 here, so only the 5 columns that
C       f5.1 reads are used and printed.
        real x
        character*10 s
        character*5 d
C       Formatted read, blanks treated as zeros (BLANK=ZERO)
        s = "(BZ,f5.1)"
        x=-1
        read (d,s, ERR = 3) x
        print *," no error "," input >",d,"< ZERO fmt   "," x ",x
        goto 31
 3      continue
        print *,"    ERROR "," input >",d,"< ZERO fmt   "," x ",x
 31     continue
C       Formatted read, blanks ignored (BLANK=NULL)
        s = "(BN,f5.1)"
        x=-1
        read (d,s, ERR = 2) x
        print *," no error "," input >",d,"< NULL fmt   "," x ",x
        goto 21
 2      continue
        print *,"    ERROR "," input >",d,"< NULL fmt   "," x ",x
 21     continue
C       List-directed read
        s = "         "
        x=-1
        read (d,*, ERR = 1) x
        print *," no error "," input >",d,"<    * fmt   "," x ",x
        goto 11
 1      continue
        print *,"    ERROR "," input >",d,"<    * fmt   "," x ",x
 11     continue
        end

        character*10 d
        d = "     "
        call test(d)
        d = "+    "
        call test(d)
        d = "++1  "
        call test(d)
        d = "+ +1 "
        call test(d)
        d = ".e0  "
        call test(d)
        d = "+.e0 "
        call test(d)
        d = " .e0 "
        call test(d)
        d = "+ .e0"
        call test(d)
        d = ". e0 "
        call test(d)
        d = "+. e0"
        call test(d)
        d = " . e0"
        call test(d)
        d = "1 .0 "
        call test(d)
        d = ". 1  "
        call test(d)
        d = ".1 2 "
        call test(d)
        end


David Hough

na.hough@na-net.stanford.edu

dgh%dgh@Sun.COM (David Hough) (12/06/89)

David Hough

na.hough@na-net.stanford.edu