[comp.std.c] And yet another scanf interpretation question

chris@mimsy.umd.edu (Chris Torek) (01/04/90)

The draft says that scanf's %e, %f, and %g formats match `an optionally
signed floating-point number, whose format is the same as expected
for the subject string of the |strtod| function.'  This in turn
is defined as an optional sign, followed by a nonempty digit
sequence optionally containing a decimal point, followed by an
optional exponent.  The exponent, if present, has the form: `e' or
`E', followed by an optional sign, followed by a nonempty digit
sequence.

Thus, the question that applies to strtol and strtoul (as to whether
a sign followed by no digits is acceptable) does not apply.  A
different question then rears its ugly head:

If the number `looks right' up to a point, but then fails to match
the constraints imposed on it, what is to happen?  We have the following
possible sequences we can feed scanf() when it is matching %e, %f, or %g:

	.e10			[missing mantissa digits]
	+1.2345e		[missing exponent digits]
	-e			[missing both digits]

This much is clear:  These can only be considered a matching failure.
The draft goes on to say, however, that `If conversion terminates on
a conflicting input character, the offending input character is left
unread in the input stream.'  This can only be meant to imply `conflicting
with a literal character from the format string', not `conflicting with
the format required by a conversion such as %f'.  Alas, the draft does
*not* say what input character(s) are left unread in the case of a
matching failure.  This question arises only for numeric formats
(and perhaps only for floating-point, depending upon whether `%d' should
accept bare `-' and `+').
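
For reference, here is a minimal sketch of how strtod() itself treats the
three sequences above.  It works on a string already in memory, so
lookahead costs nothing; whether scanf() on a stream must leave the same
characters unread is exactly the open question.  The helper name
strtod_consumed is hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: how many leading characters of s does strtod()
 * accept as its subject sequence?  Returns 0 when no conversion is
 * performed (strtod then stores s itself through the end pointer). */
size_t strtod_consumed(const char *s)
{
    char *end;

    assert(s != NULL);
    (void)strtod(s, &end);      /* the converted value is not needed here */
    return (size_t)(end - s);
}
```

Called on `.e10' and `-e' this returns 0 (no conversion at all); called
on `+1.2345e' it returns 7, i.e. strtod() stops just before the `e'.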

Note that the most useful answer---that the entire malformed floating
point number remains unconsumed---requires mandating an arbitrary amount
of pushback (or, equivalently, lookahead): the `floating point number'

    1.111111111111111111111111111111111111111111111111111111111e-

looks just fine until the lack of a digit following the `-' shows up.

The question, then, can be stated as follows:

  What is the condition of the input stream when a matching failure
  occurs `deep inside' a conversion?

(We intend to allow an arbitrary amount of pushback, so whatever the
answer to this question, it is easy for me to handle; but I want to know
what the standard intends.)
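
As an illustration of what `more than one character of pushback' could
look like, here is a rough sketch of a stream wrapper with its own
pushback stack.  Everything in it (struct pb_stream, pb_getc, pb_ungetc,
the PB_MAX bound) is a hypothetical name chosen for this example, not
part of any library, and the stack is bounded where a real implementation
might grow it on demand.

```c
#include <assert.h>
#include <stdio.h>

#define PB_MAX 64   /* assumed bound; a real library might grow this */

/* Hypothetical wrapper: a stream plus a private stack of pushed-back
 * characters, independent of ungetc()'s one guaranteed character. */
struct pb_stream {
    FILE *fp;
    int   n;            /* number of characters currently pushed back */
    int   buf[PB_MAX];  /* pushed-back characters, most recent last */
};

/* Read one character, preferring pushed-back ones over the stream. */
int pb_getc(struct pb_stream *s)
{
    assert(s != NULL && s->fp != NULL);
    return s->n > 0 ? s->buf[--s->n] : getc(s->fp);
}

/* Push c back; returns c, or EOF if c is EOF or the stack is full. */
int pb_ungetc(int c, struct pb_stream *s)
{
    assert(s != NULL);
    if (c == EOF || s->n == PB_MAX)
        return EOF;
    s->buf[s->n++] = c;
    return c;
}
```

A scanner that has read `e', `-' and then a non-digit can push the three
characters back in reverse order, and the next reads return them again in
their original order.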
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn) (01/05/90)

In article <21625@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>	+1.2345e		[missing exponent digits]

The characters through '5' are consumed and 'e' remains "unread in the
input stream", which in practice means pushed-back or its moral
equivalent.  (If available, peek-ahead could be employed to avoid
having to push anything back.  It amounts to the same thing EXCEPT
for possible interaction with ungetc(), which is an ugly can of
worms.  The Standard deliberately does not consider these "unread"
characters as pushed-back in the sense of ungetc().)

>The draft goes on to say, however, that `If conversion terminates on
>a conflicting input character, the offending input character is left
>unread in the input stream.'  This can only be meant to imply `conflicting
>with a literal character from the format string', not `conflicting with
>the format required by a conversion such as %f'.

No; see also line 35 on page 136 of the December 1988 draft.
It really does mean that the peeked-ahead characters that failed to
match remain "unread".

>Note that the most useful answer---that the entire malformed floating
>point number remains unconsumed---requires mandating an arbitrary amount
>of pushback (or, equivalently, lookahead): the `floating point number'
>    1.111111111111111111111111111111111111111111111111111111111e-
>looks just fine until the lack of a digit following the `-' shows up.

No, the string up to the 'e' is of the expected form and must be
properly converted.  Three characters of peek-ahead (the 'e', the
sign, and the character after it) suffice to detect that the apparent
exponent part really isn't an exponent part.
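
The bounded peek-ahead can be sketched over an in-memory string.  This is
only an illustration under stated assumptions: the helper name
float_prefix_len is hypothetical, it skips no leading white space, and it
knows only the decimal form described in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s that has the
 * form [sign] digits-with-optional-point [exponent], or 0 if none. */
size_t float_prefix_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0, ndigits = 0, end, j;

    assert(s != NULL);
    if (p[i] == '+' || p[i] == '-')
        i++;
    while (isdigit(p[i])) { i++; ndigits++; }
    if (p[i] == '.') {
        i++;
        while (isdigit(p[i])) { i++; ndigits++; }
    }
    if (ndigits == 0)
        return 0;               /* e.g. ".e10" or "-e": no mantissa digits */

    end = i;                    /* the mantissa alone is already a number */
    if (p[i] == 'e' || p[i] == 'E') {
        j = i + 1;              /* peek 1: the exponent letter */
        if (p[j] == '+' || p[j] == '-')
            j++;                /* peek 2: the optional sign */
        if (isdigit(p[j])) {    /* peek 3: a digit is needed to commit */
            while (isdigit(p[j]))
                j++;
            end = j;
        }
    }
    return end;
}
```

For "+1.2345e" this returns 7, so the 'e' is the first character left
over; for "1.5e-" it returns 3, backing out of both the 'e' and the '-'.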