[gnu.g++.lib.bug] Strange Sun 3 problem

schmidt@zola.ics.uci.edu (Doug Schmidt) (03/23/89)

I'm getting inconsistent results when running the following program
on a Sun 4 versus a Sun 3, using g++ 1.34 and libg++ 1.34.
Here's the code:

----------------------------------------
#include <stream.h>
#include <string.h>   // declares strlen()

int main()
{                                        
 char alpha[] = "abcdefghijklmnopqrstuvwxyz";
 int  sz = strlen(alpha);
 cout << "length-of(alpha) = " << sz << "\n \n"; 

 for (int i=0; i<sz; i++){
   char ch = alpha[i];
   cout << "`"     << ch       << "'"
        << " = "   << int(ch)
        << " = O"  << oct(ch)
        << " = Ox" << hex(ch)  << "\n";
 }

} /* end of main */
----------------------------------------

On the Sun 3, here are the (incorrect) results:

------------------------------
length-of(alpha) = 26

`a' = 97 = O141 = Ox41
`b' = 98 = O142 = Ox42
`c' = 99 = O143 = Ox43
`d' = 100 = O144 = Ox44
`e' = 101 = O145 = Ox45
`f' = 102 = O146 = Ox46
`g' = 103 = O147 = Ox47
`h' = 104 = O150 = Ox50
`i' = 105 = O151 = Ox51
`j' = 106 = O152 = Ox52
`k' = 107 = O153 = Ox53
`l' = 108 = O154 = Ox54
`m' = 109 = O155 = Ox55
`n' = 110 = O156 = Ox56
`o' = 111 = O157 = Ox57
`p' = 112 = O160 = Ox60
`q' = 113 = O161 = Ox61
`r' = 114 = O162 = Ox62
`s' = 115 = O163 = Ox63
`t' = 116 = O164 = Ox64
`u' = 117 = O165 = Ox65
`v' = 118 = O166 = Ox66
`w' = 119 = O167 = Ox67
`x' = 120 = O170 = Ox70
`y' = 121 = O171 = Ox71
`z' = 122 = O172 = Ox72
------------------------------

Here are the correct Sun 4 results:
----------------------------------------
length-of(alpha) = 26

`a' = 97 = O141 = Ox61
`b' = 98 = O142 = Ox62
`c' = 99 = O143 = Ox63
`d' = 100 = O144 = Ox64
`e' = 101 = O145 = Ox65
`f' = 102 = O146 = Ox66
`g' = 103 = O147 = Ox67
`h' = 104 = O150 = Ox68
`i' = 105 = O151 = Ox69
`j' = 106 = O152 = Ox6a
`k' = 107 = O153 = Ox6b
`l' = 108 = O154 = Ox6c
`m' = 109 = O155 = Ox6d
`n' = 110 = O156 = Ox6e
`o' = 111 = O157 = Ox6f
`p' = 112 = O160 = Ox70
`q' = 113 = O161 = Ox71
`r' = 114 = O162 = Ox72
`s' = 115 = O163 = Ox73
`t' = 116 = O164 = Ox74
`u' = 117 = O165 = Ox75
`v' = 118 = O166 = Ox76
`w' = 119 = O167 = Ox77
`x' = 120 = O170 = Ox78
`y' = 121 = O171 = Ox79
`z' = 122 = O172 = Ox7a
---------------------------------------- 

Can anyone tell me whether this is a Sun 3-specific problem (e.g., does
it also occur on the VAX?), and whether it points to a problem with g++,
libg++, or my installation?

thanks,

        Doug
--
schmidt@ics.uci.edu | On a clear day, under blue skies, one need not seek
office:             | And asking about Buddha 
(714) 856-4043      | Is like proclaiming innocence,
                    | With loot in your pocket.

dl@ROCKY.OSWEGO.EDU (Doug Lea) (03/23/89)

>>I'm getting an inconsistency when running the following program
>>on a sun 4 versus a sun 3 using g++ 1.34 and libg++ 1.34.
>>Here's the code:
>>
>>----------------------------------------
>>#include <stream.h>
>>
>>main()
>>{                                        
>> char alpha[] = "abcdefghijklmnopqrstuvwxyz";
>> int  sz = strlen(alpha);
>> cout << "length-of(alpha) = " << sz << "\n \n"; 
>>
>> for (int i=0; i<sz; i++){
>>   char ch = alpha[i];
>>   cout << "`"     << ch       << "'"
>>        << " = "   << int(ch)
>>        << " = O"  << oct(ch)
>>        << " = Ox" << hex(ch)  << "\n";
>> }
>>
>>} /* end of main */

This is an evaluation order problem. The `form', `dec', `hex', `oct',
and `itoa' formatting functions all return pointers into a single
character formatting buffer that is *reused* on each call. Even
though a chain of << operators `looks sequential', each << is just a
regular operator, so g++ is allowed to evaluate the operands in any
order it sees fit. If g++ decides to evaluate `hex(ch)' *before*
`oct(ch)' (as it does in this example on the VAX and Sun 3, but not on
the Sun 4), you are in trouble: by the time the first result is
printed, the buffer it points to has already been overwritten by the
second conversion!
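To make the aliasing concrete, here is a simplified model, assuming a plain static buffer. The names `oct_fmt` and `hex_fmt` are hypothetical stand-ins for the old stream.h `oct`/`hex`; the real libg++ buffer was Obstack-based, so the exact garbage differs (the Sun 3 run above shows `Ox41`, not the `Ox141` this model produces). The function evaluates `hex` first, then `oct`, just as the Sun 3 compiler did, before anything is printed.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the old stream.h oct()/hex() functions.
// Both write into the SAME static buffer and return a pointer to it,
// so each call invalidates the result of the previous one.
static char shared_buf[32];

const char* oct_fmt(int v) {
    std::snprintf(shared_buf, sizeof shared_buf, "%o", static_cast<unsigned>(v));
    return shared_buf;
}

const char* hex_fmt(int v) {
    std::snprintf(shared_buf, sizeof shared_buf, "%x", static_cast<unsigned>(v));
    return shared_buf;
}

// Mimics a compiler that evaluates hex(ch) before oct(ch), and both
// before either result is printed: the second call clobbers the first.
std::string hex_evaluated_first(char ch) {
    const char* h = hex_fmt(ch);  // buffer holds "61" for 'a'
    const char* o = oct_fmt(ch);  // buffer now holds "141"; h sees it too
    return std::string("O") + o + " Ox" + h;
}
```

Calling `hex_evaluated_first('a')` yields `"O141 Ox141"`: both pointers refer to the same storage, so the hex result is lost.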

There is no simple yet general way out of this at the level of the
library implementation. AT&T libC handles it by carving recycled
pieces out of a fixed-size buffer, instead of reusing the same
variable-sized (Obstack-based) buffer. The AT&T strategy would behave
better in your example, but fails in cases where a single formatting
conversion overflows the fixed buffer, as might occur, for example,
when printing the value of pow(Rational(1001,1000),1000) in
libg++/test6. I felt that the most defensible position was to enforce
the rule that exactly one format conversion is absolutely guaranteed
to be valid at a time, rather than to rely on a method that sometimes
does and sometimes does not keep more (or even fewer!) than one valid.
A statement to this effect *is* hiding in the libg++ documentation on
format operators.
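The fixed-buffer strategy can be sketched as follows. This is a hypothetical illustration of the idea as described above, not AT&T's actual code; the 64-byte size and the names `ring::take`, `ring::oct_fmt`, and `ring::hex_fmt` are assumptions. Each conversion gets its own slice, so several results stay valid at once, but only until the cursor wraps around and recycles old slices.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical sketch of carving recycled pieces out of a fixed-size
// buffer (not AT&T's actual code). Each conversion gets its own slice,
// so several results remain valid until the cursor wraps around.
namespace ring {
constexpr std::size_t kSize = 64;  // assumed total buffer size
char buf[kSize];
std::size_t cursor = 0;

// Hands out n bytes, wrapping to the start when the tail is too short.
// Requires n <= kSize; a longer conversion simply cannot fit, which is
// the overflow weakness of this strategy.
char* take(std::size_t n) {
    if (cursor + n > kSize) cursor = 0;  // recycle the oldest slices
    char* p = buf + cursor;
    cursor += n;
    return p;
}

// Formats v into a 32-byte scratch area, then copies it into a new slice.
const char* fmt(const char* spec, unsigned v) {
    char tmp[32];
    int len = std::snprintf(tmp, sizeof tmp, spec, v);
    std::size_t n = static_cast<std::size_t>(len) + 1;  // include '\0'
    char* p = take(n);
    std::memcpy(p, tmp, n);
    return p;
}

const char* oct_fmt(unsigned v) { return fmt("%o", v); }
const char* hex_fmt(unsigned v) { return fmt("%x", v); }
}  // namespace ring
```

Here `ring::hex_fmt('a')` and `ring::oct_fmt('a')` return distinct pointers holding `"61"` and `"141"`, so the example program would print correctly; after roughly another 64 bytes of conversions, though, both would be silently overwritten.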

The basic problem is that Stroustrup defines the formatting functions
to return char*'s whose lifetimes are unknowable. It is possible to get
more sensible behavior, better approximating this definition (see
footnote *), by using the libg++ String class for formatting work, but
this would force people to use the libg++ String class whenever they
perform any IO. This would not sit well with, for example, people
using libg++ streams under OOPS (oops! I mean the NIH class library),
which has its own, different String class.

I am considering adding functions like `String octS(int)' to the
String class library, which would let people avoid these kinds of
evaluation order and lifetime problems if they choose to #include and
use the String class. Such functions could be used transparently in
place of the regular `oct', etc., functions, because the String->char*
conversion operator is applied automatically when necessary.
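A minimal sketch of String-returning variants, assuming `std::string` as a stand-in for the libg++ String class. The names `octS`, `hexS`, and `describe` are hypothetical here. One difference from the proposal: `std::string` has no implicit conversion to `char*`, so these are not drop-in replacements where a `char*` is expected (you would need `.c_str()`), although they stream directly with `<<`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical octS()/hexS() in the spirit of the proposal above, using
// std::string as a stand-in for the libg++ String class. Each call returns
// its own string object, so results cannot clobber one another.
std::string octS(int v) {
    char tmp[32];
    std::snprintf(tmp, sizeof tmp, "%o", static_cast<unsigned>(v));
    return tmp;
}

std::string hexS(int v) {
    char tmp[32];
    std::snprintf(tmp, sizeof tmp, "%x", static_cast<unsigned>(v));
    return tmp;
}

// Evaluation order no longer matters: even when hexS runs first,
// each value lives in its own independent string.
std::string describe(char ch) {
    std::string h = hexS(ch);
    std::string o = octS(ch);
    return "O" + o + " = Ox" + h;
}
```

`describe('a')` gives `"O141 = Ox61"` regardless of which conversion runs first.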

I will do something along these lines for a forthcoming libg++ release
unless I hear of any better suggestions (which are hereby solicited).

For now, the easy way for programmers to avoid this kind of problem is
to force sequential evaluation via, in this example,

   cout << "`"     << ch       << "'";
   cout << " = "   << int(ch);
   cout << " = O"  << oct(ch);
   cout << " = Ox" << hex(ch)  << "\n";
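The workaround can be checked against a model of the shared buffer. This sketch again assumes hypothetical `oct_fmt`/`hex_fmt` functions that overwrite one static buffer, and writes into a `std::ostringstream` so the result can be inspected. Because each statement finishes (copying the buffer's contents into the stream) before the next conversion runs, the output is correct. As an aside, C++17 later made the left operand of `<<` sequenced before the right one, so on a modern compiler even the original chained statement would print correctly with such formatters.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical shared-buffer formatters modelled on the old oct()/hex():
// each call overwrites the same static buffer.
static char fmt_buf[32];

const char* oct_fmt(int v) {
    std::snprintf(fmt_buf, sizeof fmt_buf, "%o", static_cast<unsigned>(v));
    return fmt_buf;
}

const char* hex_fmt(int v) {
    std::snprintf(fmt_buf, sizeof fmt_buf, "%x", static_cast<unsigned>(v));
    return fmt_buf;
}

// One formatting call per statement: each buffer's contents are copied
// into the stream before the next call can overwrite them.
std::string line_for(char ch) {
    std::ostringstream out;
    out << "`"     << ch << "'";
    out << " = "   << int(ch);
    out << " = O"  << oct_fmt(ch);
    out << " = Ox" << hex_fmt(ch) << "\n";
    return out.str();
}
```

`line_for('a')` produces the Sun 4 line ``"`a' = 97 = O141 = Ox61\n"``.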

Note that this care is necessary *only* when a single expression uses
more than one *formatting* function (currently form, dec, hex, oct,
itoa, BitSettoa, BitStringtoa, and Itoa). No such problems occur when
mixing any other kinds of arguments to ostream operator <<.
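For comparison, here is the original test program rewritten for today's standard iostreams, under the assumption that a modern compiler is used. `std::oct` and `std::hex` are stream manipulators: they change the stream's base flag instead of returning a pointer to a shared buffer, so there is nothing to clobber. The function name `alphabet_table` is hypothetical, and the output goes to a string so it can be checked.

```cpp
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>

// The original test program rewritten with standard iostream manipulators.
// std::oct / std::hex set the stream's base flag instead of returning a
// pointer to a shared buffer, so no conversion can overwrite another.
std::string alphabet_table() {
    const char alpha[] = "abcdefghijklmnopqrstuvwxyz";
    const int sz = static_cast<int>(std::strlen(alpha));
    std::ostringstream out;
    out << "length-of(alpha) = " << sz << "\n \n";
    for (int i = 0; i < sz; ++i) {
        char ch = alpha[i];
        out << "`" << ch << "'"
            << " = "   << std::dec << int(ch)
            << " = O"  << std::oct << int(ch)
            << " = Ox" << std::hex << int(ch)
            << std::dec << "\n";  // restore decimal for the next line
    }
    return out.str();
}
```

`std::cout << alphabet_table();` prints the same table as the Sun 4 run above.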

---
(*) Still only *approximating* this definition, as the following poor
but not illegal code using the String versions shows. It also further
illustrates the C++ temporary management rules discussed in a previous
posting:

{
  //...
  char* a = octS(ch);
  char* b = hexS(ch);
  cout << a << b;
}

`a' gets a pointer to the start of the char array held by the
compiler-generated String temporary from octS(ch), which, because it
is a temporary, is destroyed immediately after the assignment. The
compiler-generated temporary for `hexS(ch)' may very well reuse the
free-store space used by the now-deleted first temporary, in which case
a==b, and the same kind of problem described above occurs, but this
time for very different reasons. Of course, given the way the String
class is set up, such problems could not occur if the code were written
more sensibly as

{
  //...
  String a = octS(ch);
  String b = hexS(ch);
  cout << a << b;
}

The moral is simply that while the (char*)(String) operator is very
convenient and useful, using it exposes you to the same `C-based'
pointer and aliasing problems that the String class otherwise helps
you avoid. (My desire to make the String class as attractive, correct,
error-resistant, and efficient a substitute for char*'s as possible
accounts for its continuing evolution, as well as for my recent
postings on C++ language extensions and clarifications that would
assist these efforts.)

Given Stroustrup's definitions of the format functions, the only fully
correct solution would be to either never delete/reuse formatting
buffers, or to implement a full garbage-collecting storage management
facility for *all* C++ storage and pointers (which would be necessary
because pointers to a format buffer could be propagated all over a
program).  Neither sounds attractive.
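The first option, never deleting or reusing formatting buffers, can be sketched with an arena that only grows. The names `arena`, `oct_forever`, and `hex_forever` are hypothetical. It relies on the fact that appending to a `std::deque` does not invalidate references to existing elements, so earlier `c_str()` pointers stay valid. The price is visible immediately: memory grows with every conversion and is never reclaimed.

```cpp
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical sketch of the "never delete or reuse formatting buffers"
// option: every conversion is stored in an arena that only grows, so a
// returned pointer stays valid for the life of the program. Appending to
// a std::deque never relocates existing elements, so earlier c_str()
// pointers are not invalidated by later calls.
std::deque<std::string>& arena() {
    static std::deque<std::string> results;
    return results;
}

const char* keep_forever(const char* spec, unsigned v) {
    char tmp[32];
    std::snprintf(tmp, sizeof tmp, spec, v);
    arena().emplace_back(tmp);  // never erased: memory use only grows
    return arena().back().c_str();
}

const char* oct_forever(unsigned v) { return keep_forever("%o", v); }
const char* hex_forever(unsigned v) { return keep_forever("%x", v); }
```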

-Doug