schmidt@zola.ics.uci.edu (Doug Schmidt) (03/23/89)
I'm getting an inconsistency when running the following program on a
Sun 4 versus a Sun 3 using g++ 1.34 and libg++ 1.34. Here's the code:

----------------------------------------
#include <stream.h>

main()
{
  char alpha[] = "abcdefghijklmnopqrstuvwxyz";
  int sz = strlen(alpha);
  cout << "length-of(alpha) = " << sz << "\n \n";

  for (int i=0; i<sz; i++){
    char ch = alpha[i];
    cout << "`" << ch << "'"
         << " = " << int(ch)
         << " = O" << oct(ch)
         << " = Ox" << hex(ch) << "\n";
  }

} /* end of main */
----------------------------------------

On the Sun 3, here are the (incorrect) results:

------------------------------
length-of(alpha) = 26
 
`a' = 97 = O141 = Ox41
`b' = 98 = O142 = Ox42
`c' = 99 = O143 = Ox43
`d' = 100 = O144 = Ox44
`e' = 101 = O145 = Ox45
`f' = 102 = O146 = Ox46
`g' = 103 = O147 = Ox47
`h' = 104 = O150 = Ox50
`i' = 105 = O151 = Ox51
`j' = 106 = O152 = Ox52
`k' = 107 = O153 = Ox53
`l' = 108 = O154 = Ox54
`m' = 109 = O155 = Ox55
`n' = 110 = O156 = Ox56
`o' = 111 = O157 = Ox57
`p' = 112 = O160 = Ox60
`q' = 113 = O161 = Ox61
`r' = 114 = O162 = Ox62
`s' = 115 = O163 = Ox63
`t' = 116 = O164 = Ox64
`u' = 117 = O165 = Ox65
`v' = 118 = O166 = Ox66
`w' = 119 = O167 = Ox67
`x' = 120 = O170 = Ox70
`y' = 121 = O171 = Ox71
`z' = 122 = O172 = Ox72
------------------------------

Here's the correct Sun 4 result:

----------------------------------------
length-of(alpha) = 26
 
`a' = 97 = O141 = Ox61
`b' = 98 = O142 = Ox62
`c' = 99 = O143 = Ox63
`d' = 100 = O144 = Ox64
`e' = 101 = O145 = Ox65
`f' = 102 = O146 = Ox66
`g' = 103 = O147 = Ox67
`h' = 104 = O150 = Ox68
`i' = 105 = O151 = Ox69
`j' = 106 = O152 = Ox6a
`k' = 107 = O153 = Ox6b
`l' = 108 = O154 = Ox6c
`m' = 109 = O155 = Ox6d
`n' = 110 = O156 = Ox6e
`o' = 111 = O157 = Ox6f
`p' = 112 = O160 = Ox70
`q' = 113 = O161 = Ox71
`r' = 114 = O162 = Ox72
`s' = 115 = O163 = Ox73
`t' = 116 = O164 = Ox74
`u' = 117 = O165 = Ox75
`v' = 118 = O166 = Ox76
`w' = 119 = O167 = Ox77
`x' = 120 = O170 = Ox78
`y' = 121 = O171 = Ox79
`z' = 122 = O172 = Ox7a
----------------------------------------

Can anyone tell me whether this is a Sun 3-specific problem (i.e., does
it occur on the VAX), and whether it shows a problem with g++, libg++,
or my installation?

thanks,

  Doug
--
schmidt@ics.uci.edu  | On a clear day, under blue skies, one need not seek
office:              | And asking about Buddha
(714) 856-4043       | Is like proclaiming innocence,
                     | With loot in your pocket.
dl@ROCKY.OSWEGO.EDU (Doug Lea) (03/23/89)
>>I'm getting an inconsistency when running the following program
>>on a sun 4 versus a sun 3 using g++ 1.34 and libg++ 1.34.
>>Here's the code:
>>
>>----------------------------------------
>>#include <stream.h>
>>
>>main()
>>{
>>  char alpha[] = "abcdefghijklmnopqrstuvwxyz";
>>  int sz = strlen(alpha);
>>  cout << "length-of(alpha) = " << sz << "\n \n";
>>
>>  for (int i=0; i<sz; i++){
>>    char ch = alpha[i];
>>    cout << "`" << ch << "'"
>>         << " = " << int(ch)
>>         << " = O" << oct(ch)
>>         << " = Ox" << hex(ch) << "\n";
>>  }
>>
>>} /* end of main */

This is an evaluation order problem. The `form', `dec', `hex', `oct',
and `itoa' formatting functions all return pointers to a single
character formatting buffer that is *reused* on each call. Even though
the << operator `looks sequential', it is just a regular operator, so
g++ is allowed to evaluate the operands in any order it sees fit. If g++
decides to evaluate `hex(ch)' *before* `oct(ch)' (as it does in this
example on the VAX and Sun 3, but not on the Sun 4), you are in trouble!

There is no good, general, yet simple way out of this at the library
implementation level. AT&T libC implements this by carving out recycled
pieces of a fixed-size buffer, instead of reusing the same variable-sized
(Obstack-based) buffer. The AT&T strategy would behave better in your
example, but would fail in cases where a single formatting conversion
overflows the fixed buffer, as might occur, for example, when printing
the value of pow(Rational(1001,1000),1000) in libg++/test6.

I felt that the most defensible position was to enforce the rule that
exactly one format conversion is guaranteed to be valid at a time,
rather than to rely on a method that sometimes does and sometimes does
not maintain more (or less!) than one. A statement to this effect *is*
hiding in the libg++ documentation on format operators.

The basic problem is that the formatting functions are defined by
Stroustrup to return char*'s with unknowable lifetimes.
It is possible to get more sensible behavior, better approximating this
definition (see footnote *), by using the libg++ String class for
formatting work, but this would force people to use the libg++ String
class when performing any IO. That would not sit well with, for example,
people using libg++ streams under OOPS (oops! I mean the NIH class
library), which has its own, different String class.

I will contemplate adding functions like `String octS(int)' as String
class functions, which would let people optionally avoid these kinds of
evaluation-order and lifetime problems if they choose to #include and
use the String class. Such functions could be used transparently, in the
same way as the regular `oct', etc., functions, because of the
(automatically applied when necessary) String->char* coercion operator.
I will do something along these lines for a forthcoming libg++ release
unless I hear of any better suggestions (which are hereby solicited).

For now, the easy way for programmers to avoid this kind of problem is
to force sequential evaluation, in this example via

    cout << "`" << ch << "'";
    cout << " = " << int(ch);
    cout << " = O" << oct(ch);
    cout << " = Ox" << hex(ch) << "\n";

Note that this care is necessary *only* when using more than one
*formatting* function (currently this includes only form, dec, hex, oct,
itoa, BitSettoa, BitStringtoa, and Itoa). No such problems occur when
mixing any other arguments to ostream operator <<.

---
(*) Still only *approximating* this definition, as can be seen in the
following poor but not illegal code using the String versions, which
also helps further illustrate the C++ temporary management rules
discussed in a previous posting:

    {
      //...
      char* a = octS(ch);
      char* b = hexS(ch);
      cout << a << b;
    }

`a' gets a pointer to the start of the char array represented in the
compiler-generated String temporary from octS(ch), which, because it is
a temporary, is deleted immediately after the assignment.
The compiler-generated temporary for `hexS(ch)' may very well reuse the
freestore space used by the now-deleted first temporary, in which case
a==b, and the same kind of problem described above occurs, but this time
for very different reasons. Of course, given the way the String class is
set up, such problems could not occur if the code were written more
sensibly as

    {
      //...
      String a = octS(ch);
      String b = hexS(ch);
      cout << a << b;
    }

The moral of this is just that while the (char*)(String) operator is
very convenient and useful, when you use it you get into the sorts of
`C-based' pointer and aliasing problems that the String class helps you
to avoid. (My desire to make the String class as attractive, correct,
non-error-prone, and efficient a substitute for char*'s as possible
accounts for its continuing evolution, as well as for my recent postings
on C++ language extensions and clarifications that would assist these
efforts.)

Given Stroustrup's definitions of the format functions, the only fully
correct solution would be either never to delete or reuse formatting
buffers, or to implement a full garbage-collecting storage management
facility for *all* C++ storage and pointers (which would be necessary
because pointers to a format buffer could be propagated all over a
program). Neither sounds attractive.

-Doug