dl@g.g.oswego.edu (Doug Lea) (10/24/89)
Two issues involving Strings...

Andy Koenig says...

> >     a+b = c;
> >
> > appears to be legal. (At least it compiled under 1.2.) Is it legal under
> > 2.0 ? What does it really mean? Shouldn't the '=' operator be forced
> > to only accept an lvalue as its left-hand operand?
>
> How about making operator+ return a const matrix?
> Then you won't be able to assign to it.
>
> To tell the truth, I hadn't thought about this issue until this
> question forced me to do so. There are zillions of things like
> string classes out there that say
>
>     extern String operator+(const String&, const String&);
>
> and apparently they really should say
>
>     extern const String operator+(const String&, const String&);

This doesn't seem like the right solution. Consider

    String& addeol(String& s) { s += "\n"; return s; }

    main()
    {
      String a, b; //...
      String c = addeol(a+b);
      //...
    }

which would be illegal if operator+ returned a const String. (Yes,
the form of `addeol' is contrived, but not indefensible.)

Actually, I think the `a+b = c;' issue is more of a curiosity -- an
inherent difference between classes and builtins -- than a real
problem. (There are a couple of other class vs builtin differences
along these lines that I briefly mentioned in my Denver Usenix paper.)
The code is legal and compiles (at least with libg++ Strings), but
results in a temporary being created for (a+b), then modified via the
assignment (=c), but never bound to any symbol, so inaccessible. While
this looks odd, it does do exactly what the programmer specified.

In an unrelated thread, Jerry Schwarz says...

> Indeed, more than discussed. This is essentially the method
> used by the AT&T 1.2 stream package. There are several
> problems with it. Where does the space come from for the string?
> How about all the twiddles on formatting available in stdio?
> (e.g. the case of the alphabetic "digits" in a hex number)
>
> But you don't have to choose.
> It's fairly easy to implement
> the functionality of the above without intermediate strings.
>
> One (among several choices) is
>
>     class decimalString {
>     public:
>         decimalString(int v, int w) : value(v), width(w) { }
>         int value ;
>         int width ;
>     } ;
>
>     ostream& operator<< (ostream& o, decimalString& s)
>     {
>         int f = o.flags();
>         o << dec << setw(s.width) << s.value ;
>         o.setf(f,ios::basefield);
>         return o ;
>     }
>
> There is a philosophical point here. In C the builtin types are
> special. It's perfectly reasonable to have a C I/O library that
> has a lot of formatting stuff for them. In C++ user defined classes
> are just as important as the builtin types. What is important is
> not that there be a lot of formatting stuff for the builtin types,
> but that there be a mechanism for extending the I/O. In C++ it is
> usually much better to determine styles of printing, widths and
> the like based on the role (type) of the data rather than
> specifying it at each individual I/O statement.
>
> In hindsight I think I put too much special stuff in the
> iostream library for the builtin types. Historically, what
> happened was that the builtin type stuff was done first, and
> only much later did I develop the extensibility features
> (such as xalloc).

I see the basic problem here just a little differently. The most
primitive stream output routine for printing strings might go
something like:

    ostream& ostream::put(const char* p)
    {
      while (*p != 0) put(*p++);
      return *this;
    }

This can be problematic if you'd like to have ostream << int do
something like

    char* dec(int i);
    ostream& operator << (ostream& s, int i) { return s.put(dec(i)); }

since, as Jerry notes, you then have to decide how to allocate the
space for the results of dec(). To make dec() reasonably general, you
can't just use a fixed static buffer, or else

    cout << dec(10) << dec(20);

would not work right if, for example, the compiler uses right-to-left
evaluation (which is legal).
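
The static-buffer hazard described above can be reproduced without
depending on evaluation order at all. The following is only a sketch:
dec_static() and demo_static_buffer() are hypothetical names, and the
32-byte buffer size is an assumption. It shows that two results from a
static-buffer formatter cannot both be live at once, because the second
call overwrites the text of the first.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for the dec() discussed in the post: it formats
// into one fixed static buffer and returns a pointer into that buffer.
const char* dec_static(int i) {
    static char buf[32];  // assumed size, large enough for any int
    std::snprintf(buf, sizeof buf, "%d", i);
    return buf;
}

// Holds two results at once, as an expression such as
// `cout << dec(10) << dec(20)` may do when both calls are evaluated
// before either string is printed.
void demo_static_buffer() {
    const char* first = dec_static(10);
    const char* second = dec_static(20);
    assert(first == second);                 // same storage
    assert(std::strcmp(first, "20") == 0);   // the "10" has been lost
}
```

Whichever of the two calls runs last wins, which is why the printed
result depends on the order in which the operands are evaluated.
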
So instead, you might want to get around this by employing your
off-the-shelf String class:

    class String
    {
      char* s;
    public:
      operator const char* () { return s; }
      //... lots of other stuff
    };

and redo dec() as

    String dec(int i);

but now, something even more unfortunate can happen in

    ostream& operator << (ostream& s, int i) { return s.put(dec(i)); }

Since dec() returns a String, but put() wants a char*, the String
operator const char* () conversion is made. However, this too can
fail! The reason has to do with C++ lifetime rules for temporaries:
The temp String returned by dec is `used up' by the char* conversion,
so the compiler is allowed to kill it off *before* entering put(). But
the `conversion' really just returns a pointer into the String, so if
the String is killed off, the pointer is invalid, and things are
broken again. In other words, the char* conversion operator cannot
just return a pointer, it must allocate some space, and copy the
String representation. But where? Back to square one.

Here are some solutions:

1) Make an ostream << String operator, and use it exclusively instead
of char*'s, from the ground up, in ostreams. This is the right
solution in many senses, but is problematic in that it presupposes
that there is a single, best String class out there suitable for all
needs. But there are many good String classes around. Standardizing on
a particular version to serve as the basis for the de facto standard
stream library seems premature.

2) Change the C++ rules about lifetimes for temporaries, so that they,
like `normal' variables, have lifetimes to the end of the enclosing
scope. This solution has merit on other grounds as well, but also
creates some of its own difficulties. Actually, this may be going too
far. The lifetime rules for temporaries say that if a *reference* to a
temp (or any part thereof?) is taken (or any ref-returning member
function is called?), then its lifetime *is* to the end of the
enclosing scope.
The char* conversion *behaves* like a reference, but is not one. I
once proposed that C++ allow the idiom of a char[]& to mean a
reference to a character array. Support of this would solve this (and
other) problems, since one could create a

    char[]& String::chars() { return s /* or whatever */ ; }

call it inside the ostream << int via `return s.put(dec(i).chars())',
and everything would work just right. But no one has ever told me that
they particularly like this idea.

3) Have dec() and friends return freestore allocated space, and
require that programmers manually delete them. Most users wouldn't
like this very much.

4) Use a garbage collection scheme for formatting strings, and/or
Strings in general. This seems to be overkill for the problem at hand.
Strings themselves are very well-behaved lifetime-wise; it's the char*
conversions that raise problems.

5) Create a simple approximation to garbage collection. Set up a pool
of space to be used for miscellaneous conversions, and use it for
dec(), oct(), and so on. Guarantee that the most recent N (some FIXED
number, say, 100) formatting strings will be on hand at any given
time. The pool manager can then reuse the space for old formatting
strings when needed. Both AT&T 1.2 and libg++-1.36.0 use some
variation of this approach. The String const char*() operator may also
copy into this pool. The major drawback is that if programmers
contrive expressions that require more than N live formatting strings,
then they are out of luck.

6) Avoid reliance on generic conversion functions like dec(), and
build special conversion buffers, etc., into the stream classes. AT&T
2.0 streams appear to do something along these lines. As Jerry says,
this puts too much smarts in the stream classes, but is entirely safe.
Unfortunately, it is also not as easily extensible as one might like.
It is awkward (although not impossible) to use this scheme to output,
say, arbitrary-precision Integers or other types in which the user
class, not the stream class, knows how to set things up for
formatting. It also limits generality a bit. Formatting strings are
sometimes needed for other purposes than ostream output.

--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl
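
Solution 5 in the post above (a fixed pool that keeps the most recent N
formatting strings alive) might look roughly like the following. This
is a sketch under stated assumptions, not the AT&T 1.2 or libg++ code:
the names next_slot(), dec_pooled() and demo_pool() are hypothetical,
and the pool is kept at 4 slots of 32 bytes only so that the reuse is
easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Assumed sizes for illustration; the post only says "some FIXED number".
const std::size_t kSlots = 4;
const std::size_t kSlotSize = 32;

// Hands out the slots round-robin, so the kSlots most recent results
// stay intact and anything older is silently reused.
char* next_slot() {
    static char pool[kSlots][kSlotSize];
    static std::size_t next = 0;
    char* slot = pool[next];
    next = (next + 1) % kSlots;
    return slot;
}

// Hypothetical pooled version of dec(): each call formats into a fresh slot.
const char* dec_pooled(int i) {
    char* slot = next_slot();
    std::snprintf(slot, kSlotSize, "%d", i);
    return slot;
}

void demo_pool() {
    const char* a = dec_pooled(10);
    const char* b = dec_pooled(20);
    // Unlike a single static buffer, both results are usable together.
    assert(std::strcmp(a, "10") == 0);
    assert(std::strcmp(b, "20") == 0);
}
```

The drawback named in the post shows up directly: the kSlots-th call
after a given result reuses that result's slot, so an expression that
keeps more than kSlots formatting strings alive reads overwritten text.
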
ark@alice.UUCP (Andrew Koenig) (10/25/89)
In article <DL.89Oct24075027@g.g.oswego.edu>, dl@g.g.oswego.edu (Doug Lea) writes:

> This doesn't seem like the right solution. Consider

>     String& addeol(String& s) { s += "\n"; return s; }

>     main()
>     {
>       String a, b; //...
>       String c = addeol(a+b);
>       //...
>     }

> which would be illegal if operator+ returned a const String. (Yes,
> the form of `addeol' is contrived, but not indefensible.)

I suggest that operator+(const String&, const String&) should
return a const String precisely so that stuff like the example
above will be illegal.

The trouble with the example is that the value of a+b is a temporary
that can be destroyed as soon as addeol() returns. Thus it seems
to me that it should be OK for a compiler to generate code that
looks like this:

    evaluate a+b into a temporary T
    call addeol(T) and save a reference to the result
    destroy T
    copy the saved result of addeol() into c

In this case, the `saved result' of addeol will have been destroyed
before copying it, so c will be garbage.

You might say that this argues that the destruction of the temporary
that holds a+b should be deferred until later. Unfortunately, doing
that doesn't eliminate the problem, it just makes it less likely.

--
                                --Andrew Koenig
                                  ark@europa.att.com
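
The effect of the const return type that Andy suggests can be checked
at compile time with a present-day compiler. The sketch below assumes
C++11 or later and uses two hypothetical empty classes, Plain and
Guarded, in place of a real String; it tests only whether `a+b = c' is
well-formed for each.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class whose operator+ returns a plain (non-const) value.
struct Plain { };
Plain operator+(const Plain&, const Plain&) { return Plain(); }

// Hypothetical class whose operator+ returns a const value.
struct Guarded { };
const Guarded operator+(const Guarded&, const Guarded&) { return Guarded(); }

// The type of the expression a + b for each class.
typedef decltype(Plain() + Plain()) PlainSum;        // Plain
typedef decltype(Guarded() + Guarded()) GuardedSum;  // const Guarded

// True when `(a + b) = c` would compile for the given sum type.
template <class Sum, class T>
bool can_assign_to_sum() {
    return std::is_assignable<Sum, const T&>::value;
}

void demo_const_return() {
    assert((can_assign_to_sum<PlainSum, Plain>()));       // a+b = c accepted
    assert((!can_assign_to_sum<GuardedSum, Guarded>()));  // a+b = c rejected
}
```

Under ISO C++ the addeol(a+b) call is rejected for both classes, since
a temporary cannot bind to a non-const reference parameter; the const
return type only changes the answer for the assignment case.
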
jss@jra.ardent.com (Jerry Schwarz (Compiler)) (10/25/89)
In article <DL.89Oct24075027@g.g.oswego.edu> dl@oswego.edu writes:
>
>6) Avoid reliance on generic conversion functions like dec(), and
>build special conversion buffers, etc., into the stream classes. AT&T
>2.0 streams appear to do something along these lines. As Jerry says,
>this puts too much smarts in the stream classes, but is entirely safe.

My remark was subject to misinterpretation. I'll try to clarify.

The 2.0 iostream classes contain mechanisms (xalloc, bitalloc,
iword, and pword) to support formatting state for user defined
classes. If I were redoing the package I would be inclined to use
that general mechanism to deal with the builtins as well. This would
eliminate all the special stuff for them.

I'm not sure what "special conversion buffers" are. I don't think
the iostream library has anything that is reasonably described with
that phrase.

>Unfortunately, it is also not as easily extensible as one might like.

I'm not sure whether this refers to functionality or the amount of
effort required to write the extension. It does require more coding
than I would like to do some kinds of extensions, but I've achieved
a reasonable functionality in all cases I've encountered.

Jerry Schwarz
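
The xalloc/iword mechanism Jerry describes survives in the standard
library as std::ios_base::xalloc() and iword(), so its use for a user
defined class can be sketched with a current compiler. Everything else
below is an assumption made for illustration: the Money type, the two
manipulators and the index helper are hypothetical, and amounts are
assumed to be non-negative.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, not taken from the thread.
struct Money { long cents; };

// Reserves one per-stream state slot, once for the whole program.
int money_style_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Manipulators that record the desired style in the stream itself,
// so it need not be repeated at each output statement.
std::ostream& whole_units(std::ostream& o) {
    o.iword(money_style_index()) = 1;
    return o;
}
std::ostream& with_cents(std::ostream& o) {
    o.iword(money_style_index()) = 0;  // 0 is also the default slot value
    return o;
}

// The inserter consults the stream's slot to decide how to print.
std::ostream& operator<<(std::ostream& o, const Money& m) {
    if (o.iword(money_style_index()) == 1) {
        return o << m.cents / 100;
    }
    long rest = m.cents % 100;
    return o << m.cents / 100 << '.' << (rest < 10 ? "0" : "") << rest;
}

void demo_money() {
    std::ostringstream out;
    Money m = {1234};
    out << m << ' ' << whole_units << m;
    assert(out.str() == "12.34 12");
}
```

The formatting choice lives in the stream and is keyed by the type's
own slot, which is the sense in which the mechanism ties print style to
the role of the data rather than to each I/O statement.
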
dl@g.g.oswego.edu (Doug Lea) (10/25/89)
I had written...

> >     String& addeol(String& s) { s += "\n"; return s; }
> >
> >     main()
> >     {
> >       String a, b; //...
> >       String c = addeol(a+b);
> >       //...
> >     }
> >
> > which would be illegal if operator+ returned a const String. (Yes,
> > the form of `addeol' is contrived, but not indefensible.)
>

Andy replied...

> The trouble with the example is that the value of a+b is a temporary
> that can be destroyed as soon as addeol() returns. Thus it seems
> to me that it should be OK for a compiler to generate code that
> looks like this:
>
>     evaluate a+b into a temporary T
>     call addeol(T) and save a reference to the result
>     destroy T
>     copy the saved result of addeol() into c
>
> In this case, the `saved result' of addeol will have been destroyed
> before copying it, so c will be garbage.
>
> You might say that this argues that the destruction of the temporary
> that holds a+b should be deferred until later. Unfortunately, doing
> that doesn't eliminate the problem, it just makes it less likely.
>

It's hard to be sure. In my (draft) copy of the 2.0 Reference Manual,
section 12.2, it says

    The compiler must ensure that a temporary object is destroyed.
    There are only two things that can be done with a temporary:
    fetch its value (implicitly copying it) to use in some other
    expression, or bind a reference to it. If the value of a
    temporary is fetched, that temporary is dead and can be destroyed
    immediately. If a reference is bound to the temporary, the
    temporary must not be destroyed until the reference is. This
    destruction must take place before exit from the scope in which
    the temporary is created.

This statement does not explicitly address what happens with multiple
references: addeol makes a ref of the temp holding a+b, and in turn
binds another ref to it (the return value). The `right' thing to do is
to not kill the temp until the return val reference is destroyed
(after construction into c).
Of course, the compiler cannot know this if addeol is not inline or is
an extern, but I assumed that the above rule requires that a compiler
play it safe, and not kill the temp until all references spawned from
expressions involving it die. But I see that your interpretation could
also be right.

As I too weakly implied elsewhere in my last note, I think the temp
rules could be strengthened by generalizing this paragraph via the
simple statement that a temporary (like any normal variable) may be
destroyed only when a compiler can prove that it is no longer useful,
or at the end of the enclosing scope, whichever comes first. This rule
is similar to those used in other languages. The rule requires that if
a temp is involved in any reference-returning member or top-level
function, or a reference is bound to any of its parts, it should live.
It might also be allowed to live if the compiler can prove that it
will be recomputed/reused later in the same block (as may be
discovered via available expression analysis), thus killing off the
recomputation. On the other hand, it might be killed immediately if it
is never used (e.g., the single statement `a+b;'), a possibility
ignored in the above. This restatement would thus cover the current
cases, and also allow the possibility of a smarter compiler doing
smarter things. (I guess I should add that, as we've gone over before,
a rule like this also helps legitimize and extend the current practice
of not generating X(X&)-based temporaries at all in some situations.)

I should emphasize that the form of addeol is NOT one I recommend.

Oh, I should clarify another remark in my last note: OPERAND
evaluation order is undefined in C++ for `cout << dec(10) << dec(20)'.
OPERATOR evaluation is, of course, left-to-right.

--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl
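
For comparison with the draft wording quoted in this thread, ISO C++
later settled on destroying a temporary at the end of the
full-expression that creates it. The sketch below records the order of
events for the put(dec(i)) case under that later rule, assuming C++11
or later. The String class here is a hypothetical minimal stand-in, and
events, put() and print_int() are illustrative names, not the stream
library's.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order in which things happen

// Hypothetical minimal String whose conversion returns a pointer into
// its own representation, as in the post.
class String {
    std::string rep;
public:
    explicit String(const std::string& s) : rep(s) { }
    ~String() { events.push_back("~String"); }
    operator const char*() const { return rep.c_str(); }
};

String dec(int i) { return String(std::to_string(i)); }

// Stand-in for the stream's put(): it reads through the pointer.
void put(const char* p) { events.push_back(std::string("put ") + p); }

void print_int(int i) {
    // The temporary returned by dec(i) is destroyed at the end of this
    // full-expression, that is, only after put() has returned.
    put(dec(i));
    events.push_back("after put");
}
```

After print_int(42), the "put 42" entry is immediately followed by a
"~String" entry, so the pointer handed to put() is still valid while
put() runs. Keeping the pointer beyond the end of that statement would
still be an error.
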