[comp.std.c++] design by committee

mjv@objects.mv.com (Michael J. Vilot) (11/22/90)

Henry Spencer ``hoisted a storm warning'' about the dangers of standards
committees who invent language features, particularly when they ignore the
experience gained through ``prior art.''  I agree with him about the potential
danger.  However, as Bjarne pointed out, there is little evidence of it in the
current membership of the X3J16 committee (OK, there is some, but they're in
the minority ;-).

I'd like to contribute a couple of thoughts.  First, the example of the `noalias'
episode of X3J11 was indelibly impressed upon the members of X3J16 who attended
the March meeting.  The desire to avoid a repetition is high on the list of
reasons I've heard cited for resisting gratuitous inventions.  On the other
hand, there seems to be sincere desire to do better than trigraphs as a way to
satisfy the legitimate needs of national character sets.

Second, we face a difficult situation when specifying the components of the C++
standard library.  Most of us, as C++ users, have used AT&T's `cfront' or a
derivative.  That means that `streams' (1.2), `iostreams' (2.0), `complex' and
(in some cases) `tasks' constitute the bulk of ``prior art'' in the area of a
standard library for C++ (I consider InterViews, NIHCL, libg++, and others as
libraries distinct from the standard -- which might be an interesting thread).

The availability of templates and exceptions has a substantial impact on how I
design libraries in C++.  I would hope that the library portion of the C++ 
standard would make the best use of the language.  Yet we have little ``prior
art'' in libraries using these features -- particularly the I/O classes.  On the
other hand, we should not gratuitously invalidate the existing C++ code using
streams.

It's a difficult design challenge -- I hope you will contribute your thoughts.

--
Mike Vilot,  ObjectWare Inc, Nashua NH
mjv@objects.mv.com  (UUCP:  ...!decvax!zinn!objects!mjv)

henry@zoo.toronto.edu (Henry Spencer) (11/24/90)

In article <1016@zinn.MV.COM> mjv@objects.mv.com (Michael J. Vilot) writes:
>...there seems to be sincere desire to do better than trigraphs as a way to
>satisfy the legitimate needs of national character sets.

I would have hoped that X3J16 would not be re-hashing all the dumb ideas
that X3J11 carefully considered and carefully rejected for good reasons.
However, given that Bjarne was one of the handful of people pushing this
specific dumb idea, I suppose I should have expected it...

The right answer to national character sets is ISO Latin 1 or equivalent,
not ridiculous contortions in language syntax that *every* compiler
*everywhere* then has to be able to parse.  Trigraphs were a mistake.
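For reference, ANSI C defines nine trigraphs, each "??" followed by one more character, and translation phase 1 replaces them before anything else happens. Below is a minimal sketch of that mapping in C. The function name trigraph_replacement is hypothetical, chosen for illustration only.

```c
/* Maps the third character of a "??x" sequence to the character the
   trigraph stands for in ANSI C, or returns 0 if "??x" is not a trigraph.
   (trigraph_replacement is a hypothetical name for illustration.) */
static char trigraph_replacement(char third)
{
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;   /* any other "??x" is left alone */
    }
}
```

So a source line written as ??=include <stdio.h> reaches the preprocessor as #include <stdio.h>, and every compiler has to be prepared to do that rewrite.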

Remind me to submit a proposal to X3J16 to change C++ so that it can be
typed using only the intersection of a Model 26 keypunch and an ASR-33.

>The availability of templates and exceptions has a substantial impact on how I
>design libraries in C++.  I would hope that the library portion of the C++ 
>standard would make the best use of the language.  Yet we have little ``prior
>art'' in libraries using these features ...

Hmm.  Now that is a sticky problem.  I fear the obvious answer is to try to
produce upward-compatible extensions, so that existing code works but newer
code can take advantage of the new facilities.  Awkward.  See what you get
when you start adding language features? :-) :-) :-)

This sort of thing actually did come up a little bit in the C library, for
example in the type of the parameter to ctime().  X3J11 opted not to mess
with historical practice.  But they weren't facing a problem anywhere
near the size of this one.
-- 
"I'm not sure it's possible            | Henry Spencer at U of Toronto Zoology
to explain how X works."               |  henry@zoo.toronto.edu   utzoo!henry

domo@tsa.co.uk (Dominic Dunlop) (11/26/90)

In article <1990Nov23.211727.2802@zoo.toronto.edu> henry@zoo.toronto.edu
(Henry Spencer) writes:
> In article <1016@zinn.MV.COM> mjv@objects.mv.com (Michael J. Vilot) writes:
> >...there seems to be sincere desire to do better than trigraphs as a way to
> >satisfy the legitimate needs of national character sets.
> 
> I would have hoped that X3J16 would not be re-hashing all the dumb ideas
> that X3J11 carefully considered and carefully rejected for good reasons.
> However, given that Bjarne was one of the handful of people pushing this
> specific dumb idea, I suppose I should have expected it...
> 
> The right answer to national character sets is ISO Latin 1 or equivalent,
> not ridiculous contortions in language syntax that *every* compiler
> *everywhere* then has to be able to parse.  Trigraphs were a mistake.

Yes.  Strange, isn't it, that the Danes are so in love with seven-bit
character sets?  We've seen it in C, we're seeing it in POSIX.  Looks as
though it's making trouble in C++.

I'm sorry if that sounds like an ethnic slur, but, as Henry says,
equipment which talks using an 8-bit character set such as ISO Latin 1
is an obvious (minimum) requirement for program development.  Any
software shop which shackles its staff to inadequate hardware deserves
every bit of productivity that it fails to get out of them.  Those who
suggest that the standards community should spend its time on
grandfathering in support for coded character sets already superseded
because of their clear inadequacies stand in grave danger of fashioning
standards for the past, not for the future.  While there might have
been a case for doing this with C, a language which, because of the
time of its development, had a number of dependencies on a particular
seven-bit coded character set (ASCII), it seems to me to be
counter-productive to expend much effort on providing support for
variants of that old character set in a new language -- C++.


I hope that was clear.  Now let me muddy it a bit.  While I can see no
reason for the development of C++ software to be carried out on
inadequate hardware, it may be that the resulting programs have to
support an installed base of inadequate hardware.  Such is life.  I'm
talking about cross-development tools running on new hardware, but
churning out binary code for old.  No reason why the problem
shouldn't be solved that way.  Hell, it's not a new idea.  (Whereas
trigraphs were -- and a poor one, at that.)  How long has COBOL had an
environment division?  Not that anybody uses it much, I'll grant you.
But then, like trigraphs, maybe it just seemed a good idea at the time...
-- 
Dominic Dunlop

tom@ssd.csd.harris.com (Tom Horsley) (11/27/90)

domo> I'm sorry if that sounds like an ethnic slur, but, as Henry says,
domo> equipment which talks using an 8-bit character set such as ISO Latin 1
domo> is an obvious (minimum) requirement for program development.  Any
domo> software shop which shackles its staff to inadequate hardware deserves
domo> every bit of productivity that it fails to get out of them.

I dunno... Maybe trigraphs were a good idea, they are not too hard to
implement in a compiler, but they are absolutely *miserable* to use.  Maybe
the idea was to make using them hurt so much that people would upgrade their
obsolete systems? :-):-):-)
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+

steve@taumet.com (Stephen Clamage) (12/02/90)

tom@ssd.csd.harris.com (Tom Horsley) writes:

>Maybe trigraphs were a good idea, they are not too hard to
>implement in a compiler, but they are absolutely *miserable* to use.

Trigraphs are not all that easy to implement efficiently, either; they
really do slow down the compiler.  The scanner, which dominates compiler
front-end time, needs 3-character lookahead, and trigraphs also complicate
the interpretation of end-of-line when finding the ends of macros.  Our
original straightforward implementation of trigraphs
caused a 15% slowdown of the compiler front end.  We spent quite a bit
of time finding an efficient way to handle them, and reduced the
overhead to about 5%.  Please note this affects every program ever
compiled, even ones which contain no trigraphs.
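To make the lookahead cost concrete, here is a minimal sketch of one "straightforward" approach: a separate pass that copies the whole source and replaces trigraphs, peeking up to two characters ahead at every '?'. This is an assumption about what a naive implementation might look like, not TauMetric's actual code; the names map_trigraph and replace_trigraphs are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: maps the third character of "??x" to its
   replacement, or returns 0 if "??x" is not one of the nine trigraphs. */
static char map_trigraph(char c)
{
    switch (c) {
    case '=':  return '#';  case '(':  return '[';  case '/':  return '\\';
    case ')':  return ']';  case '\'': return '^';  case '<':  return '{';
    case '!':  return '|';  case '>':  return '}';  case '-':  return '~';
    default:   return 0;
    }
}

/* Naive separate pass: copies the NUL-terminated string src to dst,
   replacing trigraphs. Every character is touched and copied, and each
   '?' forces a look at the next two characters, whether or not the
   source contains any trigraphs. dst needs room for strlen(src) + 1
   bytes. Returns the length of the result. */
static size_t replace_trigraphs(const char *src, char *dst)
{
    size_t out = 0;
    while (*src != '\0') {
        char r;
        /* src[1] and src[2] are only read while the previous byte was
           not '\0', so this never runs past the terminator. */
        if (src[0] == '?' && src[1] == '?' && (r = map_trigraph(src[2])) != 0) {
            dst[out++] = r;
            src += 3;
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

When writing test strings for code like this, spell "??" as "?\?" so that a compiler running with trigraphs enabled does not translate them itself. Note also that "???=" correctly becomes "?#": the first '?' is copied and the remaining "??=" is the trigraph.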
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

tom@ssd.csd.harris.com (Tom Horsley) (12/02/90)

>>>>> Regarding Re: design by committee (was: templates and exceptions in g++?); steve@taumet.com (Stephen Clamage) adds:

steve> Our original straightforward implementation of trigraphs
steve> caused a 15% slowdown of the compiler front end.  We spent quite a bit
steve> of time finding an efficient way to handle them, and reduced the
steve> overhead to about 5%.  Please note this affects every program ever
steve> compiled, even ones which contain no trigraphs.

I don't want to sound too insulting here, but I would say you have a seriously
flawed design. I worked on an ANSI C scanner as a sort of academic exercise
while trying to fully understand the way the macro processor works, and my
scanner has no additional overhead to speak of even if you do use trigraphs.

The key to making this work fast is recognizing that you have to examine
each character in the buffer to classify it as you go along anyway. I used a
<ctype.h>-like array that marked "interesting" characters and embedded the
check in a getc()-like macro. The macro normally returns the next character
using inline code, but if an interesting character shows up it calls a
subroutine to do additional processing.  A '\0' character is interesting
because I might have to refill the buffer. A '\\' character is interesting
because it might be followed by a newline, and both of them will have to be
squeezed out (remember that a backslash followed by a newline has always
been a special sequence you had to check for, even before question-mark
question-mark came along; the overhead for trigraphs is no worse than
this).  With trigraphs, '?' is now also an interesting character.  Sticking
an extra check for the ?? trigraph sequence in the subroutine that is only
invoked when an interesting character comes along does not cost that much
extra (unless you have a LOT of question marks in your source code). The
tricky part is making sure you go ahead and fill the buffer if you are
within 4 characters of the end, and handling the case of a line terminated
by ??/ followed by a newline.
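The classification-table idea can be sketched roughly as follows. This is a hypothetical reconstruction, not the poster's code: the macro name GetNextCharacter is borrowed from this post, while struct scanner, interesting, raw_char and slow_next are invented for illustration. Buffer refilling is left out (the whole source is one NUL-terminated string), so the "within 4 characters of the end" case does not arise here.

```c
#include <stddef.h>

/* Hypothetical scanner state: a NUL-terminated source buffer and a
   cursor. Buffer refilling is left out, so '\0' just means end of input. */
struct scanner {
    const char *buf;
    size_t pos;
};

/* <ctype.h>-style table: nonzero for characters that need more than a
   plain copy. Everything else takes the inline fast path. */
static unsigned char interesting[256];

static void init_interesting(void)
{
    interesting[(unsigned char)'\0'] = 1; /* end of buffer: refill (or EOF here) */
    interesting[(unsigned char)'\\'] = 1; /* may start a backslash-newline */
    interesting[(unsigned char)'?']  = 1; /* may start a trigraph */
}

/* Third character of a trigraph -> replacement, or 0 if not a trigraph. */
static char map_trigraph(char c)
{
    switch (c) {
    case '=':  return '#';  case '(':  return '[';  case '/':  return '\\';
    case ')':  return ']';  case '\'': return '^';  case '<':  return '{';
    case '!':  return '|';  case '>':  return '}';  case '-':  return '~';
    default:   return 0;
    }
}

/* Phase-1 character at index `at` (trigraph replaced); *len receives the
   number of raw source characters it occupies (1 or 3). */
static char raw_char(const struct scanner *s, size_t at, size_t *len)
{
    const char *p = s->buf + at;
    char r;
    if (p[0] == '?' && p[1] == '?' && (r = map_trigraph(p[2])) != 0) {
        *len = 3;
        return r;
    }
    *len = 1;
    return p[0];
}

/* Slow path, entered only when the next raw character is interesting.
   Handles trigraphs, backslash-newline (including a trigraph backslash
   before a newline) and end of input. Returns the next logical
   character, or -1 at end of input. */
static int slow_next(struct scanner *s)
{
    for (;;) {
        size_t len, len2;
        char c = raw_char(s, s->pos, &len);
        if (c == '\0')
            return -1;
        if (c == '\\' && raw_char(s, s->pos + len, &len2) == '\n') {
            s->pos += len + len2;   /* splice: drop backslash and newline */
            continue;
        }
        s->pos += len;
        return (unsigned char)c;
    }
}

/* getc()-like fast path: an ordinary character costs one table lookup
   and no call. Evaluates s more than once, so pass a plain pointer. */
#define GetNextCharacter(s)                          \
    (interesting[(unsigned char)(s)->buf[(s)->pos]]  \
         ? slow_next(s)                              \
         : (int)(unsigned char)(s)->buf[(s)->pos++])
```

The design point is that the trigraph check lives entirely in slow_next, which ordinary letters, digits and punctuation never reach; only '?', '\\' and '\0' pay for a call.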

When I do find something like a trigraph or a \ newline, I squeeze it out
and replace it with what really belongs there. The routine knows where the
current token starts in the buffer, so it just shifts the token's text right
to take up the slack, then it returns the proper character and scanning
continues normally. This allows me to handle the phases of translation that
process trigraphs and backslash-newlines transparently in the GetNextCharacter
macro while I am also busting up the source into tokens.  I can also leave
the tokens in the input buffer without wasting time copying them around,
unless I have to do something like squeeze out a trigraph.
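A minimal sketch of that in-place squeeze, assuming a writable buffer in which buf[tok..pos) is the part of the current token scanned so far and buf[pos..] begins with either a trigraph or a backslash-newline. The helper name squeeze and its parameters are hypothetical; the post describes the idea but not an interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. buf[tok..pos) holds the part of the current token
   already scanned; buf[pos..pos+raw_len) is a trigraph (raw_len 3) or a
   backslash-newline (raw_len 2). The raw sequence is replaced by repl,
   or by nothing if repl < 0, by shifting the token prefix right so the
   token stays contiguous and ends at pos + raw_len. Returns the new
   token start; scanning resumes at pos + raw_len. */
static size_t squeeze(char *buf, size_t tok, size_t pos, size_t raw_len, int repl)
{
    size_t keep = (repl >= 0) ? 1 : 0;   /* characters that replace the raw sequence */
    size_t slack = raw_len - keep;       /* bytes freed up by the replacement */

    /* Move the token text scanned so far right over the freed bytes;
       memmove is required because the regions overlap. */
    memmove(buf + tok + slack, buf + tok, pos - tok);
    if (keep)
        buf[pos + slack] = (char)repl;   /* lands just before the resume point */
    return tok + slack;
}
```

The only copying is a memmove of the current token's prefix, and only when a trigraph or backslash-newline actually occurs, so ordinary tokens can be used directly where they sit in the input buffer.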
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+