xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (09/07/90)
howell@bert.llnl.gov (Louis Howell) writes:

>xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

>Now if there actually were a standard---IEEE, ANSI, or whatever---then
>the compilers should certainly support it.  Recent comments in this
>newsgroup show, however, that there isn't even a general agreement on
>what a standard should look like.

[see below]

>>[7.0 million lines of code...]

>This is the only one of your arguments that I can really sympathize
>with.  I've never worked directly on a project of anywhere near that
>size.  As a test, however, I just timed the compilation of my own
>current project.  4500 lines of C++ compiled from source to
>executable in 219 seconds on a Sun 4.  Scaling linearly to 7 million
>lines gives 3.41e5 seconds or about 95 hours of serial computer
>time---large, but doable.  Adding in the human time required to
>deal with the inevitable bugs and incompatibilities, it becomes
>clear that switching compilers is a major undertaking that should
>not be undertaken more often than once a year or so.

Yeah, especially since coding effort scales more like the 1.5th power
of lines of code.  It took 200 programmers, plus extra for managers,
18 months to port that code across a compiler update, even with lots
of automated assistance.

>The alternative, though, dealing with a multitude of different
>modules each compiled under slightly different conditions, sounds
>to me like an even greater nightmare.

A misapprehension.  As is typical for a large business, this was
hundreds of independently running programs accessing common, commonly
formatted data [not in shared memory but in a shared database, though
shared memory systems this large do exist in practice, for example in
air traffic control].  It still illustrates the problem with insisting
on porting all of a large software suite at once because of a compiler
change, which was your "solution" to the shared memory situation, the
solution to which I was trying to take exception.

>You can still recompile and test incrementally if you maintain
>separate test suites for each significant module of the code.  If
>the only test is to run a single 7 million line program and see if
>it smokes, your project is doomed from the start. (1/2 :-) )

Yep, every independent program had its own extensive set of regression
tests; hence the ~300 man-years of porting effort.

>Again, most users don't work in this type of environment.  A
>monolithic code should be written in a very stable language to
>minimize revisions.

Here I must disagree.  It is exactly in such huge mulligans of
software that the maintenance cost reductions promised by object
encapsulation offer the greatest rewards.  In fact, as commented
elsewhere here recently, it is only with the advent of such truly
awesome piles of software that the frustrations of the software
engineer have called out most loudly for the "silver bullet" that OOP
is trying to provide.  This _is_ the target for which we should be
specifying our languages, even if present experience with the new
OOPLs is limited to considerably more modest programs as engineers
"get their feet wet" in OOP.

>Hey, I'm a user too!  I do numerical analysis and fluid mechanics.
>What I do want is the best tools available for doing my job.  If
>stability were a big concern I'd work in Fortran---C++ is considered
>pretty radical around here.  I think the present language is a
>big improvement over alternatives, but it still has a way to go.

But what has made FORTRAN so valuable to the (hard) engineering
profession is exactly that the "dusty decks" still run.
I doubt that the originators of FORTRAN envisioned _at_that_time_ a
set of applications software that would outlast the century being
written with the first compilers, but so it has proved.  With the
perspective of history to assist us, we know that stability makes for
a useful language, and we should try to make all the important
decisions for the long-term utility of the objects written today as
early as feasible, not put them off in the interests of granting
"flexibility" to the compiler writer.  Unlike the mid-'50s, today we
have a plethora of highly experienced compiler writers to guide our
projects; we can trust that a lot of the "best" ways, or at least ways
good enough to compete well with the eventual best, are already in
hand.  This doesn't deny the possibility of progress, or even
breakthroughs, nor suggest that either be prevented.  Instead, let's
install the mechanisms now that will let today's objects be the
"dusty decks" of the 2010s, while leaving options to bypass those
mechanisms where some other goal (speed, size, orthogonality) is more
important to a piece of code than stability.

>As a compromise, why don't we add to the language the option of
>specifying every detail of structure layout---placement as well
>as ordering.  This will satisfy users who need low-level control
>over structures, without forcing every user to painfully plot
>out every structure.  Just don't make it the default; most people
>don't need this capability, and instead should be given the best
>machine code the compiler can generate.

And here at last, I think we agree.  Let the compiler writers have a
ball.  Just give me a switch, like the optimization switches now
common, to turn it all off and preserve my own explicit control over
structure layout if that is a real need for my application (or just
for me to go to bed with a warm fuzzy feeling that I need not expect
a 2AM "the system just upchucked your code" call).

[At last answering the first quoted paragraph:]

But, to have control of structure layout as described in this thread
(across time, files, memory space, and comm lines (or some subset --
not worth arguing over)), there needs to be _now_ an agreed standard
for what the specification of the layout of a structure that I write
_means_, bit by bit, byte by byte.  As Ron noted, C "allows" arbitrary
amounts of padding between fields in a structure, but "nobody" does
anything but the sensible single or double word alignment padding.
Let's pick one layout now in use (take the obvious hits if there is a
big endian, little endian conflict), and make it the standard (or,
make the standard such that I can force e.g. "two byte alignment",
"four byte alignment" or whatever, so long as I am consistent about it
among the modules accessing the data; sounds like an object candidate
to me! ;-), and publish it for all compiler writers to implement as a
choice allowed for the user who needs this level of control.  (A
sketch of what such a forced-alignment request might look like follows
this article.)  There has been enough common practice in C structure
layout implementations to observe and adopt some part of it by now.

Again, this is just the standard for what I mean when _I_ take control
of laying out a structure.  If I give that control to the compiler
writer, I'd better make no assumptions at all in my code about the
result, because it is explicitly allowed to be "unstandard", and I
have chosen to write at a high level and delegate those details to
the compiler writer's ingenuity.

Peace?

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
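[For concreteness, a minimal sketch of the kind of forced alignment
being asked for above, using the #pragma pack extension that some
current compilers already provide; it is not part of any standard,
and the structure and field names are purely illustrative:

    /* Request two-byte packing for this structure; every module    */
    /* that shares the data must use the same packing value.        */
    #pragma pack(2)
    struct SharedRecord {
        char  kind;        /* offset 0                              */
        long  serial;      /* offset 2, rather than 4 or 8          */
        short count;       /* offset 6                              */
    };
    #pragma pack()         /* restore the compiler's default        */

Under the hoped-for standard, every conforming compiler given the same
packing request would produce the same offsets, so the structure could
be shared across modules, compiler releases, or a shared memory
segment.]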
howell@bert.llnl.gov (Louis Howell) (09/07/90)
In article <1990Sep6.194543.7685@zorch.SF-Bay.ORG>, xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
|> But what has made FORTRAN so valuable to the (hard) engineering
|> profession is exactly that the "dusty decks" still run.  I doubt
|> that the originators of FORTRAN envisioned _at_that_time_ a set
|> of applications software that would outlast the century being
|> written with the first compilers, but so it has proved.  With the
|> perspective of history to assist us, we know that stability makes
|> for a useful language, and we should try to make all the important
|> decisions for the long-term utility of the objects written today
|> as early as feasible, not put them off in the interests of granting
|> "flexibility" to the compiler writer.  Unlike the mid-'50s,
|> today we have a plethora of highly experienced compiler writers
|> to guide our projects; we can trust that a lot of the "best"
|> ways, or at least ways good enough to compete well with the
|> eventual best, are already in hand.  This doesn't deny the
|> possibility of progress, or even breakthroughs, nor suggest that
|> either be prevented.  Instead, let's install the mechanisms now
|> that will let today's objects be the "dusty decks" of the 2010s,
|> while leaving options to bypass those mechanisms where some other
|> goal (speed, size, orthogonality) is more important to a piece
|> of code than stability.

IMHO, another reason for Fortran's long dominion is that it is (now)
a user language, not a CS language.  It's used by people who know no
other language, and who have no interest in programming beyond its
use as a tool for their specialty.  (How many C programmers know only
C?)  It can be easy to forget that users who know only one language
often strongly resist switching to a second one, while learning your
seventh language is easy.  Computing professionals tend to learn many
languages, and thus these languages go out of fashion more quickly.
C++ may influence future languages, but I very much doubt if it will
be directly compatible with them.  I think the proper role for C++
will be more like that of Algol.

To get dusty decks, you need a user base outside the CS community
that creates hard-to-translate information.  Scientific computing is
the only present non-CS user base that does much of its own
programming (unless you count BASIC or Cobol :-), so I doubt that
another language will rise to monolithic dominion in another field in
the foreseeable future.  It doesn't have to be just programs, though.
You want to know what the "dusty decks" of the 2010s will be?
They'll be data and possibly macro packages written for PC
applications like 1-2-3.  Other possibilities include formatting
languages like TeX, and hard-to-translate graphics languages like
PostScript, but these can't be considered reusable code.

|> a ball.  Just give me a switch, like the optimization switches now
|> common, to turn it all off and preserve my own explicit control
|> over structure layout if that is a real need for my application

The extreme case would give you functionality something like the
Fortran EQUIVALENCE statement.  Create an array of chars, then make
the first member start at char 3, the second member start at char 1,
the third member start at char 4 (overlapping the first member), and
so on.
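[One way to get that effect in the C++ we have today, with no new
language support, is to wrap a char array and do the overlapping
accesses by hand; this is only a sketch, using the offsets from the
example above, and memcpy is used because the odd offsets may not be
directly addressable as larger types on every machine:

    #include <string.h>

    // EQUIVALENCE-style overlap, done by hand over raw storage.
    struct Overlay {
        char raw[16];                    // the underlying storage

        // "member 1": a short starting at char 3
        short getFirst() const  { short v; memcpy(&v, raw + 3, sizeof v); return v; }
        void  setFirst(short v) { memcpy(raw + 3, &v, sizeof v); }

        // "member 2": a short starting at char 1
        short getSecond() const  { short v; memcpy(&v, raw + 1, sizeof v); return v; }
        void  setSecond(short v) { memcpy(raw + 1, &v, sizeof v); }

        // "member 3": a long starting at char 4, overlapping member 1
        long  getThird() const  { long v; memcpy(&v, raw + 4, sizeof v); return v; }
        void  setThird(long v)  { memcpy(raw + 4, &v, sizeof v); }
    };

This gives complete placement control, at the cost of doing by hand
everything the compiler would normally do, which is roughly the trade
the rest of this thread is arguing about.]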
I am a bit reluctant to make class layout rigidly defined unless the
user has direct control over it.  What if we agree to justify on
single-word boundaries, and then some future architecture makes it
efficient to pack classes with no leftover space at all?  We'd be
stuck with a wasteful construct and no way to work around it in the
language.  I think there should be either no control, or more control
than a single switch.  I have no good idea as to what syntax would be
best for this, however.

|> Again, this is just the standard for what I mean when _I_ take
|> control of laying out a structure.  If I give that control to the
|> compiler writer, I'd better make no assumptions at all in my code
|> about the result, because it is explicitly allowed to be
|> "unstandard", and I have chosen to write at a high level and
|> delegate those details to the compiler writer's ingenuity.

Good.  Two levels of functionality, for two kinds of code.

|> Peace?

Agreed.  Unfortunately, the standards committee doesn't give a damn
whether we agree or not, or what we agree on, and even within the
newsgroup we are now most likely in everyone else's kill files.

--
Louis Howell

"A few sums!" retorted Martens, with a trace of his old spirit.  "A
major navigational change, like the one needed to break us away from
the comet and put us on an orbit to Earth, involves about a hundred
thousand separate calculations.  Even the computer needs several
minutes for the job."
jimad@microsoft.UUCP (Jim ADCOCK) (09/12/90)
In article <1990Sep6.194543.7685@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>As Ron noted, C "allows" arbitrary amounts of padding between
>fields in a structure, but "nobody" does anything but the sensible
>single or double word alignment padding.  Let's pick one layout
>now in use (take the obvious hits if there is a big endian, little
>endian conflict), and make it the standard (or, make the standard
>such that I can force e.g. "two byte alignment", "four byte alignment"
>or whatever, so long as I am consistent about it among the modules
>accessing the data; sounds like an object candidate to me! ;-), and
>publish it for all compiler writers to implement as a choice allowed
>for the user who needs this level of control.

Let's assume for the moment we're just talking "C" structures, and
leave "C++" issues out of it.  Even with just good old "C" structures
you run into the following problems:

Big endian, little endian, or something in between?  There are more
than two flavors of byte ordering -- two ways to pack two bytes into
a short, but 24 ways to pack four bytes into a long.  And yes, there
are machines using some of these non-endian approaches.

Double byte or quad byte alignment?  Some machines "only" support
double byte alignment.  Some "only" support quad byte alignment.
[Some only support eight-byte alignment.]  A compiler can't
practically offer you a choice if the underlying hardware doesn't
support it.

Bit field ordering.  Pack from the low end, or the high end?  Signed
or not?  32-bit max, or 16-bit max?  32-bit subfield max, or 16-bit
subfield max?  Pack over a double byte boundary, or break?

Signed chars, or not?

16-bit ints, or 32-bit ints?

0:16, 16:16, 16:16+16, 0:32, 0:32+16, 16:32, 16:32+16, 0:48, or 0:64
bit ptrs?  Or what combination thereof allowed in structures?

....So my claim is that there is no format that can be agreed upon
that will run on a large percentage of the machines with acceptable
performance.  -- And this is just with issues of "C" structures -- no
inheritance issues, no vtable or vbase issues, no member ptrs, no
calling convention issues, etc., that arise in C++.  Ellis and
Stroustrup do a good job of pointing out a few of the implementation
choices that can be made, although they barely scratch the surface.

I counterpropose that programmers program so as to avoid the need for
bitwise equivalence across machines or compilers.  Because in simple
reality, it's never going to exist.
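[To make the combinatorics concrete, here is one small declaration and
a few of the layouts different conforming compilers could legitimately
give it; the numbers are illustrative, not measurements of any
particular compiler:

    #include <stddef.h>     /* for offsetof */

    struct Sample {
        char         tag;
        long         value;
        unsigned int flags : 3;     /* a small bit field */
    };

    /* Byte alignment, 16-bit bit-field units:   sizeof == 7,
       offsetof(struct Sample, value) == 1, flags in the low bits.  */
    /* Long alignment, 32-bit bit-field units:   sizeof == 12,
       offsetof(struct Sample, value) == 4, flags in the low bits.  */
    /* Same sizes, bit fields packed from the high end:  flags
       occupies the top three bits of its unit instead.             */

Every one of those layouts is a reasonable engineering choice for some
machine, which is the point.]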
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (09/12/90)
[Context omitted due to size; please review last referenced article]

Without going to the trouble to filter through and analyze that list
step by step: you have enormously complicated the question by
introducing many non-pertinent items.

If the goal is to share structures in a shared memory architecture,
between two vendors' compilers, or two generations of one vendor's
compilers, then none of the questions of endedness, bit ordering, or
efficient bit path width for fetches and stores will vary, nor need
they be considered when laying out a standard for structure packing.
Those problems exist _independently_ of any compiler; they are at the
hardware level, and so a shared memory system among processors which
disagree on such questions would very likely not be programmable in a
compiled language in any case, and so would be either assembly
programmed or, better, never built.

If the goal is to share structures from an offline store across many
generations of a vendor's compiler, then again either the hardware
bit, byte, and word access strategies are constant, and that level of
detail need not be considered, or one is forced to pay the major
headache of doing a bit by bit translation of a data base when
changing machine architectures.  (I've taken a 36 bit machine's 50,000
reels of mag tape data to a 32 bit machine; I'm intimately familiar
with the problems involved here; I wrote the record reformatter.)
Again, this is a problem that exists independently of compilers at
all, and so that level of detail is not needed.

If the goal is to share structures across a communications network,
then either smart translators rearrange the "endedness" or bit
ordering of the data, or not, depending on the correspondence between
the architectures at each end; but when the structures are handed to
the non-translator software, the standard at the appropriate level of
detail still provides unchanged access to the data, since its natural
units are still packed the same way, albeit they may be reordered
appropriately to the architecture within (not among) themselves.
Again, the question of which end of the word arrives first across the
communications channel is independent of any compiler, and the
translator, almost certainly written in assembly language, does not
see the data in the same structured way that a C++ object is viewed.
The same thing happens if the data structures are to be shared
between two architectures via an offline medium.  While one level of
translation to take into account architectural differences may be
needed, at the level of packing those differences are not so
important; when trying to decide whether to pack two byte entities on
two or four byte boundaries, it matters not a whit whether they have
their most significant bits/bytes first or last.

On the question of pointers, it is well known that pointers do not
port offline, so structures containing pointers will _always_ be
revised to contain counts or other ways of indicating how a linked
structure should be rebuilt.  Again, this translation must happen
independently of even whether the ultimate consumer of the data was
compiled or assembled, and is independent of the question of
structure layout standards.  For internal consumption (finessing for
the moment the question of near-far architectural bogosities), the
shape of a pointer is constant, and so the question again does not
cause any difficulties with defining a structure layout standard.
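[An illustration of the usual pointer fix; the structure and function
names are made up for the example.  The offline form carries a count
and the payloads in order, and the pointer chain is rebuilt on the
way back in:

    struct Node { long value; Node *next; };

    /* Offline form: out[0] holds the count, out[1..count] the values.
       The caller must supply room for max values plus the count.     */
    long flatten(const Node *head, long *out, long max)
    {
        long n = 0;
        for (const Node *p = head; p != 0 && n < max; p = p->next)
            out[++n] = p->value;        /* payloads follow the count   */
        out[0] = n;                     /* the count replaces the chain */
        return n;
    }

No pointer value ever reaches the offline medium, so nothing about
pointer shape matters to the stored layout.]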
The question of whether to pack bits across various byte boundaries
does need to be considered, but so long as the software at the C++
programmer's level sees the results as consecutive bits, as mandated
by the standard, it need not be terribly interesting how the hardware
sees them.  I'm not sure I'm comfortable with allowing an arithmetic
shift as a way to access those bits; I think that needs to be buried;
but I'm quite comfortable with saying "pack bitfield m after bitfield
n in the third int16; now assign this value to bitfield m" and having
the compiler cause the architecture to do what I say.  (A small
sketch of the idea follows this article.)  What I gain by knowing
this is the ability to move the larger (byte, int16, int32) unit, in
which I have packed a set of bitfields, as an entity, for efficiency,
knowing for certain which bitfields I just moved, and knowing that it
was efficient in fact and not a gather-scatter operation going on
where I cannot see it happening except in an assembly dump of my
code.

I apologize if this is not expressed terribly clearly; I am not
tremendously expert in these areas.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
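[Roughly the guarantee being asked for, written as a union; this is
only a sketch, since whether today's compilers make the bits of the
field struct coincide with "whole", and in what order, is exactly the
implementation-defined behaviour the poster wants pinned down:

    union PackedWord {
        struct {
            unsigned short n     : 5;   /* bitfield n first             */
            unsigned short m     : 7;   /* bitfield m packed after n    */
            unsigned short spare : 4;   /* fills out the 16-bit unit    */
        } bits;
        unsigned short whole;           /* the same 16 bits as one unit */
    };

    /* With a pinned-down layout, copying "whole" moves exactly the
       fields n, m, and spare, in one 16-bit store, with no hidden
       gather-scatter behind the programmer's back.                    */
    void copyWord(PackedWord *dst, const PackedWord *src)
    {
        dst->whole = src->whole;
    }
]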
bobatk@microsoft.UUCP (Bob ATKINSON) (09/14/90)
Kent Paul Dolan writes:
>If the goal is to share structures in a shared memory
>architecture, between two vendors' compilers, or two generations
>of one vendor's compilers, then none of the questions of
>endedness, bit ordering, or efficient bit path width for fetches
>and stores will vary, nor need they be considered when laying out
>a standard for structure packing.  Those problems exist
>_independently_ of any compiler; they are at the hardware level,

My understanding was, for instance, that it would be quite reasonable
for one C or C++ compiler to implement signed bit fields, and another
unsigned bit fields.  A similar situation applies to the grouping
chunk size of bit fields.  A choice of processor may suggest certain
sizes, but it by no means pins these things down precisely.  Jim
correctly identifies these as compiler issues, not hardware ones.

>and so a shared memory system among processors which disagree on
>such questions would very likely not be programmable in a compiled
>language in any case, and so would be either assembly programmed
>or, better, never built.

Correcting for your mistake in attributing these characteristics to
hardware when in fact they are compiler issues, your argument by
analogy would say that there is no point trying to share memory
structures between programs written with two different compilers.
Such a practice clearly has things to watch out for, but is
frequently done.  A similar sentence would apply to heterogeneous
multi-processors.

>Kent, the man from xanth.

	Bob Atkinson
	Microsoft
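[A concrete case of the signedness point; the struct is just an
example.  Both C and C++ leave it to the implementation whether a
"plain" int bit field is signed or unsigned, so on a typical two's
complement machine the same stored bits read back differently under
two conforming compilers, even though the hardware is identical:

    struct Flags { int mode : 3; };   /* "plain" int bit field */

    int readBack()
    {
        Flags f;
        f.mode = 7;       /* typically stores the bit pattern 111  */
        return f.mode;    /* -1 if plain bit fields are signed,    */
                          /*  7 if they are unsigned               */
    }
]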
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (09/14/90)
bobatk@microsoft.UUCP (Bob ATKINSON) writes:
>Kent Paul Dolan writes:
>
>>If the goal is to share structures in a shared memory
>>architecture, between two vendors' compilers, or two generations
>>of one vendor's compilers, then none of the questions of
>>endedness, bit ordering, or efficient bit path width for fetches
>>and stores will vary, nor need they be considered when laying out
>>a standard for structure packing.  Those problems exist
>>_independently_ of any compiler; they are at the hardware level,
>
>My understanding was, for instance, that it would be quite reasonable
>for one C or C++ compiler to implement signed bit fields, and another
>unsigned bit fields.  A similar situation applies to the grouping
>chunk size of bit fields.  A choice of processor may suggest
>certain sizes, but it by no means pins these things down precisely.
>Jim correctly identifies these as compiler issues, not hardware
>ones.

I realize I'm being incredibly obtuse here, but "huh"?  If I want to
pack four four-bit fields A, B, C, D and eight two-bit fields
p, q, r, s, t, u, v, w, in the order A, p, B, q, C in the first
two-byte int and r, s, t, D, u, v, w in the second two-byte int, for
my own malicious reasons, what does it matter whether the bit fields
are signed or not, for heaven's sake?  (The layout I have in mind is
sketched after this article.)

Help me out here.  I see lots of complications being introduced in
the name of not solving this problem that don't seem to have anything
to do with it.  What am I missing?

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
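[For reference, that layout written as C++ bit fields; a sketch only,
since it assumes the compiler allocates bit fields into 16-bit units,
in declaration order, with no padding between them -- precisely the
kind of assumption the proposed standard would make dependable:

    struct TwoWords {
        /* first two-byte int: A, p, B, q, C   (4+2+4+2+4 = 16 bits)  */
        unsigned short A : 4, p : 2, B : 4, q : 2, C : 4;
        /* second two-byte int: r, s, t, D, u, v, w
           (2+2+2+4+2+2+2 = 16 bits)                                   */
        unsigned short r : 2, s : 2, t : 2, D : 4, u : 2, v : 2, w : 2;
    };

    /* Whether a field is signed changes what value a read of, say, D
       yields when its top bit is set; it does not move a single bit
       of the layout, which is the point of the question above.       */
]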