wilson@carcoar.Stanford.EDU (Paul Wilson) (03/11/90)
I've gotten a lot of mail about *how* to do tagged immediate floats, but also some asking *why* it's important. A couple of people thought that a generational gc should *alleviate* the problem of heap-allocated floats considerably; they wanted to know why I thought the opposite.

It turns out that heap-allocated floats can be a BIG problem for a generational gc because they can violate the assumptions that the generational strategy depends on. Generational gc's assume that most objects are short-lived, and that few pointers are created that point from an object in an older generation to an object in a younger generation. Heap-allocating float values can violate this assumption because a lot of unnecessary objects are created *and* pointers to those objects are often stored into older objects.

Consider the case of updating every element of a million-element (4-megabyte) array. (Perhaps you're incrementing each element by the corresponding element of another array, or whatever.) In this case, you allocate several words on the heap for each element of your four-megabyte array -- probably 8 megabytes or so. This will cause your youngest generation to fill and be scavenged several times, unless it is huge. Not only will you perform several extra scavenges, but all of the objects you created will survive those scavenges, because they are pointed to from the array in older memory. That means you not only allocate them, you also copy them at least once, to get them out of the youngest generation. So for each float value you update, you spend a handful of instructions to heap-allocate it and at least another handful to copy what you allocated.

And things can get worse. A generational gc must keep track of all pointers from older generations into younger ones. (This is necessary to ensure that all live objects in the younger generations can be found at scavenge time, and all pointers to them can be updated.) If you use the simple strategy of keeping a list ("remembered set") of all of the objects you've stored such pointers into, you can run into trouble. In the case of updating our million-element array, we'll have to scan the whole array at every scavenge that the extra allocation forces. That means scanning four megabytes every time we allocate our way through the youngest generation (probably a quarter or a half megabyte of allocation). This not only increases processing overhead, but is likely to cause thrashing.

Tektronix Smalltalk's Large Object Space alleviates this problem for many programs by managing large objects specially. (See Caudill & Wirfs-Brock's OOPSLA 86 paper and Ungar and Jackson's OOPSLA 88 paper.) My gc is also less sensitive to this sort of thing because it keeps dirty bits for small fixed-size areas of memory ("cards") rather than tracking objects. (See my OOPSLA 89 paper.)

But neither of these tricks always works. The large object space doesn't work if you have a large data structure made up of small objects, which gets promoted into an older generation and then has many of its objects updated. (This is a variant of the "pig-in-the-snake" problem, to use Jon L White's term.) And in a really bad case, card marking may not help either -- your cards may not be smaller than your objects. So really bad locality of writes can be bad news.
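A minimal Common Lisp sketch of the array-update scenario (the names here are made up, and exactly how much boxing happens is implementation-dependent, but double-floats stored into a general vector are typically boxed):

  ;; A million-element general vector of double-floats.  Assume it has
  ;; been around long enough to be promoted into an older generation.
  (defparameter *samples*
    (make-array 1000000 :initial-element 0.0d0))

  ;; Updating every element typically conses a fresh boxed double-float
  ;; for each slot -- several words apiece, megabytes in total -- and
  ;; stores a pointer to each new box back into the old-generation
  ;; vector, so the boxes also survive scavenges of the youngest
  ;; generation.
  (defun bump-all (v delta)
    (dotimes (i (length v) v)
      (setf (aref v i) (+ (aref v i) delta))))

  ;; (bump-all *samples* 1.0d0)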
Note that a smart compiler, which keeps intermediate floating-point values in registers or on a special stack (a la Maclisp or S-1 Lisp), will *not* help a lot with this problem. The floats that cause the problem are not the intermediate values in an expression; they're the results that get stored back into the heap.

The upshot is that it's preferable to use immediate values when you can, rather than creating heap objects that may violate the assumptions of the gc. If you can update a float value in place, do it. This seems to me particularly important because float-intensive programs often *do* use lots of floats in *big* data structures.

One tried-and-true way to punt on this at the language level is to provide primitive homogeneous arrays (e.g., array-of-double). But that requires that the user do un-Lispy or un-Smalltalky things -- writing FORTRAN-like programs. And it won't work for arbitrary non-array objects that just happen to have a float in some slot. If a user uses a float just because something is conceptually a real number, but doesn't need really high precision and fast math, it shouldn't choke the gc.

So my philosophy is to provide short immediate floats for such "casual" purposes. The issue then becomes whether to sacrifice a little precision to make room for a tag, or to sacrifice a little speed to allow compacting most of them without loss of precision.

   -- Paul

Paul R. Wilson                        Software Systems Laboratory
lab ph.: (312) 996-9216               U. of Illin. at C. EECS Dept. (M/C 154)
wilson@bert.eecs.uic.edu              Box 4348, Chicago, IL 60680
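For comparison, a minimal Common Lisp sketch of the homogeneous-array approach mentioned above (the names are made up; a specialized double-float array stores its elements unboxed, though how much boxing a given compiler avoids inside the loop varies by implementation):

  ;; A specialized array stores its elements unboxed in the array's own
  ;; storage, so updating in place creates no per-element heap cells for
  ;; the gc to copy and no old->young pointers to remember.
  (defparameter *unboxed-samples*
    (make-array 1000000 :element-type 'double-float :initial-element 0.0d0))

  (defun bump-all-unboxed (v delta)
    (declare (type (simple-array double-float (*)) v)
             (type double-float delta))
    (dotimes (i (length v) v)
      (setf (aref v i) (+ (aref v i) delta))))

  ;; (bump-all-unboxed *unboxed-samples* 1.0d0)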
chao@sunflower.crd.ge.com (William Chao) (03/19/90)
In the paper "Basic Polymorphic typechecking," by Luca Cardelli, Science of Computer Programming 8 (1987) pp. 147-172. On page 150: ----------------------------------------------------------------------- Polymorphism in language comes from the interaction of two contrasting programming language design goals: static typing and reusability. : : Polymorphic type systems try to reconcile these two goals by providing all the safety of statically typed languages, and most (but not all) the flexibility of untyped languages. ----------------------------------------------------------------------- My 1st question: Why he says "but not all?" 2nd question: It seems that the "full" flexibility of untyped languages will never be achieved by static typing, then static binding. Does this imply that we got to have dynamic binding to accomplish the full reusability? 3rd question: People (especially real-time systems) are against dynamic binding because it is too slow. Some people use the same reason (too slow) to against object-oriented programming. Can we do somethings about this? I will appreciate whoever in this news group gives me the answer.
lgm@cbnewsc.ATT.COM (lawrence.g.mayka) (03/19/90)
In article <6143@crdgw1.crd.ge.com> chao@sunflower.crd.ge.com (William Chao) writes:
>In the paper "Basic Polymorphic Typechecking," by Luca Cardelli,
>Science of Computer Programming 8 (1987), pp. 147-172, on page 150:
>-----------------------------------------------------------------------
>Polymorphic type systems try to reconcile these two goals
>by providing all the safety of statically typed languages,
>and most (but not all) the flexibility of untyped languages.
>-----------------------------------------------------------------------
>My 1st question: Why does he say "but not all"?

Languages without mandatory compile-time typing, such as Common Lisp, are able to manipulate information elements of heterogeneous, unrelated types in a uniform manner. A Common Lisp list, for example, can contain elements of completely different types - an integer, a complex number, a function, another list, etc. - and operate on them all in a uniform fashion. All types are reusable to the programmer of the list, and the list is reusable to writers of new types in the future.

>2nd question: It seems that the "full" flexibility of untyped languages
>        will never be achieved by static typing and hence static binding.
>        Does this imply that we have to have dynamic binding to accomplish
>        full reusability?

In my opinion, *yes*.

>3rd question: People (especially in real-time systems) are against dynamic
>        binding because it is too slow.  Some people use the same reason
>        (too slow) to argue against object-oriented programming.  Can we
>        do something about this?

Several approaches:

a) Show that a dynamically bound system can or does meet the real-time constraints *of your application* right now.

b) Look for, or agitate for, a better-optimizing language implementation. Much optimization is based on essentially the same principle as demand paging and RAM caches: Make the usual cases faster, at the expense of the rarer ones, while maintaining a simple, uniform abstraction for the application programmer.

c) Look for, or agitate for, a more powerful processor. Of course, greater expense may require competitive justification - e.g., greater functionality, flexibility, productivity, or reusability. But arguments just such as these have in the past been successfully employed to justify C over assembly language, the UNIX System over MS-DOS, the X Window System over native window systems, etc.

d) Go ahead and prototype your system on a high-reusability platform. You may find that a higher level of abstraction permits the use of better algorithms, resulting in *better* total performance than low-reusability implementations. Even if not, the computer industry's amazing price/performance increases may, over time, make your solution feasible.

        Lawrence G. Mayka
        AT&T Bell Laboratories
        lgm@ihlpf.att.com

Standard disclaimer.
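A minimal Common Lisp sketch of the heterogeneous-list point above (the names are made up; the exact type names returned vary by implementation):

  ;; One list holding values of completely unrelated types.
  (defparameter *stuff*
    (list 42                         ; an integer
          #C(1 2)                    ; a complex number
          #'sqrt                     ; a function
          '(a nested list)))         ; another list

  ;; Ordinary list machinery operates on all of them uniformly; this
  ;; code needs no knowledge of the element types and works unchanged
  ;; for types defined later.
  (defun tag-with-types (items)
    (mapcar (lambda (x) (list (type-of x) x)) items))

  ;; (tag-with-types *stuff*)  =>  a list of (type value) pairs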
brian@comp.vuw.ac.nz (Brian Boutel) (03/19/90)
In article <6143@crdgw1.crd.ge.com>, chao@sunflower.crd.ge.com (William Chao) writes:
> In the paper "Basic Polymorphic Typechecking," by Luca Cardelli,
> Science of Computer Programming 8 (1987), pp. 147-172, on page 150:
> -----------------------------------------------------------------------
> Polymorphism in language comes from the interaction of two contrasting
> programming language design goals: static typing and reusability.
>         :
>         :
> Polymorphic type systems try to reconcile these two goals
> by providing all the safety of statically typed languages,
> and most (but not all) the flexibility of untyped languages.
> -----------------------------------------------------------------------
> My 1st question: Why does he say "but not all"?
> 2nd question: It seems that the "full" flexibility of untyped languages
>         will never be achieved by static typing and hence static binding.
>         Does this imply that we have to have dynamic binding to accomplish
>         full reusability?
> 3rd question: People (especially in real-time systems) are against dynamic
>         binding because it is too slow.  Some people use the same reason
>         (too slow) to argue against object-oriented programming.  Can we
>         do something about this?
> I would appreciate an answer from anyone in this newsgroup.

Typing schemes generally disallow some programs that could be written in an untyped language. An example of this is self-application, where a function can take itself as an argument. This is not to say that no type system can deal with this. The problem is finding an algorithm that will do the type inference.

Dynamic binding was a mistake. Even Lisp people now accept this. The problem is that the meaning of a name can only be determined at runtime, because it depends on the dynamic environment, and this precludes compile-time analysis of the program, so reliability goes out of the window. It's slow too, but this is not the fundamental problem. A boss told me years ago that it is better to be late than wrong!

--brian

Internet: brian@comp.vuw.ac.nz
UUCP:     ...!uunet!vuwcomp!brian
Postal:   Brian Boutel, Computer Science Dept, Victoria University of Wellington,
          PO Box 600, Wellington, New Zealand
Phone:    +64 4 721000
Fax:      +64 4 712070
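For reference, the self-application mentioned above is a one-liner in a dynamically typed Lisp, while a Hindley-Milner checker (as in ML) cannot assign it a type; a minimal Common Lisp sketch (names made up):

  ;; Apply a function to itself.  Fine in Common Lisp; in ML the
  ;; argument would need a type T satisfying T = T -> U, which simple
  ;; Hindley-Milner inference rejects.
  (defun self-apply (f)
    (funcall f f))

  ;; A harmless use:
  ;; (self-apply (lambda (x) (declare (ignore x)) 'ok))  =>  OK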
sean@castle.ed.ac.uk (S Matthews) (03/19/90)
In article <6143@crdgw1.crd.ge.com> chao@sunflower.crd.ge.com (William Chao) writes:
>In the paper "Basic Polymorphic Typechecking," by Luca Cardelli,
>Science of Computer Programming 8 (1987), pp. 147-172, on page 150:
>-----------------------------------------------------------------------
>Polymorphism in language comes from the interaction of two contrasting
>programming language design goals: static typing and reusability.
>        :
>        :
>Polymorphic type systems try to reconcile these two goals
>by providing all the safety of statically typed languages,
>and most (but not all) the flexibility of untyped languages.
>-----------------------------------------------------------------------
>My 1st question: Why does he say "but not all"?

Because there is a theoretical limit to the amount of type-checking that is automatable. To check, at compile time, that a program written in (say) Lisp is well typed, you basically need to verify the program, and that is undecidable. You could think of the amount of type checking that is possible as analogous to (decidable) propositional calculus, while to get the sort of type checking you would like in Lisp, you need something analogous to (undecidable) predicate calculus.

>2nd question: It seems that the "full" flexibility of untyped languages
>        will never be achieved by static typing and hence static binding.
>        Does this imply that we have to have dynamic binding to accomplish
>        full reusability?

This is not really true, but you would need to bolt a theorem prover onto the side of your compiler (see above).

>3rd question: [no answer here due to incompetence]

Sean
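A small Common Lisp sketch of why full type checking amounts to program verification (INTERESTING-P is a made-up stand-in for an arbitrarily complicated run-time test):

  ;; INTERESTING-P stands in for an arbitrarily complicated run-time
  ;; test; with this particular definition PICK happens to be safe.
  (defun interesting-p (x)
    (consp x))

  ;; Whether PICK can ever take CAR of a non-list, or add 1 to a
  ;; non-number, depends on the run-time behaviour of INTERESTING-P --
  ;; a question about what the program does, not about its syntax, so
  ;; deciding it statically amounts to proving a theorem.
  (defun pick (x)
    (if (interesting-p x)
        (car x)        ; reached only when X satisfies the test
        (+ x 1)))      ; otherwise X had better be a number

  ;; (pick '(1 2))  =>  1
  ;; (pick 41)      =>  42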