jeff@aiai.ed.ac.uk (Jeff Dalton) (05/01/91)
In article <641@zogwarg.etl.army.mil> hoey@zogwarg.etl.army.mil (Dan Hoey) writes: >The fact that GET/GETF use EQ instead of EQL needs to be emphasized >more. Such as a remark that it is an error to use a number as an >indicator in the property list. > >It's all the worse that for many implementations EQL fixnums are EQ, >so you don't see the bug until your problem gets into bignums. I suppose this isn't the _best_ place to raise this question, but when EQ, fixnums, and FAQ appear in the same place it's almost irresistible. As you all will recall, Common Lisp allows EQ to return false for fixnums that are EQL. Indeed, it can do so even in a case like this: [1] (let ((x z) (y z)) (eq x y)) ; CLtL II p 288 or even: [2] (let ((x 5)) (eq x x)) ; CLtL II p 104 Indeed, at least one compiler (AKCL 1-505) has EQ always return NIL when the arguments are known to be fixnums. There is no significant efficiency reason for this. EQL for known fixnums compiles into the same code as EQ for objects: (x == y). The only difference is that x and y are C ints in the EQL case and pointers in the EQ one. A number of people find cases like [1] and especially [2] difficult to understand, even if they are experienced users or even implementors of Lisp. They could understand if the reason were just to keep the rule simple: EQ can _never_ be relied on for fixnums. What's hard to see is why in any reasonable implementation [1] or [2] would actually turn out to be false. And, as I will try to show below, the usual answers to this are not very convincing. I'm hoping there's a better answer and that someone can say what it is. It's easy to understand a case like this one: [3] (eq 1 (- 3 2)) ==> ? Why? Well, in some implementations fixnums might be implemented as pointers to locations that contain the actual values and new fixnums might be allocated whenever a fixnum result is returned. In [3], 1 and (- 3 2) might be pointers to different locations that both contain 1; but since EQ compares only the pointers it would return NIL. So much for [3]. But we need a different argument to show why [1] or [2] might return NIL, because if all fixnums were pointers-to, both [1] and [2] would be true (unless we imagine that merely referring to a fixnum causes a new one to be allocated, an implementation that would be bizarre, to say the least). Moreover, they would also be true if all fixnums were always represented as tagged immediate values, because they would have the same tag (fixnum) and the same value. Indeed, it's hard to imagine any reasonable representation that, if used for all fixnums, would have [1] or [2] be false. This suggests that the relevant case is when more than one representation might be used. The official reason (CLtL II p 288 again) seems to be that the "breakdown" of EQ for numbers is needed for a compiler to be able to produce "exceptionally efficient numerical code". We can imagine an implementation that normally allocated fixnums on the heap and represented them as pointers to heap locations but also put fixnums directly in registers or stack locations for manipulation by efficient compiled code. But this isn't very convincing either. KCL uses two representations in the way suggested and could make [1] and [2] return true without any efficiency loss -- unless you count the efficiency of being able to skip the comparison altogether. It is, moreover, hard to see why KCL should be a special case. If fixnum values are being used directly, the compiler knows and can emit a direct compare. If the values are not direct, the compiler knows _that_ and can emit a direct compare of the pointers. This seems to account at least for [2], (let ((x 5)) (eq x x)), where there is only one variable. But what about [1], (let ((x z) (y z)) (eq x y))? Maybe x is represented one way and y another. But again the compiler has to know this, because if it didn't it would also be unable to handle cases such as [4] (let ((x z) (y z)) (+ x y)) So we seem to be left with the conclusion that there isn't an efficiency reason after all. There is still a conceptual argument, namely that if x and y are different registers or stack locations they shouldn't count as EQ. This will not work as-is, because it would imply that x and y should not count as EQ when they contain other values such as (pointers to) cons cells. Instead, we'd have to say something like this: when fixnum values are used directly, they aren't Lisp objects any more and so it isn't meaningful to compare them with EQ. That's better, but it still seems to be wrong. This treatment of fixnums is an optimization. We should let optimizations have an impact on the semantics only when some important efficiency gain is obtained. Otherwise, we leave the semantics alone and allow only those optimizations that respect the semantics to the extent that any violations have no impact on the behavior of programs. That some objects might be manipulated internally as non-objects is not in itself a reason for saying they weren't objects in the first place. It therefore seems that examples such as [1] and [2] are useful only to make the problems due to examples such as [3] more universal. -- jd
ram+@cs.cmu.edu (Rob MacLachlan) (05/01/91)
In article <4580@skye.ed.ac.uk> jeff@aiai.UUCP (Jeff Dalton) writes: > >I suppose this isn't the _best_ place to raise this question, >but when EQ, fixnums, and FAQ appear in the same place it's >almost irresistible. > >As you all will recall, Common Lisp allows EQ to return false >for fixnums that are EQL. > >[2] (let ((x 5)) (eq x x)) ; CLtL II p 104 > >What's hard to see >is why in any reasonable implementation [1] or [2] would actually turn >out to be false. This is more convincing with floats, since fixnums normally have immediate representations in modern lisps, but here is the scenario: -- X is known to be a fixnum, so the compiler allocates in a register without any tag bits. This is an unboxed (or non-descriptor) representation. -- The compiler notices some references to X in contexts that require a tagged Lisp object. For each such reference, it heap allocates a copy of X. -- This results in EQ being passed two difference copies of X. Now, you might argue that this is a rather silly thing for the compiler to do, especially when both arguments are known to be fixnums. But consider something like: (let ((x 1)) (eq x (if yow x (gloob)))) Basically, there was a langauge design tradoff here. Is it more important to have a well-defined pointer identity for numbers, or is it more important to be able to generate tense numeric code? The only way to ensure that pointer identity is preserved for numbers would be to always retain the original object pointer, as well as any untagged value, and then to use the original object pointer in any tagged context. But this causes nasty problems with set variables: (let ((x z) (y z)) (declare (fixnum x y)) (when (gloob) (setq x (+ x (the fixnum grue)))) (eq x y)) Now what to you do? Keep a shadow tagged X, but invalidate it whenever it is set, and then cons X at the EQ when the shadow X is invalidated? Cons the result of + just so that you can store it into the shadow X? If this were an important problem to solve, I'm sure a solution could be worked out that wouldn't be too terribly inefficient most of the time, but why bother? As near as I can tell, object identity (EQ) is only an important operation on objects which can be side-effected, and numbers can't. You can also use EQ as a probabilistic optimization of general equality: if two objects are EQ, they are definitely the same. If not, they might be the same. But number copying only makes this hint slightly less sucessful, it doesn't break the semantics. Robert MacLachlan (ram@cs.cmu.edu)
haltraet@gondle.idt.unit.no (Hallvard Traetteberg) (05/01/91)
In article <4580@skye.ed.ac.uk> jeff@aiai.ed.ac.uk (Jeff Dalton) writes: As you all will recall, Common Lisp allows EQ to return false for fixnums that are EQL. Indeed, it can do so even in a case like this: [1] (let ((x z) (y z)) (eq x y)) ; CLtL II p 288 or even: [2] (let ((x 5)) (eq x x)) ; CLtL II p 104 ... we need a different argument to show why [1] or [2] might return NIL, because if all fixnums were pointers-to, both > [1] and [2] would be true (unless we imagine that merely referring to > a fixnum causes a new one to be allocated, an implementation that > would be bizarre, to say the least). Moreover, they would also be true if all fixnums were always represented as tagged immediate values, because they would have the same tag (fixnum) and the same value. Indeed, it's hard to imagine any reasonable representation that, if used for all fixnums, would have [1] or [2] be false. -- jd It may not be all that bizarre always to allocate new fixnum-objects when referring to then. The whole thing depends on what fixnums are used for. Consider: (dotimes (a-fixnum *max-fixnum*) <some code>) This code would generate an enormous amount of garbage fixnums, since one cannot in general know if the values of a-fixnum are alive e.g put into a list after the loop. But if one can guarantee that no fixnum-object is shared between structures, one can always garbage collect fixnum-objects that lose a (its only) reference. How can an implementor make this guarantee? The code above may expand to: (let ((a-fixnum 0)) (tagbody #:start <some code> (if (< (setq a-fixnum (1+ a-fixnum)) *max-fixnum*) (go #:start)))) Every time a-fixnum is set, the old fixnum-object that it refered to is garbage collected. This is safe if a-fixnum is the only reference to the fixnum-object. And with the above-mentioned guarantee this is true. The technique is to generate a new fixnum-object every time eval returns a fixnum. In the loop above, this new object will be the old fixnum-object just garbage collected. So the total amount of new fixnum-objects allocated in the loop is very small (2). Compare this with the *max-fixnum* fixnums that must be allocated without this technique, which would you prefer? KCL has a fixed set of already-made fixnum-objects that is reused instead of allocating duplicates. Short loops will thus not generate garbage like the example above suggests it should. The same trick can be used with the technique described above to prevent calling the allocator all the time. If one writes code that normally generates fixnum-sharing datastructures, the technique will slow the system down. But if this isn't the case, i believe it will be considerable faster as long as the implementation optimizes the recycling of garbage collected fixnums. Also, the technique depends on that copying fixnums is fast, e.g. one memory read/store. Extending this to bigger objects than fixnums may not be wise. I have never tried this although I intended to implement it in a Lisp interpreter I wrote some years ago. Is this really and old idea that everybody has thought of, tried and discarded as inefficient? Has anyone experience with systems implementing it? - haltraet, Norwegian Institute of Technology -- - hal
jeff@aiai.ed.ac.uk (Jeff Dalton) (05/07/91)
In article <1991May1.015534.27054@cs.cmu.edu> ram+@cs.cmu.edu (Rob MacLachlan) writes: >In article <4580@skye.ed.ac.uk> jeff@aiai.UUCP (Jeff Dalton) writes: >>I suppose this isn't the _best_ place to raise this question, >>but when EQ, fixnums, and FAQ appear in the same place it's >>almost irresistible. >> >>As you all will recall, Common Lisp allows EQ to return false >>for fixnums that are EQL. >> >>[2] (let ((x 5)) (eq x x)) ; CLtL II p 104 >> >>What's hard to see is why in any reasonable implementation [1] or [2] >>would actually turn out to be false. >This is more convincing with floats, since fixnums normally have immediate >representations in modern lisps, but here is the scenario: > -- X is known to be a fixnum, so the compiler allocates in a register > without any tag bits. This is an unboxed (or non-descriptor) > representation. > -- The compiler notices some references to X in contexts that require a > tagged Lisp object. For each such reference, it heap allocates a copy > of X. > -- This results in EQ being passed two difference copies of X. This is a good explanation (just what I was looking for), and it suggests a number of examples. However, it's not clear that expressions such as [2] are among them. >Now, you might argue that this is a rather silly thing for the compiler to >do, especially when both arguments are known to be fixnums. But consider >something like: > (let ((x 1)) (eq x (if yow x (gloob)))) It's still a silly thing to do, isn't it? If the compiler allocates an unboxed fixnum and then needs to create heap copy for every reference, then the compiler's made the code slower rather than faster. What we really need is an example that mixes fixnum (or float) arithmetic with other operations. The problem with examples like [2] is (I still think) that they're there to make it clear that the rule is fairly absolute but are somewhat confusing in themselves. >The only way to ensure that pointer identity is preserved for numbers would >be to always retain the original object pointer, as well as any untagged >value, and then to use the original object pointer in any tagged context. >But this causes nasty problems with set variables: > (let ((x z) > (y z)) > (declare (fixnum x y)) > (when (gloob) (setq x (+ x (the fixnum grue)))) > (eq x y)) > >Now what to you do? Keep a shadow tagged X, but invalidate it whenever it >is set, and then cons X at the EQ when the shadow X is invalidated? Cons >the result of + just so that you can store it into the shadow X? When I was thinking about mixed examples before, it seemed to me that a compiler might follow the rule "if a pointer is ever needed, allocate on the heap". Purely numeric subexpressions that did not contain assignments to the fixnum variables could still be optimized by dereferencing once at the start. This would be simpler than maintaining the shadow X, or at least more like the optimizations compilers normally perform. >If this were an important problem to solve, I'm sure a solution could be >worked out that wouldn't be too terribly inefficient most of the time, but >why bother? I'm inclined to agree, especially since EQ doesn't perform a value comparison on (boxed) numbers in any case. -- jeff
ram+@cs.cmu.edu (Rob MacLachlan) (05/08/91)
In article <4613@skye.ed.ac.uk> jeff@aiai.UUCP (Jeff Dalton) writes: >>>[2] (let ((x 5)) (eq x x)) ; CLtL II p 104 >This is a good explanation (just what I was looking for), and it >suggests a number of examples. However, it's not clear that >expressions such as [2] are among them. I think that the real problem is that simple examples such as these are trying to demonstrate that pointer identity doesn't work on numbers. The justification for this restriction is that it helps numeric code, but as you point out, these particular examples are not helped (and are rather silly). In fact, CMU CL will optimize [2] to T, even if an arbitrary expression is substituted for "5". >When I was thinking about mixed examples before, it seemed to me >that a compiler might follow the rule "if a pointer is ever needed, >allocate on the heap". This would be a reasonable rule were it not for loops and seldom-taken branches (like error checks.) Consider: (let ((sum 0.0)) (dotimes (i (length vec)) (incf sum (aref (the (simple-array single-float (*)) vec) i)) (when (> sum 1e3) (error "Sum too big: ~S" sum))) sum) In this case, we have a pointer use in an inner loop that is in reality never used. It is also possible to have pointer uses outside of a loop when the variable is referenced in a loop. For example, if we wanted to accumulate the sum in a global variable, then it would be a good idea to do: (let ((sum *global-sum*)) ...summing loop... (setq *global-sum* sum)) Robert MacLachlan (ram+@cs.cmu.edu)