huggins@ticipa.ti.com (Gray Huggins) (06/12/91)
Hi,
We have noticed a big difference between populating an IdentityDictionary
with 15000 entries and one with 30000 entries.

- Why?
- What can be done to improve performance?
- Why do we get "a primitive has failed" when we do really big stuff without
  garbage collecting?

Here is our test:

1)
| a |
a := IdentityDictionary new.
Time millisecondsToRun: [
    1 to: 15000 do: [ :i |
        a at: DummyClass new put: i]]

Result => 11390 milliseconds

2)
| a |
a := IdentityDictionary new.
Time millisecondsToRun: [
    1 to: 30000 do: [ :i |
        a at: DummyClass new put: i]]

Result => 1049351 milliseconds

Regards,
--
Gray Huggins                  Internet: huggins@ticipa.csc.ti.com
Texas Instruments
PO Box 655012 M/S 3635        TI MSG: GHUG
Dallas, TX 75265              Voice: (214) 917-2202
hubert@cs.uni-sb.de (Hubert Baumeister) (06/17/91)
In article <HUGGINS.91Jun12092724@dino.ti.com>, huggins@ticipa.ti.com (Gray Huggins) writes:
|> Hi,
|> We have noticed a big difference between populating an IdentityDictionary
|> with 15000 entries and one with 30000 entries.
|>
|> - Why?
|> - What can be done to improve performance?
|> - Why do we get "a primitive has failed" when we do really big stuff without
|>   garbage collecting?
|>
|> Here is our test:
|>
|> 1)
|> | a |
|> a := IdentityDictionary new.
|> Time millisecondsToRun: [
|>     1 to: 15000 do: [ :i |
|>         a at: DummyClass new put: i]]
|>
|> Result => 11390 milliseconds
|>
|> 2)
|> | a |
|> a := IdentityDictionary new.
|> Time millisecondsToRun: [
|>     1 to: 30000 do: [ :i |
|>         a at: DummyClass new put: i]]
|>
|> Result => 1049351 milliseconds

When storing an association in an IdentityDictionary, Smalltalk applies a hash
function to the key. If there is a collision, it probes for the nearest free
slot. The hash function defined in Object, which is the one used for almost
every class (Characters and SmallIntegers are the exceptions), returns a
14-bit quantity, so its result is a value between 0 and 16383.

If you create 15000 objects, each object gets a unique hash value; that is
what happens in the first case. But if you create 30000 objects, the hash
values start repeating after the first 16384 objects. From then on every
insertion collides and is resolved by probing for the next free slot, and
since there are no 'holes' left in the range of hash values, roughly 7000 or
more slots have to be examined for each new entry.

A quick solution would be to redefine the method identityHash in a subclass
of Object to give a more suitable result.
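The clustering Hubert describes can be checked with a small simulation — a sketch in Python rather than Smalltalk, with a toy 6-bit hash range standing in for the 14-bit identityHash (the numbers 64/256 are scaled-down assumptions, not Smalltalk's actual sizes):

```python
# Scaled-down model of IdentityDictionary insertion: hash values are
# limited to a small range (6 bits = 64 values, standing in for the
# 14-bit identityHash), and collisions are resolved by linear probing
# for the next free slot.

HASH_RANGE = 64  # toy stand-in for the 16384 possible identityHash values

def insert_all(n_objects, table_size):
    """Insert n_objects whose hashes cycle through HASH_RANGE;
    return the total number of probe steps needed."""
    table = [None] * table_size
    probes = 0
    for i in range(n_objects):
        slot = (i % HASH_RANGE) % table_size  # hashes repeat after HASH_RANGE
        while table[slot] is not None:
            probes += 1
            slot = (slot + 1) % table_size
        table[slot] = i
    return probes

# Up to HASH_RANGE objects every hash is distinct, so no probing at all.
# Beyond HASH_RANGE, every insertion collides and has to scan past the
# solid run of occupied slots before finding a free one.
print(insert_all(64, 256))    # -> 0
print(insert_all(128, 256))   # -> 4096
```

Doubling the object count past the hash range turns zero probe steps into thousands, which is the same cliff the 15000-vs-30000 timings show.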
If you know that you will have 30000 elements in your dictionary, then modify
the identityHash method of DummyClass as follows:

identityHash
    ^super identityHash * 2

This will not avoid collisions beyond 16000 elements, but on average an empty
slot is found after one probe. In the example above the time then increases
only by a factor of 3. This won't work for even larger numbers of elements,
though.

Hubert
--
Hubert Baumeister             Max-Planck-Institut f"ur Informatik
Im Stadtwald                  6600 Saarbr"ucken
Telefon: (x49-681) 302-5432   e-mail: hubert@cs.uni-sb.de
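Why multiplying the hash by 2 helps can also be seen in a scaled-down simulation — a Python sketch with a toy 6-bit hash range standing in for the 14-bit identityHash (the `spread` parameter is a hypothetical name mimicking the `identityHash * 2` trick):

```python
# Doubling each hash value interleaves a free slot between consecutive
# hash positions, so even after the hash range is exhausted a probe
# usually terminates after a single step.

HASH_RANGE = 64  # toy stand-in for the 16384 possible identityHash values

def insert_all(n_objects, table_size, spread=1):
    """Insert objects whose hashes cycle through HASH_RANGE, scaled by
    `spread` (spread=2 mimics 'identityHash * 2'); return total probes."""
    table = [None] * table_size
    probes = 0
    for i in range(n_objects):
        slot = (i % HASH_RANGE) * spread % table_size
        while table[slot] is not None:
            probes += 1
            slot = (slot + 1) % table_size
        table[slot] = i
    return probes

print(insert_all(128, 256, spread=1))   # solid run of slots -> 4096 probes
print(insert_all(128, 256, spread=2))   # hole after each hash -> 64 probes
```

With spread=1 the second batch of insertions must scan a solid 64-slot run each time; with spread=2 each collision lands next to a hole, matching Hubert's observation that the slowdown drops to about a factor of 3 — and also why the trick stops working once the objects outnumber the doubled range.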
objtch@extro.ucc.su.OZ.AU (Peter Goodall) (06/18/91)
huggins@ticipa.ti.com (Gray Huggins) writes:
>Hi,
> We have noticed a big difference between populating an IdentityDictionary
>with 15000 entries and one with 30000 entries.
> - Why?
> - What can be done to improve performance?
> - Why do we get "a primitive has failed" when we do really big stuff without
>  garbage collecting?

Greetings,

As another reply suggested, the problem is probably in the hash table used to
store the objects. I had a similar problem at a client site with
Smalltalk/286: it appeared that their performance was being eaten by
SymbolTable hashing. Often, if you don't have a profiler, you can see what the
system is doing by triggering a user break and looking at the Smalltalk stack;
if you see the same method sequence a lot, that's likely where the time is
going. I eventually built an interrupt-driven profiler to prove a few points.

We looked at rewriting the hashing function, but after a little
experimentation I instead changed the load factor of the hash tables from
about 85% to 55% - and the system sped up by about 40%. I ran lots of tests
to measure the number of collisions in the hash function and found it was
horrific with about 7000 symbols, which was the real size of the SymbolTable.
I notice that Digitalk has dropped the loading to about 50% in Smalltalk/V
Windows.

Advice: Beg, borrow, or buy a profiling system, and read Knuth on hash
functions.
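The load-factor effect Peter describes is easy to demonstrate: with open addressing, probe counts blow up as a table approaches full. A Python sketch with random keys and linear probing (the 55%/85% figures come from his anecdote; the table size and seed are arbitrary assumptions):

```python
import random

def avg_probes(load_factor, table_size=4096, seed=1):
    """Fill an open-addressed (linear probing) table to the given load
    factor with random keys; return average probe steps per insertion."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    table = [None] * table_size
    n = int(load_factor * table_size)
    probes = 0
    for _ in range(n):
        slot = rng.randrange(table_size)
        while table[slot] is not None:
            probes += 1
            slot = (slot + 1) % table_size
        table[slot] = True
    return probes / n

# Knuth's analysis of linear probing says the cost grows roughly like
# 1/(1 - load_factor), so clustering gets sharply worse past ~70-80%
# occupancy - which is why dropping 85% to 55% bought so much speed.
print(avg_probes(0.55))
print(avg_probes(0.85))
```

On this model the 85%-full table costs several times as many probes per insertion as the 55%-full one, consistent with the ~40% system-level speedup Peter measured.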
Bye,
--
Peter Goodall - Smalltalk Systems Consultant - objtch@extro.ucc.su.oz.au
ObjecTech Pty. Ltd. - Software Tools, Training, and Advice
162 Burns Bay Rd, LANE COVE, NSW, AUSTRALIA. - Phone/Fax: +61 2 418-7433