soohong@gondor.cs.psu.edu (Soohong Kim) (09/09/88)
Could someone please enlighten me by explaining these? In Hwang & Briggs (Computer Architecture and Parallel Processing) it is said that the virtual page number is first randomized by hashing before being fed to the TLB. It is also said that the reason is efficient use of the low-order TLB entries. And the Vax/750 seems to avoid hashing by using a number of high-order bits and 5 low-order bits to select the set in its set-associative TLB.

My questions are:
(1) Why is it inefficient to use the virtual page number directly?
(2) How could hashing or bit-combination help in improving the efficiency?

Thank you.
---Soohong
news@amdcad.AMD.COM (Network News) (09/09/88)
In article <3907@psuvax1.cs.psu.edu> soohong@gondor.cs.psu.edu (Soohong Kim) writes:
| Could some one please enlighten me by explaining these?
|
| In Hwang & Briggs (Computer architecture and parallel processing) it is
| said that the virtual page number is first randomized by hashing before
| feeding it to the TLB. It is also said that the reason is for efficient use
| of the low-order TLB entries. And Vax/750 seems to avoid hashing by using
| a number of high-order bits and 5 low-order bits to select the set in the
| set-associative TLB.
|
| My questions are:
| (1) Why is it inefficient to use the Virtual page number directly?
| (2) How could hashing or bit-combination help in improving the efficiency?
|
| Thank you.
| ---Soohong

If a TLB is fully associative, then the entire TLB is searched in parallel for the virtual address, and no hashing is required. However, if the TLB is set-associative or direct-mapped, then a mapping must exist from a virtual address to the set to examine. This mapping (or hash) must make efficient use of the TLB by distributing the virtual addresses evenly over all of the sets instead of clustering them into a few sets. Otherwise the TLB will thrash, causing performance degradation.

The most obvious hash is to use the lsb's of the virtual page number as an index into the TLB. This is better than using the msb's, because addresses exhibit the principle of locality, so we want sequential pages to map to different TLB sets. This scheme can be augmented in many ways. For example, if the virtual address space is divided into half user and half supervisor, then the msb of the virtual address can be included in the hash to allow user and supervisor page mappings to coexist in the TLB without replacing each other (this is apparently what is going on in the Vax/750 example).

-- Tim Olson
Advanced Micro Devices
(tim@crackle.amd.com)
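Tim's two indexing schemes can be sketched like this (the 4 KB page size, 64-set TLB, and 32-bit virtual addresses are illustrative assumptions, not the Vax/750's actual parameters):

```python
PAGE_SHIFT = 12   # assume 4 KB pages
NUM_SETS   = 64   # assume a 64-set TLB -> 6 index bits
VA_BITS    = 32   # assume 32-bit virtual addresses

def tlb_set_lsb(vaddr):
    """Index the TLB with the low-order bits of the virtual page number,
    so sequential pages map to different sets."""
    vpn = vaddr >> PAGE_SHIFT
    return vpn % NUM_SETS

def tlb_set_with_msb(vaddr):
    """Also fold in the user/supervisor msb (folded into index bit 5 here,
    an arbitrary illustrative choice), so user and supervisor mappings
    need not evict each other."""
    vpn = vaddr >> PAGE_SHIFT
    msb = (vaddr >> (VA_BITS - 1)) & 1
    return (vpn ^ (msb << 5)) % NUM_SETS
```

With the lsb scheme, consecutive pages 0x0000 and 0x1000 land in sets 0 and 1; with the msb folded in, user page 0 (set 0) and the corresponding supervisor page at 0x80000000 (set 32) no longer collide.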
baum@Apple.COM (Allen J. Baum) (09/10/88)
[]
> tim@crackle.amd.com (Tim Olson) writes:
> soohong@gondor.cs.psu.edu (Soohong Kim) writes:
>| In Hwang & Briggs (Computer architecture and parallel processing) it is
>| said that the virtual page number is first randomized by hashing before
>| feeding it to the TLB.
>|
>| My questions are:
>| (1) Why is it inefficient to use the Virtual page number directly?
>| (2) How could hashing or bit-combination help in improving the efficiency?
>
> .. stuff about how to avoid thrashing the TLB ...
>
> The most obvious hash is to use the lsb's of the virtual address as an
> index into the TLB. This is better than using the msb's, because
> addresses exhibit the principle of locality, so we want sequential pages
> to map to different TLB sets. This scheme can be augmented in many ways.

Unfortunately, this is largely affected by how compilers and OS's allocate pages. It is often not enough to use the LSBs, because then the first page of every process would collide, or the heap (allocated to high memory) would collide with the stack (in low memory, or vice versa), or user and system pages would collide. So the hashing that I've seen exclusive-ORs the msb's of the page number with the lsb's, sometimes reversing the bits of one half or the other to really get them good and random.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
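A sketch of the exclusive-OR-with-bit-reversal hash Allen describes (the 20-bit VPN width and 64-set TLB are illustrative assumptions; real implementations choose the widths to fit their MMU):

```python
PAGE_SHIFT = 12          # assume 4 KB pages
INDEX_BITS = 6           # assume a 64-set TLB
MASK = (1 << INDEX_BITS) - 1

def reverse_bits(x, n):
    """Reverse the low n bits of x."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def tlb_set_xor(vaddr, vpn_bits=20):
    """XOR the msb's of the page number with its lsb's, bit-reversing
    the msb half so that the high bits perturb the low index bits."""
    vpn = (vaddr >> PAGE_SHIFT) & ((1 << vpn_bits) - 1)
    lsbs = vpn & MASK
    msbs = (vpn >> (vpn_bits - INDEX_BITS)) & MASK
    return lsbs ^ reverse_bits(msbs, INDEX_BITS)
```

With a plain-LSB index, a low text page (VPN 0) and a high stack page (VPN 0x80000) both index set 0; here the stack page's msb's perturb the index, so they map to different sets.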
dre%ember@Sun.COM (David Emberson) (09/13/88)
In article <3907@psuvax1.cs.psu.edu>, soohong@gondor.cs.psu.edu (Soohong Kim) writes:
> Could some one please enlighten me by explaining these?
>
> In Hwang & Briggs (Computer architecture and parallel processing) it is
> said that the virtual page number is first randomized by hashing before
> feeding it to the TLB. It is also said that the reason is for efficient use
> of the low-order TLB entries. And Vax/750 seems to avoid hashing by using
> a number of high-order bits and 5 low-order bits to select the set in the
> set-associative TLB.
>
> My questions are:
> (1) Why is it inefficient to use the Virtual page number directly?
> (2) How could hashing or bit-combination help in improving the efficiency?
>
> Thank you.
> ---Soohong

Most programs tend to use the same parts of their virtual address spaces, i.e. low pages for text and data, and high pages for stack. If you just used the virtual page number to address the TLB, you would do a lot of thrashing between processes and would not use the TLB efficiently. What is typically done is to exclusive-OR the low-order bits of the virtual page number (the bits most likely to change) with the low-order bits of the process ID to generate a pseudo-random number which addresses the TLB. This ensures that all entries in the TLB are used to the same degree and that thrashing between contexts is minimized.

Dave Emberson (dre@sun.com)
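A minimal sketch of the process-ID hash Dave describes (page size and index width are illustrative assumptions):

```python
PAGE_SHIFT = 12          # assume 4 KB pages
INDEX_BITS = 6           # assume a 64-set TLB
MASK = (1 << INDEX_BITS) - 1

def tlb_set_pid(vaddr, pid):
    """XOR the low-order VPN bits with the low-order process-ID bits,
    so the same page of different processes lands in different sets."""
    vpn = vaddr >> PAGE_SHIFT
    return (vpn ^ pid) & MASK
```

Without the PID, page 0 of every process would contend for set 0; here processes 5 and 6 touching their first text page index sets 5 and 6 respectively.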
casey@admin.cognet.ucla.edu (Casey Leedom) (09/21/88)
In article <16891@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
> []
> > tim@crackle.amd.com (Tim Olson) writes:
> > .. stuff about how to avoid thrashing the TLB...
> > The most obvious hash is to use the lsb's of the virtual address as an
> > index into the TLB. This is better than using the msb's, because
> > addresses exhibit the principle of locality, so we want sequential pages
> > to map to different TLB sets. This scheme can be augmented in many ways.
>
> It is often not enough to use the LSBs, because then the first page of
> every process would collide, or the heap (allocated to high mem.) would
> collide with the stack (in low mem., or vice versa), or user and system
> pages would collide. So, the hashing that I've seen exclusive-ors the
> msb's of the page number with the lsb's, sometimes reversing the bits of
> one half or the other to really get them good and random.

Another source of problems with simply using the LSBs arises if an application manipulates two (or more) arrays whose ``corresponding addresses'' are 2^N apart (where N is the number of LSB bits used to index the TLB). If a situation like this occurs, any linear access of the two (or more) arrays will cause *MASSIVE* thrashing.

This actually happened to a group doing astronomical image processing at Berkeley on a Sun 3/2XX. They were copying a 1/4 Mb image array 30 times a second as part of their application, and they'd justified buying the Sun 3/2XX based on their need for speed. Much to their dismay, the Sun 3/2XX performed the memory copy about 3 times slower than the Sun 3/1XX!

The problem turned out to be that the Sun 3/2XX cache was 64Kb with 16-byte lines, indexed by a formula very dependent on the LSBs of the address. The loop would pick up a 4-byte long word, causing a line to be faulted in. It would write that into the destination array, causing that line to be faulted in. On the next read, the same source line would have to be faulted in again, but now that line in the cache was dirty, so a 30-cycle write-back would go down ...

We finally got Sun 3/1XX performance by offsetting the source and destination arrays by 24 bytes. Because the array being copied was much larger than the cache, we got best performance at:

	| D - S | % 16 == 8  &&  | D - S | >= 16  &&  | D - S | <= 64K - 16

with a sine-wave performance curve whose peaks are as above and whose valleys lie halfway in between, tapering out at the end points (hence the second and third conditions). I haven't sufficiently analyzed why the 24-byte offset (and other similar offsets) had the effect it did. That would require a better understanding of the exact indexing algorithm employed by the Sun 3/2XX cache.

Casey
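A simplified model of the collision Casey describes (the direct-mapped indexing, the base addresses, and the 4 KB sample span are illustrative assumptions; only the 64 Kb size and 16-byte lines come from the post). It shows that source and destination arrays exactly one cache-size apart hit the same line on every word of the copy, while a further 24-byte offset keeps them in different lines:

```python
CACHE_SIZE = 64 * 1024   # Sun 3/2XX cache size, per the post
LINE_SIZE  = 16          # 16-byte lines, per the post
NUM_LINES  = CACHE_SIZE // LINE_SIZE

def cache_line(addr):
    """Direct-mapped index: the low address bits select the cache line."""
    return (addr // LINE_SIZE) % NUM_LINES

# Source and destination exactly one cache-size apart: every 4-byte word
# of the copy loop maps both accesses to the same line, so each access
# evicts (and writes back) the other array's line.
src = 0x100000
dst = src + CACHE_SIZE
collisions = sum(cache_line(src + i) == cache_line(dst + i)
                 for i in range(0, 4096, 4))   # sample the first 4 KB
# collisions == 1024: every access conflicts

# Offset the destination by a further 24 bytes (|D - S| % 16 == 8):
# corresponding words now always fall in different lines.
dst2 = dst + 24
collisions2 = sum(cache_line(src + i) == cache_line(dst2 + i)
                  for i in range(0, 4096, 4))
# collisions2 == 0: no same-line conflicts
```

This model only captures the line-index collisions; the 30-cycle dirty-line write-backs Casey mentions are what turn those collisions into the factor-of-3 slowdown.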