[net.arch] 386 Architectural Description

clif@intelca.UUCP (Clif Purkiser) (10/23/85)

At the request of some people I am reposting a fairly brief description
of the architecture of the 80386.  


                          80386 Product Brief



Introduction

The 80386 is a high performance, 32-bit microprocessor designed for 
advanced applications like CAD/CAM engineering workstations, high 
resolution graphics, and factory automation.  The 80386 brings to 
these applications an unprecedented performance of 3-4 million 
instructions per second, complete 32-bit architecture, and paged 
virtual memory support.  The iAPX 386 family of products provides the 
latest in microprocessor technology and performance without 
compromising compatibility with the large software base of the iAPX 86 
family.  Of special interest is the 80386's unique virtual machine 
capabilities which allow multitasking between diverse operating 
systems such as Unix and MS-DOS.  This allows OEMs to incorporate 
large amounts of standard 16-bit application software directly into 
new 32-bit designs.


HIGHLIGHTS

o   32-bit virtual memory microprocessor with 4 gigabytes physical  
    address space, 4 gigabyte maximum segment size, and 64 terabyte 
    virtual address space.

o   Sustained performance of 3-to-4 million instructions per second 
    (MIPS)

o   Flexible 32-bit architecture with 8-, 16-, 32-bit data types.

o   Memory management and protection with segmentation and paging 
    integrated on-chip.

o   32 entry on-chip paging cache (translation lookaside buffer) with 
    a 98% hit rate for efficient paging

o   Object-code compatible with all iAPX 86 family processors

o   Virtual 8086 mode allows direct execution of iAPX 86 family 
    software and operating systems as guest in a protected 32-bit 
    environment.

o   High speed interface for 80287 and 80387 floating point numeric 
    coprocessors

o   Demultiplexed 32-bit address and data bus with 32 megabyte per 
    second bandwidth for high speed local buses or local caching

o   High speed, high density, CHMOS III technology yields 12 and 16 
    MHz clock rates
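
The address-space figures in the first highlight follow from simple
arithmetic.  A minimal sketch, assuming a task can name 2**14 segments
(a 13-bit descriptor index plus one table-indicator bit, which the brief
itself does not spell out); the constant names are illustrative only.

```python
# Address-space arithmetic behind the first highlight.
# Assumption (not stated in the brief): a task can select 2**14 segments,
# i.e. a 13-bit descriptor index plus a table-indicator bit.
OFFSET_BITS = 32             # 32-bit offset within a segment
SEGMENTS_PER_TASK = 2 ** 14  # assumed selector space
PHYSICAL_ADDRESS_BITS = 32   # 32-bit physical address bus

GIB = 2 ** 30
TIB = 2 ** 40

max_segment_bytes = 2 ** OFFSET_BITS
virtual_bytes = SEGMENTS_PER_TASK * max_segment_bytes
physical_bytes = 2 ** PHYSICAL_ADDRESS_BITS

print(max_segment_bytes // GIB, "gigabytes per segment")
print(virtual_bytes // TIB, "terabytes of virtual address space")
print(physical_bytes // GIB, "gigabytes of physical address space")
```

Under that assumption the three figures come out as 4, 64, and 4, which
matches the numbers quoted in the highlight.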



DESCRIPTION

The 80386 rivals the performance of most super minicomputers.  At 16 
MHz, the 80386 is capable of executing at sustained rates of 3-to-4 
million 32-bit instructions per second.  This achievement was made 
possible through a state-of-the-art design combining advanced 
semiconductor technology, a pipelined architecture, address 
translation caches, a high performance bus, and specialized, 
high-speed coprocessors.

The 80386 32-bit processor provides a rich, generalized register and 
instruction set for manipulating 32-bit data and addresses.  Advanced 
features, such as scaled indexing and a 64-bit barrel shifter, ensure 
efficient addressing and fast instruction processing.

For the convenience of compiler writers, the 80386 provides multiple 
addressing modes, a capability which ensures that high-level languages 
can be implemented in the most efficient manner possible.  Scaling by 
data type is supported for direct indexing of arrays without the need 
to perform math explicitly on an effective address.
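
Scaled indexing can be pictured roughly as follows.  This is a minimal
sketch, assuming the effective address is formed as base + index*scale +
displacement with a scale of 1, 2, 4, or 8 (the brief says only "scaling
by data type"); the function and parameter names are made up for
illustration.

```python
def effective_address(base, index, scale=1, disp=0):
    """Model a scaled-index effective address, wrapped to 32 bits.

    Assumption: the scale factor is one of 1, 2, 4, 8, so the hardware
    can apply it as a shift of the index rather than a general multiply.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("assumed scale factors are 1, 2, 4, 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Address of element 5 in an array of 4-byte integers starting at 0x1000.
print(hex(effective_address(base=0x1000, index=5, scale=4)))
```

The point is that the compiler can index an array of 2-, 4-, or 8-byte
elements directly, without a separate shift or multiply instruction.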

The 80386 instruction set is marked by both power and flexibility.  It 
offers the compiler writer and assembly language programmer a broad 
range of choices in which operations and data can be specified.  
Special emphasis has been placed on providing optimized instructions 
for high-level languages and operating system functions.  Programmers 
will find that the instruction set is suitable for the entire spectrum 
of high-performance computer applications from engineering 
workstations through commercial data processing and real-time 
control.  Instructions are clear, consistent, and quickly learned.  
The same highly efficient code is easily generated from source 
languages as varied as C, Fortran, Cobol, and Ada*.

Advanced functions, such as hardware-supported multitasking and 
virtual memory support, provide the foundation necessary to build the 
most sophisticated multitasking and multiuser systems.  Many operating 
system functions have been placed in hardware to enhance execution 
speed.  The integrated memory management and protection mechanism 
translates virtual addresses to physical addresses and enforces the 
protection rules necessary for maintaining task integrity in a 
multitasking environment.

The 80386 provides easy access to the large base of software developed 
for the 8086, 8088, 80186, 80188, and 80286 microprocessors.  
Binary-level-code compatibility allows execution of existing 16-bit 
applications without recompilation or reassembly, directly in a 
virtual iAPX 86 environment.  Programs and even entire operating 
systems written for iAPX 86 processors can be run as guests under new 
32-bit 80386 operating systems.  Since the 80386 memory management 
unit is a superset of the 80286's, all 80286 software including 
operating systems is directly portable to the 80386.  The OEM 
preserves his software investment and can reduce the time-to-market 
for new products.




PIPELINED MICROARCHITECTURE

The 80386's pipelined architecture performs instruction fetching, 
decoding, execution, and memory management functions in parallel.  
With this highly parallel operation, instruction fetch and decode 
times disappear as consumers of execution time, allowing performance 
levels 5 times greater than non-pipelined implementations.


ON-CHIP MEMORY MANAGEMENT AND PROTECTION

The 80386 provides efficient support for memory management and demand 
paged virtual memory on-chip.  By performing memory management 
on-chip, the 386 eliminates the serious access delays inherent in 
other implementations that use off-chip memory management units.  The 
benefit is not only high performance but relaxed memory-access time 
requirements, hence lower system cost.


HIGH SPEED BUS

The 80386 has separate 32-bit data and address paths.  A 32-bit access 
can be completed in only two clock cycles, enabling the bus to sustain 
a throughput of 32 Megabytes per second.  By making prompt transfers 
between the microprocessor, memory, and peripherals, the high-speed 
bus design ensures that the entire system benefits from the 
processor's increased performance.
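
The 32 megabyte per second figure follows from the clock rate and the
two-clock access.  A minimal sketch, assuming back-to-back accesses with
no wait states and decimal megabytes; the function name is illustrative.

```python
CLOCKS_PER_ACCESS = 2   # per the brief: a 32-bit access takes two clocks
BYTES_PER_ACCESS = 4    # 32-bit data path


def bus_bandwidth(clock_hz):
    """Peak bytes per second, assuming back-to-back zero-wait-state accesses."""
    return clock_hz // CLOCKS_PER_ACCESS * BYTES_PER_ACCESS


for mhz in (12, 16):
    print(mhz, "MHz:", bus_bandwidth(mhz * 1_000_000) // 1_000_000, "MB/s")
```

Any wait states added by slower memory would lower the sustained figure.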


CHMOS III

Intel's advanced CHMOS III process (Complementary High Speed Metal 
Oxide Semiconductor) eliminates the frequency and reliability 
limitations of traditional CMOS processes and opens a new era in 
microprocessor performance.  It combines the high performance and high 
density capabilities of Intel's leading HMOS III technology with the 
low power characteristics of CMOS.  Using this technology, the 80386 
is designed to operate at 12 and 16 MHz.


NUMERIC COPROCESSOR SUPPORT

The 80287 and 80387 are high-performance floating-point coprocessors 
for 80386 designs.  A coprocessor takes numerics functions that would 
normally be performed in software by the microprocessor and instead 
executes them in hardware.  The 80287 makes numerics power available 
to low-cost 80386 designs, while the 80387 provides enhanced 
functionality and the highest numerics performance available for 
32-bit microprocessors.  Both implement the IEEE 754 floating point 
standard, with high-precision 80-bit architectures and full support 
for single, double, and extended precision operations.  Both 
coprocessors offer substantial performance enhancements over numeric 
software implementations, are binary-compatible with the 
industry-standard 8087 numerics coprocessor, and are fully supported 
by Intel and third-party high-level languages.


COPROCESSORS

Most applications can obtain an even higher boost in performance by using 
specialized coprocessors.  A coprocessor takes functions that would 
normally be performed in software by the microprocessor and instead 
executes them in hardware.  Coprocessors are best viewed as a means of 
extending the iAPX 386's already extensive instruction set.  Instructions 
for the coprocessors are located in-line with code for the processor.

For applications that would benefit from higher precision integer and 
floating point calculations, Intel will offer the 80387, a numerics 
coprocessor with full support for the IEEE standard for floating-point 
operations.  The 80387 will run more than six times faster than the 80287, 
which has already set new standards in numerics performance, and is 
software compatible with its predecessor the 8087.  The iAPX 386's 
coprocessor interface supports both the 80287 and the 80387 to offer the 
system designer the choice of low cost or high performance numeric 
solutions.

For word processing and other common applications, system performance will 
benefit by using text and graphics coprocessors, and for systems connected 
by local area networks, the 82586 and 82588 LAN coprocessors speed 
interprocessor communication.

Clif Purkiser
{hplabs quantal amd}!intelca!clif

bobp@petfe.UUCP (Dan Masi) (10/28/85)

<<>>

>   32 entry on-chip paging cache (translation lookaside buffer) with 
>   a 98% hit rate for efficient paging
>     ^^^^^^^^^^^^

Does this mean that I will see a 98% cache hit rate for *all* programs
that I can run on this processor???   Hmmm...


Dan Masi
...!petsd!petfe!bobp

jb@terak.UUCP (John Blalock) (10/29/85)

> At the request of some people I am reposting a fairly brief description
> of the architecture of the 80386.  
> (followed by 197 more lines of advertising)
Who are the "some people"?  Intel marketing types, no doubt.  If everyone feels
happy about paying the phone bills to receive your message, I'm sure I can put
together a similar "fairly brief description" of my company's latest product
which I'll be glad to post.  But if I do it, then others will too and the net
will become a mass of commercials and then cease to exist.  Please register my
vote as against such use of the net.

sambo@ukma.UUCP (Father of micro-ln) (10/30/85)

In article <531@petfe.UUCP> bobp@petfe.UUCP (Dan Masi) writes:
>>   32 entry on-chip paging cache (translation lookaside buffer) with 
>>   a 98% hit rate for efficient paging
>>     ^^^^^^^^^^^^
>
>Does this mean that I will see a 98% cache hit rate for *all* programs
>that I can run on this processor???   Hmmm...

I think this flame is unwarranted.  If the author had read the original
posting more closely, he would have noticed that it was a brief
description of the 386, and if he would have bothered to read some more
detailed literature from Intel, he would have found out that this
figure of 98% is for typical systems.
--
Samuel A. Figueroa, Dept. of CS, Univ. of KY, Lexington, KY  40506-0027
ARPA: ukma!sambo<@ANL-MCS>, or sambo%ukma.uucp@anl-mcs.arpa,
      or even anlams!ukma!sambo@ucbvax.arpa
UUCP: {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!sambo,
      or cbosgd!ukma!sambo

	"Micro-ln is great, if only people would start using it."

rcd@opus.UUCP (Dick Dunn) (10/30/85)

I have mixed reactions to the parent 386 "product brief".  I'm not about to
flame about commercial information--it's nice to get some word about what's
coming up.  However, there was an awful lot of pitch to wade through to
find the real information.

Doing a breakout of words from the article and frequency-counting them
showed the two most common, after discarding articles and prepositions, to
be "high" and "performance".  Maybe it's the marketing style of writing
that made it seem so non-technical.
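
A word count of this kind can be reproduced along the following lines.
This is only a sketch: the stop list is a guess at "articles and
prepositions" (the exact list used is not given), and `top_words` is a
name made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list: a few common articles and prepositions.
STOP_WORDS = {
    "a", "an", "the",
    "of", "in", "on", "at", "to", "for", "with", "by", "from",
}


def top_words(text, n=2):
    """Return the n most frequent words that are not in STOP_WORDS."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


sample = ("The 80386 is a high performance, 32-bit microprocessor with a "
          "high performance bus and high speed coprocessors.")
print(top_words(sample))
```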

> ...The 80386 brings to 
> these applications an unprecedented performance of 3-4 million 
> instructions per second,...

I would be quite happy if we NEVER saw any such non-measures again in this
newsgroup.  Instructions per second, with no other qualification, says
almost nothing useful.  (A 10 MHz 68010 can run 2.5 mips--if they're the
right instructions.)  Anyway, what's "unprecedented" about this rate?

>...o   Object-code compatible with all iAPX 86 family processors

I wish folks would learn what the word "compatible" means.  I shouldn't be
picking on Intel here, but if you say that you've got compatibility, it
means that you can not only run 8086 code on a 386 but you can run 386 code
on an 86.  "Upward compatible" is a rather different, much more restrictive
term.

> For the convenience of compiler writers, the 80386 provides multiple 
> addressing modes, a capability which ensures that high-level languages 
> can be implemented in the most efficient manner possible.  Scaling by 
> data type is supported for direct indexing of arrays without the need 
> to perform math explicitly on an effective address.

This is the sort of paragraph that gives me the mixed reaction.  The first
sentence is very nearly content-free.  (As a compiler writer, I know that
only a very few addressing modes are useful; beyond that they just
complicate the compiler.  And as far as some of the odd ways I've seen for
encoding addressing information--if I never have to produce another segment
override byte in my life it will be a spot of joy.)  On the other hand, the
information about scaling (presumably meaning the same idea that already
exists in the NS 320xx and the 68020) is good news--though I'm hoping that
"performing math on an effective address" really means "shifting an
index".  (Could it actually be multiplication?  That would be a lot to hope
for.  I'd prefer something accurate to "math".)

> The 80386 instruction set is marked by both power and flexibility.  It 
> offers the compiler writer and assembly language programmer a broad 
> range of choices in which operations and data can be specified.  

Again speaking as a compiler writer, if there's one thing that's a pain in
the...
...code generator, it's "a broad range of choices..."  The fewer ways there
are to do things:
	- the fewer choices the compiler has to make
	- the fewer chances it gets to make the wrong choice
	- the less time it has to spend making choices
	- the less time the compiler-writer has to spend teaching the
	  compiler to make these choices
I understand the architectural attitude that has given ever-richer
instruction sets and addressing structures--but by and large these have not
only NOT been helpful to compiler writers; they've led to compilers which
are larger, slower, less reliable, and yet use an ever-decreasing subset of
the hardware's capability.  Save the "broad range of choices" for assembly
language folks; give the compiler people simple, FAST machines.

Some of the good-news items, as I see it:
	- Segments are finally large enough that they can be ignored.
	- It looks like Intel is buying into the IBM VM-style of handling
	  existing programs running under existing systems.  It can be
	  clunky, but at the same time it can be effective and it's a good
	  marketing tool.

The article mentioned something to the effect of "generalized register"
structure.  Does this really mean anything new?  I know there are more
segment registers; apparently there are no more "data" registers.  (Why is
this?)  Are the segment registers of greater capability than in the past?
Specifically, can any of them (say, other than SS and CS:-) be used as
general 32-bit operands?
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...At last it's the real thing...or close enough to pretend.

abc@brl-sem.ARPA (Brint Cooper ) (10/31/85)

I read Usenet for professional reasons.  It's one more way to try and
keep up with rapidly expanding technology.  Therefore, I am happy to receive 
such notices as that of the 386 and of other new and innovative
computer products.

If companies are REALLY concerned about their phone bills (and not about
oneupsmanship), they'll immediately direct their host administrators to
shut off net.bizarre, net.jokes, net.women, net.singles, net.social,
net.motss, net.religion.xxx, net.games (except for, perhaps, the game
companies!), net.rec, and the like.  

Brint
	 ARPA:  abc@brl.arpa
	 UUCP:  ...{seismo,decvax,cbosgd}!brl-tgr!abc

		  Dr Brinton Cooper
		  U.S. Army Ballistic Research Laboratory
		  Attn: SLCBR-SECAD (Cooper)
		  Aberdeen Proving Ground, MD  21005-5066

Offc:    301 278-6883    AV:  298-6883     FTS: 939-6883
Home:	 301-879-8927

Oleg Kiselev@birtch.UUCP (OLG) (11/01/85)

In article <531@petfe.UUCP> bobp@petfe.UUCP (Dan Masi) writes:
>>   32 entry on-chip paging cache (translation lookaside buffer) with 
>>   a 98% hit rate for efficient paging
>>     ^^^^^^^^^^^^
>Does this mean that I will see a 98% cache hit rate for *all* programs
>that I can run on this processor???   Hmmm...

According to the 386 specs, the translation buffer maps 128K worth of 4K pages.
Most small programs written for 8086/80286 systems had a 64K data segment 
(the limit for most programmers who did not want to pay speed penalties for
address decoding). For those programs a 98% hit rate is quite reasonable. 

I guess the 386 was tested running MS-DOS... Some habits never go away ....:-)

-- 
Disclaimer: My employers go to church every Sunday, listen to Country music,
and donate money to GOP. I am just a deviant.
----------------------------------+ Don't bother, I'll find the door,
"Only through a violent revolution|                       Oleg Kiselev.
 can the existing order be pre-   |...!{trwrb|scgvaxd}!felix!birtch!oleg
 served..."-Perfect Student Union |...!{ihnp4|randvax}!ucla-cs!uclapic!oac6!oleg

chuck@dartvax.UUCP (Chuck Simmons) (11/01/85)

> >>   32 entry on-chip paging cache (translation lookaside buffer) with 
> >>   a 98% hit rate for efficient paging
> >>     ^^^^^^^^^^^^
> >
> >Does this mean that I will see a 98% cache hit rate for *all* programs
> >that I can run on this processor???   Hmmm...
> 
> I think this flame is unwarranted.  If the author had read the original
> posting more closely, he would have noticed that it was a brief
> description of the 386, and if he would have bothered to read some more
> detailed literature from Intel, he would have found out that this
> figure of 98% is for typical systems.

What is a "typical system"?  I think this is a completely warranted flame.
I think such an outrageous claim needs considerable documentation.  Usually
people only claim 50-75% cache hit rates.

chuck@dartvax

mdm@ecn-pc.UUCP (Mike D McEvoy) (11/01/85)

In article <130@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>At the request of some people I am reposting a fairly brief description
>of the architecture of the 80386.  
>
>                          80386 Product Brief

What many of us would like to see is some benchmarks of the 68020 vs 386.
May I suggest that you run both the Dhrystone and Whetstone benchmarks ASAP
and post them on the net.micro and net.68K.  If you need source, let me know.

				Mike McEvoy
				317-497-0509

phil@amdcad.UUCP (Phil Ngai) (11/02/85)

In article <531@petfe.UUCP> bobp@petfe.UUCP (Dan Masi) writes:
>>   32 entry on-chip paging cache (translation lookaside buffer) with 
>>   a 98% hit rate for efficient paging
>
>Does this mean that I will see a 98% cache hit rate for *all* programs
>that I can run on this processor???   Hmmm...

This is a paging cache, not an instruction or data cache. That is,
instead of poking through the page tables for each virtual address
generated by the program, you cache the virtual to physical address
mapping for 32 pages. This saves a lot of time. With 32 4K pages
mapped, that's 128K and 98% doesn't sound unreasonable. Let's look at
it another way: suppose you use each address (assume 32-bit
words) in a 4K page only once and after that demand a new page. Then your
hit rate on a 1-entry TLB is 1023 out of 1024 accesses, or about 99.9%.
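
The one-entry argument can be checked with a toy simulation.  This is a
minimal sketch, assuming a fully associative TLB with least-recently-used
replacement (the thread does not say how the real part is organized); all
names are illustrative.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # 4K pages, as in the post
WORD_SIZE = 4      # 32-bit words


def tlb_hit_rate(addresses, entries):
    """Fraction of accesses whose page is already held in an LRU TLB."""
    tlb = OrderedDict()   # page number -> None, least recently used first
    hits = 0
    total = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        total += 1
        if page in tlb:
            hits += 1
            tlb.move_to_end(page)         # mark as most recently used
        else:
            if len(tlb) >= entries:
                tlb.popitem(last=False)   # evict the least recently used
            tlb[page] = None
    return hits / total if total else 0.0


# Touch every word of 16 consecutive pages exactly once.
sequential = range(0, 16 * PAGE_SIZE, WORD_SIZE)
print(tlb_hit_rate(sequential, entries=1))
```

Each page costs one miss followed by 1023 hits, so even this worst-case
sequential sweep gives the 1023/1024 ratio described above.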

You probably were thinking of a data cache. But that's not what Intel
said. Hey Intel, why don't you defend yourselves? Are you going to sit
there and wait for your competitors to defend you? :-)
-- 
 The Miami Police Department's Vice Squad has an annual budget of $1.5M.
 Each episode of the TV show "Miami Vice" costs $1.6M.

 Phil Ngai +1 408 749-5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.dec.com

omondi@unc.UUCP (Amos Omondi) (11/03/85)

> > >>   32 entry on-chip paging cache (translation lookaside buffer) with 
> > >>   a 98% hit rate for efficient paging
> > >>     ^^^^^^^^^^^^
> > >
> > >Does this mean that I will see a 98% cache hit rate for *all* programs
> > >that I can run on this processor???   Hmmm...
> > 
> > I think this flame is unwarranted.  If the author had read the original
> > posting more closely, he would have noticed that it was a brief
> > description of the 386, and if he would have bothered to read some more
> > detailed literature from Intel, he would have found out that this
> > figure of 98% is for typical systems.
> 
> What is a "typical system"?  I think this is a completely warranted flame.
> I think such an outrageous claim needs considerable documentation.  Usually
> people only claim 50-75% cache hit rates.
> 
> chuck@dartvax



The figure of 98% is not really outrageous. As Phil Ngai points out,
the writer is giving figures for the number of entries in the address
translation hardware, where 16 to 64 entries will usually give a hit
ratio of anywhere from 90% to 99%. Actually I wonder if there are any
machines out there with a translation cache of more than 64 entries.

dfh@scirtp.UUCP (David F. Hinnant) (11/04/85)

> In article <130@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
> >At the request of some people I am reposting a fairly brief description
> >of the architecture of the 80386.  
> >
> >                          80386 Product Brief
> 
> What many of us would like to see is some benchmarks of the 68020 vs 386.
> May I suggest that you run both the Dhrystone and Whetstone benchmarks ASAP
> and post them on the net.micro and net.68K.  If you need source, let me know.
> 
> 				Mike McEvoy

I agree, but BEWARE WHO RUNS THE BENCHMARKS!  How about an INDEPENDENT
UNBIASED volunteer?  The Dhrystone should be a better representation
than the Whetstone though.  Moreover, some highly complex application
program (VLSI routing for example) would serve as a good test case.
It's important to make sure the operating system doesn't affect the
benchmark.  The OS version should be the same on both CPUs (i.e.
the same implementation of UNIX).  Since I doubt 4.2BSD runs on the 386
yet, how about System III or V?  Remember - Benchmark the CPU, not the
UNIX implementation.  Right Intel?

-- 
				David Hinnant
				SCI Systems, Inc.
				{decvax, akgua}!mcnc!rti-sel!scirtp!dfh

jer@peora.UUCP (J. Eric Roskos) (11/04/85)

> This is a paging cache, not an instruction or data cache. That is,
> instead of poking through the page tables for each virtual address
> generated by the program, you cache the virtual to physical address
> mapping for 32 pages. This saves a lot of time.

Does the 386 let you invalidate entries in the paging cache from outside?
-- 
Shyy-Anzr:  J. Eric Roskos
UUCP: Ofc:  ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
     Home:  ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jerpc!jer
  US Mail:  MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

doug@terak.UUCP (Doug Pardee) (11/04/85)

> >   32 entry on-chip paging cache (translation lookaside buffer) with 
> >   a 98% hit rate for efficient paging
>
> I think such an outrageous claim needs considerable documentation.  Usually
> people only claim 50-75% cache hit rates.

This isn't a data cache, it's a paging/MMU cache.  National Semi has
claimed a 98% hit rate for the 32-entry MMU TLB in their NS32082, and
my experience has been that this is a valid figure, at least when
running 4.2BSD.

Since the 32082 has 512-byte pages, the cache addresses 16K.  I hear
that the '386 has 4K pages, so the cache addresses 128K, and a hit
rate of even 99% would seem reasonable.
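
The coverage figures quoted here are just entries times page size.  A
minimal sketch; `tlb_reach` is a name made up for illustration.

```python
def tlb_reach(entries, page_bytes):
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_bytes


print(tlb_reach(32, 512) // 1024, "K with 512-byte pages")
print(tlb_reach(32, 4096) // 1024, "K with 4K pages")
```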
-- 
Doug Pardee -- CalComp -- {calcom1,savax,seismo,decvax,ihnp4}!terak!doug

zben@umd5.UUCP (11/05/85)

In article <181@opus.UUCP> rcd@opus.UUCP (Dick Dunn) writes:
>> (marks quotes from the original article -CBC)

>> For the convenience of compiler writers, the 80386 provides multiple 
>> addressing modes, a capability which ensures that high-level languages 
>> can be implemented in the most efficient manner possible.   

>                  ...  (As a compiler writer, I know that
>only a very few addressing modes are useful; beyond that they just
>complicate the compiler.  And as far as some of the odd ways I've seen for
>encoding addressing information--if I never have to produce another segment
>override byte in my life it will be a spot of joy.)  ...

>> The 80386 instruction set is marked by both power and flexibility.  It 
>> offers the compiler writer and assembly language programmer a broad 
>> range of choices in which operations and data can be specified.  

>Again speaking as a compiler writer, if there's one thing that's a pain in
>the...
>...code generator, it's "a broad range of choices..."  The fewer ways there
>are to do things:
>	- the fewer choices the compiler has to make
>	- the fewer chances it gets to make the wrong choice
>	- the less time it has to spend making choices
>	- the less time the compiler-writer has to spend teaching the
>	  compiler to make these choices
>I understand the architectural attitude that has given ever-richer
>instruction sets and addressing structures--but by and large these have not
>only NOT been helpful to compiler writers; they've led to compilers which
>are larger, slower, less reliable, and yet use an ever-decreasing subset of
>the hardware's capability.  Save the "broad range of choices" for assembly
>language folks; give the compiler people simple, FAST machines.

I think this only applies when horrid things are done to the basic machine
in order to support the fancy faz-baz features.  Anybody who has ever seen
the Huffman-coded opcode fields for the Intel 3000 should grok this...

But, the machine I mainly use has 128 registers (sort of, you can't use them
all for everything) which respond to low-core addresses too.  The toy
(subset-of-Algol) compiler I wrote taking classes used exactly two registers.
One was used for all arithmetic, the other for array subscripting.

So, like, put a zero in register R0 and pretend there is a no-index mode.
Write the compiler to use only simple register-to-memory operations and do
its subscript calculations with ADD and MUL like G*d meant them to be, and
ignore that POLY instruction...

Another way of looking at it is that the basic machine will probably have
been slowed down a bit to accommodate all that faz-baz, and that this is 
the ultimate cost.  Looking at it this way just drags us back to the old 
"to-RISC-or-not-to-RISC" dead horse.  I would be willing to bet that the
complainant here is a closet "RISC" person...  :-)
-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA

clif@intelca.UUCP (Clif Purkiser) (11/18/85)

> > This is a paging cache, not an instruction or data cache. That is,
> > instead of poking through the page tables for each virtual address
> > generated by the program, you cache the virtual to physical address
> > mapping for 32 pages. This saves a lot of time.
> 
> Does the 386 let you invalidate entries in the paging cache from outside?
> -- 
> Shyy-Anzr:  J. Eric Roskos
> UUCP: Ofc:  ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
>      Home:  ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jerpc!jer
>   US Mail:  MS 795; Perkin-Elmer SDC;
> 	    2486 Sand Lake Road, Orlando, FL 32809-7642

Yes,  loading the page directory root register (CR3) with a Mov CR3, Reg
instruction invalidates all of the entries in the TLB.

Also, a task (process) switch which loads a NEW value into CR3 invalidates
the TLB.  But if you are only using one set of page tables in your system,
you probably wouldn't want to invalidate the TLB on every task switch, so
only a new value in CR3 invalidates it.

However, there is no hardware pin which lets you flush the TLB.
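
The invalidation rules above can be summarized in a toy model.  This is a
sketch of the behavior as described in this post only, not of the actual
hardware; the class and method names are made up for illustration.

```python
class ToyTLB:
    """Toy model of the TLB invalidation rules described in this post."""

    def __init__(self):
        self.cr3 = None
        self.entries = {}   # virtual page number -> physical frame number

    def fill(self, vpn, pfn):
        """Cache one virtual-to-physical page mapping."""
        self.entries[vpn] = pfn

    def mov_cr3(self, value):
        """An explicit MOV to CR3 invalidates every entry."""
        self.cr3 = value
        self.entries.clear()

    def task_switch(self, new_cr3):
        """A task switch invalidates only if it loads a new CR3 value."""
        if new_cr3 != self.cr3:
            self.cr3 = new_cr3
            self.entries.clear()


tlb = ToyTLB()
tlb.mov_cr3(0x1000)
tlb.fill(5, 42)
tlb.task_switch(0x1000)   # same page tables: mapping survives
print(tlb.entries)
tlb.task_switch(0x2000)   # different page tables: TLB is emptied
print(tlb.entries)
```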
-- 
Clif Purkiser, Intel, Santa Clara, Ca.
HIGH PERFORMANCE MICROPROCESSORS
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif
	
{standard disclaimer about how these views are mine and may not reflect
the views of Intel, my boss, or USENET goes here. }