clif@intelca.UUCP (Clif Purkiser) (10/23/85)
At the request of some people I am reposting a fairly brief description of the architecture of the 80386.

80386 Product Brief

Introduction

The 80386 is a high performance, 32-bit microprocessor designed for advanced applications like CAD/CAM engineering workstations, high resolution graphics, and factory automation. The 80386 brings to these applications an unprecedented performance of 3-4 million instructions per second, complete 32-bit architecture, and paged virtual memory support. The iAPX 386 family of products provides the latest in microprocessor technology and performance without compromising compatibility with the large software base of the iAPX 86 family.

Of special interest are the 80386's unique virtual machine capabilities, which allow multitasking between diverse operating systems such as Unix and MS-DOS. This allows OEMs to incorporate large amounts of standard 16-bit application software directly into new 32-bit designs.

HIGHLIGHTS

o 32-bit virtual memory microprocessor with 4 gigabytes physical address space, 4 gigabyte maximum segment size, and 64 terabyte virtual address space.
o Sustained performance of 3-to-4 million instructions per second (MIPS)
o Flexible 32-bit architecture with 8-, 16-, 32-bit data types.
o Memory management and protection with segmentation and paging integrated on-chip.
o 32 entry on-chip paging cache (translation lookaside buffer) with a 98% hit rate for efficient paging
o Object-code compatible with all iAPX 86 family processors
o Virtual 8086 mode allows direct execution of iAPX 86 family software and operating systems as guests in a protected 32-bit environment.
o High speed interface for 80287 and 80387 floating point numeric coprocessors
o Demultiplexed 32-bit address and data bus with 32 megabyte per second bandwidth for high speed local buses or local caching
o High speed, high density, CHMOS III technology yields 12 and 16 MHz clock rates

DESCRIPTION

The 80386 rivals the performance of most super minicomputers; at 16 MHz, the 80386 is capable of executing at sustained rates of 3-to-4 million 32-bit instructions per second. This achievement was made possible through a state-of-the-art design combining advanced semiconductor technology, a pipelined architecture, address translation caches, a high performance bus, and specialized, high-speed coprocessors.

The 80386 32-bit processor provides a rich, generalized register and instruction set for manipulating 32-bit data and addresses. Advanced features, such as scaled indexing and a 64-bit barrel shifter, ensure efficient addressing and fast instruction processing.

For the convenience of compiler writers, the 80386 provides multiple addressing modes, a capability which ensures that high-level languages can be implemented in the most efficient manner possible. Scaling by data type is supported for direct indexing of arrays without the need to perform math explicitly on an effective address.

The 80386 instruction set is marked by both power and flexibility. It offers the compiler writer and assembly language programmer a broad range of choices in which operations and data can be specified. Special emphasis has been placed on providing optimized instructions for high-level languages and operating system functions. Programmers will find that the instruction set is suitable for the entire spectrum of high-performance computer applications from engineering workstations through commercial data processing and real-time control. Instructions are clear, consistent, and quickly learned.
The same highly efficient code is easily generated from source languages as varied as C, Fortran, Cobol, and Ada*.

Advanced functions, such as hardware-supported multitasking and virtual memory support, provide the foundation necessary to build the most sophisticated multitasking and multiuser systems. Many operating system functions have been placed in hardware to enhance execution speed. The integrated memory management and protection mechanism translates virtual addresses to physical addresses and enforces the protection rules necessary for maintaining task integrity in a multitasking environment.

The 80386 provides easy access to the large base of software developed for the 8086, 8088, 80186, 80188, and 80286 microprocessors. Binary-level-code compatibility allows execution of existing 16-bit applications without recompilation or reassembly, directly in a virtual iAPX 86 environment. Programs and even entire operating systems written for iAPX 86 processors can be run as guests under new 32-bit 80386 operating systems. Since the 80386 memory management unit is a superset of the 80286's, all 80286 software including operating systems is directly portable to the 80386. The OEM preserves his software investment and can reduce the time-to-market for new products.

PIPELINED MICROARCHITECTURE

The 80386's pipelined architecture performs instruction fetching, decoding, execution, and memory management functions in parallel. With this highly parallel operation, instruction fetch and decode times disappear as consumers of execution time, allowing performance levels 5 times greater than non-pipelined implementations.

ON-CHIP MEMORY MANAGEMENT AND PROTECTION

The 80386 provides efficient support for memory management and demand paged virtual memory on-chip. By performing memory management on-chip, the 386 eliminates the serious access delays inherent in other implementations that use off-chip memory management units.
The benefit is not only high performance but relaxed memory-access time requirements, hence lower system cost.

HIGH SPEED BUS

The 80386 has separate 32-bit data and address paths. A 32-bit access can be completed in only two clock cycles, enabling the bus to sustain a throughput of 32 Megabytes per second. By making prompt transfers between the microprocessor, memory, and peripherals, the high-speed bus design ensures that the entire system benefits from the processor's increased performance.

CHMOS III

Intel's advanced CHMOS III process (Complementary High Speed Metal Oxide Semiconductor) eliminates the frequency and reliability limitations of traditional CMOS processes and opens a new era in microprocessor performance. It combines the high performance and high density capabilities of Intel's leading HMOS III technology with the low power characteristics of CMOS. Using this technology, the 80386 is designed to operate at 12 and 16 MHz.

NUMERIC COPROCESSOR SUPPORT

The 80287 and 80387 are high-performance floating-point coprocessors for 80386 designs. A coprocessor takes numerics functions that would normally be performed in software by the microprocessor and instead executes them in hardware. The 80287 makes numerics power available to low-cost 80386 designs, while the 80387 provides enhanced functionality and the highest numerics performance available for 32-bit microprocessors. Both implement the IEEE 754 floating point standard, with high-precision 80-bit architectures and full support for single, double, and extended precision operations. Both coprocessors offer substantial performance enhancements over numeric software implementations, are binary-compatible with the industry-standard 8087 numerics coprocessor, and are fully supported by Intel and third-party high-level languages.

COPROCESSORS

Most applications can obtain an even higher boost in performance by using specialized coprocessors.
A coprocessor takes functions that would normally be performed in software by the microprocessor and instead executes them in hardware. Coprocessors are best viewed as a means of extending the iAPX 386's already extensive instruction set. Instructions for the coprocessors are located in-line with code for the processor.

For applications that would benefit from higher precision integer and floating point calculations, Intel will offer the 80387, a numerics coprocessor with full support for the IEEE standard for floating-point operations. The 80387 will run more than six times faster than the 80287, which has already set new standards in numerics performance, and is software compatible with its predecessor the 8087. The iAPX 386's coprocessor interface supports both the 80287 and the 80387 to offer the system designer the choice of low cost or high performance numeric solutions.

For word processing and other common applications, system performance will benefit by using text and graphics coprocessors, and for systems connected by local area networks, the 82586 and 82588 LAN coprocessors speed interprocessor communication.

Clif Purkiser
{hplaps quantal amd}!intelca!clif
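
[Editorial sketch, not part of the original post.] Two of the headline numbers in the brief can be checked with simple arithmetic. The Python below is only a rough check under stated assumptions: it takes "megabyte" to mean 10**6 bytes, and it assumes the 64-terabyte figure comes from 2**14 selectable segments (a 13-bit selector index plus one table-indicator bit), each up to 4 gigabytes. The function names are made up for illustration.

```python
# Back-of-the-envelope check of two figures quoted in the product brief.
# Assumptions (not stated in the post): "megabyte" means 10**6 bytes, and
# the virtual space is 2**14 segments of at most 2**32 bytes each.

def bus_bandwidth_bytes_per_sec(clock_hz, bus_width_bits, clocks_per_access):
    """Peak bytes per second when every access moves a full bus width."""
    bytes_per_access = bus_width_bits // 8
    accesses_per_sec = clock_hz // clocks_per_access
    return bytes_per_access * accesses_per_sec

def virtual_address_space_bytes(segment_count, max_segment_bytes):
    """Total virtual space if every segment is at its maximum size."""
    return segment_count * max_segment_bytes

# 16 MHz clock, 32-bit data path, two clocks per access (from the brief).
bus = bus_bandwidth_bytes_per_sec(16_000_000, 32, 2)
print(bus // 10**6, "MB/s")

# 2**14 segments (assumed) times a 4-gigabyte maximum segment size.
virt = virtual_address_space_bytes(2**14, 2**32)
print(virt // 2**40, "TB")
```

Under those assumptions the two results come out to 32 MB/s and 64 TB, matching the brief. Note that the bus figure is a peak rate: it assumes every bus cycle carries a full 32-bit transfer.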
bobp@petfe.UUCP (Dan Masi) (10/28/85)
> 32 entry on-chip paging cache (translation lookaside buffer) with
> a 98% hit rate for efficient paging
>   ^^^^^^^^^^^^

Does this mean that I will see a 98% cache hit rate for *all* programs that I can run on this processor??? Hmmm...

Dan Masi
...!petsd!petfe!bobp
jb@terak.UUCP (John Blalock) (10/29/85)
> At the request of some people I am reposting a fairly brief description
> of the architecture of the 80386.
> (followed by 197 more lines of advertising)

Who are the "some people"? Intel marketing types, no doubt.

If everyone feels happy about paying the phone bills to receive your message, I'm sure I can put together a similar "fairly brief description" of my company's latest product which I'll be glad to post. But if I do it, then others will too and the net will become a mass of commercials and then cease to exist.

Please register my vote as against such use of the net.
sambo@ukma.UUCP (Father of micro-ln) (10/30/85)
In article <531@petfe.UUCP> bobp@petfe.UUCP (Dan Masi) writes:
>> 32 entry on-chip paging cache (translation lookaside buffer) with
>> a 98% hit rate for efficient paging
>>   ^^^^^^^^^^^^
>
>Does this mean that I will see a 98% cache hit rate for *all* programs
>that I can run on this processor??? Hmmm...

I think this flame is unwarranted. If the author had read the original posting more closely, he would have noticed that it was a brief description of the 386, and if he would have bothered to read some more detailed literature from Intel, he would have found out that this figure of 98% is for typical systems.
--
Samuel A. Figueroa, Dept. of CS, Univ. of KY, Lexington, KY 40506-0027
ARPA: ukma!sambo<@ANL-MCS>, or sambo%ukma.uucp@anl-mcs.arpa,
      or even anlams!ukma!sambo@ucbvax.arpa
UUCP: {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!sambo,
      or cbosgd!ukma!sambo
"Micro-ln is great, if only people would start using it."
rcd@opus.UUCP (Dick Dunn) (10/30/85)
I have mixed reactions to the parent 386 "product brief". I'm not about to flame about commercial information--it's nice to get some word about what's coming up. However, there was an awful lot of pitch to wade through to find the real information. Doing a breakout of words from the article and frequency-counting them showed the two most common, after discarding articles and prepositions, to be "high" and "performance". Maybe it's the marketing style of writing that made it seem so non-technical.

> ...The 80386 brings to
> these application an unprecedented performance of 3-4 million
> instructions per second,...

I would be quite happy if we NEVER saw any such non-measures again in this newsgroup. Instructions per second, with no other qualification, says almost nothing useful. (A 10 MHz 68010 can run 2.5 mips--if they're the right instructions.) Anyway, what's "unprecedented" about this rate?

>...o Object-code compatible with all iAPX 86 family processors

I wish folks would learn what the word "compatible" means. I shouldn't be picking on Intel here, but if you say that you've got compatibility, it means that you can not only run 8086 code on a 386 but you can run 386 code on an 86. "Upward compatible" is a rather different, much more restrictive term.

> For the convenience of compiler writers, the 80386 provides multiple
> addressing modes, a capability which ensures that high-level languages
> can be implemented in the most efficient manner possible. Scaling by
> data type is supported for direct indexing of arrays without the need
> to perform math explicitly on an effective address.

This is the sort of paragraph that gives me the mixed reaction. The first sentence is very nearly content-free. (As a compiler writer, I know that only a very few addressing modes are useful; beyond that they just complicate the compiler.
And as far as some of the odd ways I've seen for encoding addressing information--if I never have to produce another segment override byte in my life it will be a spot of joy.) On the other hand, the information about scaling (presumably meaning the same idea that already exists in the NS 320xx and the 68020) is good news--though I'm hoping that "performing math on an effective address" really means "shifting an index". (Could it actually be multiplication? That would be a lot to hope for. I'd prefer something accurate to "math".)

> The 80386 instruction set is marked by both power and flexibility. It
> offers the compiler writer and assembly language programmer a broad
> range of choices in which operations and data can be specified.

Again speaking as a compiler writer, if there's one thing that's a pain in the...
...code generator, it's "a broad range of choices..." The fewer ways there are to do things:
    - the fewer choices the compiler has to make
    - the fewer chances it gets to make the wrong choice
    - the less time it has to spend making choices
    - the less time the compiler-writer has to spend teaching the compiler to make these choices
I understand the architectural attitude that has given ever-richer instruction sets and addressing structures--but by and large these have not only NOT been helpful to compiler writers; they've led to compilers which are larger, slower, less reliable, and yet use an ever-decreasing subset of the hardware's capability. Save the "broad range of choices" for assembly language folks; give the compiler people simple, FAST machines.

Some of the good-news items, as I see it:
    - Segments are finally large enough that they can be ignored.
    - It looks like Intel is buying into the IBM VM-style of handling existing programs running under existing systems. It can be clunky, but at the same time it can be effective and it's a good marketing tool.

The article mentioned something to the effect of "generalized register" structure.
Does this really mean anything new? I know there are more segment registers; apparently there are no more "data" registers. (Why is this?) Are the segment registers of greater capability than in the past? Specifically, can any of them (say, other than SS and CS:-) be used as general 32-bit operands?
--
Dick Dunn    {hao,ucbvax,allegra}!nbires!rcd    (303)444-5710 x3086
   ...At last it's the real thing...or close enough to pretend.
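
[Editorial sketch, not part of the original post.] The scaled indexing discussed above can be illustrated as a base + index*scale + displacement calculation. The sketch below assumes the scale factor is restricted to 1, 2, 4, or 8, which is how the 386 addressing modes were eventually documented; under that restriction the "math" is a left shift of the index, which is the reading Dick Dunn was hoping for. The function name and the example addresses are hypothetical.

```python
# Sketch of base + index*scale + displacement addressing.
# Assumption: scale factors are limited to 1, 2, 4, 8, so the
# "multiplication" is really a left shift of the index by 0..3 bits.

SCALE_SHIFT = {1: 0, 2: 1, 4: 2, 8: 3}

def effective_address(base, index, scale=1, disp=0):
    """Return the 32-bit effective address for a scaled-index operand."""
    if scale not in SCALE_SHIFT:
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Mask to 32 bits to mimic wraparound in a 32-bit address register.
    return (base + (index << SCALE_SHIFT[scale]) + disp) & 0xFFFFFFFF

# Element 5 of an array of 4-byte integers that starts at 0x1000:
addr = effective_address(base=0x1000, index=5, scale=4)
print(hex(addr))
```

Here the compiler can leave the element number in the index register and let the addressing mode supply the factor of 4, instead of emitting a separate shift or multiply before the access.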
abc@brl-sem.ARPA (Brint Cooper ) (10/31/85)
I read Usenet for professional reasons. It's one more way to try and keep up with rapidly expanding technology. Therefore, I am happy to receive such notices as that of the 386 and of other new and innovative computer products.

If companies are REALLY concerned about their phone bills (and not about one-upmanship), they'll immediately direct their host administrators to shut off net.bizarre, net.jokes, net.women, net.singles, net.social, net.motss, net.religion.xxx, net.games (except for, perhaps, the game companies!), net.rec, and the like.

Brint

ARPA: abc@brl.arpa
UUCP: ...{seismo,decvax,cbosgd}!brl-tgr!abc

Dr Brinton Cooper
U.S. Army Ballistic Research Laboratory
Attn: SLCBR-SECAD (Cooper)
Aberdeen Proving Ground, MD 21005-5066

Offc: 301 278-6883    AV: 298-6883    FTS: 939-6883
Home: 301-879-8927
Oleg Kiselev@birtch.UUCP (OLG) (11/01/85)
In article <531@petfe.UUCP> bobp@petfe.UUCP (Dan Masi) writes:
>> 32 entry on-chip paging cache (translation lookaside buffer) with
>> a 98% hit rate for efficient paging
>>   ^^^^^^^^^^^^
>Does this mean that I will see a 98% cache hit rate for *all* programs
>that I can run on this processor??? Hmmm...

According to the 386 specs, the translation buffer maps 128K worth of 4K pages. Most small programs written for 8086/80286 systems had a 64K data segment (the limit for most programmers who did not want to pay speed penalties for address decoding). For those programs a 98% hit rate is quite reasonable. I guess the 386 was tested running MS-DOS... Some habits never go away ....:-)
--
Disclaimer: My employers go to church every Sunday, listen to Country music, and donate money to GOP. I am just a deviant.
----------------------------------+ Don't bother, I'll find the door,
"Only through a violent revolution|                     Oleg Kiselev.
can the existing order be pre-    |...!{trwrb|scgvaxd}!felix!birtch!oleg
served..."-Perfect Student Union  |...!{ihnp4|randvax}!ucla-cs!uclapic!oac6!oleg
chuck@dartvax.UUCP (Chuck Simmons) (11/01/85)
> >> 32 entry on-chip paging cache (translation lookaside buffer) with
> >> a 98% hit rate for efficient paging
> >>   ^^^^^^^^^^^^
> >
> >Does this mean that I will see a 98% cache hit rate for *all* programs
> >that I can run on this processor??? Hmmm...
>
> I think this flame is unwarranted. If the author had read the original
> posting more closely, he would have noticed that it was a brief
> description of the 386, and if he would have bothered to read some more
> detailed literature from Intel, he would have found out that this
> figure of 98% is for typical systems.

What is a "typical system"? I think this is a completely warranted flame. I think such an outrageous claim needs considerable documentation. Usually people only claim 50-75% cache hit rates.

chuck@dartvax
mdm@ecn-pc.UUCP (Mike D McEvoy) (11/01/85)
In article <130@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>At the request of some people I am reposting a fairly brief description
>of the architecture of the 80386.
>
> 80386 Product Brief

What many of us would like to see is some benchmarks of the 68020 vs 386. May I suggest that you run both the Dhrystone and Whetstone benchmarks ASAP and post them on the net.micro and net.68K. If you need source, let me know.

Mike McEvoy
317-497-0509
phil@amdcad.UUCP (Phil Ngai) (11/02/85)
In article <531@petfe.UUCP> bobp@petfe.UUCP (Dan Masi) writes:
>> 32 entry on-chip paging cache (translation lookaside buffer) with
>> a 98% hit rate for efficient paging
>
>Does this mean that I will see a 98% cache hit rate for *all* programs
>that I can run on this processor??? Hmmm...

This is a paging cache, not an instruction or data cache. That is, instead of poking through the page tables for each virtual address generated by the program, you cache the virtual to physical address mapping for 32 pages. This saves a lot of time.

With 32 4K pages mapped, that's 128K and 98% doesn't sound unreasonable. Let's look at it another way: suppose you use each address (assume 32 bit words) in a 4K page only once and after that demand a new page. Then your hit rate on a 1 entry TLB is 1023 out of 1024 accesses or about 99.9%.

You probably were thinking of a data cache. But that's not what Intel said.

Hey Intel, why don't you defend yourselves? Are you going to sit there and wait for your competitors to defend you? :-)
--
The Miami Police Department's Vice Squad has an annual budget of $1.5M. Each episode of the TV show "Miami Vice" costs $1.6M.

Phil Ngai +1 408 749-5720
UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
ARPA: amdcad!phil@decwrl.dec.com
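
[Editorial sketch, not part of the original post.] Phil Ngai's two numbers, the 128K mapped by 32 entries and the roughly 99.9% hit rate for a one-entry TLB under a pure sequential sweep, can be reproduced with a toy model. The class below is a hypothetical fully associative cache with LRU replacement, chosen only because it is easy to write; it does not claim to match the organization or replacement policy of the real part.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # 4K pages, as quoted in the thread
WORD_SIZE = 4      # 32-bit words

class ToyTLB:
    """Fully associative, LRU-replaced cache of virtual page numbers."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()   # page number -> placeholder value
        self.hits = 0
        self.misses = 0

    def access(self, vaddr):
        page = vaddr // PAGE_SIZE
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.pages) >= self.entries:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = True

    def hit_rate(self):
        return self.hits / (self.hits + self.misses)

def sequential_sweep(tlb, pages):
    """Touch every word of `pages` consecutive pages exactly once."""
    for vaddr in range(0, pages * PAGE_SIZE, WORD_SIZE):
        tlb.access(vaddr)
    return tlb.hit_rate()

reach = 32 * PAGE_SIZE
rate = sequential_sweep(ToyTLB(entries=1), pages=64)
print(reach // 1024, "K mapped by 32 entries")
print(f"{rate:.4%} hit rate on a 1-entry TLB")
```

Each page costs one miss followed by 1023 hits, so the sweep gives 1023/1024 regardless of how many pages are touched. A sequential sweep is close to the best case for a translation cache; programs that jump among many pages would do worse, which is why the figure for real workloads has to be measured rather than derived.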
omondi@unc.UUCP (Amos Omondi) (11/03/85)
> > >> 32 entry on-chip paging cache (translation lookaside buffer) with
> > >> a 98% hit rate for efficient paging
> > >>   ^^^^^^^^^^^^
> > >
> > >Does this mean that I will see a 98% cache hit rate for *all* programs
> > >that I can run on this processor??? Hmmm...
> >
> > I think this flame is unwarranted. If the author had read the original
> > posting more closely, he would have noticed that it was a brief
> > description of the 386, and if he would have bothered to read some more
> > detailed literature from Intel, he would have found out that this
> > figure of 98% is for typical systems.
>
> What is a "typical system"? I think this is a completely warranted flame.
> I think such an outrageous claim needs considerable documentation. Usually
> people only claim 50-75% cache hit rates.
>
> chuck@dartvax

The figure of 98% is not really outrageous. As Phil Ngai points out, the writer is giving figures for the number of entries in the address translation hardware, where 16 to 64 entries will usually give a hit ratio of anywhere from 90% to 99%. Actually I wonder if there are any machines out there with a translation cache of more than 64 entries.
dfh@scirtp.UUCP (David F. Hinnant) (11/04/85)
> In article <130@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
> >At the request of some people I am reposting a fairly brief description
> >of the architecture of the 80386.
> >
> > 80386 Product Brief
>
> What many of us would like to see is some benchmarks of the 68020 vs 386.
> May I suggest that you run both the Dhrystone and Whetstone benchmarks ASAP
> and post them on the net.micro and net.68K. If you need source, let me know.
>
> Mike McEvoy

I agree, but BEWARE WHO RUNS THE BENCHMARKS! How about an INDEPENDENT UNBIASED volunteer? The Dhrystone should be a better representation than the Whetstone though. Moreover, some highly complex application program (VLSI routing for example) would serve as a good test case.

It's important to make sure the operating system doesn't affect the benchmark. Both the CPU and the OS version should be the same (i.e. the same implementation of UNIX). Since I doubt 4.2BSD runs on the 386 yet, how about System III or V? Remember - Benchmark the CPU, not the UNIX implementation. Right Intel?
--
David Hinnant
SCI Systems, Inc.
{decvax, akgua}!mcnc!rti-sel!scirtp!dfh
jer@peora.UUCP (J. Eric Roskos) (11/04/85)
> This is a paging cache, not an instruction or data cache. That is,
> instead of poking through the page tables for each virtual address
> generated by the program, you cache the virtual to physical address
> mapping for 32 pages. This saves a lot of time.

Does the 386 let you invalidate entries in the paging cache from outside?
--
Shyy-Anzr: J. Eric Roskos
UUCP: Ofc:  ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
      Home: ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jerpc!jer
US Mail: MS 795; Perkin-Elmer SDC;
         2486 Sand Lake Road, Orlando, FL 32809-7642
doug@terak.UUCP (Doug Pardee) (11/04/85)
> > 32 entry on-chip paging cache (translation lookaside buffer) with
> > a 98% hit rate for efficient paging
>
> I think such an outrageous claim needs considerable documentation. Usually
> people only claim 50-75% cache hit rates.

This isn't a data cache, it's a paging/MMU cache. National Semi has claimed a 98% hit rate for the 32-entry TLB in their NS32082 MMU, and my experience has been that this is a valid figure, at least when running 4.2BSD.

Since the 32082 has 512-byte pages, the cache addresses 16K. I hear that the '386 has 4K pages, so the cache addresses 128K, and a hit rate of even 99% would seem reasonable.
--
Doug Pardee -- CalComp -- {calcom1,savax,seismo,decvax,ihnp4}!terak!doug
zben@umd5.UUCP (11/05/85)
In article <181@opus.UUCP> rcd@opus.UUCP (Dick Dunn) writes:
>> (marks quotes from the original article -CBC)
>> For the convenience of compiler writers, the 80386 provides multiple
>> addressing modes, a capability which ensures that high-level languages
>> can be implemented in the most efficient manner possible.
> ... (As a compiler writer, I know that
>only a very few addressing modes are useful; beyond that they just
>complicate the compiler. And as far as some of the odd ways I've seen for
>encoding addressing information--if I never have to produce another segment
>override byte in my life it will be a spot of joy.) ...
>> The 80386 instruction set is marked by both power and flexibility. It
>> offers the compiler writer and assembly language programmer a broad
>> range of choices in which operations and data can be specified.
>Again speaking as a compiler writer, if there's one thing that's a pain in
>the...
>...code generator, it's "a broad range of choices..." The fewer ways there
>are to do things:
> - the fewer choices the compiler has to make
> - the fewer chances it gets to make the wrong choice
> - the less time it has to spend making choices
> - the less time the compiler-writer has to spend teaching the
>   compiler to make these choices
>I understand the architectural attitude that has given ever-richer
>instruction sets and addressing structures--but by and large these have not
>only NOT been helpful to compiler writers; they've led to compilers which
>are larger, slower, less reliable, and yet use an ever-decreasing subset of
>the hardware's capability. Save the "broad range of choices" for assembly
>language folks; give the compiler people simple, FAST machines.

I think this only applies when horrid things are done to the basic machine in order to support the fancy faz-baz features. Anybody who has ever seen the Huffman-coded opcode fields for the Intel 3000 should grok this...
But, the machine I mainly use has 128 registers (sort of, you can't use them all for everything) which respond to low-core addresses too. The toy (subset-of-Algol) compiler I wrote while taking classes used exactly two registers. One was used for all arithmetic, the other for array subscripting.

So, like, put a zero in register R0 and pretend there is a no-index mode. Write the compiler to use only simple register-to-memory operations and do its subscript calculations with ADD and MUL like G*d meant them to be, and ignore that POLY instruction...

Another way of looking at it is that the basic machine will probably have been slowed down a bit to accommodate all that faz-baz, and that this is the ultimate cost. Looking at it this way just drags us back to the old "to-RISC-or-not-to-RISC" dead horse. I would be willing to bet that the complainant here is a closet "RISC" person... :-)
--
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA
clif@intelca.UUCP (Clif Purkiser) (11/18/85)
> > This is a paging cache, not an instruction or data cache. That is,
> > instead of poking through the page tables for each virtual address
> > generated by the program, you cache the virtual to physical address
> > mapping for 32 pages. This saves a lot of time.
>
> Does the 386 let you invalidate entries in the paging cache from outside?
> --
> Shyy-Anzr: J. Eric Roskos
> UUCP: Ofc: ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
> Home: ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jerpc!jer
> US Mail: MS 795; Perkin-Elmer SDC;
> 2486 Sand Lake Road, Orlando, FL 32809-7642

Yes, loading the page directory root register (CR3) with a MOV CR3, Reg instruction invalidates all of the entries in the TLB. Also, a task (process) switch which loads a NEW value into CR3 invalidates the TLB. But if you are only using one set of page tables in your system you probably wouldn't want to invalidate the TLB, so on a task switch only a new value in CR3 invalidates it.

However, there is no hardware pin which lets you flush the TLB.
--
Clif Purkiser, Intel, Santa Clara, Ca.
HIGH PERFORMANCE MICROPROCESSORS
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

{standard disclaimer about how these views are mine and may not reflect the views of Intel, my boss, or USENET goes here.}
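
[Editorial sketch, not part of the original post.] The invalidation rules described in this reply can be written down as a toy model. The Python below follows only what the post says: an explicit load of CR3 drops every cached translation, while a task switch drops them only when it brings in a different CR3 value. The class and method names are hypothetical, and the dictionary stands in for the 32-entry hardware TLB without modeling its size.

```python
class ToyMMU:
    """Toy model: a TLB keyed by virtual page, flushed on CR3 loads."""

    def __init__(self, cr3):
        self.cr3 = cr3
        self.tlb = {}   # virtual page number -> physical frame number

    def cache_translation(self, vpage, frame):
        self.tlb[vpage] = frame

    def mov_to_cr3(self, value):
        """Explicit MOV CR3, reg: always drops every cached translation."""
        self.cr3 = value
        self.tlb.clear()

    def task_switch(self, new_cr3):
        """Task switch: flush only if the page directory root changes."""
        if new_cr3 != self.cr3:
            self.mov_to_cr3(new_cr3)

mmu = ToyMMU(cr3=0x1000)
mmu.cache_translation(vpage=5, frame=42)

mmu.task_switch(0x1000)      # same page tables: the entry survives
kept = len(mmu.tlb)

mmu.task_switch(0x2000)      # new page tables: the TLB is emptied
after_switch = len(mmu.tlb)

print(kept, after_switch)
```

One consequence of this model is that an operating system that edits a page table entry in place has to reload CR3 (even with the same value) to be sure no stale translation remains, since the post describes no way to invalidate a single entry and no external flush pin.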