kristyn@aludra.usc.edu (KRISTYN GREENWOOD) (10/30/89)
Stupid Question time: has anyone gotten a V20 to work with the
'88 AMI BIOS on a vanilla XT?

Somebody gave me a BIOS that works with the V20, but it doesn't
seem to want to recognize the mono card (Hercules graphics card
- the real thing). System boots fine, just no picture.

Any shred of wisdom on the subject would be greatly appreciated.

-g.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| Disclaimer - Don't blame them, I just | I hope they don't blow us up
|    rent this space.                   | as they try to figure out how
|                                       | to blow us up.
| Glenn Schmall - !uunet!ucscb!astroid  |                    -Geordi TNG
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
vicc@unix.cie.rpi.edu (VICC Project (Rose)) (11/10/89)
In article <6129@merlin.usc.edu> kristyn@aludra.usc.edu (Kris' better half) writes:
>
> Stupid Question time: has anyone gotten a V20 to work with the
> '88 AMI bios on a vanilla XT?
>
> Somebody gave me a bios that works with the V20, but it doesn't
> seem to want to recognize the mono card(Hercules graphic card
> -the real thing). System boots fine, just no picture.

Hmm, I've got a Zenith Z161 (portable - luggable). I just popped my V20
in and zingo, everything goes. My Norton SI went from 1.0 to 1.8!

Now for my question: where can I find software that uses the V20
(especially assemblers, disassemblers and debuggers)? I've verified that
many of the additional instructions are in fact the same as the 80186's,
so using .186 with Turbo Assembler works fine, but I would like to be able
to use the bit field instructions without defining macros.

Btw: for those who don't know, you can call NEC and have them send a
user's manual with a full instruction set description (free, and they were
fast - 2 days - which says something about the US-SNAIL these days).

--
Frank Filz
Center For Integrated Electronics
Rensselaer Polytechnic Institute
vicc@unix.cie.rpi.edu
rob@prism.TMC.COM (11/11/89)
> Hmm, I've got a Zenith Z161 (portable - luggable) I just popped my V20
> in and zingo everything goes. My Norton SI went from 1.0 to 1.8!

It's been said before, but it's worth repeating - the 1.8 number from
SI is unrealistic (in general, any number from SI comparing different
CPUs is unrealistic). Norton's SI tests speed by looping around an IMUL
and an IDIV instruction. The V20 executes these disproportionately quickly
compared to the 8086/88. Since IMUL and IDIV instructions are very rare in
'real world' code, SI's figure isn't meaningful. The same problem arises
when running SI on 286/386/etc. machines.

In general, a V20 should give you about a 5 - 10% speedup, possibly
as high as 20 - 30% when running floating point code, which tends to be
heavy in integer multiplies and divides unless you have an 8087 to do the
math in hardware.
brianr@phred.UUCP (Brian Reese) (11/17/89)
In article <206900136@prism> rob@prism.TMC.COM writes:
>
> It's been said before, but it's worth repeating - The 1.8 number from
> SI is unrealistic (in general, any number from SI comparing different
>
> In general, a V20 should give you about a 5 - 10% speedup, possibly
                                            ^^^^^^^
> as high as 20 - 30% when running floating point code, which tends to be

Have you ever actually _tried_ it? I replaced the CPU in my XT with a V20
and realized a very noticeable increase in speed, running a variety of
apps. If the increase was only 5 - 10%, I really doubt that I would notice
it. I'd say, on the average, I got a 40 - 50% boost. (Just for GP, SI went
from 1.0 to 1.8, just like the original poster's.)

I do agree with you that SI is rather unrealistic, for the reasons you
cited. I'm just offering my hands-on, personal experience.

Anyone else out there with V20's?

Brian
--
Brian Reese                           uw-beaver!pilchuck!seahcx!phred!brianr
Physio Control Corp., Redmond, Wa.                         brianr@phred.UUCP
"Sticks and stones may break my bones, but whips and chains excite me!"
* Do not write on this line. This line has been left blank intentionally. *
silver@eniac.seas.upenn.edu (Andy Silverman) (11/17/89)
While I find a 40-50% speedup kind of unrealistic, I'd say 30% is
reasonable in specific applications. Take, for example, FRACTINT, the
program that does all those neat fractals using integer math. On the old
8088, the integer math ops were SLLOOWW, but with a V20 in my system there
was a very noticeable speed increase.

+-----------------------+-----------------------------------------+
| Andy Silverman        | Internet:   silver@eniac.seas.upenn.edu |
| "All stressed out and | Compu$erve: 72261,531                   |
|  nobody to choke."    |                                         |
+-----------------------+-----------------------------------------+
vicc@unix.cie.rpi.edu (VICC Project (Rose)) (11/17/89)
In article <2851@phred.UUCP> brianr@phred.UUCP (Brian Reese) writes:
>In article <206900136@prism> rob@prism.TMC.COM writes:
>>
>> It's been said before, but it's worth repeating - The 1.8 number from
>> SI is unrealistic (in general, any number from SI comparing different
>>
>> In general, a V20 should give you about a 5 - 10% speedup, possibly
>                                            ^^^^^^^
>> as high as 20 - 30% when running floating point code, which tends to be
>
>Have you ever actually _tried_ it? I replaced the CPU in my XT with a V20
>and realized very noticeable increase in speed, running a variety of apps.
>If the increase was only 5 - 10%, I really doubt that I would notice it.
>I'd say, on the average, I got a 40 - 50% boost. (Just for GP, it went
>from 1.0 to 1.8, just like the original poster.)

Actually, from examining NEC's timings, it would seem that an increase of
20-30% could be expected. Multiplies and divides run 4x faster. All
effective address calculations take 2 clock cycles instead of 6+ (the V20
calculates addresses in hardware as opposed to microcode). The V20 also
does multiple-bit shifts at 1 cycle per bit instead of 4, because of
hardware aid to the microcode (i.e. not a barrel shifter, which would do
all shifts in 1 cycle). A number of instructions are also 1 or 2 cycles
quicker. Of course, these numbers are affected by the instruction prefetch
queue (which I think might be better on the V20 also).

REP instructions are also sped up, and interrupts during prefixed
instructions are handled correctly (up to 3 prefixes are 'remembered', as
opposed to only 1 on the 8086 - but the V20 adds a REPC [carry] or REPNC
prefix, so you could have 4 - but most people don't use the LOCK prefix).

As I said before, the V20 has all the 80186 instructions. In addition, the
V20 has REPC, REPNC, bit field instructions, BCD arithmetic string
functions, and 8080 emulation (either mode shift or SW interrupt).

Since someone asked for the number for NEC:

        1-800-632-3531 (or 3532 in California)
        Ask for the V20 User's Manual

One complaint I have about the manual: the registers are renamed, and the
instructions are renamed (probably a copyright problem).

One question I have: there is also a 2nd Co-Processor Escape op-code. Does
anyone know what this does? (Could it support a 387 or something weird or
neat like that? I doubt it, but one could hope.)

One note I picked up from a friend: if you send your system in for
repairs, make sure that they pull the V20 if they replace your system
board. My friend lost his V20 because of that (so now he has a handful to
take care of all future problems - also purchased when it looked like the
supply would dry up real fast).

A note about the speedup: a register-to-memory operation is typically
15 cycles or so on the V20 and 13+EA on the 8088, which translates to a
20% speedup at worst. This is what I base my 20-30% speedup on: most
instructions are not MUL or DIV, but many are MOV reg,mem or OP reg,mem.

--
Frank Filz
Center For Integrated Electronics
Rensselaer Polytechnic Institute
vicc@unix.cie.rpi.edu
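The register-to-memory arithmetic in the post above can be checked with a few lines of Python. This is only a sketch: the 15-clock and 13+EA figures are taken from the post, EA = 6 is the low end of the "6+" it quotes, and the 9- and 12-clock EA values are assumed purely for illustration. The helper name `compare` is made up for this example.

```python
# Rough check of the "13+EA on the 8088 vs. 15 clocks on the V20" comparison.
V20_CLOCKS = 15        # "15 cycles or so" on the V20 (figure from the post)
BASE_8088_CLOCKS = 13  # fixed part of "13+EA" on the 8088 (figure from the post)


def compare(ea_clocks):
    """Return (8088 clocks, fraction of time saved, throughput gain)."""
    old = BASE_8088_CLOCKS + ea_clocks
    time_saved = 1 - V20_CLOCKS / old        # share of the old run time removed
    throughput_gain = old / V20_CLOCKS - 1   # extra operations per unit time
    return old, time_saved, throughput_gain


if __name__ == "__main__":
    # EA = 6 is the minimum quoted in the post; 9 and 12 are assumed values.
    for ea in (6, 9, 12):
        old, saved, gain = compare(ea)
        print(f"EA={ea:2d}: 8088 {old} clocks, V20 {V20_CLOCKS} clocks, "
              f"{saved:.0%} less time, {gain:.0%} more throughput")
```

With EA = 6 the 8088 needs 19 clocks, so the V20 saves 4 of 19 clocks (about 21% of the time), which lines up with the "20% speedup at worst" figure. Larger EA values only widen the gap, provided the instruction stream is not held up waiting on the bus.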
rob@prism.TMC.COM (11/17/89)
>> It's been said before, but it's worth repeating - The 1.8 number from
>> SI is unrealistic (in general, any number from SI comparing different
>>
>> In general, a V20 should give you about a 5 - 10% speedup, possibly
                                             ^^^^^^^
>> as high as 20 - 30% when running floating point code, which tends to be

>Have you ever actually _tried_ it? I replaced the CPU in my XT with a V20
>and realized very noticeable increase in speed, running a variety of apps.
>If the increase was only 5 - 10%, I really doubt that I would notice it.
>I'd say, on the average, I got a 40 - 50% boost. (Just for GP, it went
>from 1.0 to 1.8, just like the original poster.)

Actually, I did try it a few years ago. I should have been more specific
about what I meant by 'in general'. Running non-floating point code
(program compiles, spreadsheets, and databases), the speedup was from 5 to
10%. I wouldn't have noticed it if I hadn't been timing it. As mentioned,
floating point code, which makes heavy use of the integer multiplies and
divides at which the V20 excels, shows a greater increase (in my
experience, around 25%).

It's sort of surprising how little difference a V20 makes. As another
note mentioned, it also claims to drastically speed up (by a factor of 3
to 6) effective address calculation, which, unlike integer multiplies and
divides, is a real factor in most code. Yet a test program I wrote that
simply looped around a bunch of statements like

        MOV AX, [BX+SI+2]

showed a speedup of only about 20%, as I recall. Looping overhead was
clearly a consideration (the V20 doesn't claim to speed up loops
significantly), but I still expected a larger gain.

Still, whether it's worthwhile to you depends on what you're running,
and what you consider a significant speedup. You could probably also
realize a larger gain if you optimized code for the V20. Given how
inexpensive it is, getting a V20 or V30 is probably worth it. The point
remains, though: someone expecting the 80% speedup that SI promises will
be disappointed (i.e. my complaint is with SI, not the V20).
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/18/89)
In article <2851@phred.UUCP> brianr@phred.UUCP (Brian Reese) writes:

| I do agree with you that the SI is rather unrealistic, for the reasons you
| cited. I'm just offering my hands-on, personal experience.
|
| Anyone else out there with V20's?

I found that the one program which justified buying a V20 did run much
faster. That's good, because I don't notice anything else running better.
I did measure a few programs, but the change was 10-15% faster, not enough
to really notice.

Then I got a 386... that I notice.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon
Ralf.Brown@B.GP.CS.CMU.EDU (11/18/89)
In article <206900137@prism>, rob@prism.TMC.COM wrote:
> It's sort of surprising how little difference a V20 makes. As another
>note mentioned, it also claims to drastically speed up (by a factor of 3
>to 6) effective address calculation, which, unlike integer multiplies and
>divides, is a real factor in most code. Yet a test program I wrote that
>simply looped around a bunch of statements like
>
>        MOV AX, [BX+SI+2]
>
>showed a speedup of only about 20%, as I recall. Looping overhead was
>clearly a consideration (the V20 doesn't claim to speed up loops
>significantly), but I still expected a larger gain.

A major problem is that the 8088 and V20 are bus-bound. Any instruction
that executes in less than four clock cycles per byte will drain the
four-byte instruction prefetch queue. Once the prefetch queue is empty,
instructions run only as fast as they can be fetched from memory (at one
byte every four clock cycles). Since every branch empties the prefetch
queue (and the instructions at the destination may not let it refill), the
prefetch queue spends a significant percentage of the time empty.

For example, the sequence

        SHL AX,1
        SHL AX,1
        SHL AX,1
        SHL AX,1

takes eight clocks according to the official Intel instruction timings.
Unfortunately, each of these instructions is two bytes long, so it takes
eight clocks to fetch each instruction. Thus, the best case is when the
instruction queue is full at the start of this sequence:

        SHL AX,1    two clocks, PQ now has two bytes and is fetching a third
        SHL AX,1    two clocks, PQ now empty, third byte arrives at end
        SHL AX,1    only one byte, so start fetching next
                    four clocks later, we can start, so total is six clocks
        SHL AX,1    wait two clocks for first byte, four for second, then
                    two clocks to execute = eight clocks
        Total: 18 clocks

Worst case is when the prefetch queue is empty, with the next byte two
clocks away. Then the first three instructions each take eight clocks to
execute, and the last takes ten clocks, for a total of 34 clocks.

You should see a greater improvement when replacing an 8086 with a V30,
since they can fetch two bytes every four clocks and have a six-byte
prefetch queue, greatly reducing the bus-boundedness of the processor
(the above instruction sequence runs in eight to 16 clocks, depending on
how full the prefetch queue is at the beginning).
--
UUCP: {ucbvax,harvard}!cs.cmu.edu!ralf -=-=-=-=- Voice: (412) 268-3053 (school)
ARPA: ralf@cs.cmu.edu  BIT: ralf%cs.cmu.edu@CMUCCVMA  FIDO: Ralf Brown 1:129/46
FAX: available on request                      Disclaimer? I claimed something?
"How to Prove It" by Dana Angluin
 8.  proof by wishful citation:
     The author cites the negation, converse, or generalization of a theorem
     from the literature to support his claims.
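The best-case trace in the post above can be replayed with a small clock-by-clock model. This is a minimal sketch, not a cycle-accurate 8088: the function name `run_clocks` and its parameters are made up for illustration, and it assumes that the bus fetches one byte every four clocks whenever the queue has room, that an instruction starts only once all of its bytes are in the queue, and that the queue is full when the sequence begins. It only models that full-queue starting point.

```python
def run_clocks(instructions, queue_bytes=4, capacity=4, fetch_clocks=4):
    """Clocks needed to run `instructions` on a simplified bus-bound CPU.

    instructions: list of (length_in_bytes, execute_clocks) tuples.
    queue_bytes:  bytes already sitting in the prefetch queue at clock 0.
    """
    remaining_to_fetch = sum(length for length, _ in instructions) - queue_bytes
    pending = list(instructions)
    clock = 0
    exec_left = 0    # clocks left on the instruction currently executing
    fetch_left = 0   # clocks left on the byte currently being fetched

    while pending or exec_left:
        # Start the next instruction once the CPU is free and its bytes are queued.
        if exec_left == 0 and pending and queue_bytes >= pending[0][0]:
            length, exec_left = pending.pop(0)
            queue_bytes -= length
        # Start a new bus fetch if the bus is idle and the queue has room.
        if fetch_left == 0 and remaining_to_fetch > 0 and queue_bytes < capacity:
            fetch_left = fetch_clocks
        # Advance one clock.
        clock += 1
        if exec_left:
            exec_left -= 1
        if fetch_left:
            fetch_left -= 1
            if fetch_left == 0:      # fetched byte lands in the queue
                queue_bytes += 1
                remaining_to_fetch -= 1
    return clock


if __name__ == "__main__":
    shl_ax_1 = (2, 2)   # SHL AX,1: two bytes long, two clocks to execute
    print(run_clocks([shl_ax_1] * 4))
```

Under these assumptions the four SHL AX,1 instructions finish in 18 clocks, matching the best-case total in the post, against the 8 clocks that the instruction timings alone would suggest. The difference is entirely time spent waiting for instruction bytes to arrive.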
rob@prism.TMC.COM (11/21/89)
> A major problem is that the 8088 and V20 are bus-bound. Any instruction
> that executes in less than four clock cycles per byte will drain the
> four-byte instruction prefetch queue.

This is true for many code sequences, though not all. The problem is
that, on the 8088 at least, many instructions take longer than 4
clocks/byte. In general, register intensive code (like the SHL AX,1 in
your example) is bus bound, while memory intensive code, and some register
arithmetic, is compute bound. 'Typical' code falls somewhere in between,
though the bus is still a bottleneck. The V20, with its faster effective
address calculation, is more likely to run up against the bus limit when
accessing memory, so your point about bus bandwidth being a limiting
factor is valid.

One of SI's problems is that it's entirely compute bound on an 8086/88.
It shows no difference between an 8088 and an 8086 running at the same
clock speed. This is because SI spends about 2/3 of its time on IMUL or
IDIV instructions, which are uninfluenced by bus bandwidth. Thus the way
to speed up a CPU's SI rating is to speed up its multiplies and divides.
Of course, since most code doesn't contain many IMULs or IDIVs, a CPU that
speeds those instructions up more than it speeds up the more common ones
will do better at SI than it will in 'real life'. That's why SI overstates
the performance of many CPUs so drastically.
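The gap between SI and real code can be put into rough numbers with Amdahl's-law-style arithmetic. This is a sketch under stated assumptions: the 2/3 share of time in IMUL/IDIV is the figure given above, the 4x multiply/divide speedup is the figure quoted earlier in the thread from NEC's timings, everything else is assumed to run at unchanged speed, and the 5% share for 'typical' code is a made-up illustrative value. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction, factor):
    """Whole-program speedup when `fraction` of the run time gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    # SI-like loop: about 2/3 of the time in IMUL/IDIV, sped up 4x.
    si_like = overall_speedup(2 / 3, 4)
    # Hypothetical 'typical' program: 5% of the time in multiplies and divides.
    typical = overall_speedup(0.05, 4)
    print(f"SI-like loop: {si_like:.2f}x")
    print(f"typical code: {typical:.2f}x")
```

With these inputs the SI-like loop comes out at about 2.0x, in the same neighbourhood as the 1.8 reported in the thread, while the hypothetical 5% program gains only about 4% from faster multiplies and divides alone. Any real-world gain beyond that has to come from the other improvements discussed in the thread, such as faster effective address calculation.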