nelson@bolyard.wpd.sgi.com (Nelson Bolyard) (10/24/90)
In my original posting, I described my application, which runs in
Turbo-C's "tiny" model (CS==DS==ES==SS, all fits in 64K bytes) but makes
intensive use of LONG integers. Performance with 8086 instructions was
bad. I was looking for a way to use the 32-bit "extended" registers of
the 386, in a DOS environment, without a "DOS extender" if possible.
I wanted speed, not more memory.

After my summary of replies, I will explain the final results, and how
I got them. The final results were 4.3 times faster without any extender.

I got 32 responses, including some from people who don't read this
newsgroup (comp.os.msdos.programmer) but instead are on the
386-Users@udel.edu mailing list maintained by Bill Davidsen, who is also
the moderator of comp.binaries.ibm.pc.

About half the responses simply told me about various extenders. Since
(it turned out) an extender is not necessary for my purposes, I have not
included any of those responses. There were far too many responses to
repeat them all here, so I will truly summarize them.

I asked 4 questions, which were:

1. Do I need a dos extender to be able to use the 32-bit registers?

>No. The 32 bit register instructions are available via an escape code byte
>on each instruction (not as efficient as running in protected mode, but
>still much better than doing 16 bit arithmetic).

>No. The 32-bit registers are available in both "real" and "virtual-86"
>modes, which DOS would use. A dos-extender is primarily used to allow
>you to write your program to use the "protected" mode of the processor.
>The 32 bit registers can be accessed simply by putting a "size" prefix
>byte in front of the instruction: The prefix byte causes what would
>normally be a 16 bit register access to become a 32 bit register access
>instead. (In 386 protected mode, the default access width can either
>be 16 bit or 32 bit: the prefix is used to access the "other" size,
>whichever that is.)

>The 386 is a strange chameleon beast.
>All the addressing stuff that used
>to be 16 bits can be made 32 bits instead by putting an override byte in
>front of an individual instruction. Similarly but separately, all of the
>operand stuff that used to be 16 bits can be made 32 bit, by putting an
>override byte in front of an individual instruction.

It turns out that TASM puts the necessary prefix bytes into the
instructions automatically when you use the .386 directive and the
segment is declared as a "USE16" segment. All the 32-bit addressing
modes of the 386 work in "real mode", provided that the most significant
16 bits of the address are zero. A few edits were required. I'll show
this below.

2. Where can I get a C compiler that will generate code that uses the
   32-bit 386 registers efficiently, but still use DOS for I/O?

The responses to this question fell into three categories:

a. >I don't know. I'd like to get one of these myself.

b. >There are three compilers I know of that generate 32-bit code that
   >you run under a DOS extender: Zortech, Watcom, and MetaWare.

c. Four people suggested that MetaWare had a compiler that would
   generate code that would use 32-bit extended registers without an
   extender. I looked into this thoroughly, with results described below.

3. Turbo C has a compiler option to generate 186/286 instructions, that
   I've never used. What does that do for me?

>There are a few additional available instructions such as push immediate,
>push/pop all, shift by other than 1 bit, multiply immediate. Turn the option
>on, it does reduce the amount of code.

>That uses certain instructions that are available on the 186/286 but
>are not available on 8088's or 8086's. These are primarily stack frame
>instructions such as "PUSH constant", and "RET n". These "extra"
>instructions are available in both "real" and "protected" modes of any
>intel 80186 or better processor (and on the V20, as well).
>Because C spends a lot of time fiddling with stacks and stack frames,
>this can make some improvement in both size and speed. It isn't that
>significant an improvement, though, and reduces the portability of the
>binary.

>It does almost nothing for you. You sacrifice compatibility with 8086/8
>machines for only slightly better code size and slightly better speed.
>The real gain would be from an ability to use the 32 bit registers on
>the '386, and no compiler vendor that I know of offers this. I had
>high hopes for the recent C compiler from Topspeed, whose ads mentioned
>an ability to generate '386 specific code. But a phone conversation
>with their technical staff revealed a singular lack of knowledge about
>this reputed ability, and I won't buy a pig in a poke.

In short, the 186 code option wasn't the solution.

4. Do you have any suggestions?

>Get your compiler to output assembler (-Fa with Microsoft C) and
>hack the assembler to use 32 bit instructions.
>Or buy the 32 bit compiler and runtime (works great but costs $900).
>OR just program in assembler.
>I have done all three of the above at various times.
>I think it is just plain stupid that nobody makes a C compiler that
>runs in 8086 mode but uses 32 bit [registers for] ints!!

>you can write the code in assembler using TASM which understands
>the 32 bit instruction mnemonics.

>Bug your compiler's vendor. I have done so, with little effect. I bet
>others have done so too, with little effect. If enough people want it,
>maybe they'll do it. For my purposes (and I suspect for many other
>people's purposes, too), the ability of a compiler to generate code
>specific for the '386 in full generality is not necessary. A simple
>change that would simply add inline support for 32 bit arithmetic in
>32 bit registers would go a long way towards satisfying me. I am
>willing to accept the code size penalty (one byte per 32 bit instruction),
>especially since one instruction will often do the work of many 16 bitters.
>It makes me mad that the compiler vendors just won't do this. I would
>guess that the ability to generate '286 code is bolted onto existing
>compilers by peephole optimizing 8086 code, and not by a separate code
>generator. The '386 code that I need generated for C involves long
>arithmetic and long comparisons, something that a '386 supports almost
>directly, even in 16 bit mode. This could all be implemented via
>peephole optimization, just as the '286 code generation is. I don't
>need 32 bit addressing; I don't insist that a variable of type "int"
>be more than 16 bits long. I just want "long" variables to be treated
>as efficiently as they can be by the CPU.

>Cheapest try: If the speed problems are in just a few hot spots, just
>hand-code them in assembler using TASM or whatever so you can generate the
>32-bit instructions using prefix-byte overrides. The prefix bytes won't
>slow you down much, at a guess. This will be totally dos-portable, cheap,
>compatible, and probably go as fast as you can go.

Actual solution (results of my investigations):

At the suggestion of several respondents, I contacted Watcom and
MetaWare (makers of the "HIGH C" compiler).

Watcom said it was necessary to use a DOS extender to use any code
generated by their 386 compilers. They had no product that would
generate 386 code that would run without an extender.

MetaWare makes a 386 compiler whose code must be run with an extender,
and an 8086 compiler with a "386 flag" that causes it to generate code
that uses some (a very few) 386 instructions. They offered to take some
of my C code and compile it with 3 compilers (8086, 8086 with 386 flag,
386 protected mode). So I sent them one large routine, and after a
while they sent back the three .ASM files. I publicly thank them right
here for doing that.

I had high hopes for the "8086 compiler with 386 flag". Unfortunately,
the differences between the code generated by the 8086 and 8086 with
386-flag compilers seemed minor.
The 386-flag compiler did not use the 32-bit registers for long
arithmetic, but instead used the AX and DX registers for 32-bit
operations, just as the 8086 compiler does. Here are some sample
side-by-side code differences between the two .ASM files:

386-flag code               8086 code (no 386 flag)
---------------------------------------------------
or -4[bp],dx                or -4[bp],dx
or -2[bp],ax                or -2[bp],ax
movzbw ax,2[si]           | mov al,2[si]
cwd                       | sub ah,ah
                          > sub dx,dx
---------------------------------------------------
mov cx,8                  <
.L002c:                   | mov dh,dl
shl ax,1                  | mov dl,ah
rcl dx,1                  | mov ah,al
loop 002c                 | sub al,al
---------------------------------------------------
                          > mov dl,dh
                          > mov dh,bl
                          > mov bl,bh
                          > sub bh,bh
mov cx,14                 | mov cx,6
.L0109:                     .L0109:
shr bx,1                    shr bx,1
rcr dx,1                    rcr dx,1
loop 0109                   loop 0109
---------------------------------------------------

The code produced by the 386 compiler was, by comparison, delightful to
read. It used 32-bit registers both for addressing and for integer
arithmetic.

The idea occurred to me to take the 386 protected mode code and assemble
it with TASM for linking with other "real mode" code, after editing it
slightly. That idea worked.

I took the sample 386 assembly code that MetaWare sent me, and made a
few edits, shown below.

****
OLD:    extrn   _mwargstack     ; unreferenced
OLD:    CGROUP  group   _text
OLD:    _text   segment
NEW:    .386                    ; tell TASM to generate 386 code
NEW:    _TEXT   segment DWORD PUBLIC USE16 'CODE'
NEW:    DGROUP  group   _TEXT   ; Turbo C segment naming conventions
NEW:            assume  cs:_TEXT,ds:DGROUP,ss:DGROUP
Explanation: USE16 tells TASM to generate the prefix bytes needed to run
in real mode whenever it encounters extended registers (e.g. eax).

****
OLD:    _f386   proc    near
NEW:    _f386   proc    far
Explanation: see (*) below.

****
OLD:    shr     eax
OLD:    shl     edx
NEW:    shr     eax,1
NEW:    shl     edx,1
Explanation: add missing ",1" to all one bit shifts.

****
OLD:    dec     -8[ebp]
NEW:    dec     dword ptr -8[ebp]
Explanation: dword pointer is not the default in TASM.

****
OLD:    leave
NEW:    mov     esp,ebp
NEW:    pop     ebp
Explanation: I could not get TASM to put the necessary prefix byte in
front of the leave instruction, so I coded the equivalent two
instructions.

****
OLD:    _text   ends
NEW:    _f386   endp            ; missing
NEW:    _TEXT   ends            ; Turbo C naming conventions

****
(*) Explanation: The protected mode code assumes all parameters in the
stack are 32 bits. Near pointers are 32-bit segment offsets. The (near)
return address is 32 bits. All the stack offsets (e.g. 12[ebp]) in the
code are computed with these assumptions. I didn't want to go through
all the assembler code and change all the stack offsets, so I changed
the way this procedure is called from Turbo C, to make sure it matched
the conventions. To do that, I made it a far procedure (even though
it's in the tiny model). That ensured that 32 bits of return address
got pushed.

I made the following change in the Turbo C declaration of the function
so that the parameters on the stack would be as expected by the
assembler code.

#ifdef old  /* before converting to 386 asm code */
int func1( unsigned char * p1, unsigned char * p2,
           unsigned long * p3, int p4);
#else       /* code to match 386 asm calling conventions */
long far f386( unsigned long p1, unsigned long p2,
               unsigned long p3, long p4);
#define n2ul(a)         (unsigned long) FP_OFF( &a[0] )
#define func1(a,b,c,d)  f386( n2ul(a), n2ul(b), n2ul(c), d )
#endif

Summary: Generate 32-bit code that will run under DOS, and can be linked
with Turbo C code in the tiny and small models, as follows:

1. Compile with protected mode 386 compiler into a .ASM file.
2. Edit the .ASM file, changing segment naming conventions (especially
   USE16), pointer sizes, far proc, and the leave instruction.
3. Assemble with TASM (or MASM, I suppose).
4. Link with other code.

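The stack-offset reasoning in the (*) explanation can be checked with a
little arithmetic. What follows is only a sketch in plain portable C
(nothing in it is Turbo C specific, and it is not the code from the
routine itself). It models how many bytes each kind of call pushes and
computes where each parameter would sit relative to EBP. The names
call_model, frame_offsets, same_layout, pm386_near, tc_near and tc_far
are made up for illustration. It assumes the generated routine starts
with "push ebp / mov ebp,esp" (so 4 bytes of saved EBP), and that
parameters are pushed C style so the first one lies nearest the return
address.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NPARAMS   4
#define SAVED_EBP 4  /* assumption: prologue is "push ebp / mov ebp,esp" */

/* Hypothetical model of one calling convention; all sizes in bytes. */
struct call_model {
    const char *name;
    int ret_addr;        /* return address pushed by the CALL */
    int param[NPARAMS];  /* size of each parameter, first to last */
};

/* 386 protected mode near call: 32-bit return address, 32-bit parameters. */
static const struct call_model pm386_near =
    { "386 protected mode, near call", 4, { 4, 4, 4, 4 } };

/* Turbo C tiny model, original prototype: near call, near pointers, int. */
static const struct call_model tc_near =
    { "Turbo C tiny, near call, old prototype", 2, { 2, 2, 2, 2 } };

/* Turbo C tiny model, far call (CS:IP = 4 bytes) with four long parameters. */
static const struct call_model tc_far =
    { "Turbo C tiny, far call, long parameters", 4, { 4, 4, 4, 4 } };

/* Fill out[i] with the offset of parameter i from EBP. */
static void frame_offsets(const struct call_model *m, int out[NPARAMS])
{
    int off, i;

    assert(m != NULL);
    off = SAVED_EBP + m->ret_addr;  /* first parameter follows these */
    for (i = 0; i < NPARAMS; i++) {
        out[i] = off;
        off += m->param[i];
    }
}

/* Return 1 if both models put every parameter at the same EBP offset. */
static int same_layout(const struct call_model *a, const struct call_model *b)
{
    int oa[NPARAMS], ob[NPARAMS];

    frame_offsets(a, oa);
    frame_offsets(b, ob);
    return memcmp(oa, ob, sizeof oa) == 0;
}

/* Print the offsets in the same notation the .ASM file uses, e.g. 12[ebp]. */
static void print_layout(const struct call_model *m)
{
    int off[NPARAMS], i;

    frame_offsets(m, off);
    printf("%s:", m->name);
    for (i = 0; i < NPARAMS; i++)
        printf(" p%d=%d[ebp]", i + 1, off[i]);
    printf("\n");
}
```

Under these assumptions the protected mode model and the far call with
long parameters both place the parameters at offsets 8, 12, 16 and 20
from EBP, while the original near call prototype would place them at
6, 8, 10 and 12. That is why declaring the routine far and widening the
parameters avoids editing any of the offsets in the assembler code.
Calling print_layout() on each of the three models from a main() shows
the same comparison.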
I want to thank the following respondents for their comments:

> 6600m00@nucsbuxa.ucsb.edu (Rob)
> Anthony Scian <afscian@watmsg.waterloo.edu>
> Jeff Prothero <jsp@milton.u.washington.edu>
> Mark Alexander <alexande@dri.com>
> Norbert Schlenker <nfs@Princeton.EDU>
> g9023690@wolfen.cc.uow.edu.au (Phillip Secker)
> jme@pacer.Pacer.COM (John Eikanger)
> jpn@genrad.com (John P. Nelson)
> mcdonald@aries.scs.uiuc.edu (Doug McDonald)
> ralerche@lindy.Stanford.EDU (Robert A. Lerche)
> shaban@bu-pub.bu.edu (Marwan Shaban)
> toma@tekgvs.labs.tek.com (Tom Almy)
> uchida@flab.fujitsu.co.jp (Yoshiaki Uchida)

-----------------------------------------------------------------------------
Nelson Bolyard      nelson@sgi.COM      {decwrl,sun}!sgi!whizzer!nelson
Disclaimer: Views expressed herein do not represent the views of my employer.
-----------------------------------------------------------------------------