david@bdt.UUCP (07/10/88)
Has anyone looked into speeding up compress on the 286? Under Microport System V/AT compress runs really slooow. On my 10MHz 286 with (only) 2MB RAM a 60K test file generally gives times of:

	20.0u 1.0s

or thereabouts. I assume the slowness is mostly due to the hack to "simulate" larger than 64K arrays (which Xenix and Microport don't handle!). My particular problem may be related to swapping, in which case the speeds might be better if I had more RAM.

Does anybody else suffer similar performance problems and if so, has anybody looked into speeding it up?
-- 
David Beckemeyer (david@bdt.uucp)      | "Yea I've got medicine..." as the
Beckemeyer Development Tools           | cookie cocks his Colt, "and if
478 Santa Clara Ave, Oakland, CA 94610 | you don't keep your mouth shut, I'm
UUCP: {unisoft,sun}!hoptoad!bdt!david  | gonna give you a big dose of it!"
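For readers who haven't seen the workaround David mentions: since no single array may exceed one 64K segment here, a big table is split into several smaller arrays and every index is turned into a (chunk, offset) pair. The C below is a minimal sketch of that general technique, not compress's actual code; the names (c0..c8, chunk, table_at), the 8192-entry chunk size and the 69001-entry table size (assumed to match compress's 16-bit hash table) are all hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of the split-array workaround: a 69001-entry
   table of 4-byte longs (about 270K bytes) cannot be one array on a 286,
   so it is kept as several separate 8192-entry arrays (32K bytes each,
   small enough for one 64K segment) reached through a table of pointers. */
#define CHUNK_SHIFT 13                    /* 8192 entries per chunk */
#define CHUNK_SIZE  (1L << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1)
#define TABLE_SIZE  69001L
#define NCHUNKS     9                     /* 9 * 8192 = 73728 >= 69001 */

static int32_t c0[CHUNK_SIZE], c1[CHUNK_SIZE], c2[CHUNK_SIZE],
               c3[CHUNK_SIZE], c4[CHUNK_SIZE], c5[CHUNK_SIZE],
               c6[CHUNK_SIZE], c7[CHUNK_SIZE], c8[CHUNK_SIZE];
static int32_t *const chunk[NCHUNKS] = { c0, c1, c2, c3, c4, c5, c6, c7, c8 };

/* Every access costs a shift, a mask and a pointer load; on a 286 the
   pointer load typically also means reloading a segment register. */
static int32_t *table_at(long i)
{
    assert(i >= 0 && i < TABLE_SIZE);
    return &chunk[i >> CHUNK_SHIFT][i & CHUNK_MASK];
}
```

Each lookup pays for work a flat array index would not need, which is one plausible contributor to the slowdown, alongside the long shifts discussed in the replies.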
hutch@hubcap.UUCP (David Hutchens) (07/13/88)
From article <347@bdt.UUCP>, by david@bdt.UUCP (David Beckemeyer):
>
> Has anyone looked into speeding up compress on the 286? Under
> Microport System V/AT compress runs really slooow. On my 10MHz
> 286 with (only) 2MB RAM a 60K test file generally gives times of:
>
> 	20.0u 1.0s
>
> or thereabouts. I assume the slowness is mostly due to the hack
> to "simulate" larger than 64K arrays (which Xenix and Microport don't
> handle!). My particular problem may be related to swapping, in which
> case the speeds might be better if I had more RAM.
>
> Does anybody else suffer similar performance problems and if so, has
> anybody looked into speeding it up?
> --

I don't know about Microport, but I have found that a LOT of time is spent doing long shifts on my Xenix system when I use a 16-bit compress. This is partly because the C compiler generates a call to a routine to do long shifts. What is worse, they coded the routine to be space-efficient rather than time-efficient: it uses a total of 3 or 4 286 instructions and loops through them once for each bit you wish to shift, i.e. it shifts one bit each time through the loop. I found that I could write my own routine, using a grand total of 50 more bytes or so, and in doing so I decreased the time required to do a 16-bit compress by about 30%!

I don't have the code in front of me, but the basic idea was to use the 16-bit shift instructions and OR together the appropriate results. I suspect that for 1- and possibly 2-bit shifts the provided routine is faster, but compress does a lot of shifts of 10 bits or more, and with these, my routine wins by a BIG margin.

David Hutchens
hutch@hubcap.clemson.edu
...!gatech!hubcap!hutch
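The two strategies David contrasts can be sketched in portable C, with hi and lo standing in for the high and low 16-bit halves of a long. This is a minimal illustration under that assumption, not the compiler's helper or David's routine; shr_bit_loop and shr_by_halves are hypothetical names, and only the unsigned right shift is shown.

```c
#include <assert.h>
#include <stdint.h>

/* hi and lo stand in for the register pair holding a 32-bit value. */

/* Space-efficient style: move the whole value one bit per trip round the
   loop, so a 16-bit shift costs 16 iterations. */
static uint32_t shr_bit_loop(uint16_t hi, uint16_t lo, unsigned n, unsigned *iterations)
{
    assert(n < 32);
    *iterations = 0;
    while (n-- > 0) {
        lo = (uint16_t)((lo >> 1) | ((hi & 1u) << 15)); /* bit 0 of hi drops into bit 15 of lo */
        hi = (uint16_t)(hi >> 1);
        ++*iterations;
    }
    return ((uint32_t)hi << 16) | lo;
}

/* Time-efficient style: at most three 16-bit shifts and an OR, for any count. */
static uint32_t shr_by_halves(uint16_t hi, uint16_t lo, unsigned n)
{
    assert(n < 32);
    if (n >= 16)                                   /* everything left comes from hi */
        return (uint32_t)(hi >> (n - 16));
    /* Bits leaving hi enter the top of lo; for n == 0 that shift is by 16
       and, after truncation to 16 bits, contributes nothing. */
    lo = (uint16_t)((lo >> n) | ((uint32_t)hi << (16 - n)));
    hi = (uint16_t)(hi >> n);
    return ((uint32_t)hi << 16) | lo;
}
```

Both give the same result; the difference is that the loop's cost grows with the shift count while the split version's does not, which is why the gain shows up on the 10-bit-and-larger shifts compress does.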
wes@obie.UUCP (Barnacle Wes) (07/14/88)
In article <347@bdt.UUCP>, david@bdt.UUCP (David Beckemeyer) writes:
> Has anyone looked into speeding up compress on the 286? Under
> Microport System V/AT compress runs really slooow.

Mine, too. That's why I stopped running compress: I just get everything batched but uncompressed. It's slower, but fast enough, and works well.

> ................ I assume the slowness is mostly due to the hack
> to "simulate" larger than 64K arrays (which Xenix and Microport don't
> handle!). My particular problem may be related to swapping, in which
> case the speeds might be better if I had more RAM.

It might be, but I upgraded my system from 1 meg to 3, and it really didn't help much. The 16-bit compress is right near the limit for process size on V/AT, and it seems to swap a lot regardless of how much memory you have. You might want to look at the 13-bit compress; I understand it is much faster, especially on brain-dead architectures like the '286.
-- 
{hpda, uwmcsd1}!sp7040!obie!wes
"Happiness lies in being privileged to work hard for long hours in doing
 whatever you think is worth doing."  -- Robert A. Heinlein
--
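Wes's point about process size can be put into rough numbers. The sketch below assumes the hash-table sizes from compress 4.0 (HSIZE 69001 for 16 bits down to 5003 for 12 bits), 4-byte entries in htab and 2-byte entries in codetab; the struct and function names are hypothetical. Under those assumptions the two tables need about 400K bytes at 16 bits, and 13 bits is the widest setting whose hash table still fits in one 64K segment, which would let it skip the split-array indexing entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: hash-table sizes per maximum code width as in compress 4.0. */
struct table_size { int bits; long hsize; };
static const struct table_size sizes[] = {
    { 16, 69001L }, { 15, 35023L }, { 14, 18013L }, { 13, 9001L }, { 12, 5003L },
};
#define NSIZES (sizeof sizes / sizeof sizes[0])

/* Assumption: htab holds 4-byte longs and codetab holds 2-byte shorts. */
static long htab_bytes(long hsize)  { return hsize * 4L; }
static long table_bytes(long hsize) { return hsize * (4L + 2L); }

#define SEGMENT_BYTES 65536L            /* largest object in one 286 segment */

/* Widest code size whose htab still fits in a single segment,
   i.e. the widest setting that needs no split-array hack. */
static int widest_single_segment_bits(void)
{
    size_t k;
    for (k = 0; k < NSIZES; k++)        /* sizes[] is ordered widest first */
        if (htab_bytes(sizes[k].hsize) <= SEGMENT_BYTES)
            return sizes[k].bits;
    return -1;
}
```

With these figures the 16-bit tables take 414006 bytes against 54006 bytes at 13 bits, a difference that matters far more on a small-segment, swapping 286 than on a larger machine.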
jsilva@cogsci.berkeley.edu (John Silva) (07/16/88)
I just finished hacking compress to be MUCH faster on my AT system (SCO 2.2.0g) by replacing the original 32-bit shift routines with a set of hand-coded routines. I managed to speed up compress by about 24%! (16-bit compressions spend most of their time shifting long integers around, and the Microsoft compiler uses a one-bit-at-a-time routine for 32-bit shifts.)

If anyone would like a copy of these routines (two 8086 asm sources), I would be happy to mail them. However, keep in mind that they may not function correctly on flavors of Xenix other than SCO.

John P. Silva --- UUCP: ucbvax!cogsci!jsilva
DOMAIN: jsilva@cogsci.berkeley.edu
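Why does a 16-bit compress shift longs so much? Assuming the key construction from compress 4.0's main loop (which the posts don't show), each input byte is combined with the current prefix code into one long hash key by shifting the byte up past the widest code. The sketch below reproduces that idea with hypothetical function names; it is an illustration, not compress's source.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a hash key for "prefix code ent followed by byte c": the byte is
   shifted up past the widest code and the prefix code is added underneath.
   With maxbits = 16 the shift is 16 places, so on a 16-bit machine every
   input byte costs a call to the long-shift helper. */
static int32_t make_fcode(int c, int ent, int maxbits)
{
    assert(c >= 0 && c < 256);
    assert(maxbits > 0 && maxbits <= 16 && ent >= 0 && ent < (1L << maxbits));
    return ((int32_t)c << maxbits) + ent;
}

/* The key is reversible: the high bits give the byte, the low bits the prefix code. */
static int fcode_char(int32_t fcode, int maxbits)   { return (int)(fcode >> maxbits); }
static int fcode_prefix(int32_t fcode, int maxbits) { return (int)(fcode & ((1L << maxbits) - 1)); }
```

With a one-bit-per-iteration helper, that single shift means 16 trips through the helper's loop per input byte, close to a million for a 60K file, before counting any other long shifts.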
hutch@hubcap.UUCP (David Hutchens) (07/26/88)
New improved version, now with assembly source. Earlier I wrote:
>
> I don't know about Microport, but I have found that a LOT of time
> is spent doing long shifts on my Xenix system when I use a 16-bit
> compress. This is partly because the C compiler generates a call
> to a routine to do long shifts. What is worse, they coded the
> routine to be space-efficient rather than time-efficient: it uses
> a total of 3 or 4 286 instructions and loops through them once for
> each bit you wish to shift, i.e. it shifts one bit each time through
> the loop. I found that I could write my own routine, using a grand
> total of 50 more bytes or so, and in doing so I decreased the time
> required to do a 16-bit compress by about 30%!
>
> I don't have the code in front of me, but the basic idea was to use
> the 16-bit shift instructions and OR together the appropriate results.
> I suspect that for 1- and possibly 2-bit shifts the provided routine
> is faster, but compress does a lot of shifts of 10 bits or more, and
> with these, my routine wins by a BIG margin.

I received several replies requesting the source. Again, I must caution that these routines are designed to work with Microsoft Xenix 2.0; I have no idea whether they work with any other compiler/OS.

It turns out that the Microsoft compiler uses a non-standard calling sequence for its own built-in routines, including the long shift operations. These routines assume that the number to be shifted is in the AX (low-order word) and DX (high-order word) registers at entry (that is where the Microsoft C compiler I'm using puts it), and they leave the result in the same register pair. They assume that the number of bits to be shifted is in the CL register. They destroy the CH register (I'm not positive this is really safe, but it works for the programs I have tried!).

I assemble the following with 'as' and link it with the compress source. Best of luck. Remember to test it well before giving it any trust.
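To make that register convention concrete, here is a C model of the __lshr routine from the listing, with a struct standing in for AX, DX, CL and CH so each instruction's effect can be checked on a host machine. It is a sketch based on reading the listing, not Microsoft's helper, and the names regs, sar16 and lshr_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register image for the convention: value in DX (high) : AX (low),
   count in CL, CH used as scratch. */
typedef struct { uint16_t ax, dx; uint8_t cl, ch; } regs;

/* Portable 16-bit arithmetic right shift (what SAR does), n in 0..15. */
static uint16_t sar16(uint16_t x, unsigned n)
{
    return (x & 0x8000u) ? (uint16_t)~((uint16_t)~x >> n) : (uint16_t)(x >> n);
}

/* C model of __lshr (signed long >> CL) for counts 0..31, step by step. */
static void lshr_model(regs *r)
{
    uint16_t t, saved_dx;
    assert(r->cl < 32);
    if (r->cl > 15) {                                   /* cmp cl,15 / jle */
        r->cl = (uint8_t)(r->cl - 16);                  /* sub cl,16 */
        t = r->ax; r->ax = r->dx; r->dx = t;            /* xchg ax,dx */
        r->ax = sar16(r->ax, r->cl);                    /* sar ax,cl */
        r->dx = (r->ax & 0x8000u) ? 0xFFFFu : 0;        /* cwd: sign of AX into DX */
        return;
    }
    r->ch = r->cl;                                      /* mov ch,cl (CH clobbered) */
    saved_dx = r->dx;                                   /* push dx */
    r->ax = (uint16_t)(r->ax >> r->cl);                 /* shr ax,cl */
    r->cl = (uint8_t)(16 - r->cl);                      /* sub cl,16 / neg cl */
    r->dx = (uint16_t)((uint32_t)r->dx << r->cl);       /* shl dx,cl (by 16 clears DX) */
    r->ax |= r->dx;                                     /* or ax,dx */
    r->dx = saved_dx;                                   /* pop dx */
    r->cl = r->ch;                                      /* mov cl,ch */
    r->dx = sar16(r->dx, r->cl);                        /* sar dx,cl */
}
```

After a call with a count below 16, CH holds the count, which is the side effect the post warns about.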
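David Hutchens
hutch@hubcap.clemson.edu
...!gatech!hubcap!hutch
---------- CUT HERE -----------
; Static Name Aliases
;
	TITLE	shift
	.287
_TEXT	SEGMENT  BYTE PUBLIC 'CODE'
_TEXT	ENDS
CONST	SEGMENT  WORD PUBLIC 'CONST'
CONST	ENDS
_BSS	SEGMENT  WORD PUBLIC 'BSS'
_BSS	ENDS
DGROUP	GROUP	CONST, _BSS
	ASSUME	CS: _TEXT, DS: DGROUP, SS: DGROUP, ES: DGROUP
_TEXT	SEGMENT

; __lshr: signed long shift right.  In: DX:AX = value, CL = count.
; Out: DX:AX = result.  Destroys CH.
	PUBLIC	__lshr
__lshr	PROC FAR
	cmp	cl,15
	jle	$LSRSMALL
	sub	cl,16		; count >= 16: only the high word matters
	xchg	ax,dx
	sar	ax,cl
	cwd			; fill DX with the sign of AX
	ret
$LSRSMALL:
	mov	ch,cl		; save count
	push	dx
	shr	ax,cl		; low word >> count
	sub	cl,16
	neg	cl		; CL = 16 - count
	shl	dx,cl		; bits that fall out of the high word...
	or	ax,dx		; ...go into the top of the low word
	pop	dx
	mov	cl,ch
	sar	dx,cl		; high word >> count, keeping the sign
	ret
__lshr	ENDP

; __ulshr: unsigned long shift right.  Same registers as __lshr.
	PUBLIC	__ulshr
__ulshr	PROC FAR
	cmp	cl,15
	jle	$ULSRSMALL
	sub	cl,16
	xchg	ax,dx
	shr	ax,cl
	sub	dx,dx		; high word becomes zero
	ret
$ULSRSMALL:
	mov	ch,cl
	push	dx
	shr	ax,cl
	sub	cl,16
	neg	cl
	shl	dx,cl
	or	ax,dx
	pop	dx
	mov	cl,ch
	shr	dx,cl
	ret
__ulshr	ENDP

; __lshl: long shift left.  Same registers as __lshr.
	PUBLIC	__lshl
__lshl	PROC FAR
	cmp	cl,15
	jle	$LSLSMALL
	sub	cl,16		; count >= 16: low word moves to the high word
	mov	dx,ax
	shl	dx,cl
	sub	ax,ax
	ret
$LSLSMALL:
	mov	ch,cl
	push	ax
	shl	dx,cl		; high word << count
	sub	cl,16
	neg	cl		; CL = 16 - count
	shr	ax,cl		; bits that leave the low word...
	or	dx,ax		; ...enter the bottom of the high word
	pop	ax
	mov	cl,ch
	shl	ax,cl		; low word << count
	ret
__lshl	ENDP

_TEXT	ENDS
END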
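In the spirit of "test it well", one cheap check is to model a routine in C and compare it against the host compiler's own 32-bit shifts for every count. Below, lshl_model and lshl_model_matches (hypothetical names) follow the __lshl listing instruction by instruction; this checks the algorithm, not the assembled code or the calling convention.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the __lshl listing: DX:AX << CL for counts 0..31.
   Casting to uint32_t before shifting mimics the 286, where shifting a
   16-bit register by 16 simply leaves it zero. */
static uint32_t lshl_model(uint16_t dx, uint16_t ax, unsigned cl)
{
    uint16_t out_dx, out_ax;
    assert(cl < 32);
    if (cl > 15) {                                        /* cmp cl,15 / jle */
        out_dx = (uint16_t)((uint32_t)ax << (cl - 16));   /* mov dx,ax / shl dx,cl */
        out_ax = 0;                                       /* sub ax,ax */
    } else {
        out_dx = (uint16_t)((uint32_t)dx << cl);          /* shl dx,cl */
        out_dx |= (uint16_t)((uint32_t)ax >> (16 - cl));  /* shr ax,16-cl / or dx,ax */
        out_ax = (uint16_t)((uint32_t)ax << cl);          /* pop ax / shl ax,cl */
    }
    return ((uint32_t)out_dx << 16) | out_ax;
}

/* Returns 1 if the model agrees with the host's native 32-bit shift
   for every count from 0 to 31. */
static int lshl_model_matches(uint32_t v)
{
    unsigned n;
    for (n = 0; n < 32; n++)
        if (lshl_model((uint16_t)(v >> 16), (uint16_t)v, n) != (uint32_t)(v << n))
            return 0;
    return 1;
}
```

The same pattern (model, then compare against native shifts over all counts and a few awkward values such as 0x80000001) applies equally to __lshr and __ulshr.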