[comp.unix.xenix] Compress Speedup Kit for 286

jsilva@cogsci.berkeley.edu (John Silva) (07/20/88)

Due to the number of requests I have received for these routines, I have
decided to post them.  Please keep in mind that they were written for
286 based SCO V2.2.0g, but they may work on your implementation of Xenix.

The best way to find out is to try them and see if they crash anything.

Hacking around with adb, I have discovered quite a few library routines
which use 32 bit shifts, including _doprint (the kernel of the printf
routines) and even the 32 bit math functions.
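
For anyone who wants to see the idea before unpacking the archive, here is
a rough C model of what the two routines are meant to compute.  It is only
a sketch, not the code in the archive: the names shift_right_32,
shift_left_32 and sar16 are made up for illustration, the fixed-width types
from <stdint.h> are a convenience the assembler source does not use, and it
assumes a shift count of 0 to 31.  The long is split into two 16 bit halves
and each half is shifted on its own, as the Readme describes.

```c
#include <stdint.h>

/* Arithmetic right shift of one 16 bit word, n in 0..15.  Done with
   unsigned math so the result does not depend on how the C compiler
   shifts negative values. */
static uint16_t sar16(uint16_t w, unsigned n)
{
    uint16_t r = (uint16_t)(w >> n);
    if ((w & 0x8000u) && n > 0)
        r |= (uint16_t)(0xFFFFu << (16 - n));   /* refill with sign bits */
    return r;
}

/* Signed (arithmetic) 32 bit right shift built from 16 bit shifts. */
uint32_t shift_right_32(uint32_t value, unsigned count)
{
    uint16_t lo = (uint16_t)(value & 0xFFFFu);
    uint16_t hi = (uint16_t)(value >> 16);

    if (count < 16) {
        /* Bits that fall out of the high word land at the top of the low. */
        uint16_t carry = count ? (uint16_t)(hi << (16 - count)) : 0;
        hi = sar16(hi, count);
        lo = (uint16_t)((lo >> count) | carry);
    } else {
        /* The old high word becomes the low word; high is pure sign. */
        uint16_t sign = (uint16_t)((hi & 0x8000u) ? 0xFFFFu : 0u);
        lo = sar16(hi, count - 16);
        hi = sign;
    }
    return ((uint32_t)hi << 16) | lo;
}

/* 32 bit left shift built from 16 bit shifts. */
uint32_t shift_left_32(uint32_t value, unsigned count)
{
    uint16_t lo = (uint16_t)(value & 0xFFFFu);
    uint16_t hi = (uint16_t)(value >> 16);

    if (count < 16) {
        /* Bits that fall out of the low word land at the bottom of the high. */
        uint16_t carry = count ? (uint16_t)(lo >> (16 - count)) : 0;
        lo = (uint16_t)(lo << count);
        hi = (uint16_t)((hi << count) | carry);
    } else {
        /* The old low word becomes the high word; low is zero. */
        hi = (uint16_t)(lo << (count - 16));
        lo = 0;
    }
    return ((uint32_t)hi << 16) | lo;
}
```

Comparing these results against the compiler's own >> and << for every
count from 0 to 31 should be a cheap way to check a port of the same idea.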

John P. Silva
Inova Products

-------------------- Cut Here ----------- Cut Here ----------------------

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  Readme Copyright Lflshift.s Sflshift.s
# Wrapped by root@empire on Sun Jul 17 15:28:15 1988
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'Readme' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Readme'\"
else
echo shar: Extracting \"'Readme'\" \(2454 characters\)
sed "s/^X//" >'Readme' <<'END_OF_FILE'
XIntroduction
X------------
X
XThe routines in the files Sflshift.s and Lflshift.s are high speed 32 bit
Xshift subroutines intended to replace the library routines _lshr and _lshl.
XThese routines will give a noticeable speed increase in those programs that
Xmake heavy use of bit shifts on long integers.
X
XOn 80x86 Xenix machines running the Microsoft compiler, and some others,
Xthe standard 32 bit shift routines are optimized for size rather than
Xspeed.  The algorithm Microsoft chose is essentially "shift one bit, loop".
XThis makes it dreadfully slow for long bit shifts.
X
XMy routines achieve their speed increase by replacing the loop structure with
Xtwo or three integer shift instructions and some minor calculations.  To enable
Xthe use of the 16 bit shift instructions, I had to break up the long into
Xtwo 16 bit chunks and operate on each separately.  Each routine is comprised
Xof two parts:  a part for shifts of less than 16 bits, and one for >= 16 bit
Xshifts.
X
XDue to the longer algorithm required for < 16 bit shifts, you will notice
Xthat these routines will be slower than the library routines for small bit
Xshifts, equivalent for about 5 bit shifts, and faster for 6 shifts and larger.
XPast the 16 bit shift mark, the routines really gain in speed since only two
Xinteger shifts are required to achieve the shift as opposed to 3 shifts and
Xan or for < 16 bit shifts.
X
XI had originally written these routines to enhance the speed of compress in
X16 bit mode.  They did:  I achieved a speed increase of about 24%.  (For
Xthose of you who have never hacked the compress sources, compress uses
Xa LOT of 32 bit shifts to get the job done.)
X
XTo install these routines, simply compile your original code and link these 
Xroutines in with the rest of your code.  It's as simple as that.
X
XJohn P. Silva,
XInova Products
X
XUUCP:	ucbvax!cogsci!jsilva
XDOMAIN:	jsilva@cogsci.berkeley.edu
X
XCopyright Notice
X----------------
X
X	This code is NOT in the public domain.  It is a copyrighted work,
X	and as such is protected by law.
X
X	Inova Products places no restrictions on distribution or noncommercial
X	use of this product, as long as this notice is left intact.
X	Commercial use is prohibited without express written permission of 
X	the Author.
X
X	Inova Products IS NOT RESPONSIBLE for damages incurred through use
X	of this package.  We make no warranty of fitness for any particular
X	application, nor that the code actually works as intended.
X
X	Use at your own risk.
X
END_OF_FILE
if test 2454 -ne `wc -c <'Readme'`; then
    echo shar: \"'Readme'\" unpacked with wrong size!
fi
# end of 'Readme'
fi
if test -f 'Copyright' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Copyright'\"
else
echo shar: Extracting \"'Copyright'\" \(638 characters\)
sed "s/^X//" >'Copyright' <<'END_OF_FILE'
X	Copyright Notice
X	----------------
X
X	Written by John P. Silva
X	Copyright 1988 by Inova Products
X
X	This code is NOT in the public domain.  It is a copyrighted work,
X	and as such is protected by law.
X
X	Inova Products places no restrictions on distribution or noncommercial
X	use of this product, as long as this notice is left intact.
X	Commercial use is prohibited without express written permission of 
X	the Author.
X
X	Inova Products IS NOT RESPONSIBLE for damages incurred through use
X	of this package.  We make no warranty of fitness for any particular
X	application, nor that the code actually works as intended.
X
X	Use at your own risk.
X
END_OF_FILE
if test 638 -ne `wc -c <'Copyright'`; then
    echo shar: \"'Copyright'\" unpacked with wrong size!
fi
# end of 'Copyright'
fi
if test -f 'Lflshift.s' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Lflshift.s'\"
else
echo shar: Extracting \"'Lflshift.s'\" \(2894 characters\)
sed "s/^X//" >'Lflshift.s' <<'END_OF_FILE'
X;	Faster 32 bit shift routines for 8086 family processors
X;	Written by John P. Silva
X;	Copyright 1988 by Inova Products
X;
X;	This module is written for use with the Medium, Large and Huge
X;	memory models.
X;
X;	Copyright Notice
X;	----------------
X;
X;	This code is NOT in the public domain.  It is a copyrighted work,
X;	and as such is protected by law.
X;
X;	Inova Products places no restrictions on distribution or noncommercial
X;	use of this product, as long as this notice is left intact.
X;	Commercial use is prohibited without express written permission of 
X;	the Author.
X;
X;	Inova Products IS NOT RESPONSIBLE for damages incurred through use
X;	of this package.  We make no warranty of fitness for any particular
X;	application, nor that the code actually works as intended.
X;
X;	Use at your own risk.
X;
X		TITLE flshift
X
XFLSHIFT_TEXT	SEGMENT	BYTE PUBLIC 'CODE'
XFLSHIFT_TEXT	ENDS
X_DATA		SEGMENT	WORD PUBLIC 'DATA'
X_DATA		ENDS
XCONST		SEGMENT	WORD PUBLIC 'CONST'
XCONST		ENDS
X_BSS		SEGMENT	WORD PUBLIC 'BSS'
X_BSS		ENDS
XDGROUP		GROUP	CONST,	_BSS,	_DATA
X		ASSUME	CS: FLSHIFT_TEXT, DS: DGROUP, SS: DGROUP, ES: DGROUP
X_DATA		SEGMENT
X_DATA		ENDS
X_BSS		SEGMENT
X_BSS		ENDS
XFLSHIFT_TEXT	SEGMENT
X
X;	register di = general temporary
X;	register si = shift count save
X
X	PUBLIC	__lshr
X__lshr	PROC FAR
X	push	di
X	push	si
X	xor	ch,ch		;Clear hi byte of cx register
X	mov	si,cx		;Save shift count in si reg
X	cmp	cx,16		;Should we use the 32bit shifter?
X	jge	SHORT LSHR_32
X	mov	di,dx		;Figure bits to be moved to low word
X	mov	cx,16
X	sub	cx,si
X	shl	di,cl		;di now contains bits to be ored later
X	mov	cx,si
X	sar	dx,cl		;Perform arithmetic shift on high word
X	shr	ax,cl		;Perform logical shift on low 16 bits
X	or	ax,di		;Replace saved bits into low word
X	jmp	SHORT LSHR_ex
XLSHR_32:		;Shift routine for 16+ bit shifts
X	xor	di,di		;Calculate artificial sign extension
X	test	dh,80h		;If dx is non-negative, di stays 0
X	jns	SHORT LSHR_32a
X	dec	di		;Make di -1
XLSHR_32a:
X	lea	cx,[si-16]	;Calculate amount to shift
X	sar	dx,cl		;Arithmetically shift high word
X	mov	ax,dx		;Place freshly shifted high word into low
X	mov	dx,di		;Then put artificial sign extension into high
XLSHR_ex:
X	pop	si
X	pop	di
X	ret
X__lshr	ENDP
X
X	PUBLIC	__lshl
X__lshl	PROC FAR
X	push	di
X	push	si
X	xor	ch,ch		;Clear hi byte of cx register
X	mov	si,cx		;Save shift count in si reg
X	cmp	cx,16		;Should we use the 32bit shifter?
X	jge	SHORT LSHL_32
X	mov	di,ax		;Figure bits to be moved to high word
X	mov	cx,16
X	sub	cx,si
X	shr	di,cl		;di now contains bits to be ored later
X	mov	cx,si
X	shl	ax,cl		;Shift low word
X	shl	dx,cl		;Shift high word
X	or	dx,di		;Replace saved bits into high word
X	jmp	SHORT LSHL_ex
XLSHL_32:		;Shift routine for 16+ bit shifts
X	lea	cx,[si-16]	;Calculate amount to shift
X	shl	ax,cl		;Shift low word
X	mov	dx,ax		;Place freshly shifted low word into high
X	xor	ax,ax		;And zero low word
XLSHL_ex:
X	pop	si
X	pop	di
X	ret
X__lshl	ENDP
X
XFLSHIFT_TEXT	ENDS
XEND
END_OF_FILE
if test 2894 -ne `wc -c <'Lflshift.s'`; then
    echo shar: \"'Lflshift.s'\" unpacked with wrong size!
fi
# end of 'Lflshift.s'
fi
if test -f 'Sflshift.s' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Sflshift.s'\"
else
echo shar: Extracting \"'Sflshift.s'\" \(2881 characters\)
sed "s/^X//" >'Sflshift.s' <<'END_OF_FILE'
X;	Faster 32 bit shift routines for 8086 family processors
X;	Written by John P. Silva
X;	Copyright 1988 by Inova Products
X;
X;	This module is written to be used only in the Small memory model.
X;
X;	Copyright Notice
X;	----------------
X;
X;	This code is NOT in the public domain.  It is a copyrighted work,
X;	and as such is protected by law.
X;
X;	Inova Products places no restrictions on distribution or noncommercial
X;	use of this product, as long as this notice is left intact.
X;	Commercial use is prohibited without express written permission of 
X;	the Author.
X;
X;	Inova Products IS NOT RESPONSIBLE for damages incurred through use
X;	of this package.  We make no warranty of fitness for any particular
X;	application, nor that the code actually works as intended.
X;
X;	Use at your own risk.
X;
X	TITLE	flshift
X
XFLSHIFT_TEXT	SEGMENT	BYTE PUBLIC 'CODE'
XFLSHIFT_TEXT	ENDS
X_DATA		SEGMENT	WORD PUBLIC 'DATA'
X_DATA		ENDS
XCONST		SEGMENT	WORD PUBLIC 'CONST'
XCONST		ENDS
X_BSS		SEGMENT	WORD PUBLIC 'BSS'
X_BSS		ENDS
XDGROUP		GROUP	CONST,	_BSS,	_DATA
X		ASSUME	CS: FLSHIFT_TEXT, DS: DGROUP, SS: DGROUP, ES: DGROUP
X_DATA		SEGMENT
X_DATA		ENDS
X_BSS		SEGMENT
X_BSS		ENDS
XFLSHIFT_TEXT	SEGMENT
X
X;	register di = general temporary
X;	register si = shift count save
X
X	PUBLIC	__lshr
X__lshr	PROC NEAR
X	push	di
X	push	si
X	xor	ch,ch		;Clear hi byte of cx register
X	mov	si,cx		;Save shift count in si reg
X	cmp	cx,16		;Should we use the 32bit shifter?
X	jge	SHORT LSHR_32
X	mov	di,dx		;Figure bits to be moved to low word
X	mov	cx,16
X	sub	cx,si
X	shl	di,cl		;di now contains bits to be ored later
X	mov	cx,si
X	sar	dx,cl		;Perform arithmetic shift on high word
X	shr	ax,cl		;Perform logical shift on low 16 bits
X	or	ax,di		;Replace saved bits into low word
X	jmp	SHORT LSHR_ex
XLSHR_32:		;Shift routine for 16+ bit shifts
X	xor	di,di		;Calculate artificial sign extension
X	test	dh,80h		;If dx is non-negative, di stays 0
X	jns	SHORT LSHR_32a
X	dec	di		;Make di -1
XLSHR_32a:
X	lea	cx,[si-16]	;Calculate amount to shift
X	sar	dx,cl		;Arithmetically shift high word
X	mov	ax,dx		;Place freshly shifted high word into low
X	mov	dx,di		;Then put artificial sign extension into high
XLSHR_ex:
X	pop	si
X	pop	di
X	ret
X__lshr	ENDP
X
X	PUBLIC	__lshl
X__lshl	PROC NEAR
X	push	di
X	push	si
X	xor	ch,ch		;Clear hi byte of cx register
X	mov	si,cx		;Save shift count in si reg
X	cmp	cx,16		;Should we use the 32bit shifter?
X	jge	SHORT LSHL_32
X	mov	di,ax		;Figure bits to be moved to high word
X	mov	cx,16
X	sub	cx,si
X	shr	di,cl		;di now contains bits to be ored later
X	mov	cx,si
X	shl	ax,cl		;Shift low word
X	shl	dx,cl		;Shift high word
X	or	dx,di		;Replace saved bits into high word
X	jmp	SHORT LSHL_ex
XLSHL_32:		;Shift routine for 16+ bit shifts
X	lea	cx,[si-16]	;Calculate amount to shift
X	shl	ax,cl		;Shift low word
X	mov	dx,ax		;Place freshly shifted low word into high
X	xor	ax,ax		;And zero low word
XLSHL_ex:
X	pop	si
X	pop	di
X	ret
X__lshl	ENDP
X
XFLSHIFT_TEXT	ENDS
XEND
END_OF_FILE
if test 2881 -ne `wc -c <'Sflshift.s'`; then
    echo shar: \"'Sflshift.s'\" unpacked with wrong size!
fi
# end of 'Sflshift.s'
fi
echo shar: End of shell archive.
exit 0


---
UUCP:	ucbvax!cogsci!jsilva
DOMAIN:	jsilva@cogsci.berkeley.edu