[comp.os.minix] Comic: MSDOS patches and performance

aas@brolga.cc.uq.oz.au (Alex Sergejew) (08/01/90)

     I have ported comic to MSDOS with ease and append context diffs and
a (m|t)asm-compatible version of memrchr.  Apart from syntactical changes
memrchr has two changes worth considering for the minix 8088 version:
the cld to restore direction flags is important, and duplicating the exit
code thus avoiding an extra jump saves over 1% in compression time (!)
     Two tables follow giving compression times and sizes for a variety of
MSDOS compression/archiving programs on an 8 MHz XT clone, one for an ASCII
file and one for a binary file.  Comic's compression performance is
impressive, particularly on text files.  The difference in speed between the
Turbo C 2.0 and MSC 5.1 compiled versions is remarkable (1285 vs. 946 in
Table 1), as is the speed-up with an assembly-language version of memrchr
(946 down to 750).  It may also be encouraging to see the roughly 3-fold
speedup in lharc between the all-C version (1.0) and the standard MSDOS
version (1.13c) with assembly-language routines and some algorithmic
fine-tuning;  I hope comic can be improved in speed at least as much!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Table 1.

Times to compress comic.sha: a 70843 byte MSDOS text shar file of the
comic sources, listed in order of increasing compressed file size.
For the programs that store additional directory information within
the compressed file the `compress size' column has appended in parentheses
the actual compressed file size as well.

Program/Version		 time	un-time		compress/total size
-------------------------------------------------------------------
comic 1.0B/tcc 2.0	 1285	   20		21451
comic 1.0B/msc 5.1	  946	   18		21451
comic 1.0B/msc+masm	  750	   17		21451
PKzip 1.10		   23	    5		21730(21618)
pak 2.10		   33	    9		22056(22014)
lharc 1.0/msc 5.1   (*1)  134	   30		23301(23269)
lharc 1.0/msc 5.1   (*2)  111	   24		23301(23269)
lharc 1.13c		   42	   13		23303(23271)
lharc 1.13b/tcc+tasm	   42	   13		23303(23721)
larc 3.33		   47	    9		26951(26919)
compress 4.00 -b 16 (*3)   49	   31		29310
compress 4.2  -b 16 (*4)   15	    8		29598
PKarc 3.6		    7	    5		30717(30686)
PKpak 3.61		    7	    6		30717(30686)
dwc A5.01		    8	    6		30995(30934)
compress 4.00 -b 13 	   47	   31		31194
compress 4.2  -b 13	   17	    9		31853
mdcd 1.0		   17	   11		31951(31829)
zoo 2.01		   15	    9		31998(31829)
arc 5.10		   68	   32		37847(37816)


Notes:
(*1) This is C Lharc 1.00 compiled with msc 5.1 "out of the box".
(*2) This is C Lharc 1.00 as distributed in comp.os.minix compiled
     with msc 5.1 with full optimization.
(*3) This is compress 4.00 compiled with msc 5.1 "out of the box".
(*4) This is compress 4.2 as modified by Doug Graham so that the hash table
     is limited to 64K and so that it runs faster under MS-DOS.


Table 2.

Times to compress comic.exe: the 28377 byte MSDOS binary executable file.
Entries are listed in the same order as Table 1.

Program/Version		 time	un-time		compress/total size
-------------------------------------------------------------------
comic 1.0B/tcc 2.0	 1394	   14		17613
comic 1.0B/msc 5.1	 1055	   13		17613
comic 1.0B/msc+masm	  857	   12		17613
PKzip 1.10		   13	    3		17454(17342)
pak 2.10		   15	    6		17914(17872)
lharc 1.0/msc 5.1	   47	   25		17395(17363)
lharc 1.0/msc 5.1	   41	   20		17395(17363)
lharc 1.13c		   19	   11		17395(17363)
lharc 1.13b/tcc+tasm	   19	   11		17395(17363)
larc 3.33		   15	    5		19669(19637)
compress 4.00 -b 16	   28	   16		22289
compress 4.2  -b 16	    9	    5		22289
PKarc 3.6		    6	    4		21637(21606)
PKpak 3.61		    6	    4		21637(21606)
dwc A5.01		    5	    4		21894(21833)
compress 4.00 -b 13 	   25	   17		25483		(*1)
compress 4.2  -b 13	    9	    5		21590
mdcd 1.0		   12	    7		21706(21854)
zoo 2.01		   10	    5		22015(21846)
arc 5.10		   36	   17		23993(23962)


Notes:
(*1) I realize that this is very anomalous but I've checked it.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following are context diffs to the distributed comic ver 1.00 sources
to allow compilation under MSDOS with MS C 5.1.  Also included are a masm
version of the memrchr routine and a simple makefile.

*** f:comic_h.bak	Mon Jul 16 10:49:22 1990
--- comic.h	Mon Jul 16 19:36:34 1990
***************
*** 17,22 ****
--- 17,26 ----
  #include <assert.h>
  #include <stdio.h>
  
+ #ifdef MSC
+ # include <stdlib.h>
+ #endif
+ 
  /*
  ** Compress constands.
  */
*** f:comic_c.bak	Mon Jul 16 10:42:06 1990
--- comic.c	Mon Jul 16 12:01:20 1990
***************
*** 11,17 ****
--- 11,19 ----
  
  #ifdef DOS
  # define TTY	"CON"		/* The console device under SM-DOS. */
+ #if 0
  # define isatty(i)	1	/* Everything is a tty under MES-DOS. */
+ #endif
  #else
  # define TTY	"/dev/tty"
  #endif /* DOS */
***************
*** 56,70 ****
    char *p;
  #ifdef DOS
    /*  
!   ** DOS needs special atention. Upercase letters and leading path 
!   ** must be removed. 
    */
    p = strrchr (name, '\\');
    if (p != (char *) NULL) 
  	name = p + 1;			/* Kill path prefix. */
!   for (p = name; *p != '\0'; p++)
  	if (isupper (*p)) 
  		*p = tolower (*p);	/* Make name lowercase. */
  #else
    p = strrchr (name, '/');
    if (p != (char *) NULL) 
--- 58,77 ----
    char *p;
  #ifdef DOS
    /*  
!   ** DOS needs special atention. Uppercase letters and leading path 
!   ** must be removed, along with the trailing '.exe'.
    */
    p = strrchr (name, '\\');
    if (p != (char *) NULL) 
  	name = p + 1;			/* Kill path prefix. */
!   for (p = name; *p != '\0'; p++) {
  	if (isupper (*p)) 
  		*p = tolower (*p);	/* Make name lowercase. */
+ 	if (*p == '.') {
+ 		*p = '\0';		/* Kill name suffix. */
+ 		break;
+ 	}
+   }
  #else
    p = strrchr (name, '/');
    if (p != (char *) NULL) 
***************
*** 167,172 ****
--- 174,182 ----
  	fflush (stderr);
  	return 0;		/* Report error. */
    }
+ #ifdef DOS
+   (void) setmode (0, O_BINARY);		/* Set stdin to binary mode */
+ #endif /* DOS */
  
    /*
    ** If the cat flag is set, don't reopen stdout.
***************
*** 200,205 ****
--- 210,218 ----
  	fflush (stderr);
  	return 0;		/* Report fail. */
    }
+ #ifdef DOS
+   (void) setmode (1, O_BINARY);		/* Set stdout to binary mode */
+ #endif /* DOS */
  
    return 1;		/* Return success. */
  }
*** f:bits.bak	Mon Jul 16 10:42:04 1990
--- bits.c	Mon Jul 16 11:43:32 1990
***************
*** 94,100 ****
  */
  static void write_io_buff ()
  {
! #ifdef DOS
    (void) setmode (1, O_BINARY);		/* Fails some times. ;-( */
  #endif /* DOS */
    if (write (1, io_buff, io_index) != io_index)
--- 94,100 ----
  */
  static void write_io_buff ()
  {
! #ifdef DOS_NEVER
    (void) setmode (1, O_BINARY);		/* Fails some times. ;-( */
  #endif /* DOS */
    if (write (1, io_buff, io_index) != io_index)
***************
*** 108,114 ****
  */
  static void read_io_buff ()
  {
! #ifdef DOS
    (void) setmode (0, O_BINARY);		/* Fails for me. ;-( */
  #endif /* DOS */
  
--- 108,114 ----
  */
  static void read_io_buff ()
  {
! #ifdef DOS_NEVER
    (void) setmode (0, O_BINARY);		/* Fails for me. ;-( */
  #endif /* DOS */
  
*** /dev/null
--- memrchr.asm	Wed Aug 01 15:52:46 1990
***************
*** 0 ****
--- 1,71 ----
+ 	page	60, 132
+ 	title	memrchr - reverse memory search for a char
+ 
+ ;  Memrchr: Look for a char. 
+ ;  (p) Jan-Mark Wams					(email: jms@cs.vu.nl)
+ ;  Usage memrchr (char *start, char *stop, char c); 
+ ;  Return value: 0 if char is not found, a pointer to the char otherwise.
+ ;  Note the lib function `memchr()' uses a count, we a stop pointer. 
+ ;  
+ ;  Modified for use with MSC 5.1 and masm: Alex A Sergejew (aas@cc.uq.oz.au)
+ ;  Small Model only!
+ ;  Assemble with (m|t)asm -mx memrchr...
+ 
+ _text	segment	word public 'code'
+ _text	ends
+ _data	segment word public 'data'
+ _data	ends
+ _bss	segment word public 'bss'
+ _bss	ends
+ dgroup	group	_bss, _data
+ 	assume	cs:_text, ds:dgroup, ss:dgroup
+ 
+ _text	segment
+ 	public	 _memrchr	;  Make it accessible for others. 
+ 
+ _memrchr proc near
+ 	push	bp		;  C prolog
+ 	mov	bp, sp		;  
+ 	push	si		;  Get ready  
+ 	push	di		;  for cdsret 
+ ;	push	cx
+ 
+ 	mov	di, 4[bp]	;  Get start 
+ 	mov	si, 6[bp]	;  Get stop. 
+ 	mov	al, 8[bp]	;  Get c in al. 
+ 
+ 	mov	ah, [si]	;  Save original value
+ 	mov	[si], al	;  Put sentinel in place.
+ 
+ 	std			;  Count down, ie. di-- (NO: pushf, popf!)
+ 	mov	cx, -1		;  Set counter to max.
+ 	repne	scasb		;  Loop till byte found.
+ 	inc	di		;  Hip hip... ;-(
+ 	cld			;  Restore direction flag
+ 
+ FOUND:
+ 	mov	[si], ah	;  Restore original value
+ 	cmp	di, si		;  End reached? 
+ 	je	NOTFOUND	;  IF so no match. 
+ 
+ 	mov	ax, di		;  return found loc
+ ;;;	jmp	_cdsret		;;; skip extra jump....
+ ;	pop	cx
+ 	pop	di
+ 	pop	si
+ 	pop	bp
+ 	ret
+ 
+ NOTFOUND:
+ 	xor	ax, ax		;  return (char *) NULL 
+ _cdsret:
+ ;	pop	cx
+ 	pop	di
+ 	pop	si
+ 	pop	bp
+ 	ret
+ 
+ _memrchr endp
+ 
+ _text	ends
+ 	end
*** /dev/null
--- makefile.msc	Mon Jul 16 21:29:02 1990
***************
*** 0 ****
--- 1,44 ----
+ #
+ # Makefile for comic
+ #
+ # (p) Jan-Mark Wams 					(email: jms@cs.vu.nl)
+ #
+ # MSDOS MS C v5.1 version
+ #
+ 
+ CC=cl
+ AS=masm
+ AFLAGS=-mx
+ CFLAGS=-AS -Ox -Gs -DDOS -DMSC -nologo #-DNDEBUG
+ LFLAGS=-Gs -Fecomic.exe
+ 
+ OBJS=bits.obj comic.obj decode.obj encode.obj getinput.obj gettoken.obj \
+ 	header.obj huffman.obj
+ 
+ comic.exe: $(OBJS) memrchr.obj
+ 	$(CC) $(LFLAGS) $(OBJS) memrchr.obj
+ 
+ .c.obj:
+ 	$(CC) $(CFLAGS) -c $<
+ 
+ 
+ bits.obj: bits.h comic.h bits.c
+ 
+ comic.obj: buffer.h comic.h comic.c
+ 
+ decode.obj: bits.h buffer.h comic.h huffman.h decode.c header.h
+ 
+ encode.obj: bits.h buffer.h comic.h huffman.h encode.c header.h
+ 
+ getinput.obj: buffer.h comic.h getinput.c
+ 
+ gettoken.obj: buffer.h comic.h gettoken.c
+ 
+ huffman.obj: comic.h huffman.h tables.h huffman.c
+ 
+ cmemrchr.obj: comic.h
+ 
+ header.obj: comic.h header.h header.c
+ 
+ memrchr.obj: memrchr.asm
+ 	$(AS) $(AFLAGS) $*;

jms@cs.vu.nl (Jan-Mark) (08/05/90)

In article <1990Aug1.091912.10529@brolga.cc.uq.oz.au>,
	aas@brolga.cc.uq.oz.au (Alex Sergejew) writes:

>      I have ported comic to MSDOS with ease and append context diffs and
> a (m|t)asm-compatible version of memrchr.  Apart from syntactical changes
> memrchr has two changes worth considering for the minix 8088 version:
> the cld to restore direction flags is important, and duplicating the exit
> code thus avoiding an extra jump saves over 1% in compression time (!)

	Nice, but why didn't you email me first?  The 1.0 version outdates
	your patches.  But I would like 1.0 for MSC.  I don't have MSC. ;-(

	SO: DON'T TRY THE PATCHES ON COMIC 1.0. AND PATCHING COMIC 1.0B
	WILL NOT GIVE YOU 1.0.


--

				 (:>	jms
				(_)
			========""======