u9039899@cs.uow.edu.au (Darrin Jon Smart) (04/25/91)
robjung@world.std.com (Robert K Jung) writes: >ARJ is coded in ANSI C for porting purposes. However, ARJ 2.00 for the >IBM PC is optimized to assembler to more than double the compressor >and extractor speeds. I am sorry if this information was buried in >my documentation. (I maintain two source libraries). Here's an idea: How about compiling a C library for compression and decompression only so that programmers can compress raw data without having to shell out to other programs. The idea is that one routine takes a block of data and returns a compressed block of data. Would that be difficult? I remember this was done by an LZW compression writer for CP/M, his modules were pure assembler though. - Darrin
mamba@csd4.csd.uwm.edu (Paul A Deisinger) (04/25/91)
In article <1991Apr25.054249.28057@cs.uow.edu.au> u9039899@cs.uow.edu.au (Darrin Jon Smart) writes: >Here's an idea: How about compiling a C library for compression and >decompression only so that programmers can compress raw data without having >to shell out to other programs. The idea is that one routine takes a block >of data and returns a compressed block of data. Would that be difficult? > >I remember this was done by an LZW compression writer for CP/M, his modules >were pure assembler though. This has been done. There is a library of routines from PKware (Makers of PKzip). These are the implode() and explode() algorythms. They are related to but not identical to the routines used in PKzip with the same name (The library has an adjustable dictionary size of 1,2 or 4k, the one in PKzip has an 8k dictionary) These routines are used by creating a read() and write() routine of your own and passing the addresses of these routines to the library routines which then read and write data as needed. This means you can structure your routines to take and put the data wherever you want, from a buffer to a buffer, a file to the modem, whatever. Disclaimer: I dont' work for PKware, but I have a freind who does. -- Paul Deisinger mamba@csd4.csd.uwm.edu
juul@diku.dk (Anders Juul Munch) (04/27/91)
u9039899@cs.uow.edu.au (Darrin Jon Smart) writes: >robjung@world.std.com (Robert K Jung) writes: > >>ARJ is coded in ANSI C for porting purposes. However, ARJ 2.00 for the >>IBM PC is optimized to assembler to more than double the compressor >>and extractor speeds. I am sorry if this information was buried in >>my documentation. (I maintain two source libraries). >Here's an idea: How about compiling a C library for compression and >decompression only so that programmers can compress raw data without having >to shell out to other programs. The idea is that one routine takes a block >of data and returns a compressed block of data. Would that be difficult? >I remember this was done by an LZW compression writer for CP/M, his modules >were pure assembler though. Like this one? I got this by ftp from the University of Adelaide, after the author announced it in comp.compression. According to the author, it compresses approximately as good as UNIX compress, but is much faster. -- Anders Munch /******************************************************************************/ /* Start of LZRW1.C */ /******************************************************************************/ THE LZRW1 ALGORITHM =================== Author : Ross N. Williams. Date : 31-Mar-1991. 1. I typed the following code in from my paper "An Extremely Fast Data Compression Algorithm", Data Compression Conference, Utah, 7-11 April, 1991. The fact that this code works indicates that the code in the paper is OK. 2. This file has been copied into a test harness and works. 3. Some users running old C compilers may wish to insert blanks around the "=" symbols of assignments so as to avoid expressions such as "a=*b;" being interpreted as "a=a*b;" 4. This code is public domain. 5. Warning: This code is non-deterministic insofar as it may yield different compressed representations of the same file on different runs. (However, it will always decompress correctly to the original). 6. If you use this code in anger (e.g. in a product) drop me a note at ross@spam.ua.oz.au and I will put you on a mailing list which will be invoked if anyone finds a bug in this code. 7. The internet newsgroup comp.compression might also carry information on this algorithm from time to time. /******************************************************************************/ #define UBYTE unsigned char /* Unsigned byte (1 byte ) */ #define UWORD unsigned int /* Unsigned word (2 bytes) */ #define ULONG unsigned long /* Unsigned longword (4 bytes) */ #define FLAG_BYTES 4 /* Number of bytes used by copy flag. */ #define FLAG_COMPRESS 0 /* Signals that compression occurred. */ #define FLAG_COPY 1 /* Signals that a copyover occurred. */ void fast_copy(p_src,p_dst,len) /* Fast copy routine. */ UBYTE *p_src,*p_dst; {while (len--) *p_dst++=*p_src++;} /******************************************************************************/ void lzrw1_compress(p_src_first,src_len,p_dst_first,p_dst_len) /* Input : Specify input block using p_src_first and src_len. */ /* Input : Point p_dst_first to the start of the output zone (OZ). */ /* Input : Point p_dst_len to a ULONG to receive the output length. */ /* Input : Input block and output zone must not overlap. */ /* Output : Length of output block written to *p_dst_len. */ /* Output : Output block in Mem[p_dst_first..p_dst_first+*p_dst_len-1]. */ /* Output : May write in OZ=Mem[p_dst_first..p_dst_first+src_len+256-1].*/ /* Output : Upon completion guaranteed *p_dst_len<=src_len+FLAG_BYTES. */ UBYTE *p_src_first,*p_dst_first; ULONG src_len,*p_dst_len; #define PS *p++!=*s++ /* Body of inner unrolled matching loop. */ #define ITEMMAX 16 /* Maximum number of bytes in an expanded item. */ {UBYTE *p_src=p_src_first,*p_dst=p_dst_first; UBYTE *p_src_post=p_src_first+src_len,*p_dst_post=p_dst_first+src_len; UBYTE *p_src_max1=p_src_post-ITEMMAX,*p_src_max16=p_src_post-16*ITEMMAX; UBYTE *hash[4096],*p_control; UWORD control=0,control_bits=0; *p_dst=FLAG_COMPRESS; p_dst+=FLAG_BYTES; p_control=p_dst; p_dst+=2; while (TRUE) {UBYTE *p,*s; UWORD unroll=16,len,index; ULONG offset; if (p_dst>p_dst_post) goto overrun; if (p_src>p_src_max16) {unroll=1; if (p_src>p_src_max1) {if (p_src==p_src_post) break; goto literal;}} begin_unrolled_loop: index=((40543*((((p_src[0]<<4)^p_src[1])<<4)^p_src[2]))>>4) & 0xFFF; p=hash[index]; hash[index]=s=p_src; offset=s-p; if (offset>4095 || p<p_src_first || offset==0 || PS || PS || PS) {literal: *p_dst++=*p_src++; control>>=1; control_bits++;} else {PS || PS || PS || PS || PS || PS || PS || PS || PS || PS || PS || PS || PS || s++; len=s-p_src-1; *p_dst++=((offset&0xF00)>>4)+(len-1); *p_dst++=offset&0xFF; p_src+=len; control=(control>>1)|0x8000; control_bits++;} end_unrolled_loop: if (--unroll) goto begin_unrolled_loop; if (control_bits==16) {*p_control=control&0xFF; *(p_control+1)=control>>8; p_control=p_dst; p_dst+=2; control=control_bits=0;} } control>>=16-control_bits; *p_control++=control&0xFF; *p_control++=control>>8; if (p_control==p_dst) p_dst-=2; *p_dst_len=p_dst-p_dst_first; return; overrun: fast_copy(p_src_first,p_dst_first+FLAG_BYTES,src_len); *p_dst_first=FLAG_COPY; *p_dst_len=src_len+FLAG_BYTES; } /******************************************************************************/ void lzrw1_decompress(p_src_first,src_len,p_dst_first,p_dst_len) /* Input : Specify input block using p_src_first and src_len. */ /* Input : Point p_dst_first to the start of the output zone. */ /* Input : Point p_dst_len to a ULONG to receive the output length. */ /* Input : Input block and output zone must not overlap. User knows */ /* Input : upperbound on output block length from earlier compression. */ /* Input : In any case, maximum expansion possible is eight times. */ /* Output : Length of output block written to *p_dst_len. */ /* Output : Output block in Mem[p_dst_first..p_dst_first+*p_dst_len-1]. */ /* Output : Writes only in Mem[p_dst_first..p_dst_first+*p_dst_len-1]. */ UBYTE *p_src_first, *p_dst_first; ULONG src_len, *p_dst_len; {UWORD controlbits=0, control; UBYTE *p_src=p_src_first+FLAG_BYTES, *p_dst=p_dst_first, *p_src_post=p_src_first+src_len; if (*p_src_first==FLAG_COPY) {fast_copy(p_src_first+FLAG_BYTES,p_dst_first,src_len-FLAG_BYTES); *p_dst_len=src_len-FLAG_BYTES; return;} while (p_src!=p_src_post) {if (controlbits==0) {control=*p_src++; control|=(*p_src++)<<8; controlbits=16;} if (control&1) {UWORD offset,len; UBYTE *p; offset=(*p_src&0xF0)<<4; len=1+(*p_src++&0xF); offset+=*p_src++&0xFF; p=p_dst-offset; while (len--) *p_dst++=*p++;} else *p_dst++=*p_src++; control>>=1; controlbits--; } *p_dst_len=p_dst-p_dst_first; } /******************************************************************************/ /* End of LZRW1.C */ /******************************************************************************/