nall@sun8.scri.fsu.edu (John Nall) (07/10/90)
This may be of interest to some, so I'll post it. (We have a fairly aggressive version of Pnews here, which makes one feel guilty if one posts something which may not be of general interest. It keeps asking questions like "Are you *sure* that people want to see this?" and "This costs money, you know. Are you *sure* you should do it??") The fairly significant increase in the compression ratio of Comic is attractive, so I played with it some on our Cray Y-MP to see how it does there. It is horrible! (In terms of speed. Worked OK, though, with no changes, under Unicos). The compression is good enough, however, so that it would bear some further investigation. Two possibilities arise (which are not necessarily non-exclusive): (a) vectorizing the code, and (b) profiling it and then going to assembly language for those parts which show up as chewing up all the execution time. If anyone else is doing the same thing, I would be interested in their results. -- John W. Nall | Supercomputation Computations Research Institute nall@sun8.scri.fsu.edu | Florida State University, Tallahassee, FL 32306 "Los dioses eran computadores. ?Que otra cosa podian ser??"
kirkenda@eecs.cs.pdx.edu (Steve Kirkendall) (07/11/90)
In article <228@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: > >The fairly significant increase in the compression ratio of >Comic is attractive, so I played with it some on our Cray Y-MP >to see how it does there. It is horrible! (In terms of speed. >Worked OK, though, with no changes, under Unicos). ^^^^^^ Damn, I was hoping you had ported Minix to your Cray. :-) of course. The biggest bottleneck is the memrchr() function -- which has already been converted to assembler for the intel crowd. I wrote a version in 68000 assembler (included below) and this cut the compression time in half. It still runs like a one-legged cow, but at least now its a HEALTHIER one-legged cow. The next problem is that comic outputs the compressed data one bit at a time, via the putbit() function. It uses a 1024-byte output buffer, so we don't have a huge system overhead, but still we invest a lot of computing time to the processing of each individual bit. There is also a function for outputting several bits at once, but it just call putbit() repeatedly, in a loop. I suspect that we could gain something by rewriting putnbits() (or whatever its called) to do the bit stuffing directly instead of calling putbit(). ------------------------------------------------------------------------------- Steve Kirkendall kirkenda@cs.pdx.edu uunet!tektronix!psueea!eecs!kirkenda echo x - memrchr68.s sed '/^X/s///' > memrchr68.s << '/' X .sect .text X .sect .rom X .sect .data X .sect .bss X .list X .sect .text X X!-------------------------------------------------------------------------! X! Minix-68k version of the memrchr() function, in 68000 assembly language ! X! Written by Steve Kirkendall, 1990 July 10 ! X!-------------------------------------------------------------------------! X XSTART = 4 ! stack offsets of arguments XSTOP = 8 ! XCH = 13 ! X X_memrchr: ! start function X .globl _memrchr ! X !- X move.l START(sp),a0 ! a0 is "start" X move.l STOP(sp),a1 ! a1 is "stop" X move.b CH(sp),d0 ! d0 is "ch" X !- X move.b (a1),d1 ! d1 holds the value that the sentinal clobbers X !- X move.b d0,(a1) ! install the sentinal X !- X add.w #1,a0 ! compensate for the initial predecrement X !- Xloop: ! X cmp.b -(a0),d0 ! unrolled search loop X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X beq found ! X cmp.b -(a0),d0 ! X bne loop ! X !- Xfound: ! we've found a match, or the sentinel X !- X move.b d1,(a1) ! remove the sentinel X !- X cmp.l a0,a1 ! did we hit the sentinel? X beq notfound ! X !- X move.l a0,d0 ! we found a match, so return a pointer to it X rts ! X !- Xnotfound: ! we hit the sentinel, so return NULL X clr.l d0 ! X rts ! / ------------------------------------------------------------------------------- Steve Kirkendall kirkenda@cs.pdx.edu uunet!tektronix!psueea!eecs!kirkenda
nfs@cs.Princeton.EDU (Norbert Schlenker) (07/11/90)
In article <228@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: >The compression is good enough, however, so that it would bear >some further investigation. Two possibilities arise (which are >not necessarily non-exclusive): (a) vectorizing the code, and >(b) profiling it and then going to assembly language for those >parts which show up as chewing up all the execution time. Bruce Evans has posted profiling data for 16 bit PC Minix. I have similar (but more outrageous) results from some Unix machines here. The bottleneck is in memrchr(), which takes ~70% of the time used by comic (this is the C version now, cmemrchr.c). Since the PC Minix code already has memrchr() in assembly, it's clear that another approach is required entirely. Kent Williams (I think, but I can't find the note now) has suggested hashing to find strings, a la compress. I suspect it is the only chance that comic has. On a related note, has anyone gotten lharc to compile on a Unix box? Every one I've tried here fails to compile lhio.c without error, apparently because of an incompatibility with <dirent.h>. Is there an easy fix? On a further related note, one can compile lzhuf.c with -DSELFMAIN to get a (simpleminded) version which does essentially what compress does, with much better compression. The C version is about 1/3 as fast as compress (using PC ACK); a few tweaks to the assembly code allow it to run only about 50% slower. Norbert
wayne@csri.toronto.edu (Wayne Hayes) (07/12/90)
In article <946@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes: >In article <228@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: >On a further related note, one can compile lzhuf.c with -DSELFMAIN >to get a (simpleminded) version which does essentially what compress >does, with much better compression. The C version is about 1/3 as >fast as compress (using PC ACK); a few tweaks to the assembly code >allow it to run only about 50% slower. The LHARC I posted to Minix works also on Sun 4.3. -- "The number of programs that can be done with the HST has always greatly exceeded the time available for their execution, and this remains true even with the telescope in its current state." -- HST Science Working Group and User's Commitee Report, 1990 June 29. Wayne Hayes INTERNET: wayne@csri.utoronto.ca CompuServe: 72401,3525
cgates@netcom.UUCP (Craig Gates) (07/12/90)
In article <946@rossignol.Princeton.EDU>, nfs@cs.Princeton.EDU (Norbert Schlenker) writes: > > On a related note, has anyone gotten lharc to compile on a Unix box? > Every one I've tried here fails to compile lhio.c without error, > apparently because of an incompatibility with <dirent.h>. Is there > an easy fix? > I compiled lharc using the file makefile.orig with no problem on a Sun 110 running SUN OS 4.0.3. The other makefile didn't seem to work too well on my system. LHARC is none too fast bust it sure beats the pants off of compress. Craig
ast@cs.vu.nl (Andy Tanenbaum) (07/14/90)
In article <228@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: >Comic is attractive, so I played with it some on our Cray Y-MP >to see how it does there. It is horrible! (In terms of speed. >Worked OK, though, with no changes, under Unicos). > >The compression is good enough, however, so that it would bear >some further investigation. I just got back from the UK UNIX Users' Group meeting in London where I was a speaker, along with Ken Thompson, Dennis Ritchie, Brian Kernighan, Rob Pike, Dave Presotto, Stu Feldman, and quite a few other stellar personalities. All of those guys are great speakers, so if you ever get the chance to hear them at conferences, don't miss it. I definitely feel hono(u)red to be invited to join in with those folks. The conference gave a nice example of man being humbled by nature. The conference organi{z/s}ers thought it would be nice to have terminals available with internet access, so attendees could telnet to their home machines to read mail. They rented about 20 X-terminals and set them up in the conference hotel. British Telecom renegd on its promise to provide a leased line to Imperial College nearby, so the conference people got an infrared link and installed it on the roof of the hotel, along with another one at Imperial College 1 km away. They tested it the night before, and it worked perfectly. However it refused to work in the day time. Only at night. It seems that the heat from the sun striking the hotel's roof causes the air to shimmer, distorting the infrared signal so badly that transmission is impossible. The best laid plans of mice and joysticks ... Anyway, if anyone has good ideas about how to fix comic to make it faster without hurting the compression too much, I'm all for it. The net postings could be considerably reduced if comic were used instead of compress. Furthermore, the number of disks in the distribution from P-H could also be sharply reduced, which would greatly reduce the price. I think 1.5 will cost $169. A substantial reason for the increase from $116 (1.3 in slipcase form, with manual), is the jump from 10 disks to 17. (As I mentioned before, there will be a discount for people who send in a P-H boot disk from 1.1 - 1.3; more about that later when I find out). I'm going away again tomorrow. I'll be back July 31. Andy Tanenbaum (ast@cs.vu.nl)