speck@cit-vax (Don Speck) (01/01/85)
I've been working on my own speed mods for dump - with changes in only ONE source file (/usr/src/etc/dump/dumptape.c), NO kernel changes, it runs the TU77 on our 780 at its full 125 ips. I'll get my chance to try it on a TU80 next week. I hope to mail off the file to unix-sources shortly afterward, if there are no licensing problems.

Chris: I'd be interested in seeing your approach. I turned 'dump' into a concurrent program communicating with pipes and signals: while one process writes the tape, the others read the disk and communicate. There's always (exactly) one process writing the tape at any time.

Don Speck
Caltech
(818) 356-6886
chris@maryland (Chris Torek) (01/02/85)
I didn't like the pipes idea, so I wrote a ``mass driver'' device driver that runs other raw devices through their block interface. It manages to keep even the TU78 busy most of the time (125 ips at 6250 bpi, or 781K/sec) when run off Eagle disks in single user mode.

I figured pipes would be fairly slow 'cause the network interface limits you to 2048 bytes per transfer (although you can increase it). This means that for a 10240 block write (like dump() uses) you need 5 pairs of context switches, which can beat a 750 into the ground pretty well. Also, it makes it harder to communicate success/failure statuses.

Chris
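The two figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration, not code from either poster: it treats 781K/sec as the peak rate with interrecord gaps ignored, and the function names are hypothetical.

```python
def tape_bandwidth(ips, bpi):
    """Peak transfer rate in bytes/sec for a 9-track drive.

    Each frame across the tape holds one data byte, so bytes per inch
    equals bpi. Interrecord gaps are ignored (an assumption).
    """
    return ips * bpi


def pipe_transfers(block_bytes, pipe_limit):
    """Number of pipe transfers needed to move one block (ceiling division)."""
    return -(-block_bytes // pipe_limit)


if __name__ == "__main__":
    print(tape_bandwidth(125, 6250), "bytes/sec peak on the TU78")
    print(pipe_transfers(10240, 2048), "transfers per 10240-byte block")
```

Each transfer costs a switch to the reader and a switch back to the writer, which is where the "5 pairs of context switches" figure comes from if the data itself travels through the pipe.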
speck@cit-vax (Don Speck) (01/02/85)
The bulk of the data DOESN'T go through pipes. I'd better send you the code before you make any more judgements on it. (Though I had hoped to polish it more before letting anyone see it.) I wouldn't mind seeing what you've done - I'd love to see how you get such enormous bandwidth from the raw disk.

Now that we've each had one public shot at the other's code, sight unseen, how about if we continue this as a private conversation?

Don
dmmartindale@watcgl.UUCP (Dave Martindale) (01/04/85)
In article <6894@brl-tgr.ARPA> chris@maryland (Chris Torek) writes:
>I figured pipes would be fairly slow 'cause the network interface
>limits you to 2048 bytes per transfer (although you can increase
>it). This means that for a 10240 block write (like dump() uses)
>you need 5 pairs of context switches, which can beat a 750 into
>the ground pretty well. Also, it makes it harder to communicate
>success/failure statuses.
>

Passing *data* through the pipes would indeed be slow. But I suspect that the original author did the same thing I did for a general double-buffered block copy routine: have two processes, one reading and writing even-numbered blocks and one doing odd-numbered ones, and using pipes only for synchronization to make sure that the reads and writes occur in the correct logical sequence (in case either input or output is a tape drive).

Actually, the one-byte values passed around via the pipes can also encode various types of special conditions, to ensure that one process quits when the other detects an EOF or error, for example. You can come quite close to making the program behave like it was a single process that just overlaps reads and writes.
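The even/odd scheme described above might look roughly like the following. This is a minimal sketch and not Dave Martindale's or Don Speck's actual code: the names `worker` and `copy_blocks`, the token values, and the use of four pipes are all assumptions made for illustration. It relies on the two processes sharing file offsets after `fork()`, so that whoever holds the "read turn" gets the next block in sequence.

```python
import os

GO, STOP = b"G", b"S"   # one-byte tokens; further values could flag errors


def worker(in_fd, out_fd, block_size, rd_in, rd_out, wr_in, wr_out):
    """Copy every other block; tokens keep reads and writes in order."""
    while True:
        # Wait for the read turn. STOP means the peer already hit EOF.
        if os.read(rd_in, 1) != GO:
            return
        block = os.read(in_fd, block_size)
        # Hand the read turn on, or tell the peer to quit at EOF.
        os.write(rd_out, GO if block else STOP)
        # Wait for the write turn; the peer may be reading meanwhile.
        os.read(wr_in, 1)
        if not block:
            return
        # One write per block (a sketch: short writes are not retried).
        os.write(out_fd, block)
        os.write(wr_out, GO)


def copy_blocks(in_fd, out_fd, block_size=10240):
    """Double-buffered copy using two processes and token pipes."""
    # rdXY / wrXY carry the read / write turn from process X to process Y.
    rd01, rd10, wr01, wr10 = (os.pipe() for _ in range(4))
    # Prime process 0 with both turns so it handles block 0.
    os.write(rd10[1], GO)
    os.write(wr10[1], GO)
    pid = os.fork()
    if pid == 0:                      # child: odd-numbered blocks
        try:
            worker(in_fd, out_fd, block_size,
                   rd01[0], rd10[1], wr01[0], wr10[1])
        finally:
            os._exit(0)
    # parent: even-numbered blocks
    worker(in_fd, out_fd, block_size, rd10[0], rd01[1], wr10[0], wr01[1])
    os.waitpid(pid, 0)
    for r, w in (rd01, rd10, wr01, wr10):
        os.close(r)
        os.close(w)
```

Only single bytes cross the pipes, so each block costs a couple of token exchanges rather than five 2048-byte transfers. A process that dies unexpectedly would leave its peer waiting forever; a fuller version would close unused pipe ends and treat a zero-length token read as an error.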
speck@cit-vax (Don Speck) (01/08/85)
After reading Chris Torek's trial of my dump mods I waited until I could try it on a TU80 myself before saying anything. Sorry if I seemed too silent...

I timed the TU77 and found that a tight write() loop only makes it go at 110 ips, not 125. I was mistaking the smooth motion of the reels for top speed - the vacuum column chatter should have told me that it's coming to a stop between every tape write.

Trying this on a TU80 on a 750, the TU80 does stream, but keeps going in and out of 100 ips mode every few seconds. (This is in single-user mode.) The 750 can just *barely* finish the current write() and start another one before the interrecord gap flies by, even though the TU80 is stretching that interrecord gap as far as it can (1.2 inches).

Dump has to do more processing between writes than the tight-loop write() program. When running dump, the TU80 tries 100 ips a few times but discovers that the time between tape blocks is over the limit. Thereafter the TU80 just sticks to 25 ips streaming.

So the problem is that overhead on write() is much higher than I'd dreamed. Unless someone knows offhand where the time goes, I'll have to profile the kernel and find out where. (Probably physio().) Btw, how do I get an alternate profiling clock?

To those who've asked for the dump mods, please be patient. I don't want to give out any more copies of it until it works *well*.

Don Speck
Caltech CS
(818) 356-6886
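To put a number on "before the interrecord gap flies by": the time available between writes is roughly the gap length divided by the tape speed. The sketch below is a back-of-the-envelope illustration only. It assumes the gap is stretched to the same 1.2 inches at both speeds, which the post states only for the 100 ips case, and the function name is hypothetical.

```python
def gap_time_ms(gap_inches, ips):
    """Milliseconds the host has to start the next write while the
    interrecord gap passes the head, at a given tape speed."""
    return gap_inches / ips * 1000.0


if __name__ == "__main__":
    for ips in (100, 25):
        print(f"{ips:3d} ips: {gap_time_ms(1.2, ips):.0f} ms per 1.2 inch gap")
```

Under these assumptions the window at 100 ips is about a quarter of the window at 25 ips, which is consistent with the drive holding 100 ips under a tight write() loop but falling back to 25 ips once dump's extra per-block processing is added.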