leblanc@eecg.toronto.edu (Marcel LeBlanc) (02/10/89)
In article <826@csd4.milw.wisc.edu> jgreco@csd4.milw.wisc.edu (Joe Greco) writes:
>]drives.  When I decided (2 yrs ago) that 1571 drives didn't give me enough
>]storage, I considered getting an IEEE drive, but then I learned about the
>]new 1581 drives that had been announced.  I managed to get two from
>]Commodore, and I'm glad I did.  It loads much faster than IEEE drives (my
>]assembler LOADs include files, taking maximum advantage of load speed), and
>]has good storage capacity (800K vs 1M/floppy for 8250).
>
>IF you're on a 128, or IF you're on a 64 with some sort of fastloader.
>Which is still only a marginal speedup, considering that file based
>operations are not enhanced by such devices.  (Since that's the major
>type of operation I need around here, that's what I look at.)  BLITZ!
>isn't speeded up at all by FastLoad... hehehe

I can appreciate that file operations are very important to you, and a
relatively small number of others (maybe 10's of thousands?).  But you
aren't exactly a typical C64 user, and neither am I.  For the millions that
use their C64 to play the latest games, all they are concerned about is how
fast they can start up the game, or how quickly the next level can be
loaded in.  Speeding up file operations is a more difficult issue.

Let's keep things in perspective here.  Although it's possible to speed up
sequential file access with 'transparent' speedup software, you will never
get as much of a speed increase as is possible on LOADs.  This has less to
do with the transfer protocol than with the LOW PERFORMANCE limitations of
the C64 kernal.  To remain compatible with existing software, speedup
software must intercept OPEN, CLOSE, CHKIN, CHKOUT, CHRIN, GETIN, & CHROUT.
You can't expect that much speed if you call a subroutine for every byte of
a transfer.  You CAN expect much more speed if you call a subroutine to
transfer large blocks of memory (LOAD & SAVE).  All of this assumes at
least minimal optimization in the LOAD and SAVE routines.
If these are just implemented as loops that repeatedly call the single-byte
transfer routines, then the performance won't be any better.

For example, consider LOAD vs. sequential read on IEEE-488 drives, or C128
burst vs. fast serial.  The C128 gives you great burst serial speed (LOAD &
SAVE), but using fast serial instead of slow serial doesn't give you that
much of a speed increase (maybe 2x).  Since software overhead is the
dominant factor here, I'll guess that seq read on IEEE drives also gives
about a 2x speedup.  If somebody has concrete numbers, please post!

Before Eric Green decides to flame me :-), I should point out that I
haven't forgotten that block transfers can be hidden from applications
software.  For devices like the 1541/71/81, it's quite reasonable to expect
speedup software to transfer pieces of a file in blocks, then read from
this buffer when a call is made to CHRIN or GETIN (same for CHROUT on
write).  This would have been great in the early days of the C64 & 1541
(my GUESS, 3-3.5x speedup)!  But today, too much software bypasses
CHRIN/CHROUT to use ACPTR/CIOUT directly.  It's also a great idea for the
C128 if you can spare enough memory to burst load the whole file (or use
an REU).

So what was the point of this posting?  Just that, if the software you use
has to do sequential file reads or writes, you are limited in how much it
can be speeded up without re-writing it.  The main reason I use Buddy 128
(an assembler) is that it LOADs include files (which defaults to burst
serial), giving me great speed on a 1581.  The SAME speedup factor (10-12x)
is possible on the C64 using software only (with 1541/71/81)!  This is at
least twice as fast as IEEE can load.  A complete assembly, which requires
2 passes through 600K of tokenized source, takes about 12 mins.  Using seq
reads on a C64 would probably take about 1.5 hours, or 50 mins using IEEE
drives (I haven't timed these, so they are just guesses).
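Marcel's per-byte vs. per-block argument can be put in a toy cost model
(a modern Python sketch; the microsecond figures are purely illustrative
assumptions, not measured kernal timings):

```python
# Toy model: per-byte kernal calls (CHRIN/GETIN style) pay a fixed
# dispatch cost for every byte, while a block transfer (LOAD/SAVE
# style) pays that cost once per block.

def transfer_time(nbytes, per_byte_wire_us, per_call_us, block_size=1):
    """Total microseconds to move nbytes when the transfer routine
    is invoked once per block of block_size bytes."""
    calls = -(-nbytes // block_size)          # ceiling division
    return nbytes * per_byte_wire_us + calls * per_call_us

# Illustrative numbers only: 10 us on the wire per byte, 40 us of
# JSR/dispatch overhead per call.
per_byte = transfer_time(65536, 10, 40, block_size=1)
per_block = transfer_time(65536, 10, 40, block_size=256)
print(round(per_byte / per_block, 2))   # ~4.92x in favour of blocks
```

The exact ratio depends entirely on the assumed overheads, but the shape of
the result matches Marcel's point: once the wire is fast, the subroutine
call per byte dominates.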
If somebody can suggest a faster method of speeding up seq file accesses,
please let us know!  What we really need is a new OS for the C64...

Marcel A. LeBlanc         | University of Toronto -- Toronto, Canada
leblanc@eecg.toronto.edu  | also: LMS Technologies Ltd, Fredericton, NB, Canada
-------------------------------------------------------------------------------
UUCP:  uunet!utai!eecg!leblanc    BITNET: leblanc@eecg.utoronto (may work)
ARPA:  leblanc%eecg.toronto.edu@relay.cs.net    CDNNET: <...>.toronto.cdn
elg@killer.DALLAS.TX.US (Eric Green) (02/13/89)
in article <89Feb10.182100est.2732@godzilla.eecg.toronto.edu>, leblanc@eecg.toronto.edu (Marcel LeBlanc) says:
>>Which is still only a marginal speedup, considering that file based
>>operations are not enhanced by such devices.  (Since that's the major
>>type of operation I need around here, that's what I look at.)  BLITZ!
>>isn't speeded up at all by FastLoad... hehehe
>
> Let's keep things in perspective here.  Although it's possible to speed up
> sequential file access with 'transparent' speedup software, you will never
> get as much of a speed increase as is possible on LOADs.  This has less to
> do with the transfer protocol, than with the LOW PERFORMANCE limitations of
> the C64 kernal.  To remain compatible with existing software, speedup
> software must intercept OPEN, CLOSE, CHKIN, CHKOUT, CHRIN, GETIN, & CHROUT.
> You can't expect that much speed if you call a subroutine for every byte of
> a transfer.

Take a look at the C-128's Kernal, and, specifically, the burst-mode load
routines (esp. the subroutine at $F4C5).  It gets that speedup despite
JSR'ing STASH for each byte to store the data into the proper bank of RAM.

For an early version of some hardware later discarded due to various
problems, I didn't have burst loads implemented yet, but had standard fast
serial working.  I called LOAD.  It had some speedup, about equivalent to
what you get from an old Epyx FastLoad cartridge, but wasn't a real speed
demon.  Later I re-wrote LOAD to use burst load.  Loading a large file (I
forget the size, somewhere around 100 blocks) dropped from 18 seconds to 5
seconds between the two.  The main difference was the protocol.  It seems
obvious to me that the main reason fast serial isn't as fast as burst mode
is the character-by-character handshaking taking place, not subroutine
overhead.
So, if you had a software product that read 256 bytes' worth of SEQ file
data into a buffer, it would probably speed up SEQ file access considerably
-- although not as fast as LOAD'ing, since there IS overhead involved in
reading from a buffer (e.g. see the 1750 REU example below).

> For example, consider LOAD vs. sequential read on IEEE-488 drives, or C128
> burst vs. fast serial.

Curious, I just wrote a simple benchmark for the C-128.  All it does is
GETIN from a file until it gets to the end.  It took 17 seconds to read a
98 block file, or about the same as it took to LOAD the exact same file off
of one of my SFD-1001 IEEE drives (using the Skyles IEEE Flash, admittedly
not the fastest IEEE interface).  This is about 1.3 kbytes/second -- not
slow at all, considering that the 1541 is straining to do 400 cps.  The
IEEE interface doesn't do anything special for LOAD'ing -- it just JSRs
ACPTR repeatedly and stashes the byte, just like the ordinary LOAD routine
(in fact, it IS the ordinary LOAD routine -- totally unmodified).

Then I tried the same thing using RAMDOS and the 1750: 6 seconds burst
loading off a 1571, 7 seconds reading using RAMDOS (loading using RAMDOS
is just about instantaneous, using the DMA chip).

> Before Eric Green decides to flame me :-), I should point out that I haven't
> forgotten that block transfers can be hidden from applications
> software.

Who, me?  Flame?  ;-).  Hiding block transfers would be especially useful
for the 1750, because for byte-at-a-time transfers it's reading single
bytes instead of doing a DMA transfer of 256 bytes & going from there.
Unfortunately, there's not 256 bytes free anywhere to use for such a
buffer :-(.

> For devices like the 1541/71/81, it's quite reasonable to expect speedup
> software to transfer pieces of a file in blocks, then read from this buffer
> when a call is made to CHRIN or GETIN (same for CHROUT on write).  This
> would have been great in the early days of the C64 & 1541 (my GUESS, 3-3.5x
> speedup)!
> But today, too much software bypasses CHRIN/CHROUT to use
> ACPTR/CIOUT directly.

It could be done.  But you'd have to do one of two things: illegally copy
CBM's ROM & modify it, or have RAM and "patch" it.  The latter would be
expensive, at least at current prices (32Kx8 static RAM is at around $14
right now).  After you have a patched ROM image, it's fairly easy to do
hardware tricks to swap it into place of the ordinary ROM (but it DOES
require at least one jumper into the inside of the computer).  In any
event, there is STILL a lot of software that uses CHRIN/GETIN... if I had
the fastloader expertise to do it, I'd give it a try just to see how much
of a speedup it was.

> So what was the point of this posting?  Just that, if the software you use
> has to do sequential file reads or writes, you are limited in how much it
> can be speeded up without re-writing it.

True.  I suspect that the REU timing is about the maximum speed you can get
using byte-at-a-time.  That time truly reflects JSR overhead.

> least twice as fast as IEEE can load.  A complete assembly, which requires
> 2 passes through 600K of tokenized source, takes about 12 mins.  Using seq
> reads on a C64 would probably take about 1.5 hours, or 50 mins using IEEE
> drives (I haven't timed these, so they are just guesses).

600K?!  (wow... boggle-mode activated!).  Using HCD128 and the 1750 REU,
using SEQ files, it'd take about 20 minutes (if you could fit it on the
REU!).  Now, on the Amiga, using DASM (a real speed demon)... I'd be
surprised if it took longer than 2 minutes out of RAM:.

In any event... nobody's denying that LOADing will generally be faster
than READing.  Just that SEQ file reading can be speeded up much more than
you imply.

> What we really need is a new OS for the C64...

Amen!  But who's going to bother, when they can just go out and buy a
"real" computer?  Today was the first time I'd touched my 128 in over a
week....

--
| //  Eric Lee Green              P.O. Box 92191, Lafayette, LA 70509  |
| //  ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg     (318)989-9849  |
| \X/          >> In Hell you need 4Mb to Multitask <<                 |
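Eric's "about 1.3 kbytes/second" figure is easy to sanity-check from the
98-block / 17-second measurement (a Python sketch; it assumes the CBM
convention that each 256-byte sector holds 254 data bytes plus a two-byte
track/sector link):

```python
# Sanity check on the quoted throughput: 98 blocks at 254 data bytes
# per block, read in 17 seconds.
DATA_BYTES_PER_BLOCK = 254   # 256-byte sector minus 2 link bytes

blocks, seconds = 98, 17
nbytes = blocks * DATA_BYTES_PER_BLOCK
cps = nbytes / seconds
print(nbytes, round(cps))    # 24892 bytes, ~1464 chars/second
```

That comes out a shade above the quoted 1.3 kbytes/second, depending on
how you round and whether link bytes are counted, but it is clearly in the
same ballpark -- and clearly several times the 1541's ~400 cps.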
jgreco@csd4.milw.wisc.edu (Joe Greco) (02/14/89)
]>IF you're on a 128, or IF you're on a 64 with some sort of fastloader.
]>Which is still only a marginal speedup, considering that file based
]>operations are not enhanced by such devices.  (Since that's the major
]>type of operation I need around here, that's what I look at.)  BLITZ!
]>isn't speeded up at all by FastLoad... hehehe
]
]I can appreciate that file operations are very important to you, and a
]relatively small number of others (maybe 10's of thousands?).  But you
]aren't exactly a typical C64 user, and neither am I.  For the millions that
]use their C64 to play the latest games, all they are concerned about is how
]fast they can start up the game, or how quickly the next level can be loaded
]in.  Speeding up file operations is a more difficult issue.

I agree that FastLoad has its place... but it is about as useful to me as
a doorstop is, or as a 1670 is. :-)  My 1541 is too slow to be useful in
any real way.

]Let's keep things in perspective here.  Although it's possible to speed up
]sequential file access with 'transparent' speedup software, you will never
]get as much of a speed increase as is possible on LOADs.  This has less to
]do with the transfer protocol, than with the LOW PERFORMANCE limitations of
]the C64 kernal.  To remain compatible with existing software, speedup
]software must intercept OPEN, CLOSE, CHKIN, CHKOUT, CHRIN, GETIN, & CHROUT.
]You can't expect that much speed if you call a subroutine for every byte of
]a transfer.  You CAN expect much more speed if you call a subroutine to
]transfer large blocks of memory (LOAD & SAVE).  All of this assumes at least
]minimal optimization in the LOAD and SAVE routines.  If these are just
]implemented as loops that repeatedly call the single byte transfer routines,
]then the performance won't be any better.
]
]For example, consider LOAD vs. sequential read on IEEE-488 drives, or C128
]burst vs. fast serial.
]The C128 gives you great burst serial speed (LOAD &
]SAVE), but using fast serial instead of slow serial doesn't give you that
]much of a speed increase (maybe 2x).  Since software overhead is the
]dominant factor here, I'll guess that seq read on IEEE drives also gives
]about a 2x speedup.  If somebody has concrete numbers, please post!

I HAD concrete numbers, but rn barfed and then csd4 went down on Sunday.  I
don't have the exact figures with me, but here are approximations: a C64
with a 1541 took about 1:30 to read 30,000 bytes.  A C64 with BusCard II
and an 8050 took more like 0:30.  The BusCard II, by the way, is considered
a "slower" interface.  I will try to make some more tests at home tonight
with the 1750 and the fast MSD IEEE interface.

I used the following routine to do the reading, and a stopwatch to time:

ready.
b*
   pc  sr ac xr yr sp
.;ee4e b0 50 00 00 f6
.
., 033c a2 02     ldx #$02      ; logical file number 2
., 033e 20 c6 ff  jsr $ffc6     ; CHKIN: select input channel
., 0341 a9 00     lda #$00
., 0343 8d 00 04  sta $0400     ; clear 16-bit byte counter
., 0346 8d 01 04  sta $0401
., 0349 20 e4 ff  jsr $ffe4     ; GETIN: fetch one byte
., 034c ee 00 04  inc $0400     ; bump counter low byte,
., 034f d0 03     bne $0354     ;   carry into high byte
., 0351 ee 01 04  inc $0401
., 0354 20 b7 ff  jsr $ffb7     ; READST: check status word
., 0357 c9 00     cmp #$00
., 0359 f0 ee     beq $0349     ; loop until EOF/error
., 035b 4c cc ff  jmp $ffcc     ; CLRCHN: restore default channels

]Before Eric Green decides to flame me :-), I should point out that I haven't
]forgotten that block transfers can be hidden from applications software.
]For devices like the 1541/71/81, it's quite reasonable to expect speedup
]software to transfer pieces of a file in blocks, then read from this buffer
]when a call is made to CHRIN or GETIN (same for CHROUT on write).  This
]would have been great in the early days of the C64 & 1541 (my GUESS, 3-3.5x
]speedup)!  But today, too much software bypasses CHRIN/CHROUT to use
]ACPTR/CIOUT directly.  It's also a great idea for the C128 if you can spare
]enough memory to burst load the whole file (or use an REU).

Bad programming form to use calls that one cannot intercept with the
vector table. :-)

]So what was the point of this posting?
]Just that, if the software you use
]has to do sequential file reads or writes, you are limited in how much it
]can be speeded up without re-writing it.  The main reason I use Buddy 128
](an assembler) is that it LOADs include files (which defaults to burst
]serial), giving me great speed on a 1581.  The SAME speedup factor (10-12x)
]is possible on the C64 using software only (with 1541/71/81)!  This is at
]least twice as fast as IEEE can load.  A complete assembly, which requires
]2 passes through 600K of tokenized source, takes about 12 mins.  Using seq
]reads on a C64 would probably take about 1.5 hours, or 50 mins using IEEE
]drives (I haven't timed these, so they are just guesses).

That's why I refuse to assemble/compile on or work with 1541's.  The IEEE
drives are "about" five times faster.

]If somebody can suggest a faster method of speeding up seq file accesses,
]please let us know!  What we really need is a new OS for the C64...

How about UNIX on an Amiga 2500?  hehehe

--
jgreco@csd4.milw.wisc.edu        Joe Greco at FidoNet 1:154/200
USnail: 9905 W Montana Ave       PunterNet Node 30 or 31
West Allis, WI 53227-3329        "These aren't anybody's opinions."
Voice: 414/321-6184              Data: 414/321-9287 (Happy Hacker's BBS)
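For readers who don't speak 6502, the monitor routine Joe posted above
amounts to the following loop (a Python analogue; the in-memory stream
here is a stand-in for the kernal input channel, and end-of-stream plays
the role of a non-zero READST):

```python
# Python rendering of the timing loop: select the file as the input
# channel, then GETIN one byte at a time until the status word goes
# non-zero, counting bytes as we go.
import io

def count_bytes(stream):
    """Read one byte at a time until EOF, returning the byte count --
    the analogue of the JSR $FFE4 / JSR $FFB7 loop."""
    count = 0
    while stream.read(1):     # b'' at EOF ~ READST != 0
        count += 1
    return count

demo = io.BytesIO(b"x" * 1000)
print(count_bytes(demo))      # 1000
```

The stopwatch, of course, has no analogue: on the real machine the elapsed
time of this loop is exactly what's being measured.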
jgreco@csd4.milw.wisc.edu (Joe Greco) (02/14/89)
As promised, here are some access times for various sequential-mode disk
accesses.  The files were not all identical, but I am including a CPS
rating to account for that.

Device(s)                 Time     Filesize   CPS   %speed
-----------------------   -------  --------  -----  ------
Regular 1541/C64          01:35.2    33453     351     100
C64/BusCard II/8050       00:31.8    30234     952     271
C64/Custom MSD/8050       00:17.3    30234    1745     497
C64/RAMDOS 3.2/1750       00:12.2    38760    3188     908

The "Custom MSD" interface pushes the IEEE bus much closer to Commodore's
specifications than the BusCard II.  Actually, I'm surprised at the huge
difference there.

The RAM disk is nearly ten times as fast as standard serial bus access.  It
would seem to me that it would well be possible for a more efficient design
to be implemented.

As a side note: the way my memory recalls, the IEEE bus is actually capable
of megabyte/second rates.  Of course, my magnetic media is probably flaking
again....

--
jgreco@csd4.milw.wisc.edu        Joe Greco at FidoNet 1:154/200
USnail: 9905 W Montana Ave       PunterNet Node 30 or 31
West Allis, WI 53227-3329        "These aren't anybody's opinions."
Voice: 414/321-6184              Data: 414/321-9287 (Happy Hacker's BBS)
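The CPS and %speed columns follow directly from Time and Filesize; a quick
recomputation (Python, using the posted figures) lands within about 1% of
the table, the small residue presumably being rounded stopwatch times:

```python
# Recompute CPS = filesize / seconds and %speed relative to the
# 1541 baseline, from the figures in the table above.

def cps(filesize, minutes, seconds):
    return filesize / (minutes * 60 + seconds)

baseline = cps(33453, 1, 35.2)            # Regular 1541/C64 -> ~351
tests = {
    "BusCard II/8050": cps(30234, 0, 31.8),
    "Custom MSD/8050": cps(30234, 0, 17.3),
    "RAMDOS 3.2/1750": cps(38760, 0, 12.2),
}
for name, rate in tests.items():
    print(name, round(rate), str(round(100 * rate / baseline)) + "%")
```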
elg@killer.DALLAS.TX.US (Eric Green) (02/14/89)
These are the results of benchmarking:

a) loading, and
b) doing GETIN until EOF, from ML, doing nothing in between.

All tests were done with a 98 block file consisting of the main body of a
BBS program.  It was just the handiest program that I had available on both
SFD and 1541 formats.  I put it onto a blank 1541 disk, to prevent
fragmentation.  It was already first on the SFD disk (the boot disk for the
BBS, which I only recently made up).

My basic thought was that sequential file access can take place just as
fast as LOAD'ing.  The benchmark confirms that for IEEE drives and the
standard 1541.  There's a couple of constraints here.  First of all, doing
TALK and UNTALK (chkin/clrchn) for each sequentially-read byte is extremely
slow.  When you do chkin or clrchn, each call sends a command byte out to
the drive.  So doing fast sequential access means buffering your sequential
file data anyhow (e.g. a simple filter program would be best off reading in
256 bytes, filtering them, then writing them to the output file, instead of
doing clrchn/chkin/clrchn/chkout for each individual byte -- a 4-to-1
overhead).  When you do that, SEQ access isn't slow at all... just look at
these timings:

C-64, IEEE Flash, SFD-1001, 'load"bbs",8':      18 seconds
                        ML loop, seq read:      18 seconds
C-64, C-LINK II, SFD-1001           LOAD:       14 seconds
                                    READ:       14 seconds
128 Ramdos:                         READ:        9 seconds
64 Ramdos:                          READ:        8 seconds
128 -- 1571 --                      load:        8 seconds
                                    read:       26 seconds
in 64 mode, with 1571:              load:       60 seconds
                                    read:       62 seconds
64 mode, with Epyx fastload cart.:  LOAD:       26 seconds
with Mike J. Henry's "fastboot v2":             26 seconds

Unfortunately I couldn't see if the Super Snapshot was faster than the Epyx
or fastboot product.  My brother sold ours because it was incompatible with
his C-64 (a very early production model), and because "it wasn't any faster
than the fastload cartridge" (his words, not mine -- I never even used the
darn thing).
Some trivia: the main difference between LOAD'ing (burst mode) and
READ'ing (fast mode) on the 1571 is that fast mode negotiates a transaction
for each byte, while burst mode negotiates on a per-block basis.  Burst
mode is unique in that manner -- even the IEEE drives negotiate on a
per-byte basis (probably why they're slower than burst mode, despite fairly
equivalent hardware).

Some other trivia: using ACPTR should be faster than using GETIN, if
subroutine overhead is as big a problem as some hint.  GETIN has to do all
sorts of testing to see where to dispatch to -- is it keyboard, or is it
disk?  This overhead should be noticeable when compared to LOAD, which
calls ACPTR directly.  But for both the IEEE drives and the 1541, there was
no significant difference between LOAD and GETIN times, implying that
transfer speed, and not internal Kernal overhead, was the limitation.

--
| //  Eric Lee Green              P.O. Box 92191, Lafayette, LA 70509  |
| //  ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg     (318)989-9849  |
| \X/          >> In Hell you need 4Mb to Multitask <<                 |
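Eric's buffering suggestion -- read 256 bytes, process them, write them,
rather than switching channels around every single byte -- looks like this
in outline (a Python sketch; the in-memory streams stand in for the drive
channels, and `filter_file` is a hypothetical name):

```python
# Sketch of the buffered-filter strategy: channel switches
# (chkin/chkout on the real machine) happen once per 256-byte
# chunk instead of once per byte -- removing the 4-to-1 overhead
# of clrchn/chkin/clrchn/chkout per byte.
import io

def filter_file(src, dst, transform, chunk=256):
    """Copy src to dst through transform, chunk bytes at a time."""
    while True:
        block = src.read(chunk)
        if not block:
            break
        dst.write(transform(block))

inp = io.BytesIO(b"hello, burst mode")
out = io.BytesIO()
filter_file(inp, out, lambda b: b.upper())
print(out.getvalue())   # b'HELLO, BURST MODE'
```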
izot@f171.n221.z1.FIDONET.ORG (Geoffrey Welsh) (02/14/89)
> From: jgreco@csd4.milw.wisc.edu (Joe Greco)
> Message-ID: <955@csd4.milw.wisc.edu>
>
> Device(s)                 Time     Filesize   CPS   %speed
> -----------------------   -------  --------  -----  ------
> Regular 1541/C64          01:35.2    33453     351     100
> C64/BusCard II/8050       00:31.8    30234     952     271
> C64/Custom MSD/8050       00:17.3    30234    1745     497
> C64/RAMDOS 3.2/1750       00:12.2    38760    3188     908

Add to the list (results from my memory):

C128/C64-Link II/SFD                          1900
HyperPET/D9060            Bloody fast - I'll get specs!

The C128 was running at 2 MHz.  The "HyperPET" is a 4 MHz 4032.

> As a side note: The way my memory recalls, the IEEE bus is actually
> capable of megabyte/second rates.  Of course, my magnetic media is
> probably flaking again....

The IEEE-488-1979 spec says that the data transfer rate shall not exceed 1
megabyte per second, but the handshake is designed to slow the transfers
down to the slowest selected device on the bus.  Since it takes several
1 MHz clock cycles to program the I/O chips to send the handshake signals,
megabyte per second speeds are way out of the question.

There is also the question of how quickly the data can be "lifted" from the
disk.  Even with most IEEE drives' 2-processor design, there is a severe
limit to the speed with which the data can be put on the bus.
Nevertheless, some sort of automated hardware handshake and more tightly
coded ROMs in the drive would lead to vastly improved performance.

===========================================================================
 Internet: Geoffrey.Welsh@f171.n221.z1.fidonet.org | 66 Mooregate Crescent
 Usenet:   watmath!isishq!izot                     | Suite 602
 FidoNet:  Geoffrey Welsh on 1:221/171             | Kitchener, Ontario
 PunterNet: 7/Geoffrey Welsh                       | N2M 5E6 CANADA
 BBS: (519) 742-8939 24h 7d 300/1200/2400bps       | (519) 741-9553
===========================================================================
 | "I don't need a disclaimer. No one pays any attention to what I say." |
===========================================================================
--
Geoffrey Welsh - via FidoNet node 1:221/162
UUCP: ...!watmath!isishq!171!izot
Internet: izot@f171.n221.z1.FIDONET.ORG
leblanc@eecg.toronto.edu (Marcel LeBlanc) (02/15/89)
In article <7143@killer.DALLAS.TX.US> elg@killer.Dallas.TX.US (Eric Green) writes:
>This is the results of benchmarking
>
>a) loading, and
>b) doing GETIN until EOF, from ML, doing nothing inbetween.
> ...
>My basic thought was that sequential file access can take place just
>as fast as LOAD'ing. The benchmark confirms that for IEEE drives and
>the standard 1541. There's a couple of constraints here. First of all,
 ...
>clrchn/chkin/clrchn/chkout for each individual byte -- a 4-to-1
>overhead). When you do that, SEQ access isn't slow at all.. just look
>at these timings:
>
>C-64, IEEE Flash, SFD-1001, 'load"bbs",8' :   18 seconds
>                        ML loop, seq read:    18 seconds
                                               ^^^^
>C-64, C-LINK II, SFD-1001   LOAD:   14 seconds
>                            READ:   14 seconds
                                     ^^^^

Yes, times are identical.  Please read on.

>in 64 mode, with 1571:   load -- 60 seconds
>                         read -- 62 seconds

Almost identical.  This supports what I said in my original posting on this
subject.  Here's an excerpt:

  ... as much of a speed increase as is possible on LOADs.  This has less
  to do with the transfer protocol, than with the LOW PERFORMANCE
  limitations of the C64 kernal.  To remain compatible with ...

As you pointed out in an earlier posting, the standard C64 load routine
does nothing but repeatedly call ACPTR!  This is a LOW PERFORMANCE
limitation when you have a decent transfer protocol, but it's of no
importance when you have to use the standard serial protocol of the C64!

But then you say the SFD-1001 (IEEE interface) isn't low performance?  It
is reasonably fast, but since they just speed up ACPTR/CIOUT without
changing the LOAD routine, the ML loop that you have written should give
the same results as LOAD (since it's basically the same loop), and it does.
This DOESN'T mean that SEQ read is as fast as block transfers (LOAD); it
just means that you have to optimize ("speed up") the block transfer
software as well as the transfer protocol.
This is even better demonstrated by the following numbers:

>128 Ramdos:   READ:  9 seconds
>64 Ramdos:    READ:  8 seconds

AND, as you stated before, LOAD is virtually instantaneous!

>128 -- 1571 --   load --  8 seconds
>                 read -- 26 seconds
>64 mode, with Epyx fastload cart. --
>       LOAD                                26 seconds
>       with Mike J. Henry's "fastboot v2": 26 seconds

Doesn't it seem unusual that C128 fast serial, Epyx FastLoad, and Mike
Henry's fastboot all take the same amount of time (26 secs)?  [This really
isn't intended to sound like a flame.]  Here's what you wrote earlier in
the article:

>All tests were done with a 98 block file consisting of the main
>body of a BBS program. It was just the handiest program that I had
>available on both SFD and 1541 formats. I put it onto a blank 1541
>disk, to prevent fragmentation. It was already first on the SFD disk

From the numbers listed above, I would guess that you copied from the
SFD-1001 to a _1571_.  Unless you set the interleave yourself (using
"U0>"+chr$(interleave#)), the 1571 saves using a 6 sector interleave, even
when it's in 1541 mode.  The C128 burst mode can easily keep up with a 6
sector interleave, but Mike Henry's fastboot needs at least 8 sectors to
decode and transfer, and Epyx FastLoad needs at least 10 (the 1541
standard).

On a fresh disk, the program would be saved near the directory track.  On
this part of the disk, I think the sectors/track is 18.  Since FastLoad,
fastboot v2, and C128 fast serial can't keep up with an interleave of 6
sectors, they are forced to wait a full revolution, or 18+6 = 24 sectors!
The interleave forces a speed difference of 24/6 = 4 times slowdown!  Of
course, in a 98 block file, not all sectors can be stored exactly 6 apart,
so this is just a GOOD approximation.  This is very close to the 26 sec /
8 sec ratio (3.25) given by the above numbers.

Weren't we talking about SEQ file speedup?
:-)  What I'm getting at is that the 1541/71 and SFD-1001 aren't good
drives to use when studying byte-at-a-time transfer overhead.  That's
because the DOS in those drives only buffers a sector at a time, which
forces it to use an interleave scheme.  It's possible to get around this
(and Super Snapshot V4 does, for LOAD only), but I haven't seen an
implementation yet that attempts to do this for SEQ file accesses.  And
since the transfer times involved in this performance range (about 4-5
secs for 100 blocks, far beyond standard CBM IEEE) are less than the
overhead for byte-at-a-time transfers, you wouldn't be able to get close
to LOAD speedup for SEQ accesses.

Here's my speedup summary (by "State of the Art" I mean 1541 interleave
INDEPENDENT serial fast loaders and C128 burst mode with optimal
interleave, NOT IEEE):

                                          std blocks    non-std blocks
A. "State of the Art" LOAD                  12-15x          20-25x
B. not yet attempted,
   "State of the Art" READ                6-7x (guess)   8-9x (guess)
C. Classical fast I/O LOAD                   5-6x            n.a.
   e.g. Epyx FastLoad (interleave = 10)
D. Classical fast I/O READ                3-4x (guess)       n.a.
E. Standard LOAD & READ                       1x             n.a.

The IEEE interfaces that various people have discussed so far probably fit
in with "C".  This is only because they are using the standard load routine
with faster ACPTR (to get any speedup they would have to SAVE with a
tighter interleave or execute custom LOAD routines within the IEEE device,
but I doubt that any of the IEEE owners on the net would want to have
anything to do with this :-) ).

A good way to see byte-at-a-time overhead is to use RAMDOS or a 1581,
which buffers half a track (one physical cylinder, not a full logical
track).

>Unfortunately I couldn't see if the Super Snapshot was faster than the
>Epyx or fastboot product. My brother sold ours because it was
 ...
>... "it wasn't any faster than the fastload cartridge" (his words,

SS V1 and SS V2 were "classical" fast loader implementations, so the speed
was only marginally faster than Epyx FastLoad (5.5x vs. 5x).
The actual transfer routines were significantly faster, but the 10 sector
interleave of the 1541 limited all these products to the same speed range.
The marginal speedup came from significantly faster head stepping routines.
You could SAVE at a different interleave to get some extra speedup, but it
wasn't usually worth the trouble.  SS V3 and SS V4 use a MUCH faster,
interleave-independent technique.  The speedup over Epyx FastLoad and
similar products is very noticeable.

>Some trivia: the main difference between LOAD'ing (burst mode) and
>READ'ing (fastmode) on the 1571 is that fast mode negotiates a
>transaction for each byte, while burst mode negotiates on a per-block
>basis. Burst mode is unique in that manner -- even the IEEE drives
>negotiate on a per-byte basis (probably why they're slower than burst
>mode, despite fairly equivalent hardware).

I agree, per-block is the only way to get great speed.  You have probably
noticed that the burst mode examples in the 1571 user's manual avoid using
subroutine calls to get each byte as it arrives.  With transfer rates in
the range used by burst mode, this could slow you down.  However, it turns
out that there's a fair bit of time to waste at the bit rate that CBM
decided to use for burst mode.

>Some other trivia: Using ACPTR should be faster than using GETIN, if
>subroutine overhead is as big a problem as some hint. GETIN has to
>do all sorts of testing to see where to dispatch to -- is it keyboard,
>or is it disk? This overhead should be noticible when compared to
>LOAD, which calls ACPTR directly. But for both the IEEE drives and the
>1541, there was no significant difference between LOAD and GETIN
>times, implying that transfer speed, and not internal Kernal overhead,
>was the limitation.

Again, this just shows how slow the standard ACPTR routine is, and how
important interleave limitations are no matter how fast the transfer
protocol is.
Once you have overcome the limitations of interleave, either by buffering
whole tracks or by doing other nasty manipulations :-), the real speed of
the transfer protocol can really shine.  After all, IEEE interfaces should
be capable of much faster transfers.

For those who don't believe that interleave is as important as I've said,
try the following: create a file on a 1541, then compare the time required
to LOAD it using a classical fastloader like Epyx FastLoad with the time
required to SCRATCH the file.  You should get the same results from a C128
with a 1571 (using burst mode vs. SCRATCH).  SCRATCH has to follow the
chain of sectors that are used in the file.  Since the only transfers
involved in SCRATCH are internal, all the time required is to follow the
10 sector interleaved chain (6 if you're using a 1571).

I think this posting was already too long about half way through! :-)

Marcel A. LeBlanc         | University of Toronto -- Toronto, Canada
leblanc@eecg.toronto.edu  | also: LMS Technologies Ltd, Fredericton, NB, Canada
-------------------------------------------------------------------------------
UUCP:  uunet!utai!eecg!leblanc    BITNET: leblanc@eecg.utoronto (may work)
ARPA:  leblanc%eecg.toronto.edu@relay.cs.net    CDNNET: <...>.toronto.cdn
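Marcel's "18+6 = 24 sectors" argument generalizes to a small model (a
Python sketch; the 18 sectors/track figure and the loaders' sector
requirements are taken from his posting, the rest is an illustrative
assumption):

```python
# Toy model of the interleave argument: a loader that needs `need`
# sector-times to digest one sector, on a file stored at interleave
# `il` with `spt` sectors per track.

def sector_times_per_block(il, need, spt):
    """Sector slots that elapse between one file sector and the next."""
    if need <= il:
        return il             # loader catches the next sector in time
    return spt + il           # misses it: wait a full revolution

spt = 18                      # sectors/track near the directory track
burst = sector_times_per_block(6, 6, spt)       # C128 burst keeps up
fastload = sector_times_per_block(6, 10, spt)   # Epyx misses: 18+6=24
print(burst, fastload, fastload / burst)        # 6, 24, 4.0x slowdown
```

The 4.0x the model predicts sits close to the observed 26s/8s = 3.25x;
as Marcel notes, a 98 block file can't place every sector exactly 6 apart,
so the real ratio falls a little short of the ideal one.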
izot@f171.n221.z1.FIDONET.ORG (Geoffrey Welsh) (02/15/89)
> From: elg@killer.DALLAS.TX.US (Eric Green)
> Message-ID: <7143@killer.DALLAS.TX.US>

Eric:

Your benchmarks are interesting and informative, but I'd like to point out
something you said which is mildly misleading:

> Some trivia: the main difference between LOAD'ing (burst mode) and
> READ'ing (fastmode) on the 1571 is that fast mode negotiates a
> transaction for each byte, while burst mode negotiates on a per-block
> basis. Burst mode is unique in that manner -- even the IEEE drives
> negotiate on a per-byte basis (probably why they're slower than burst
> mode, despite fairly equivalent hardware).

On true (parallel) IEEE drives, there is nothing to negotiate.  While
burst mode transactions on a 1571 or 1581 do have to be set up in slow
(i.e. 1541 speed) mode, there is no need for such arrangements on the
parallel drive... in fact, the parallel drives have lower overheads
because the handshake suffices for inter-byte holdoffs.

Furthermore, the IEEE drives do NOT have "fairly equivalent hardware"...
burst mode handshaking is done in hardware (at a rate dependent on gate
timings, but the gates operate at less than 50 nanoseconds each), while
IEEE handshaking is done in software, at a rate dependent on instructions
taking two to six clock cycles of 1,000 nanoseconds each.  Given hardware
handshaking and DMA, some IEEE-488 bus instruments achieve data transfer
rates in the hundreds of thousands of bytes per second.

> Some other trivia: Using ACPTR should be faster than using GETIN, if
> subroutine overhead is as big a problem as some hint. GETIN has to
> do all sorts of testing to see where to dispatch to -- is it keyboard,
> or is it disk? This overhead should be noticible when compared to
> LOAD, which calls ACPTR directly. But for both the IEEE drives and the
> 1541, there was no significant difference between LOAD and GETIN
> times, implying that transfer speed, and not internal Kernal overhead,
> was the limitation.
The subroutine overhead, as they say, is a drop in the bucket. Using GETIN will be slower than using ACPTR, but only marginally.

===========================================================================
 Internet: Geoffrey.Welsh@f171.n221.z1.fidonet.org | 66 Mooregate Crescent
 Usenet:   watmath!isishq!izot                     | Suite 602
 FidoNet:  Geoffrey Welsh on 1:221/171             | Kitchener, Ontario
 PunterNet: 7/Geoffrey Welsh                       | N2M 5E6 CANADA
 BBS: (519) 742-8939 24h 7d 300/1200/2400bps       | (519) 741-9553
===========================================================================
| "I don't need a disclaimer. No one pays any attention to what I say."   |
===========================================================================
--
Geoffrey Welsh - via FidoNet node 1:221/162
UUCP: ...!watmath!isishq!171!izot
Internet: izot@f171.n221.z1.FIDONET.ORG
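[Archive note: Geoffrey's hardware-vs-software handshaking argument can be put in rough numbers. The sketch below is a back-of-the-envelope estimate, not a measurement; the instruction count per byte is an assumption chosen for illustration.]

```python
# Back-of-the-envelope comparison of software vs. hardware handshaking on a
# 1 MHz 6502 bus host. The instructions-per-byte figure is an assumption.

CPU_CLOCK_HZ = 1_000_000          # 6502 at 1 MHz: one cycle = 1,000 ns
AVG_CYCLES_PER_INSTRUCTION = 4    # instructions take 2-6 cycles; assume 4

def software_handshake_rate(instructions_per_byte):
    """Bytes/second when every byte requires a software handshake loop."""
    seconds_per_byte = (instructions_per_byte * AVG_CYCLES_PER_INSTRUCTION) / CPU_CLOCK_HZ
    return 1.0 / seconds_per_byte

# Suppose the IEEE software handshake costs ~25 instructions per byte:
print(software_handshake_rate(25))   # -> 10000.0 bytes/second, tops

# Hardware handshaking needs only a few gate delays of < 50 ns per byte, so
# the handshake itself stops being the bottleneck -- consistent with the
# hundreds of thousands of bytes per second quoted for DMA-driven IEEE-488.
```

Even with generous assumptions, a software handshake caps out around tens of kilobytes per second, two orders of magnitude under the IEEE-488 spec ceiling.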
seeley@dalcsug.UUCP (Geoff Seeley) (02/15/89)
In article <7143@killer.DALLAS.TX.US>, elg@killer.DALLAS.TX.US (Eric Green) writes:
< 64 mode, with Epyx fastload cart. --
< LOAD 26 seconds
< with Mike J. Henry's "fastboot v2": 26 seconds
< Unfortunately I couldn't see if the Super Snapshot was faster than the
< Epyx or fastboot product. My brother sold ours because it was
< incompatible with his C-64 (a very early production model), and
< because "it wasn't any faster than the fastload cartridge" (his words,
< not mine -- I never even used the darn thing).
I have just recently bought Super Snapshot v4, and after using the Epyx
Fastload for 2 or 3 years, I can safely say that the Super Snapshot loads
much faster than the Fastload. It even saves faster, something the
Fastload didn't do at all. I don't know what SS version you had, but the
latest version (v4) is one of the best Commodore utilities I have seen.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Geoff Seeley          UUCP: dalcsug!seeley   | Why the hell didn't they have
Dalhousie University  BITN: csay0026@dalac   | ``Teenage Mutant Ninja Turtles''
Halifax, Nova Scotia  BEST: The local bar.   | when I was a kid?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
jgreco@csd4.milw.wisc.edu (Joe Greco) (02/15/89)
In comp.sys.cbm article <7143@killer.DALLAS.TX.US>, elg@killer.Dallas.TX.US (Eric Green) wrote:

]My basic thought was that sequential file access can take place just
]as fast as LOAD'ing. The benchmark confirms that for IEEE drives and

Why shouldn't it? Same routines. LOAD simply calls ACPTR in a loop.

]the standard 1541. There's a couple of constraints here. First of all,
]doing TALK and UNTALK (chkin/clrchn) for each sequentially-read byte
]is extremely slow. When you do chkin or clrchn, each call sends a
]command byte out to the drive. So doing fast sequential access means
]buffering your sequential file data anyhow (e.g. a simple filter

Or, if you are not performing any output operations on the serial bus, simply don't chkin/clrchn. I use that technique fairly often.

]program would be best off reading in 256 bytes, filtering them, then
]writing them to the output file, instead of doing
]clrchn/chkin/clrchn/chkout for each individual byte -- a 4-to-1

I once devised a very short ML copy routine for an ancient (1985?) release of my BBS program. It worked on a byte-by-byte basis, and was nearly as slow as the BASIC loop it had replaced. I use buffering techniques constantly now (and not just for disk I/O).

]Some other trivia: Using ACPTR should be faster than using GETIN, if
]subroutine overhead is as big a problem as some hint. GETIN has to
]do all sorts of testing to see where to dispatch to -- is it keyboard,
]or is it disk? This overhead should be noticeable when compared to
]LOAD, which calls ACPTR directly. But for both the IEEE drives and the
]1541, there was no significant difference between LOAD and GETIN
]times, implying that transfer speed, and not internal Kernal overhead,
]was the limitation.

Using non-vectored Kernal calls is bad form. The slight, SLIGHT increase in overhead is not usually noticeable, especially when working with disk.
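[Archive note: the 4-to-1 saving Eric describes is easy to quantify. This sketch (Python standing in for the 6502 ML; the call counting is illustrative) just tallies the channel-switch calls for a per-byte copy versus a 256-byte buffered copy.]

```python
# Count clrchn/chkin/clrchn/chkout-style channel switches when copying a
# file byte-by-byte versus through a 256-byte buffer, as in the filter
# program Eric describes. Illustrative sketch, not Kernal code.

BUFFER_SIZE = 256

def channel_switches(file_size, buffered):
    """Channel-switch calls needed to copy file_size bytes between files."""
    if buffered:
        blocks = -(-file_size // BUFFER_SIZE)   # ceiling division
        return 4 * blocks                       # 4 switches per 256-byte block
    return 4 * file_size                        # 4 switches per byte

size = 30234                                    # a file size from the benchmark table
print(channel_switches(size, buffered=False))   # -> 120936
print(channel_switches(size, buffered=True))    # -> 476
```

Each switch also sends a command byte over the slow serial bus, so cutting 120,936 switches down to 476 is where most of the win comes from.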
--
jgreco@csd4.milw.wisc.edu           Joe Greco at FidoNet 1:154/200
USnail: 9905 W Montana Ave          PunterNet Node 30 or 31
        West Allis, WI 53227-3329   "These aren't anybody's opinions."
Voice: 414/321-6184                 Data: 414/321-9287 (Happy Hacker's BBS)
jgreco@csd4.milw.wisc.edu (Joe Greco) (02/15/89)
In comp.sys.cbm article <1606.23F8D15A@isishq.FIDONET.ORG>, izot@f171.n221.z1.FIDONET.ORG (Geoffrey Welsh) wrote:
]
] > From: jgreco@csd4.milw.wisc.edu (Joe Greco)
] > Message-ID: <955@csd4.milw.wisc.edu>
] > Device(s)               Time     Filesize  CPS    %speed
] > ----------------------- -------  --------  -----  ------
] > Regular 1541/C64        01:35.2  33453     351    100
] > C64/BusCard II/8050     00:31.8  30234     952    271
] > C64/Custom MSD/8050     00:17.3  30234     1745   497
] > C64/RAMDOS 3.2/1750     00:12.2  38760     3188   908
]
] Add to the list (result from my memory):
]
] C128/C64-Link II/SFD 1900

Relative comparisons between my MSD interface and a Link I showed the MSD to be faster. And as I recall, the Link I was a bit faster than the Link II. Ahhhh.... well....

] HyperPET/D9060 Bloody fast - I'll get specs!
]
] The C128 was running at 2 MHz.
]
] The "HyperPET" is a 4 MHz 4032.

Wish I had a HyperPET. Of course, I also wish I had a D9090. Then again, I wish... oops, better not start on that line again.

] The IEEE-488-1979 spec says that the data transfer rate shall not exceed 1
]megabyte per second, but the handshake is designed to slow the transfers down
]to the slowest selected device on the bus. Since it takes several 1 MHz clock
]cycles to program the I/O chips to send the handshake signals, megabyte per
]second speeds are way out of the question.

That's what I meant (the spec itself).... I realize, of course, that such speeds are NOT possible at these clock rates.

] There is also the question of how quickly the data can be "lifted" from the
]disk. Even with most IEEE drives' 2-processor design, there is a severe limit
]to the speed with which the data can be put on the bus.

It would be nice for a hard disk! grin grin grin

] Nevertheless, some sort of automated hardware handshake and more tightly
]coded ROMs in the drive would lead to vastly improved performance.

And more tightly coded software on the computer.
--
jgreco@csd4.milw.wisc.edu           Joe Greco at FidoNet 1:154/200
USnail: 9905 W Montana Ave          PunterNet Node 30 or 31
        West Allis, WI 53227-3329   "These aren't anybody's opinions."
Voice: 414/321-6184                 Data: 414/321-9287 (Happy Hacker's BBS)
janhen@wn2.sci.kun.nl (Jan Hendrikx) (02/16/89)
In article <89Feb14.171816est.2394@godzilla.eecg.toronto.edu>, leblanc@eecg.toronto.edu (Marcel LeBlanc) writes:
> That's because the dos in those drives only buffers a
> sector at a time, which forces it to use an interleave scheme.

That is not true. 1541 DOS does do read-ahead when there are enough free buffers. When a new buffer is needed, and all are occupied, one of the read-ahead buffers is discarded.

Source: Inside Commodore DOS, and a ROM disassembly.

> Marcel A. LeBlanc | University of Toronto -- Toronto, Canada

-Olaf Seibert
leblanc@eecg.toronto.edu (Marcel LeBlanc) (02/18/89)
In article <335@wn2.sci.kun.nl> janhen@wn2.sci.kun.nl (Jan Hendrikx) writes:
>In article <89Feb14.171816est.2394@godzilla.eecg.toronto.edu>, leblanc@eecg.toronto.edu (Marcel LeBlanc) writes:
>> That's because the dos in those drives only buffers a
>> sector at a time, which forces it to use an interleave scheme.
>
>That is not true. 1541 DOS does do read-ahead when there are enough
>free buffers. When a new buffer is needed, and all are occupied,
>one of the read-ahead buffers is discarded.
>
>Source: Inside Commodore DOS, and a ROM disassembly.

I don't remember under what situations the DOS will do read-ahead, but the point was that since the DOS follows the interleave chain, it won't send the file any faster than the interleave allows, no matter what sort of interface you are using. In the case of a 1541, following the interleave chain will only get you about a 5-6x speedup with the standard 10-sector interleave. If this problem isn't addressed, using faster hardware buys you nothing.

Since you brought it up, under what situations will the DOS do read-ahead?

Marcel A. LeBlanc        | University of Toronto -- Toronto, Canada
leblanc@eecg.toronto.edu | also: LMS Technologies Ltd, Fredericton, NB, Canada
-------------------------------------------------------------------------------
UUCP: uunet!utai!eecg!leblanc      BITNET: leblanc@eecg.utoronto (may work)
ARPA: leblanc%eecg.toronto.edu@relay.cs.net      CDNNET: <...>.toronto.cdn
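[Archive note: Marcel's "5-6x" ceiling follows from the disk geometry. A rough estimate, assuming 300 rpm and a 19-sectors-per-track zone (1541 tracks vary from 17 to 21 sectors, so the exact figure depends on the track):]

```python
# Rough estimate of the interleave-limited transfer rate on a 1541.
# Geometry numbers are assumptions for one zone; they vary by track.

REV_PER_SEC = 5            # 300 rpm
SECTORS_PER_TRACK = 19     # tracks 18-24; outer tracks have up to 21
BYTES_PER_SECTOR = 254     # 256 minus the 2-byte track/sector link
INTERLEAVE = 10            # standard DOS interleave

def interleaved_bytes_per_sec():
    """Bytes/second when each file sector sits INTERLEAVE physical sectors on."""
    revs_per_sector = INTERLEAVE / SECTORS_PER_TRACK
    sectors_per_sec = REV_PER_SEC / revs_per_sector
    return sectors_per_sec * BYTES_PER_SECTOR

rate = interleaved_bytes_per_sec()
print(round(rate))            # -> 2413 bytes/second off the platter
print(round(rate / 351, 1))   # -> 6.9, vs. the stock 351 cps serial rate
```

So no matter how fast the interface, a loader that still walks the interleave chain tops out around 2.4 KB/s on this zone, in the same ballpark as Marcel's 5-6x figure.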
janhen@wn2.sci.kun.nl (Jan Hendrikx) (02/19/89)
In article <89Feb18.010253est.2384@godzilla.eecg.toronto.edu>, leblanc@eecg.toronto.edu (Marcel LeBlanc) writes:
> I don't remember under what situations the DOS will do read-ahead, but the
> point was that since the DOS follows the interleave chain, it won't send the
> file any faster than the interleave allows, no matter what sort of interface
> you are using.

That was not the point that I was trying to make. What you say is of course true. I just wanted to inform the person who thought the DOS does no read-ahead at all.

> Since you brought it up, under what situations will the DOS do read-ahead?

As far as I remember without my references around (I haven't had a 64 for over two years now), the algorithm is roughly the following:

If a file is opened for sequential reading, an 'active' buffer is allocated, which is filled from the disk. Active means that the next byte requested by the computer must come from that buffer. Also, an 'inactive' buffer is allocated, if possible. That buffer is filled from disk asynchronously. When the active buffer is emptied by read requests from the computer, the buffers are switched from active to inactive and vice versa. Of course, if the inactive buffer has not yet finished reading from disk, the computer must wait.

There is a routine which takes a buffer number and sets a pointer to that buffer somewhere. It maintains a Least Recently Used stack, based on the requests it gets. Whenever a buffer is needed, but no unused buffer is available, an inactive buffer is found based on the LRU stack. The idea is that the buffer that has not been used for the longest time will not likely be used again soon, so it is 'safe' to use it for something else.

The routine that switches buffers knows that inactive buffers may disappear. If they do, it does something reasonable. I am not sure if it just reads the next block into the one buffer that is left, or if it first tries to get a new second buffer.
In any case, the buffer that was read ahead must then be re-read from the disk.

Relative files have two active buffers: one for a side-sector, and one for actual data. I am not sure if it also reads ahead.

Some other things I would need to look up are whether files open for writing also have an inactive (write-behind) buffer, and whether (trying to) allocate an inactive buffer may cause another inactive buffer to be discarded.

As you can see, the operating system in the drive is considerably more complex than that in the computer...

> Marcel A. LeBlanc | University of Toronto -- Toronto, Canada
> leblanc@eecg.toronto.edu | also: LMS Technologies Ltd, Fredericton, NB, Canada

-Olaf Seibert
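[Archive note: the active/inactive buffer switching Olaf describes can be sketched as a toy model. This is illustrative Python, not the actual 1541 ROM logic; the class and its names are made up, and the asynchronous read-ahead is modelled as an eager fetch.]

```python
# Toy model of double-buffered read-ahead: an 'active' buffer serves the
# computer's byte requests while an 'inactive' buffer is filled ahead of
# time; they swap whenever the active buffer empties.

class ReadAheadFile:
    def __init__(self, sectors):
        self.sectors = iter(sectors)        # the file's sector chain
        self.active = self._next_sector()   # filled immediately on OPEN
        self.inactive = self._next_sector() # read-ahead (modelled as eager)
        self.pos = 0

    def _next_sector(self):
        return next(self.sectors, None)     # None marks end of chain

    def read_byte(self):
        if self.active is None:
            return None                         # end of file
        byte = self.active[self.pos]
        self.pos += 1
        if self.pos == len(self.active):        # active buffer emptied:
            self.active = self.inactive         # swap in the read-ahead buffer
            self.inactive = self._next_sector() # and start the next read-ahead
            self.pos = 0
        return byte

f = ReadAheadFile([b"AB", b"CD", b"E"])
data = bytes(iter(f.read_byte, None))
print(data)   # -> b'ABCDE'
```

The real DOS complications Olaf mentions (the LRU stack, stolen inactive buffers, error #70) all live in `_next_sector`'s place: allocating that second buffer can fail, which the toy model ignores.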
janhen@wn2.sci.kun.nl (Jan Hendrikx) (02/20/89)
In article <338@wn2.sci.kun.nl>, I wrote:
> As far as I remember without my references around, (I don't have a 64
> anymore for over two years now), the algorithm is about the following:

To be complete, I now have an Amiga and a 64 emulator :-)

> There is a routine which takes a buffer number and sets a pointer to
> that buffer somewhere. It maintains a Least Recently Used stack, based
> on the requests it gets.

After looking at the disassembly for some time, I found the following: the LRU table is not a table of disk buffers, but of Logical INDeXes. A LINDX is something like an internal file number for the disk. Every time a block is read or written through a LINDX, the LRU is updated.

> Some other things I would need to look up are whether files open for
> writing also have an inactive (write-behind) buffer, and whether
> (trying to) allocate an inactive buffer may cause another inactive
> buffer to be discarded.

Both of these are true. It even appears that when switching to the other (inactive) buffer, there is _always_ a need for a (temporary) second buffer. If no such buffer can be found (or stolen), you get an error #70, no channel. I have not found, however, any guarantee that such a temporary buffer will "almost always" be available. So maybe one of you netters can go out and write a program that produces error #70 when you are just writing to (or reading from) an ordinary file...

-Olaf Seibert