mcdonald@uxe.cso.uiuc.edu (01/23/89)
> ... I have rammed into
>a wall in trying to access a flat ascii data file with 14,000 records in
>it.  Naturally, I could read the file one record at a time, but the
>end user would probably expire due to old age if I wrote this program
>in that manner.

>I am not familiar with any of the "Ctree" type file managers, but I did
>have a similar problem on a UNIX system.  We had a file full of 4 byte records.
>My solution?  Buffer the stuff up.  Instead of reading 4 bytes at a time,
>I read 512 bytes (128 records) at a time.  This reduced the number of disk
>accesses/syscalls from roughly 4000 (one per record) to 30.  Runtime is now
>15 minutes (good conditions) to 45 minutes (bad conditions).

I have tried this sort of stuff on MS-DOS, and it doesn't seem to
do much good.  Has anyone else gotten improvements this way?  What
DOES do some good is to get a good disk cache program.

I think the previous two included paragraphs may only apply to
(certain) multitasking OS's.

Doug McDonald
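[A minimal sketch of the buffering technique quoted above, assuming
4-byte fixed-size records and POSIX-style open()/read() (DOS compilers
of the day declared these in io.h rather than unistd.h).  RECSIZE,
NRECS, and process_record() are illustrative names, not from the post:]

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define RECSIZE 4
#define NRECS   128                  /* 128 * 4 = 512-byte reads */

static void process_record(const char *rec)
{
    /* placeholder for whatever per-record work the program does */
}

int main(int argc, char **argv)
{
    char buf[RECSIZE * NRECS];
    int  fd, n, i;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) < 0) {
        perror("open");
        return 1;
    }
    /* each read() fetches up to 128 records in one syscall,
       instead of one syscall per 4-byte record */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (i = 0; i + RECSIZE <= n; i += RECSIZE)
            process_record(buf + i);
    close(fd);
    return 0;
}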
bradb@ai.toronto.edu (Brad Brown) (01/26/89)
In article <225800111@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>
>> ... I have rammed into
>>a wall in trying to access a flat ascii data file with 14,000 records in
>>it.  Naturally, I could read the file one record at a time, but the
>>end user would probably expire due to old age if I wrote this program
>>in that manner.
>>[...]
>
>I have tried this sort of stuff on MS-DOS, and it doesn't seem to
>do much good.  Has anyone else gotten improvements this way?  What
>DOES do some good is to get a good disk cache program.

I have done things like this in MS-DOS and it works *really well*.  I have
a tiny flatfile manager that uses lseek and read to go to and read specific
records, and it works much faster than using streams.  (That is, I use
open() to open a file, *not* fopen().)

If you really have to move through a lot of data, write your program so
that it reads a large batch of records (the larger the better) at once,
then processes them in memory.  This should help you a lot, because most
of your overhead is in waiting for the disk when you do an individual
disk read for each record.

Caching may or may not help you, depending on the type of processing you
do.  If you are just making a single pass through the data, a cache will
not make anything go faster than reading several records at a time would
anyway: you still have to read each record once, and it never gets read
again from the cache, so you don't save anything.

If you skip around the database a lot, you might want to think about
writing a record cache into the database part of your program.  A record
cache keeps a large pool of record slots and fills an empty one each time
you read a new record.  If you request a record that was recently read, it
returns a pointer to the record without touching the disk.  You should be
able to go *even faster* this way than with a disk cache of the same size,
though writing an efficient cache can be hairy.  (A sketch of one such
cache appears below.)

(-:  Brad Brown  :-)
bradb@ai.toronto.edu
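[A minimal sketch of the record cache Brad describes, assuming fixed-size
records addressed by record number.  The record size, pool size, and the
direct-mapped replacement policy are illustrative assumptions, not
details from the post:]

#include <fcntl.h>
#include <unistd.h>

#define RECSIZE 64                  /* assumed record size */
#define NSLOTS  256                 /* size of the cache pool */

struct slot {
    long recno;                     /* which record this slot holds */
    int  valid;
    char data[RECSIZE];
};

static struct slot pool[NSLOTS];

/* Return a pointer to record `recno`, touching the disk only on a miss. */
char *get_record(int fd, long recno)
{
    struct slot *s = &pool[recno % NSLOTS];    /* direct-mapped lookup */

    if (s->valid && s->recno == recno)
        return s->data;                        /* hit: no disk access */

    /* miss: lseek to the record and refill the slot */
    if (lseek(fd, recno * (long)RECSIZE, SEEK_SET) == (off_t)-1 ||
        read(fd, s->data, RECSIZE) != RECSIZE)
        return NULL;
    s->recno = recno;
    s->valid = 1;
    return s->data;
}

[A direct-mapped table keeps each lookup constant-time; a production
cache might chain collisions or track LRU order, which is where the
hairiness Brad mentions comes in.]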
bagpiper@oxy.edu (Michael Paul Hunter) (01/27/89)
In article <225800111@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>
>> ... I have rammed into
>>a wall in trying to access a flat ascii data file with 14,000 records in
 [stuff]
>>My solution?  Buffer the stuff up.  Instead of reading 4 bytes at a time,
>>I read 512 bytes (128 records) at a time.  This reduced the number of disk
>>accesses/syscalls from roughly 4000 (one per record) to 30.  Runtime is now
>>15 minutes (good conditions) to 45 minutes (bad conditions).
>
>I have tried this sort of stuff on MS-DOS, and it doesn't seem to
>do much good.  Has anyone else gotten improvements this way?  What
 [stuff]
>Doug McDonald

Under MS-DOS, file buffering is already done for you.  One thing to try is
to read a whole lot more than 512 bytes (which I think is the size of the
file buffer) and see if you get any speedup.  But I don't think that will
change the number of accesses, since MS-DOS just reads sizeof(buffer)
characters each time (assuming sequential access).

For random access, determining adjacency and reading a large number of
adjacent items would probably help, if you can organize what you want to
do to work with adjacent items (where a is adjacent to b if a and b are
read on the same pass).  A sketch of this appears below.

Mike
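[One way to exercise Mike's adjacency idea, sketched under assumed names:
sort the record numbers you need so that requests falling in the same
large span of the file are satisfied by a single big read().  CHUNK,
RECSIZE, and visit() are illustrative, and error checking is elided:]

#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define RECSIZE 4
#define CHUNK   8192                /* well past 512 bytes per read */

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Visit every record listed in `wanted`, batching neighbors so that
   records in the same CHUNK-sized span are read on the same pass. */
void visit(int fd, long *wanted, int n, void (*use)(const char *))
{
    static char buf[CHUNK];
    long base = -1;                 /* file offset buf currently holds */
    int  i;

    qsort(wanted, n, sizeof(long), cmp_long);   /* group adjacent recnos */
    for (i = 0; i < n; i++) {
        long off = wanted[i] * RECSIZE;
        long blk = off - off % CHUNK;
        if (blk != base) {          /* new span: one seek, one big read */
            lseek(fd, blk, SEEK_SET);
            read(fd, buf, CHUNK);
            base = blk;
        }
        use(buf + (off - base));
    }
}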
mcdonald@uxe.cso.uiuc.edu (01/28/89)
>I have tried this sort of stuff on MS-DOS, and it doesn't seem to
>do much good.  Has anyone else gotten improvements this way?  What
>DOES do some good is to get a good disk cache program.

>I have done things like this in MS-DOS and it works *really well*.  I have
>a tiny flatfile manager that uses lseek and read to go to and read specific
>records, and it works much faster than using streams.  (That is, I use
>open() to open a file, *not* fopen().)

Perhaps our mileage varies because we are driving different kinds of disk
drives.  I volunteer to make a scientific survey.  Any computer mag out
there want to pay me?  It would make a nice article.  No, I won't do it
for free.

Doug McDonald