[comp.unix.internals] Need efficient way to read file in reverse

jetfuel@csusac.csus.edu (Dave Jenks) (09/19/90)

In article <1990Sep18.152818.1303@phri.nyu.edu> roy@alanine.phri.nyu.edu (Roy Smith) writes:
>	Is there a standard, portable, efficient way to read a file in
>reverse?  I'm doing random seeks in a file and occassionally want to be able
>to find the beginning of the line into the middle of which I just seeked.
>Right now, I'm using a simple-minded revgetc() I wrote which is basically:

I had to do the very same thing a number of years ago.  The most
efficient method was to read using read(2) for BLKSIZ bytes (#defined
in <stdio.h> or <stand.h> or someplace like that) and search the
buffer manually.  With a simple searching algorithm, this can be very
fast - even with a poor algorithm, it'll still be much faster than
asking for i/o for each character.  It's even portable.
-- 
=======================================================================
	    "Pro is to con, as progress is to Congress..."
                                                                      
                                           ...!uunet \

guy@auspex.auspex.com (Guy Harris) (09/27/90)

 >Alas, neither stat->st_blksize nor setbuffer() are universally
 >available, not even in a **IX environment.  You may need to use
 >setvbuf() instead, which itself isn't universally available.
 >
 >Modifying the meaning of "portable" still further, perhaps we could
 >settle for it to mean "works under 4.3BSD and System V Release 4".  Now
 >you're talking!

How about "works under 4.4BSD (which, if they offer the promised ANSI
C compatibility, will have 'setvbuf', as well as having 'st_blksize' in
the 'stat' structure) and System V Release 4 (which has 'st_blksize' in
the 'stat' structure, as well as having 'setvbuf')"?

chuck@Morgan.COM (Chuck Ocheret) (09/27/90)

I recommend that you check to see if your system supports some form of
memory mapping of a file.  On some systems there is the mmap(2) call on
others a version of shmat(2) allows you to do mostly the same thing.
Not all systems provide such a mechanism.  When you map a file in using
either of these you get a pointer to memory which maps directly to a
specified file on the file system (can also be a device).  This gives
you full random access to all bytes in the file.  Read it backwards,
forwards, or however you wish; your file has just become a char *.

~chuck
-- 
+--------------------+   Chuck Ocheret, Sr. Staff Engineer   +---------------+
|chuck@APT.Morgan.COM|       Morgan Stanley & Co., Inc.      |(212) 703-4474 |
|    Duty now ...    |19th Floor, 1251 Avenue of the Americas|for the future.|
+--------------------+      New York, N.Y.  10020 USA        +---------------+

tmb@ai.mit.edu (Thomas M. Breuel) (09/29/90)

In article <1802@s5.Morgan.COM>, chuck@Morgan.COM (Chuck Ocheret) writes:
|> I recommend that you check to see if your system supports some form of
|> memory mapping of a file.  On some systems there is the mmap(2) call on
|> others a version of shmat(2) allows you to do mostly the same thing.
|> Not all systems provide such a mechanism.  When you map a file in using
|> either of these you get a pointer to memory which maps directly to a
|> specified file on the file system (can also be a device).  This gives
|> you full random access to all bytes in the file.  Read it backwards,
|> forwards, or however you wish; your file has just become a char *.

Believing in magic won't make your code run fast. Just because you can
access the components of your file in random order by dereferencing a
"char*" doesn't mean that that method will be particularly efficient.
Depending on how mmap(2) is implemented and what paging algorithms your
kernel is using, this may in fact be one of the least efficient methods
for reading a file backwards or in random order.

If you want high performance disk access for some particular access
pattern, the proven and reliable way is to work out a buffering
scheme, thinking carefully about how much real memory you have
available, what parts of the file you need to access how often, etc.

For reading a file backwards, this is fortunately straightforward,
since this case is essentially identical to sequential access of
in the "forward" direction.