tneff@bfmny0.UU.NET (Tom Neff) (02/02/90)
I too have noticed that folders tend to get huge. When they do, Mush's performance gets excruciatingly slow. Loading them means scanning the entire file searching for message starts. This can take forever!

And yet breaking the folders up into smaller ones is an unsatisfying solution, because you lose the ability to manipulate the entire collection of messages with "pick," "sort" etc. This is what we use Mush for to begin with - hate to give it up.

So another idea occurs to me - how about INDEXING huge folders?

Storing a list of start-of-message pointers in a separate index file (in the same directory as the folder) would let you access a huge folder in seconds. The header fields Mush displays for the current screenful of messages could be grabbed in a few seek-and-reads. As you need other headers, you go get them. (Mush could even keep loading headers in the background after displaying the current set and entering the shell in the foreground.)

How it might work:

The user decides that the currently opened folder ("+mysave") is huge and should be indexed. He issues the Mush command:

    index

This creates "+X.mysave" containing start-of-message file pointers for the folder. Mush remembers the folder is indexed and will update the index file whenever the folder itself is updated.

In a later Mush session the user selects folder "+mysave" and Mush notices that "+X.mysave" also exists. If it is newer than "+mysave" then the index is loaded and its pointers used to achieve a fast "scan" of the folder; only the needed messages are actually read from the folder file.

If the index exists but is older than the folder file Mush gets smart. If the old indices LOOK like they point to messages, and the folder is just bigger, then Mush fast-scans the "old" portion and brute force reads the "new" before display. This is the normal case when a mail delivery agent appends new messages to a folder.

But if the indices look WRONG now (indicating that somebody edited or otherwise touched the folder with some other program since the last Mush session), Mush warns the user "Index obsolete - rebuild? [y]" and prompts. (I haven't thought in any depth about what Mush does if the user answers "no", but clearly Mush doesn't use the index.)

A final optimization for huge folders would be to update the original file IN PLACE if the user's changes don't require moving any text around, e.g., deleting new messages while leaving old ones untouched. I realize not everyone's OS permits this, but it would make a nice compile time switch.
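The indexing scheme proposed above can be sketched roughly as follows. This is an illustration in Python, not Mush code. It assumes an mbox-style folder in which every message begins with a line starting "From ", and all function names are hypothetical. The stale-index case is handled by rescanning from the last indexed message onward, which is one way to "fast-scan the old portion and brute force read the new".

```python
def scan_message_starts(folder_path, start=0):
    """Return byte offsets of lines beginning with b'From ', scanning from `start`.

    Assumption: mbox-style folder; body lines that begin with 'From ' are
    expected to have been escaped (e.g. '>From ') by the delivery agent.
    """
    offsets = []
    pos = start
    with open(folder_path, "rb") as f:
        f.seek(start)
        for line in f:
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    return offsets


def index_looks_valid(folder_path, offsets):
    """Check that every stored offset still points at a 'From ' line."""
    with open(folder_path, "rb") as f:
        for off in offsets:
            f.seek(off)
            if f.read(5) != b"From ":
                return False
    return True


def refresh_index(folder_path, old_offsets):
    """Reuse an older index if it still matches; otherwise rebuild from scratch."""
    if not old_offsets or not index_looks_valid(folder_path, old_offsets):
        return scan_message_starts(folder_path)          # full rescan
    # Folder only grew: rescan from the last known message to pick up appends.
    last = old_offsets[-1]
    return old_offsets[:-1] + scan_message_starts(folder_path, last)


def read_message(folder_path, offsets, n):
    """Fetch message n with one seek and one read, without touching the rest."""
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else None
    with open(folder_path, "rb") as f:
        f.seek(start)
        return f.read() if end is None else f.read(end - start)
```

Note that the validity check here only confirms that each pointer lands on something that looks like a message start. An index can pass it and still disagree with the folder, so a real implementation would need a stronger check.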
schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/03/90)
First, some introductory remarks on folder compression:

In article <5C98ACE2A8@crdos1> davidsen@crdos1.crd.ge.com writes:
}
} If this gets added, and I would love to see it, provision should be
} made to provide a compress and uncompress string, which, if defined,
} would be loaded with the parameters and executed. This would allow not
} only compress, but also things like arc, zoo, zip, lzhuf, etc, archives
} of folders. Many of these run on DOS as well as UNIX.
}
} Make it work right

Of course. As I pointed out in an earlier article, the main difficulty with compressed folders is loading from a pipe. If you're willing to live with two temp files -- uncompress to the first temp and load it into the "working" temp from there, then recompress back to the original after update -- it could be implemented easily. But in that case you might as well use the "zfolder" cmd and scripts I'm about to post new versions of, because that's what they do.

} This should be integrated into the save commands, too. No use allowing
} a compressed folder if you can't add to it. And perhaps an option to not
} recompress until exit, so going thru your mail and sending stuff into
} folders would not thrash folders if multiple things were added.

See my earlier comments on the infeasibility of "save" into a compressed folder. I see what you're driving at -- uncompress when the first save to that folder is issued, then remember that you need to compress again at exit time -- but I really think my scheme of saving to a secondary folder that is not kept compressed, and then merging when necessary, is more efficient both in terms of time (assuming that the most recently saved messages are the ones that are most frequently needed, accesses are quicker because you need not uncompress) and in terms of disk space.

In article <48677cdf.20b6d@apollo.HP.COM> ced@apollo.HP.COM (Carl Davidson) writes:
}
} I, too, have some mail folders that are huge (> 2 Mbytes). The ability
} to compress/decompress folders "on-the-fly" would be nice. Even better
} would be to store mail messages in a hypertext database.

Goodness, not asking for much, are we? :-)

} I also realize that this is a pipe dream, so I would gladly settle for
} auto compress/decompress.

If that other Davids*n's user-specifiable packing/unpacking strings get implemented, you can probably use them to connect folder loading to any kind of database you like. It's a little beyond what mush is designed to be to have that kind of database manager built in. (Dan is free to contradict me on this. :-)

In article <15147@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
} I too have noticed that folders tend to get huge. When they do, Mush's
} performance gets excruciatingly slow. Loading them means scanning
} the entire file searching for message starts. This can take forever!
}
} So another idea occurs to me - how about INDEXING huge folders?

Various means for implementing this very thing have been under discussion for some time. What hasn't been solved is detection of corrupted folders or index files, when the index appears valid to external checks (like the modification times) but actually doesn't agree with the folder. The algorithms for doing this validation are understood, but implementation appears to require a complete rewrite of the folder loading code (which was hard enough to get right in the first place). In other words, it's on our "some day" list.

} A final optimization for huge folders would be to update the original
} file IN PLACE if the user's changes don't require moving any text
} around, e.g., deleting new messages while leaving old ones untouched.

Hmmm ....

-- 
Bart Schaefer          "Live and don't learn, that's us." -- Hobbes
schaefer@cse.ogi.edu (used to be cse.ogc.edu)
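The uncompress-to-temp, work, recompress-on-update cycle described above might look something like this sketch. It is not a description of what the "zfolder" scripts do. Python's gzip module stands in for whatever compress/uncompress strings a user might define, and the function names are hypothetical. Only the first temp file is shown; the mail reader's own "working" copy would be the second.

```python
import gzip
import os
import shutil
import tempfile


def unpack_folder(packed_path):
    """Uncompress packed_path into a temp file and return the temp file's path."""
    fd, plain_path = tempfile.mkstemp(suffix=".folder")
    with os.fdopen(fd, "wb") as plain, gzip.open(packed_path, "rb") as packed:
        shutil.copyfileobj(packed, plain)
    return plain_path


def repack_folder(plain_path, packed_path):
    """Recompress the updated plain copy over the original, then drop the temp."""
    # Write next to the original first so a failed compress cannot destroy it.
    new_packed = packed_path + ".new"
    with open(plain_path, "rb") as plain, gzip.open(new_packed, "wb") as packed:
        shutil.copyfileobj(plain, packed)
    os.replace(new_packed, packed_path)
    os.remove(plain_path)
```

Every update pays for a full recompress of the folder, which is the cost that saving to an uncompressed secondary folder and merging later is meant to avoid.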
tneff@bfmny0.UU.NET (Tom Neff) (02/03/90)
One other wish list item would make life with huge (and less than huge) folders easier over slow baud rate connections:

Allow a "+initial_command" switch on the Mush invocation line, and obey the command before initial display. This would be generally useful, like the "+cmd" feature in 'vi' and 'less'.

Specifically what I would tend to do on slow speed dialup lines is say

    mush -f mylist +last-msg     # or +$

so that I enter the folder at the BACK rather than having to sit fidgeting through the entire initial display and THEN switch to the final screenful of messages.

So if the general +initial_command facility is too hard, it would at least be great to add a -G switch to jump to the end of the specified folder before display.

OK smoke em if you got em. :-)
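One wrinkle with the "+initial_command" idea is that "+" already prefixes folder names such as "+mysave". A rough Python sketch of how the two might be told apart on the command line follows. The function name is hypothetical, and it assumes that an argument directly after -f is always a folder name.

```python
def split_initial_commands(argv):
    """Separate '+command' arguments from ordinary options and folder names."""
    options, commands = [], []
    previous = None
    for arg in argv:
        # Assumption: whatever follows -f is a folder, even if it starts with '+'.
        is_folder_arg = previous == "-f"
        if arg.startswith("+") and len(arg) > 1 and not is_folder_arg:
            commands.append(arg[1:])      # strip the leading '+'
        else:
            options.append(arg)
        previous = arg
    return options, commands
```

The collected commands would then be run in order before the first screen of headers is drawn.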
schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/05/90)
In article <15151@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
} One other wish list item would make life with huge (and less than
} huge) folders easier over slow baud rate connections:
}
} Allow a "+initial_command" switch on the Mush invocation line, and obey
} the command before initial display. This would be generally useful,
} like the "+cmd" feature in 'vi' and 'less'.

Just the other day I was trying to figure out how to implement a -e option, ala sed, perl, etc., which would be pretty much equivalent. You also ought to be able to use multiple -I or -F options, which you can't at the moment.

} Specifically what I would tend to do on slow speed dialup lines
} is say
}
}     mush -f mylist +last-msg     # or +$
}
} so that I enter the folder at the BACK rather than having to sit
} fidgeting through the entire initial display and THEN switch to
} the final screenful of messages.

I take it you have "alias mush 'mush -C'" or the like, so that it wouldn't work to use "mush -N"?

Note also that the "curses" mode can now be turned on and off in the .mushrc file, so you can

    if $TERM == slow-dialup-terminal-type    # whatever
        curses off
    endif

or, alternately, get rid of the alias for -C and use

    if $TERM == fast-at-the-office-type
        curses
    endif

If the two types are the same I'm sure you can figure out some way to differentiate; e.g. put "setenv TTY `tty`" in your .login and then

    if $TTY =~ *ttyd*
        curses off
    endif

I'm not rejecting your suggestion, I'm just offering workarounds for the present situation.

-- 
Bart Schaefer          "Live and don't learn, that's us." -- Hobbes
schaefer@cse.ogi.edu (used to be cse.ogc.edu)