nelson@sun.soe.clarkson.edu (Russ Nelson) (01/01/91)
Here's a suggestion for someone with time on their hands:

Some sites only let users read news through nntpd, so the actual
storage of the news articles is hidden from the user.  Therefore there
is no reason why the news spool cannot be compressed.  There are at
least four storage levels:

0) None.  Each article is stored in its own file, as is currently the
   case.

1) The mere catenation of N articles into one file.  This saves space
   because many news articles are a small multiple of the disk block
   size.  The unused part of the last block allocated to an article's
   file is wasted, and that waste is often significant relative to the
   size of the article.

2) A restartable data compression algorithm is used to compress a
   level one file.  I posted my restartable Huffman decoder to
   alt.sources some while back.

3) A different compression algorithm that runs slower but compresses
   more?

Obviously an index is needed for levels one and two.  But the good
design of news's history file saves us: instead of creating a separate
index file, the group/number stored in ${NEWSLIB}/history is converted
into the index.  For example, after a level zero article is changed to
level one, its history entry might read:

	<1@foo.com> 888888888~77777777 compress#1.alt.foo/566567567

where compress#1 is the name of the file holding the compressed
articles, and the number following the / is the seek offset into that
file.  This particular representation might not work -- I don't know
what constraints there might be on the history file format -- but
certainly something can be worked out.

It would be reasonable to keep the most recent day's articles at level
0 (since you're adding to them), yesterday's at level 1, and the far,
ancient past (two or more days ago :-) at level 2.

Expire could be a problem.  However, the program that schedules
increasing levels of compression could also group together only
articles with identical expire dates.

Why did I think of this weirdo scheme?
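The level-one savings are easy to estimate.  Here's a rough sketch --
my numbers, not part of the proposal: a 1 KB disk block and made-up
article sizes:

```python
BLOCK = 1024  # bytes per disk block (assumed for illustration)

def blocks(nbytes):
    """Whole blocks needed to hold nbytes (ceiling division)."""
    return -(-nbytes // BLOCK)

def slack_saved(article_sizes):
    """Bytes reclaimed by catenating the articles into one file
    instead of storing one file apiece."""
    separate = sum(blocks(n) for n in article_sizes) * BLOCK
    together = blocks(sum(article_sizes)) * BLOCK
    return separate - together

# Ten 1.5 KB articles: each wastes half its second block when stored
# separately, so catenation reclaims five blocks.
print(slack_saved([1536] * 10))  # -> 5120
```

The half-block-per-article waste is the average case for arbitrary
article sizes, which is why it matters most for the many small
articles in a typical spool.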
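The lookup nntpd would do against such a history entry is cheap.
Here's a sketch that pulls the file name and seek offset out of the
example entry above -- the whitespace-separated field layout and the
"file/offset" convention are assumptions from the example, not any
real history file format:

```python
def parse_history(line):
    """Split an entry like
    '<1@foo.com> 888888888~77777777 compress#1.alt.foo/566567567'
    into (message_id, dates, spool_path, seek_offset)."""
    msgid, dates, location = line.split()
    path, _, offset = location.rpartition('/')
    return msgid, dates, path, int(offset)

entry = "<1@foo.com> 888888888~77777777 compress#1.alt.foo/566567567"
msgid, dates, path, offset = parse_history(entry)
print(path, offset)  # -> compress#1.alt.foo 566567567
# nntpd would then open path, seek to offset, and read up to the
# next article boundary.
```

Because the history file is consulted on every fetch anyway, this
adds no extra lookup cost over the level zero scheme.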
Well, 1) I've got a fever, and my brain's running amok to little
purpose, 2) the daily volume is ever marching onward, and 3) I've got
this "little" Xenix system with four processors, 32 MB of memory, and
600 MB of disk, but Xenix only gives me 65535 inodes per partition, so
I have lots of disk space but not much to do with it past 65535
article references (remember, cross-posting counts against the inode
limit).
--
--russ (nelson@clutx [.bitnet | .clarkson.edu])  FAX 315-268-7600
It's better to get mugged than to live a life of fear -- Freeman Dyson
I joined the League for Programming Freedom, and I hope you'll join too.