rbraun@spdcc.COM (Rich Braun) (05/07/91)
Years ago, I wrote a bulletin board program for the DEC-10 which stored what amounted to a newsgroup in a single file, with an index table. Each message was basically a record in a database, containing fields like From, Date, and so on, with a variable-length field containing the text. This not only made efficient use of disk space but also made retrieval and search operations extremely fast. (By caching a newsgroup's index table in memory, retrieval of any given message could be done in 0 or 1 seeks of the disk.) Alas, this code was written in assembly language and is no longer useful.

I've mentioned to a few acquaintances the possibility of creating a new news system (or modifying the existing one) which stored newsgroups as databases rather than as directories of individual files, and the common response is "I'd lose the use of all my familiar Unix tools", probably just meaning that slow-as-molasses 'grep' wouldn't work.

Is there a netnews development project anywhere with which I could correspond on this type of design issue? I've toyed with the idea of creating a complete communications package, at the heart of which would be an improved netnews system. But if someone else is working on one, I'd desperately love to talk to that person.

My other interest in this, naturally, is to create a completely portable (and free) system which could run on systems like DOS, OS/2, Macintosh, and VAX/VMS. Netnews is simply too valuable to be relegated only to the Unix world.

-rich
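For readers who want something concrete, here is a minimal Python sketch of the one-file-per-newsgroup layout described above. It is not the original DEC-10 code, which was assembly and is not available. The class name GroupStore, the JSON record encoding, and the 4-byte length prefix are all assumptions made for illustration. The point it shows is that, with the index held in memory, fetching any message costs at most one seek.

```python
import io
import json
import struct


class GroupStore:
    """One newsgroup in one file: length-prefixed records plus an in-memory index.

    Hypothetical layout for illustration: each record is a 4-byte big-endian
    length followed by a JSON document holding the header fields and the text.
    """

    def __init__(self, fileobj):
        self.f = fileobj
        self.index = []  # index[n] = (offset of record payload, payload length)

    def append(self, headers, body):
        """Append one message and return its message number."""
        record = json.dumps({"headers": headers, "body": body}).encode("utf-8")
        self.f.seek(0, io.SEEK_END)
        offset = self.f.tell()
        self.f.write(struct.pack(">I", len(record)))
        self.f.write(record)
        self.index.append((offset + 4, len(record)))
        return len(self.index) - 1

    def fetch(self, number):
        """Retrieve one message with a single seek, using the cached index."""
        offset, length = self.index[number]
        self.f.seek(offset)
        return json.loads(self.f.read(length).decode("utf-8"))


# Usage: an in-memory file stands in for the newsgroup file on disk.
store = GroupStore(io.BytesIO())
store.append({"From": "rbraun@spdcc.COM", "Date": "05/07/91"}, "first article")
n = store.append({"From": "nelson@sun.soe.clarkson.edu", "Date": "05/08/91"}, "a reply")
print(store.fetch(n)["body"])
```

A real implementation would also write the index table to the file so it can be reloaded without scanning every record; that part is left out here.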
nelson@sun.soe.clarkson.edu (Russ Nelson) (05/08/91)
> Is there a netnews development project anywhere with which I could correspond on this type of design issue?

Not that I know of. But I've proposed a similar idea to yours: hide the database behind NNTP. The draft for a revised NNTP even has a "grep" call, so you wouldn't need to do that by hand...

Expire is the problem, as always. I've thought about doing it using "buckets". Each bucket would be labelled "expire in 1 day", "expire in 2 days", "expire in 3 days", etc. The problem with that is that you'd really like to expire as the disk gets full, rather than trying to express things in terms of days.

Perhaps a better algorithm would be to keep articles in "chunks" with some maximum size, say 100K. These chunks would be queued up for expiration. Each chunk would have a "velocity" parameter that dictated its speed through the queue. talk.* could have a velocity of 20, comp.* could have a velocity of 10, and comp.archives a velocity of 0 (Hi, Ed!). When it came time to find space for new news, the chunk with the greatest displacement would be deleted, and its database entries removed.

An algorithm would be needed to balance the chunk size versus its displacement. Once a chunk has reached a certain displacement, it shouldn't be added to. Perhaps 10% of the highest displacement?

--
--russ <nelson@clutx.clarkson.edu> I'm proud to be a humble Quaker.
It's better to get mugged than to live a life of fear -- Freeman Dyson
I joined the League for Programming Freedom, and I hope you'll join too.
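The chunk-and-velocity idea above can be sketched roughly as follows in Python. The post does not define "displacement", so this assumes displacement = velocity * age of the chunk, with age in days. The names Spool, Chunk, and default_velocity are hypothetical, as is the choice to keep one set of chunks per newsgroup. Treat it as one possible reading of the proposal, not a worked-out design.

```python
from dataclasses import dataclass, field

MAX_CHUNK_BYTES = 100 * 1024   # "say 100K"
CLOSE_FRACTION = 0.10          # "perhaps 10% of the highest displacement"


@dataclass(eq=False)
class Chunk:
    group: str
    velocity: float
    created: float             # time the chunk was opened (assumed unit: days)
    size: int = 0
    articles: list = field(default_factory=list)

    def displacement(self, now):
        # Assumption: displacement = velocity * age.
        return self.velocity * (now - self.created)


class Spool:
    def __init__(self, capacity, velocities, default_velocity=10):
        self.capacity = capacity
        self.velocities = velocities            # hierarchy prefix -> velocity
        self.default_velocity = default_velocity
        self.chunks = []

    def velocity_for(self, group):
        # The longest matching prefix wins, so "comp.archives" beats "comp".
        matches = [p for p in self.velocities
                   if group == p or group.startswith(p + ".")]
        if not matches:
            return self.default_velocity
        return self.velocities[max(matches, key=len)]

    def used(self):
        return sum(c.size for c in self.chunks)

    def expire_one(self, now):
        # A velocity of 0 never moves through the queue, so it is never a victim.
        movable = [c for c in self.chunks if c.velocity > 0]
        if not movable:
            raise RuntimeError("spool full and nothing is eligible to expire")
        victim = max(movable, key=lambda c: c.displacement(now))
        self.chunks.remove(victim)
        return victim.articles                  # caller drops these database entries

    def open_chunk(self, group, size, now):
        # A chunk stops accepting articles once it is full or has moved
        # past CLOSE_FRACTION of the highest displacement in the spool.
        highest = max((c.displacement(now) for c in self.chunks), default=0.0)
        for c in self.chunks:
            if (c.group == group
                    and c.size + size <= MAX_CHUNK_BYTES
                    and c.displacement(now) <= CLOSE_FRACTION * highest):
                return c
        c = Chunk(group, self.velocity_for(group), now)
        self.chunks.append(c)
        return c

    def add(self, group, article_id, size, now):
        """Store an article, expiring chunks first if the spool is full.

        Returns the ids of any articles that were expired to make room.
        """
        expired = []
        while self.used() + size > self.capacity:
            expired.extend(self.expire_one(now))
        chunk = self.open_chunk(group, size, now)
        chunk.articles.append(article_id)
        chunk.size += size
        return expired
```

Note that expiration here is driven by the spool filling up rather than by a fixed number of days, which is the stated goal. One side effect of the 10% rule under this reading is that the chunk with the highest displacement is always closed, so a quiet group ends up with many small chunks; balancing that against chunk size is the open question raised above.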