flee@dictionopolis.cs.psu.edu (Felix Lee) (12/12/90)
News volume is something like 15 megabytes a day (and growing).
expire needs about 10M to rewrite the history file.  Disk space used
by the news system varies by 25M or more on a daily basis.

Why not expire constantly?  Every time you receive some articles,
remove some other articles and free an equivalent amount of space.
In a steady-state news system, disk space usage is easier to control
and requires less care and feeding.

This is an extreme form of space-based expiry and inherits all of its
control problems.  Deciding which articles to expire may be difficult.
Has anyone figured out how space-based expiry should work?
--
Felix Lee	flee@cs.psu.edu
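Felix's constant-expiry loop can be sketched in a few lines of Python. This is a toy model, not C News code: the in-memory `spool` dict, the `arrival_order` list, and the `receive_batch` function are all invented for illustration, and a real implementation would work against the history file and the article tree on disk.

```python
def receive_batch(spool, arrival_order, batch):
    """Steady-state expiry sketch: when a batch of articles arrives,
    free at least as many bytes by removing the oldest articles.

    spool         -- dict mapping article-id -> size in bytes
    arrival_order -- list of article-ids, oldest first
    batch         -- dict of new article-id -> size
    Returns the number of bytes freed.
    """
    needed = sum(batch.values())
    freed = 0
    # Expire oldest articles until we've freed as much as we took in.
    while freed < needed and arrival_order:
        victim = arrival_order.pop(0)
        freed += spool.pop(victim)
    spool.update(batch)
    arrival_order.extend(batch)   # new articles become the newest
    return freed
```

The control problem Felix mentions shows up in the one line that picks a victim: oldest-first is the simplest policy, and anything smarter (per-group, per-size) has to go there.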
henry@zoo.toronto.edu (Henry Spencer) (12/12/90)
In article <Faivp#r3@cs.psu.edu> flee@dictionopolis.cs.psu.edu (Felix Lee) writes:
>This is an extreme form of space-based expiry and inherits all of its
>control problems.  Deciding which articles to expire may be difficult.
>Has anyone figured out how space-based expiry should work?

We thought about it a bit for C News, and concluded that the policy
issues are complicated (it's simple if you expire everything at the
same time, but the interactions with selective expiry get messy) and
we didn't feel like solving them.
--
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry
scs@lokkur.dexter.mi.us (Steve Simmons) (12/12/90)
flee@dictionopolis.cs.psu.edu (Felix Lee) writes:
>News volume is something like 15 megabytes a day (and growing).
>expire needs about 10M to rewrite the history file.  Disk space used
>by the news system varies by 25M or more on a daily basis.

>Why not expire constantly?  Every time you receive some articles,
>remove some other articles and free an equivalent amount of space.

We've recently gone to twice-daily expires.  Since C News expire
runs so quickly, it's not been a significant load on the systems and
has done wonders to level out disk usage in /usr/spool/news.  We
expire on the average after 4 days, so that cuts the peaks by about
12%.  Definitely worth it.

We've been keeping about 12 months of stats on how space is
distributed in /usr/spool/news.  Most of the data wouldn't mean much
to another site unless it used *exactly* our expire pattern, but I was
quite surprised to see how much variance there was from week start to
week end and, longer term, by (apparently) the school year.  If I ever
come up with copious free time I'll try and work those figures up into
something rational.
--
"SO be it!  The fate of the UNIVERSE is in your hands!"
"Talk about job-related stress."
jef@well.sf.ca.us (Jef Poskanzer) (12/12/90)
In the referenced message, henry@zoo.toronto.edu (Henry Spencer) wrote:
}it's simple if you expire everything at the same time,
I've thought about doing this too, in my Copious Free Time(tm).
Seems like it would be a good thing to add to the C news
package as an option.
---
Jef
Jef Poskanzer jef@well.sf.ca.us {ucbvax, apple, hplabs}!well!jef
"Necessity is the plea for every infringement of human freedom. It is
the argument of tyrants; it is the creed of slaves." -- William Pitt
gary@proa.sv.dg.com (Gary Bridgewater) (12/12/90)
I really like this idea!

In article <1990Dec11.231124.24426@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <Faivp#r3@cs.psu.edu> flee@dictionopolis.cs.psu.edu (Felix Lee) writes:
>>This is an extreme form of space-based expiry and inherits all of its
>>control problems.  Deciding which articles to expire may be difficult.

Keep a FIFO file per expire rule - i.e. a 1-day file, 2-day file,
1-week file.  Maintaining the files is a pain, but you could keep a
separate tell() index to the current actual beginning and only rebuild
from time to time.  New articles just go on the end via append.

This assumes that the current history file is split in two - one file
to hold the IDs and another (set of) file(s) that contains the
posted/expire/article# info.  It might be handy to keep the size as
well.  This data is all jammed together now since it is processed at
the same time.  The problem is, or has been, coordinating them - i.e.
knowing when to drop articles and when IDs can be let go.  But I don't
think they need to be coordinated much beyond keeping an ID _at least_
as long as you keep the article.

It should be possible to devise a method to drop IDs separately from
the date - perhaps an 8-bit pseudo-time stamp.  That is, e.g., this is
week 0 so we scan the database and drop all old week 1 values; next
week is week 1 and we scan and drop all week 2 values.  After week
0xff we just wrap.  I don't know where the 8 bits are going to come
from - maybe 4 or 6 is enough.  It's just a thought.

>>Has anyone figured out how space-based expiry should work?

This should be answered on a per-site basis.  You know where you store
news and how to check the current utilization and limits.  The
software would invoke your function and could either ask "How much
should we dump?" or "Should we dump more?" or some simple question for
which the answer can be determined.  When you have built the module
you "compile" the expire system - i.e. run the script-producing
script.
>We thought about it a bit for C News, and concluded that the policy issues
>are complicated (it's simple if you expire everything at the same time,
>but the interactions with selective expiry get messy) and we didn't feel
>like solving them.

Don't apologize for it.  Breaking up the functionality of posting and
expiring was a _Good_ _Thing_.  Thanks.  It opens up a lot of
possibilities and lets effort be spent on individual components
without destroying your whole news system.  Hacking B News expire can
be very scary.  Granted that adding all this file manipulation is a
pain - dealing with 15 Mbytes a day is already a pain, and it isn't
going to get better.
--
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
C++ - it's the right thing to do.
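Gary's 8-bit pseudo-time stamp can be illustrated with a toy model. The `purge_old_ids` helper and the dict-based history are inventions, not C News structures; the point is only the wrap-around rule: at week w, any ID stamped (w + 1) mod 256 is 255 weeks old, and must be dropped before that stamp value is reused.

```python
WEEK_BITS = 8
WEEK_MOD = 1 << WEEK_BITS   # 256 distinct week stamps before wrapping

def purge_old_ids(history, current_week):
    """history: dict mapping message-id -> 8-bit week stamp.

    Drop every ID whose stamp is (current_week + 1) mod 256: those
    entries are 255 weeks old and their stamp value is about to be
    reused, so they must go now to keep the wrap unambiguous.
    """
    doomed = (current_week + 1) % WEEK_MOD
    for mid in [m for m, wk in history.items() if wk == doomed]:
        del history[mid]
    return history
```

With 8 bits this keeps IDs for up to 255 weeks; Gary's suggestion that 4 or 6 bits might do corresponds to shrinking `WEEK_BITS` (and the retention window) accordingly.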
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (12/12/90)
>>Why not expire constantly?  Every time you receive some articles,
>>remove some other articles and free an equivalent amount of space.
>
>We've recently gone to twice-daily expires.  Since C News expire
>runs so quickly, it's not been a significant load on the systems and
>has done wonders to level out disk usage in /usr/spool/news.

I went to automatic, on-demand expires (run from rnews) a long time
ago.  Much better than any approach based on guessing how much disk
space will be needed.  I agree that a smarter continuous expire could
be even more efficient (and more difficult to implement).

Sources are available via anon ftp from ais.org:~ftp/pub/cnews.speedups.Z
--
Jon Zeeff (NIC handle JZ)	zeeff@b-tech.ann-arbor.mi.us
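The on-demand decision Jon describes (run expire from rnews only when an incoming batch would not fit) reduces to a one-line predicate. This sketch is an assumption about the shape of such a check, not code from the cnews.speedups package; the function name, parameters, and low-water-mark idea are all invented.

```python
def should_expire(free_bytes, batch_bytes, low_water):
    """Decide, as an rnews wrapper might, whether to run expire before
    accepting a batch: expire whenever storing the batch would push
    free space below the low-water mark.  All names are invented."""
    return free_bytes - batch_bytes < low_water
```

In practice `free_bytes` would come from something like statfs on the spool filesystem, and a positive answer would trigger the normal expire run before unbatching continues.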
brad@looking.on.ca (Brad Templeton) (12/13/90)
Yes, I have long thought that expire-as-it-arrives is the best
solution.

You don't have to be stupid about it either.  For example, you could
have a process run nightly (or less often, as need be) which prepares
a list of articles on the system, sorted by their "value".  The value
is up to you, but it would no doubt be a factor of their age,
newsgroup, and author/site (i.e. keep local articles longer), modified
by things like explicit expiry, etc.

You can, of course, get age, explicit expiry, newsgroup and posting
site from the current C News history file.  You need one other thing,
namely size.  (My space-based expire used to factor size in as well,
reducing the value of big files so that they went slightly earlier -
the theory being that it's better to toss one 40K article than twenty
2K articles.  This might not apply in source groups.)

Anyway, you sort the articles based on their value, and you thus
produce a list, from the least valuable upwards, with the message-id
and size in disk blocks.  (You might add more to make it go faster on
subsequent nights, since you could, in theory, calculate value on new
articles, figuring the value of known articles from their old value
and the elapsed time.)  On the other hand, you only need to store the
least valuable N megabytes of articles.

When your news database program (relaynews) comes along, it counts the
disk space it uses.  As it uses space, it goes through the expire file
and frees up enough files to get back that space - keeping track of
extra stuff freed in some sequence file.  (It needs to store a seek
address into the value file too.)  And thus news takes exactly a fixed
amount of space, but with some groups going faster than others, etc.

Since the purpose of expire is to keep news to a limited amount of
disk space, and not a limited number of days, this seems to me the
ideal way to do an expire.
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
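Brad's scheme splits into two steps: a nightly pass that sorts articles by a value function, and a consumer that frees the least valuable articles as space is needed. The sketch below follows only that overall shape; the scoring weights, the `local.` prefix test, and both function names are arbitrary placeholders, not Brad's actual formula.

```python
def build_expire_list(articles, now):
    """articles: list of (message-id, arrival_time, size_blocks, group).
    Return the list sorted least-valuable-first.  The scoring below is
    a made-up example: value decays with age, big articles are worth a
    little less, and local articles get a flat bonus."""
    def value(art):
        mid, arrived, size, group = art
        age = now - arrived
        v = 1000.0 / (1.0 + age)          # decays with age
        v -= size / 10.0                  # big articles go slightly earlier
        if group.startswith("local."):
            v += 500.0                    # keep local articles longer
        return v
    return sorted(articles, key=value)

def free_space(expire_list, blocks_needed):
    """Walk the list from least valuable upward, expiring until enough
    blocks are free.  Returns (expired message-ids, blocks freed)."""
    freed, victims = 0, []
    for mid, _, size, _ in expire_list:
        if freed >= blocks_needed:
            break
        victims.append(mid)
        freed += size
    return victims, freed
```

In Brad's design the second function is what relaynews would run incrementally as articles arrive, remembering its seek position in the value file between invocations.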
clear@cavebbs.gen.nz (Charlie Lear) (12/13/90)
In article <1990Dec12.093657.1488@proa.sv.dg.com> gary@proa.sv.dg.com (Gary Bridgewater) writes:
>I really like this idea!
>In article <1990Dec11.231124.24426@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>>We thought about it a bit for C News, and concluded that the policy issues
>>are complicated (it's simple if you expire everything at the same time,
>>but the interactions with selective expiry get messy) and we didn't feel
>>like solving them.
>
>Granted that adding all this file manipulation is a pain - dealing with
>15 Mbytes a day is already a pain and it isn't going to get better.

I've used the MS-DOS version of Waffle for around a year, and I've
been running an AT&T 3B2 for four months.  With some expert help,
we've got C News and trn working just fine.  But we keep running out
of disk.  (Running out of inodes was solved at the first
repartitioning cycle!)

The MS-DOS Waffle expire gets rid of articles on a numbered, rather
than timestamped, basis.  You say you want to keep the last 500
articles in rec.pyrotechnics: fine.  You're not interested in
comp.sys.obscure.4bit?  Then set it to expire at four or five
articles.  My default is /keep=50 articles.  I find that to be an
excellent system and haven't come across any problems with it at all.

Using C News, if we get delays connecting to our host then we can get
fifteen or twenty megs of compressed news in one hit.  That gets
uncompressed and sits there for a couple of days, unless I get
desperate and manually cruise through the news directories deleting
unwanted files.

Why wouldn't a numerical expire work under Unix?  Has it been done
before and rejected as unworkable?

PS: Great software, Henry.  I hope they pay you what you're worth...
8-)
--
--------------------------------------------------------------------------
Charlie "The Bear" Lear | clear@cavebbs.gen.nz  | Kawasaki Z750GT DoD#0221
The Cave MegaBBS +64 4 643429 V32 | PO Box 2009, Wellington, New Zealand
--------------------------------------------------------------------------
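Charlie's numbered expire is easy to sketch under Unix, assuming the usual one-file-per-article spool layout where filenames within a newsgroup directory are article numbers. The `numbered_expire` helper and its `keep` parameter are inventions mirroring Waffle's /keep=N setting, not anything in C News.

```python
import os

def numbered_expire(group_dir, keep):
    """Keep only the highest-numbered `keep` articles in a newsgroup
    directory, where each article is a file named by its number.
    Non-numeric files (e.g. overview data) are left alone.
    Returns the article numbers removed, lowest first."""
    numbers = sorted(int(f) for f in os.listdir(group_dir) if f.isdigit())
    doomed = numbers[:-keep] if keep else numbers
    for n in doomed:
        os.unlink(os.path.join(group_dir, str(n)))
    return doomed
```

A per-group keep count would presumably live in a config file mapping group patterns to N, with a default (Charlie's /keep=50) for anything unlisted.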
ske@pkmab.se (Kristoffer Eriksson) (12/15/90)
In article <1990Dec12.213956.6544@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>The value is up to you, but it would no doubt be a factor of their age,
>newsgroup, author/site (ie. keep local articles longer) modified by things
>like explicit expiry, etc.

I think the most natural system for fixed-space expiry is just to look
at space down at the level of individual newsgroups, assigning a fixed
space for each group.  Either that, or just expire in FIFO order for
the whole system, which will give you newsgroups that are sized
relative to each other according to their volume of postings.
--
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60 ! e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03 ! or ...!{uunet,mcsun}!sunic.sunet.se!kullmar!pkmab!ske
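Kristoffer's per-group policy is simple to model: give each group a byte quota and expire oldest-first within the group until it fits. The data structures below are invented for illustration (a real version would consult the history file and the spool); only the policy itself comes from his posting.

```python
def per_group_expire(spool, quotas):
    """spool:  dict mapping group -> list of (article-id, size) pairs,
               oldest first.
    quotas: dict mapping group -> byte quota (default 0, i.e. keep
            nothing for unlisted groups -- an assumption, not policy
            from the thread).
    Expire oldest-first within each group until it fits its quota.
    Returns the expired article-ids."""
    expired = []
    for group, articles in spool.items():
        quota = quotas.get(group, 0)
        used = sum(size for _, size in articles)
        while used > quota and articles:
            mid, size = articles.pop(0)
            used -= size
            expired.append(mid)
    return expired
```

His whole-system FIFO alternative is the degenerate case: one global article list and one global quota, which lets each group's share float with its posting volume instead of being fixed.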