news@massey.ac.nz (USENET News System) (03/14/91)
I like c-news a lot, but really miss the ability to expire articles by the Date: line, rather than the arrival time, as b-news allowed. It's much slower, of course, but I can think of two situations where this is very important.

The first is when heaps of old articles appear, like in comp.os.minix recently. 5Mb of garbage must be gotten rid of, but it's not nice to have to expire recent articles along with the garbage.

The second is just a larger example of the first. We're a leaf node with one feed. I keep articles about 10 days and run the news partition as close to full as I can without constant intervention. If we lose our feed for a few days for whatever reason (and it happens more often than one would hope), we then get, say, two days of news all at once when things come right. With b-news, I'd switch expiry over to date posted for a while, and the late articles would expire at roughly the same time they would have normally, without bothering any other articles. With c-news, I'm forced to set the expiry time for all groups very low to get rid of this `lump'. Otherwise, ten days later, the partition will fill, stopping the feed for two days until the lump is expired; then we receive a new lump and off we go again.

Am I the only one who feels this is important? Are there plans to add this feature to c-news? If not, what is the rationale?

Thanx much,
--
K.Spagnolo@massey.ac.nz
henry@zoo.toronto.edu (Henry Spencer) (03/15/91)
In article <1991Mar14.012332.20774@massey.ac.nz> K.Spagnolo@massey.ac.nz (Ken Spagnolo) writes:
>... I can think of two situations where
>this is very important. The first is when heaps of old articles appear,
>like in comp.os.minix recently. 5Mb of garbage must be gotten rid of...

This is being solved in a better way: the garbage will be thrown away on arrival rather than waiting for expire to do it.

>If we lose our feed for a few days for whatever reason (and it happens
>more often than one would hope), we then get say two days of news all
>at once when things come right...
>... I'm forced to set the expiry time for all
>groups very low to get rid of this `lump'.

My own philosophy on this one tends to be "if your system doesn't have enough resources in reserve to handle surges, then running news is a poor idea". Flipping the expiry criterion back and forth between arrival date and posting date exacts a hideous penalty in execution time, because to do expiry by posting date requires *reading* tens of thousands of articles every time, to discover their posting dates. One of the original motives behind doing a new expire -- which is sort of what got C News started -- was getting away from scanning every article, which was intolerably slow even when traffic was an order of magnitude lower.

The only way to make this practical, really, would be to store the posting date centrally. I'd rather avoid revamping the history-file format *again*.
--
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public." -S. Harris     | henry@zoo.toronto.edu  utzoo!henry
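[Editorial sketch, not part of the original post.] The cost described above comes from having to open every article just to find its Date: header. A minimal Python illustration of that per-article work follows; the function names are hypothetical and this is not how B News or C News is actually written.

```python
from email.utils import parsedate_to_datetime


def posting_time(lines):
    """Return the Date: header as a POSIX timestamp, or None if absent/unparsable.

    `lines` is any iterable of header/body lines (an open file works).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line == "":  # a blank line ends the header block
            break
        if line.lower().startswith("date:"):
            try:
                return parsedate_to_datetime(line[5:].strip()).timestamp()
            except (TypeError, ValueError):
                return None
    return None


def expired_by_posting_date(paths, cutoff):
    """Open every article file to learn its posting date (the slow part)."""
    expired = []
    for path in paths:
        with open(path, encoding="latin-1") as f:
            when = posting_time(f)
        if when is not None and when < cutoff:
            expired.append(path)
    return expired
```

Expiry by arrival time avoids the `open()` per article because the arrival time is already recorded centrally in the history file.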
jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) (03/15/91)
In article <1991Mar14.194554.12750@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>My own philosophy on this one tends to be "if your system doesn't have
>enough resources in reserve to handle surges, then running news is a
>poor idea".

While I tend to agree in general, I think there is merit in the two cases mentioned. In the case of being down, or having your incoming feed down, the amount of news that can queue up can be as much as your feed keeps. The same thing can happen with recirculated old articles. I could, right now and very easily, put 362 Meg of old news onto the net. Could your system handle that kind of "surge"?

>was getting away from scanning every article, which was intolerably slow
>even when traffic was an order of magnitude lower. The only way to make
>this practical, really, would be to store the posting date centrally.
>I'd rather avoid revamping the history-file format *again*.

What is one more <magic character>668997597 in the history file among friends? :-)

One idea I considered a while back was to force the article's modified time stamp, using utimes(2), to be the posting date. That would reduce the cost to stat-ing the file instead of reading and parsing its headers, with no additional storage overhead. It would also provide a fairly efficient method for news readers to sort the presentation by posting date.

Jerry Aguirre
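[Editorial sketch, not part of the original post.] The utimes(2) idea can be illustrated in Python, where `os.utime` and `os.stat` wrap the corresponding system calls. The function names and the `keep_days` parameter are assumptions made for this illustration.

```python
import os


def stamp_with_posting_date(path, posting_ts):
    """Set the file's modification time to the posting date; keep its atime."""
    st = os.stat(path)
    os.utime(path, (st.st_atime, posting_ts))


def is_expired(path, now, keep_days):
    """Decide expiry from the mtime alone, without opening the article."""
    return now - os.stat(path).st_mtime > keep_days * 86400
```

The expire pass then costs one stat per article rather than an open, read and header parse, though it still needs a name lookup per file.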
flee@cs.psu.edu (Felix Lee) (03/15/91)
> I could, right now and very easily, put 362 Meg of old news onto the
> net. Could your system handle that kind of "surge"?

Not yet, but soon. I've figured out how to implement steady-state news, and it might even be efficient. Maybe in a few months.
--
Felix Lee flee@cs.psu.edu
kurt@rufus.almaden.ibm.com (Kurt Shoens) (03/16/91)
I think the underlying problem is that expire gives one the wrong control knob. I don't usually want to remove articles older than 15 days (say). I want to make enough space available for the news that's coming in. Normally, expiring by the number of days an article has lived on my system has the desired effect. But when the news volume ramps up for some reason, it becomes more apparent that expiring by age is not what I need.

What I would rather do is give expire two objectives: get me back B blocks of free space and I free inodes. Then, expire should essentially rank the articles that I currently have and delete the least precious (typically, the oldest, but you have to take into account the Expires: header) until the objectives have been met. If the news flow slows down because of, e.g., Spring Break, then I get to keep a little more. If it picks up, I keep a little less.

With this sort of control, I don't think that folks would be flipping between posting date and arrival date as the expiration criterion.

Or does CNews expire already support what I'm suggesting?
--
Kurt Shoens
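[Editorial sketch, not part of the original post.] One possible reading of "rank the articles and delete the least precious until the objectives are met" is sketched below. The ranking policy (articles with an unexpired Expires: header go last, otherwise oldest arrival first) is only one arbitrary choice, and the `Article` record and function name are hypothetical. It also assumes each article frees exactly one inode, which ignores cross-posted links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    path: str
    arrived: float                   # arrival time, seconds since the epoch
    blocks: int                      # disk blocks the article occupies
    expires: Optional[float] = None  # explicit Expires: header, if any


def choose_victims(articles, now, need_blocks, need_inodes):
    """Pick the least precious articles until both objectives are met."""

    def key(a):
        # False sorts before True, so unprotected articles are removed first.
        protected = a.expires is not None and a.expires > now
        return (protected, a.arrived)

    victims = []
    freed_blocks = freed_inodes = 0
    for a in sorted(articles, key=key):
        if freed_blocks >= need_blocks and freed_inodes >= need_inodes:
            break
        victims.append(a)
        freed_blocks += a.blocks
        freed_inodes += 1
    return victims
```

When traffic is light the loop stops early and more is kept; when traffic surges it simply walks further down the ranking.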
rmtodd@servalan.uucp (Richard Todd) (03/16/91)
kurt@rufus.almaden.ibm.com (Kurt Shoens) writes:
>What I would rather do is give expire two objectives: get me back B
>blocks of free space and I free inodes. Then, expire should
>essentially rank the articles that I currently have and delete the
>least precious (typically, the oldest, but you have to take into
>account the Expires: header) until the objectives have been met. If
>the news flow slows down because of, e.g., Spring Break, then I get to
>keep a little more. If it picks up, I keep a little less.

>With this sort of control, I don't think that folks would be flipping
>between posting date and arrival date as the expiration criterion.

>Or does CNews expire already support what I'm suggesting?

Hmm. I once implemented something sort-of along the lines you suggest, back when I was trying to fit all of Unix plus a small newsfeed on the 80M internal drive on my Mac (can you say "cramped", boys and girls?). It took a rather brute-force approach, simply having a bunch of progressively tighter "explist" files which a modified version of C News's $NEWSBIN/expire/doexpire would run through, invoking expire with each explist file until a certain pre-set amount of free space was cleared up. Like I said, brute force.

One nice thing about the system is that since the explists are completely arbitrary, you can tailor the expiry behaviour to some extent (i.e. make it expire news.groups to the bone before starting in on comp.unix.lizards :-) I eventually had a shell script set up to automatically generate all my explists from a single master file. The scheme did a fairly good job of tracking variations in the newsflow, adjusting the amount of expiration done as required. (Learned some interesting things that way, too, like that newsflow drops substantially on the weekends--enough so that the "auto-adjusted" expire times always increased by at least a day.)

I've still got the code lying about, even though I haven't had much cause to fiddle with expire times since I got a bigger disk. Let me know if you're interested.

For the record, I recall that someone else on the net (Chip Salzenberg, maybe?) thought up the idea of progressive expires for adaptive handling of expiration. His scheme was somewhat more elaborate, in that it would actually compute a new explist on each pass, instead of relying on explists already created in advance. I went for simple instead of elaborate...
--
Richard Todd rmtodd@uokmax.ecn.uoknor.edu rmtodd@chinet.chi.il.us rmtodd@servalan.uucp
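[Editorial sketch, not part of the original post.] The control loop of the brute-force scheme described above might look roughly like this. The real implementation was a modified doexpire shell script; here `run_expire` and `free_space` are hypothetical callables standing in for "invoke expire with this explist" and "measure free space on the spool".

```python
def progressive_expire(explists, run_expire, free_space, target):
    """Run expire with progressively tighter explists until enough space is free.

    explists   -- explist names ordered from loosest to tightest
    run_expire -- callable taking one explist (assumed to run expire with it)
    free_space -- callable returning the current free space
    target     -- the pre-set amount of free space wanted
    Returns the explists that were actually used.
    """
    used = []
    for explist in explists:
        run_expire(explist)
        used.append(explist)
        if free_space() >= target:
            break
    return used
```

Because the explists are arbitrary files, the order in which groups get squeezed is entirely up to whoever writes them.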
henry@zoo.toronto.edu (Henry Spencer) (03/17/91)
In article <50464@olivea.atc.olivetti.com> jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) writes:
>could, right now and very easily, put 362 Meg of old news onto the net.
>Could your system handle that kind of "surge"?

No, and neither could yours, since your proposed changes just make it expire earlier -- they don't eliminate it on arrival, which is what is really needed.

>One idea I considered a while back was to force the article's modified
>time stamp, using utimes(2), to be the posting date. That would reduce
>the cost to stat-ing the file instead of reading and parsing its
>headers...

Still pretty expensive, unfortunately. Name lookups cost a lot, even on systems with namei caches.
--
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public." -S. Harris     | henry@zoo.toronto.edu  utzoo!henry
flee@cs.psu.edu (Felix Lee) (03/18/91)
>What I would rather do is give expire two objectives: get me back B
>blocks of free space and I free inodes.

Working on it. You will be able to say something like

    keep 15M free /news/spool

and a continuously-running expire process will try to ensure that there's always at least 15 megabytes of free space in /news/spool.
--
Felix Lee flee@cs.psu.edu
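[Editorial sketch, not part of the original post.] The post only says the directive would look "something like" the line shown, so the grammar parsed below is a guess, as is treating M as 2^20 bytes. The function names are hypothetical; `os.statvfs` is the standard Unix call for free-space figures.

```python
import os
import re

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed binary units


def parse_keep(line):
    """Parse a hypothetical 'keep <N><K|M|G> free <path>' directive."""
    m = re.fullmatch(r"keep\s+(\d+)([KMG])\s+free\s+(\S+)", line.strip())
    if m is None:
        raise ValueError("unrecognised directive: %r" % line)
    return int(m.group(1)) * _UNITS[m.group(2)], m.group(3)


def bytes_free(path):
    """Free bytes available to unprivileged users on path's filesystem."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def shortfall(line, free=bytes_free):
    """How many bytes must still be freed to satisfy the directive."""
    wanted, path = parse_keep(line)
    return max(0, wanted - free(path))
```

A continuously running expire would call something like `shortfall` periodically and remove articles whenever the result is non-zero.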
henry@zoo.toronto.edu (Henry Spencer) (03/20/91)
In article <580@rufus.UUCP> shoens@ibm.com writes:
>What I would rather do is give expire two objectives: get me back B
>blocks of free space and I free inodes. Then, expire should
>essentially rank the articles that I currently have and delete the
>least precious ...
>Or does CNews expire already support what I'm suggesting?

No provision for it at present. I thought about this a bit, long ago, but getting a precise definition of "least precious", in the presence of complications like different expiry times for different groups, is tricky. I decided that I didn't know what the policy should be and so I wouldn't try.
--
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto| henry@zoo.toronto.edu  utzoo!henry
jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) (03/20/91)
In article <1991Mar17.012032.9351@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <50464@olivea.atc.olivetti.com> jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) writes:
>>could, right now and very easily, put 362 Meg of old news onto the net.
>No, and neither could yours, since your proposed changes just make it
>expire earlier -- they don't eliminate it on arrival, which is what is
>really needed.

Well, actually, I think it could. NNTP is going to stop accepting xfers when my free disk gets down to 5 Meg. When the regular expire does not free up enough space, the script will run "expire -n junk -p -e 7 -E 60". That will get rid of the old postings. The second expire runs fairly quickly, as it only has to look at the "junk" postings, not every bit of news.

Granted, there would be a hiccup in the flow, but I expect more of that would be from my feeds' problems rather than mine. Of course if the feeds were via UUCP ....

But back to the issue of handling old articles. I am a little leery of just trashing them. Suppose the problem is not with the articles but rather with the system date. Every once in a while the service guys will run something that clobbers the machine date real good. (NTP has helped reduce this problem.) I dislike the idea of the system just trashing the articles, though B news's technique of putting them in junk for two weeks is not that great either.

If one does trash them, then should one add them to the history file? If not, then they can be resent and trashed several times. If they are, then one cannot get them again after the system date is corrected.

How about adding old article IDs to the history file but with the posting date instead of the arrival date? That way they will expire through the normal process and even the ID will flush out of the history file. One has to parse the posting date anyway, and this would only apply to old articles, so it would not affect normal operation.

One could even install the articles in the normal groups with the knowledge that they will not outlast the next regular expire. (Of course they should not get forwarded on.)

Jerry Aguirre
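[Editorial sketch, not part of the original post.] The suggestion of recording old articles in the history file under their posting date can be modelled as below. The history is shown as a plain dict from message ID to timestamp, and the 14-day staleness threshold and function names are assumptions for illustration; the real history file format is different.

```python
DAY = 86400


def history_stamp(posted, arrived, stale_after_days=14):
    """Timestamp to record in history for an incoming article.

    Normal articles are recorded under their arrival time; articles that
    arrive long after they were posted are recorded under the posting date.
    """
    if arrived - posted > stale_after_days * DAY:
        return posted
    return arrived


def expire_history(history, now, keep_days):
    """Drop entries whose recorded timestamp is older than keep_days."""
    cutoff = now - keep_days * DAY
    return {mid: ts for mid, ts in history.items() if ts >= cutoff}
```

A stale article recorded this way is already past the cutoff, so the next regular expire removes both the article and its ID, and it can be accepted again later if the system date turns out to have been wrong.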