dce@smsc.sony.com (David Elliott) (12/27/89)
Our site gets a pretty full newsfeed and has 250MB of space set aside for news articles. I've been trying to get the C news explist stuff set up for maximum use of the space, but I am a little paranoid when it comes to raising the expiration times, because I've seen too many instances of news logjams getting unstuck and filesystems overflowing.

I've seen programs that expire based on disk space, allowing you to prioritize newsgroups and expire until there is a minimum amount of free space, but this may come too late.

It seems to me that a better idea would be to have a program that generates a list of files to remove in order of "removability". When the unbatcher starts working, it would look at the space available, and while there wasn't enough, it would remove files from the top of said list. This could even be used in concert with the current expiration mechanism to allow for a generally smooth removal of articles that really are out of date.

The "removability" of a file would be a function of newsgroup name, newsgroup size, and file age. One might use a formula like:

	removability = (X*size^2 + Y*age^2) - usefulness(newsgroup)

The "usefulness" function would be a table of constants supplied by the administrator. The values of X and Y are supplied for each newsgroup to give weight to these items. For example, the table entries

	# Group		Usefulness	Size	Age
	rec.music	15		3	5
	rec		10		1	1

say that rec.music.* rates better than the other rec groups, but that the newsgroup should be smaller, and the articles become useless pretty fast.

Comments?

-- 
David Elliott dce@smsc.sony.com | ...!{uunet,mips}!sonyusa!dce (408)944-4073
"But Pee Wee... I don't wanna be the baby!"
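[Editorial note: David's removability scoring could be sketched roughly as below. This is a minimal illustration, not code from any news system; the weight table, the longest-prefix group matching, and the default weights for unlisted groups are all my own assumptions.]

```python
def usefulness_table():
    # The "# Group  Usefulness  Size  Age" entries from the article.
    return {
        "rec.music": (15, 3, 5),
        "rec":       (10, 1, 1),
    }

def weights_for(group, table):
    """Longest-prefix match, so rec.music.* beats the generic rec entry."""
    parts = group.split(".")
    while parts:
        key = ".".join(parts)
        if key in table:
            return table[key]
        parts.pop()
    return (0, 1, 1)  # assumed default for unlisted groups

def removability(group, size_kb, age_days, table):
    """removability = (X*size^2 + Y*age^2) - usefulness(newsgroup)"""
    usefulness, x, y = weights_for(group, table)
    return (x * size_kb ** 2 + y * age_days ** 2) - usefulness

def removal_order(files, table):
    """files: (path, group, size_kb, age_days) tuples; most removable first,
    so the unbatcher can delete from the top until there is enough space."""
    return sorted(files,
                  key=lambda f: removability(f[1], f[2], f[3], table),
                  reverse=True)
```

The unbatcher would then only need to pop entries off the front of the precomputed list, never rescanning the spool itself.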
davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (12/28/89)
I set my threshold quite high and leave a good bit of space when I stop uncompressing news. On a regular basis I check the space in the spool partition, and if it is getting tight I enter a loop like so:

	read the expiration (-e) time for high volume and low usefulness groups
	read the names of the groups in these categories
	while `not enough space' and `expiration > 0'
		expire the high volume groups
		expire the high noise groups (ie. alt.flame, etc)
		decrease the expiration time by one day
		check the space again
	# end loop

Since this doesn't happen very often, and the expire can make space faster than news can come in (this is the trick), I have no problems. The check usually runs about 700ms of CPU when all is normal, so I hit it every half hour. It's inefficient when it starts working, but once or twice a month I can stand it.

-- 
bill davidsen - sysop *IX BBS and Public Access UNIX
davidsen@sixhub.uucp ...!uunet!crdgw1!sixhub!davidsen
"Getting old is bad, but it beats the hell out of the alternative" -anon
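[Editorial note: Bill's loop might be driven by something like the sketch below. The df scraping and the injected space/expire callables are my own illustration; the actual sixhub scripts are not shown in the thread.]

```python
import subprocess

def free_kb(partition="/usr/spool/news"):
    """Available KB on the spool partition, scraped from df -k output."""
    out = subprocess.check_output(["df", "-k", partition], text=True)
    return int(out.splitlines()[-1].split()[3])

def progressive_expire(groups, days, minimum_kb, space, expire):
    """space() -> free KB; expire(group, days) runs one expire pass with
    -e set to `days`.  Tighten the window one day at a time until enough
    space is free or we run out of days."""
    while space() < minimum_kb and days > 0:
        for group in groups:      # high-volume and high-noise groups
            expire(group, days)
        days -= 1
    return days                   # days remaining when we stopped
```

The key property is the one Bill names: each pass must free space faster than news arrives, or the loop never catches up.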
woods@robohack.UUCP (Greg A. Woods) (12/28/89)
In article <1989Dec27.033817.9953@smsc.sony.com> dce@smsc.Sony.COM (David Elliott) writes:
> The "removability" of a file would be a function of newsgroup
> name, newsgroup size, and file age. One might use a formula
> like:
>
>	removability = (X*size^2 + Y*age^2) - usefulness(newsgroup)
>
> The "usefulness" function would be a table of constants supplied
> by the administrator. The values of X and Y are supplied for
> each newsgroup to give weight to these items.
>
> For example, the table entries
>
>	# Group		Usefulness	Size	Age
>	rec.music	15		3	5
>	rec		10		1	1

This is similar to some ideas I had recently. Your article has inspired me to put my thoughts on paper, so to speak.

I would rather still have expire do the expiring, rather than rnews. This allows more flexibility, not to mention archive support, etc. I would definitely not want relaynews to do expiring too!

Your usefulness field would be a factor, between 0 and MAXINT, used to prioritize newsgroups. The size field would be the desired number of articles to be kept in the spool. This number would be decremented, taking into account the usefulness factor, if space was really tight.

Expire would still pay attention to the Expires: header, with the same three-value control field it currently has, in place of your suggested age field. The 'retention' value would have highest priority, but with the usefulness factor applied if space was really tight. The 'normal' value would be of lower priority than size, and if null the Expires: header would be followed explicitly, unless the 'purge' date over-rides it. The 'purge' value would also outweigh both usefulness and size, but could of course be left null, or set quite large.

In addition, expire would be given a goal (an amount of free space) to be achieved -- i.e. a '/freespace/' line like '/history/'. Expire would still use spacefor to determine its success. Expire would then become a multi-pass process, but I don't think this would impair its speed much.
In order to enhance performance, I would place the article byte size in the history file (though block size would be more useful). Since all cross-references are already noted by newsgroup, it is very easy to calculate the potential gain if an article is expired, while keeping in mind the various explist control lines for the article. There could even be a flag to determine the effect on cross-posted articles. Either the quickest, or the longest, expire could be used for all links, or each link could be expired separately, with space gained only upon expiration of the last link.

In case rnews runs out of space in spooling incoming news, it can simply wait for space, as it normally does. I currently have the newswatcher script run hourly, and it runs an emergency expire when space becomes tight. For now I have a series of expire scripts which are run in sequence until sufficient space is freed. With a goal-oriented expire, this would be unnecessary, and indeed an emergency expire would only be required during news floods.

Of course there must be sufficient space in your spool directory for incoming uucp jobs while expire runs. I am always careful to isolate /usr/spool/news, and I usually have a separate /usr/spool/uucp, and if not, at least a separate /usr/spool.

Also on the disk space vs. news issue, I've been thinking of changes that would be nice in spacefor and its users, in order to have finer control in identifying space in in.coming, the news spool, out.going, the uucp spool, etc.

-- 
Greg A. Woods
woods@{robohack,gate,tmsoft,ontmoh,utgpu,gpu.utcs.Toronto.EDU,utorgpu.BITNET}
+1 416 443-1734 [h] +1 416 595-5425 [w] VE3-TCP Toronto, Ontario; CANADA
dce@smsc.sony.com (David Elliott) (12/29/89)
In article <1989Dec28.063932.13720@robohack.UUCP> woods@robohack.UUCP (Greg A. Woods) writes:
>I would rather still have expire do the expiring, rather than rnews.
>This allows more flexibility, not to mention archive support, etc. I
>would definitely not want relaynews to do expiring too!

Actually, I was thinking more in terms of having newsrun do the expiring as part of its loop.

The big problem as I see it is that expire is slow (at least the B news version was), especially if you start adding special heuristics based on usefulness and group size and file age and number of subscribers and so forth.

If expire generated a list of files to expire once a day, you could still archive the files and maintain flexibility, but when it's time for them to go to make room for other files, it's easy and fast, and until that time comes, they're still available.

-- 
David Elliott dce@smsc.sony.com | ...!{uunet,mips}!sonyusa!dce (408)944-4073
"But Pee Wee... I don't wanna be the baby!"
brad@looking.on.ca (Brad Templeton) (12/29/89)
Back when I had smaller disks, I ran a space-based expire that I wrote. I had cron wake up every 15 minutes and run it if the amount of free disk space got too low.

Space-based expire *is* the way to do it. Particularly if you ever get things like news stoppages lasting a day, or high-volume days with big fat binaries and source distributions.

There is a certain elegance to inews doing the expire, by using a list of 'next to go' articles that is created every night by a background expire program. This deals with batching well. But the same result can probably come from having inews simply record how much space it has used since the last check, and having an hourly program expire that much space, resetting the count.

Either way, time-based expire is a loser. The purpose of expire is to keep down the amount of disk space (and sometimes inodes) used by news, isn't it?

-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
henry@utzoo.uucp (Henry Spencer) (12/29/89)
In article <1989Dec28.171830.13130@smsc.sony.com> dce@Sony.COM (David Elliott) writes:
>>I would rather still have expire do the expiring, rather than rnews.
>>This allows more flexibility, not to mention archive support, etc. I
>>would definitely not want relaynews to do expiring too!
>
>Actually, I was thinking more in terms of having newsrun doing the
>expiring as part of its loop.

Folks have done that with C News, although it's not something we support officially. Possibly we should, but the obvious technique -- dynamically generating expire's control file and cranking down the numbers until space is adequate -- interacts awkwardly with some of the fancier things you can do in the control file. If I can think of some graceful way to deal with this, I'll probably make it available as an option.

>The big problem as I see it is that expire is slow (at least the B
>news version was), especially if you start adding special heuristics
>based on usefulness and group size and file age and number of
>subscribers and so forth.

C News expire is essentially entirely I/O-bound and dbm-bound (I haven't yet run detailed timings with dbz, although I'll do it soon), so adding a *little* complexity to the decision process would not be disastrous. We were very close to adding the size of the file as another subfield in the history file's middle field, so that it could be used as input for decision making. Alas, it's *not* easy to define exactly how such policies should work in the presence of complications like per-group expiry settings, and we tend to believe in the theory that you should not collect data until you have some idea what you're going to do with it.

>If expire generated a list of files to expire once a day, you could
>still archive the files, and maintain flexibility, but when it's time
>for them to go to make room for other files, it's easy and fast,
>and until that time comes, they're still available.
I thought a bit about breaking expire into a decision part and an implementation part, so to speak, like this. I wasn't convinced that it offered enough advantages to be worth the effort and possible problems.

*However*... note that expire's -t option does almost exactly what the decision module would do: it prints a description of what expire would do, but doesn't do it. The output is *almost* an executable shell file -- at one point it was one, until I noticed that there are some complications, like creating directories, that are hard to deal with simply -- and picking out the file names would not be hard. I will write up the format in the documentation, so folks can depend on it.

-- 
1972: Saturn V #15 flight-ready| Henry Spencer at U of Toronto Zoology
1989: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
brad@looking.on.ca (Brad Templeton) (12/29/89)
I found using the size of a file as a parameter to be useful.

My expire assigned a score to each article. The base of the score was the age of the file in seconds. (I ignored any explicit expiry date -- I didn't want outsiders deciding how long their pearls of wisdom would stay on my system when I only had 3000 blocks free, thank you.) For any group, one could add to the score based on the group, so that some groups hung around longer than others. In addition, the size of the file (multiplied by a constant of your choice) was added to the score.

The scores were then sorted, and the files with the highest scores were removed until enough disk space had been freed -- or rather, until the remaining disk space was back within my fixed allocation for news.

Adding the size meant that one really big article would go instead of a dozen small ones. This kept the average number of days of articles kept higher than it would have been otherwise. So Henry, there's one point of data.

I would post this very simple expire here if people want it -- it's quite short -- but there's a lot it doesn't do. For one, it ignores the history file altogether, and just gets ages from stat(). It doesn't update the database or the active file. That means you need to run expire -rebuild every few days to get things back in mesh. This program controls the disk space problem and lets the real expire keep track of the databases and active file.

-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
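[Editorial note: the scoring Brad describes can be sketched as below. Function names and constants are illustrative, not his actual program; the part his program left out -- history, database, and active-file maintenance -- is likewise omitted here.]

```python
def score(age_seconds, size_blocks, group_bonus=0, size_weight=1.0):
    """Higher score means sooner to go.  Base is the file age in seconds
    (any explicit Expires: header is deliberately ignored), plus a
    per-group bonus and the file size times a tunable constant."""
    return age_seconds + group_bonus + size_weight * size_blocks

def choose_victims(articles, blocks_needed):
    """articles: (path, size_blocks, score) tuples.  Remove the highest
    scores first until enough blocks are recovered -- so one really big
    article goes instead of a dozen small ones of similar age."""
    victims, freed = [], 0
    for path, size_blocks, s in sorted(articles, key=lambda a: a[2],
                                       reverse=True):
        if freed >= blocks_needed:
            break
        victims.append(path)
        freed += size_blocks
    return victims, freed
```

The size term is what keeps the average retention in days high: freeing the same number of blocks costs fewer (bigger) articles.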
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (12/29/89)
My rnews.c that was recently posted does progressive expires if there isn't enough space. It works quite well. I agree that you could come up with something that would be more efficient, although it's not that easy to eliminate doing multiple passes through the history file (which is all expire -r does). -- Jon Zeeff zeeff@b-tech.ann-arbor.mi.us or b-tech!zeeff
storm@texas.dk (Kim F. Storm) (12/30/89)
woods@robohack.UUCP (Greg A. Woods) writes:
>There could even be a flag to determine the
>effect on cross posted articles. Either the quickest, or the longest,
>expire could be used for all links, or each link could be expired
>separately, with space gained only upon expiration of the last link.

I cannot see what benefit this would give you. Either you expire an article because disk space is sparse (or due to some other resource-related policy), or you keep the article.

The problem with your idea is that it makes the "Newsgroups:" line unreliable, which may fool some news readers (users and software :-) who will say:

- oh, I will see this article again later in group XYZ (so I won't read it now), or
- oh, I already saw that article in group ZYX (so I can skip it).

The only reason I can think of for doing what you suggest would be to make the improved expire run faster, compared to what it has to do when calculating the "combined" usefulness of the article. I really don't believe you can save any significant time with this "hack", and I therefore fail to see that the inconsistency imposed by this method can be justified by the marginal time savings on expire (and those savings may be more than wasted on rewriting the history file to reflect the narrowed set of groups in which the article occurs).

An easy rule for calculating the combined usefulness of an article would be the maximum usefulness of the article in any of the groups to which it is cross-posted. This means that if an article is important enough to be kept in one group, it is important enough to keep in all its groups.

-- 
Kim F. Storm storm@texas.dk Tel +45 429 174 00
Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark
No news is good news, but nn is better!
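[Editorial note: Kim's rule for cross-posted articles reduces to a one-liner, sketched here; the usefulness lookup is assumed to be the administrator-supplied table discussed earlier in the thread.]

```python
def combined_usefulness(groups, usefulness):
    """Keep an article as long as its most useful group would keep it:
    if it is worth keeping in one group, it is worth keeping in all of
    its groups (and the Newsgroups: line stays truthful)."""
    return max(usefulness(g) for g in groups)
```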
henry@utzoo.uucp (Henry Spencer) (12/30/89)
In article <68634@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>Space based expire *is* the way to do it...
>Either way time based expire is a loser. The purpose of expire is to
>keep down the amount of disk space (and sometimes inodes) used by
>news, isn't it?

"Keep down" does not mean "strictly bound". Given constraints on things like resource consumption, and a user preference for predictable behavior, it's not obvious that time-based expire is bad. Much depends on details of the system's environment.

-- 
1972: Saturn V #15 flight-ready| Henry Spencer at U of Toronto Zoology
1989: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
davecb@yunexus.UUCP (David Collier-Brown) (12/30/89)
>In article <68634@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>>Either way time based expire is a loser. The purpose of expire is to
>>keep down the amount of disk space (and sometimes inodes) used by
>>news, isn't it?
henry@utzoo.uucp (Henry Spencer) writes:
>[...] Given constraints on things
>like resource consumption, and a user preference for predictable behavior,
>it's not obvious that time-based expire is bad.

To expand on the above a bit, news has never been a well-behaved user of space, strictly because of the temporal dimension... News is trying to present information in a timely fashion, keep it around until the majority of readers have a chance to read (and save) it, and then discard it as "old news". This is hard. In the multi-site case, the delays make it **very** hard.

We approximate the discarding of news after use by the expiry scheme, which is really trying to do two things:
1) recover space (News appears to think it runs on an infinite-disk machine (:-))
2) provide a simple rule to its client base: for example, "you must read category c in one week, category s in 5 days and the rest daily, or you will miss material".

The parameterization in expire reflects the author's desire to have the local site set the policy it needs, or not as the case may be. Excessive concern with space (ie, an implementation problem) can cause the behavior of the system to be mysterious and unpredictable to its users. Regrettably, the news flow variability tends to crash up against disk limits a lot, making a time-based expire dangerous: with older news systems, news was simply lost when one got a burst that overflowed your disk. [Something with which I am all too familiar.]

This leaves us with a contradiction: we have two needs, both quite real, which draw us in opposite directions. The reader needs the illusion of reliability and regular expiry limits. The system needs to trade off space against flow. This tends to make an elegant solution hard.
My best attack on the problem is to define a hierarchy of requirements, and satisfy them in order:
1) news shall not drop articles on the floor [0]
2) articles shall be kept around for not less than the "standard" time to forward them to directly-connected systems, plus a safety factor [1+3]
3) articles in groups which are NOT being read locally shall be available for a period of time sufficient to allow a new subscriber to find one or more articles in the group, so they will not mistake the group as inactive [1+x, x defined by mean time between messages]
4) articles in groups which are being read locally shall be kept for a period known to the readership, shall disappear soon after that time, and are in general unrecoverable after they disappear [1+y]

This implies one can usefully probe users' .newsrc files to see if groups belong in category 3 or 4, but will have to deal in policy to make other decisions:
a) What groups do you send & receive, and how much space must you have just for transfer, plus packing and handling? (Indeed, must you have a separate uucp spool...) What agreements about new hierarchies & groups do you have with your feeds?
b) What groups and hierarchies do you provide locally? (Why?) What is your minimum residence time? What minimum amount of space must you provide for them, if all were considered "unread"?
c) What groups/hierarchies are currently read? What additional space is required per day of residency?
d) What is the expected increase in volume and readership per year? What does that do to all of the above?
e) Do you have categories of groups (ie, comp vs talk)? What is your criterion for this categorization? What will changes in category cost in space?

So most of the questions are non-technical... and less than exciting to consider.

At the technical level (as I implied before), the best model I can suggest is paging, with expire (the reaper!) putting the messages on the deletable list based on as complex a criteria set as you'd like, the news inspooler (space user-upper) trashing them to make room for unpacked new articles, and an optional rescuer grabbing them back if they are re-referenced later. [This last is a gut-feel speculation on my part.]

--dave (out of time to write & ideas, simultaneously) c-b
-- 
David Collier-Brown,    | davecb@yunexus, ...!yunexus!davecb or
72 Abitibi Ave.,        | {toronto area...}lethe!dave
Willowdale, Ontario,    | Joyce C-B:
CANADA. 416-223-8968    | He's so smart he's dumb.
brad@looking.on.ca (Brad Templeton) (12/31/89)
Perhaps the idea is to break up expire neatly into two parts. One prepares the list of articles to go, according to whatever criterion, and the other removes them, updates the database and active file, etc. The removing part could be part of inews, or an independent program.

I haven't done anything, but it seemed to me that if you want a really fancy expire, something like a newsclip program might make an interesting front end. You could keep or expire or weight articles based on anything -- from what group they're in, to who posted them, to what thread they're in, to whether they contain patterns. Of course, it needn't be newsclip; it could be any scanning program, from those that just read the history file to anything else you want to code.

The bad part is that this of course requires reading every article, which was the slow thing about B news expire. Fortunately, once you calculate the score for an article, the only thing that affects it is the passage of time, so you could arrange to keep a history-like file with calculated scores, the time they were calculated, and the multiplier to be used when adding the time since to get the final score. In this case, you only have to scan new articles and add to the file.

This is not enough, however. The program that sorts and decides who is first to go needs some smarts beyond this. The simplest thing to do is just keep the N blocks of articles with the highest (lowest?) scores. But if you want to get fancy, you might want to instead assign Y blocks per group, i.e. news.software.b always keeps 300K, talk.bizarre always keeps 150K, etc. You might also want to keep a fixed number of articles in a group -- i.e. keep 10 articles in groups that nobody currently reads, or keep a minimum of 5 articles in any group, even if it's super-low volume.

All in all, a messy problem...

-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
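[Editorial note: the cached-score file Brad suggests might look like the sketch below: each article carries a base score, the time it was computed, and a time multiplier, so only new articles need a full scan and final scores fall out of arithmetic. The whitespace-separated line format is invented for illustration.]

```python
def final_score(base, computed_at, multiplier, now):
    """Only the passage of time changes a score after the first scan."""
    return base + multiplier * (now - computed_at)

def load_scores(lines):
    """Each line: '<path> <base> <computed_at> <multiplier>'."""
    table = {}
    for line in lines:
        path, base, computed_at, multiplier = line.split()
        table[path] = (float(base), float(computed_at), float(multiplier))
    return table

def rank_for_removal(table, now):
    """Paths sorted highest final score first (first to go)."""
    return sorted(table, key=lambda p: final_score(*table[p], now),
                  reverse=True)
```

A nightly pass would append lines for new articles; the sorter itself never has to open an article.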
dce@smsc.sony.com (David Elliott) (12/31/89)
In article <6118@yunexus.UUCP> davecb@yunexus.UUCP (David Collier-Brown) writes:
> 2) provide a simple rule to its client base: for example, "you must
>    read category c in one week, category s in 5 days and the rest
>    daily, or you will miss material".
> ...
> 4) articles in groups which are being read locally shall be kept for
>    a period known to the readership, shall disappear soon after that
>    time and are in general unrecoverable after they disappear. [1+y]

I agree with most of David's points, but I think that "will" and "shall" in the above should be softened by adding "probably". That is, people should know that news will be around for no less than the given expiry date, and it might be around after, but should not be counted on. This is intimated in the statement "general[ly] unrecoverable".

One thing I wonder about is the mechanism to use for grabbing the subscriber info. You can't rely on .newsrc being used or being available. On our network, for example, people read news using NFS, and they may not even have accounts on the main news machine (they only need it to post). Of course, we also have people who don't like the idea of being in a network, so they read news on the main news machine. In other words, I don't even have a good set of rules to follow.

-- 
David Elliott dce@smsc.sony.com | ...!{uunet,mips}!sonyusa!dce (408)944-4073
"But Pee Wee... I don't wanna be the baby!"
greenber@utoday.UUCP (Ross M. Greenberg) (12/31/89)
Howzabout going through everyone's .newsrc, determining what groups are not being read by anyone, and expiring them with extreme prejudice -- nobody would notice, really.

Next, expire articles that everybody has already read, starting with the least popular newsgroups (by frequency in .newsrc) and heading towards the most popular last.

Only problem: some user who consistently opts to not read a given newsgroup but doesn't mark it as read, either.

-- 
Ross M. Greenberg, Technology Editor, UNIX Today! greenber@utoday.UUCP
594 Third Avenue, New York, New York, 10016 Voice:(212)-889-6431
BIX: greenber MCI: greenber CIS: 72461,3212
To subscribe, send mail to circ@utoday.UUCP with "Subject: Request"
tale@cs.rpi.edu (David C Lawrence) (12/31/89)
In article <1120@utoday.UUCP> greenber@utoday.UUCP (Ross M. Greenberg) writes:
> Only problem: some user who consistantly opts to not read a given newsgroups
> but doesn't mark it as read, either.

That is not the only problem. Some people like to stay unsubscribed from groups but look in on them when they have some extra time. Additionally, there are those times when I see mention of an article in a group to which I am unsubscribed but which nevertheless interests me. Both of these scenarios are affected by the proposed expiry method.

Dave
-- 
(setq mail '("tale@cs.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
brad@looking.on.ca (Brad Templeton) (12/31/89)
Yes, if you kill (or don't feed) groups that nobody reads, then you lose the ability to resubscribe and have articles present. But so what? If saving disk space is important, then this is a small price to pay. It's a cute feature, but not worth megs of disk. And if you have gigs to spare, you don't have to play around with fancy expire tricks.

I liked Eric Raymond's idea the best. Scan all the .newsrc files. 'and' together the 'read' bits. Expire those articles marked read. Thus once everybody's read it, it's history. Means you can't go back, but it also means you can save a *lot* of disk space.

A slightly more relaxed scheme would and all the 'read' bits and queue the result for deletion in N days. Articles stay N days after everybody has read them. N could vary by group. In addition, roots of big trees might stick around.

Alternately, compress and archive all articles that have been read by everybody. Cut news disk space more than in half.

Hard to do this with NNTP, though. But on a machine where disk space is important, like a PC, it's the way to go.

-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
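[Editorial note: the "'and' together the 'read' bits" idea can be sketched as below. The parser handles only the common subscribed form "group: 1-5,7", not the unsubscribed "group!" form; treating a group absent from someone's .newsrc as unread is my assumption.]

```python
def read_articles(newsrc_line):
    """Parse 'news.misc: 1-5,7' into (group, set of read numbers)."""
    group, _, ranges = newsrc_line.partition(":")
    read = set()
    for part in ranges.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            read.update(range(int(lo), int(hi) + 1))
        else:
            read.add(int(part))
    return group.strip(), read

def read_by_everyone(newsrcs, group, article):
    """AND of the read bits: an article is expirable (or archivable)
    only once every user's .newsrc marks it read."""
    for lines in newsrcs:
        marks = dict(read_articles(line) for line in lines)
        if article not in marks.get(group, set()):
            return False
    return True
```

The relaxed N-day variant would just timestamp articles when this test first comes up true and delete them N days later.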
woods@robohack.UUCP (Greg A. Woods) (12/31/89)
In article <432@texas.dk> storm@texas.dk (Kim F. Storm) writes:
> woods@robohack.UUCP (Greg A. Woods) writes:
> >There could even be a flag to determine the
> >effect on cross posted articles. Either the quickest, or the longest,
> >expire could be used for all links, or each link could be expired
> >separately, with space gained only upon expiration of the last link.
>
> I cannot see what benefits this should give you?
>
> Either you expire an article because disk-space is sparse (or due to
> some other resource related policy), or you keep the article.

And since inodes are also a resource, this is a resource-conserving option. On many machines it would not be difficult to have a partition with lots of free blocks and no free inodes if, for example, the average article size dropped from 3Kb to 1Kb.

I also came to like this option for some reasons opposite to those you mentioned. I read news on what is primarily a single-user site (this, my home machine). Some groups have had a significant amount of cross-posting. Since I might want some groups to disappear faster than others, but not those articles which appeared in a more interesting group, I might choose the "longest" option. On the other hand, the cross-postings might be mostly noise, as in unix-pc to comp.sys.att. In this case I might want to choose the "quickest".

Since I believe that a shift of the majority of news readers to smaller machines, with fewer fellow readers, is happening, personal control over expiry will have many distinct advantages for an increasing number of people.

-- 
Greg A. Woods
woods@{robohack,gate,tmsoft,ontmoh,utgpu,gpu.utcs.Toronto.EDU,utorgpu.BITNET}
+1 416 443-1734 [h] +1 416 595-5425 [w] VE3-TCP Toronto, Ontario; CANADA
smaug@eng.umd.edu (Kurt Lidl) (12/31/89)
In article <69654@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
[...]
>I liked Eric Raymond's idea the best. Scan all the .newsrc files. 'and'
>together the 'read' bits. Expire those articles marked read.

Tough to do with a setup like ours -- about 250 readers, based on 8 different fileservers, with one common NNTP news server. Just getting to all the .newsrc's to read them is a small challenge.

>Thus once everybody's read it, it's history. Means you can't go back, but
>it also means you can save a *lot* of disk space.

This also means that you cannot go back and retrieve an article that you have passed by.

>A slightly more relaxed scheme would and all the 'read' bits and queue the
>result for deletion in N days. Article stay N days after everybody has
>read them. N could vary by group. In addition, roots of big trees might
>stick around.

This is a good idea. But I'm not sure how hard it would be to implement.

>Hard to do this with NNTP, though. But on a machine where disk space is
>important, like a PC, it's the way to go.

This is very, very true. I have a hard enough time just trying to get some sort of statistics on how many read what groups in our setup.

-- 
/* Kurt J. Lidl (smaug@eng.umd.edu) | X Windows: Power Tools */
/* UUCP: uunet!eng.umd.edu!smaug    | for Power Fools        */
brian@radio.astro.utoronto.ca (Brian Glendenning) (01/01/90)
If you're going to do something like this, you need to put the .newsrc format in some RFC. At least one newsreader (Gnews) does not use .newsrc files; it uses .gnewsrc files that are full of emacs lisp stuff.

-- 
Brian Glendenning - Radio astronomy, University of Toronto
brian@radio.astro.utoronto.ca uunet!utai!radio!brian glendenn@utorphys.bitnet
greenber@utoday.UUCP (Ross M. Greenberg) (01/01/90)
In article <`QF52&@rpi.edu> tale@cs.rpi.edu (David C Lawrence) writes:
>That is not the only problem. Some people like to stay unsubscribed
>from groups but look in on them when they have some extra time.
>Additionally, there are those times when I see mention of an article
>in a group to which I am unsubscribed but it nevertheless interests
>me. Both of these scenarios are affected by the proposed expiry method.

Although I can appreciate that, at some point an SA simply has to draw the line and say "Not yet, sorry" when somebody wants a group. As SA at utoday, I recently pulled the plug on some high-volume newsgroups due to disk space considerations. One person here complained, and I'll be reinstating that group as soon as I get the disk space problem resolved.

Nobody said the job of SA was gonna be easy -- that's why we get the big bucks! :-)

-- 
Ross M. Greenberg, Technology Editor, UNIX Today! greenber@utoday.UUCP
594 Third Avenue, New York, New York, 10016 Voice:(212)-889-6431
BIX: greenber MCI: greenber CIS: 72461,3212
To subscribe, send mail to circ@utoday.UUCP with "Subject: Request"
frank@ladc.bull.com (Frank Mayhar) (01/01/90)
In article <1989Dec30.212935.1570@smsc.sony.com> dce@Sony.COM (David Elliott) writes:
>One thing I wonder about is the mechanism to use for grabbing the
>subscriber info. You can't rely on .newsrc being used or being
>available. On our network, for example, people read news using NFS, and
>they may not even have accounts on the main news machine (they only
>need it to post). Of course, we also have people who don't like the
>idea of being in a network, so they read news on the main news
>machine. In other words, I don't even have a good set of rules to
>follow.

Well, if you're talking about doing it *right*, one approach you could take would be to centralize the subscriber/subscription list. That is, keep all the .newsrc files in one place, for example in /usr/lib/news/subscribers, in the form of "<subscriber name>.newsrc" or something. Keep a copy in the subscriber's $HOME directory, and force the two to match whenever the subscriber runs a news reader. For NNTP readers, the subscriber file could be something like "machine:<login>.newsrc", with a new NNTP server command to fetch the subscriber's .newsrc.

Certainly this would require some changes to the news readers, and to NNTP. But I think it would be worthwhile, and not too difficult to implement.

-- 
Frank Mayhar frank@ladc.bull.com (..!{uunet,hacgate,rdahp}!ladcgw!frank)
Bull HN Information Systems Inc. Los Angeles Development Center
5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241
moraes@cs.toronto.edu (Mark Moraes) (01/01/90)
Um, not everyone runs the same news configuration - on some sites, it is impossible to find out all the .newsrc files -- our news machine is the server for close to 100 machines in this building, (maybe more -- I haven't an easy way of telling:-) all of which NFS-mount /news. (We use NFS for reading news, NNTP for posting news, and a continuously running NNTP for exchanging news:-) Many of the machines that NFS mount the partition are in ADMINISTRATIVELY separate domains. News maintainers do not have root, occasionally do not have accounts on all subscriber machines. I don't think we want to force every newsreader to be written to have a .newsrc either. (eg. I have a news scanning script that uses its own files to keep track of what news it has scanned/forwarded. Many of the newsgroups it scans are not in my .newsrc) Or even worse, complicate the already, er, convoluted internals of most newsreaders to provide central subscriber lists. For us, centralizing .newsrcs is technically hard (we prefer less interdependency between our servers, not more) and politically impossible. The idea of putting any more load on our considerably overloaded news machine would not go over well -- a lot of effort has been put into trimming wasted CPU on that machine to keep performance bearable. Yes, I know, that's our problem. But I suspect we're not alone in running news on machines that have to perform other duties (Real Work) to earn their keep. It's much simpler to run pessimistic time-based expires on newsgroups that we consider less than vitally important. Mark --- "It's only netnews" -- Geoff Collyer, loosely paraphrasing Peter Honeyman.
tale@cs.rpi.edu (David C Lawrence) (01/01/90)
In article <89Dec31.171430est.2251@neat.cs.toronto.edu> moraes@cs.toronto.edu (Mark Moraes) writes: > For us, centralizing .newsrcs is technically hard (we prefer less > interdependency between our servers, not more) and politically > impossible. [ And other stuff about how it isn't such a hot idea at his site.] Indeed. This site is quite the same way; nevertheless the model is acceptable for many other sites on the net. I avoided bringing up the issue that it won't work with sites of our nature because that isn't entirely relevant. If a lot of people can benefit from expiry of the nature proposed, then it is useful work in spite of the fact that it isn't useful to us. The mistake, of course, would be to make this the only way expiry could be done. I don't recall seeing anyone make such a ludicrous suggestion as that though -- besides, the distributed sites could always just keep what we've got now. :-) Dave -- (setq mail '("tale@cs.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
brad@looking.on.ca (Brad Templeton) (01/01/90)
What would be really useful would be to define an extensible .newsrc format, just as the header format is extensible. There are many things I wanted to add to the .newsrc. So did RN. But you can't. Just the options line. So we get rn's last and soft files etc. and my .newsrclas file. Let's define an extensible format, hack rn and readnews to understand it, and then everybody can use it. Before doing that it might be a good idea to consider if the 1-10,12,30-40 style is the best. It is a bit cumbersome, and requires memory re-allocs in the software. But it is reasonably compact for a non-binary format. Some extensions I have in mind are: a) RN wants to keep pointers into the active file b) I want to keep a 'last article filtered' counter. (if you do complex filtering on a high-volume group, you want your filter to run in the background, and have your reader only show you articles your filter has processed) c) Various folks would like flags on groups, pointers to files or options associated with the group. d) Eric was going to put message-ids to kill in the .newsrc e) readnews puts its own options there f) There might be more options than subscribed and unsubscribed. For example, filtered. I am sure people can think of others. Which is why you need an extensible format. Perhaps something simple like: groupname[:!] [fieldname=data;]* with fields delimited by something like colons or semicolons, and the default field (ie anything starting with a digit) is the 'seen articles list' -- thus degenerating to the current format. Leave : and ! as the delim after the group name, but add extra fields for other kinds of subscription. -- Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
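[Brad's proposed line format, groupname[:!] [fieldname=data;]*, is concrete enough to prototype. The following is a minimal C sketch of a parser for it; the struct layout, buffer sizes, and the decision to keep only the default (seen-articles) field are invented here for illustration, not taken from any existing reader.]

```c
#include <ctype.h>
#include <string.h>

/* Parsed form of one extended .newsrc line (hypothetical format):
 *   groupname[:!] field=data;field=data;...
 * A field whose value starts with a digit is the default "seen
 * articles" field, so "news.misc: 1-10,12" degenerates to the
 * current format. */
struct rcline {
    char group[128];
    int subscribed;        /* ':' = subscribed, '!' = unsubscribed */
    char seen[256];        /* article-range list, e.g. "1-10,12,30-40" */
};

/* Returns 0 on success, -1 on a malformed line. */
int parse_rcline(const char *line, struct rcline *rc)
{
    const char *p = strpbrk(line, ":!");
    size_t n;

    if (p == NULL)
        return -1;
    n = (size_t)(p - line);
    if (n == 0 || n >= sizeof rc->group)
        return -1;
    memcpy(rc->group, line, n);
    rc->group[n] = '\0';
    rc->subscribed = (*p == ':');
    rc->seen[0] = '\0';

    /* Scan the ;-delimited fields; keep only the default field. */
    for (p++; *p != '\0'; ) {
        const char *end;
        while (*p == ' ' || *p == ';')
            p++;
        if (*p == '\0')
            break;
        end = strchr(p, ';');
        if (end == NULL)
            end = p + strlen(p);
        if (isdigit((unsigned char)*p)) {   /* default field */
            n = (size_t)(end - p);
            if (n >= sizeof rc->seen)
                n = sizeof rc->seen - 1;
            memcpy(rc->seen, p, n);
            rc->seen[n] = '\0';
        }
        /* named "name=value" fields would be dispatched here */
        p = end;
    }
    return 0;
}
```

[Since unknown named fields are simply skipped, an old reader could carry a new reader's fields through a rewrite of the file untouched, which is the point of extensibility.]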
dricejb@drilex.UUCP (Craig Jackson drilex1) (01/02/90)
Might I point out that those sites with 100 machines mounting /news, or 100 machines accessing news via nntp, aren't going to benefit from newsrc-based expiration anyway? In a population as large as 100, the union of all newsgroups-subscribed-to is likely to be large. Also, somebody is probably out of town, so they're way behind in that group. It would seem to me that newsrc-based expiration would really only be interesting when there are fewer than 20 readers or so. As for the issue of NFS, NNTP, and weird newsreaders: if 'newsrc' based expiration is desirable, it may be necessary (and useful) to mandate an additional means for indicating subscription and 'read' status. SMOP :-)... -- Craig Jackson dricejb@drilex.dri.mgh.com {bbn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}
woods@eci386.uucp (Greg A. Woods) (01/02/90)
In article <1989Dec31.083610.10649@robohack.UUCP> woods@robohack.UUCP (Greg A. Woods) writes: > In article <432@texas.dk> storm@texas.dk (Kim F. Storm) writes: > > woods@robohack.UUCP (Greg A. Woods) writes: > > >Either the quickest, or the longest, > > >expire could be used for all links, or each link could be expired > > >separately, with space gained only upon expiration of the last link. > > > > Either you expire an article because disk-space is sparse (or due to > > some other resource related policy), or you keep the article. > > And since inodes are also a resource, this is a resource conserving > option. Open mouth, insert foot. OOPS. My reasoning and deduction logic seems to have skipped a step. For some reason I'd forgotten that links only require a directory entry, and as such it'll take quite a few cross-post deletions, and a very smart filesystem, and a sudden halt in news flow before you'll ever gain any disk space! This was kindly pointed out to me in mail by Brad Templeton. The other reasons I mentioned are still relevant for those of us who don't really like using kill files, or can't. Kind of like having a truly global kill file! But I can see my argument for this option slowly breaking down.... I would also like to point out that having newsrun do the expires is not of much help for those of us who run rnews.immed. By the time newsrun gets going, it's too late. Besides, having the input handlers manage disk space is confusing the functionality. If you have space problems, use newswatch to look out for such conditions and do something about them. That's what (I assume) it's for. -- Greg A. Woods woods@{eci386,gate,robohack,ontmoh,tmsoft,gpu.utcs.UToronto.CA,utorgpu.BITNET} +1-416-443-1734 [h] +1-416-595-5425 [w] VE3-TCP Toronto, Ontario CANADA
amanda@mermaid.intercon.com (Amanda Walker) (01/03/90)
In article <69903@looking.on.ca>, brad@looking.on.ca (Brad Templeton) writes: > Let's define an extensible format, hack rn and readnews to understand it, > and then everybody can use it. There seems to be an underlying assumption here. Fewer and fewer people are using rn, readnews, or even UNIX-based news readers. Rather than trying to infer information from an ever-muddier environment by rooting through .{news,gnus,gnews,...}rc files, maybe a better approach would be to make it explicit. Once you start introducing news reading via NFS, NNTP, PCMAIL, or whatever, the difficulty of picking up readership information "for free" starts to skyrocket. Horsepower-poor sites, which are where the biggest crunches are occurring, are exactly the same sites that are most likely to start distributing the load via the approaches above, and thus will have the hardest time picking up readership information for free. Look at the arguments about Arbitron's accuracy these days... Maybe news reading needs to become more transaction oriented; I don't know. Various people have done hacks to NNTP that show that it's at least a fruitful approach, and it keeps things simple by not requiring news *reading* software to store and maintain information required by the database *maintenance* software (inews, expire, et al.). Keeping the two separate is a good thing, IMHO. Saves headaches all around. But, if you really want to define an extensible format, how about not reinventing the wheel too much--something like a printable-ASCII (so you can edit it with a text editor if necessary), easily-parsable block structured thing, such as a printable version of ASN.1. This way, programs can skip over new things that they don't know about, without having to be recompiled every time somebody adds Yet Another Flag or Option. $.02, Amanda Walker InterCon Systems Corporation --
woods@robohack.UUCP (Greg A. Woods) (01/03/90)
In article <7171@drilex.UUCP> dricejb@drilex.UUCP (Craig Jackson drilex1) writes: > Might I point out that those sites with 100 machines mounting /news, or 100 > machines accessing news via nntp, aren't going to benefit from newsrc-based > expiration anyway? In a population as large as 100, the union of all > newsgroups-subscribed-to is likely to be large. Also, somebody is probably > out of town, so they're way behind in that group. > > It would seem to me that newsrc-based expiration would really only be > interesting when there are fewer than 20 readers or so. And here, where there are fewer than 10 readers, one of them being me, I don't want to have newsrc driven expires. I want a space based, goal driven, expire! 0.25 :-) I think such a beast would also be quite useful for both large and small sites. In looking at the references line I'd guess that only 1/2 of the participants in this thread are on systems running C News. I think this shows that the problem is with the basic idiom behind the expire control file, not with any particular expire. I think all of this discussion has been interesting, but it has wandered far from what seems practical and feasible, at least in the short term. Once we get expire to work in the way that we (I) have been thinking about news expiry since day one, then maybe we can think of ways to control this new expire to suit the local culture. Since there was such a volume of discussion on this topic, I will assume that I'm not the only one not happy with the current state of affairs. Since I have also spent some time inside C News, (working on porting, installation features, and tuning, since the alpha version), I will think about implementing a scheme similar to what I described. I won't guarantee I'll get anywhere, as I have several dozen projects on the go now, but I'll try. If anyone has any really terrific ideas for a space based, goal driven, expire, let me know. If anyone is already doing this, please let me know. -- Greg A. Woods
woods@{robohack,gate,tmsoft,ontmoh,utgpu,gpu.utcs.Toronto.EDU,utorgpu.BITNET} +1 416 443-1734 [h] +1 416 595-5425 [w] VE3-TCP Toronto, Ontario; CANADA
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (01/03/90)
Given that you can't count on there being .newsrc files and you don't want to modify the news readers, a remaining option is to have a program that watches the access and modification times of the articles and gradually learns what groups are being read. It's not too hard to determine that if an article has been around for many days and the access_time = mod_time then it's likely that no one is reading the group. -- Jon Zeeff zeeff@b-tech.ann-arbor.mi.us or b-tech!zeeff
henry@utzoo.uucp (Henry Spencer) (01/04/90)
In article <1990Jan2.152917.15117@eci386.uucp> woods@eci386.UUCP (Greg A. Woods) writes: >I would also like to point out that having newsrun do the expires is >not of much help for those of us who run rnews.immed. By the time >newsrun gets going, it's too late... Well, not necessarily. If you are running with small or zero margins, then yes, you're in trouble if you blow them even slightly... but with substantial and well-chosen margins (notably, "articles" margin less than "incoming" margin, so that newsrun notices trouble before rnews starts throwing away files), it still makes sense. >Besides, having the input >handlers manage disk space is confusing the functionality... Disk space is one of those ugly global issues that really has to be everybody's job. The "right" solution is just to have enough reserve space that nobody ever has to worry about it, but many systems don't have that luxury. >If you >have space problems, use newswatch to look out for such conditions and >do something about them. That's what (I assume) it's for. Actually, newswatch was motivated by the discovery that since C News stuff is very patient about waiting for locks, a locking problem could go unnoticed for a long time. However, using it to keep an eye on space problems is not unreasonable. -- 1972: Saturn V #15 flight-ready| Henry Spencer at U of Toronto Zoology 1990: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
amanda@mermaid.intercon.com (Amanda Walker) (01/04/90)
In article <NVHHF|@b-tech.uucp>, zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes: > It's not too hard to > determine that if an article has been around for many days and the > access_time = mod_time then it's likely that no one is reading the > group. Now that's a nice idea. I like it. Since every news reader, whether local or NNTP, has to actually read the article file at some point, this shouldn't either break existing readers or be broken in turn by new ones. Of course, you only get a "read/not read" result, not a measure of how popular a group is, but then again expiration policy should not necessarily be tied directly to popularity. Amanda Walker InterCon Systems Corporation --
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (01/04/90)
>Given that you can't count on there being .newsrc files and you don't >want to modify the news readers, a remaining option is to have a >program that watches the access and modification times of the articles >and gradually learns what groups are being read. It's not too hard to Before someone points this out to me, I do realize that you have to account for other accesses to the articles (eg. outgoing feeds). -- Jon Zeeff zeeff@b-tech.ann-arbor.mi.us or b-tech!zeeff
brad@looking.on.ca (Brad Templeton) (01/04/90)
Nice idea, but lots of programs run around accessing articles. Old expire for one. And anybody who decides to do a search of the whole News database. (Which I do with newsclip programs from time to time.) -- Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (01/05/90)
>for a space based, goal driven, expire, let me know. If anyone is >already doing this, please let me know. The rnews.c I posted does progressive expires based on disk space. It does keep the disk very close to being full without ever getting full. It's for C News and so it does allow the flexible per group specification for expiration time. The things I'd like to see are greater efficiency (a one pass system) and more smarts about what groups are being read. -- Jon Zeeff zeeff@b-tech.ann-arbor.mi.us or b-tech!zeeff
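[The progressive, space-based expire Jon describes can be sketched as a driver loop: check free space, expire the expendable groups at a shorter age, and repeat until the goal is met. This is a hypothetical reconstruction, not the posted rnews.c; run_expire stands in for an actual expire invocation, and statvfs() is used here for brevity even though it postdates most 1990 systems.]

```c
#include <sys/statvfs.h>

/* Free space on the spool filesystem, in bytes, or -1 on error. */
long long spool_free_bytes(const char *spool)
{
    struct statvfs vfs;

    if (statvfs(spool, &vfs) != 0)
        return -1;
    return (long long)vfs.f_bavail * vfs.f_frsize;
}

/* Expire the expendable groups at progressively shorter ages until
 * at least `goal` bytes are free or the minimum age is reached.
 * Returns 0 if the goal was met, 1 if not, -1 on error. */
int expire_to_goal(const char *spool, long long goal,
                   int start_days, int min_days,
                   int (*run_expire)(int days))
{
    int days;

    for (days = start_days; days >= min_days; days--) {
        long long freebytes = spool_free_bytes(spool);
        if (freebytes < 0)
            return -1;
        if (freebytes >= goal)
            return 0;           /* enough space; stop early */
        if (run_expire(days) != 0)
            return -1;
    }
    return (spool_free_bytes(spool) >= goal) ? 0 : 1;
}
```

[Because it stops as soon as the goal is met, the loop degenerates to a single cheap statvfs() call in the normal case, much like the half-hourly check in Bill Davidsen's script above.]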
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (01/05/90)
Re: judging news readership based on article access times >Nice idea, but lots of programs run around accessing articles. Old >expire for one. And anybody who decides to do a search of the whole >News database. (Which I do with newsclip programs from time to time.) These programs (and outgoing feeds) tend to access the whole news database so if you are using some kind of score based system, it wouldn't affect the outcome. You could reset the mod time for accesses that don't count. It's not a very pure form of information but looking at it often and long enough would probably provide correct conclusions about interest in a group. -- Jon Zeeff zeeff@b-tech.ann-arbor.mi.us or b-tech!zeeff
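[One way to implement "reset the mod time for accesses that don't count" is to copy the modification time back over the access time after a feed or scan has touched an article, so the atime == mtime "unread" test isn't fooled. A hypothetical sketch using utime(2); a real feed would do this for every article it transmits.]

```c
#include <sys/stat.h>
#include <utime.h>

/* Make an article look untouched again: set its access time back to
 * its modification time.  Returns 0 on success, -1 on error. */
int uncount_access(const char *path)
{
    struct stat st;
    struct utimbuf tb;

    if (stat(path, &st) != 0)
        return -1;
    tb.actime = st.st_mtime;    /* pretend it was never read */
    tb.modtime = st.st_mtime;
    return utime(path, &tb);
}
```

[Note that utime() requires ownership of the file, so only the news owner's own batchers and scanners can erase their tracks this way; an ordinary reader's access still registers.]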
fmayhar@ladc.bull.com (Frank Mayhar) (01/09/90)
OK, Henry, I concede that it's effectively impossible to change *all* the newsreaders in existence. And it's impractical to store all .newsrc files in one central location. Still, it should be possible to keep a list of subscribers, the machines that they live on, the last article they've seen in each group, and the time they saw it. If you do it right (e.g. by constructing a set of library routines to maintain the stuff), you should be able to retrofit this into existing newsreaders, and into NNTP. Ignore any entries that have "expired," i.e. their last access time is too long ago. Retrofit this into NNTP, rn, and a couple of the other most popular readers, and run with it. If a sysadmin has a newsreader that doesn't support the subscription list, and he wants it, he can add it; he has the libraries that support it. This would solve the problem of having enough information for a goal-driven expire, and of running arbitron in a distributed environment. How's that? -- Frank Mayhar fmayhar@ladc.bull.com (..!{uunet,hacgate,rdahp}!ladcgw!fmayhar) Bull HN Information Systems Inc. Los Angeles Development Center 5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241
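[The record Frank describes (subscriber, machine, last article seen in each group, and when) might look like the following. The struct layout, field sizes, and staleness rule are illustrative assumptions, not a proposed standard; the point is that a goal-driven expire could consult such a list before removing articles.]

```c
#include <string.h>
#include <time.h>

/* One entry in a hypothetical central subscription list. */
struct subscription {
    char user[32];
    char machine[64];
    char group[128];
    long last_seen;     /* highest article number read */
    time_t last_access; /* when it was read */
};

/* Does any live subscriber still have unseen articles in `group`
 * above `artnum`?  Entries idle longer than `max_idle` seconds are
 * treated as expired and ignored, as Frank suggests.  If nobody is
 * waiting, such articles are fair game for a goal-driven expire. */
int group_still_wanted(const struct subscription *subs, int nsubs,
                       const char *group, long artnum,
                       time_t now, time_t max_idle)
{
    int i;

    for (i = 0; i < nsubs; i++) {
        if (strcmp(subs[i].group, group) != 0)
            continue;
        if (now - subs[i].last_access > max_idle)
            continue;           /* stale entry: reader gone away */
        if (subs[i].last_seen < artnum)
            return 1;           /* a live reader hasn't seen it yet */
    }
    return 0;
}
```

[The library routines Frank proposes would wrap updates to this list so that each retrofitted reader, or the NNTP server on the readers' behalf, records last_seen and last_access as articles are read.]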
wayne@dsndata.uucp (Wayne Schlitt) (01/10/90)
In article <1990Jan8.230624.8684@ladc.bull.com> fmayhar@ladc.bull.com (Frank Mayhar) writes: > > Still, it should be possible to keep a list of > subscribers, the machines that they live on, the last article they've seen > in each group, and the time they saw it. If you do it right (e.g. by > constructing a set of library routines to maintain the stuff), you should > be able to retrofit this into existing newsreaders, and into NNTP. > [ ... ] excellent idea. of course, you should make sure your elisp library doesn't use any feature more recent than 18.40 or so. you wouldn't want to cause people with old versions of emacs to have too many problems using your routines. :-> (yes folks, the _only_ two news readers that i have ever used have been written in emacs lisp. i am so happy with gnus that i doubt that i would ever spend the time to switch to another reader, even if it was "better"...) -wayne
fmayhar@ladc.bull.com (Frank Mayhar) (01/11/90)
In article <WAYNE.90Jan9160339@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes: >[sarcasm deleted] >(yes folks, the _only_ two news readers that i have ever used have >been written in emacs lisp. i am so happy with gnus that i doubt that >i would ever spend the time to switch to another reader, even if it >was "better"...) All this means is that gnus won't use the subscription capability right away. When someone decides to add it, it will. If that person is you, so much the better. But the capability will be there, in NNTP and (possibly) in the news maintenance mechanism, to support it when you're ready for it. Just because it's not easily feasible to add the capability to *every* news reader *immediately* is no reason to not design it and implement it in *some* news readers. When system administrators need it, it will be there (in an RFC, perhaps, and in a C library), and they can add it to the readers that they and their users use. Over time, most of the commonly-used news readers will pick it up. And any new ones can have it designed into them. The thing about subscription lists is that, once you have them, it's possible to do other things, like restricting certain newsgroups to certain subscribers (or not allowing certain users to subscribe to certain newsgroups), or collecting better readership statistics, or goal-driven expires, or several other useful things. Certainly, the end user doesn't get very much from the capability, but that's not the point, is it? It's the sysadmin that needs it. -- Frank Mayhar fmayhar@ladc.bull.com (..!{uunet,hacgate,rdahp}!ladcgw!fmayhar) Bull HN Information Systems Inc. Los Angeles Development Center 5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241