alexis@panix.uucp (Alexis Rosen) (06/28/91)
Currently we are getting almost a thousand articles per day in marginal
newsgroups that we're not interested in. We let them fall into junk, and
expire in 3 days. This is OK, but I'm curious about a speed issue.

What takes more time, for a large feed (~20MB/day)?
1) Process 2-3 MB of junk articles per day
2) Change the "ME" sys file line to trash these hierarchies. We'd basically
   have to add about a dozen "!hierarchy," to the groups field.

Option 2 would save time on writing junk articles to disk, but would take
up more time for every article by increasing the time to compare each
article's group header to the sys file line.

I understand that the answer would depend entirely on the relative speeds
of our CPU and disks. Figure that our performance profile probably isn't
vastly different from a Sun 3/60 with a decent local disk.

Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
alexis@panix.com
{cmcl2,apple}!panix!alexis
henry@zoo.toronto.edu (Henry Spencer) (06/29/91)
In article <1991Jun28.105916.806@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>What takes more time, for a large feed (~20MB/day)?
>1) Process 2-3 MB of junk articles per day
>2) Change the "ME" sys file line to trash these hierarchies. We'd basically
>   have to add about a dozen "!hierarchy," to the groups field.

I'd be very surprised if (2) wasn't substantially cheaper. The
newsgroup-matching code that does those comparisons got a lot of attention
and is quite fast. Doing filesystem operations (e.g. to file articles) is
costly, especially the filename lookups (yes, even on systems with namei
caches).
-- 
Lightweight protocols? TCP/IP *is*     | Henry Spencer @ U of Toronto Zoology
lightweight already; just look at OSI. | henry@zoo.toronto.edu  utzoo!henry
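To make the comparison in option (2) concrete, here is a rough Python sketch of how a groups field with a few "!hierarchy" entries might be checked against an article's Newsgroups header. It is a simplified model and not the actual news software's matching code. The function names (`match_depth`, `accepts`, `article_accepted`), the placeholder hierarchies `foo` and `bar.baz`, and the "most specific pattern wins, negation wins ties" rule are all assumptions made for illustration.

```python
def match_depth(pattern, group):
    """Return how many dot-separated components of `pattern` match the
    leading components of `group`, or 0 if the pattern does not match.
    The word "all" is treated as a wildcard for one component."""
    p_parts = pattern.split(".")
    g_parts = group.split(".")
    if len(p_parts) > len(g_parts):
        return 0
    for p, g in zip(p_parts, g_parts):
        if p != "all" and p != g:
            return 0
    return len(p_parts)


def accepts(groups_field, group):
    """Decide whether one newsgroup is accepted by a comma-separated
    groups field. Assumption: the most specific matching pattern decides,
    and a negated pattern wins a tie."""
    best_depth = 0
    accepted = False
    for raw in groups_field.split(","):
        raw = raw.strip()
        if not raw:
            continue
        negated = raw.startswith("!")
        pattern = raw[1:] if negated else raw
        depth = match_depth(pattern, group)
        if depth > best_depth or (depth > 0 and depth == best_depth and negated):
            best_depth = depth
            accepted = not negated
    return accepted


def article_accepted(groups_field, newsgroups_header):
    """An article is kept if any group in its Newsgroups header is accepted."""
    return any(accepts(groups_field, g.strip())
               for g in newsgroups_header.split(","))


# Hypothetical groups field: take everything except two unwanted hierarchies.
ME_GROUPS = "all,!foo,!bar.baz"

print(article_accepted(ME_GROUPS, "comp.unix.wizards"))
print(article_accepted(ME_GROUPS, "foo.misc"))
print(article_accepted(ME_GROUPS, "bar.baz.test,bar.other"))
```

Each extra "!hierarchy" entry adds only a few string comparisons per article, all in memory, whereas filing a junk article costs at least one file creation and the directory lookups that go with it.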
alexis@panix.uucp (Alexis Rosen) (06/29/91)
henry@zoo.toronto.edu (Henry Spencer) writes:
>alexis@panix.uucp (Alexis Rosen) writes:
>>What takes more time, for a large feed (~20MB/day)?
>>1) Process 2-3 MB of junk articles per day
>>2) Change the "ME" sys file line to trash these hierarchies. We'd basically
>>   have to add about a dozen "!hierarchy," to the groups field.
>
>I'd be very surprised if (2) wasn't substantially cheaper. [...]

Great. Now, for option #3, which I forgot (and someone graciously reminded
me of via email):
3) Trash the individual groups via active file type-x entries.

This is likely, at first guess, to be much closer to #2, but I'm not sure
it wouldn't still be a bit slower. It also has the double-edged "feature"
of requiring you to x-ify each new subgroup as it's created. So you need
to intervene by hand. On the other hand, you might decide you _were_
interested in that particular sub-group, after all.

Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NYC
alexis@panix.com
{cmcl2,apple}!panix!alexis
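Option #3 can be sketched the same way. The Python below assumes the usual four-field active file layout (group name, two article numbers, flag) and treats a flag of "x" as "discard articles for this group". The sample entries, the group names, and the helper names `load_flags` and `disposition` are made up for illustration; what a real system does with a group that has no entry at all is outside this sketch.

```python
# Hypothetical active file contents; foo.bar has been "x-ified" by hand.
ACTIVE_SAMPLE = """\
comp.unix.wizards 0000012345 0000012000 y
foo.bar 0000000000 0000000001 x
foo.baz 0000000042 0000000001 y
"""


def load_flags(active_text):
    """Map each newsgroup name to its flag field.
    Lines that do not have exactly four fields are skipped."""
    flags = {}
    for line in active_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        name, _high, _low, flag = fields
        flags[name] = flag
    return flags


def disposition(flags, group):
    """Return "discard" for an x-flagged group, "file" for any other
    listed group, and "unknown" for a group with no entry."""
    if group not in flags:
        return "unknown"
    return "discard" if flags[group] == "x" else "file"


flags = load_flags(ACTIVE_SAMPLE)
for name in ("foo.bar", "foo.baz", "foo.newgroup"):
    print(name, disposition(flags, name))
```

The last lookup shows the double-edged part: a newly created subgroup such as the hypothetical `foo.newgroup` has no x entry until someone adds one by hand, while a "!foo" pattern in the sys file would cover it automatically.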