alexis@panix.uucp (Alexis Rosen) (06/28/91)
Currently we are getting almost a thousand articles per day in marginal
newsgroups that we're not interested in. We let them fall into junk, and
expire in 3 days. This is OK, but I'm curious about a speed issue.
What takes more time, for a large feed (~20MB/day)?
1) Process 2-3 MB of junk articles per day
2) Change the "ME" sys file line to trash these hierarchies. We'd basically
have to add about a dozen "!hierarchy," to the groups field.
Option 2 would save time on writing junk articles to disk, but would take
up more time for every article by increasing the time to compare each
article's group header to the sys file line.
I understand that the answer would depend entirely on the relative speeds of
our CPU and disks. Figure that our performance profile probably isn't vastly
different from a Sun 3/60 with a decent local disk.
Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
alexis@panix.com
{cmcl2,apple}!panix!alexis
henry@zoo.toronto.edu (Henry Spencer) (06/29/91)
In article <1991Jun28.105916.806@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>What takes more time, for a large feed (~20MB/day)?
>1) Process 2-3 MB of junk articles per day
>2) Change the "ME" sys file line to trash these hierarchies. We'd basically
> have to add about a dozen "!hierarchy," to the groups field.
I'd be very surprised if (2) wasn't substantially cheaper. The
newsgroup-matching code that does those comparisons got a lot of attention
and is quite fast. Doing filesystem operations (e.g. to file articles) is
costly, especially the filename lookups (yes, even on systems with namei
caches).
--
Lightweight protocols? TCP/IP *is*     | Henry Spencer @ U of Toronto Zoology
lightweight already; just look at OSI. | henry@zoo.toronto.edu utzoo!henry
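To make the comparison in (2) concrete, here is a minimal Python sketch of matching an article's Newsgroups header against a groups field containing a few "!hierarchy" entries. It is not the actual news software's matcher; the rule used here (the longest matching pattern decides, a leading "!" negates, "all" matches anything) is a simplifying assumption, and the hierarchy names are hypothetical. The point is only that each check is a handful of short string comparisons with no disk access.

```python
def matches(pattern, group):
    """True if pattern names the group itself or a hierarchy containing it."""
    if pattern == "all":
        return True
    return group == pattern or group.startswith(pattern + ".")


def wanted(group, sys_groups):
    """Decide whether one group is accepted by a comma-separated groups field.

    Assumed rule (simplified): the longest matching pattern decides,
    and a leading '!' turns a match into a rejection.
    """
    best_len = -1
    accept = False
    for pat in sys_groups.split(","):
        pat = pat.strip()
        if not pat:
            continue  # tolerate a trailing comma such as "!hierarchy,"
        negated = pat.startswith("!")
        name = pat[1:] if negated else pat
        if matches(name, group):
            length = 0 if name == "all" else len(name)
            if length > best_len:
                best_len = length
                accept = not negated
    return accept


def article_wanted(newsgroups_header, sys_groups):
    """An article is kept if any group in its Newsgroups header is accepted."""
    return any(wanted(g.strip(), sys_groups)
               for g in newsgroups_header.split(","))


# Hypothetical groups field: accept everything except two hierarchies.
ME_GROUPS = "all,!alt.example,!misc.example"
```

With this sketch, article_wanted("alt.example.foo", ME_GROUPS) rejects the article before anything is written to disk, while a crosspost such as "alt.example.foo,comp.misc" is still accepted because one of its groups is wanted.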
alexis@panix.uucp (Alexis Rosen) (06/29/91)
henry@zoo.toronto.edu (Henry Spencer) writes:
>alexis@panix.uucp (Alexis Rosen) writes:
>>What takes more time, for a large feed (~20MB/day)?
>>1) Process 2-3 MB of junk articles per day
>>2) Change the "ME" sys file line to trash these hierarchies. We'd basically
>> have to add about a dozen "!hierarchy," to the groups field.
>
>I'd be very surprised if (2) wasn't substantially cheaper. [...]
Great. Now, for option #3, which I forgot (and someone graciously reminded me
of via email):
3) Trash the individual groups via active file type-x entries.
This is likely, at first guess, to be much closer to #2, but I'm not sure it
wouldn't still be a bit slower. It also has the double-edged "feature" of
requiring you to x-ify each new subgroup as it's created. So you need to
intervene by hand. On the other hand, you might decide you _were_ interested
in that particular sub-group, after all.
Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NYC
alexis@panix.com
{cmcl2,apple}!panix!alexis
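Option 3 can be sketched the same way. The Python below is a rough illustration only, assuming the usual four-field active file layout (group name, highest article number, lowest article number, flag) with a flag of "x" marking a group whose articles are discarded. The group names and numbers are hypothetical, and this is not the news software's own lookup code.

```python
def parse_active(text):
    """Map group name -> flag from 'name high low flag' lines."""
    flags = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:  # skip blank or malformed lines
            flags[fields[0]] = fields[3]
    return flags


def discarded(group, flags):
    """True only if this exact group has an entry flagged 'x'."""
    return flags.get(group) == "x"


# Hypothetical active file contents.
SAMPLE_ACTIVE = """\
comp.example 0000000123 00100 y
alt.example.one 0000000000 00001 x
"""

FLAGS = parse_active(SAMPLE_ACTIVE)
```

Because the lookup is by exact group name, discarded("alt.example.one", FLAGS) is true, but a newly created subgroup such as the hypothetical alt.example.two is not covered until its own entry is added and flagged, which is the manual step described above.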