[news.software.b] Speed tradeoffs in C-news?

alexis@panix.uucp (Alexis Rosen) (06/28/91)

Currently we are getting almost a thousand articles per day in marginal
newsgroups that we're not interested in. We let them fall into junk, and
expire in 3 days. This is OK, but I'm curious about a speed issue.

What takes more time, for a large feed (~20MB/day)?
1) Process 2-3 MB of junk articles per day
2) Change the "ME" sys file line to trash these hierarchies. We'd basically
   have to add about a dozen "!hierarchy," to the groups field.

Option 2 would save the time spent writing junk articles to disk, but would
cost a little extra on every article, since each article's Newsgroups: header
has to be matched against a longer sys file line.
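For concreteness, the modified "ME" line might look something like this (the
hierarchy names here are invented for the example; the real set depends on
which groups you're trashing, and the exact sys syntax should be checked
against your C News documentation):

```
ME:all,!alt.binaries,!bionet,!vmsnet/all::
```

Every "!hierarchy" pattern gets matched against the Newsgroups: header of each
incoming article, which is where the per-article cost of a longer groups field
comes from.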

I understand that the answer would depend entirely on the relative speeds of
our CPU and disks. Figure that our performance profile probably isn't vastly
different from a Sun 3/60 with a decent local disk.

Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
alexis@panix.com
{cmcl2,apple}!panix!alexis

henry@zoo.toronto.edu (Henry Spencer) (06/29/91)

In article <1991Jun28.105916.806@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>What takes more time, for a large feed (~20MB/day)?
>1) Process 2-3 MB of junk articles per day
>2) Change the "ME" sys file line to trash these hierarchies. We'd basically
>   have to add about a dozen "!hierarchy," to the groups field.

I'd be very surprised if (2) wasn't substantially cheaper.  The newsgroup-
matching code that does those comparisons got a lot of attention and is
quite fast.  Doing filesystem operations (e.g. to file articles) is costly,
especially the filename lookups (yes, even on systems with namei caches).
-- 
Lightweight protocols?  TCP/IP *is*     | Henry Spencer @ U of Toronto Zoology
lightweight already; just look at OSI.  |  henry@zoo.toronto.edu  utzoo!henry

alexis@panix.uucp (Alexis Rosen) (06/29/91)

henry@zoo.toronto.edu (Henry Spencer) writes:
>alexis@panix.uucp (Alexis Rosen) writes:
>>What takes more time, for a large feed (~20MB/day)?
>>1) Process 2-3 MB of junk articles per day
>>2) Change the "ME" sys file line to trash these hierarchies. We'd basically
>>   have to add about a dozen "!hierarchy," to the groups field.
>
>I'd be very surprised if (2) wasn't substantially cheaper. [...]

Great. Now, for option #3, which I forgot (and someone graciously reminded me
of via email):
3) Trash the individual groups via active file type-x entries.

This is likely, at first guess, to be much closer in cost to #2, though I
suspect it could still be a bit slower. It also has the double-edged "feature"
of requiring you to x-ify each new subgroup as it's created, so you have to
intervene by hand. On the other hand, that's also your chance to decide you
_were_ interested in that particular subgroup after all.
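As a sketch, the active file entries for option #3 might look something like
this (group names and article-number widths are invented for the example; the
fields are group name, high article number, low article number, and flag):

```
alt.flame        0000000000 0000000001 x
alt.binaries.foo 0000000000 0000000001 x
```

With the "x" flag, C News refuses local postings to the group and doesn't file
incoming articles for it, which is what makes this behave like trashing the
group.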

Thanks,
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NYC
alexis@panix.com
{cmcl2,apple}!panix!alexis