[news.software.b] Newsgroup hierarchy and sys file confusion

fitz@wang.com (Tom Fitzgerald) (03/02/90)

Apparently C-news handles newsgroup name matching in sys files differently
than B-news did, and I'm not sure how to get the old behavior back.  What
I'm trying to do is feed a parent group and one of its subgroups, without
feeding any of the other subgroups of the parent.

Just as a purely theoretical example, suppose someone wanted alt.sex and
alt.sex.bondage, but didn't want to get alt.sex.masturbation, ...bestiality,
...carasso or whatever anyone is likely to create next week.  Under B-news
I did that with a sys file entry:

...,alt.sex,!alt.sex.all,alt.sex.bondage,...

and everything worked fine.  Under C-news, alt.sex.bondage doesn't show up.
A quick eyeballing of ngmatch.c made it look like the groupname match with
the largest number of parts is the one that controls the feed (which makes
sense).  But *.all has the same effective length as *.individual and ties
between a yes vote and a no vote are decided in favor of the no vote, so
the group doesn't get through.
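
As a rough illustration of the rule described above (a sketch only, not the actual ngmatch.c code), the following Perl models "longest matching pattern wins, ties go to no". The names covers and fed_longest are hypothetical. The sketch assumes that "all" stands for exactly one name component and that a pattern shorter than the group name also covers its subgroups.

```perl
use strict;
use warnings;

# Assumption: "all" stands for exactly one component, and a pattern that is
# shorter than the group name also covers its subgroups.
sub covers {
    my ($group, $pattern) = @_;
    my @g = split /\./, $group;
    my @p = split /\./, $pattern;
    return 0 if @p > @g;
    for my $i (0 .. $#p) {
        return 0 unless $p[$i] eq 'all' || $p[$i] eq $g[$i];
    }
    return 1;
}

# The longest matching pattern decides; a yes/no tie goes to "no".
sub fed_longest {
    my ($group, @patterns) = @_;
    my ($yes, $no) = (0, 0);    # longest matching positive / negated pattern
    for my $pat (@patterns) {
        my ($neg, $name) = $pat =~ /^(!?)(.*)$/;
        next unless covers($group, $name);
        my @parts = split /\./, $name;
        my $len = @parts;
        if ($neg) { $no  = $len if $len > $no }
        else      { $yes = $len if $len > $yes }
    }
    return $yes > $no ? 1 : 0;
}

my @sys = qw(alt.sex !alt.sex.all alt.sex.bondage);
for my $group (qw(alt.sex alt.sex.bondage alt.sex.masturbation)) {
    printf "%-22s %s\n", $group, fed_longest($group, @sys) ? 'fed' : 'not fed';
}
```

With the entry from the post, this sketch reports alt.sex as fed and both alt.sex.bondage and alt.sex.masturbation as not fed: for alt.sex.bondage the three-part yes and the three-part no tie, and the no wins.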

I can solve it in the short run with a much larger sys file entry:

...,alt.sex,!alt.sex.masturbation,!alt.sex.bestiality,!alt.sex.carasso,...

which will work only until someone creates alt.sex.henry.spencer, which then
gets passed on until I get around to changing the sys file to shut it off.
Also, I'm already running up against awk limits with my sys file entries,
and I'd rather not make them any longer.

Any guesses?

---
Tom Fitzgerald   fitz@wang.com
Wang Labs        ...!uunet!wang!fitz
Lowell MA, USA   1-508-967-5278

geoff@utstat.uucp (Geoff Collyer) (03/02/90)

>...,alt.sex,!alt.sex.all,alt.sex.bondage,...

Hmmm, that's a nasty case that I hadn't considered.  I don't think you
can easily get what you want with the current ngmatch; sorry.  It sure
would be nice to have a specification for newsgroup matching.  In the
absence of one, I need to think harder about what to do.
-- 
Geoff Collyer		utzoo!utstat!geoff, geoff@utstat.toronto.edu

flee@shire.cs.psu.edu (Felix Lee) (03/02/90)

>...,alt.sex,!alt.sex.all,alt.sex.bondage,...

I wrote an &ngmatch function in perl that handles this "correctly".
It uses the best match, where "best" is defined as the one with the
least number of pieces matching "all".
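
A minimal sketch of one way such a "best match" rule could look. This is not the function mentioned above, which was not posted. The names wildcards_used and fed_best are hypothetical, and the sketch makes three assumptions: "all" stands for one component, trailing group components not named by the pattern count as wildcard pieces, and a tie goes to the negated pattern.

```perl
use strict;
use warnings;

# Number of wildcard pieces needed for $pattern to cover $group,
# or -1 if the pattern does not cover the group at all.
sub wildcards_used {
    my ($group, $pattern) = @_;
    my @g = split /\./, $group;
    my @p = split /\./, $pattern;
    return -1 if @p > @g;
    my $wild = @g - @p;          # assumption: unnamed trailing parts are wildcards
    for my $i (0 .. $#p) {
        if    ($p[$i] eq 'all')  { $wild++ }
        elsif ($p[$i] ne $g[$i]) { return -1 }
    }
    return $wild;
}

# The matching pattern with the fewest wildcard pieces decides the feed.
sub fed_best {
    my ($group, @patterns) = @_;
    my ($best, $verdict) = (undef, 0);
    for my $pat (@patterns) {
        my ($neg, $name) = $pat =~ /^(!?)(.*)$/;
        my $wild = wildcards_used($group, $name);
        next if $wild < 0;
        # A strictly better match wins; on a tie the negated pattern wins.
        if (!defined $best || $wild < $best || ($wild == $best && $neg)) {
            ($best, $verdict) = ($wild, $neg ? 0 : 1);
        }
    }
    return $verdict;
}

my @sys = qw(alt.sex !alt.sex.all alt.sex.bondage);
for my $group (qw(alt.sex alt.sex.bondage alt.sex.masturbation)) {
    printf "%-22s %s\n", $group, fed_best($group, @sys) ? 'fed' : 'not fed';
}
```

Under these assumptions alt.sex.bondage is fed, because the literal pattern uses no wildcard pieces and so beats !alt.sex.all, while alt.sex.masturbation is still rejected.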

The other thing B news matches that C news doesn't is patterns like
"comp.all.binaries".  (My perl function doesn't handle this yet.)
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee

henry@utzoo.uucp (Henry Spencer) (03/03/90)

In article <1990Mar1.225310.6802@wang.com> fitz@wang.com (Tom Fitzgerald) writes:
>Apparently C-news handles newsgroup name matching in sys files differently
>than B-news did, and I'm not sure how to get the old behavior back...

The problem is that there has never been a coherent and precise spec for
newsgroup matching.  The definition we used for C News was the best we
could come up with.

>... feed a parent group and one of its subgroups, without
>feeding any of the other subgroups of the parent.

That's one we hadn't thought of.  Sigh.  We'll have to go back and take
a look to see if we can improve this.
-- 
MSDOS, abbrev:  Maybe SomeDay |     Henry Spencer at U of Toronto Zoology
an Operating System.          | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

nagel@ics.uci.edu (Mark Nagel) (03/03/90)

flee@shire.cs.psu.edu (Felix Lee) writes:

>>...,alt.sex,!alt.sex.all,alt.sex.bondage,...

>I wrote an &ngmatch function in perl that handles this "correctly".
>It uses the best match, where "best" is defined as the one with the
>least number of pieces matching "all".

>The other thing B news matches that C news doesn't is patterns like
>"comp.all.binaries".  (My perl function doesn't handle this yet.)

Here's one that does, but doesn't count the number of "all" matches
(I use this in my local backend inews).  Maybe we should merge our
versions?

sub ngmatch {
  local($ng,$pt) = @_;
  local(@ng_w,@pt_w);

  #
  # immediate match?
  #
  return 1 if ($ng eq $pt || $pt eq "all" || $pt eq "backbone");

  #
  # check for match on each word, using "all" as a wildcard
  #
  @ng_w = split(/\./, $ng);
  @pt_w = split(/\./, $pt);
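  while ($#ng_w >= 0) {
    if ($#pt_w < 0) {
      # pattern used up: remaining group words match as subgroups
      shift(@ng_w);
    } elsif ($ng_w[0] eq $pt_w[0] || $pt_w[0] eq "all") {
      # an "all" stays in place while it can still absorb more words
      shift(@pt_w) if ($pt_w[0] ne "all" || $#pt_w >= $#ng_w);
      shift(@ng_w);
    } else {
      return 0;
    }
  }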
  return ($#ng_w == $#pt_w);
}
--
Mark Nagel
UC Irvine Department of ICS   +----------------------------------------+
ARPA: nagel@ics.uci.edu       | Six plus six equals fourteen for large |
UUCP: ucbvax!ucivax!nagel     | values of six -- Dave Ackerman         |

flee@shire.cs.psu.edu (Felix Lee) (03/03/90)

I was mistaken when I wrote:
>The other thing B news matches that C news doesn't is patterns like
>"comp.all.binaries".

This was because I had used "gngp" without the "-a" flag on the active
file, which produces anomalous results because the stuff after the
newsgroup name is treated as part of the last token.  So "all.test"
doesn't match "misc.test xxxx" since "test" doesn't match "test xxxx".
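
The tokenization effect described above can be reproduced in a few lines of Perl. This is only an illustration using the placeholder line from the post; it makes no claim about how gngp itself is implemented.

```perl
use strict;
use warnings;

my $line = "misc.test xxxx";        # placeholder line from the post

# Split on dots without trimming: the trailing field rides along
# in the last component.
my @raw = split /\./, $line;        # ("misc", "test xxxx")

# Keep only the first whitespace-separated field, then split on dots.
my ($group) = split ' ', $line;     # "misc.test"
my @clean = split /\./, $group;     # ("misc", "test")

print "untrimmed last component: '$raw[-1]'\n";
print "trimmed last component:   '$clean[-1]'\n";
```

Compared component by component, "test" equals the trimmed last component but not the untrimmed one, which is why "all.test" appeared not to match.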
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee

brad@looking.on.ca (Brad Templeton) (03/03/90)

It would have been nice to have regular expressions instead of the
odd patterns we now have.  (Curse whoever decided that 'all' would be the
pattern that matches anything.)

Of course, the large number of dots in group names would make regexps ugly.

Ever thought that rec.sport.baseball matches rec.sport.baseboard?
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

henry@utzoo.uucp (Henry Spencer) (03/04/90)

In article <106190@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>It would have been nice to have regular expressions instead of the
>odd patterns we now have.  (Curse whoever decided that 'all' would be the
>pattern that matches anything.) ...
>Ever thought that rec.sport.baseball matches rec.sport.baseboard?

No, because it doesn't.  The "all" is magic only when it's an entire name
component.  That's what the relevant documents imply, that's what B News
did, I think, and that's certainly what C News does.
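
The difference between the two readings can be sketched in Perl. Both function names are hypothetical, and neither is taken from B News or C News. For simplicity the component version requires the same number of components on both sides and ignores matching of subgroups.

```perl
use strict;
use warnings;

# Hypothetical "substring" reading: every occurrence of the letters
# "all" in the pattern acts as a wildcard.
sub match_all_anywhere {
    my ($group, $pattern) = @_;
    my $re = join '.*', map { quotemeta($_) } split /all/, $pattern, -1;
    return $group =~ /\A$re\z/ ? 1 : 0;
}

# Component reading: "all" is a wildcard only when it is an entire
# dot-separated component of the pattern.
sub match_all_component {
    my ($group, $pattern) = @_;
    my @g = split /\./, $group;
    my @p = split /\./, $pattern;
    return 0 unless @g == @p;
    for my $i (0 .. $#p) {
        return 0 unless $p[$i] eq 'all' || $p[$i] eq $g[$i];
    }
    return 1;
}

my ($group, $pattern) = ('rec.sport.baseboard', 'rec.sport.baseball');
print "substring reading: ", match_all_anywhere($group, $pattern),  "\n";
print "component reading: ", match_all_component($group, $pattern), "\n";
```

Only the substring reading lets rec.sport.baseball match rec.sport.baseboard. Under the component reading the third components are compared as whole words and differ.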
-- 
MSDOS, abbrev:  Maybe SomeDay |     Henry Spencer at U of Toronto Zoology
an Operating System.          | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

peter@ficc.uu.net (Peter da Silva) (03/04/90)

In article <106190@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
> Of course, the large number of dots in group names would make regexps ugly.

I think in this context shell wildcards would do as well as regexps.
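
A minimal sketch of what shell-style wildcards on group names might look like in Perl. The helper glob_to_regex is hypothetical and handles only * and ?, and in this sketch * is allowed to span dots. It does not address how positive and negated patterns should be ranked against each other.

```perl
use strict;
use warnings;

# Translate a shell-style wildcard into an anchored regex.
# Only * and ? are special; every other character is taken literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        $re .= $ch eq '*' ? '.*'
             : $ch eq '?' ? '.'
             :              quotemeta($ch);
    }
    return qr/\A$re\z/;
}

for my $case (['alt.sex.bondage',          'alt.sex.*'],
              ['comp.sys.ibm.pc.binaries', 'comp.*.binaries'],
              ['rec.sport.baseboard',      'rec.sport.baseball']) {
    my ($group, $glob) = @$case;
    my $re = glob_to_regex($glob);
    printf "%-26s %-20s %s\n", $group, $glob,
        $group =~ $re ? 'match' : 'no match';
}
```

Because the dots are quoted rather than treated as regex metacharacters, the patterns stay readable even though group names are full of dots.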
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \
\_.--._/ Xenix Support -- it's not just a job, it's an adventure!
      v  "Have you hugged your wolf today?" `-_-'