[alt.sources] An AWK script to check "junk" for newsgroups

Sepp@ppcger.ppc.sub.org (Josef Wolf) (02/20/91)

dglo@ADS.COM (Dave Glowacki) writes:

] Since, as a rule, EVERY C or shell program posted must be followed up
] by a PERL script, here's my version of NEWJUNK.

Well. Fine fine. But what about using standard-tools? Which *IX is
_delivered_ with Perl?

Now here is my version of NEWJUNK. It could have been better, but older
versions of gawk have an ugly memory leak, so you have to sort out
the 'Newsgroups:' lines and pipe them into gawk :-(

The awk-version will most likely be slower than the C-version and the
Perl-Version, but it should run on most *IX with little modifications.

This version uses 3 config files:
/usr/lib/news/newjunk.active    these are the newsgroups I am interested in
/usr/lib/news/newjunk.trash     groups that throw away the entire Newsgroups: line
/usr/lib/news/newjunk.junk      newsgroups I don't want

In the config-files you can use regular expressions. Here is my
newjunk.active, for example:

---- snipp ----
# newjunk.active
#
# these are the newsgroups I want to have complete, if they are
# found in junk
^comp\.sys\..*
^comp\.os\..*
^comp\.mail.*
^dnet\..*
^eunet\..*
^mnet\..*
^ppc\..*
^sub\..*
#               I want all.sources.all
.*\.sources.*
#               and all.os9.all
.*\.os9\..*
---- snipp ----
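
A minimal sketch of how patterns like these can be tested against group names before they go into a config file. The helper name match_groups and the sample group names in the comment are assumptions for illustration only; the helper skips blank lines and '#' comments the same way newjunk.awk does, and prints every input line that matches at least one pattern.

```shell
# Hypothetical helper: print the group names from stdin that match at least
# one pattern in the pattern file given as $1 (same format as newjunk.active).
match_groups() {
  awk -v patfile="$1" '
    BEGIN {
      # load the patterns, skipping blank lines and "#" comments
      while ((getline line < patfile) > 0)
        if (length(line) && line !~ /^#/)
          pat[npat++] = line
      close(patfile)
    }
    {
      # print the group once if any pattern matches it
      for (i = 0; i < npat; i++)
        if ($0 ~ pat[i]) { print; break }
    }
  '
}

# e.g.  printf "%s\n" comp.sys.atari alt.sources rec.humor |
#         match_groups /usr/lib/news/newjunk.active
```

Because the patterns are ordinary regular expressions, an unanchored one such as .*\.sources.* matches anywhere in the name, while ^comp\.sys\..* only matches at the start.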

Here goes newjunk.awk. Just pipe all '^Newsgroups:' lines into 'awk -f newjunk.awk'

---- snipp ----
BEGIN {

# read in active
  FS = ":";
#      ^^^  my news-system needs this one
  while ((getline <"/usr/lib/news/active") > 0)
    if (length ($1))
      active [activecount++] = $1;

# read in config files
  while ((getline tmp <"/usr/lib/news/newjunk.active") > 0)
    if (length (tmp) && !match (tmp, "^#"))
      nactive [nactivecount++] = tmp;

  while ((getline tmp <"/usr/lib/news/newjunk.trash") > 0)
    if (length (tmp) && !match (tmp, "^#"))
      trash [trashcount++] = tmp;

  while ((getline tmp <"/usr/lib/news/newjunk.junk") > 0)
    if (length (tmp) && !match (tmp, "^#"))
      junk [junkcount++] = tmp;

  FS = ",";
# newsgroups are separated with commas
}

function insert_newsgroup(ng) {

# if newsgroup is already inserted, we can save some time
  for (k = 0; k < newcount; k++)
    if (ng == newgroups [k])
      return;

# skip newsgroup if it is already active
  for (k = 0; k < activecount; k++)
    if (ng == active [k])
      return;

# insert newsgroup
  newgroups [newcount++] = ng;
}


// {

# check every newsgroup given in input line
  to_insert_count = 0;
  for (j = 1; j <= NF; j++) {

# do we want this newsgroup?
    for (i = 0; i < nactivecount; i++) {
      if (match ($j, nactive [i])) {
        insert_newsgroup($j);
#        break;
# don't know why I get some bus-error at this break -- sigh!
# but the script runs without this too (grinn :-)
      }
    }

# is there any trash-newsgroup?
    for (i = 0; i < trashcount; i++)
      if (match ($j, trash [i]))
        next;

# no trash-groups -> sort out the junk-newsgroups:
# queue the group only if it matches none of the junk patterns
    is_junk = 0;
    for (i = 0; i < junkcount; i++)
      if (match ($j, junk [i]))
        is_junk = 1;
    if (!is_junk)
      to_insert [to_insert_count++] = $j;
  }

# insert them now
  for (i = 0; i < to_insert_count; i++)
    insert_newsgroup(to_insert [i]);
}

END {
  for (i = 0; i < newcount; i++) {
# insert the command for YOUR inews here
    cmd = "inews -ad=local '-c=newgroup:" newgroups[i] "' </nil";
    system (cmd);
    print newgroups [i];
  }
}
---- snipp ----
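
A rough sketch of how the input for the script might be prepared. The script splits each line on commas and applies anchored patterns such as ^comp\.sys\..* to the fields, so a leading "Newsgroups: " would keep those patterns from matching the first group; this sketch therefore strips the header name. The helper name newsgroups_lines and the junk spool path in the comment are assumptions, not part of the posted script.

```shell
# Hypothetical helper: print the comma-separated group list from the
# "Newsgroups:" header of each article file named on the command line.
newsgroups_lines() {
  awk '
    FNR == 1 { in_header = 1 }       # every file starts with its header
    $0 == "" { in_header = 0 }       # the first blank line ends the header
    in_header && /^Newsgroups:/ {
      sub(/^Newsgroups:[ \t]*/, "")  # drop the header name, keep the groups
      print
    }
  ' "$@"
}

# e.g. (spool path is an assumption, adjust for your site):
#   newsgroups_lines /usr/spool/news/junk/* | awk -f newjunk.awk
```

Only header lines are considered, so a line in the article body that happens to start with "Newsgroups:" is ignored.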

Greetings
        Sepp

Disclaimer: I had no time to test this version of newjunk.awk much.
            If there are bugs, please let me know :-)

| Josef Wolf, Germersheim, Germany | +49 7274 8047  -24 Hours- (call me :-)
| ...!ira.uka.de!smurf!ppcger!sepp | +49 7274 8048  -24 Hours-
|     sepp@ppcger.ppc.sub.org      | +49 7274 8967  18:00-8:00, Sa + Su 24h
|  "is there anybody out there?"   | all lines 300/1200/2400 bps 8n1

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (02/22/91)

In article <u4+nMD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org (Josef Wolf) writes:
: dglo@ADS.COM (Dave Glowacki) writes:
: 
: ] Since, as a rule, EVERY C or shell program posted must be followed up
: ] by a PERL script, here's my version of NEWJUNK.
: 
: Well. Fine fine. But what about using standard-tools? Which *IX is
: _delivered_ with Perl?

Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
everyone will include it, I suspect.

: Now here is my version of NEWJUNK. It could have been better, but older
: versions of gawk have an ugly memory leak, so you have to sort out
: the 'Newsgroups:' lines and pipe them into gawk :-(

Well. Fine fine. But what about using standard-tools? Which *IX is
_delivered_ with gawk?  :-)

: The awk-version will most likely be slower than the C-version and the
: Perl-Version, but it should run on most *IX with little modifications.

Right.

On my Vax:

$ awk -f newjunk.awk
awk -f newjunk
awk: syntax error near line 6
awk: illegal statement near line 6
awk: syntax error near line 11
awk: illegal statement near line 11
awk: syntax error near line 12
awk: illegal statement near line 12
awk: syntax error near line 15
awk: illegal statement near line 15
awk: syntax error near line 16
awk: illegal statement near line 16
awk: syntax error near line 19
awk: illegal statement near line 19
awk: syntax error near line 20
awk: illegal statement near line 20
awk: syntax error near line 27
awk: bailing out near line 27

On my Sun:

$ nawk -f newjunk.awk
nawk: empty regular expression
 source line number 51
 context is
         >>> // <<<  {


Your "standard" tools ain't so standard.   :-(

Larry

Sepp@ppcger.ppc.sub.org (Josef Wolf) (02/25/91)

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:

] In article <u4+nMD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org
] (Josef Wolf) writes:
] : Well. Fine fine. But what about using standard-tools? Which *IX is
] : _delivered_ with Perl?

] Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
] everyone will include it, I suspect.

C'mon! Get your feet back on earth. I'm sure there are _some_ people out
in net.lands who don't want to buy a new OS just to get the fun of Perl.

] : Now here is my version of NEWJUNK. It could have been better, but older
] : versions of gawk have an ugly memory leak, so you have to sort out
] : the 'Newsgroups:' lines and pipe them into gawk :-(

] Well. Fine fine. But what about using standard-tools? Which *IX is
] _delivered_ with gawk?  :-)

Why don't you use awk ? ;-)

] : The awk-version will most likely be slower than the C-version and the
] : Perl-Version, but it should run on most *IX with little modifications.
                                                     ^^^^^^

] Right.
] On my Vax:
] $ awk -f newjunk.awk
[ some error messages deleted ]

Well, I said 'with _little_ modifications'. Not 'without modifications'...

] On my Sun:

] $ nawk -f newjunk.awk
] nawk: empty regular expression
]  source line number 51
]  context is
]          >>> // <<<  {


What about this one:
/.*/    {
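
A small sketch of the point at issue: a rule with no pattern at all and a rule with the pattern /.*/ both run for every input line, so either can stand in for the rejected empty regular expression. The helper name count_lines is an assumption for illustration.

```shell
# Hypothetical helper: count how often each kind of rule fires on stdin.
count_lines() {
  awk '
    { bare++ }          # no pattern: the action runs for every record
    /.*/ { dotstar++ }  # ".*" matches any line, including an empty one
    END { print bare, dotstar }
  '
}

# e.g.  count_lines < some_file    (both counts equal the number of lines)
```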

] Your "standard" tools ain't so standard.   :-(

But much more standard than Perl.

Greets

    Sepp


lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (02/27/91)

In article <NE-4OD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org (Josef Wolf) writes:
: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
: 
: ] In article <u4+nMD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org
: ] (Josef Wolf) writes:
: ] : Well. Fine fine. But what about using standard-tools? Which *IX is
: ] : _delivered_ with Perl?
: 
: ] Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
: ] everyone will include it, I suspect.
: 
: C'mon! Get your feet back on earth. I'm sure there are _some_ people out
: in net.lands who don't want to buy a new OS just to get the fun of Perl.

Who ever said they had to buy a new OS to get the fun of Perl?  I fear one
of your inclusive ORs turned into an exclusive OR while you weren't looking.

: ] : Now here is my version of NEWJUNK. It could have been better, but older
: ] : versions of gawk have an ugly memory leak, so you have to sort out
: ] : the 'Newsgroups:' lines and pipe them into gawk :-(
: 
: ] Well. Fine fine. But what about using standard-tools? Which *IX is
: ] _delivered_ with gawk?  :-)
: 
: Why don't you use awk ? ;-)

I do.  When appropriate.

: ] : The awk-version will most likely be slower than the C-version and the
: ] : Perl-Version, but it should run on most *IX with little modifications.
:                                                      ^^^^^^
: 
: ] Right.
: ] On my Vax:
: ] $ awk -f newjunk.awk
: [ some error messages deleted ]
: 
: Well, I said 'with _little_ modifications'. Not 'without modifications'...

You used a bunch of features that weren't even available in "standard" awk,
such as user-defined functions, match() and getline with redirections.
I don't think that counts as "little modification".
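
One way to check a given awk for the three features named above is a tiny probe that uses all of them at once. This is only a sketch; the helper name probe_awk and the example file in the comment are assumptions. An awk that lacks any of the features should fail with a syntax error instead of printing a word.

```shell
# Hypothetical probe: run the awk named in $1 on a tiny program that uses
# all three features, printing the first word of the first line of file $2.
probe_awk() {
  "$1" '
    function first_word(s) {             # user-defined function
      if (match(s, /[^ ]+/))             # match() sets RSTART and RLENGTH
        return substr(s, RSTART, RLENGTH)
      return ""
    }
    BEGIN {
      file = ARGV[1]
      if ((getline line < file) > 0)     # getline with input redirection
        print first_word(line)
      exit
    }
  ' "$2"
}

# e.g.  probe_awk awk /etc/passwd   or   probe_awk nawk /etc/passwd
```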

: ] On my Sun:
: 
: ] $ nawk -f newjunk.awk
: ] nawk: empty regular expression
: ]  source line number 51
: ]  context is
: ]          >>> // <<<  {
: 
: 
: What about this one:
: /.*/    {
: 
: ] Your "standard" tools ain't so standard.   :-(
: 
: But much more standard than Perl.

In a year, this will be a topic for historians.  nawk has priced itself
out of the market, and gawk is already diverging from nawk.

Larry

henry@zoo.toronto.edu (Henry Spencer) (02/27/91)

In article <11594@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In a year, this will be a topic for historians.  nawk has priced itself
>out of the market, and gawk is already diverging from nawk.

Nawk is bundled with SVR4, so it is about to become much more prevalent.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry