[comp.lang.perl] An AWK script to check "junk" for newsgroups

Sepp@ppcger.ppc.sub.org (Josef Wolf) (02/20/91)

dglo@ADS.COM (Dave Glowacki) writes:

] Since, as a rule, EVERY C or shell program posted must be followed up
] by a PERL script, here's my version of NEWJUNK.

Well. Fine fine. But what about using standard-tools? Which *IX is
_delivered_ with Perl?

Now here is my version of NEWJUNK. It could have been better, but older
versions of gawk have this ugly memory leak, so you have to sort out
the 'Newsgroups:' lines and pipe them into gawk :-(

The awk-version will most likely be slower than the C-version and the
Perl-Version, but it should run on most *IX with little modifications.

This version uses 3 config files:
/usr/lib/news/newjunk.active    the newsgroups I am interested in
/usr/lib/news/newjunk.trash     groups for which the entire Newsgroups: line is thrown away
/usr/lib/news/newjunk.junk      newsgroups I don't want

In the config-files you can use regular expressions. Here is my
newjunk.active, for example:

---- snipp ----
# newjunk.active
#
# these are the newsgroups I want to have complete, if they are
# found in junk
^comp\.sys\..*
^comp\.os\..*
^comp\.mail.*
^dnet\..*
^eunet\..*
^mnet\..*
^ppc\..*
^sub\..*
#               I want all.sources.all
.*\.sources.*
#               and all.os9.all
.*\.os9\..*
---- snipp ----
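
[A minimal sketch, not from the original post, of how such a pattern file
can be applied to a list of group names. The function name wanted_groups
is hypothetical, and it assumes an awk with getline redirection and -v
(nawk, gawk or any POSIX awk), not the old V7 awk.]

```shell
#!/bin/sh
# Hypothetical helper: print every group name read from stdin that matches
# at least one regular expression in the pattern file given as $1.
wanted_groups() {
  awk -v patfile="$1" '
    BEGIN {
      # load the patterns, skipping blank lines and # comments
      while ((getline line < patfile) > 0)
        if (length(line) && line !~ /^#/)
          pat[npat++] = line
    }
    {
      # each pattern string is used as a dynamic regular expression
      for (i = 0; i < npat; i++)
        if ($0 ~ pat[i]) { print; break }
    }
  '
}
```

With the newjunk.active shown above, something like
    printf '%s\n' comp.sys.mac rec.humor | wanted_groups /usr/lib/news/newjunk.active
should let comp.sys.mac through and drop rec.humor.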

Here goes newjunk.awk. Just pipe all '^Newsgroups:' lines into 'awk -f newjunk.awk'

---- snipp ----
BEGIN {

# read in active
  FS = ":";
#      ^^^  my news-system needs this one
  while (getline <"/usr/lib/news/active" > 0)
    if (length ($1))
      active [activecount++] = $1;

# read in config files
  while (getline tmp <"/usr/lib/news/newjunk.active" > 0)
    if (length (tmp) && !match (tmp, "^#"))
      nactive [nactivecount++] = tmp;

  while (getline tmp <"/usr/lib/news/newjunk.trash" > 0)
    if (length (tmp) && !match (tmp, "^#"))
      trash [trashcount++] = tmp;

  while (getline tmp <"/usr/lib/news/newjunk.junk" > 0)
    if (length (tmp) && !match (tmp, "^#"))
      junk [junkcount++] = tmp;

  FS = ",";
# newsgroups are separated with commas
}

function insert_newsgroup(ng) {

# if newsgroup is already inserted, we can save some time
  for (k = 0; k < newcount; k++)
    if (ng == newgroups [k])
      return;

# skip newsgroup if it is already active
  for (k = 0; k < activecount; k++)
    if (ng == active [k])
      return;

# insert newsgroup
  newgroups [newcount++] = ng;
}


// {

# check every newsgroup given in input line
  for (j = 1; j <= NF; j++) {

# do we want this newsgroup?
    for (i = 0; i < nactivecount; i++) {
      if (match ($j, nactive [i])) {
        insert_newsgroup($j);
#        break;
# don't know why I get some bus-error at this break -- sigh!
# but the script runs without this too (grinn :-)
      }
    }

# is there any trash-newsgroup?
    for (i = 0; i < trashcount; i++)
      if (match ($j, trash [i]))
        next;

# no trash-groups -> sort out the junk-newsgroups
    to_insert_count = 0;
    for (i = 0; i < junkcount; i++)
      if (!match ($j, junk [i]))
        to_insert [to_insert_count++] = $j;
  }

# insert them now
  for (i = 0; i < to_insert_count; i++)
    insert_newsgroup(to_insert [i]);
}

END {
  for (i = 0; i < newcount; i++) {
# insert the command for YOUR inews here
    cmd = "inews -ad=local '-c=newgroup:" newgroups[i] "' </nil";
    system (cmd);
    print newgroups [i];
  }
}
---- snipp ----
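
[A hedged sketch, not the author's code, of one reading of what the
per-line rule above is meant to do. It differs from the posted script in
three ways: the rule has no pattern at all instead of //, candidates are
collected across all fields of a line rather than reset per field, and a
group is kept only if it matches none of the junk patterns. The names
newjunk_sketch, load, hit and remember are hypothetical. The check
against the active file and the inews call are left out, and the awk
must support user-defined functions (nawk, gawk or any POSIX awk).]

```shell
#!/bin/sh
# Hypothetical rework of the main rule. Arguments: the wanted-, trash- and
# junk-pattern files. Stdin: comma-separated group lists, one per line,
# with the "Newsgroups:" prefix already stripped and no blanks after commas.
newjunk_sketch() {
  awk -F, -v wantf="$1" -v trashf="$2" -v junkf="$3" '
    function load(file, arr,    n, line) {
      n = 0
      while ((getline line < file) > 0)
        if (length(line) && line !~ /^#/)
          arr[n++] = line
      close(file)
      return n
    }
    function hit(s, arr, n,    i) {     # 1 if s matches any pattern in arr
      for (i = 0; i < n; i++)
        if (s ~ arr[i])
          return 1
      return 0
    }
    function remember(ng) {             # record each group only once
      if (!(ng in seen)) {
        seen[ng] = 1
        order[count++] = ng
      }
    }
    BEGIN {
      nwant  = load(wantf,  want)
      ntrash = load(trashf, trash)
      njunk  = load(junkf,  junk)
    }
    {                                   # no pattern: runs for every line
      trashed = 0
      for (j = 1; j <= NF; j++) {
        if (hit($j, want, nwant))
          remember($j)                  # wanted groups are always kept
        if (hit($j, trash, ntrash))
          trashed = 1
      }
      if (trashed)
        next                            # drop the rest of this line
      for (j = 1; j <= NF; j++)
        if (!hit($j, junk, njunk))      # keep only if NO junk pattern matches
          remember($j)
    }
    END {
      for (i = 0; i < count; i++)
        print order[i]
    }
  '
}
```

The output is one candidate group per line, so it could be fed to
whatever newgroup command the local inews wants.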

Greetings
        Sepp

Disclaimer: I had no time to test this version of newjunk.awk much.
            If there are bugs, please let me know :-)

| Josef Wolf, Germersheim, Germany | +49 7274 8047  -24 Hours- (call me :-)
| ...!ira.uka.de!smurf!ppcger!sepp | +49 7274 8048  -24 Hours-
|     sepp@ppcger.ppc.sub.org      | +49 7274 8967  18:00-8:00, Sa + Su 24h
|  "is there anybody out there?"   | all lines 300/1200/2400 bps 8n1

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (02/22/91)

In article <u4+nMD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org (Josef Wolf) writes:
: dglo@ADS.COM (Dave Glowacki) writes:
: 
: ] Since, as a rule, EVERY C or shell program posted must be followed up
: ] by a PERL script, here's my version of NEWJUNK.
: 
: Well. Fine fine. But what about using standard-tools? Which *IX is
: _delivered_ with Perl?

Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
everyone will include it, I suspect.

: Now here is my version of NEWJUNK. It could have been better, but older
: versions of gawk have this ugly memory leak, so you have to sort out
: the 'Newsgroups:' lines and pipe them into gawk :-(

Well. Fine fine. But what about using standard-tools? Which *IX is
_delivered_ with gawk?  :-)

: The awk-version will most likely be slower than the C-version and the
: Perl-Version, but it should run on most *IX with little modifications.

Right.

On my Vax:

$ awk -f newjunk.awk
awk -f newjunk
awk: syntax error near line 6
awk: illegal statement near line 6
awk: syntax error near line 11
awk: illegal statement near line 11
awk: syntax error near line 12
awk: illegal statement near line 12
awk: syntax error near line 15
awk: illegal statement near line 15
awk: syntax error near line 16
awk: illegal statement near line 16
awk: syntax error near line 19
awk: illegal statement near line 19
awk: syntax error near line 20
awk: illegal statement near line 20
awk: syntax error near line 27
awk: bailing out near line 27

On my Sun:

$ nawk -f newjunk.awk
nawk: empty regular expression
 source line number 51
 context is
         >>> // <<<  {


Your "standard" tools ain't so standard.   :-(

Larry

tchrist@convex.COM (Tom Christiansen) (02/22/91)

From the keyboard of Sepp@ppcger.ppc.sub.org (Josef Wolf):
:dglo@ADS.COM (Dave Glowacki) writes:
:
:] Since, as a rule, EVERY C or shell program posted must be followed up
:] by a PERL script, here's my version of NEWJUNK.
:
:Well. Fine fine. But what about using standard-tools? Which *IX is
:_delivered_ with Perl?

The one shipped by CONVEX Computer Corporation, of course.  Call it 
competitive advantage.  :-)

If your vendor doesn't supply perl (and fie on them for not doing so),
then it's trivial to get.  Of course, gawk is equally easy to get.

What systems are _delivered_ with nawk?  I don't see why you think a gawk
program has something over a perl one; it's certainly slower.   If your
point is delivered systems, then nawk/gawk scripts aren't the answer.

--tom
-- 
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things." -- Doug Gwyn

 Tom Christiansen                tchrist@convex.com      convex!tchrist

ronald@robobar.co.uk (Ronald S H Khoo) (02/23/91)

tchrist@convex.COM (Tom Christiansen) writes:

> What systems are _delivered_ with nawk?

Only the most ubiquitous of unices:  SCO Xenix, of course.  So if you
count delivered systems by number of licenses, nawk has a surprisingly
large lead.  Most people with SCO Xenix don't realise this though, cos
its nawk is called "awk" and there ain't no "oawk" on the system.

-- 
Ronald Khoo <ronald@robobar.co.uk> +44 81 991 1142 (O) +44 71 229 7741 (H)

Sepp@ppcger.ppc.sub.org (Josef Wolf) (02/25/91)

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:

] In article <u4+nMD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org
] (Josef Wolf) writes:
] : Well. Fine fine. But what about using standard-tools? Which *IX is
] : _delivered_ with Perl?

] Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
] everyone will include it, I suspect.

C'mon! Get your feet back on earth. I'm sure there are _some_ people out
in net.lands who don't want to buy a new OS just to get the fun of Perl.

] : Now here is my version of NEWJUNK. It could have been better, but older
] : versions of gawk have this ugly memory leak, so you have to sort out
] : the 'Newsgroups:' lines and pipe them into gawk :-(

] Well. Fine fine. But what about using standard-tools? Which *IX is
] _delivered_ with gawk?  :-)

Why don't you use awk ? ;-)

] : The awk-version will most likely be slower than the C-version and the
] : Perl-Version, but it should run on most *IX with little modifications.
                                                     ^^^^^^

] Right.
] On my Vax:
] $ awk -f newjunk.awk
[ some error messages deleted ]

Well, I said 'with _little_ modifications'. Not 'without modifications'...

] On my Sun:

] $ nawk -f newjunk.awk
] nawk: empty regular expression
]  source line number 51
]  context is
]          >>> // <<<  {


What about this one:
/.*/    {

] Your "standard" tools ain't so standard.   :-(

But much more standard than Perl.

Greets

    Sepp

| Josef Wolf, Germersheim, Germany | +49 7274 8047  -24 Hours- (call me :-)
| ...!ira.uka.de!smurf!ppcger!sepp | +49 7274 8048  -24 Hours-
|     sepp@ppcger.ppc.sub.org      | +49 7274 8967  18:00-8:00, Sa + Su 24h
|  "is there anybody out there?"   | all lines 300/1200/2400 bps 8n1

Sepp@ppcger.ppc.sub.org (Josef Wolf) (02/25/91)

tchrist@convex.COM (Tom Christiansen) writes:

] What systems are _delivered_ with nawk?  I don't see why you think a gawk
] program has something over a perl one; it's certainly slower.   If your
] point is delivered systems, then nawk/gawk scripts aren't the answer.

Well, but what about awk? Isn't it more standard than perl?

Greetings...


| Josef Wolf, Germersheim, Germany | +49 7274 8047  -24 Hours- (call me :-)
| ...!ira.uka.de!smurf!ppcger!sepp | +49 7274 8048  -24 Hours-
|     sepp@ppcger.ppc.sub.org      | +49 7274 8967  18:00-8:00, Sa + Su 24h
|  "is there anybody out there?"   | all lines 300/1200/2400 bps 8n1

prc@erbe.se (Robert Claeson) (02/25/91)

In article <1991Feb22.011415.21895@convex.com> tchrist@convex.COM (Tom Christiansen) writes:

>What systems are _delivered_ with nawk?

What systems aren't? Oh, the ones running SVR4 of course. awk is nawk there,
and old awk is oawk.

-- 
Robert Claeson

Disclaimer: I represent myself and not my employer.

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (02/27/91)

In article <NE-4OD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org (Josef Wolf) writes:
: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
: 
: ] In article <u4+nMD@ppcger.ppc.sub.org> Sepp@ppcger.ppc.sub.org
: ] (Josef Wolf) writes:
: ] : Well. Fine fine. But what about using standard-tools? Which *IX is
: ] : _delivered_ with Perl?
: 
: ] Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
: ] everyone will include it, I suspect.
: 
: C'mon! Get your feet back on earth. I'm sure there are _some_ people out
: in net.lands who don't want to buy a new OS just to get the fun of Perl.

Who ever said they had to buy a new OS to get the fun of Perl?  I fear one
of your inclusive ORs turned into an exclusive OR while you weren't looking.

: ] : Now here is my version of NEWJUNK. It could have been better, but older
: ] : versions of gawk have this ugly memory leak, so you have to sort out
: ] : the 'Newsgroups:' lines and pipe them into gawk :-(
: 
: ] Well. Fine fine. But what about using standard-tools? Which *IX is
: ] _delivered_ with gawk?  :-)
: 
: Why don't you use awk ? ;-)

I do.  When appropriate.

: ] : The awk-version will most likely be slower than the C-version and the
: ] : Perl-Version, but it should run on most *IX with little modifications.
:                                                      ^^^^^^
: 
: ] Right.
: ] On my Vax:
: ] $ awk -f newjunk.awk
: [ some error messages deleted ]
: 
: Well, I said 'with _little_ modifications'. Not 'without modifications'...

You used a bunch of features that weren't even available in "standard" awk,
such as user-defined functions, match() and getline with redirections.
I don't think that counts as "little modification".

: ] On my Sun:
: 
: ] $ nawk -f newjunk.awk
: ] nawk: empty regular expression
: ]  source line number 51
: ]  context is
: ]          >>> // <<<  {
: 
: 
: What about this one:
: /.*/    {
: 
: ] Your "standard" tools ain't so standard.   :-(
: 
: But much more standard than Perl.

In a year, this will be a topic for historians.  nawk has priced itself
out of the market, and gawk is already diverging from nawk.

Larry

henry@zoo.toronto.edu (Henry Spencer) (02/27/91)

In article <11594@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In a year, this will be a topic for historians.  nawk has priced itself
>out of the market, and gawk is already diverging from nawk.

Nawk is bundled with SVR4, so it is about to become much more prevalent.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

tmanos@wyvern.uucp (Tom Manos) (02/27/91)

ronald@robobar.co.uk (Ronald S H Khoo) writes:
>tchrist@convex.COM (Tom Christiansen) writes:
>> What systems are _delivered_ with nawk?
>Only the most ubiquitous of unices:  SCO Xenix, of course.

Also Microport SysVr3.2.2

Tom
-- 
Tom Manos @ wyvern     Norfolk, VA

tchrist@convex.COM (Tom Christiansen) (02/27/91)

From the keyboard of Sepp@ppcger.ppc.sub.org (Josef Wolf):
:lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
:] (Josef Wolf) writes:
:] : Well. Fine fine. But what about using standard-tools? Which *IX is
:] : _delivered_ with Perl?
:
:] Convex's OS.  BSD 4.4 will have it.  It's included with GNU.  Eventually
:] everyone will include it, I suspect.
:
:C'mon! Get your feet back on earth. I'm sure there are _some_ people out
:in net.lands who don't want to buy a new OS just to get the fun of Perl.

Nor do they need to: it's free.

:] : Now here is my version of NEWJUNK. It could have been better, but older
:] : versions of gawk have this ugly memory leak, so you have to sort out
:] : the 'Newsgroups:' lines and pipe them into gawk :-(
:
:] Well. Fine fine. But what about using standard-tools? Which *IX is
:] _delivered_ with gawk?  :-)
:
:Why don't you use awk ? ;-)

Because it doesn't work for your example.


:] : The awk-version will most likely be slower than the C-version and the
:] : Perl-Version, but it should run on most *IX with little modifications.
:                                                     ^^^^^^
:
:] Right.
:] On my Vax:
:] $ awk -f newjunk.awk
:[ some error messages deleted ]
:
:Well, I said 'with _little_ modifications'. Not 'without modifications'...
:
:] On my Sun:
:
:] $ nawk -f newjunk.awk
:] nawk: empty regular expression
:]  source line number 51
:]  context is
:]          >>> // <<<  {
:
:
:What about this one:
:/.*/    {
:
:] Your "standard" tools ain't so standard.   :-(
:
:But much more standard than Perl.

I can't see how you can possibly say this when you've just clearly
demonstrated how truly incompatible supposedly standard utilities are.
In fact, you didn't find one single delivered awk that ran your code.
You used gawk.

Fine, you say, you'll just standardize on gawk, so you run out and get
it, compile it and install it on all the machines and continue on your
way coding your script.  Until you find the next missing or otherwise
incompatible utility.  If you're lucky, there's some free, fully
functional, non-buggy, and well-documented code out there that you can
go and get once again and install everywhere.  You do this again and
again and again.

Or you can just do it once.  There is but one Perl, while the awks of the
world are legion.  And since there are myriads of these individual tools
across scores of platforms, you've got scores of myriads of legions of
these darn things to muck with.  No thank you.

--tom
--
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things." -- Doug Gwyn

 Tom Christiansen                tchrist@convex.com      convex!tchrist