[net.news] subscribers script

avolio@decuac.UUCP (Frederick M. Avolio) (09/13/85)

I made a few minor changes.  Two to get it to work. One to lessen its
abusiveness to the CPU (while maintaining a high level of abuse...).

NEW vs OLD

16c16
< set `date`
---
> `date`
23d22
< export selectgroups
174a174
> 
185a186
> find $base_dir -mtime -14 -name .newsrc -print >$filefile
187d187
< find /u1/*/.newsrc /u2/*/.newsrc -mtime -14 -print >$filefile


Now, you can make the find more general by awk-ing and sed-ing the
password file for a list of login directories.  But come on!  In any
event, no need for "find" to go down each tree so far when you are
really only looking for something at the login directory level.
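
One way to keep a general find while stopping at the login directory
level is sketched below. It assumes a find that supports -mindepth and
-maxdepth (GNU and BSD extensions that postdate this thread), and the
function name find_login_newsrc is hypothetical. The base directory
argument and the 14-day window follow the diff above.

```shell
#!/bin/sh
# find_login_newsrc BASE_DIR  (hypothetical helper name)
# Print .newsrc files modified in the last 14 days that sit exactly one
# level below BASE_DIR, i.e. in the login directories themselves.
# ASSUMPTION: -mindepth/-maxdepth are available (GNU or BSD find).
find_login_newsrc() {
    base_dir=$1
    find "$base_dir" -mindepth 2 -maxdepth 2 -name .newsrc -mtime -14 -print
}
```

Called as `find_login_newsrc /u1 >$filefile`, this never descends below
the login directories, so a .newsrc buried deeper in a user's tree is
ignored.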

Fred (not Blonder) Avolio

reid@Glacier.ARPA (Brian Reid) (09/15/85)

Oh come on, people! Here is how to find the .newsrc files without searching
the entire file system:

awk -F: '{printf "if test -f %s/.newsrc; then echo %s/.newsrc; fi\n",$6,$6}' </etc/passwd | sh | sort -u > $filefile

Before you go saying that it ought to be done using ls instead of all those
test -f commands, let me remind you that there is a limit to the size of
argv on an exec, and glacier's /etc/passwd has enough entries to tickle it.
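
The same idea can be sketched without generating a script for a second
shell, by reading the password file in a loop. Nothing here passes a
long argument list to exec, so the argv limit does not come into play.
The function name list_newsrc and its optional file argument are
hypothetical; field 6 as the home directory matches the awk command
above.

```shell
#!/bin/sh
# list_newsrc [PASSWD_FILE]  (hypothetical helper name)
# Print HOME/.newsrc for every passwd entry whose home directory has one,
# with duplicates (shared home directories) removed.
list_newsrc() {
    passwd_file=${1:-/etc/passwd}
    # Fields are colon-separated; the sixth is the home directory.
    while IFS=: read -r _name _pw _uid _gid _gecos home _shell; do
        [ -n "$home" ] && [ -f "$home/.newsrc" ] && echo "$home/.newsrc"
    done <"$passwd_file" | sort -u
}
```

Usage would be `list_newsrc >$filefile`. Like the awk version, this only
looks in home directories.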
-- 
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA

david@ukma.UUCP (David Herron, NPR Lover) (09/16/85)

In article <11824@Glacier.ARPA> reid@Glacier.UUCP (Brian Reid) writes:
>Oh come on, people! Here is how to find the .newsrc files without searching
>the entire file system:
>
>awk -F: '{printf "if test -f %s/.newsrc; then echo %s/.newsrc; fi\n",$6,$6}' </etc/passwd | sh | sort -u > $filefile

Err, uuhh, sorry.  I did it the way I did on purpose.  The original subscribers
program I had did it pretty much the way you did it.  However, at our site
we have some shared accounts.  The people sharing the accounts have their
own "home directory" and are telling rn that DOTDIR is elsewhere than HOME.
So if I just went around looking for $HOME/.newsrc as above I'd miss some.

BTW, if you didn't notice in the script, I set $BASEDIR (whatever..) to
/usr/user, which is actually the base of our users' directory tree.  So
I'm *NOT* searching the entire filesystem.  Not even most of it.  (We have
sources for too many versions of Unix on line for that!)
-- 
--- David Herron
--- ARPA-> ukma!david@ANL-MCS.ARPA
--- UUCP-> {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!david
---        {ihnp4,decvax,ucbvax}!cbosgd!ukma!david

Hackin's in me blood.  My mother was known as Miss Hacker before she married!

heiby@cuae2.UUCP (Heiby) (09/16/85)

In article <11824@Glacier.ARPA> reid@Glacier.UUCP (Brian Reid) writes:
>Oh come on, people! Here is how to find the .newsrc files without searching
>the entire file system:
>
>awk -F: '{printf "if test -f %s/. ... etc.

This message (and I believe at least one of the preceding ones) ignores
the fact that the NEWSRC environment variable can be used to move the
.newsrc around.  It doesn't have to reside in the user's home directory.
Of course, if the user changes the file name as well as the directory then
there's not much hope of finding it.
-- 
Ron Heiby {NAC|ihnp4}!cuae2!heiby   Moderator: mod.newprod & mod.unix
AT&T-IS, /app/eng, Lisle, IL	(312) 810-6109
"No; my legs are written in a functional programming language." (J. McKie)

bill@persci.UUCP (09/17/85)

All this discussion about a .newsrc-reading script presupposes that every user
maintains his .newsrc file. That is not true at this site. Some people are
subscribed to *everything* and never read news, others drop in on newsgroups
as they will, never subscribing/unsubscribing. 

When we set up accelerated expirations for unread newsgroups (net.flame, etc)
recently, we polled everybody (and read their .newsrc files). There was little
significant correlation between the two sets of results.

-- 
William Swan  {ihnp4,decvax,allegra,...}!uw-beaver!tikal!persci!bill

david@ukma.UUCP (David Herron, NPR Lover) (09/19/85)

In article <405@persci.UUCP>, bill@persci.UUCP writes:
> All this discussion about a .newsrc-reading script presupposes that every user
> maintains his .newsrc file. That is not true at this site. Some people are
> subscribed to *everything* and never read news, others drop in on newsgroups
> as they will, never subscribing/unsubscribing. 

This is a *known* problem, and I did put in a very simple heuristic to
get rid of some of this.  If the last read article number is LESS THAN
the LEAST article number in the group then this datum is tossed out.
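
A minimal sketch of that test, not the code from the distributed
script. It assumes an active file with lines of the form "group highest
lowest flag" and .newsrc lines of the form "group: 1-250,260", with "!"
in place of ":" for unsubscribed groups. The function name
live_subscriptions is hypothetical.

```shell
#!/bin/sh
# live_subscriptions ACTIVE NEWSRC  (hypothetical helper name)
# Print each subscribed group in NEWSRC whose last read article number is
# not less than the least article number recorded for it in ACTIVE.
live_subscriptions() {
    awk '
    # First file (active). ASSUMED layout: group highest lowest flag
    NR == FNR { least[$1] = $3 + 0; next }
    # Second file (.newsrc). ASSUMED layout: "group: 1-250,260".
    # Lines using "!" (unsubscribed) contain no colon and are skipped.
    /:/ {
        group = $1
        sub(/:.*$/, "", group)
        ranges = $0
        sub(/^[^:]*:[ \t]*/, "", ranges)
        # The last read article is the largest number in the range list.
        n = split(ranges, part, /[-,]/)
        last = 0
        for (i = 1; i <= n; i++)
            if (part[i] + 0 > last) last = part[i] + 0
        # Toss the datum when last read < least article in the group.
        if (group in least && last >= least[group]) print group
    }
    ' "$1" "$2"
}
```

A group that is listed in .newsrc but missing from the active file is
also dropped in this sketch, which is a choice made here rather than
something stated above.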

The only better method for getting around this is to keep some running
statistics between runs of the program.  You'd be looking for things
like the amount of traffic in the group versus any reading done in
the group.  F'r instance, if somebody didn't read a group for a week
then he's not really reading it.

But that seemed like too much to bite off for a project that was supposed
to fill a weekend...



While I have your attention ... I found another problem today .. The sed
script depends on a particular variable being exported to the environment.
But my script (as distributed) doesn't do this.  (It exports an entirely
different variable....)  To solve this I simply exported everything!
Also, I added a PATH definition so I could run the program from cron
as root (and thereby see a lot of .newsrc files I miss if running as news).



Anybody feel like volunteering for making it keep running statistics?
-- 
--- David Herron
--- ARPA-> ukma!david@ANL-MCS.ARPA
--- UUCP-> {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!david
---        {ihnp4,decvax,ucbvax}!cbosgd!ukma!david

Hackin's in me blood.  My mother was known as Miss Hacker before she married!

sewilco@mecc.UUCP (Scot E. Wilcoxon) (09/23/85)

In article <405@persci.UUCP>, bill@persci.UUCP writes:
> All this discussion about a .newsrc-reading script presupposes that every user
> maintains his .newsrc file. That is not true at this site. Some people are
> subscribed to *everything* and never read news, others drop in on newsgroups
> as they will, never subscribing/unsubscribing. 

I've been designing an automatic news feed blocker (more later) which has
to know which groups are NOT being read.  In my ignorance, I assumed that
.newsrc might not be valid so I looked for another way to globally detect
a group which is not being read.

The "time last accessed" (st_atime in stat) lets you detect an article which
has been read since the "time last modified" (st_mtime).  Normally, an article
is only written when it is created so st_mtime is time of creation.  An article
which has never been read will have st_atime == st_mtime.
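
The comparison can be sketched in shell as below. This assumes GNU
coreutils stat with -c '%X %Y %n' (access time, modification time, and
name; BSD stat uses different options), and the function name
unread_articles is hypothetical. It also assumes the file system records
access times at all, which a noatime mount would not.

```shell
#!/bin/sh
# unread_articles SPOOL_DIR  (hypothetical helper name)
# Print every regular file under SPOOL_DIR whose access time equals its
# modification time, i.e. an article that appears never to have been read.
# ASSUMPTION: GNU stat; %X is atime and %Y is mtime, both in seconds.
unread_articles() {
    find "$1" -type f -exec stat -c '%X %Y %n' {} + |
    while read -r atime mtime name; do
        [ "$atime" -eq "$mtime" ] && echo "$name"
    done
    return 0
}
```

A group directory for which this prints every article it holds would
count as unread under the definition that follows.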

An unread newsgroup is one in which no articles have been read.  However,
news reading programs which use the "active" file might not bother scanning
a newsgroup which has had no articles posted.  So the stat information
is only valid when a group has articles stored, thus a program to scan
the groups probably should be run just before "expire" is run.

A variation on the theme is checking if the group's directory has been
accessed since it was modified.  This can detect a news reader scanning
a directory for expected articles, even if the directory is empty.

On a site which is passing on batched news, the news batcher does "read"
the articles which it passes on.  That is a desirable side effect for
my application.  Possibly not for yours...

Note that detecting a group which is not being read can be a slightly
different problem than detecting one which is being read.  It's a
matter of whether it is safer to detect too few or too many groups
of the desired status.
-- 

Scot E. Wilcoxon	Minn. Ed. Comp. Corp.      circadia!mecc!sewilco
45 03 N / 93 15 W	(612)481-3507 {ihnp4,uwvax}!dicomed!mecc!sewilco