woods@hao.UUCP (Greg Woods) (03/21/86)
I just want to correct a couple of misconceptions (which are probably due to my own poor writing anyway). I have been cast as the "bad guy" in this, because I have "opposed", in some sense, someone who is doing his best to gain information that everyone wants. Contrary to popular interpretation, I fully support this effort.

What disturbs me is twofold: personal public criticism for daring to oppose this net 'hero', and that people are so blind to the inaccuracies (very well explained by Lauren, so I have no need to repeat them) that they are ready to start using these results to determine what groups we keep and which we don't. THAT is all I object to: certain potential USES of the results, not the survey itself.

I think the data are quite enlightening; that so many want mod.movies, for example, suggests that maybe, just maybe, there is more support for moderated groups than the public discussions on the subject would tend to show. But this is a very GENERAL conclusion. What scares ME is the possibility of axing certain groups based solely on these results, like 'hey, look, net.blah has the highest cost per reader, let's get rid of it'.

Brian has partially counteracted this by admitting that the margin for error is large; but he also claims that his survey is exempt from the 'self-select sample' effects brought up by Lauren. I do not agree with that assessment. There are other self-select factors that everyone is ignoring. My favorite example :-) is the use of the Bourne shell. Many older versions of UNIX, and smaller systems, do not support this. How does that enter into the self-selection process?

Well, this is already longer than I intended, and I promise it will be my last posting on the subject. I encourage Brian to continue the survey; I still wish to hell he had posted a C program instead of a shell script so more could participate, and I caution everyone not to make too many far-reaching decisions based on the results.
--Greg -- {ucbvax!hplabs | decvax!noao | mcvax!seismo | ihnp4!seismo} !hao!woods CSNET: woods@ncar.csnet ARPA: woods%ncar@CSNET-RELAY.ARPA "If the game is lost, we're all the same; no one left to place or take the blame; Will we leave this place an empty stone, or a shining ball of earth, we can call our home"
chuq@sun.uucp (Chuq Von Rospach) (03/22/86)
> ... and that people are so blind to the inaccuracies (very well
> explained by Lauren, so I have no need to repeat them) that they
> are ready to start using these results to determine what groups we
> keep and which we don't.

Lauren's article (as rebutted by Brian) was a LOT more inaccurate than the statistics he attempted to discredit. Yes, I'm MORE than ready to use the results of the statistics to try to streamline the net so that it will benefit the majority of the readers. This, of course, has to be distressing to people in the groups with exceptionally high volume and very few readers, since what Brian has really done is blow away the USENET attitudes regarding volume and utility -- there is now REAL evidence that volume and readership are completely unconnected, and we can track down (and potentially eliminate) the ego-based write-mostly groups.

>I think
> the data are quite enlightening; that so many want mod.movies,
> for example, suggests that maybe, just maybe, there is more
> support for moderated groups than the public discussions on the
> subject would tend to show. But this is a very GENERAL conclusion.

Actually, I think this conclusion is incorrect. In many cases the mod groups have so little volume that people haven't gotten around to unsubscribing from them yet.

> What scares ME is the possibility of axing certain groups based
> solely on these results, like 'hey, look, net.blah has the highest
> cost per reader, let's get rid of it'.

This may seem silly, but I think that it is logical for streamlining of the net to be done by getting rid of the high volume/low readership groups -- the most effect for the least netwide trauma (except to the people who like to hear themselves type).

> but he also
> claims that his survey is exempt from the 'self-select sample' effects
> brought up by Lauren. I do not agree with that assessment. There are
> other self-select factors that everyone is ignoring.

I didn't realize you were trained in statistics.
How would you recommend improving the data then? No offense intended, but I prefer to listen to the people trained in the discipline...

Now, before people accuse me of being too hard on Greg, let me make a few points. I'm not bitching directly at Greg on this, but at some attitudes that happen to be in his posting that seem to be generic on the net:

First, Brian's stats are showing some real fallacies in the way things are done on Usenet. One is the assumption that volume == utility, which is being shown to be definitely not true. In many groups, a few very vociferous users can completely overwhelm the rest of the readership.

Second, there is the implied 'it isn't good for me, so we can't do it'. Eventually we're going to have to make decisions about what the net is really here for, as volume and costs continue to rise. The LOGICAL thing is to streamline that which affects the fewest users, which is difficult to do currently because we've never before known who is reading things -- only who is writing. We can now change that.

Third, there is a consistent problem on the net because people say things like "I disagree with this and so it is wrong". Well, Brian knows a LOT about statistics. He has access to some of the best statisticians in the world at Stanford, and he's put a LOT of work into convincing himself that these stats are valid. Unless you know stats as well as he does and know what he has really done, I can't think of a way in the world that you could convince me that he is wrong, especially when (as is typical on the net) you have NO facts to back your assertions.

--

The ONLY problem I have with Brian's stats is the amount of work it takes (on a net-wide basis) to implement. I don't think they are practical to use on a regular basis as a way of making decisions on the net. They are definitely useful for occasionally figuring out what is going on out there, though, and I'd love to see them run every six months or so.
I do think we need a new measure of group utility. Previously that measure has been total volume. I suggest we consider using total volume divided by the number of DIFFERENT posters over a given time. This could be implemented easily as part of the newslist data at seismo, and will give us a good ratio of total interest, assuming you believe that 1 megabyte posted by 20 people is more useful than 1 megabyte posted by three people. There are some groups where this breaks down (especially *.sources* and net.jokes, I would guess) but I would expect that the total number of posters would be a good measure of the total number of readers. Comments?

chuq
--
:From catacombs of a past participle: Chuq Von Rospach chuqi%plaid@sun.ARPA FidoNet: 125/84 CompuServe: 73317,635 {decwrl,decvax,hplabs,ihnp4,pyramid,seismo,ucbvax}!sun!plaid!chuq I used to really worry about splitting my infinitives until I realized that most people had never heard of them.
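The measure Chuq proposes above (total volume divided by the number of different posters) can be sketched in a few lines. This is only an illustration: the record layout `(group, poster, nbytes)`, the group names, and the poster addresses are all made up for the example, and nothing here reflects how the newslist data at seismo is actually stored.

```python
from collections import defaultdict


def volume_per_poster(records):
    """Return {group: total bytes / number of distinct posters}.

    `records` is an iterable of (group, poster, nbytes) tuples.
    This layout is an assumption made for the sketch.
    """
    total_bytes = defaultdict(int)
    posters = defaultdict(set)
    for group, poster, nbytes in records:
        total_bytes[group] += nbytes
        posters[group].add(poster)
    return {g: total_bytes[g] / len(posters[g]) for g in total_bytes}


# Hypothetical data mirroring the example in the post:
# 1 megabyte from 20 people versus 1 megabyte from three people.
records = [("net.wide", f"user{i}@site", 50_000) for i in range(20)]
records += [
    ("net.narrow", "a@site", 400_000),
    ("net.narrow", "b@site", 300_000),
    ("net.narrow", "c@site", 300_000),
]

ratios = volume_per_poster(records)
for group, ratio in sorted(ratios.items()):
    print(f"{group}: {ratio:.0f} bytes per distinct poster")
```

Under this reading a lower figure means the same volume is spread over more contributors. As the post itself notes, the measure would probably mislead for groups such as the sources groups, where a few people legitimately post large articles.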
reid@glacier.ARPA (Brian Reid) (03/22/86)
I'm curious about Greg's bete noire, the missing /bin/sh. I would very much like to hear from any other site that does not have a Bourne Shell. In my experience with Unix systems, the Bourne Shell is the one constant--other shells may come and go, but /bin/sh is always there. That's why I used it. Also, it's a lot more work to write a C program than to cobble together a shell script. I don't have much experience at writing C programs, and I wanted to get the arbitron program working quickly. I don't like programming in C very much, but shell scripts are fun in a perverse kind of way. Greg, just for you I will make a csh version of Arbitron as soon as I get my grades handed in Monday morning. I'm grading finals at the moment. Brian -- Brian Reid decwrl!glacier!reid Stanford reid@SU-Glacier.ARPA
gds@mit-eddie.MIT.EDU (Greg Skinner) (03/23/86)
I would like to caution that before any global decisions are made regarding which groups to keep, etc., we wait for a lot more of the net to report in. For example, there have only been a few entries from AT&T -- and none from any of the major sites (ihnp4, cbosgd, etc.) and the sites they feed. I believe that data will be critical in determining net readership -- AT&T is the largest multiorganization in Usenet and most probably has the most readers of any multiorganization in Usenet. I hope we get the AT&T data soon.

I think even more statistics could be taken. For example, we could figure out how many articles are cross-posted per newsgroup, how many articles are posted per site to a group, and probably others as well. Is the newsstats data at seismo sufficient to extract that information? If not, I might write a program to take that data, if I can find the time.

This may be a premature guess, but I think one of the outcomes of this poll will be to encourage local unmoderated distributions, especially if the data bears out that the readership and writership of certain groups is localized to certain geographic or organizational areas.

--
It's like a jungle sometimes, it makes me wonder how I keep from goin' under. Greg Skinner (gregbo) {decvax!genrad, allegra, gatech, ihnp4}!mit-eddie!gds gds@eddie.mit.edu
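One of the statistics suggested above, the number of cross-posted articles per newsgroup, could be computed roughly as follows. This is a sketch under stated assumptions: it takes as input a list of Newsgroups header values that have already been pulled out of the articles (the header is a comma-separated list of group names), and the sample headers are invented. Whether the data kept at seismo would supply such a list is exactly the open question in the post.

```python
from collections import Counter


def crosspost_counts(newsgroups_headers):
    """Count, per group, the articles posted to it AND to at least one other group.

    `newsgroups_headers` is an iterable of Newsgroups header values,
    e.g. "net.news,net.news.group". How these values are collected
    is left open; that part is an assumption of this sketch.
    """
    counts = Counter()
    for header in newsgroups_headers:
        groups = {g.strip() for g in header.split(",") if g.strip()}
        if len(groups) > 1:  # only cross-posted articles are counted
            counts.update(groups)
    return counts


# Hypothetical sample: two cross-posted articles and one single-group article.
headers = [
    "net.news,net.news.group",
    "net.movies",
    "net.news, net.unix",
]
xposts = crosspost_counts(headers)
for group, n in sorted(xposts.items()):
    print(group, n)
```

A per-site count of articles per group would follow the same pattern, keyed on the originating site instead of on the header contents.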
msc@saber.UUCP (Mark Callow) (03/24/86)
> I'm curious about Greg's bete noire, the missing /bin/sh.

So am I. Here's the result of an "ls -C *.sh" on my 2.10.2 news source directory.

/usr/src/usr.bin/news/src
c2sendbatch.sh  csendbatch.sh  install.sh   makeactive.sh  sendbatch.sh
checkgroups.sh  cunbatch.sh    localize.sh  rmgroup.sh

I know most of these scripts are used for installation and one can run news without batching. Still if anyone without a Bourne shell installed news, they must have gone to considerable effort.

--
From the TARDIS of Mark Callow msc@saber.uucp, sun!saber!msc@decwrl.dec.com ...{ihnp4,sun}!saber!msc "Boards are long and hard and made of wood"
rees@apollo.uucp (Jim Rees) (03/24/86)
On the topic of self-selection, I would agree with Greg's assertion that some sites won't run arbitron. Usenet "site" apollo is actually about 1500 machines with 2000 users, about 200 of them news users. The arbitron script won't work here because of the diversity of machines, protections on home directories, and people who shut down their node at night. At one time I was enough of a shell hacker to make it work, but I just don't have the time or inclination for that stuff any more.
cda@ucbopal.berkeley.edu (Charlotte Allen) (03/25/86)
In article <3389@sun.uucp> chuq@sun.uucp (Chuq Von Rospach) writes:
> but I think that it is logical for streamlining
>of the net to be done by getting rid of the high volume/low readership
>groups -- the most effect for the least netwide trauma (except to the people
>who like to hear themselves type).

Why don't we get rid of the high volume/low readership posters (guess who comes to mind....)
msc@saber.UUCP (Mark Callow) (03/27/86)
> net to report in. For example, there have only been a few entries
> from AT&T -- and none from any of the major sites (ihnp4, cbosgd,

This is the second posting implying that the data from the "major" sites is important to this readership survey and must be gathered before any decisions are made. Why?

The volume of traffic that passes through a site is totally irrelevant to a survey of newsgroup readership. The important criterion is the number of users, particularly news readers, on the machine. Some of the most "major" sites are simply mail and news store-and-forward machines (e.g. decvax and ucbvax). As such they probably don't have any users, let alone users who read news on them.

--
From the TARDIS of Mark Callow msc@saber.uucp, sun!saber!msc@decwrl.dec.com ...{ihnp4,sun}!saber!msc "Boards are long and hard and made of wood"
chapman@miro.berkeley.edu (Brent Chapman) (03/28/86)
In article <1960@saber.UUCP> msc@saber.UUCP (Mark Callow) writes:
>> net to report in. For example, there have only been a few entries
>> from AT&T -- and none from any of the major sites (ihnp4, cbosgd,
>
>This is the second posting implying that the data from the "major" sites
>is important to this readership survey and must be gathered before any
>decisions are made. Why?
>
>The volume of traffic that passes through a site is totally irrelevant
>to a survey of newsgroup readership. The important criterion is the number
>of users particularly news readers on the machine. Some of the most "major"
>sites are simply mail and news store and forward machines. (e.g. decvax and
>ucbvax) As such they probably don't have any users let alone users who
>read news on them.

I can say from personal experience that this is true. Here at Berkeley, the load on ucbvax seldom drops below about 4, even when no one is logged in. Very few people are willing to put up with the loads on ucbvax just to read news. We have an alternative, which I am not sure is handled by the survey program.

(Please note: the scheme I'm about to describe may in fact be very common at large, multi-machine sites, but I don't have any experience with sites other than Berkeley, so I may be pointing out something trivial. If that is the case, I apologize for wasting your time.)

Here, there are a few machines (ucbvax, ucbcad, and ucbjade, I believe) that actually have news on them. These machines presumably have the standard news programs available on them (I don't know; I don't have an account on any of these machines). They also have a "news server". The news server is much like the other Internet servers, such as mail servers and ftp servers. When another machine (such as ucbmiro, the machine I'm using now) wants news access, it opens a socket to a news server on ucbvax, ucbjade, or ucbcad, and deals with articles through that interface (I just LOVE EtherNets!).
We have a program called 'rrn' which is apparently 'rn' re-built to deal with the server, instead of directly with the file system. I'm not certain; I've never seen 'rn'. In any case, my question is whether or not the people on these 'non-news' machines are included in the survey. If they are not, then you are excluding most of the news readers at Berkeley. I'm not trying to run down the survey or the surveyor; I think it is a good idea, and that a lot of thought has gone into it to make the survey as accurate as possible. Brent Chapman ucbvax!miro!chapman chapman@miro.berkeley.edu
fair@ucbarpa.berkeley.edu (Erik E. Fair) (03/29/86)
In point of fact, ucbvax has quite a few netnews readers, in spite of the wildly variable load of the machine. I ran arbitron on ucbvax, and sent off the results to netsurvey@su-glacier.arpa. (257 users, 99 net readers). However, since we run a distributed netnews system here at UCB (as noted by Mr. Chapman), I also followed up the arbitron results with a letter to Brian Reid explaining our system, and including a copy of the weekly report that indicates which groups were accessed with what frequency. While we don't have it broken out by user (or even by machine; just a raw count of how many times each group was requested for examination by all the clients of the server that week), I think it is a good measure of what the UCB community is reading. Plug: the software in question implements RFC977 (Network News Transfer Protocol, [NNTP]), written by Phil Lapsley <ucbvax!phil>, and Brian Kantor <sdcsvax!brian>, with some kibitzing from me. It is presently available for public FTP from ucbvax (10.2.0.78, pub/nntp.tar), soon to be posted to mod.sources. keeper of the network news for ucbvax, Erik E. Fair ucbvax!fair fair@ucbarpa.berkeley.edu
chuq@sun.uucp (Chuq Von Rospach) (03/29/86)
> In article <3389@sun.uucp> chuq@sun.uucp (Chuq Von Rospach) writes:
> > but I think that it is logical for streamlining
> >of the net to be done by getting rid of the high volume/low readership
> >groups -- the most effect for the least netwide trauma (except to the people
> >who like to hear themselves type).
>
> Why don't we get rid of the high volume/low readership posters (guess who
> comes to mind....)

Well, I'd guess offhand just about anyone in a non-technical group on the seismo top 25. Contrary to the snide insinuation, that ain't me, since I've been on the top 25 once in the last six months.

On a practical level, getting rid of individual users is an administrative impossibility for the net -- the only control at the user level is in the hands of the SA. Getting rid of bloated newsgroups IS under net control, and worth looking at.

Personally, if it was possible, I'd like to see us get rid of articles with no factual content, useless repetitions, childish accusations and other associated garbage. My biggest worry on this, though, is that there would be no net left when we were done.

me
--
:From the lofty realms of Castle Plaid: Chuq Von Rospach chuq%plaid@sun.COM FidoNet: 125/84 CompuServe: 73317,635 {decwrl,decvax,hplabs,ihnp4,pyramid,seismo,ucbvax}!sun!plaid!chuq The first rule of magic is simple. Don't waste your time waving your hands and hoping when a rock or a club will do -- McCloctnik the Lucid
tim@ism780c.UUCP (Tim Smith) (04/01/86)
In article <3417@sun.uucp> chuq@sun.uucp (Chuq Von Rospach) writes:
>
>Personally, if it was possible, I'd like to see us get rid of articles with
>no factual content, useless repetitions, childish accusations and other
>associated garbage. My biggest worry on this, though, is that there would be
>no net left when we were done.

What I would like to see is for everyone to be required to use Compuserve for six months before being allowed to post to USENET. One doesn't post content-free flames when one is paying 12 bucks an hour for connect time! Perhaps the habit of posting short, to the point, articles would carry over to USENET.

Of course there is no way to actually implement this...

--
Tim Smith sdcrdcf!ism780c!tim || ima!ism780!tim || ihnp4!cithep!tim
mwm@ucbopal.berkeley.edu (Mike (I'll be mellow when I'm dead) Meyer) (04/01/86)
In article <3417@sun.uucp> chuq@sun.uucp (Chuq Von Rospach) writes:
>Well, I'd guess offhand just about anyone in a non-technical
>group on the seismo top 25. Contrary to the snide insinuation, that ain't
>me, since I've been on the top 25 once in the last six months.
>
>On a practical level, getting rid of individual users is an administrative
>impossibility for the net -- the only control at the user level is in the
>hands of the SA. Getting rid of bloated newsgroups IS under net control, and
>worth looking at.

Ok, I can't resist. We went over this problem at lunch today (random chance, that), and came up with the following two-step solution:

1) a hack to inews so that it refuses to accept articles posted by anyone on a list of user@site type names.

2) An awk script (or something similar) that takes the top 25 list, and turns it into a list for step one. Criteria should include newsgroups posted to and # of s.d. away from average.

If the backbone started running this code (or something like it), we would have instant, objective deletion of high-volume users on a netwide level, but only for a couple of weeks. And maybe, just maybe, the thought of being censored that way would make people think before posting.

<mike
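The "# of s.d. away from average" criterion in step 2 above can be illustrated with a short sketch. The post suggests an awk script; Python is used here only for illustration. The input mapping, the poster names, and the threshold of 2.0 standard deviations are all hypothetical, the real top 25 list format is not assumed, and the other criterion mentioned (which newsgroups were posted to) is left out.

```python
from statistics import mean, pstdev


def flag_heavy_posters(volume_by_poster, threshold=2.0):
    """Return posters whose volume is more than `threshold` standard
    deviations above the mean volume.

    `volume_by_poster` maps "user@site" to a volume figure; both the
    mapping and the default threshold are assumptions of this sketch.
    """
    values = list(volume_by_poster.values())
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # everyone posts the same amount, nobody stands out
    return sorted(
        poster
        for poster, volume in volume_by_poster.items()
        if (volume - mu) / sigma > threshold
    )


# Hypothetical volumes: nine light posters and one very heavy one.
volumes = {f"user{i}@siteB": 10 for i in range(9)}
volumes["heavy@siteA"] = 1000

flagged = flag_heavy_posters(volumes)
print(flagged)
```

With very few posters in the input, no one can sit far from the mean in standard-deviation terms, so a fixed threshold behaves differently on a list of 25 names than on the whole net.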
chuq@sun.uucp (Chuq Von Rospach) (04/02/86)
> ME:
> >On a practical level, getting rid of individual users is an administrative
> >impossibility for the net -- the only control at the user level is in the
> >hands of the SA. Getting rid of bloated newsgroups IS under net control, and
> >worth looking at.
>
> Ok, I can't resist. We went over this problem at lunch today (random chance,
> that), and came up with the following two-step solution:
>
> 1) a hack to inews so that it refuses to accept articles posted by anyone on
> a list of user@site type names.
>
> 2) An awk script (or something similar) that takes the top 25 list, and
> turns it into a list for step one. Criteria should include
> newsgroups posted to and # of s.d. away from average.
>
> If the backbone started running this code (or something like it), we would
> have instant, objective deletion of high-volume users on a netwide level,
> but only for a couple of weeks. And maybe, just maybe, the thought of being
> censored that way would make people think before posting.

Problems:

o you censor people silently -- if you allow a message to be posted and then make it silently go away downstream, how do they know they were deleted from the net? Don't assume any of these people read net.news or read anything but the group they are posting in. You may get rid of the articles, but you aren't solving the problem -- they don't know they are being censored and don't change their ways.

o what happens when the data from seismo is WRONG? Without a human in the loop, problems will definitely occur. What happens when I start posting forged messages causing people I don't like to get knocked off the net (and being knocked off, can't even complain about it!)

o How do you keep from knocking out the people making positive contributions to netnews? Chris Torek writes a public domain version of 4.2 in his spare time. He posts it, to the wonderment of all. He then gets kicked off the net for excessive volume. Isn't this a NEGATIVE inducement to doing good things?
It ain't as easy as it looks. Coming up with a fair way of cutting back the dead weight sounds good, but it has a lot of practical problems. We just don't have the administrative tools to do it right, I think. -- :From the lofty realms of Castle Plaid: Chuq Von Rospach chuq%plaid@sun.COM FidoNet: 125/84 CompuServe: 73317,635 {decwrl,decvax,hplabs,ihnp4,pyramid,seismo,ucbvax}!sun!plaid!chuq The first rule of magic is simple. Don't waste your time waving your hands and hoping when a rock or a club will do -- McCloctnik the Lucid