[alt.sources] Bandwidth Wasters Hall of Fame - The Code

xanthian@well.UUCP (Kent Paul Dolan) (09/22/89)

Here's a slightly tongue-in-cheek bandwidth-decreasing tool.  Use it in
good health.  Please forgive my beta-release shar program, which adds a
blank line at the end of each file and then complains about it during
unsharing; it doesn't seem to hurt anything.

well!xanthian
Kent, the man from xanth, now just another echo from The Well.

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then feed it
# into a shell via "sh file" or similar.  To overwrite existing files,
# type "sh file -c".
# The tool that generated this appeared in the comp.sources.unix newsgroup;
# send mail to comp-sources-unix@uunet.uu.net if you want that tool.
# If this archive is complete, you will see the following message at the end:
#		"End of shell archive."
# Contents:  bwhf.hype bwhf.csh bwhf1.awk bwhf2.awk bwhf.example_output
# Wrapped by kent as a guest on Thu Sep 21 20:45:03 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'bwhf.hype' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf.hype'\"
else
echo shar: Extracting \"'bwhf.hype'\" \(1011 characters\)
sed "s/^X//" >'bwhf.hype' <<'END_OF_FILE'
X		    BANDWIDTH WASTERS HALL OF FAME
X
X	     You've seen the postings, now read the code!
X
XHave a group of blowhards taken over your favorite newsgroup, with
Xpostings of negligible content and awesome volume?
X
XIs it getting hard to cut through the chaff in your search to find
Xthose grains of meaning?
X
XAre you mad enough to _take measures_?
X
XDo you wish you had a way to get, not just even, but ahead?
X
XWish no more!  Here are the tools you need to publish your _very own_
XBandwidth Wasters Hall of Fame articles, and point the finger of
X_public ridicule_ at the guilty parties.
X
XEnclosed are two awk scripts, and a cshell script to run them.  These
Xare for a BSD 4.3 system (Sun 4.0.3) with a really wimpy
Ximplementation of awk.  You may have to fiddle things a bit to make it
Xgo on your system, but the basics are here.
X
XRead, enjoy, and most of all, use it to _nail the miscreants_!
X
XYours for an improved signal to noise ratio,
X
Xwell!xanthian
XKent, the man from xanth, now just another echo from The Well.
X

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf.hype'\"
if test 1012 -ne `wc -c <'bwhf.hype'`; then
    echo shar: \"'bwhf.hype'\" unpacked with wrong size!
fi
# end of 'bwhf.hype'
fi
if test -f 'bwhf.csh' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf.csh'\"
else
echo shar: Extracting \"'bwhf.csh'\" \(1689 characters\)
sed "s/^X//" >'bwhf.csh' <<'END_OF_FILE'
X#!/bin/csh
X#
X# bwhf.csh by Kent Paul Dolan - Public Domain
X#
X# Bandwidth Waster's Hall of Fame master shell script; runs two awk
X# scripts with a sort step between them.  Set the execute bit on this
X# file with chmod and put it in your path.  It expects the two awk
X# scripts to be in the current directory, and needs access to the
X# "awk" and "sort" and "date" Unix(tm) commands.  I don't know whether
X# this command set would work under "sh"; I didn't try it.
X#
X# usage:  bwhf.csh <path-to-newsgroup-articles> <output-file-name>
X#
X# example: bwhf.csh /usr/spool/news/alt/sources BWHF.alt.sources
X#
X# The first awk script accumulates the statistics for each author in
X# an array, then dumps the array to a temp file for sorting.  The
X# [0-9] are to exclude subordinate directories from being processed as
X# articles.
X#
Xawk -f bwhf1.awk ${1}/[0-9]* > /tmp/$$.bwhf.1
X#
X# The sort step sorts on the bytes wasted column, numerically because it
X# has leading blanks, and reversed because we want to list the worst
X# bandwidth wasters first.
X#
Xsort -nr < /tmp/$$.bwhf.1 > /tmp/$$.bwhf.2
X#
X# The second awk script prints a header, including the path to the
X# newsgroup and the date, prints a line for each byte-burner, then
X# prints a footer with a totals line and an apology for not including
X# "beyond AI" capabilities in the output.
X#
Xawk -f bwhf2.awk newsgrouppath=$1 date="`date`" /tmp/$$.bwhf.2 > $2
X#
X# Clean up the temp files - why wait for a reboot?
X#
Xrm /tmp/$$.bwhf.[1-2]
X#
X# You might want to put this back in, to preview the output before you
X# send it off to your favorite newsgroup; it was giving me fits when I
X# ran this script in background, so I commented it out.
X#
X#more $2
X

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf.csh'\"
if test 1690 -ne `wc -c <'bwhf.csh'`; then
    echo shar: \"'bwhf.csh'\" unpacked with wrong size!
fi
chmod +x 'bwhf.csh'
# end of 'bwhf.csh'
fi
if test -f 'bwhf1.awk' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf1.awk'\"
else
echo shar: Extracting \"'bwhf1.awk'\" \(4966 characters\)
sed "s/^X//" >'bwhf1.awk' <<'END_OF_FILE'
X#
X# bwhf1.awk by Kent Paul Dolan - Public Domain
X#
X# Bandwidth Waster's Hall of Fame first awk script; finds article
X# authors on "From:" line, credits them with the article and the bytes
X# it contains, accumulates byte and article counts into arrays indexed
X# by author (_love_ those associative array indices), counts total
X# bytes, lists bytes, byte share, articles, author's login, and any
X# other author info from the "From:" line.
X#
X# Fails to merge postings from the same author at different sites,
X# because it is not possible to distinguish the case of different
X# people at different sites with the same login, and the same person
X# and login from different sites, by mechanical means.
X#
X# This script is normally run by csh script bwhf.csh, but anyway, here is:
X#
X# usage: awk -f bwhf1.awk <path-to-newsgroup>/[0-9]* > <outfile-to-sort-step>
X#
X# example: awk -f bwhf1.awk /usr/spool/news/alt/sources/[0-9]* temp1
X#
X# where the [0-9]* takes care of the case of a newsgroup with articles
X# which also has one or more subgroups (whose names won't start with
X# [0-9])
X#
X# Setup a couple of variables for file swapping control and multiple
X# "From:" line detection.
X#
XBEGIN		{
X#
X# use this to detect when we have changed files and need to start a
X# new bytecount for a new file and save the old one to the old
X# author's count.
X#
X		  lastfile = FILENAME
X#
X# Use this to avoid problems with multiple "From:" lines in the same
X# article (not really needed, since awk zeros all variables at
X# creation, but the code is a lot easier to comprehend with this in
X# here): 
X#
X		  sawfrom = 0
X		}
X#
X# Although this is the first record processing code physically,
X# logically it is not executed until the top of the second and
X# subsequent articles of the input, therefore the "From:" code below
X# has been executed once before this code.  This pattern/action pair
X# has to be up here to make sure that the bytecount and sawfrom fields
X# are cleared before any other processing on second and subsequent
X# articles.
X#
X# When the article for the current record has changed:
X#
X# Accumulate the byte count for the previous article for its author
X# (saved as "from" in the "From:" pattern/action set); then clear the
X# bytecount.  Reset the lastfile item to the current file name, and
X# clear sawfrom so that we are again looking for a "From:" line.
X#
Xlastfile != FILENAME	{ bytes[from] = bytes[from] + bytecount
X			  bytecount = 0
X			  lastfile = FILENAME
X			  sawfrom = 0
X			}
X#
X# For every record (line) in the file (article), count its bytes (the
X# + 1 takes care of the '\n', which is ignored by "length($0)") into a
X# total byte count for the file.
X#
X{ bytecount = bytecount + length($0) + 1 }
X# 
X# One line in the article gets special processing: the _first_ "From:"
X# line.  If we haven't set sawfrom to 1 in this article, and this line
X# _starts_ with "From:", then it is the one we want to identify the
X# author of the article.  Pull the login@site out of the second field
X# as element "from" (the author ID), use it as an array index of an
X# associative array "articles" to (possibly create with contents zero
X# and) bump the article count for this author.  Most authors' posting
X# software includes a vanity ID after the login@site information. Use
X# the index and substr commands to pull that off and store it too,
X# indexed by author in associative array "fromtags" .  The authors who
X# use more than one vanity ID from the same site get the usage from
X# the last of their articles.  Set sawfrom to 1 (true) to avoid
X# processing a second "From:" line where an article includes some of
X# the header of another article without a protecting lead character.
X# 
X/^From:/ && sawfrom == 0 { 
X		  from = $2
X		  articles[from]++
X		  ind = index($0,$2) + length($2) + 1
X		  fromtags[from] = substr($0,ind)
X		  sawfrom = 1
X		}
X# 
X# After all the articles have been processed, we need to add the
X# bytecount for the last article to the credit of the last wastrel,
X# because we don't see another line to process through the "lastfile =
X# FILENAME" pattern/action pair above, which does that crediting for
X# all other articles but the last one.
X# 
X# Loop through the associative byte count array by author to get a
X# total byte count for all the articles, to use in determining an
X# author's share of the total bandwidth waste.  Use that information
X# in a second loop which prints per-author summary information to
X# calculate the share percentage field.  For each author, print the
X# bytes wasted, the waste share, the articles exuded, and the author
X# ID and author vanity ID.
X#
X# The resulting file is ready for the sort step.
X#
XEND	{ bytes[from] = bytes[from] + bytecount
X
X	  for (from in articles)
X	  {
X	    bytestotal = bytestotal + bytes[from]
X	  }
X	  for (from in articles)
X	  {
X	    
X            printf("%8s %6.2f%% %4s  %s %s\n", \
X	           bytes[from], \
X		   (bytes[from]*100)/bytestotal, \
X		   articles[from], \
X		   from, \
X		   fromtags[from])
X	  }
X	}

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf1.awk'\"
if test 4967 -ne `wc -c <'bwhf1.awk'`; then
    echo shar: \"'bwhf1.awk'\" unpacked with wrong size!
fi
# end of 'bwhf1.awk'
fi
if test -f 'bwhf2.awk' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf2.awk'\"
else
echo shar: Extracting \"'bwhf2.awk'\" \(3187 characters\)
sed "s/^X//" >'bwhf2.awk' <<'END_OF_FILE'
X#
X# bwhf2.awk by Kent Paul Dolan - Public Domain
X#
X# Bandwidth Waster's Hall of Fame second awk script; prints header,
X# prints by-author lines and (re)accumulates byte and article totals,
X# prints a footer showing the totals of bytes, share, and article
X# counts.
X#
X# A "sort" step to sort the by-author lines in reverse bytes-wasted
X# order should be run after the first script and before this one to
X# rank the bandwidth wasters from most to least heinous, although this
X# script is not dependent on the sort order of the input lines.
X# 
X# This awk script is normally run by csh script bwhf.csh, but here is:
X#
X# usage: awk -f bwhf2.awk newsgrouppath=/usr/spool/news/whatever \
X#        date="some-string" <input-from-sort-of-output-of-bwhf1.awk>
X#
X#example: awk -f bwhf2.awk newsgrouppath=/usr/spool/news/alt/sources \
X#	  date="`date`" temp2
X#
X# (the "\" means each of these is all supposed to be on one line)
X#
X# Start the header:
X#
XBEGIN	{ 
X	  printf("%55s\n", "BANDWIDTH WASTERS HALL OF FAME")
X          printf("%48s\n","for articles in")
X	}
X#
X# Finish the header:
X#
X# This has to be done at the first line, because until awk tries to
X# read the first line, it hasn't seen the command line settings for
X# newsgrouppath and date, so putting this in the BEGIN block failed.
X#
XNR == 1	{ pformat = "%" int(40 + ((length(newsgrouppath)  + 1) / 2) ) "s\n"
X	  printf(pformat,newsgrouppath)
X	  pformat = "%" int(40 + ((length(date)  + 1) / 2) ) "s\n"
X	  printf(pformat,date)
X	  print ""
X	  print "   Bytes  Volume  Offending"
X	  print "  Wasted   Share  Articles     Guilty Party"
X	  print ""
X	}
X#
X# Accumulate the total bytes and total articles, and print each
X# wastrel's contribution line:
X# 
X# I was faking the share total to 100 percent, but then I thought a
X# bit more.  Now it is calculated, giving BWHF posters the chance to
X# edit the sort output down to just the worst ten or so offenders, and
X# pass just those records through this second awk script.  My own
X# experience is that people just hate being omitted from the list, but
X# your mileage may vary, so I changed the code to accommodate.
X# 
X		{ bytestotal = bytestotal + $1
X#
X# We have to strip off the trailing "%" from $2 to make a number:
X#
X		  share = substr($2,1,length($2)-1)
X                  sharetotal = sharetotal + share
X		  articlestotal = articlestotal + $3
X		  print
X		}
X#
X# Print the footer, consisting of a Totals line, an apology that this
X# awk script doesn't do AI name matches for posters who use multiple
X# sites, and a none too subtle piece of author puffery and general
X# purpose mischief making.
X#
XEND	{ 
X	  print "-------- ------- ----"
X	  printf("%8s %6.2f%% %4s  Totals for %d authors\n", \
X	         bytestotal,sharetotal,articlestotal,NR)
X	  print ""
X	  print "(Roundoff fuzz may make total share not equal 100.00%)"
X	  print ""
X	  print "(Sorry, if you posted from more than one site, you got more"
X	  print "than one entry.  It's unavoidable; think about it!  But even"
X	  print "though your subtotals look smaller, we know who you are!)"
X	  print ""
X	  print "[A shar file of the scripts used to create this article was"
X	  print "posted to alt.sources by the author, Kent Paul Dolan.]"
X
X	}

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf2.awk'\"
if test 3188 -ne `wc -c <'bwhf2.awk'`; then
    echo shar: \"'bwhf2.awk'\" unpacked with wrong size!
fi
# end of 'bwhf2.awk'
fi
if test -f 'bwhf.example_output' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf.example_output'\"
else
echo shar: Extracting \"'bwhf.example_output'\" \(1159 characters\)
sed "s/^X//" >'bwhf.example_output' <<'END_OF_FILE'
X                         BANDWIDTH WASTERS HALL OF FAME
X                                 for articles in
X                           /usr/spool/news/alt/sources
X                          Thu Sep 21 19:54:16 PDT 1989
X
X   Bytes  Volume  Offending
X  Wasted   Share  Articles     Guilty Party
X
X  730081  44.23%   18  pokey@well.UUCP (Jef Poskanzer)
X  294802  17.86%    6  mark@unix386.Convergent.COM (Mark Nudelman)
X  195626  11.85%    5  lwall@jato.Jpl.Nasa.Gov (Larry Wall)
X  149560   9.06%    1  raivio@procyon.hut.FI (Perttu Raivio)
X
X[30 lines of example output omitted to save bandwidth!]
X
X     496   0.03%    1  larrym@rigel.uucp (24121-E R Inghrim(3786)556)
X     477   0.03%    1  garyc@quasi.tek.com (Gary Combs;685-2072;60-720;;tekecs)
X-------- ------- ----
X 1650582  99.98%   67  Totals for 36 authors
X
X(Roundoff fuzz may make total share not equal 100.00%)
X
X(Sorry, if you posted from more than one site, you got more
Xthan one entry.  It's unavoidable; think about it!  But even
Xthough your subtotals look smaller, we know who you are!)
X
X[A shar file of the scripts used to create this article was
Xposted to alt.sources by the author, Kent Paul Dolan.]

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf.example_output'\"
if test 1160 -ne `wc -c <'bwhf.example_output'`; then
    echo shar: \"'bwhf.example_output'\" unpacked with wrong size!
fi
# end of 'bwhf.example_output'
fi
echo shar: End of shell archive.
exit 0
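The heart of bwhf1.awk is the associative-array accumulation its comments
describe.  Here is a minimal, self-contained sketch of that technique --
the "author bytes" input lines are invented for illustration, not taken
from real news articles:

```shell
# Sketch of the per-author accumulation bwhf1.awk performs, followed by
# the same sort step bwhf.csh runs between the two awk scripts:
printf 'alice 100\nbob 40\nalice 60\n' |
awk '{ bytes[$1] += $2 }               # accumulate bytes, indexed by author
     END { for (a in bytes)           # dump the array for the sort step
             print bytes[a], a }' |
sort -nr                               # worst wasters first
```

The sorted output ranks alice (160 bytes) ahead of bob (40), just as the
real pipeline ranks posters from most to least heinous.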

pmd@cbnews.ATT.COM (Paul Dubuc) (09/28/89)

In article <1618@tellab5.tellabs.CHI.IL.US> toth@tellab5.tellabs.CHI.IL.US (Joseph G. Toth Jr.) writes:
}In article <4254@wpi.wpi.edu>, jdutka@wpi.wpi.edu (John Dutka) writes:
}> I couldn't get the csh script to work on our system for
}> /usr/spool/news/comp/sys/mac - the program said the argument was too long -
}> any suggestions/help from anyone?
}> 
}
}I haven't looked at the source for the script, but:
}
}This comes from the fact that Unix(tm), at some point, creates a command
}with a list of parameters (filenames) to provide a list to the executed
}program.  The list is built on the prefix and metacharacter specifications.
} ...
}It might be possible to modify the script to 'cd <dirname>;cmd *'
}or 'pushd <dirname>;cmd *;popd' (if available) to expand the command
}line as:
}   <cmd> f1 f2 f3 ... fx
}
}and eliminate the long prefix on each file entry in the list to be
}processed.

A more permanent solution to this would be to use xargs(1).
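A hedged sketch of what the xargs approach might look like.  The scratch
directory and the stand-in `grep` command are illustrative (on real data
you would substitute `awk -f bwhf1.awk`), and there is a caveat: if the
file list is long enough that xargs splits it across several awk
invocations, bwhf1.awk's per-author totals and share percentages come out
per batch and would need merging afterward.

```shell
# Build a scratch "newsgroup" directory so the sketch is self-contained:
dir=$(mktemp -d)
printf 'From: alice@example.com\nbody\n' > "$dir/101"
printf 'From: bob@example.com\nbody\n'   > "$dir/102"
mkdir "$dir/subgroup"   # excluded by the [0-9] filter, like the real script

# The failing form: the shell expands the glob into one argv, which can
# exceed the kernel's argument-size limit on a busy group:
#   awk -f bwhf1.awk "$dir"/[0-9]*
# The xargs form pipes the names instead, batching them so each command
# line fits (cd first also shortens every argument, as suggested above):
ls "$dir" | grep '^[0-9]' | (cd "$dir" && xargs grep -c '^From:')

rm -r "$dir"
```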
-- 
Paul Dubuc   |   "To consider persons and events and
att!asr1!pmd |   situations only in the light of their
	     |   effect upon myself is to live on the
	     |   doorstep of hell"	Thomas Merton

suitti@haddock.ima.isc.com (Stephen Uitti) (09/29/89)

>}> I couldn't get the csh script to work on our system for
>}> /usr/spool/news/comp/sys/mac - the program said the argument was too long -
>}> any suggestions/help from anyone?

One suggestion is "don't bother".  comp.sys.mac is a high-volume newsgroup,
but most of the articles have something to do with reality, with few flame
wars.  Even when Canvas is compared to Superpaint, a description of both
products emerges.  The bug lists for uSoft Word 4.0 were long, but helpful.
This stupid script has been the cause of the most useless flame wars in
comp.arch and other groups.  I didn't save the program.  I suggest that people
actually delete their copies.  The high bandwidth waster was the posting of
the script.  With that action, hundreds or thousands of people could
mindlessly post long articles.  We don't need more mindless posts.

Stephen.