[comp.unix.microport] Duplicate articles

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (03/17/88)

At one time we got a batch of duplicate articles in some groups. I wrote
this little script to locate the articles, and optionally to prepare a
file of rm commands which could be fed to the shell. There was a reason not
to remove them on the fly, but I don't remember it.

I hope no one else has this problem and that this posting is totally
useless (but I doubt it).

:
#
#  finddup - find duplicate entries in news
#
# enter the group name as a series of arguments; a list of dups
# will be output. Optionally, a list of rm commands may be written
# to a file for later execution.
#
# Example of find only:
#  finddup.sh comp arch
#
# Example of find and delete:
#  finddup.sh @r delfile comp arch
#  sh delfile
#
# @(#)finddup.sh v1.3, by bill davidsen, modified 1/19/88

# this code tests if the first argument is "@r". If so the next
# argument is taken as the name of an output file for the remove
# commands.
if [ "$1" = "@r" ]
then
  # convert to absolute pathname
  case "$2" in
  /*)  # absolute pathname (case patterns are globs, not regexps)
    rfile="$2";;
  *)   # relative pathname
    rfile=`pwd`
    rfile=$rfile/$2;;
  esac
  shift; shift
else
  rfile=""
fi

# build the directory name
dir=$NEWS
i=1
while [ $i -le $# ]
do
  eval dir=$dir/\$$i
  if [ $i -eq 1 ]
  then
    ngname=$1
  else
    eval ngname=$ngname.\$$i
  fi
  i=`expr $i + 1`
done

# change to the directory
if [ -d $dir ]
then
  cd $dir
  echo "Scanning newsgroup $ngname"
else
  echo "$ngname - no such group"
  exit 1;
fi

# are we building a remove list?
if [ -n "$rfile" ]
then
  echo "Building a remove list in $rfile"
fi

# build the topic list
for n in [1-9]*
do
  # see if any files found
  if [ ! -f $n ]
  then
    echo "No files in $ngname"
    exit 0;
  fi

  # scan for message id
  sed -n "
    /^Message-ID:/{
      s//$n:/
      p
      q
    }
  " $n
done |
sort -t: -k2 |
awk '
BEGIN {
  indup = 0;
  oldmid = "";
  FS = ":";
}
{
  if ($2 == oldmid) {
    printf("Msg %d duplicates %d\n", $1, oldmnum);
    if (rfile != "") {
      printf("rm %s/%d\n", dir, $1) > rfile
    }
  }
  else {
    oldmid = $2;
    oldmnum = $1+0;
  }
}' rfile=$rfile dir=$dir -
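
The core of the script is the sed | sort | awk pipeline: tag each article
number with its Message-ID, sort on the ID so duplicates become adjacent,
then report every line whose ID matches the one before it. Below is a
minimal sketch of just that step, assuming a POSIX shell and a sort that
accepts -k. The function name find_dups is hypothetical and is not part of
finddup.sh, and unlike the script it skips non-files rather than exiting.

```shell
# find_dups DIR  (hypothetical helper, not part of finddup.sh)
# Prints "Msg N duplicates M" for each article in DIR whose Message-ID
# matches that of an article sorted just before it.
find_dups() {
  (
    cd "$1" || exit 1
    for n in [1-9]*
    do
      # skip the unexpanded pattern and anything that is not a plain file
      [ -f "$n" ] || continue
      # print "number:message-id" for the first Message-ID line, then stop
      sed -n "
        /^Message-ID:/{
          s//$n:/
          p
          q
        }
      " "$n"
    done
  ) |
  # sort on the Message-ID (field 2 onward); LC_ALL=C keeps the order stable
  LC_ALL=C sort -t: -k2 |
  awk -F: '
    $2 == oldmid { printf("Msg %d duplicates %d\n", $1, oldmnum); next }
    { oldmid = $2; oldmnum = $1 + 0 }
  '
}
```

For example, given a directory where articles 1 and 3 carry the same
Message-ID and article 2 has a different one, find_dups prints
"Msg 3 duplicates 1".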
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me