rusty@cadnetix.com (Rusty Carruth) (06/03/89)
Checksum: 3093563129  (Verify with "brik -cv")
Posting-number: Volume 03, Issue 022
Submitted-by: rusty@cadnetix.com (Rusty Carruth)
Archive-name: un_news/un_news.sha

[ This is a package of shell and awk scripts that will automatically
  extract and uudecode Usenet postings.  Because it will be useful to
  readers of this newsgroup, and because not all of them might read
  comp.sources.misc, I decided it was appropriate for posting here.
  Since these scripts are probably only usable on Berkeley UNIX
  systems, they are posted as they arrived, in shar form.  They have
  not been tested here.  -- R.D. ]

[ From: ]
Rusty Carruth         UUCP:{uunet,boulder}!cadnetix!rusty
                      DOMAIN: rusty@cadnetix.com
Daisy/Cadnetix Corp.  (303) 444-8075\   5775 Flatiron Pkwy. \  Boulder, Co 80301
Radio: N7IKQ   'home': P.O.B. 461 \ Lafayette, CO 80026

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  README news.un-news-er .news.autodearc news.unnews.awk1
#   news.unnews.awk2
# Wrapped by rusty@rusty on Tue Apr 4 16:11:39 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'README' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(4874 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
April 4, 1989
X
X
X Well, here it is, as promised.  This little set of shell scripts, awk
X programs, and other junk is designed to make my life a bit nicer.  I
X was regularly grabbing ALL articles in the source and binary
X newsgroups and then trying to find time to un-news them (un-news, n. -
X a term created by Rusty Carruth (at least, I think I created it) to
X denote the process of examining saved news articles for complete
X packages and applying the appropriate programs for reconstruction of
X the original package).
X
XFiguring that there had to be a better way, I wrote an automatic
Xnews-grabber which would copy all news articles from the newsgroups I
Xwas interested in over to my machine for later un-newsing.  That kept
Xme from missing any news articles.
X However, I had a problem - I had megabytes of articles which had not
X yet been un-newsed.  My disk was filling up!  (Pretty bad for a 50 meg
X disk)  So, the un-news scripts were born.
X The version I now have (and am now posting) will attempt to process
X all uuencoded archives and all multiple-part shell archives.  Both
X single and multiple-part uuencoded archives are handled.  I have
X attempted to be sure that the articles are really uuencoded (or
X shar-ed, in the case of shell archives), and currently have no
X procedure for automatically patching, nor for automatically
X un-shar'ing the patch file.  From the testing I have done, it seems
X quite likely that some valid articles will be missed.
X When the scripts have finished running, they will email the results
X to whatever userid you specify.  (I have my scripts set up to run
X automatically every day; you may not wish to use this feature.)
X One thing to be aware of is that these scripts automatically run sh
X on archives which are recognized.  If you wish to be very careful, you
X will not want unchecked programs running automatically.  The first
X line of caution would be to run a 'safe shell' which disallows certain
X actions.
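X (As an illustration only -- this is not something the scripts do for
X you, and 'article' below just stands for one saved news article -- you
X could skim a posting for obviously dangerous commands before ever
X handing it to sh:
X
Xegrep -n 'rm |chmod |/etc/' article | more
X
X and read whatever turns up before deciding to unpack it by hand.)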
X To be really safe, remove the automatic un-shar portion of the script
X and inspect all shar archives by hand.
X You will need to move the files to their 'correct' place after
X un-sharing them; see the file news.un-news-er for more info.  If you
X want to run the script automatically, add a line like the following
X to your /usr/lib/crontab (or use at, or whatever):
X
X0 6 * * * su rusty < /usr/rusty/bin/news.un-news-er.csh >/dev/console
X
X In my case, I run as rusty, and then run the 'news.un-news-er.csh'
X script, which is simply:
X
X	csh ~rusty/bin/news.un-news-er
X
X to make the script run under the cshell rather than the bourne shell.
X These scripts are probably very BSD-dependent, sorry about that.  If
X you have SysV, you can remove a bunch of junk in the first awk script
X (<blah>.awk1) which makes up for not having a 'match' function in BSD
X awk.  (Look at the long IF string...)
X I have probably forgotten to tell you something very important, so I
X will apologize in advance: I apologize for forgetting something which
X made this not work on your system!  :-)
X Don't forget to move the files from where you un-shar them to
X wherever you decide you want them to live.
X And a final note about the directory structure this thing expects.
X The assumption about directory structures I made is that you will run
X the script from the top of a tree which matches the news directory
X structure, at least as far as you travel down it (**INCLUDING** having
X any directories contained at the level you descend to, e.g. if you go
X to comp/binaries/ibm/pc, then you must have the 'd' directory even if
X you don't save the comp/binaries/ibm/pc/d articles...  THIS IS VERY
X IMPORTANT!).
X Another assumption I made about directory structure is that only news
X articles would be 'visible'.  All files which you do not wish
X'un-news' to look at had better be hidden (start with a '.').  That is
Xwhy '.TOTAKE' and '.COMPRESSED.STUFF' start with a '.', to keep them
Xfrom being seen by the 'grep *' command.
X You will find the un-newsed articles in the '.TOTAKE' subdir of the
X newsgroup from which the article originated, and the compressed
X article ends up in the '.COMPRESSED.STUFF' subdir.  You may wish to
X link that one off somewhere (oops, BSD creeping in again!) so that the
X space is not used on your machine.
X I do not have 'zoo' running on my sun yet, so the source files are
X not zoo-ed together.  My "final" version will make zooing the last
X step of the un-shar process.  If you have zoo on your Un*x box, you
X may wish to do this also.
X This 'software' is 'unleashed', 1989 Carroll D. Carruth, Jr.
X(that's my legal name).  Please use/abuse/trash/ignore these scripts as
Xyou see fit.  I would appreciate it if you send me bug fixes and
Xenhancements (I think I will, anyway) at uunet!cadnetix!rusty or
Xrusty@cadnetix.com.
X Please don't sell these things; they are not worth it!
X
XFor all practical purposes, this is version 2.0 of this thing.
X
X--rusty carruth
END_OF_FILE
if test 4874 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
if test -f 'news.un-news-er' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'news.un-news-er'\"
else
echo shar: Extracting \"'news.un-news-er'\" \(7270 characters\)
sed "s/^X//" >'news.un-news-er' <<'END_OF_FILE'
X#! /bin/csh
X# this script is a test script to be run in a subdir in which you wish to
X# convert news articles into their uudecoded or unshar'd (as appropriate)
X# pieces.
X# output goes into a directory called '.TOTAKE', compressed
X# copies of the original articles go into '.COMPRESSED.STUFF'.  A list
X# of all subject lines can be found in '.subjects', and a
X# list of sorted subject lines is in '.subjects.sorted'.
X#
X# Variables used by this script to access other files or users,
X# and their default values
X#	HOMEDIR		~/News			Where you have those
X#						directories set up
X#	GROUPLIST	~/.news.autodearc	list of newsgroup dirs
X#						to scan
X#	SCRIPTDIR	~/bin			where to find the secondary
X#						files
X#	MAILFILE	~/news.autodearc.results
X#						what file to use to put the
X#						results of this script
X#	user		<from shell>		who to mail results to.
X#	AWKSCRIPT	news.unnews.awk1	name of awk script which
X#						finds subject lines of the
X#						form '<blah> part number/number'
X#						(and variations thereof) where
X#						both numbers are the same
X#	AWKSCRIPT2	news.unnews.awk2	same as above, but does not
X#						require numbers to be equal
X#
X# Directory structure example for comp.binaries.ibm.pc is:
X#	News/comp/binaries/ibm/pc	<----ibm pc binary directory
X#		1234 1235 1236		<----articles
X#		.TOTAKE			<----where the uncrunched goes
X#		.COMPRESSED.STUFF	<----where the article gets
X#					     sent after un-doing
X#		d			<----directory (empty, in
X#					     this case) which
X#					     corresponds to the
X#					     newsgroup c.b.i.p.d
X#					     THIS IS NEEDED!
X#
X# Note that there is a directory 'd' which is required to be kept
X# even though you may not use it.  The script checks to be sure
X# that it is not trying to deal with directories on the news
X# server, and it uses the local environment to filter them out.
X# (Since we are not supposed to be munging around on the news
X# server!)
X#
X# For shell archives, .TOTAKE will hold directories which contain
X# the un-shar'd files for the archive.  The name of the directory
X# will be the name of the first archive volume.
X#
X# For the multiple-part decoder to work, the subject line must
X# be of the form "Subject: vxxiyy: <blah> part n/n" (and some
X# variations on that, see the awk script :-).  Currently, the
X# following newsgroups seem to (mostly) follow that convention:
X#
X#	comp/sources/games
X#	comp/sources/misc
X#	comp/sources/x
X#	comp/binaries/ibm/pc
X#	(there may be others as well)
X
X#set echo ; set verbose
X
set HOMEDIR = ~/News
set MAILFILE = ~/news.autodearc.results
X
set GROUPLIST = ~/.news.autodearc
X
set SCRIPTDIR = ~/bin
set AWKSCRIPT = news.unnews.awk1
set AWKSCRIPT2 = news.unnews.awk2
X
cd $HOMEDIR
set subdirlist = `egrep -v '(^#)' $GROUPLIST`
echo results of automatic de-newsing as of `date` >>$MAILFILE
foreach subdir ($subdirlist)
X	pushd $subdir
X	echo "-----Doing $subdir-------------" >> $MAILFILE
X	if( ! -d .COMPRESSED.STUFF ) mv .COMPRESSED.STUFF COMPRESSED.STUFF.WHATSIT
X	if( ! -e .COMPRESSED.STUFF ) mkdir .COMPRESSED.STUFF
X	if( ! -d .TOTAKE ) mv .TOTAKE TOTAKE.WHATSTHIS
X	if( ! -e .TOTAKE ) mkdir .TOTAKE
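X#	(the four 'if' lines above move aside anything called
X#	.COMPRESSED.STUFF or .TOTAKE that is not a directory, then
X#	create the directories if they do not already exist)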
X	set nonomatch ; rm .subjects* ; unset nonomatch
X	egrep '(^Subject: )' * | sort -f +2.0 > .subjects.sorted
X	awk -f $SCRIPTDIR/$AWKSCRIPT .subjects.sorted > .subjects.todo
X	set SUBLIST = `awk -F: '{print $1}' .subjects.todo`
X# the above awk script returns the list of files which have 'part x/x' in them
X	set SETLIST = ''
X	foreach SET ($SUBLIST)
		echo $SET
X		set ZZZ = "(^$SET)"
X		set LINE = `egrep $ZZZ .subjects.todo`
X		set ARCNAME = `echo $LINE | awk -F: '{print $3}'`
X		@ NFILES = `echo $LINE | awk '{print $(NF-1)}'`
X		@ ENDLINE = `echo $LINE | awk '{print $NF}'`
X		@ STARTLINE = $ENDLINE + 1 - $NFILES
X		awk "NR<=$ENDLINE {print}" .subjects.sorted |awk "NR>=$STARTLINE {print}" > .garbage.list
X		set FILELIST = `awk -F: '{print $1}' .garbage.list`
X		set FIRSTFILE = `echo $FILELIST | awk '{print $1}'`
X# I found a problem when the number of articles received did not equal
X# the number in the group, so the following stuff tries to
X# check for this.  First, there must be the right number of files
		echo $FILELIST
X		if (`echo $FILELIST | wc -w` != $NFILES ) then
X			echo "Wrong number of files for $LINE">> $MAILFILE
X		else if( `awk -f $SCRIPTDIR/$AWKSCRIPT2 .garbage.list | wc -w` != $NFILES) then
X# note that I am not doing nearly as much as I could here.  A full test
X# would be to make sure that the numbers went 1,2,3,... to $NFILES.
X# I'll do all that if it turns out I need to!
X			echo "Strange error in file list: $LINE" >>$MAILFILE
X# see if this is a uuencoded mess, below returns true if it is
X		else if ( "` egrep '(^begin *[0-9]+)' $FIRSTFILE`" != '' ) then
X			echo "uuencoded data: $LINE" >> $MAILFILE
X			pushd .TOTAKE
X			set UUFILENAME = `egrep '(^begin)' ../$FIRSTFILE |awk '{print $3}'`
X			set UUFILENAME = `echo $UUFILENAME | awk -F. '{print $1}'`
X			awk 'NR<=1 , /(^BEGIN)/ {print}' ../$FIRSTFILE >> $UUFILENAME.hdr
X			if ( -e .trashit. ) rm .trashit.
X			foreach SHARC ($FILELIST)
X				cat ../$SHARC |awk '/(^BEGIN)/,/(^END)/ {print}'|egrep -v '(^BEGIN)|(^END)' >>.trashit.
X			end
X			uudecode .trashit. && rm .trashit.
X			if( -e .trashit.) echo "uudecode failed" >> $MAILFILE
X			popd
X			mv $FILELIST .COMPRESSED.STUFF
X# it could be argued that compressing uuencoded stuff is useless.
X# It probably is a waste of time, but let's do it anywho.
X			pushd .COMPRESSED.STUFF ; compress $FILELIST ; popd
X# see if it is a shar archive
X		else if ("`egrep '(^#! *\/bin)' $FIRSTFILE`" != '' ) then
X			echo "shar archive : $LINE" >> $MAILFILE
X			mkdir .TOTAKE/$ARCNAME
X			pushd .TOTAKE/$ARCNAME
X			awk 'NR<=1 , /(^#! *\/bin)/ {print}' ../../$FIRSTFILE > $FIRSTFILE.hdr
X			foreach SHARC ($FILELIST)
X				echo "Doing file $SHARC" >> unshar.report
X				cat ../../$SHARC |awk '/(^#! *\/bin)/,/(^exit)/ {print}' |sh >> unshar.report
X			end
X			popd
X			mv $FILELIST .COMPRESSED.STUFF
X			pushd .COMPRESSED.STUFF ; compress $FILELIST ; popd
X		else
X			echo "not recognized: $LINE" >> $MAILFILE
X		endif
X	end
X#set echo ; set verbose
X# now let's see if we can handle single files...
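X# (each 'egrep -l' pass below narrows FILES to the articles that contain,
X# in turn, a BEGIN--- cut mark, an END--- cut mark, a uuencode
X# "begin <mode> <name>" line, and an "end" line)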
X	set FILES = `egrep -l '(^BEGIN-+)' *`
X	set FILES = `egrep -l '(^END-+)' $FILES`
X	set FILES = `egrep -l '(^begin *[0-9]+ )' $FILES`
X	set FILES = `egrep -l '(^end)' $FILES`
X	foreach file ($FILES)
X		if( "`awk '/(^BEGIN-)/,/(^END-)/{print}' $file|awk '/(^begin *[0-9]+)/,/(^end)/{if(NR == 2) print}'`" != '') then
X			echo "uuencoded data: `egrep '(^Subject)' $file`" >> $MAILFILE
X			pushd .TOTAKE
X			set UUFILENAME = `egrep '(^begin *[0-9]+)' ../$file |awk '{print $3}'`
X			if ( -e $UUFILENAME ) then
X				echo "WARNING -- $UUFILENAME already exists, not processed" >> $MAILFILE
X			else
X				set UUFILENAME = `echo $UUFILENAME | awk -F. '{print $1}'`
X				awk 'NR<=1 , /(^BEGIN-)/ {print}' ../$file >> $UUFILENAME.hdr
X
X				cat ../$file | awk '/(^BEGIN-)/,/(^END-)/{print}' |awk '/(^begin *[0-9]+)/,/(^end)/{print}' | uudecode
X			endif
X			popd
X			mv $file .COMPRESSED.STUFF
X			pushd .COMPRESSED.STUFF ; compress $file ; popd
X		endif
X	end
X	echo "Still have `ls |wc -w` files left" >> $MAILFILE
X	popd
end
echo END of results >> $MAILFILE
echo Mailing to $USER
cat $MAILFILE | mail $USER && rm $MAILFILE
END_OF_FILE
if test 7270 -ne `wc -c <'news.un-news-er'`; then
    echo shar: \"'news.un-news-er'\" unpacked with wrong size!
fi
chmod +x 'news.un-news-er'
# end of 'news.un-news-er'
fi
if test -f '.news.autodearc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'.news.autodearc'\"
else
echo shar: Extracting \"'.news.autodearc'\" \(75 characters\)
sed "s/^X//" >'.news.autodearc' <<'END_OF_FILE'
comp/sources/x
X#comp/sources/games
X#comp/sources/unix
comp/binaries/ibm/pc
END_OF_FILE
if test 75 -ne `wc -c <'.news.autodearc'`; then
    echo shar: \"'.news.autodearc'\" unpacked with wrong size!
fi
# end of '.news.autodearc'
fi
if test -f 'news.unnews.awk1' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'news.unnews.awk1'\"
else
echo shar: Extracting \"'news.unnews.awk1'\" \(2628 characters\)
sed "s/^X//" >'news.unnews.awk1' <<'END_OF_FILE'
X# this awk script will return only those lines which have 'part' near the
X# end of the line and which also have 2 numbers following the 'part'
X# (separated by either 'of' or '/') where those 2 numbers are equal.
X# It also adds to the end of the line the number of files making up
X# the set and the line number this line appears in the file.
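X# The digit-hunting loops below are there because old BSD awk has no
X# 'match' function.  On an awk that does have it (SysV awk, or any
X# newer awk), roughly the following would do the same job -- this is
X# only a sketch, and it is not used by these scripts:
X#	/(part|Part|PART) *[0-9]+\/[0-9]+ *$/ {
X#		match($NF, /[0-9]+\//)
X#		first = substr($NF, RSTART, RLENGTH - 1)
X#		last  = substr($NF, RSTART + RLENGTH)
X#		if (first == last) print $0, last, NR
X#	}
X#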
X/(part|Part|PART) *[0-9]+\/[0-9]+ *$/ {
X	# match Part<num>/<num>$, now get last number
X	#STRNG = "/[0-9]*"
X	# TEST1 = match( $NF, STRNG )
X	I = length ( $NF )
X	STRNG = $NF
X	while ( substr(STRNG,I,1) != "/" ) I--
X	LASTNUM = substr( $NF, I + 1 )
X	#TEST1 = match($NF, "[0-9]*")
X	I = 1
X	while ( substr(STRNG,I,1) != "/" ) I++
X	J = 0
X	found = 0
X	while (found == 0) {
X		J = J + 1
X		if ( substr(STRNG,J,1) == "0" ) found = 1
X		else if ( substr(STRNG,J,1) == "1" ) found = 1
X		else if ( substr(STRNG,J,1) == "2" ) found = 1
X		else if ( substr(STRNG,J,1) == "3" ) found = 1
X		else if ( substr(STRNG,J,1) == "4" ) found = 1
X		else if ( substr(STRNG,J,1) == "5" ) found = 1
X		else if ( substr(STRNG,J,1) == "6" ) found = 1
X		else if ( substr(STRNG,J,1) == "7" ) found = 1
X		else if ( substr(STRNG,J,1) == "8" ) found = 1
X		else if ( substr(STRNG,J,1) == "9" ) found = 1
X		endif endif endif endif endif endif endif endif endif endif
X	}
X	FIRSTNUM =substr( $NF, J ,I - J )
X	if( FIRSTNUM == LASTNUM )
X		print $0 " " LASTNUM " " NR
X	}
X
X/((part|Part|PART) *[0-9]+\/[0-9]+\) *$)/ {
X	# match Part<num>/<num>$, now get last number
X	#STRNG = "/[0-9]*"
X	# TEST1 = match( $NF, STRNG )
X	I = length ( $NF ) - 1
X	STRNG = $NF
X	while ( substr(STRNG,I,1) != "/" ) I--
X	LASTNUM = substr( $NF, I + 1, length($NF) - I - 1)
X	#TEST1 = match($NF, "[0-9]*")
X	I = 1
X	while ( substr(STRNG,I,1) != "/" ) I++
X	J = 0
X	found = 0
X	while (found == 0) {
X		J = J + 1
X		if ( substr(STRNG,J,1) == "0" ) found = 1
X		else if ( substr(STRNG,J,1) == "1" ) found = 1
X		else if ( substr(STRNG,J,1) == "2" ) found = 1
X		else if ( substr(STRNG,J,1) == "3" ) found = 1
X		else if ( substr(STRNG,J,1) == "4" ) found = 1
X		else if ( substr(STRNG,J,1) == "5" ) found = 1
X		else if ( substr(STRNG,J,1) == "6" ) found = 1
X		else if ( substr(STRNG,J,1) == "7" ) found = 1
X		else if ( substr(STRNG,J,1) == "8" ) found = 1
X		else if ( substr(STRNG,J,1) == "9" ) found = 1
X		endif endif endif endif endif endif endif endif endif endif
X	}
X	FIRSTNUM =substr( $NF, J ,I - J )
X	if( FIRSTNUM == LASTNUM )
X		print $0 " " LASTNUM " " NR
X	}
X
X/(part|Part|PART) +[0-9]+ +of +[0-9]+ *$/ {
X	# found Part<num> of <num>$, now get last number
X	if( $NF == $(NF-2) )
X		print $0 " " $NF " " NR
X	}
X
END_OF_FILE
if test 2628 -ne `wc -c <'news.unnews.awk1'`; then
    echo shar: \"'news.unnews.awk1'\" unpacked with wrong size!
fi
# end of 'news.unnews.awk1'
fi
if test -f 'news.unnews.awk2' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'news.unnews.awk2'\"
else
echo shar: Extracting \"'news.unnews.awk2'\" \(1148 characters\)
sed "s/^X//" >'news.unnews.awk2' <<'END_OF_FILE'
X# this awk script looks at lines which have 'part' near the
X# end of the line and which also have 2 numbers following the 'part'
X# (separated by either 'of' or '/'), and returns the first of those 2 numbers.
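X# (for example, a line ending in "part 2 of 5" makes this script print 2,
X# as does one ending in "(part 2/5)")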
X/(part|Part|PART) *[0-9]+\/[0-9]+[ |/)] *$/ {
X	# match Part<num>/<num>$
X	I = length ( $NF )
X	STRNG = $NF
X	I = 1
X	while ( substr(STRNG,I,1) != "/" ) I++
X	J = 0
X	found = 0
X	while (found == 0) {
X		J = J + 1
X		if ( substr(STRNG,J,1) == "0" ) found = 1
X		else if ( substr(STRNG,J,1) == "1" ) found = 1
X		else if ( substr(STRNG,J,1) == "2" ) found = 1
X		else if ( substr(STRNG,J,1) == "3" ) found = 1
X		else if ( substr(STRNG,J,1) == "4" ) found = 1
X		else if ( substr(STRNG,J,1) == "5" ) found = 1
X		else if ( substr(STRNG,J,1) == "6" ) found = 1
X		else if ( substr(STRNG,J,1) == "7" ) found = 1
X		else if ( substr(STRNG,J,1) == "8" ) found = 1
X		else if ( substr(STRNG,J,1) == "9" ) found = 1
X		endif endif endif endif endif endif endif endif endif endif
X	}
X	FIRSTNUM =substr( $NF, J ,I - J )
X	print FIRSTNUM
X	}
X
X/(part|Part|PART) +[0-9]+ +of +[0-9]+ *$/ {
X	# found Part<num> of <num>$,
X	print $(NF-2)
X	}
X
END_OF_FILE
if test 1148 -ne `wc -c <'news.unnews.awk2'`; then
    echo shar: \"'news.unnews.awk2'\" unpacked with wrong size!
fi
# end of 'news.unnews.awk2'
fi
echo shar: End of shell archive.
exit 0
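[ A minimal setup sketch for the directory layout the README asks for,
  assuming the default ~/News location and the comp.binaries.ibm.pc
  example used in the script's comments -- adjust the paths to the
  groups you actually save, and make each level by hand if your mkdir
  has no -p option:

	cd ~/News
	mkdir -p comp/binaries/ibm/pc/d
	cd comp/binaries/ibm/pc
	mkdir .TOTAKE .COMPRESSED.STUFF

  The 'd' directory is the placeholder for comp.binaries.ibm.pc.d that
  the README says must exist even if you never save articles there. ]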