palkovic@tomato.fnal.gov (John Palkovic) (03/02/91)
I posted this script yesterday (<9Y`#-6-@linac.fnal.gov> in
news.software.b) to remove old news floods (newly arrived articles
with old Date: headers):

#!/bin/sh
#
# Old.rm. Trash new news articles with old Date: headers.
# J. Palkovic 2/28/91
#
# You may need to change this next line:
PATH=/usr/lib/newsbin:/usr/local/bin:/usr/bin
TODAY="`date`"
TODAY=`getdate "$TODAY"`
LIMIT=`expr $TODAY - 1209600`
cd /usr/spool/news
find `ls |egrep -v '\.'` -mtime -7 -name '[0-9]*' -type f -print |(
    while read f
    do
        DATE=`header -f Date $f`
        THEN=`getdate "$DATE" 2>/dev/null`
        if test "$THEN"
        then
            test $THEN -lt $LIMIT && rm -f $f
        fi
    done
)

Header is a C program that extracts header fields, written by Chip
Salzenberg (see above mentioned post for source to header). The
question I have for all you comp.unix.shell gurus is: how can I speed
this script up? It took more than two hours to run on our 150+ MB news
spool. I have some ideas, but would like to hear from you.

-John
--
palkovic@linac.fnal.gov || {tellab5,royko,simon}!linac!palkovic
tchrist@convex.COM (Tom Christiansen) (03/02/91)
From the keyboard of palkovic@linac.fnal.gov:
:I posted this script yesterday (<9Y`#-6-@linac.fnal.gov> in
:news.software.b) to remove old news floods (newly arrived articles
:with old Date: headers):
:
:#!/bin/sh
:#
:# Old.rm. Trash new news articles with old Date: headers.
:# J. Palkovic 2/28/91
:#
:# You may need to change this next line:
:PATH=/usr/lib/newsbin:/usr/local/bin:/usr/bin
:TODAY="`date`"
:TODAY=`getdate "$TODAY"`
:LIMIT=`expr $TODAY - 1209600`
:cd /usr/spool/news
:find `ls |egrep -v '\.'` -mtime -7 -name '[0-9]*' -type f -print |(
:    while read f
:    do
:        DATE=`header -f Date $f`
:        THEN=`getdate "$DATE" 2>/dev/null`
:        if test "$THEN"
:        then
:            test $THEN -lt $LIMIT && rm -f $f
:        fi
:    done
:)
:
:Header is a C program that extracts header fields, written by Chip
:Salzenberg (see above mentioned post for source to header). The
:question I have for all you comp.unix.shell gurus is: how can I speed
:this script up? It took more than two hours to run on our 150+ MB news
:spool. I have some ideas, but would like to hear from you.

The find, ls, and egrep aren't really a problem, as they only execute
once. Your big hit is that you have a lot of execs going in that tight
loop.

I would try two things: first, run it under ksh (if you have it) and
make the tests be builtin. That'll save you two execs.

The second thing I would try is to code up the whole tight loop in
perl, since the tests, rm, and header stuff are all trivial to do
there. This could save you all the execs in the loop, which I think
would be a big win. The gotcha is the getdate: I've got perl library
routines to go from both ctime and `date` output format, but getdate
is actually much more clever than just that. I'm not convinced that
all the dates will be in either ctime or `date` format: is this
mandatory?

A peremptory perusal predicts promising perl processing, but perhaps
perfect processing is preferable. So maybe I'd just use `getdate`
after all and eat one exec.
However, I didn't find any failures in the 3000 articles I just tested,
so maybe perfection isn't worth it.

--tom
--
"UNIX was not designed to stop you from doing stupid things, because
that would also stop you from doing clever things." -- Doug Gwyn

Tom Christiansen    tchrist@convex.com    convex!tchrist
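
The exec-trimming idea in this reply can be sketched in shell. What follows is only an illustration under stated assumptions, not something posted in the thread and not measured against a real spool. The names `date_header`, `purge_old`, and `DATE_HDR` are hypothetical. The `${var#pattern}` and `$(...)` forms need ksh or a POSIX shell and were not in the 1991 /bin/sh, and very old Bourne shells ran a redirected loop in a subshell, which would lose the variable. `getdate` is assumed to be the C News utility the original script already calls.

```shell
# Hypothetical helper: put the article's Date: header value in DATE_HDR
# using only shell builtins, so no `header` process is spawned per file.
date_header() {
    DATE_HDR=
    while IFS= read -r line
    do
        case $line in
        '') return 1 ;;                 # blank line: headers ended, no Date: seen
        [Dd]ate:*)
            DATE_HDR=${line#*:}         # drop the "Date:" field name
            DATE_HDR=${DATE_HDR# }      # drop one leading space, if present
            return 0 ;;
        esac
    done < "$1"
    return 1                            # end of file without a Date: header
}

# Hypothetical loop body: reads file names on stdin, removes articles whose
# Date: is older than the limit (seconds since the epoch) given as $1.
# Only getdate (and rm, when needed) is exec'ed per article.
purge_old() {
    limit=$1
    while IFS= read -r f
    do
        date_header "$f" || continue
        stamp=$(getdate "$DATE_HDR" 2>/dev/null) || continue
        [ -n "$stamp" ] && [ "$stamp" -lt "$limit" ] && rm -f "$f"
    done
}
```

With these definitions, the pipeline from the original script would end in `| purge_old "$LIMIT"` in place of the parenthesized loop. The tests use the builtin `[`, and the header lookup costs no exec, which leaves `getdate` as the one exec per article that the reply proposes to keep.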