ray@ole.UUCP (Ray Berry) (02/24/91)
Over the past year I've probably downloaded the massive SIMIBM.ARC file 2-3 times in an attempt to keep a reasonably current list on hand. Needless to say, this is not at all efficient from the standpoint of net bandwidth. OTOH, I've noticed that Keith has been pretty dependable lately about posting a monthly "update" file that specifies all the new entries on a per-calendar-month basis. So it seemed like a good idea to create a method for updating my master SIMTEL list with these monthly update files.

The AWK script below is my first cut at the problem. I decided to use AWK rather than write a C program because of the uncertainty of both the availability and format of the files involved. Fortunately, both the master SIMTEL list and the updates are sorted both directory-wise and file-wise, so the job is basically just a simple merge of two sorted lists.

Obviously, this script doesn't address the question of identifying files that are deleted from SIMTEL, presumably because they are superseded by newer versions. Perhaps there could be some formalization of this update process, and a method found to handle deletions as well as insertions. (Starting to sound like a job for a diff'ed 'ed' script...)

The script was developed (in DOS) with Thompson Automation AwkPlus (formerly "PolyAwk"). Advisory messages to the "CON" device are nonstandard and won't work on other AWKs. Also, the 'ctime()' function in the BEGIN block is nonstandard. Sorry. For DOS environments, MKS awk has other ways of getting the date; OTOH, I don't know how to make MKS awk write to stderr or the CON device. At any rate, it should be very simple to adapt the script to your particular environment.

Needless to say, corrections and/or improvements would be welcomed.

Ray Berry

---SNIP----
# This awk script is intended to merge the monthly SIMTEL updates into the
# master SIMTEL index list.  Both lists are assumed to be sorted, both in
# terms of the directory names, as well as the file lists for each directory.
# usage :  awk -f this_script update_file_name > new_master_file
# No attempt is made to identify/delete older versions of archive entries
# as newer versions are introduced.  Matching the leading alpha portion
# of filename entries isn't enough- too many false alarms are produced.
# When new directory names are encountered, an advisory message is written
# to the DOS CONsole.  When update lines are encountered that match lines
# already in the master file, a warning message is printed.
# original author: Ray Berry - uucp:...sumax!ole!ray; CS:73407,3152  2/23/91

function print_thru_blank() {
    do {
        getline;
        print;
    } while ($0 != "")
    return;
}

BEGIN {
    INDEX = "simtel.lst"    # whatever you call your master catalog file
    # document update in new master list
    print "merged file \"" ARGV[1] "\" on " ctime();
}

$1=="Directory" {
    update_dir = $2;
    for(;;) {
        if (index(last_dir, update_dir)) {
            break;
        }
        # seek the next directory header in the master file
        for (;;) {
            if ( (getline < INDEX) <= 0 ) {
                # a new directory name sorts behind the
                # last name currently in the master list.
                print "eof on " INDEX >"CON";
                # copy the new directory to the output
                print "";
                print "Directory " update_dir
                print "adding directory " update_dir > "CON"
                print_thru_blank();
                next;
            }
            if ($1 == "Directory" )
                break;
            print;
        }
        # check to see if update directory is not in master file
        if (update_dir < $2 ) {
            # copy the update directory data to the index file
            last_dir = $0;      # save listfile directory name
            print "Directory " update_dir;
            print "adding directory " update_dir > "CON"
            print_thru_blank();
            print last_dir;     # print old directory name
            next;
        }
        print;                  # the Directory line
        if ( update_dir == $2 ) {
            break;
        }
    }
    # merge the file listings from main list & update file
    getline;
    getline x < INDEX
    for (;;) {
        if (x == $0)
            print "warning- duplicate lines!" >"CON";
        if (x < $0) {
            print x;
            if ( (getline x < INDEX) <= 0 ) {
                # read remainder of 'new' items
                print "eof on " INDEX > "CON"
                print;
                print_thru_blank();
                exit;
            }
            if (x=="") {
                # read remainder of 'new' items
                print;
                print_thru_blank();
                next;
            }
        } else {
            print;
            getline;
            if (!NF) {
                print x;
                next;
                # remainder of this directory list
                # gets printed in next Directory search
            }
        }
    }
}

END {
    # output remainder of INDEX file
    while ( (getline < INDEX) > 0 )
        print;
}
----SNIP----

--
Ray Berry kb7ht
uucp: ...sumax!ole!ray
CIS: 73407,3152
/* "inquire within" */
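For readers without AwkPlus, the core of the approach described above (merging two already-sorted file listings for a single directory) can be sketched in a few lines of Python. This is only an illustration, not a port of the script: the function name `merge_listings` and the sample entries are hypothetical, and unlike the awk script, which prints a warning and keeps both copies, this sketch emits a duplicated line once and returns it in a separate list.

```python
def merge_listings(master, update):
    """Merge two sorted lists of listing lines into one sorted list.

    Returns (merged, duplicates). A line found in both inputs is
    emitted once and also recorded in `duplicates` (an assumption of
    this sketch; the original script only warns about such lines).
    """
    merged, duplicates = [], []
    i = j = 0
    while i < len(master) and j < len(update):
        if master[i] < update[j]:
            merged.append(master[i])
            i += 1
        elif update[j] < master[i]:
            merged.append(update[j])
            j += 1
        else:
            # same line in both lists
            duplicates.append(master[i])
            merged.append(master[i])
            i += 1
            j += 1
    # at most one of these two slices is non-empty
    merged.extend(master[i:])
    merged.extend(update[j:])
    return merged, duplicates


if __name__ == "__main__":
    old = ["ARC.ZIP", "FOO11.ZIP", "ZOO.ZIP"]   # hypothetical entries
    new = ["BAR.ZIP", "FOO11.ZIP"]
    print(merge_listings(old, new))
```

The same loop would be applied once per directory; locating matching directory headers in the two files is the part the awk script spends most of its effort on.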
raymond@math.berkeley.edu (Raymond Chen) (02/24/91)
If there is interest, I could post `monthly updates' of SIMTEL20 in the following form:

PD1:<MSDOS.WHATEVER>
 Filename   Type Length   Date    Description
==============================================================================
-FOO11.ZIP    B    1234  900101  Do something weird, version 1.1
+FOO11.ZIP    B    5678  910101  Do something weird, version 1.2

where the `-' indicates files that have been deleted and the `+' indicates files that have been added.

Note also that I posted several months ago a package of perl scripts that automatically incorporate Keith's monthly updates into a locally-maintained copy of the SIMTEL20 index. These are the same scripts that are used at the math.princeton.edu server. I will mail the scripts to any interested parties. (As Ray Berry pointed out, they really aren't very difficult scripts to write, since Keith Petersen did all of the hard work.)

--
raymond@math.berkeley.edu
Your friendly comp.sys.ibm.pc.misc archives administrator.
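A `+'/`-' update in the form proposed above could be applied to a local index roughly as follows. This is a minimal Python sketch under stated assumptions, not the perl package mentioned in the post: the function name `apply_update` and the in-memory layout (a dict mapping each `PD1:<...>` directory line to a sorted list of entry lines) are hypothetical, and the parsing rules are inferred from the single example shown, so a real update file may need more care.

```python
import bisect


def apply_update(index, update_lines):
    """Apply '+'/'-' update lines to `index` in place and return it.

    index: dict mapping a directory line such as 'PD1:<MSDOS.WHATEVER>'
           to a sorted list of entry lines (assumed layout).
    Lines that are neither a directory line nor a '+'/'-' entry
    (column header, '=====' rule, blanks) are ignored.
    """
    current = None
    for raw in update_lines:
        line = raw.rstrip("\n")
        if line.startswith("PD1:"):
            current = index.setdefault(line.strip(), [])
            continue
        if current is None or line[:1] not in ("-", "+"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        if line[0] == "-":
            # delete every entry whose filename (first field) matches
            name = fields[0]
            current[:] = [e for e in current if e.split()[:1] != [name]]
        else:
            # insert the new entry, keeping the listing sorted
            bisect.insort(current, line[1:])
    return index
```

Deletions are matched on the filename alone rather than on the whole line, so a `-' line still works if the local description text differs slightly from the posted one.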