sources-request@mirror.UUCP (06/19/86)
Submitted by: talcott!seismo!epiwrl!epimass!jbuck Mod.sources: Volume 6, Issue 9 Archive-name: getpaths [ I had to reformat the AWK script in histog, strangely enough; otherwise AWK crashed. My version is in the file histog.mod --r$ ] # This is a shell archive. Remove anything before this line, # then unpack it by saving it in a file and typing "sh file". # Contents: 6sites README backbone-list closure getopt.c getpaths.1 getpaths.c # histog histog.mod hostcount makefile makefile.42 makefile.sysv # makefile.v7 oscantree.c scantree.c threelink twolink echo x - 6sites sed 's/^XX//' > "6sites" <<'@//E*O*F 6sites//' XXBACKBONE!decvax!seismo!ihnp4!tektronix!sdcrdcf!decwrl @//E*O*F 6sites// chmod u=rw,g=rw,o=rw 6sites echo x - README sed 's/^XX//' > "README" <<'@//E*O*F README//' XX A Collection of Usenet path-counting tools XX Joseph T. Buck - Entropic Processing, Inc. XXThis is a set of tools for computing statistics on the Usenet paths XXon your machine. XXThe program 'getpaths' gathers paths from your news spool, taking cross XXpostings into account (each path appears in the output once). XXSee the man page supplied. XXThere are three makefiles provided; I hope getpaths can be made to XXrun on anything. Use makefile.42 for 4.2bsd or 4.3bsd; makefile.sysv XXfor System III or System V; use makefile.v7 for Version 7 or 4.1bsd. XXIf you have trouble, there are three questions you have to answer: XX 1) Do you have the 4.2 directory access library? If so, use XX scantree.c, otherwise use oscantree.c. XX 2) v7 and bsd systems have functions "index" and "rindex". XX SysIII and SysV changed the names to strchr and strrchr, XX for no good reason. See the definition of CFLAGS in XX makefile.sysv. XX 3) Do you have getopt? (SysIII,V do, bsd, v7 don't). I'm XX including Henry Spencer's public domain version, since XX it's short and valuable and Berkeley folks can't run the XX program without it. XXYou may find the scantree functions useful in other programs. You're XXwelcome to them, but leave my name in, please. XXWith no options, getpaths prints the paths in every article on the standard XXoutput. If you expire news after one week, this will result in about 300 Kbytes XXof output. To restrict yourself to a subtree, you may give an option, which XXis a pathname that uses /usr/spool/news as a base. XX getpaths mod XX getpaths net/politics XXOnce you've used getpaths to gather some paths, you can use the shell scripts XXto compute some statistics on them: XXhistog - XX Computes the average path length and prints a histogram. XXhostcount - XX Counts hostnames that occur anywhere in a path and prints them XX in declining order of frequency. XXtwolink - XX Prints connected pairs of sites in declining order of frequency. XX I used twolink to prepare my most recent map. XXthreelink - XX Prints connected triplets of sites in declining order of frequency. XXFinally, there is "closure", which will give "awk" a good workout on your XXsystem. It is an automated version of the procedure that I used to build XXan extended backbone map. XXclosure takes three arguments. The first argument is the name of a XXfile containing the "core" sites you wish to start with (you can XXthink of these as "backbone" sites, but I'm not claiming to define what XX"backbone" really means). XXThe file must be in a somewhat peculiar format; each line must begin XXwith the word BACKBONE (in caps), followed by the site names, XXseparated by ! marks. For example, if you want to start with hosts XXalpha, beta, and gamma, the backbone file could be XXBACKBONE!alpha!beta!gamma XXor XXBACKBONE!alpha XXBACKBONE!beta XXBACKBONE!gamma XXYes, it's ugly. XXThe second argument is the name of the file containing a list of paths XX(as output by getpaths, perhaps). Finally, the third argument, an XXinteger, is a threshold. XXclosure searches the paths file for chains of paths that connect two XX"core" sites; a chain is disregarded unless it occurs at least XX"threshold" times, where "threshold" is the third argument. Every XXsite in each such chain is added to the "core". This is repeated XXuntil no more sites can be added to the "core". A report of each XXiteration and what paths are added to the core is printed on the XXstandard output. XXI've supplied two "initial cores" you can use; "6sites" includes XXihnp4, seismo, decvax, decwrl, tektronix, and sdcrdcf, and XX"backbone-list" contains all sites on Gene Spafford's backbone map. XXTry the commands XXgetpaths net > net.paths XXclosure 6sites net.paths 20 XXor XXclosure backbone-list net.paths 20 XXI'm interested in seeing the results for sites far from the SF Bay XXArea. You can write me at XX oliveb-\ XX \ XX ihnp4---pesnta--epimass!jbuck XX / XX hplabs-/ @//E*O*F README// chmod u=rw,g=rw,o=rw README echo x - backbone-list sed 's/^XX//' > "backbone-list" <<'@//E*O*F backbone-list//' XXBACKBONE!cbosgd!clyde!watmath!utzoo XXBACKBONE!ihnp4!cuae2!alberta!ubc-vision!burl!ulysses!bellcore XXBACKBONE!uw-beaver!tektronix!decvax!linus XXBACKBONE!qantel!hplabs!oliveb!glacier!decwrl XXBACKBONE!sdcrdcf!hao!sdcsvax!drillsys!gatech!akgua!mcnc!philabs XXBACKBONE!lll-lcc!lll-crg!cmcl2!seismo @//E*O*F backbone-list// chmod u=rw,g=rw,o=rw backbone-list echo x - closure sed 's/^XX//' > "closure" <<'@//E*O*F closure//' XX#! /bin/sh XX# closure: Joseph T. Buck, Entropic Processing, Inc. XX# Usage: closure startfile paths thresh XX# This rather awk-ward script builds a "transitive closure" of a backbone. XXif test $# != 3 XXthen echo Usage: $0 backbone-list paths threshold >&2; exit 1 XXfi XX# The file /tmp/$$c contains the current backbone list; /tmp/$$a contains XX# The latest set of additions. XXcp $1 /tmp/$$c XXecho "------ Original backbone: ------" XXawk -F! '{ for (i=1; i<=NF; i++) if ($i != "BACKBONE") printf ("%s ",$i); } XXEND { printf ("\n");}' /tmp/$$c XXi=1 XXwhile true XXdo XX awk -F! '{ XX if ($1 == "BACKBONE") { XX for (i = 2; i <= NF; i++) XX backbone[$i] = 1; XX } XX else { XX i = 1; XX while (i <= NF && !backbone[$i]) XX i++; XX if (i > NF) XX next; XX while (1) { XX while (i <= NF && backbone[$i]) XX i++; XX if (i > NF) XX break; XX first = i - 1; XX while (i <= NF && !backbone[$i]) XX i++; XX if (i > NF) XX break; XX for (j = first; j < i; j++) XX printf ("%s!", $j); XX printf ("%s\n", $i); XX } XX } XX}' /tmp/$$c $2 | sort | uniq -c |\ XX awk '{ if ($1 >= thr) printf("BACKBONE!%s\n", $2); }' thr=$3 - > /tmp/$$a XX if test ! -s /tmp/$$a XX then echo "------ Final backbone: ------" XX awk -F! '{for (i=2; i<NF; i++) print $i}' /tmp/$$c | sort -u XX rm /tmp/$$c /tmp/$$a XX exit 0 XX fi XX echo "------ Pass $i: add the following: ------" XX awk -F! '{ for (i=2; i<NF; i++) printf("%s!", $i); printf("%s\n", $NF);}' /tmp/$$a XX cat /tmp/$$a >> /tmp/$$c XX i=`expr $i + 1` XXdone @//E*O*F closure// chmod u=rwx,g=rwx,o=rwx closure echo x - getopt.c sed 's/^XX//' > "getopt.c" <<'@//E*O*F getopt.c//' XX/* XX * getopt - get option letter from argv XX * by Henry Spencer XX * posted to Usenet net.sources list XX */ XX#include <stdio.h> XXchar *optarg; /* Global argument pointer. */ XXint optind = 0; /* Global argv index. */ XXstatic char *scan = NULL; /* Private scan pointer. */ XXextern char *index(); XXint XXgetopt (argc, argv, optstring) XXint argc; XXchar *argv[]; XXchar *optstring; XX{ XX register char c; XX register char *place; XX optarg = NULL; XX if (scan == NULL || *scan == '\0') { XX if (optind == 0) XX optind++; XX if (optind >= argc || argv[optind][0] != '-' || argv[optind][1] == '\0') XX return (EOF); XX if (strcmp (argv[optind], "--") == 0) { XX optind++; XX return (EOF); XX } XX scan = argv[optind] + 1; XX optind++; XX } XX c = *scan++; XX place = index (optstring, c); XX if (place == NULL || c == ':') { XX fprintf (stderr, "%s: unknown option -%c\n", argv[0], c); XX return ('?'); XX } XX place++; XX if (*place == ':') { XX if (*scan != '\0') { XX optarg = scan; XX scan = NULL; XX } XX else { XX optarg = argv[optind]; XX optind++; XX } XX } XX return (c); XX} @//E*O*F getopt.c// chmod u=rw,g=rw,o=rw getopt.c echo x - getpaths.1 sed 's/^XX//' > "getpaths.1" <<'@//E*O*F getpaths.1//' XX.TH GETPATHS LOCAL 6/2/86 XX.SH NAME XXgetpaths \- gather Usenet article paths XX.SH SYNOPSIS XX.B getpaths XX[ XX.BI \-n " file" XX] [ XX.BI \-u XX] [ XX.I distrib XX] XX.SH DESCRIPTION XX.PP XX.I Getpaths XXscans through the news spool and prints out all article paths. Crossposted XXarticles have their paths printed only once. XX.PP XXBy default, usernames are removed from the paths. If the XX.B \-u XXflag is given, usernames are left on. XX.PP XXThe XX.BI \-n " file" XXoption causes paths to be included only from articles that are newer XX(in terms of file modification time) than XX.IR file . XX.PP XXIf no argument is given, the entire news spool will be scanned. The XX.I distrib XXargument is a relative path from /usr/spool/news. For example, XX.PP XXgetpaths net XX.PP XXgets paths from net.*, and XX.PP XXgetpaths net/micro/atari XX.PP XXdoes what you think. XX.SH BUGS XXThe algorithm for eliminating crosspostings only prints a path if XXthe first newsgroup on the Newsgroups: line matches the directory currently XXbeing scanned. So if you say "getpaths na", an article with the header XXline XX.PP XXNewsgroups: net.wanted,na.forsale XX.PP XXwill not have its path printed (though it will print if the groups are given XXin the reverse order). In other words, the first group is always considered XXthe primary group, and other groups are always considered crossposts. XX.SH AUTHOR XXJoseph T. Buck, Entropic Processing, Inc. XX.br XX{pesnta,oliveb}!epimass!jbuck @//E*O*F getpaths.1// chmod u=rw,g=rw,o=rw getpaths.1 echo x - getpaths.c sed 's/^XX//' > "getpaths.c" <<'@//E*O*F getpaths.c//' XX/* XX getpaths.c XX Program to snarf up lots of Usenet paths XX Joseph T. Buck, Entropic Processing, Inc. XX*/ XX#define SPOOLDIR "/usr/spool/news" XX#include <stdio.h> XX#define LINLEN 512 XXchar *strcpy(); XXint stripuser = 1; XX/* This function assumes the current directory is SPOOLDIR, and that XX the paths are of the form net/news/group/2345. XX*/ XXvoid perror(), exit(); XXwp (name, ino) XXchar *name; XXint ino; XX{ XX int status = 0, lng; XX char line[LINLEN], savpath[LINLEN]; XX FILE *fd; XX char *p, *q, *rindex(); XX char curngr[40]; /* current newsgroup */ XX/* Figure out the current newsgroup, by deleting whatever is XX after the last /, and changing '/' to '.'. Skip initial "./" XX if present XX*/ XX if (name[0] == '.' && name[1] == '/') name += 2; XX (void) strcpy (curngr, name); XX p = rindex (curngr, '/'); XX if (p == NULL) return; XX *p = 0; XX while (p >= curngr) { XX if (*p == '/') *p = '.'; XX p--; XX } XX lng = strlen (curngr); XX/* Open the file */ XX if ((fd = fopen (name, "r")) == NULL) { XX perror (name); XX return; XX } XX while (status != 3) { XX if (fgets (line, LINLEN, fd) == NULL) break; XX line[strlen(line)-1] = 0; /* delete \n */ XX if (strncmp (line, "Newsgroups:", 11) == 0) { XX p = line + 11; XX while (*p == ' ' || *p == '\t') p++; XX/* Detect cases like curngr == "net.micro", NG == "net.micro.pc,net.micro" */ XX if (p[lng] && p[lng] != ',' && p[lng] != ' ' && p[lng] != '\t') XX break; XX else if (strncmp (p, curngr, lng) != 0) break; XX status |= 2; XX } XX else if (strncmp (line, "Path:", 5) == 0) { XX p = line + 5; XX while (*p == ' ' || *p == '\t') p++; XX/* Strip the username from the end of the path. Don't write anything XX if there's no ! in the path (a locally posted article) XX*/ XX if (stripuser) { XX q = rindex (line, '!'); XX if (q == NULL) break; XX *q = '\0'; /* Delete the tail */ XX } XX (void) strcpy (savpath, p); XX status |= 1; XX } XX } XX if (status == 3) (void) printf ("%s\n", savpath); XX (void) fclose (fd); XX return; XX} XXmain (argc, argv) XXchar **argv; XX{ XX if (chdir (SPOOLDIR) != 0) { XX (void) fprintf (stderr, "Can't chdir to "); XX perror ("/usr/spool/news"); XX exit (1); XX } XX if (argc > 1 && strcmp (argv[1], "-u") == 0) { XX stripuser = 0; XX argv++; XX argc--; XX } XX if (argc !=2) argv[1] = "."; XX scan_tree (argv[1], wp); XX return 0; XX} XX#include <sys/types.h> XX#include <sys/stat.h> XXis_dir (name) XXchar *name; XX{ XX struct stat sbuf; XX return (stat (name, &sbuf) == 0 && (sbuf.st_mode & S_IFMT) == S_IFDIR); XX} @//E*O*F getpaths.c// chmod u=rw,g=rw,o=rw getpaths.c echo x - histog sed 's/^XX//' > "histog" <<'@//E*O*F histog//' XX#! /bin/sh XXawk -F! '{ count[NF] += 1; if (NF > mx) mx = NF; sum += NF; } XXEND { printf ("Path length statistics\nAverage length: %f\nHistogram:\n", XX sum / NR); XX for (i = 1; i <= mx; i++) printf ("%d %d\n",i,count[i]); }' @//E*O*F histog// chmod u=rwx,g=rwx,o=rwx histog echo x - histog.mod sed 's/^XX//' > "histog.mod" <<'@//E*O*F histog.mod//' XX#! /bin/sh XXawk -F! '{ count[NF] += 1; if (NF > mx) mx = NF; sum += NF; } XXEND { XX printf "Path length statistics\nAverage length: %f\nHistogram:\n", sum / NR; XX for (i = 1; i <= mx; i++) printf ("%d %d\n",i,count[i]); XX}' @//E*O*F histog.mod// chmod u=rwx,g=rwx,o=rwx histog.mod echo x - hostcount sed 's/^XX//' > "hostcount" <<'@//E*O*F hostcount//' XX#! /bin/sh XX# This script takes a list of paths as input. XX# It prints a list of the sites encountered in the paths, sorted XX# according to their frequency. XXawk -F! '{ for (i = 1; i <= NF; i++) print $i }' $* |\ XXsort | uniq -c | sort -nr @//E*O*F hostcount// chmod u=rwx,g=rwx,o=rwx hostcount echo x - makefile sed 's/^XX//' > "makefile" <<'@//E*O*F makefile//' XX# This makefile is appropriate for 4.2bsd. For SysV or V7, XX# replace "scantree" by "oscantree" everywhere. XXCFLAGS = -O XX# Use the following for Sys V. XX# CFLAGS = -O -Dindex=strchr -Drindex=strrchr XXgetpaths: getpaths.o scantree.o XX cc $(CFLAGS) getpaths.o scantree.o -o getpaths @//E*O*F makefile// chmod u=rw,g=rw,o=rw makefile echo x - makefile.42 sed 's/^XX//' > "makefile.42" <<'@//E*O*F makefile.42//' XX# This makefile is appropriate for 4.2bsd or 4.3bsd. XXCFLAGS = -O XXgetpaths: getpaths.o scantree.o getopt.o XX cc $(CFLAGS) getpaths.o scantree.o getopt.o -o getpaths @//E*O*F makefile.42// chmod u=rw,g=rw,o=rw makefile.42 echo x - makefile.sysv sed 's/^XX//' > "makefile.sysv" <<'@//E*O*F makefile.sysv//' XX# This makefile is appropriate for System III and System V. XXCFLAGS = -O -Dindex=strchr -Drindex=strrchr XXgetpaths: getpaths.o oscantree.o XX cc $(CFLAGS) getpaths.o oscantree.o -o getpaths @//E*O*F makefile.sysv// chmod u=rw,g=rw,o=rw makefile.sysv echo x - makefile.v7 sed 's/^XX//' > "makefile.v7" <<'@//E*O*F makefile.v7//' XX# This makefile is appropriate for Version 7, or any system that XX# has neither getopt nor the 4.2 directory library. XXCFLAGS = -O XXgetpaths: getpaths.o oscantree.o XX cc $(CFLAGS) getpaths.o oscantree.o -o getpaths @//E*O*F makefile.v7// chmod u=rw,g=rw,o=rw makefile.v7 echo x - oscantree.c sed 's/^XX//' > "oscantree.c" <<'@//E*O*F oscantree.c//' XX/* XX scan_tree: Joseph T. Buck, Entropic Processing, Inc. XX Old-style version XX scan_tree applies a function to each file in a directory tree XX (excluding directories), and returns the number of files processed. XX This version reads directories directly and will not work with 4.2bsd. XX*/ XX#include <sys/types.h> XX#include <sys/dir.h> XX#define MAXL 256 XXscan_tree (path, func) XXchar *path; XXint (*func)(); XX{ XX char name[MAXL], *strcpy(), *strncpy(), *pos; XX struct direct dirent; XX int fd = open (path), nf = 0, len; XX if (fd < 0) return 0; XX XX (void) strcpy (name, path); XX len = strlen (path); XX name[len] = '/'; XX pos = name + len + 1; XX pos[DIRSIZ] = '\0'; /* req'd in case dirent has length DIRSIZ */ XX while (read (fd, (char *) &dirent, sizeof dirent) == sizeof dirent) { XX if (dirent.d_ino == 0) continue; XX (void) strncpy (pos, dirent.d_name, DIRSIZ); XX if (strcmp (pos, ".") == 0 || strcmp (pos, "..") == 0) continue; XX if (is_dir (name)) nf += scan_tree (name, func); XX else { XX (func) (name, dirent.d_ino); XX nf += 1; XX } XX } XX (void) close (fd); XX return nf; XX} @//E*O*F oscantree.c// chmod u=rw,g=rw,o=rw oscantree.c echo x - scantree.c sed 's/^XX//' > "scantree.c" <<'@//E*O*F scantree.c//' XX/* XX scan_tree: Joseph T. Buck, Entropic Processing, Inc. XX New-style version XX scan_tree applies a function to each file in a directory tree XX (excluding directories), and returns the number of files processed. XX This version uses the 4.2bsd directory access library; another version XX that reads directories directly is available. XX*/ XX#include <sys/types.h> XX#include <sys/dir.h> XX#define MAXL 256 XXscan_tree (path, func) XXchar *path; XXint (*func)(); XX{ XX char name[MAXL], *strcpy(), *pos; XX DIR *dirp = opendir (path); XX struct direct *de; XX int nf = 0, len; XX if (dirp == NULL) return 0; XX (void) strcpy (name, path); XX len = strlen (path); XX name[len] = '/'; XX pos = name + len + 1; XX while ((de = readdir (dirp)) != NULL) { XX if (strcmp (de->d_name, ".") == 0 || strcmp (de->d_name, "..") == 0) XX continue; XX (void) strcpy (pos, de->d_name); XX if (is_dir (name)) nf += scan_tree (name, func); XX else { XX (func) (name,de->d_ino); XX nf += 1; XX } XX } XX (void) closedir (dirp); XX return nf; XX} @//E*O*F scantree.c// chmod u=rw,g=rw,o=rw scantree.c echo x - threelink sed 's/^XX//' > "threelink" <<'@//E*O*F threelink//' XX#! /bin/sh XX# This script takes a list of paths as input. XXcat $* |\ XXawk -F! '{ for (i=1; i<NF-1; i++) printf ("%s!%s!%s\n", $i,$(i+1),$(i+2)); }' |\ XXsort | uniq -c | sort -nr @//E*O*F threelink// chmod u=rwx,g=rwx,o=rwx threelink echo x - twolink sed 's/^XX//' > "twolink" <<'@//E*O*F twolink//' XX#! /bin/sh XX# This script takes a list of paths as input. XXcat $* |\ XXawk -F! '{ for (i = 1; i < NF; i++) printf ("%s!%s\n", $i, $(i+1)); }' |\ XXsort | uniq -c | sort -nr @//E*O*F twolink// chmod u=rwx,g=rwx,o=rwx twolink exit 0