mason@tmsoft.UUCP (07/08/87)
Here's the stuff I promised. I would like to hear of non-portabilities in
the code, and of any sites that you KNOW are in Canada that aren't in the
list.

I will be on vacation until early August. At that time I will re-post this
with fixes to the whole net (in case other people want to analyse their
gateways) and ask uucp/news admins around the country to send me their
results.

So, play with these for a month, look at the list of your benefactors (you
could even send them a thank-you note :-) & I'll talk to you in August.

Have a good July
	../Dave Mason, TM Software Associates (Compilers & System Consulting)
	..!{utzoo seismo!mnetor utcsri utgpu lsuc}!tmsoft!mason

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	README.stats
#	Makefile
#	can.sites
#	stats.c
# This archive created: Wed Jul 8 09:11:53 1987
export PATH; PATH=/bin:$PATH
if test -f 'README.stats'
then
	echo shar: will not over-write existing file "'README.stats'"
else
cat << \SHAR_EOF > 'README.stats'
This program is written to analyse news and mail paths in order to
determine the mail and news gateways into a region.

It takes a list of file names on standard input, and will scan each of the
files looking for headers of the form:
	Path:
	Date:
	Posted:
	Received:
	From_
	>From_
and extracts dates and site-names from these headers. It does a fairly
simple analysis on the dates & splits the transit period into various
categories (note that it only understands 2 date formats, but fortunately
almost all the dates we have here fit one of the formats).

A new option (thanks to Rayan's hassling) is the -p option. This says that
standard input is a sequence of paths, one per line. If the 'p' is
immediately followed by a character, everything up to and including the
first occurrence of that character on each input line will be ignored.
This means that if you have a list of mail paths that you have collected
somewhere you can see an analysis of them. A neat use of this is to analyse
your pathalias database (if you have one) to see who your benefactors are
for outgoing mail. A command of the form:
	./stats -p' ' can.sites </usr/lib/uucp/paths
will do this.

It analyses the paths to determine how much mail/news is local, how much is
entirely within the region, how much was brought into the region directly,
and how much came via each of the up to 28 gateways. The point of all this
is to see if an alternate organization of the network in the region would
be in some sense "better".

The data file can.sites must have the list of sites in any order, one per
line. Blank lines, and lines starting with '#', are ignored. Domains that
are entirely within the region can be listed. The program uses a binary
search (unfortunately included because BSD Unix doesn't come with the
search library), so it first sorts the list.

This is part of an effort by /usr/group/cdn to determine if it should
support a Canadian Hub node. (Note that mail/news comes from many sites
that are not listed in the maps, therefore you may want to edit the
can.sites file to make it reflect your local reality, though I'd like to
know paths to them so that any future comments can go out to all).

If you know of errors in the can.sites file (either sites I have listed as
Canadian that aren't, or sites that I have omitted that are) please send
me these amendments. I will send this out again, with an amended list, and
any bug fixes, to comp.sources.unix (or something similar) in August, so
please try this.
SHAR_EOF
fi # end of overwriting check
if test -f 'Makefile'
then
	echo shar: will not over-write existing file "'Makefile'"
else
cat << \SHAR_EOF > 'Makefile'
#debugging on SysV: CFLAGS=-g -O
#for BSD: CFLAGS=-O -DBSD
#for SysV: CFLAGS=-O
test: stats
#	find /u/*/Mail -type f -print | ./stats can.sites
#	find /usr/spool/news -type f -print | ./stats can.sites
	./stats -p' ' can.sites </usr/lib/uucp/paths
stats: stats.c
	${CC} ${CFLAGS} -o stats stats.c
SHAR_EOF
fi # end of overwriting check
if test -f 'can.sites'
then
	echo shar: will not over-write existing file "'can.sites'"
else
cat << \SHAR_EOF > 'can.sites'
# assembled by Dave Mason <mason@tmsoft> from map data 87.07.07
# don't hold me responsible, but I think these are most of
# the Canadian sites that are directly reachable
# the following sites are 'DIRECT' calls to listed sites
# so presumably they are Canadian sites
# In any case they're not well connected, so shouldn't mess up the stats
# local to alberta sites
acsedm
astotin
cadomin
ggc0
# local to british columbia sites
attvcr
bby-bc
fornax
#all these are linked to mprg, I guess
dssmv0
handel
joplin
liszt
mprc
mprd
mprott
waters
# local to ontario sites
actel
alias
bml
bnr-ai
bnr-mtl
bnr-rsc
bvax
# there's a problem here. zorac talks to daq, but there's a daq in texas
daq
cmpscr
# local to quebec sites
cambs
# miscellany
mkv020
utihs
redvax
utmars
sunedm
cavell
uwocc1
uvicar
winston
uogvax2
shoshin
vending
bruno
yup
wildcan
attcan
jasper
spyvan
manitou
orchid
# domains - all I know about
.cdn
.sq.com
.toronto.edu
.unicus.com
.waterloo.edu
# this isn't official, but it is referenced somewhere
.can
sq.com
unicus.com
aesat
alberta
alpha0
amigpx
aquila
arcsun
arlene
array
attila
aucs
auvax
biomel
blues.db
bnr-di
bnr-vpa
braegen
cae
calgary
cdl
chp
clan
cle
clouso
clunk
cmq1
cnrail
cognos
cott01
crcmar
csb
dalcs
dalcsug
dalstat
danger
darwin
dataspan
daver
dciem
deepthot
dmnhack
dvlmarv
eclectic
edm
electro
elora
ers
et
force10
forgen
gandalf
garfield
gass
geac
gen1
genat
hcr
hcrvax
hcrvx1
hcrvx2
idacom
image.me
iros1
jazz.db
julian
kimnovax
lathe.me
lethe
lightning
looking
loyalist
lsuc
maccs
marcel
mars.math
math
matrix
mcgill-vision
melody.db
methods
mgvax
micomvax
mill.me
mks
mmainc
mnetor
molihp
moore
mosart
mosca
mprg
mprvaxa
mprvaxb
ms
msitxt
munucs
musocs
ncc
nermal
nrcaer
nrcctis
nte-scg
nvanbc
odie
odyssee
ois.db
onfcanim
ontmoh
orcisi
oscvax
othervax
parkridge
parkwood
pcchui.db
pcfred.db
pcs
pcssun
per
pmbrc
psddevl
quality
qucis
radha
ragno
regina
rhodnius
rhythm.db
rom
ryenat
ryesone
sask
sfucmpt
sickkids
skalar
skatter
skeng
skerth
skorpio
skul
skvlsi
skyblu
spectrix
spycal
sq
sqrt
squad
squat
squawk
squeak
squish
stars
stjoes
strat
syntron
tango.db
tap.me
tcc3b1
teknica
teletron
telly
thunder
tmsoft
trigraph
tsgfred
tslanpar
tunscs
unicus
uottawa
uqv-mts
uvicctr
van-bc
vivarium
xicom
xios
yetti
yugauss
yumath
zap
zaphod
zorac
# U of British Columbia
ubc-andrew
ubc-bdcvax
ubc-cryos
ubc-cs
ubc-cs4
ubc-csfs1
ubc-csgrads
ubc-dsrg
ubc-ean
ubc-mts
ubc-rtec
ubc-ug1
ubc-ug2
ubc-ug3
ubc-ug4
ubc-ug5
ubc-ugserver
ubc-vision
# U of Toronto
# lots of these weren't listed, but they're pretty obvious
# though 'me', 'ecf' shouldn't be on news path names
# and I'm not sure about these *.ai names
ephemeral.ai
graeme.ai
maria.ai
ray.ai
thera.ai
utai
utas1
utas2
utas3
utcdfa
utcdfb
utcdfc
utcga
utcgb
utcseri
utcsri
utcssca
utcsscb
utdgp
ecf
utecf16k
utecfb
utecfe
utecfmv01
utecfmv02
utecfmv03
utecfmv04
utegc
uteuler
utfang
utflis
utfyzx
utglg
utgpu
# and by its old name
utcs
uthub
utjaws
utmanitou
me
utme
utmolar
utradio
utrim
utscar
utstat
utteeth
utterly.ai
uturing
utubrutus
utworm
utzoo
# University of Waterloo
watcgl
watdcsu
watdragon
wateng
water
watmath
wataco
watacs
watale
watarts
watbank
watbun
watcal
watcsg
watdaisy
watdcs
watimp
watlager
watlion
watmad
watmsg
watmum
watnot
watrose
watsos1
watsos2
watstat
watsup1
watvlach
watvlsi
watwml
SHAR_EOF
fi # end of overwriting check
if test -f 'stats.c'
then
	echo shar: will not over-write existing file "'stats.c'"
else
cat << \SHAR_EOF > 'stats.c'
/* Copyright 1987 TM Software Associates Inc, Toronto, Canada */
/* Cannot be sold for profit without permission from the copyright holder */
/* Scan mail and news to gather traffic statistics */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <ctype.h>
#include <strings.h>
extern char *strchr(),*strrchr(); /* Rayan says BSD needs this */
#define NNODES 400
#define NTYPES 32
#define PREST 35
#ifdef BSD
#define strchr index
#define strrchr rindex
#endif
char buff[1024];
#define SITESIZE 20
struct node{char name[SITESIZE];};
struct node can_sites[NNODES+5];
int datestats[NTYPES*2];
int pathstats[NTYPES*2];
char *sitenames[NTYPES]={"local path","Canadian path","Direct from Abroad"};
char pathrest[NTYPES][PREST+1];
int warning,nsites=3;
unsigned nnodes;
struct stat filestat;
main(argc,argv)
int argc;
char **argv;
{
	char filename[512];
	register char *cp;
	register int ismail,i,total,pathflag=0;
	static char *time_periods[NTYPES] = {
		"time warp",
		"<1 hour",
		"1-3 hours",
		"4-7 hours",
		"8-11 hours",
		"12-24 hours",
		"2 days",
		"3 days",
		"4-5 days",
		"6-7 days",
		"a week or more",
		"?????11???",
		"?????12???",
		"?????13???",
		"?????14???"};
	time_periods[NTYPES-1]="invalid times";
	if (argc==3 && argv[1][0]=='-' && argv[1][1]=='p') {
		--argc;
		++argv;
		pathflag=argv[0][2];	/* character after -p, if any */
		if (!pathflag) pathflag= -1;
	}
	if (argc!=2) {
		fprintf(stderr,"Usage: stats [-p[c]] can.sites\n");
		exit(1);
	}
	get_sites(argv[1]);
	while (gets(filename)!=NULL) {
		if (pathflag) {
			/* input line is a path; skip past the separator if given */
			if (pathflag>0 && (cp=strchr(filename,pathflag))) ++cp;
			else cp=filename;
			do_pathstats(1,cp);
			++datestats[NTYPES*2-1];
		}
		else {
			/* input line is a file name; skip directories */
			if (stat(filename,&filestat)==0 && (filestat.st_mode&0170000)!=0040000)
				scan(filename);
		}
	}
	/* first pass reports news (offset 0), second pass mail (offset NTYPES) */
	for (ismail=0;ismail<=NTYPES;ismail+=NTYPES) {
		total=0;
		for (i=0;i<NTYPES;++i) total += datestats[ismail+i];
		if (total) {
			if (pathflag) printf("\nPath Analysis\n");
			else {
				if (ismail) printf("\nMail\n");
				else printf("\nNews\n");
				for (i=0;i<NTYPES;++i) {
					if (datestats[ismail+i])
						printf(" %20.20s %4d %3d%%\n",
							time_periods[i],
							datestats[ismail+i],
							datestats[ismail+i]*100/total);
				}
			}
			printf(" Gateways from outside Canada\n");
			for (i=0;i<NTYPES;++i) {
				if (pathrest[i][0] && (cp=strchr(pathrest[i],' '))) *cp='\0';
				if (pathstats[ismail+i])
					printf(" %20.20s %4d %3d%% %s\n",
						sitenames[i],
						pathstats[ismail+i],
						pathstats[ismail+i]*100/total,
						pathrest[i]);
			}
		}
	}
	return(0);
}
scan(filename)
char *filename;
{
	register FILE *fp;
	char pathbuf[512],datebuf[512],postbuf[512],recvbuf[512],
		frombuf[512],lastbuf[512];
	register char *cp;
	register int ismail,realmail=0;
	if ((fp=fopen(filename,"r"))!=NULL) {
		frombuf[0]='\0';
		for(;;) {
			pathbuf[0]='\0';
			datebuf[0]='\0';
			postbuf[0]='\0';
			recvbuf[0]='\0';
			lastbuf[0]='\0';	/* do_datestats may read this even if no header set it */
			while (inline(fp)) {
				if (strncmp(buff,"Path:",5)==0) {
					/* drop the first (local) site from the news path */
					if (cp=strchr(&buff[6],'!')) ++cp;
					else cp = &buff[6];
					strcpy(pathbuf,cp);
				}
				if (strncmp(buff,"Date:",5)==0) strcpy(datebuf,&buff[6]);
				if (strncmp(buff,"Posted:",7)==0) strcpy(postbuf,&buff[8]);
				if (!recvbuf[0] && strncmp(buff,"Received:",9)==0) strcpy(recvbuf,&buff[10]);
				if (strncmp(buff,"Received:",9)==0) strcpy(lastbuf,&buff[10]);
				if (strncmp(buff,">From ",5)==0) strcpy(lastbuf,&buff[6]);
				if (strncmp(buff,"From",4)==0
				    && (buff[4]==' ' || buff[4]=='\t')) {
					strcpy(frombuf,&buff[5]);
					++realmail;
				}
				if (!frombuf[0] && strncmp(buff,"Unix-From:",10)==0) strcpy(frombuf,&buff[11]);
			}
			if (!datebuf[0]) strcpy(datebuf,postbuf);
			ismail=0;
			if (frombuf[0]) {
				++ismail;
				strcpy(pathbuf,frombuf);
			}
			if (pathbuf[0]) {
				do_datestats(ismail,frombuf,recvbuf,datebuf,lastbuf);
				do_pathstats(ismail,pathbuf);
			}
			if (!realmail) break;
			/* mailbox: skip the body up to the next "From " line */
			while (!feof(fp)) {
				(void) inline(fp);
				if (strncmp(buff,"From",4)==0 && (buff[4]==' ' || buff[4]=='\t')) {
					strcpy(frombuf,&buff[5]);
					break;
				}
			}
			if (feof(fp)) break;
		}
		fclose(fp);
	}
}
get_sites(fname)
char *fname;
{
	register char *cp;
	register FILE *fp;
	nnodes=0;
	if (fp=fopen(fname,"r")) {
		/* read whole lines into buff so that long comment lines are not
		   split into pieces and blank lines really are skipped */
		while(fgets(buff,sizeof buff,fp)) {
			if (cp=strchr(buff,'\n')) *cp='\0';
			if (!buff[0] || buff[0]=='#') continue;
			if (nnodes>=NNODES) {
				fprintf(stderr,"stats: too many nodes in %s\n",fname);
				exit(1);
			}
			if (strlen(buff)>=SITESIZE) {
				fprintf(stderr,"stats: node name too long: %s\n",buff);
				exit(1);
			}
			strcpy(can_sites[nnodes].name,buff);
			++nnodes;
		}
		fclose(fp);
		qsort((char *)can_sites,nnodes,SITESIZE,strcmp);
	}
}
/* binary search of the sorted can_sites table */
char *bsearch(cp)
char *cp;
{
	register struct node *hp,*lp,*mp;
	register int i;
	hp = &can_sites[nnodes];
	lp = can_sites;
	while (hp>lp) {
		mp = ((hp-lp)>>1) + lp;
		if ((i=strcmp(mp->name,cp))==0) return (mp->name);
		else if (i<0) lp=mp+1;
		else hp=mp;
	}
	return (NULL);
}
do_pathstats(ismail,pathbuf)
char *pathbuf;
{
	register int patht;
	register char *cp,*tp,*sp,*pp,*np;
	if (cp=strchr(pathbuf,'!')) {
#ifdef DEBUG
		printf("%s",pathbuf);
#endif
		cp=pathbuf;
		sp=pp=NULL;
		/* walk the hops until one is not a known site in the region */
		while (np=strchr(cp,'!')) {
			*np='\0';
			for (tp=cp;tp<np;++tp) if (isupper(*tp)) *tp=tolower(*tp);
			if ((tp=strrchr(cp,'.')) && strcmp(tp,".uucp")==0) *tp='\0';
			pp=sp;
			sp=bsearch(cp);
			tp=cp;
			/* no exact match: try each trailing domain, e.g. .toronto.edu */
			while (!sp && (tp=strchr(tp+1,'.'))) sp=bsearch(tp);
			if (!sp) break;
			cp=np+1;
		}
		if (!pp) {
			if (np) {
				patht=2;
				if (!pathrest[patht][0]) strncpy(pathrest[patht],cp,PREST);
			}
			else patht=1;
		}
		else if (np) {
			/* pp is the gateway: the last site in the region before
			   the first outside hop */
			for (patht=3;patht<nsites;++patht)
				if (sitenames[patht]==pp) break;
			if (patht==nsites)
				if (nsites<NTYPES) {
					sitenames[patht]=pp;
					*np='!';
					strncpy(pathrest[patht],cp,PREST);
					++nsites;
				}
				else sitenames[--patht]="other gateways";
#ifdef DEBUG
			printf("%d %d %s\n",patht,nsites,pp);
#endif
		}
		else patht=1;
	}
	else patht=0;
	++pathstats[ismail*NTYPES+patht];
}
do_datestats(ismail,frombuf,recvbuf,datebuf,lastbuf)
char *frombuf,*recvbuf,*datebuf,*lastbuf;
{
	register int i;
	time_t sent,recvd;
	/* fall back on the file's modification time as the arrival time */
	recvd=filestat.st_mtime;
	if (!scandate(frombuf,&recvd,"From")) (void) scandate(recvbuf,&recvd,"Recv");
	if (!datebuf[0] || ( !scandate(datebuf,&sent,"Date") && !scandate(lastbuf,&sent,">From")))
		i=NTYPES-1;
	else if (recvd<sent) i=0;
	else {
		i=(recvd-sent)/60;	/* transit time in minutes */
		if (i<60) i=1;
		else if (i<4*60) i=2;
		else if (i<8*60) i=3;
		else if (i<12*60) i=4;
		else if (i<24*60) i=5;
		else if (i<48*60) i=6;
		else if (i<72*60) i=7;
		else if (i<120*60) i=8;
		else if (i<7*24*60) i=9;
		else i=10;
	}
#ifdef DEBUG
	printf("%d %d %d\n",i,sent,recvd);
#endif
	++datestats[ismail*NTYPES+i];
}
/* what is a label for the header being parsed, used in debug output */
scandate(datebuf,when,what)
char *datebuf,*what;
time_t *when;
{
	register char *ap,*cp=datebuf;
	register long i,j;
	char *mp,*dp,*yp,*tp;
	struct namenum{char name[3];int num;};
	/* offsets from GMT in minutes */
	static struct namenum timezones[] = {
		{"GMT",0},
		{"NST",-4*60+30}, {"NDT",-3*60+30},
		{"AST",-4*60}, {"ADT",-3*60},
		{"EST",-5*60}, {"EDT",-4*60},
		{"CST",-6*60}, {"CDT",-5*60},
		{"MST",-7*60}, {"MDT",-6*60},
		{"PST",-8*60}, {"PDT",-7*60},
		{"YST",-9*60}, {"YDT",-8*60}};
	/* days in the year before the first of each month */
	static struct namenum months[12] = {
		{"Jan",0}, {"Feb",31}, {"Mar",59}, {"Apr",90},
		{"May",120}, {"Jun",151}, {"Jul",181}, {"Aug",212},
		{"Sep",243}, {"Oct",273}, {"Nov",304}, {"Dec",334}};
	if (!*cp) return(0);
	while (*cp) {
		mp=NULL;
		if (!isdigit(*cp)) {
			/* first format: month name, day, time, then year */
			if (isdigit(cp[5]) && cp[9]==':' && isupper(*cp) && isalpha(cp[1])
			    && isalpha(cp[2]) && isdigit(cp[17]) && isdigit(cp[18])) {
				mp=cp;
				dp=cp+4;
				if (*dp==' ') ++dp;
				yp=cp+17;
				tp=cp+7;
			}
		}
		else {
			/* second format: day, month name, year, then time */
			ap=cp+1;
			if (*ap==' ' && isdigit(*cp)) ap=cp;
			if (isdigit(*ap) &&
			    ap[1]==' ' && isupper(ap[2]) && isalpha(ap[3]) && isalpha(ap[4])
			    && ap[5]==' ' && isdigit(ap[6]) && isdigit(ap[7])) {
				mp=ap+2;
				dp=cp;
				yp=ap+6;
				tp=ap+9;
			}
		}
		if (mp==NULL) {
			++cp;
			continue;
		}
		j=atoi(yp);
		if (!islower(mp[1])) mp[1]=tolower(mp[1]);
		if (!islower(mp[2])) mp[2]=tolower(mp[2]);
		for (i=0;i<12;++i)
			if (strncmp(months[i].name,mp,3)==0) {
				i=months[i].num;
				if (((j&3)==0) && i>50) ++i;	/* leap year, after February */
				break;
			}
		if (i==12) {
			++cp;
			continue;
		}
		if (j<100) j+=1900;
		i += (j-1969)/4;	/* leap days since 1970 */
		i += (j-1970)*365 + atoi(dp) - 1;
		i *= 24*60*60;
#ifdef DEBUG
		printf("%s:%s ",what,cp);
#endif
		cp=ap+8;
		if (isdigit(tp[0])) i+=atoi(tp)*60*60;
		else i+=atoi(&tp[1])*60*60;
		i+=atoi(&tp[3])*60;
		ap = tp+5;
		if (*ap==':') ap += 3;	/* skip optional seconds */
		while (*ap==' ') ++ap;
		if (islower(*ap)) *ap=toupper(*ap);
		if (islower(ap[1])) ap[1]=toupper(ap[1]);
		if (islower(ap[2])) ap[2]=toupper(ap[2]);
		for (j=0;j<(sizeof timezones)/(sizeof(struct namenum));++j)
			if (strncmp(timezones[j].name,ap,3)==0) {
				/* the table holds zone minus GMT, so subtract to get GMT */
				*when = i - timezones[j].num*60;
#ifdef DEBUG
				printf("%d %s",*when,asctime(gmtime(when)));
#endif
				return(1);
			}
		++cp;
	}
	if (warning) printf("No recognized date:%s\n",datebuf);
	return(0);
}
/* read one header line into buff, joining continuation lines;
   returns 0 at the blank line that ends the headers */
int inline(fp)
register FILE *fp;{
	register char *cp=buff;
	register int c,flag=0;
	while ((c=getc(fp))!=EOF) {
		if (flag==0) {
			if (c=='\n') return(0);
			else ++flag;
		}
		else if (flag<0) {
			if (c!=' ' && c!='\t') {
				ungetc(c,fp);
				*cp='\0';
				return(1);
			}
			else flag= 1-flag;
		}
		if (c=='\n') flag= -flag;
		else *cp++ = c;
	}
	*cp='\0';
	return(flag!=0);
}
SHAR_EOF
fi # end of overwriting check
# End of shell archive
exit 0
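
For anyone who wants to see the path classification described in
README.stats on its own, here is a minimal sketch in C. It is not the code
from stats.c. The four-entry region_sites table, the enum names and the
in_region()/classify() functions are all made up for illustration, it uses
the C library bsearch() where stats.c carries its own, and it leaves out
the lower-casing and ".uucp" stripping that do_pathstats does. The idea is
the same though: walk the bang path hop by hop, look each hop up (falling
back to its trailing domains), and stop at the first hop outside the region.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum path_class { PATH_LOCAL, PATH_REGIONAL, PATH_DIRECT, PATH_GATEWAY };

/* Hypothetical sample of a region list, already in strcmp() order.
   The real program reads can.sites and sorts it. */
static const char *const region_sites[] = {
    ".toronto.edu", "tmsoft", "utgpu", "utzoo"
};

/* bsearch() comparator: key is a string, elem points at a table entry */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* A site is in the region if its full name, or any trailing domain
   starting at a '.', appears in the table. */
int in_region(const char *name)
{
    size_t n = sizeof region_sites / sizeof region_sites[0];
    const char *p;

    for (p = name; p != NULL && *p != '\0'; p = strchr(p + 1, '.'))
        if (bsearch(p, region_sites, n, sizeof region_sites[0], cmp_name))
            return 1;
    return 0;
}

/* Classify a bang path. For PATH_GATEWAY, gateway receives the last
   in-region hop seen before the first outside hop; otherwise it is "". */
enum path_class classify(const char *path, char *gateway, size_t gwsize)
{
    char hop[64];
    const char *start = path, *bang;
    int seen_region = 0;

    assert(gwsize > 0);
    gateway[0] = '\0';
    if (strchr(path, '!') == NULL)
        return PATH_LOCAL;              /* no hops at all */

    /* the component after the last '!' is the user, so it is not examined */
    while ((bang = strchr(start, '!')) != NULL) {
        size_t len = (size_t)(bang - start);

        if (len >= sizeof hop)
            len = sizeof hop - 1;
        memcpy(hop, start, len);
        hop[len] = '\0';

        if (!in_region(hop))
            return seen_region ? PATH_GATEWAY : PATH_DIRECT;

        strncpy(gateway, hop, gwsize - 1);
        gateway[gwsize - 1] = '\0';
        seen_region = 1;
        start = bang + 1;
    }
    gateway[0] = '\0';                  /* every hop was inside the region */
    return PATH_REGIONAL;
}
```

With the sample table, classify("utzoo!seismo!mcvax!user", gw, sizeof gw)
reports a gateway path and leaves "utzoo" in gw, "seismo!mason" comes out
as direct from outside, and "ai.toronto.edu!utzoo!user" stays within the
region because ".toronto.edu" is listed as a domain.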