reid@decwrl.dec.com (Brian Reid) (04/03/88)
The monthly processing for the "arbitron" reports that I post takes about 24 hours of CPU time on my office microvax, and uses about 40 megabytes of file space at various times during that processing.

For this reason it is very tedious to debug. I have finally tracked down a problem, caused by a bug in awk when the number of distinct hosts exceeded 8192, which caused the February 1988 posting to be fairly bogus and the March 1988 posting to be incomplete.

I should probably rewrite it in perl, but the program is exceedingly complex, and was hard to get right, and I haven't really got the time to do the conversion just now, so I'm going to keep using the awk version. This means that strange future bugs may crop up--as you all know if you have ever tried to do big programs in awk, it is very buggy and crashes in mysterious ways at random times.

Anyhow, in a few minutes I will (manually) post the March 1988 arbitron results, and sometime in the next week I will retrieve the February 1988 data from my archives, uncompress it, and re-run the program for February 1988. I know there are a lot of you out there who collect these reports every month--I got about 100 "please send me the Feb88 data--it didn't reach my site" requests.

Thanks for your patience.

Brian Reid
self-appointed official statistician
Radio Free USENET
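The limit Reid describes is the classic failure mode of big awk scripts of that era: distinct-key counting with an associative array, where old awk implementations kept fixed-size internal tables. The sketch below is hypothetical (it is not the arbitron program, and the input format is invented); it only illustrates the idiom that breaks once the number of distinct keys grows large, and which gawk handles because it sizes its tables dynamically.

```shell
# Hypothetical sketch of distinct-host counting with an awk associative
# array, assuming one hostname in field 1 of each input line. Historical
# awk versions could misbehave past a few thousand distinct keys; gawk
# has no such arbitrary limit.
printf 'hostA\nhostB\nhostA\nhostC\n' |
awk '
    !seen[$1]++ { nhosts++ }              # first sighting of this host
    END         { print nhosts, "distinct hosts" }
'
# prints: 3 distinct hosts
```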
tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (04/12/88)
X-UUCP-Path: ..!harvard!bu-cs!tower

In article <346@bacchus.DEC.COM> reid@decwrl.UUCP (Brian Reid) writes:
|
|The monthly processing for the "arbitron" reports that I post takes about 24
|hours of CPU time on my office microvax, and uses about 40 megabytes of file
|space at various times during that processing.
|
|For this reason it is very tedious to debug. I have finally tracked down a
|problem, caused by a bug in awk when the number of distinct hosts exceeded
|8192, which caused the February 1988 posting to be fairly bogus and the March
|1988 posting to be incomplete.
|
|I should probably rewrite it in perl, but the program is exceedingly complex,
|and was hard to get right, and I haven't really got the time to do the
|conversion just now, so I'm going to keep using the awk version. This means
|that strange future bugs may crop up--as you all know if you have ever tried
|to do big programs in awk, it is very buggy and crashes in mysterious ways at
|random times.
|
|Thanks for your patience.
|
|Brian Reid
|self-appointed official statistician
|Radio Free USENET

I suggest you try the GNU Project's awk, gawk. Following one of the Project's guidelines, it has no arbitrary size limits. Rick Adams' uunet statistics are now generated using gawk, for much the same reasons.

gawk is available for anonymous ftp from uunet.uu.net and prep.ai.mit.edu (under /u2/emacs), and via uucp from other sites. Questions to gnu@prep.ai.mit.edu .

enjoy -len