[news.groups] apology for gaps in arbitron result posting

reid@decwrl.dec.com (Brian Reid) (04/03/88)

The monthly processing for the "arbitron" reports that I post takes about 24
hours of CPU time on my office microvax, and uses about 40 megabytes of file
space at various times during that processing.

For this reason it is very tedious to debug. I have finally tracked down a
problem caused by a bug in awk that shows up when the number of distinct hosts
exceeds 8192. That bug made the February 1988 posting fairly bogus and the
March 1988 posting incomplete.

I should probably rewrite it in perl, but the program is exceedingly complex
and was hard to get right, and I don't have time for the conversion just now,
so I'm going to keep using the awk version. This means that strange bugs may
crop up in the future. As you all know if you have ever tried to write big
programs in awk, awk is very buggy and crashes in mysterious ways at random
times.

Anyhow, in a few minutes I will (manually) post the March 1988 arbitron
results. Sometime in the next week I will retrieve the February 1988 data
from my archives, uncompress it, and re-run the program for February 1988.
I know there are a lot of you out there who collect these reports every
month: I got about 100 "please send me the Feb88 data, it didn't reach my
site" requests.

Thanks for your patience.

Brian Reid
self-appointed official statistician
Radio Free USENET

tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (04/12/88)

X-UUCP-Path: ..!harvard!bu-cs!tower


In article <346@bacchus.DEC.COM> reid@decwrl.UUCP (Brian Reid) writes:
|
|The monthly processing for the "arbitron" reports that I post takes about 24
|hours of CPU time on my office microvax, and uses about 40 megabytes of file
|space at various times during that processing.
|
|For this reason it is very tedious to debug. I have finally tracked down a
|problem, caused by a bug in awk when the number of distinct hosts exceeded
|8192, which caused the February 1988 posting to be fairly bogus and the March
|1988 posting to be incomplete.
|
|I should probably rewrite it in perl, but the program is exceedingly complex,
|and was hard to get right, and I haven't really got the time to do the
|conversion just now, so I'm going to keep using the awk version. This means
|that strange future bugs may crop up--as you all know if you have ever tried
|to do big programs in awk, it is very buggy and crashes in mysterious ways at
|random times.
|
|Thanks for your patience.
|
|Brian Reid
|self-appointed official statistician
|Radio Free USENET

I suggest you try the GNU Project's awk, gawk.  Following one of the
Project's guidelines, it has no arbitrary size limits.  Rick Adams's
uunet statistics are now generated using gawk, for much the same
reasons.
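Neither post describes the arbitron input format or exactly what goes wrong inside awk past 8192 hosts. Purely as a rough illustration of what "no arbitrary size limits" buys you, here is a small Python sketch (not Brian's program) that tallies records per host in a dictionary that simply grows as needed. It assumes a hypothetical input of one whitespace-separated record per line with the host name as the first field; the function name `tally_hosts` is likewise made up for the example.

```python
from collections import Counter


def tally_hosts(lines):
    """Count records per host.

    Hypothetical input format: one record per line, host name in the first
    whitespace-separated field. The Counter (a dict) grows as new hosts
    appear, so there is no fixed cap on the number of distinct hosts.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Synthetic data: 10000 distinct hosts, deliberately more than 8192.
    sample = [f"host{i} news.groups" for i in range(10000)]
    counts = tally_hosts(sample)
    print(len(counts))  # 10000
```

The point of the design is only that the table size is bounded by available memory rather than by a constant compiled into the tool, which is the property gawk's "no arbitrary size limits" guideline is after.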

gawk is available for anonymous ftp off of uunet.uu.net,
prep.ai.mit.edu (under /u2/emacs), and via uucp from other sites.

Send questions to gnu@prep.ai.mit.edu.

enjoy -len