fair@ucbvax.ARPA (Erik E. Fair) (09/06/85)
I hate reading predictions of gloom, doom, and ultimate disaster for the
USENET. Lauren Weinstein and Chuq Von Rospach have been doomsaying quite
a bit lately (at least, this is how I perceive their articles of late),
so I decided to get some hard numbers to refute them. I like numbers
because, among other things, you can't argue with them. Unfortunately,
the numbers did not show what I had expected...

Disk Space Usage Summary in /usr/spool/news
on ucbvax by newsgroup (1024 byte/blocks):

International Newsgroups:
	38939	net
	 1575	mod
	  791	fa
	  193	control
	   13	junk
	-----
	41511

National/State/Regional Newsgroups:
	415	ucb
	430	ba
	 13	uc
	 26	ca
	 54	na
	---
	938

Grand Total: 42449 blocks used for netnews article storage

Now an important number for context: ucbvax's netnews system keeps
articles online for 28 days (4*WEEK). Given that, we can get:

	average number of blocks arriving daily:	1516 (~1.5 Megabytes)
	number of articles in the history file
		in net,fa,mod:				15896
	subtotal of disk blocks in net,fa,mod:		41305
	average number of bytes per article
		in net,fa,mod:				2660
	average number of articles arriving daily:	567

Now some history. I did some similar counting in June 1984 on another
system (dual), and got two numbers that I thought were interesting:

	10,000 articles per month
	 1,600 bytes per article
	therefore 16 Megabytes/month

Note: the netnews system on dual only kept netnews for two weeks, so my
sample then was smaller. However, from these numbers we get a

	66% increase in the average size of articles
	58% increase in the number of articles per month

This, in a period of 14 months. What can we do to save ourselves from
imminent information overload? An answer (mine, of course) will be
forthcoming in a series of articles in this newsgroup.

	keeper of the network news for ucbvax,
	and guardian of the gateway,

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU
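The derived figures above can be checked directly from the reported totals. A quick sketch, assuming 1024-byte blocks, the stated 28-day retention, and truncating (integer) division, which is how the posted values appear to have been computed:

```python
# Sanity check of the derived averages in the posting (assumptions:
# 1024-byte blocks, 28-day retention, truncating division).
RETENTION_DAYS = 28
BLOCK_SIZE = 1024

grand_total_blocks = 42449    # all newsgroups
net_fa_mod_blocks = 41305     # net, fa, mod subtotal
net_fa_mod_articles = 15896   # articles in the history file

blocks_per_day = grand_total_blocks // RETENTION_DAYS
bytes_per_article = (net_fa_mod_blocks * BLOCK_SIZE) // net_fa_mod_articles
articles_per_day = net_fa_mod_articles // RETENTION_DAYS

print(blocks_per_day)     # 1516 (~1.5 Megabytes/day)
print(bytes_per_article)  # 2660
print(articles_per_day)   # 567

# Growth in average article size versus the June 1984 sample:
size_growth_pct = round((bytes_per_article / 1600 - 1) * 100)
print(size_growth_pct)    # 66
```

The numbers reproduce exactly, so the growth claim rests only on how comparable the two-week 1984 sample is to the four-week 1985 one.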
jerry@oliveb.UUCP (Jerry Aguirre) (09/21/85)
> Disk Space Usage Summary in /usr/spool/news
> on ucbvax by newsgroup (1024 byte/blocks):
>
> International Newsgroups:
> 	38939	net
> 	 1575	mod
> 	  791	fa
> 	  193	control
> 	   13	junk
> 	-----
> 	41511
>
> 	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

Be careful of how you make the disk usage measurements. The "du"
program lies. I have had it report greater disk usage than the size of
the disk. There are two sources of errors: "sparse" files and links.

If a file has large areas that were seeked over but never written, then
those blocks that would be zero are not allocated, but are marked in a
way that later reads can recognize. The read then simulates the reading
of a block of zeros. An example of this is the DBM history database
used by news. Du calculates blocks by dividing the (logical) file size
by the size of the blocks -- not always accurate!

The second source of error, which is more relevant here, is the
counting of links. The storage of news makes use of links to cross-post
an article to more than one newsgroup. An article may appear in both
net.flame and net.sources, but only one copy is stored. The du program
will count each link separately. This usually doubles du's estimate of
usage, though the exact amount depends on the quantity and size of
cross-postings.

Both these problems seem to have been fixed in the 4.2BSD version, so
whether they affect you depends on what version of Unix you run. The
link problem can still bite you if you manually add the sizes.

My news directory runs about 13Mb with a 2-week expiration, so your
size seems a little high. Either you receive lots of articles that I
don't, you are getting lots of duplicate articles, or your expire is
not deleting everything it should. For 4 weeks you should have under
26Mb, not 41Mb.

The current news history code is a crock and has many loopholes that
allow an article to be received without being entered into the
database. This means that such articles don't get deleted and that
duplicates can be received.
I had this bite me while testing out a batched version of the
ihave/sendme protocol. The target system requested several thousand
articles that were not in its history. They were, however, already in
the news directory. Also, unless I regularly (weekly) run an expire -r,
unexpired articles gradually accumulate.

				Jerry Aguirre @ Olivetti ATC
{hplabs|fortune|idi|ihnp4|tolerant|allegra|tymix|olhqma}!oliveb!jerry
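The link-counting error Jerry describes comes down to whether the traversal remembers which inodes it has already charged. A minimal sketch of the two strategies (this is not the du source; directory and file names are made up for the demo):

```python
# Naive counting charges every directory entry; link-aware counting
# remembers (st_dev, st_ino) pairs, so a cross-posted article hard-linked
# into two newsgroup directories is charged only once.
import os
import tempfile

def naive_size(root):
    """Counts every link separately, like the broken du."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total

def link_aware_size(root):
    """Counts each inode once, like a fixed du."""
    seen = set()
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if (st.st_dev, st.st_ino) not in seen:
                seen.add((st.st_dev, st.st_ino))
                total += st.st_size
    return total

# Demo: one 1000-byte article cross-posted into two groups via a hard link.
spool = tempfile.mkdtemp()
os.makedirs(os.path.join(spool, "net.flame"))
os.makedirs(os.path.join(spool, "net.sources"))
article = os.path.join(spool, "net.flame", "101")
with open(article, "wb") as f:
    f.write(b"x" * 1000)
os.link(article, os.path.join(spool, "net.sources", "202"))

print(naive_size(spool))       # 2000: each link counted separately
print(link_aware_size(spool))  # 1000: one copy actually on disk
```

With heavy cross-posting, the naive figure can approach double the real usage, which matches Jerry's observation.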
usenet@ucbvax.ARPA (USENET News Administration) (09/24/85)
Of course, ucbvax runs 4.3 BSD, with (as you noted, Jerry) a du that
knows how to count links correctly, so that is not a source of error in
my numbers. Also, there are no sparse files in the netnews spool area,
since everything in there is a contiguous text file.

As to your comments about expire, I have had no problems here. I have a
shell script that looks for netnews articles that are greater than 35
days old, which I then check by hand for an Expires: field. To date,
expire has missed on the order of 30 articles in a period of three
months, which is an acceptable error rate for me (since I don't really
want to hack up expire.c).

Don't ask me why we seem to have twice as much netnews online as you
expected. I was merely counting and reporting the results, because I
think they indicate a trend toward both increasing traffic and
increasing size of the individual articles.

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU
henry@utzoo.UUCP (Henry Spencer) (09/26/85)
> The second source of error, which is more relevent here, is the counting
> of links. The storage of news makes use of links to cross post an
> article to more than one news group. An article may appear in both
> net.flame and net.sources but only one copy is stored. The du program
> will count each link seperatly...

Any du that does this is broken. Please note that if it is, it's
because somebody at Berkeley or AT&T or XYZ Vaporboxes Inc. broke it.
The original Bell Labs du on V7 copes with links properly.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
mikel@codas.UUCP (Mikel Manitius) (10/03/85)
> > The second source of error, which is more relevent here, is the counting
> > of links. The storage of news makes use of links to cross post an
> > article to more than one news group. An article may appear in both
> > net.flame and net.sources but only one copy is stored. The du program
> > will count each link seperatly...
>
> Any du that does this is broken. Please note that if it is, it's because
> somebody at Berkeley or AT&T or XYZ Vaporboxes Inc. broke it. The original
> Bell Labs du on V7 copes with links properly.
> -- 
> Henry Spencer @ U of Toronto Zoology
> {allegra,ihnp4,linus,decvax}!utzoo!henry

Really? I remember well that on my old 4.1bsd system, du kept a table
of inodes it had seen before and did not count files it had seen again.
I can confirm this with a test I just did on our System V (5.2): I
created a file of X size, linked it several times, and du reported X
size. I made a subdirectory and linked the file in there too, and du
reported X+1 size, the extra 1 block coming from the allocation for the
directory. I see no problem with du.
-- 
	=======		Mikel Manitius
	==----=====	AT&T ...!{ihnp4!}codas!mikel
	==------=====	Information Systems (305) 869-2462
	===----======	SDSS Regional Support AT&T-IS ETN: 755
	===========	Altamonte Springs, FL
	=======		My opinions are my own.
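The reason a dedup table works at all is the fact Mikel's test relies on: hard links are just extra directory entries for the same inode. A sketch of the experiment in Python (file names are made up; the point is the shared inode number and link count):

```python
# Hard links to one file share a single inode: same st_ino, st_nlink > 1,
# and only one copy of the data on disk. This is exactly what a correct
# du checks to avoid counting a cross-posted article twice.
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "article")
with open(original, "wb") as f:
    f.write(b"x" * 4096)               # "a file of X size"

subdir = os.path.join(d, "subdir")
os.mkdir(subdir)
os.link(original, os.path.join(subdir, "article"))   # link it elsewhere

st1 = os.stat(original)
st2 = os.stat(os.path.join(subdir, "article"))
print(st1.st_ino == st2.st_ino)   # True: same inode, one copy of the data
print(st1.st_nlink)               # 2: two directory entries point at it
```

So a du that reports X (not 2X) here is doing the inode bookkeeping; the remaining question, which Rick answers below in terms of table size, is how much bookkeeping it can afford.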
rick@seismo.CSS.GOV (Rick Adams) (10/10/85)
Why doesn't anyone look at the code for du before spouting off on what
it does or doesn't do?

On 4.[12]bsd, du keeps track of the first 1000 inodes it finds with
multiple links. If it finds more than that, it will not notice that
they are duplicates and will count them as if they were distinct files.

My system has over 7000 articles on line right now. About 1100 of them
are multiple links. So 100 articles are being miscounted, and du is
wrong in this case. If you have 15000 articles (about 1 month's worth),
then du will probably miss at least 1000 of them and be WAY off.

---rick
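The failure mode Rick describes can be simulated: once a fixed-size table of multilink inodes fills up, every later link is treated as a distinct file. A sketch (not the BSD source; entry layout and sizes are illustrative, using Rick's figures of a 1000-entry table and 1100 cross-posted articles):

```python
# Simulate du's bounded table of multilink inodes. Links to inodes seen
# after the table is full are counted again, overstating disk usage.
TABLE_LIMIT = 1000

def du_blocks(entries, limit=TABLE_LIMIT):
    """entries: iterable of (inode, nlink, blocks), one per directory entry."""
    table = set()
    total = 0
    for inode, nlink, blocks in entries:
        if nlink > 1:
            if inode in table:
                continue              # remembered duplicate: skip it
            if len(table) < limit:
                table.add(inode)      # remember it, count it once
            # else: table full -- later links to this inode count again
        total += blocks
    return total

# 1100 cross-posted articles, 3 blocks each, linked into 2 groups apiece:
entries = []
for inode in range(1100):
    entries.append((inode, 2, 3))     # first directory entry
    entries.append((inode, 2, 3))     # second (hard link)

print(1100 * 3)                       # 3300 blocks really on disk
print(du_blocks(entries))             # 3600: the last 100 counted twice
```

The overcount grows with every multilink inode past the table limit, which is why a month's worth of articles (well beyond 1000 cross-posts) makes du "WAY off."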