zweig@cs.uiuc.edu (Johnny Zweig) (04/18/91)
Excuse/ignore me if this has been beaten to death already, since I got tired of the Jargon File discussion and didn't read the later postings.

Consider a big file F that somebody is about to post to the net. There are those that say "yikes! Now every site on the net has to deal with F, and since they don't all want it, it is a tremendous waste of net bandwidth and this should be made available via anonymous FTP!" Let us think about this.

The best case, in terms of bandwidth, is a hypothetical Internet consisting only of a backbone that all the hosts are connected to -- so when anyone FTPs to the archive site, their packets don't go through any hosts besides the one they are connected to. Call this a host expansion (HE) factor of 1. If, on average, my packets have to go through another host before they get to the FTP site, call this an effective HE of two, and so forth for higher values of HE.

Now consider the effective number of people at each site that will FTP a copy of the file F. Call this the expected dispersal (EP). If the file is boring, EP may be far less than 1; if it is popular, it may be up to a dozen or so.

Now consider an FTP session that grabs a copy of F. Each data packet will be large, since F is large, so we can approximate it by the situation where packet flow is all in one direction (if there is substantial per-packet processing going on, this factor may be nonnegligible, and we need to magnify our factors accordingly). Each copy of F will go through HE communication links, and each host will snarf an average of EP copies, so the net gets a total of N (the number of hosts) times HE times EP bandwidth gobbled up by our friend F.

Now consider a posting of F to a newsgroup. Assume all N hosts get a copy, so there will be N units of bandwidth used up.
(Some hosts may send out multiple copies of F, but nntp is set up so that, for the most part, each copy traverses a communication link once; this breaks down for sites that sit between other sites and their nntp servers -- the HE factor comes into play here, but is confounded because HE measures how many hops each site is away from a single site, which will typically be much larger than the distance between the average site and its newsfeed.)

In other words, when HE*EP < 1, it makes sense to make F available via anonymous FTP to conserve bandwidth. But for popular files, HE*EP can be large (what is the Internet diameter these days? 5? 7?). I don't have the data handy, but I would bet that EP is about 3 for the Jargon File (this is a guesstimate from my experience with fellow grad students, most of whom grab a copy, offset by another guesstimate of how many sites aren't so full of hackers/geeks/dweebs). If HE is 4, this means making the file available via anonymous FTP would use about twelve times more Internet host-bandwidth than posting to the well-balanced Usenet.

Has everyone forgotten how horrible Bitnet chat was, and why relay was written? Putting data in a central site where each copy not only goes _to_ many hosts, but _through_ many hosts on its way, is a bad thing. It seems that anonymous FTP is a win only for large files of moderate to low interest.

Disk usage is actually what most people were bitching about. There is a program called expire that, sadly, doesn't treat huge postings any differently from small ones. This is the problem, at least as far as I can figure. In terms of network bandwidth, posting is lots better than FTPing.

-Johnny Post
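[The back-of-envelope comparison above can be sketched in a few lines of Python. This is my own restatement of Zweig's model, with function names and the host count N = 1000 chosen for illustration; the only numbers taken from the post are his guesstimates HE = 4 and EP = 3 for the Jargon File.]

```python
# Sketch of the FTP-vs-posting bandwidth model from the post above.
# FTP: each of the N hosts fetches EP copies on average, and each copy
# traverses HE communication links, so total cost is N * HE * EP.
# Posting: each of the N hosts receives one copy, roughly one link each.

def ftp_bandwidth(n_hosts, he, ep, file_size=1.0):
    """Total link-traversals (in units of file_size) via anonymous FTP."""
    return n_hosts * he * ep * file_size

def post_bandwidth(n_hosts, file_size=1.0):
    """Total link-traversals for a newsgroup posting: one copy per host."""
    return n_hosts * file_size

# Zweig's guesstimates for the Jargon File: HE = 4, EP = 3.
n = 1000  # illustrative host count; the ratio is independent of N
ratio = ftp_bandwidth(n, he=4, ep=3) / post_bandwidth(n)
print(ratio)  # -> 12.0: FTP costs about a dozen times more host-bandwidth

# FTP only wins when HE * EP < 1, e.g. a boring file few sites fetch:
print(ftp_bandwidth(n, he=2, ep=0.1) < post_bandwidth(n))  # -> True
```

Note that N cancels out of the comparison, which is the point of the argument: whether FTP or posting is cheaper depends only on whether HE * EP is above or below 1.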
wwm@pmsmam.uucp (Bill Meahan) (04/18/91)
Johnny: Your analysis is very elegant, BUT what about the THOUSANDS of us who are NOT on the Internet? Hard as it may be to believe, big bunches of us only have relatively low speed (2400 baud or even less) telephone connections to some newsfeed host, perhaps only once or twice a day. We sure ain't usin' NNTP on a FDDI net :-) Very large postings can chew up my phone line pretty fast!

For source postings, well, I'll live with that, since it is MY choice whether or not to have my upstream feed send me the sources groups. Even then, considerate posters (e.g. Larry Wall with the newest version of Perl) keep the volume down by spreading out the 36 or so pieces over a week or so. Even then, full postings are rare. Only when the volume of diffs will exceed the volume of a new posting will FUBAR_99.74 be released.

--
Bill Meahan                  | Product Design & Testing Section
Production Test Engineer     | Starter Motor Engineering
wwm@pmsmam                   | +1 313 484 9320