[comp.misc] Ruminations on bandwidth

zweig@cs.uiuc.edu (Johnny Zweig) (04/18/91)

Excuse/ignore me if this has been beaten to death already, since I got tired
of the Jargon File discussion and didn't read the later postings.

Consider a big file F that somebody is about to post to the net.  There are
those who say "yikes!  Now every site on the net has to deal with F, and since
they don't all want it, it is a tremendous waste of net bandwidth and this
should be made available via anonymous FTP!"  Let us think about this.

The best case, in terms of bandwidth, is a hypothetical Internet consisting
only of a backbone that all the hosts are connected to -- so when anyone FTPs
to the archive site, their packets don't go through any hosts besides the one
they are connected to. Call this a host expansion (HE) factor of 1. If, on
average, my packets have to go through another host before they get to the FTP
site, call this an effective HE of two, and so forth for higher values of HE.

Now consider the effective number of people at each site who will FTP a copy
of the file F.  Call this the expected dispersal (EP).  If the file is boring,
EP may be far less than 1; if it is popular, it may be up to a dozen or so.

Now consider an FTP session that grabs a copy of F.  Each data packet will be
large, since F is large, so we can approximate it by the situation where
packet-flow is all in one direction (if there is substantial per-packet
processing going on, this factor may be non-negligible, and we need to magnify
our factors accordingly).  Each copy of F will go through HE communication
links, and each host will snarf an average of EP copies, so the net gets a
total of N (the number of hosts) times HE times EP bandwidth gobbled up by our
friend F.
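
To make the bookkeeping concrete, here is a minimal sketch in C; N, HE, and
EP are the quantities defined above, and the numbers plugged in below are
illustrative only, not measurements:

#include <stdio.h>

/* Link-traversals consumed when hosts fetch F by FTP: each of the
 * N hosts pulls an average of EP copies, and every copy crosses HE
 * communication links.  The unit is "one copy of F over one link". */
double ftp_cost(double n_hosts, double he, double ep)
{
    return n_hosts * he * ep;
}

int main(void)
{
    /* Illustrative values only: 1000 hosts, HE = 4, EP = 3. */
    printf("FTP cost: %.0f link-copies of F\n",
           ftp_cost(1000.0, 4.0, 3.0));
    return 0;
}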

Now consider a posting of F to a newsgroup.  Assume all N hosts get a copy,
so there will be N units of bandwidth used up. (Some hosts may send out
multiple copies of F, but nntp is set up so that, for the most part, each copy
traverses a communication link once; this breaks down for sites that sit
between other sites and their nntp servers -- the HE factor comes into play
here, but is confounded, because HE measures how many hops each site is away
from a single archive site, which will typically be much larger than the
distance between the average site and its newsfeed.)

In other words, when HE*EP < 1, it makes sense to make F available via
anonymous FTP to conserve bandwidth.  But for popular files, HE*EP can be
large (what is the Internet diameter these days? 5? 7?).  I don't have the
data handy, but I would bet that EP is about 3 for the Jargon file (this is a
guesstimate from my experience with fellow grad students, most of whom grab a
copy, offset by another guesstimate of how many sites aren't so full of
hackers/geeks/dweebs).  If HE is 4, making the file available via anonymous
FTP would use about twelve times the Internet host-bandwidth of posting to
the well-balanced usenet.
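
A quick sanity check on that arithmetic, as a sketch only (HE = 4 and EP = 3
are the guesstimates above, not measurements):

#include <stdio.h>

int main(void)
{
    double he = 4.0;   /* guesstimated hops to the archive site */
    double ep = 3.0;   /* guesstimated fetches per site         */

    /* Posting costs N link-copies of F; FTPing costs N*HE*EP,
     * so N drops out and the ratio is simply HE*EP. */
    printf("FTP / posting bandwidth ratio: %.0f\n", he * ep);
    return 0;
}

which prints 12 -- and FTP only wins when HE*EP drops below 1.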

Has everyone forgotten how horrible Bitnet chat was, and why relay was
written?  Putting data in a central site where each copy not only goes _to_
many hosts, but _through_ many hosts on its way is a bad thing. It seems that
anonymous FTP is a win only for large files of moderate to low interest.

Disk usage is actually what most people were bitching about. There is a
program called expire that, sadly, doesn't treat huge postings any differently
from small ones.  This is the problem, at least as far as I can figure. In
terms of network bandwidth, posting is lots better than FTPing.

-Johnny Post

wwm@pmsmam.uucp (Bill Meahan) (04/18/91)

Johnny:

Your analysis is very elegant, BUT what about the THOUSANDS of us who are NOT
on the Internet?  Hard as it may be to believe, big bunches of us only have
relatively low-speed (2400 baud or even less) telephone connections to some
newsfeed host, perhaps only once or twice a day.

We sure ain't usin' NNTP on a FDDI net :-)

Very large postings can chew up my phone line pretty fast!
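
To put a number on that (a back-of-the-envelope sketch; the posting size is
made up, and I assume an async line at roughly 10 bits per character once
start and stop bits are counted):

#include <stdio.h>

int main(void)
{
    double bps   = 2400.0;             /* modem speed, bits/second      */
    double bytes = 1024.0 * 1024.0;    /* a hypothetical 1 MB posting   */
    double secs  = bytes * 10.0 / bps; /* ~10 bits/char on async serial */

    printf("1 MB at 2400 bps: about %.0f minutes\n", secs / 60.0);
    return 0;
}

Call it an hour and a quarter of phone time for one big posting.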

For source postings, well, I'll live with that, since it is MY choice whether
or not to have my upstream feed send me the sources groups.  Even then,
considerate posters (e.g. Larry Wall with the newest version of Perl) keep the
volume down by spreading out the 36 or so pieces over a week or so.


Even so, full postings are rare.  Only when the volume of diffs exceeds the
volume of a complete new posting will FUBAR_99.74 be released.

-- 
Bill Meahan			|Product Design & Testing Section
Production Test Engineer	|Starter Motor Engineering
wwm@pmsmam			| +1 313 484 9320