[news.admin] listservers as an alternative to news for distribution

welty@steinmetz.ge.com (richard welty) (07/03/88)

In article <1988Jul1.043049.2418@ziebmef.uucp> becker@ziebmef.UUCP (Bruce Becker) writes:
*	it seems like the time to look at the practice of other systems -
*	in specific, I am familiar with BitNet, which sends out descriptions
*	of available binaries (source, documents, etc), and issues a pointer
*	to a thing called a "listserver"...

*	It seems to me that this uses far less net bandwidth than the
*	broadcasting method, and serves the community equally well...

In article <3335@s.cc.purdue.edu> rsk@s.cc.purdue.edu.UUCP (Rich Kulawiec) writes:
>An extreme example, but more complicated topologies and cost distributions
>eventually lead one to the same conclusion: if enough people at enough
>sites want the source, it's cheaper to post it.  Some years ago, I think
>Chuq did some analysis of this problem, and concluded that the tradeoff
>was somewhere around 100 people, in terms of overall network bandwidth.

Chuq has stated that his analysis is now probably somewhat out of date,
and recently guessed that the number was likely to be more like 200
(as I recall -- I did not save his article, unfortunately).

>Clearly, however, this is a huge lose for the originating site, which
>must send 100 copies of something rather than 1.

It can be a huge lossage for links adjacent to the originating site
as well.  When I ran the auto-sports mailing list off of steinmetz,
more than 40 of the addresses were being directed by pathalias through
the phone link between rochester and steinmetz, which was pretty expensive.
We attempted to configure sendmail here on steinmetz to fold addresses,
but the copy that we have here (on an ultrix system) does not support
the appropriate options.

Of course, with a listserver, folding the addresses (if your sendmail
can do it) would require batching requests: a request answered on its
own goes out as a single-recipient message, which leaves nothing to fold.
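
To illustrate what "folding" buys you, here is a minimal sketch in Python.
It is not how sendmail implements it; the function name and the addresses
are made up for the example. It groups bang-path addresses by first hop,
so that one copy crosses each outgoing link instead of one copy per
recipient.

```python
from collections import defaultdict

def fold_by_first_hop(bang_paths):
    """Group bang-path addresses by the first hop they share.

    Hypothetical helper for illustration only: each key is a neighbor
    site, each value is the list of remaining paths to hand to it.
    Addresses without a "!" are treated as local and filed under "".
    """
    folded = defaultdict(list)
    for path in bang_paths:
        first_hop, sep, rest = path.partition("!")
        if sep:
            folded[first_hop].append(rest)
        else:
            folded[""].append(path)
    return dict(folded)

# Hypothetical recipients, not the real auto-sports list.
batch = [
    "rochester!sitea!alice",
    "rochester!siteb!bob",
    "uunet!sitec!carol",
]
folded = fold_by_first_hop(batch)
copies_batched = len(folded)  # one copy per neighbor

# Requests answered one at a time: each "batch" has a single address,
# so every request still costs its own copy over the first link.
copies_unbatched = sum(len(fold_by_first_hop([p])) for p in batch)

print(folded)
print(copies_batched, copies_unbatched)
```

With the three made-up addresses, the batched run sends two copies (one to
each neighbor) and the one-at-a-time run sends three.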
-- 
Richard Welty  518-387-6346  GE R&D, K1-5C39, Niskayuna, New York
   welty@ge-crd.ARPA  {uunet,philabs,rochester}!steinmetz!welty
    ``I always wondered what that switch did'' -- Aaron Heller

chuq@plaid.Sun.COM (Chuq Von Rospach) (07/07/88)

>*	It seems to me that this uses far less net bandwidth than the
>*	broadcasting method, and serves the community equally well...

>>An extreme example, but more complicated topologies and cost distributions
>>eventually lead one to the same conclusion: if enough people at enough
>>sites want the source, it's cheaper to post it.  Some years ago, I think
>>Chuq did some analysis of this problem, and concluded that the tradeoff
>>was somewhere around 100 people, in terms of overall network bandwidth.

>Chuq has stated that his analysis is now probably somewhat out of date,
>and recently guessed that the number was likely to be more like 200
>(as I recall -- I did not save his article, unfortunately).

Whatever numbers I did are hopelessly out of date these days. With the
proliferation of Telebit modems and PC Pursuit and the essentially free
NNTP links slogging data around the country, the only way to really judge
"cost" is to split the links into "free", "low cost" [PC Pursuit, local
calls] and "high cost" connections. 

I haven't really looked at it, but it'd be extremely hard to even get a
rough estimate of what percentage of the links fall into what category.
Without that, trying to estimate costs and the E-mail/net crossover is 
essentially impossible. 

The limiting factors become, in many cases, disk space, CPU cycles, or modem
access and bandwidth. When you're using NNTP or PC-Pursuit, the cost is fixed
regardless of the amount of data, so your cost/byte goes to essentially zero.
(I'm over-simplifying to some degree, here, but I think it's close enough
for right now). You then get into a scenario where it's actually cheaper to
send via USENET, even if only a (relatively) few people read it, and even if
it goes on the exact same route -- because it's possible that the news is
being held for PC-Pursuit during off hours and mail, on the exact same link,
goes out immediately. 

So my numbers are useless in measuring reality today. If you want to
try to build a "cost" of a given message via USENET versus E-mail and
mailing lists, here's what I think you'd need to figure out the cost of
a USENET message.

  If you take the number of sites on the net
  (random number: 7,000), and figure out the average number of hops
  needed to service each site (completely random number: 10), then take
  the percentage of "high cost" hops in an average path (random number:
  40%), you can then compute the cost of a message in money-hops (based
  on the above random numbers: 7,000*10*0.40=28,000 money-hops). Factor in some
  overhead for the following: CPU Cycles, Disk space, modem
  utilization, fixed overhead costs for PC-Pursuit and/or Internet
  charges and administrative overhead. Turn that number into
  money-hops, just to keep things consistent. As a completely random
  number, figure your fixed costs to be about 10% of your money-hops,
  so the total money-hop figure would be about 31,000.

  To turn this into a dollar figure, you'd have to come up with a cost per
  money-hop. Figure out the average cost of transmitting a fixed-size
  1,000 byte message across a long-distance link @1200 baud (random number:
  $.02). So the (random) cost per K of a message on USENET would be:
  31,000*$0.02, or about $620.00. (hmm. That message is compressed, not
  ascii. You can figure in a compression quotient). A 50K Macintosh binary
  would therefore cost the net about $31,000 in transmission charges.
  Expensive toys.
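
  For anyone who wants to plug in their own guesses, the arithmetic above
  can be sketched in a few lines of Python. The function names are
  invented for this example, and every input is one of the placeholder
  numbers from the text, not a measurement.

```python
def usenet_money_hops(sites, avg_hops, high_cost_fraction, overhead_fraction):
    """Money-hops for one message sent to every site, plus fixed overhead."""
    link_hops = sites * avg_hops * high_cost_fraction
    return link_hops * (1 + overhead_fraction)

def cost_dollars(money_hops, dollars_per_kb_hop, size_kb):
    """Dollar cost of moving size_kb kilobytes over money_hops paid hops."""
    return money_hops * dollars_per_kb_hop * size_kb

# Placeholder inputs from the text: 7,000 sites, 10 hops per site,
# 40% high-cost hops, 10% overhead, $0.02 per 1K per hop.
hops = usenet_money_hops(7000, 10, 0.40, 0.10)  # about 30,800
per_kb = cost_dollars(hops, 0.02, 1)            # about $616
binary_50k = cost_dollars(hops, 0.02, 50)       # about $30,800

# The text rounds 30,800 up to 31,000 before converting to dollars,
# which is where $620 per K and $31,000 for 50K come from.
rounded_per_kb = cost_dollars(31000, 0.02, 1)
rounded_50k = cost_dollars(31000, 0.02, 50)

print(round(hops), round(per_kb), round(binary_50k))
print(round(rounded_per_kb), round(rounded_50k))
```

  Compression is left out here, as it is in the figures above; a
  compression ratio would simply scale size_kb.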

  These numbers are bogus. If someone's interested, they can try to figure
  out real versions of them. For instance, I'm willing to bet that the "high
  cost" number is exceptionally low -- I doubt that, on a net-wide basis,
  the "high cost" links account for less than 60% of the net, and they may be
  as high as 75%. So these numbers may be as much as 50% low. Also, you have to
  remember to factor areas without full feeds (like Europe) in or out based
  on whether they get the newsgroups or not, so you can't just take a number
  like 7,000 for all cases -- cut out Europe and Australia and a given
  group might lose 10% of the potential sites, making the numbers
  worthless.
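
  A quick way to see how much the guesses matter is to rerun the estimate
  with a different high-cost fraction or site count. The sketch below does
  that in Python; the helper name is made up, and the inputs are still the
  placeholder numbers, so treat the output as a sensitivity check and
  nothing more.

```python
def money_hops(sites, avg_hops, high_cost_fraction, overhead_fraction=0.10):
    """Money-hops for one message, using the same model as the text."""
    return sites * avg_hops * high_cost_fraction * (1 + overhead_fraction)

baseline = money_hops(7000, 10, 0.40)

# How far below the "real" figure is the 40% baseline if the true
# high-cost fraction is 60% or 75%?
shortfall = {}
for fraction in (0.60, 0.75):
    revised = money_hops(7000, 10, fraction)
    shortfall[fraction] = 1 - baseline / revised
    print(f"high-cost {fraction:.0%}: baseline is {shortfall[fraction]:.0%} low")

# Dropping 10% of the sites (for example, regions that do not get the
# group) lowers the estimate by the same 10%.
fewer_sites = money_hops(6300, 10, 0.40)
site_drop = 1 - fewer_sites / baseline
print(f"10% fewer sites: estimate drops {site_drop:.0%}")
```

  At 75% the baseline comes out roughly 47% low, which is in line with the
  "as much as 50% low" guess.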

To find the break-even point for a mailing list, you'd have to do the same
kind of cost analysis as above. That's much easier (for the most part)
because you know how many people are reading it and how many hops it takes
to deliver the mailing list -- smart mailers in the middle rerouting on you
being one major exception. 

The number of "high cost" links will be somewhat higher, because many sites
don't hold up mail for PC-Pursuit distribution. As long as the money-hops
are lower, it's cheaper to go by mailing list, except that you're also
concentrating the strain on local machines, so there might be a regional
bandwidth problem even though there's a net gain for the entire net. That
can be a significant problem for some folks, which might warrant dealing with
the situation even if it's not otherwise justifiable.
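
As a rough sketch of that comparison, the Python below computes money-hops
for both methods and the reader count at which they cross. The function
names, the 10-hop average for mail, and the 50% high-cost fraction for mail
are all assumptions made up for the example, and the model ignores the
regional concentration problem entirely.

```python
import math

def usenet_money_hops(sites, avg_hops, high_cost_fraction):
    """Money-hops to flood one message to every site."""
    return sites * avg_hops * high_cost_fraction

def list_money_hops(readers, avg_hops, high_cost_fraction):
    """Money-hops to mail one message to each reader individually."""
    return readers * avg_hops * high_cost_fraction

def break_even_readers(usenet_hops, list_avg_hops, list_high_cost_fraction):
    """Largest reader count at which the list costs no more than news."""
    return math.floor(usenet_hops / (list_avg_hops * list_high_cost_fraction))

# News side: the placeholder numbers used earlier (7,000 sites, 10 hops,
# 40% high cost). Mail side: hypothetical, with a somewhat higher
# high-cost fraction because mail is not held for cheap off-hours links.
news = usenet_money_hops(7000, 10, 0.40)
crossover = break_even_readers(news, 10, 0.50)

for readers in (100, 200, crossover, crossover + 1000):
    mail = list_money_hops(readers, 10, 0.50)
    cheaper = "list" if mail <= news else "news"
    print(readers, round(mail), round(news), cheaper)
```

With these made-up inputs the crossover lands at 5,600 readers. That figure
is only as good as the guesses that go into it.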

>>Clearly, however, this is a huge lose for the originating site, which
>>must send 100 copies of something rather than 1.

Yeah, but it really depends. 100 messages that go to the Internet is all CPU
cycle cost. 100 messages that go to Australia via a modem is expensive. So
you can't generalize just on number of messages -- you need to look at where
they're going and how.
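
To make that concrete, here is a small Python sketch that prices a batch of
messages by the links each one crosses, using the "free", "low cost" and
"high cost" split. The per-hop prices and the example routes are invented
for illustration; only the $0.02 high-cost figure echoes the placeholder
used earlier.

```python
# Hypothetical price per 1K message per hop, by link class.
COST_PER_KB_HOP = {"free": 0.00, "low": 0.005, "high": 0.02}

def delivery_cost(paths, size_kb=1):
    """Total cost of sending one message down each path.

    paths is a list with one entry per recipient; each entry is the list
    of link classes that recipient's copy crosses.
    """
    return sum(
        COST_PER_KB_HOP[link] * size_kb
        for path in paths
        for link in path
    )

# 100 recipients reached over free Internet links (made-up routes).
internet = [["free", "free"]] * 100

# 100 recipients reached over three high-cost modem hops each
# (made-up routes).
overseas = [["high", "high", "high"]] * 100

print(round(delivery_cost(internet), 2))
print(round(delivery_cost(overseas), 2))
```

Same message count, very different bill: the first batch costs nothing in
transmission charges, while the second comes to about $6 per K under these
assumed prices.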

Anyone want to comment on this analysis? It looks reasonable to me,
understanding that the numbers I chose are bogus (kids, don't try this at
home...), but these things always work better when multiple people smooth
out the rough spots....



Chuq Von Rospach			chuq@sun.COM		Delphi: CHUQ

	Robert A. Heinlein: 1907-1988. He will never truly die as long as we
                           read his words and speak his name. Rest in Peace.