[net.news.group] mailing lists vs. newsgroups: facts

chuqui@nsc.UUCP (Chuq Von Rospach) (09/07/85)

In article <3500005@ccvaxa> preece@ccvaxa.UUCP writes:
>In what sense does a mailing list do a better job?  (1) It is less
>visible to new readers, since it isn't just there to be browsed on
>every site. (2) The traffic still has to be passed along the route
>to each reader, as mail.  In some cases that will mean MORE net traffic
>than if the notes had been passed as news.

>I wonder how significant that is.

Oh, I do so hate to put a damper on an argument, but let's try using facts
for once and see what happens...

The following formula shows the number of readers needed on a mailing list
for a newsgroup conversion to break even:
    list_readers = (sites_on_net*efficiency)/(increase*average_hops)

The derivation of that formula is at the bottom of this article for those
that want to check my math. The definitions are:

    o sites_on_net -- the number of sites a message in this newsgroup
    is distributed to.  I'll use 1950 based on the following:  I assume
    2200 sites on the net. I assume 5% of those sites are local networks
    and transfer cost is 'free'.  5% of the sites turn the group off for
    some reason.  That leaves you about 1950 sites.

    o increase -- the factor by which readership increases when
    converted to a newsgroup (1 is no increase, 2 is doubled, 4 is
    quadrupled, etc...). For the best case, let's assume readership
    quadruples; for the worst case, it merely doubles.

    o efficiency -- the efficiency advantage of news transport over
    mail which is shown as (1-%reduction_for_efficiency). No batching
    saves you 0%, batching with no compression is about 35%, and full
    compression is about 55-65%.  The variance between worst case and
    best case is an estimate of the number of sites running various
    batching schemes, and worst case could (theoretically) be as low as
    0% but let's use the range 35% to 65%.  Because news feeds tend to
    be shorter distances than a lot of mail feeds, add another 10%.
    Worst case is then (1-.45) or .55 and best is (1-.75) or .25.

    o average_hops -- the number of hops, on average, that a message in
    a mailing list needs to travel from the list to the recipient.
    Based on my two large mailing lists I've run (lan-news last year,
    nuke-winter this year) the average number of hops from my site to the
    person on the list is about 4. Let's use 3 for a best case and 5 for a
    worst case.

    o many mailing lists (mail.feminists, for instance) use intermediary
    distribution points to reduce the number of total hops. Mail.feminists
    has something like 200 people on it, but a lot of messages are sent out
    to sites that redistribute them further to keep the load down. This
    feature allows a list to support a lot more users before hitting the
    breakeven point.

    o large mailing lists can be digested, thereby reducing a lot of
    mail overhead by shipping fewer but larger messages, which also puts
    off the breakeven point (this could also be done by a mod.all group)

Best case breakeven then becomes (1950*.25)/(4*3) or 40 people on the list.
Worst case breakeven is (1950*.55)/(5*2) or 107 people on the list.
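
The arithmetic above can be checked with a minimal Python sketch. The
function name breakeven_readers is illustrative and not from the article;
the inputs are the article's own estimates, so the results are only as good
as those estimates.

```python
def breakeven_readers(sites_on_net, efficiency, increase, average_hops):
    """Mailing-list size at which a newsgroup costs the same per reader."""
    return (sites_on_net * efficiency) / (increase * average_hops)


SITES_ON_NET = 1950  # the article's estimate of sites that carry the group

# Best case: efficiency factor .25, readership quadruples, 3 hops on average.
best = breakeven_readers(SITES_ON_NET, efficiency=0.25, increase=4, average_hops=3)

# Worst case: efficiency factor .55, readership doubles, 5 hops on average.
worst = breakeven_readers(SITES_ON_NET, efficiency=0.55, increase=2, average_hops=5)

print(f"best case breakeven:  about {best:.0f} readers")
print(f"worst case breakeven: about {worst:.0f} readers")
```

Changing any one argument shows how sensitive the breakeven point is to
that estimate.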

In general, it looks like when the number of readers hits somewhere between
50 and 75 it makes sense to turn the list into either a moderated group (if
content regulation is of interest) or a net.all group (if you want a
free-for-all). 

===== Caveats =====
    o Volume tends to be higher on a newsgroup. Also, there tends to be a
    higher amount of garbage because of the loss of moderation. If there is
    a reason to keep the garbage out, a moderator ought to be used with a
    mod.all group or the mailing list ought to be maintained.

    o hop_count_cost assumes netwide traffic. Certain sites (ihnp4 and
    other major mail gateways) would see higher traffic patterns
    because of a mailing list, leaf sites would see lower.

    o Many of those numbers are estimates. Your mileage may vary,
    especially the mailing list -> newsgroup audience increase. It may
    actually be as low as 1:1, and as high as infinity -- we have no
    data to work on.  average_hops varies on how well connected the hub
    of a mailing list is, but even if they only talk to ihnp4 the
    average path isn't much worse than 5.

    o With the exception of the fudge factor in the news efficiency, the
    increased cost of a long distance hop over a local hop is ignored.

=== breakeven formula generation ===

A hop_count_cost is considered to be the total_hops/list_readers

For a mailing list, total_hops can be defined as (average_hops * list_readers)
so the hop_count_cost becomes (average_hops * list_readers)/list_readers
or average_hops.

For a newsgroup, total_hops is defined as the number of sites on the net.
The newsgroup's readership needs to be extrapolated from the number of
readers on the mailing list (list_readers*increase), and we throw in a fudge
factor because transfer by batching in news is more efficient than shipping
mail. The hop_count_cost becomes:

    (sites_on_net*efficiency)/(list_readers*increase)

Setting those two equations equal to each other, we can find the breakeven
point. The formula is:
    average_hops = (sites_on_net*efficiency)/(list_readers*increase)

which becomes
    list_readers = (sites_on_net*efficiency)/(increase*average_hops)

and you solve for the number of readers that need to be on the list for a 
conversion to a newsgroup to break even.
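
As a sanity check on the derivation, here is a small Python sketch that
evaluates both per-reader costs at the breakeven size. The function names
are illustrative assumptions, and the figures are the article's worst-case
estimates.

```python
import math


def mail_cost_per_reader(average_hops, list_readers):
    # total_hops for a mailing list is average_hops * list_readers,
    # so the per-reader cost collapses to average_hops.
    total_hops = average_hops * list_readers
    return total_hops / list_readers


def news_cost_per_reader(sites_on_net, efficiency, list_readers, increase):
    # total_hops for a newsgroup is the efficiency-weighted site count,
    # spread over the enlarged readership list_readers * increase.
    return (sites_on_net * efficiency) / (list_readers * increase)


# Worst-case figures from the article.
sites_on_net, efficiency, increase, average_hops = 1950, 0.55, 2, 5
breakeven = (sites_on_net * efficiency) / (increase * average_hops)

mail = mail_cost_per_reader(average_hops, breakeven)
news = news_cost_per_reader(sites_on_net, efficiency, breakeven, increase)
print(math.isclose(mail, news))  # do the two costs agree at breakeven?

# Under this model a smaller list favours mail and a larger one favours news.
small = news_cost_per_reader(sites_on_net, efficiency, breakeven / 2, increase)
large = news_cost_per_reader(sites_on_net, efficiency, breakeven * 2, increase)
print(small > average_hops > large)
```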

=== final disclaimer ===

Putting together this article I have finally figured out why so few people
bother with facts while arguing on the net. It took me about 2 hours to put
the math together and a lot of thinking (in other words, work...) It is a
lot easier to play with supposition and opinion, and I guess we get lazy
after a while...

chuq

-- 
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui

An uninformed opinion is no opinion at all. If you dont know what you're
talking about, please try to do it quietly.

lauren@vortex.UUCP (Lauren Weinstein) (09/08/85)

I'm afraid that the number of bizarre suppositions in that article makes
the estimates practically worthless.  I won't even ATTEMPT to provide
my own figures, because I think it would just be guessing--which is
what I think Chuqui did.  I doubt if those figures of his bear any
significant relationship to reality in the real net world.

Without more accurate data, trying to draw conclusions like that is
simply a waste of time.  Let's try again when we have REAL data to
work with.

--Lauren--

dmmartindale@watcgl.UUCP (Dave Martindale) (09/12/85)

Chuq's analysis of the breakeven point for mailing lists looks correct
for the case where the mailing list and the mail systems along the way
are set up in the most naive fashion: every member of the list gets a
copy, and intermediate sites get all of those copies passed through them.

In fact, this isn't necessary.  All versions of UNIX that I know of
except 4.1BSD have an "rmail" that can handle multiple recipients.  Thus,
if site A is sending the same message to 10 people via site B, it need
send only *one* copy over the link to site B, and specify 10 recipients.
If site B is similarly clever, it will also send to multiple people at
once wherever possible on its outgoing links.  This sort of clever forwarding
was first set up on watmath in response to the traffic generated by the
women's mailing list.  As far as I know, only sites running sendmail
have an appropriately-clever mailer, but this includes many sites that
handle large amounts of mail traffic.  And it does require that the system
administrator determine which of the connecting sites can handle multiple
recipients and configure sendmail appropriately.  But they should do this
anyway, in order to cut phone costs.  The amount of work involved is certainly
far less than news requires to keep operating.
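
The difference between naive forwarding and multi-recipient forwarding can
be sketched in Python. This is a toy model under stated assumptions: the
site names and routes are hypothetical, every site on a route is assumed to
run a mailer that can handle multiple recipients, and each link transfer is
counted as one unit regardless of distance.

```python
def naive_transfers(routes):
    """One copy per recipient crosses every link on that recipient's route."""
    return sum(len(route) - 1 for route in routes)


def shared_transfers(routes):
    """One copy crosses each link, however many recipients lie beyond it."""
    links = set()
    for route in routes:
        # Pair each site with the next one on the route to get its links.
        links.update(zip(route, route[1:]))
    return len(links)


# Hypothetical routes from the list's hub to each recipient's site.
routes = [
    ["hub", "siteA", "siteB"],
    ["hub", "siteA", "siteB"],
    ["hub", "siteA", "siteC"],
    ["hub", "siteD"],
]

print("naive copies sent over links: ", naive_transfers(routes))
print("shared copies sent over links:", shared_transfers(routes))
```

The more recipients share the early hops of a route, the larger the gap
between the two counts.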

Also, as Chuq mentions but neglects to account for in his formula, large
mailing lists can have downstream redistribution points consisting of
mail aliases that automatically redistribute mail (this can be done on
many flavours of UNIX) and ordinary people who perform the same function.
This is also used by the women's mailing list.

In the best possible case, every site could have at most one incoming copy
of the mail message that is redistributed to outgoing addresses by one of
the three methods described above.  Then the total number of messages
would be *less* than that generated by news (because of the multiple
connectivity of USENET, sites often receive 2 or more copies of the same
article) on the subnet that the mail is being sent to.  News does benefit
from compression though.  But it seems that a well-planned
mailing list has a breakeven point that is far higher than Chuq suggests.

chuqui@nsc.UUCP (Chuq Von Rospach) (09/13/85)

In article <789@vortex.UUCP> lauren@vortex.UUCP (Lauren Weinstein) writes:
>I'm afraid that the number of bizarre suppositions in that article makes
>the estimates practically worthless.  I won't even ATTEMPT to provide
>my own figures, because I think it would just be guessing

As a matter of fact, Lauren, I pulled as much real data as I could into
that article. Anything that was a supposition or a guess was labeled as
such. If you don't agree with my data, please help me build better data so
we can make it more accurate. Simply writing it off as useless is truly
useless.

>Without more accurate data, trying to draw conclusions like that is
>simply a waste of time.  Let's try again when we have REAL data to
>work with.

If you don't like my data, find better data to prove me wrong. Simply
saying I'm wrong without showing it is a pure value judgement. I happen to
think that reality IS somewhere between my best and worst cases. As I can
get better data, I'll be able to build a better model.

Your article is, to me, a perfect example of what is wrong with this
network. There is a LOT of opinion on the network, and very little fact,
logic, or reason. Most people seem to believe that disagreeing with
something makes it untrue. If I have a choice between an opinion with facts,
such as what I attempted to put together, and an opinion, which is what you
tossed out, I'll take the facts any day. They may not be completely
accurate, but at least I have a chance to figure out what the inaccuracies
are. Simply saying "I don't like it, so it isn't so" doesn't give me any
basis for understanding WHY you say it and why I should believe it. If you
don't like my facts, come up with a few of your own. 

chuq
-- 
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui

An uninformed opinion is no opinion at all. If you dont know what you're
talking about, please try to do it quietly.

kre@munnari.OZ (Robert Elz) (09/14/85)

In <3221@nsc.UUCP> Chuq created a formula that purported to
determine the number of users needed on a mailing list before
it becomes more economical to make it a newsgroup.  The
number derived was something between 40 and 107.

In <789@vortex.UUCP> Lauren noted that this formula was
nonsense (my words), provoking <3256@nsc.UUCP> from Chuq
bemoaning the "opinion unsupported by facts".

Frankly, I thought that the inadequacy of the formula was so
patently obvious, that providing "facts" to rebut it would
be hardly worth the trouble, but here goes anyway.

According to Chuq, if I establish a mailing list with (let's
overestimate for safety) 200 members, then it is more economical
to the net at large to turn it into a newsgroup (perhaps
moderated).  Numbers are the only criterion that counts...

Well, my mailing list is in Australia - the formula doesn't
include localities as a parameter - in fact, it concerns
some local Australian TV soap opera that isn't seen outside
Australia, and never will be.

Yet, somehow, amazingly, it's more economical to the net as
a whole to turn this thing into a newsgroup than to leave it
as a mailing list!

What's more, nothing changes if the mailing list isn't spread
over 200 different Australian hosts, but is all on my local
host and doesn't go over the network at all!

[Please don't interject & say I could create a local, or
Australian newsgroup - of course I could.  But that isn't
what 'the formula' supposedly tells me - it says that a "net"
or "mod" group is appropriate for my list.  I also realize that
numbers alone aren't the only criterion for deciding to switch
a list to a newsgroup.  The only question here is which is more
economical to the net - and Chuq's formula says a newsgroup
would be.  There may easily be other reasons than economics
for deciding to keep a mailing list]

Come on - that formula is far too simplistic; to decide that
any particular mailing list would be more economical as a
newsgroup takes much more analysis than just counting members.

What's more, a properly run mailing list would add traffic only to
links between hosts with users receiving the list.  That's
how mailing lists ought to be run - if a site wants to get it,
it should call some other site that is already getting the list
(perhaps the originator, perhaps a relay site) and have it sent
from there.  That way, only the recipients of the list pay for
it, which is as it should be.  People setting up mailing lists
should bear this in mind, as should people requesting to be
added to such lists.

I know that will never happen - mail will always be forwarded
through "volunteer" sites, but we should aim for something
approaching this ideal.  If we assume this, then before a mailing
list is more economical to the net as a whole as a newsgroup,
almost every site on the net would need to receive it.  (I am
not going to attempt to fudge mathematics to show this, to me
it seems fairly clear, but please take this as an opinion.)

The problem of sending multiple copies of a mailing list in
multiple uucp transfers has also been noted.  Quite apart
from the fix to sendmail apparently in 4.3, this is quite easy
to avoid - if multiple people at one site are on a mailing list
that site should set up a redistribution point - get just one
copy, and redistribute it to all the local users - and possibly
to others at more remote sites.  This can be done now (or at least
those of us with any kind of reasonable mailer (sendmail, mmdf, ..)
can do it).

In his articles, Chuq issued pleas for more facts on the net.
I concur - but please let's have *facts*, not pseudo-facts.  I'd
much rather read an article which is clearly someone's opinion,
unsupported by anything, than one which pretends to be solid
fact, and is wrong.

Robert Elz		seismo!munnari!kre   kre%munnari.oz@seismo.css.gov

ps: the mailing list mentioned above doesn't actually exist,
please don't ask to be added to it :-)

chuqui@nsc.UUCP (Chuq Von Rospach) (09/18/85)

In article <915@munnari.OZ> kre@munnari.OZ (Robert Elz) writes:
>In <3221@nsc.UUCP> Chuq created a formula that purported to
>determine the number of users needed on a mailing list before
>it becomes more economical to make it a newsgroup.  The
>number derived was something between 40 and 107.
>
>Frankly, I thought that the inadequacy of the formula was so
>patently obvious, that providing "facts" to rebut it would
>be hardly worth the trouble, but here goes anyway.

sigh. The point I was TRYING to make with Lauren's comment is that many
people on the net seem to think that saying "I don't like it so it isn't
true" is a valid argument. WHY don't you like it, fergawshsakes? If you 
don't tell us WHY it's wrong, why should we believe you any more than we
should believe the Easter bunny? Because you comb your hair in the morning?
If you explain what you don't like about a concept, then we can either
agree to throw it away (because there was a flaw that wasn't caught) or we
can improve it. Just bitching at it doesn't give us a chance to do either,
because it doesn't give us insight into the problems.

[Editorial sidenote: besides the school of thought that has been beating on
me mercilessly because they disagree with me without telling me why they
disagree with me (making me wonder if the disagreement is nothing more than
an emotional or personal response and not a factual one), I'm getting a lot
of responses that say, in essence, "dunderhead, you left this out, so the
formula ain't right! throw it out!" Now, the reality is that I KNOW the
silly thing isn't as rigorous as it could be, but the black/white attitude
of it's either right or it's worthless is VERY discouraging. Many sectors of
the net seem to think that if it doesn't spring out immediately into
perfection it isn't worth working on, and that just isn't so. My hope is to
take what I think is a good first approximation (and I'll bet that the
turnaround point for an average mailing list IS between my worst and best
case, somewhere) and turn it into a good second approximation. I happen to
appreciate feedback, even/especially negative feedback because it helps me
get a better perspective on the problem, but if your comments fall into 
the following categories, you're better off staying home and kicking your
dog:
    o I hate anyone who looks like you
    o anything you do sucks
    o that idea sucks (unless you tell me why, so I can either fix it
	or understand why it isn't fixable).
    o you left out the second comma in the fourth sentence of the third
	paragraph, so obviously your idea is worthless.

If you can't criticize constructively, the net is MUCH better off simply by
having you keep quiet. I guarantee you that if an idea DOES suck, the word
will get around, along with an explanation why. We ought to be working
together on this stuff, folks, not tossing rotten tomatoes.

end editorial sidebar, back to our program in progress]

>According to Chuq, if I establish a mailing list with (let's
>overestimate for safety) 200 members, then it is more economical
>to the net at large to turn it into a newsgroup (perhaps
>moderated).  Numbers are the only criterion that counts...
>
>Well, my mailing list is in Australia - the formula doesn't
>include localities as a parameter - in fact, it concerns
>some local Australian TV soap opera that isn't seen outside
>Australia, and never will be.
>
>Yet, somehow, amazingly, it's more economical to the net as
>a whole to turn this thing into a newsgroup than to leave it
>as a mailing list!

It is even MORE economical to set up a newsgroup with an Australian
distribution (which is only sane, because you then drop all of the sites
out of the formula that don't apply to Australia). The same would be true
about American soap operas using usa or na instead of net. A variation of
this formula would apply, but you would need to build the numbers to the
affected subnet. 

There was a 'basic' assumption to the formula that we were talking about
things that have relatively even distribution across the net and even
interests across the net. Add a flavor of geographic restriction to it, and
you need to vary the equation to fit. It is much easier, though, to take a
known formula and change it to fit a special case than to try to build all
the special cases into the formula ahead of time, because you then end up
with formulas that look like they came out of the BLS -- very accurate, but
representative of nothing.

>[Please don't interject & say I could create a local, or
>Australian newsgroup - of course I could.  But that isn't
>what 'the formula' supposedly tells me - it says that a "net"
>or "mod" group is appropriate for my list.

Well, I have, and that is because a formula is only as good as the
assumptions surrounding it. One assumption was that we were talking about a
'general purpose' mailing list -> newsgroup conversion. Your example
violates that assumption, so attempting to fit it into the formula is
invalid. This isn't "THE FORMULA", this is just a formula that tries to
find out what the general case is. Once we can agree on a general case, we
can derive the formula to find the special cases, but worrying about
special cases now is premature and will only keep us away from the real
problems in the formula. 

>Come on - that formula is far too simplistic; to decide that
>any particular mailing list would be more economical as a
>newsgroup takes much more analysis than just counting members.

Agreed. Among the things it leaves out because I couldn't quantify them
(any suggestions?) are:

    o increased backbone loads to places like ihnp4 and seismo, since
    they'd see more traffic from a mailing list.

    o better quantification of the local call versus long distance calls in
    the cost of things. 

    o dealing with the very expensive trans-oceanic links

    o dealing with the cost-free ARPA connections (excluding content
    problems involved)

    o added cost of a newsgroup caused by having a single message take
    multiple paths to places (we can factor in the duplicate article
    rejection rate in somehow, probably)

    o reduced cost of a mailing list moderator screening garbage

    o reduced cost of a mailing list moderator digesting stuff (fewer,
    larger messages helping reduce uucp overhead)

    o added advantage of reduced distribution time of a mailing list (mail
    being MUCH faster than news in general). Among other things, reduces
    garbage of duplicated postings because you don't have 99,000 people
    telling you what city "Hill Street Blues" is in. You also get needed
    information faster.

    o how redistribution points along the way for mailing lists affect
    things, since it causes some messages to 'share' a hop, making a
    mailing list cheaper in effect.

>In his articles, Chuq issued pleas for more facts on the net.
>I concur - but please let's have *facts*, not pseudo-facts.  I'd
>much rather read an article which is clearly someone's opinion,
>unsupported by anything, than one which pretends to be solid
>fact, and is wrong.

Well, I'll happily thank Robert for taking the time to pull apart my
comments so I can try to put them together again. Besides allowing/forcing
me to clarify things, he's brought up some good points, which I hope I've
covered. I think my idea is valid. I think my implementation is good, as
far as it goes. I KNOW I need help in improving it. I've gotten a fair
amount of good and useful feedback on it to date as well as all the swill,
and I've tried to discuss the feedback with people (and ignore the swill).
Just because it isn't perfect, folks, don't throw it away. Make it
better... Cooperation and discussion will help us bring the network
forward. Rotten tomatoes don't solve anything...

-- 
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui

Take time to stop and count the ewoks...