chuqui@nsc.UUCP (Chuq Von Rospach) (09/07/85)
In article <3500005@ccvaxa> preece@ccvaxa.UUCP writes:
>In what sense does a mailing list do a better job? (1) It is less
>visible to new readers, since it isn't just there to be browsed on
>every site. (2) The traffic still has to be passed along the route
>to each reader, as mail. In some cases that will mean MORE net traffic
>than if the notes had been passed as news.

>I wonder how significant that is.

Oh, I do so hate to put a damper on an argument, but let's try using facts for once and see what happens...

The following formula shows the number of readers needed on a mailing list for a newsgroup conversion to break even:

    list_readers = (sites_on_net*efficiency)/(increase*average_hops)

The derivation of that formula is at the bottom of this article for those who want to check my math. The definitions are:

o sites_on_net -- the number of sites a message in this newsgroup is distributed to. I'll use 1950 based on the following: I assume 2200 sites on the net. I assume 5% of those sites are local networks and transfer cost is 'free'. 5% of the sites turn the group off for some reason. That leaves you about 1950 sites.

o increase -- the factor by which readership increases when converted to a newsgroup (1 is no increase, 2 is doubled, 4 is quadrupled, etc...). For the best case, let's assume readership quadruples; for the worst case, it merely doubles.

o efficiency -- the efficiency advantage of news transport over mail, which is shown as (1-%reduction_for_efficiency). No batching saves you 0%, batching with no compression is about 35%, and full compression is about 55-65%. The variance between worst case and best case is an estimate of the number of sites running various batching schemes, and worst case could (theoretically) be as low as 0%, but let's use the range 35% to 65%. Because news feeds tend to be shorter distances than a lot of mail feeds, add another 10%. Worst case is then (1-.45) or .55 and best is (1-.75) or .25.
o average_hops -- the number of hops, on average, that a message in a mailing list needs to travel from the list to the recipient. Based on the two large mailing lists I've run (lan-news last year, nuke-winter this year), the average number of hops from my site to the person on the list is about 4. Let's use 3 for a best case and 5 for a worst case.

o many mailing lists (mail.feminists, for instance) use intermediary distribution points to reduce the number of total hops. Mail.feminists has something like 200 people on it, but a lot of messages are sent out to sites that redistribute them further to keep the load down. This feature allows a list to support a lot more users before hitting the breakeven point.

o large mailing lists can be digested, thereby reducing a lot of mail overhead by shipping fewer but larger messages, which also puts off the breakeven point (this could also be done by a mod.all group)

Best case breakeven then becomes (1950*.25)/(4*3) or 40 people on the list. Worst case breakeven is (1950*.55)/(5*2) or 107 people on the list. In general, it looks like when the number of readers hits somewhere between 50 and 75 it makes sense to turn the list into either a moderated group (if content regulation is of interest) or a net.all group (if you want a free-for-all).

===== Caveats =====

o Volume tends to be higher on a newsgroup. Also, there tends to be a higher amount of garbage because of the loss of moderation. If there is a reason to keep the garbage out, a moderator ought to be used with a mod.all group or the mailing list ought to be maintained.

o hop_count_cost assumes netwide traffic. Certain sites (ihnp4 and other major mail gateways) would see higher traffic patterns because of a mailing list; leaf sites would see lower.

o Many of those numbers are estimates. Your mileage may vary, especially the mailing list -> newsgroup audience increase. It may actually be as low as 1:1, and as high as infinity -- we have no data to work on.
  average_hops varies with how well connected the hub of a mailing list is, but even if it only talks to ihnp4 the average path isn't much worse than 5.

o With the exception of the fudge factor in the news efficiency, the increased cost of a long distance hop over a local hop is ignored.

=== breakeven formula generation ===

A hop_count_cost is considered to be the total_hops/list_readers.

For a mailing list, total_hops can be defined as (average_hops * list_readers), so the hop_count_cost becomes

    (average_hops * list_readers)/list_readers

or average_hops.

For a newsgroup, total_hops is defined as the number of sites on the net. The newsgroup's readership needs to be extrapolated from the number of readers on the mailing list (list_readers*increase), and we throw in a fudge factor because transfer by batching in news is more efficient than shipping mail. The formula becomes:

    (sites_on_net*efficiency)/(list_readers*increase)

Setting those two expressions equal to each other, we can find the breakeven point. The formula is:

    average_hops = (sites_on_net*efficiency)/(list_readers*increase)

which becomes

    list_readers = (sites_on_net*efficiency)/(increase*average_hops)

and you solve for the number of readers that need to be on the list for a conversion to a newsgroup to break even.

=== final disclaimer ===

Putting together this article I have finally figured out why so few people bother with facts while arguing on the net. It took me about 2 hours to put the math together and a lot of thinking (in other words, work...). It is a lot easier to play with supposition and opinion, and I guess we get lazy after a while...

chuq
--
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui
An uninformed opinion is no opinion at all. If you dont know what you're talking about, please try to do it quietly.
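For anyone who wants to check the arithmetic in the article above, here is a minimal sketch of the breakeven formula in Python. The parameter names follow the article; the function names are made up for illustration, and the inputs are the article's own best-case and worst-case estimates, not measured data.

```python
def mail_cost_per_reader(average_hops):
    # Mailing list: (average_hops * list_readers) / list_readers
    return average_hops


def news_cost_per_reader(sites_on_net, efficiency, list_readers, increase):
    # Newsgroup: total hops ~ sites on the net, scaled by the batching
    # efficiency factor, spread over the enlarged readership.
    return (sites_on_net * efficiency) / (list_readers * increase)


def breakeven_readers(sites_on_net, efficiency, increase, average_hops):
    # Solve average_hops = (sites_on_net*efficiency)/(list_readers*increase)
    # for list_readers.
    return (sites_on_net * efficiency) / (increase * average_hops)


# Estimates taken from the article (assumptions, not measurements).
best = breakeven_readers(sites_on_net=1950, efficiency=0.25,
                         increase=4, average_hops=3)
worst = breakeven_readers(sites_on_net=1950, efficiency=0.55,
                          increase=2, average_hops=5)

print("best case breakeven: about", int(best), "readers")
print("worst case breakeven: about", int(worst), "readers")
```

At the breakeven value the two per-reader costs are equal, which is all the derivation claims; everything else depends on how good the four estimates are.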
lauren@vortex.UUCP (Lauren Weinstein) (09/08/85)
I'm afraid that the number of bizarre suppositions in that article make the estimates practically worthless. I won't even ATTEMPT to provide my own figures, because I think it would just be guessing--which is what I think Chuqui did. I doubt if those figures of his bear any significant relationship to reality in the real net world. Without more accurate data, trying to draw conclusions like that is simply a waste of time. Let's try again when we have REAL data to work with. --Lauren--
dmmartindale@watcgl.UUCP (Dave Martindale) (09/12/85)
Chuq's analysis of the breakeven point for mailing lists looks correct for the case where the mailing list and the mail systems along the way are set up in the most naive fashion: every member of the list gets a copy, and intermediate sites get all of those copies passed through them. In fact, this isn't necessary. All versions of UNIX that I know of except 4.1BSD have an "rmail" that can handle multiple recipients. Thus, if site A is sending the same message to 10 people via site B, it need send only *one* copy over the link to site B, and specify 10 recipients. If site B is similarly clever, it will also send to multiple people at once wherever possible on its outgoing links. This sort of clever forwarding was first set up on watmath in response to the traffic generated by the women's mailing list. As far as I know, only sites running sendmail have an appropriately-clever mailer, but this includes many sites that handle large amounts of mail traffic. And it does require that the system administrator determine which of the connecting sites can handle multiple recipients and configure sendmail appropriately. But they should do this anyway, in order to cut phone costs. The amount of work involved is certainly far less than news requires to keep operating. Also, as Chuq mentions but neglects to account for in his formula, large mailing lists can have downstream redistribution points consisting of mail aliases that automatically redistribute mail (this can be done on many flavours of UNIX) and ordinary people who perform the same function. This is also used by the women's mailing list. In the best possible case, every site could have at most one incoming copy of the mail message that is redistributed to outgoing addresses by one of the three methods described above. 
Then the total number of messages would be *less* than that generated by news (because of the multiple connectivity of USENET, sites often receive 2 or more copies of the same article) on the subnet that the mail is being sent to. News does benefit from compression though. But it seems that a well-planned mailing list has a breakeven point that is far higher than Chuq suggests.
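The difference between naive forwarding and multi-recipient forwarding can be sketched by counting link transfers for one message. This is only an illustration: the topology, site names, and reader counts below are invented, and it assumes every site on the path has a mailer that can carry multiple recipients per copy.

```python
# Hypothetical topology: each site maps to its upstream neighbor
# on the way back to the list's hub.
PARENT = {"b": "hub", "c": "b", "d": "b", "e": "hub", "f": "e"}

# Hypothetical number of list members at each site.
READERS = {"c": 4, "d": 3, "e": 1, "f": 2}


def path_links(site):
    """Return the links (child, parent) between the hub and `site`."""
    links = []
    while site != "hub":
        links.append((site, PARENT[site]))
        site = PARENT[site]
    return links


def naive_transfers():
    # Naive case: every reader's copy crosses every link on its path.
    return sum(count * len(path_links(site))
               for site, count in READERS.items())


def multi_recipient_transfers():
    # Multi-recipient case: one copy per link that leads to any reader.
    used = set()
    for site in READERS:
        used.update(path_links(site))
    return len(used)


print("naive:", naive_transfers())
print("multi-recipient:", multi_recipient_transfers())
```

With these made-up numbers the naive scheme costs 19 link transfers and the multi-recipient scheme costs 5, one per link in use, which is the "at most one incoming copy per site" best case described above.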
chuqui@nsc.UUCP (Chuq Von Rospach) (09/13/85)
In article <789@vortex.UUCP> lauren@vortex.UUCP (Lauren Weinstein) writes: >I'm afraid that the number of bizarre suppositions in that article make >the estimates practically worthless. I won't even ATTEMPT to provide >my own figures, because I think it would just be guessing As a matter of fact, Lauren, I pulled as much real data as I could into that article. Anything that was a supposition or a guess was labeled as such. If you don't agree with my data, please help me build better data so we can make it more accurate. Simply writing it off as useless is truly useless. >Without more accurate data, trying to draw conclusions like that is >simply a waste of time. Let's try again when we have REAL data to >work with. If you don't like my data, find better data to prove me wrong. Simply saying I'm wrong without showing it is a pure value judgement. I happen to think that reality IS somewhere between my best and worst cases. As I can get better data, I'll be able to build a better model. Your article is, to me, a perfect example of what is wrong with this network. There is a LOT of opinion on the network, and very little fact, logic, or reason. Most people seem to believe that disagreeing with something makes it untrue. If I have a choice between an opinion with facts, such as what I attempted to put together, and an opinion, which is what you tossed out, I'll take the facts any day. They may not be completely accurate, but at least I have a chance to figure out what the inaccuracies are. Simply saying "I don't like it, so it isn't so" doesn't give me any basis for understanding WHY you say it and why I should believe it. If you don't like my facts, come up with a few of your own. chuq -- Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui An uninformed opinion is no opinion at all. If you dont know what you're talking about, please try to do it quietly.
kre@munnari.OZ (Robert Elz) (09/14/85)
In <3221@nsc.UUCP> Chuq created a formula that purported to determine the number of users needed on a mailing list before it becomes more economical to make it a newsgroup. The number derived was something between 40 and 107.

In <789@vortex.UUCP> Lauren noted that this formula was nonsense (my words), provoking <3256@nsc.UUCP> from Chuq bemoaning the "opinion unsupported by facts".

Frankly, I thought that the inadequacy of the formula was so patently obvious that providing "facts" to rebut it would be hardly worth the trouble, but here goes anyway.

According to Chuq, if I establish a mailing list with (let's overestimate for safety) 200 members, then it is more economical to the net at large to turn it into a newsgroup (perhaps moderated). Numbers are the only criterion that counts...

Well, my mailing list is in Australia - the formula doesn't include localities as a parameter - in fact, it concerns some local Australian TV soap opera that isn't seen outside Australia, and never will be.

Yet, somehow, amazingly, it's more economical to the net as a whole to turn this thing into a newsgroup than to leave it as a mailing list!

What's more, nothing changes if the mailing list isn't spread over 200 different Australian hosts, but is all on my local host and doesn't go over the network at all!

[Please don't interject & say I could create a local, or Australian newsgroup - of course I could. But that isn't what 'the formula' supposedly tells me - it says that a "net" or "mod" group is appropriate for my list. I also realize that numbers alone aren't the only criterion for deciding to switch a list to a newsgroup. The only question here is which is more economical to the net - and Chuq's formula says a newsgroup would be.
There may easily be other reasons than economics for deciding to keep a mailing list] Come on - that formula is far too simplistic, to decide that any particular mailing list would be more economical as a newsgroup takes much more analysis than just counting members. What's more, a properly run mailing list would add traffic to no links not between hosts with users receiving the list. That's how mailing lists ought to be run - if a site wants to get it, it should call some other site that is already getting the list (perhaps the originator, perhaps a relay site) and have it sent from there. That way, only the recipients of the list pay for it, which is as it should be. People setting up mailing lists should bear this in mind, as should people requesting to be added to such lists. I know that will never happen - mail will always be forwarded through "volunteer" sites, but we should aim for something approaching this ideal. If we assume this, then before a mailing list is more economical to the net as a whole as a newsgroup, almost every site on the net would need to receive it. (I am not going to attempt to fudge mathematics to show this, to me it seems fairly clear, but please take this as an opinion.) The problem of sending multiple copies of a mailing list in multiple uucp transfers has also been noted. Quite apart from the fix to sendmail apparently in 4.3, this is quite easy to avoid - if multiple people at one site are on a mailing list that site should set up a redistribution point - get just one copy, and redistribute it to all the local users - and possibly to others at more remote sites. This can be done now (or at least those of us with any kind of reasonable mailer (sendmail, mmdf, ..) can do it). In his articles, Chuq issued pleas for more facts on the net. 
I concur - but please let's have *facts* not pseudo-facts, I'd much rather read an article which is clearly someone's opinion, unsupported by anything, than one which pretends to be solid fact, and is wrong.

Robert Elz seismo!munnari!kre kre%munnari.oz@seismo.css.gov

ps: the mailing list mentioned above doesn't actually exist, please don't ask to be added to it :-)
chuqui@nsc.UUCP (Chuq Von Rospach) (09/18/85)
In article <915@munnari.OZ> kre@munnari.OZ (Robert Elz) writes:
>In <3221@nsc.UUCP> Chuq created a formula that purported to
>determine the number of users needed on a mailing list before
>it becomes more economical to make it a newsgroup. The
>number derived was something between 40 and 107.
>
>Frankly, I thought that the inadequacy of the formula was so
>patently obvious that providing "facts" to rebut it would
>be hardly worth the trouble, but here goes anyway.

Sigh. The point I was TRYING to make with Lauren's comment is that many people on the net seem to think that saying "I don't like it so it isn't true" is a valid argument. WHY don't you like it, fergawshsakes? If you don't tell us WHY it's wrong, why should we believe you any more than we should believe the Easter bunny? Because you comb your hair in the morning? If you explain what you don't like about a concept, then we can either agree to throw it away (because there was a flaw that wasn't caught) or we can improve it. Just bitching at it doesn't give us a chance to do either, because it doesn't give us insight into the problems.

[Editorial sidenote: besides the school of thought that has been beating on me mercilessly because they disagree with me without telling me why they disagree with me (making me wonder if the disagreement is nothing more than an emotional or personal response and not a factual one), I'm getting a lot of responses that say, in essence, "dunderhead, you left this out, so the formula ain't right! throw it out!" Now, the reality is that I KNOW the silly thing isn't as rigorous as it could be, but the black/white attitude of it's either right or it's worthless is VERY discouraging. Many sectors of the net seem to think that if it doesn't spring out immediately into perfection it isn't worth working on, and that just isn't so.
My hope is to take what I think is a good first approximation (and I'll bet that the turnaround point for an average mailing list IS between my worst and best case, somewhere) and turn it into a good second approximation. I happen to appreciate feedback, even/especially negative feedback, because it helps me get a better perspective on the problem, but if your comments fall into the following categories, you're better off staying home and kicking your dog:

o I hate anyone who looks like you
o anything you do sucks
o that idea sucks (unless you tell me why, so I can either fix it or understand why it isn't fixable)
o you left out the second comma in the fourth sentence of the third paragraph, so obviously your idea is worthless.

If you can't criticize constructively, the net is MUCH better off simply by having you keep quiet. I guarantee you that if an idea DOES suck, the word will get around, along with an explanation why. We ought to be working together on this stuff, folks, not tossing rotten tomatoes.

end editorial sidebar, back to our program in progress]

>According to Chuq, if I establish a mailing list with (let's
>overestimate for safety) 200 members, then it is more economical
>to the net at large to turn it into a newsgroup (perhaps
>moderated). Numbers are the only criterion that counts...
>
>Well, my mailing list is in Australia - the formula doesn't
>include localities as a parameter - in fact, it concerns
>some local Australian TV soap opera that isn't seen outside
>Australia, and never will be.
>
>Yet, somehow, amazingly, it's more economical to the net as
>a whole to turn this thing into a newsgroup than to leave it
>as a mailing list!

It is even MORE economical to set up a newsgroup with an Australian distribution (which is only sane, because you then drop all of the sites out of the formula that don't apply to Australia). The same would be true about American soap operas using usa or na instead of net.
A variation of this formula would apply, but you would need to build the numbers to the affected subnet. There was a 'basic' assumption to the formula that we were talking about things that have relatively even distribution across the net and even interests across the net. Add a flavor of geographic restriction to it, and you need to vary the equation to fit. It is much easier, though, to take a known formula and change it to fit a special case than to try to build all the special cases into the formula ahead of time, because you then end up with formulas that look like they came out of the BLS -- very accurate, but representative of nothing. >[Please don't interject & say I could create a local, or >Australian newsgroup - of course I could. But that isn't >what 'the formula' supposedly tells me - it says that a "net" >or "mod" group is appropriate for my list. Well, I have and that is because a formula is only as good as the assumptions surrounding it. One assumption was that we were talking about a 'general purpose' mailing list -> newsgroup conversion. Your example violates that assumption, so attempting to fit it into the formula is invalid. This isn't "THE FORMULA", this is just a formula that tries to find out what the general case is. Once we can agree on a general case, we can derive the formula to find the special cases, but worrying about special cases now is premature and will only keep us away from the real problems in the formula. >Come on - that formula is far too simplistic, to decide that >any particular mailing list would be more economical as a >newsgroup takes much more analysis than just counting members. Agreed. Among the things it leaves out because I couldn't quantify them (any suggestions?) are: o increased backbone loads to places like ihnp4 and seismo, since they'd see more traffic from a mailing list. o better quantification of the local call versus long distance calls in the cost of things. 
o dealing with the very expensive trans-oceanic links

o dealing with the cost-free ARPA connections (excluding content problems involved)

o added cost of a newsgroup caused by having a single message take multiple paths to places (we can factor in the duplicate article rejection rate somehow, probably)

o reduced cost of a mailing list moderator screening garbage

o reduced cost of a mailing list moderator digesting stuff (fewer, larger messages helping reduce uucp overhead)

o added advantage of reduced distribution time of a mailing list (mail being MUCH faster than news in general). Among other things, this reduces the garbage of duplicated postings because you don't have 99,000 people telling you what city "Hill Street Blues" is in. You also get needed information faster.

o how redistribution points along the way for mailing lists affect things, since they cause some messages to 'share' a hop, making a mailing list cheaper in effect.

>In his articles, Chuq issued pleas for more facts on the net.
>I concur - but please let's have *facts* not pseudo-facts, I'd
>much rather read an article which is clearly someone's opinion,
>unsupported by anything, than one which pretends to be solid
>fact, and is wrong.

Well, I'll happily thank Robert for taking the time to pull apart my comments so I can try to put them together again. Besides allowing/forcing me to clarify things, he's brought up some good points, which I hope I've covered.

I think my idea is valid. I think my implementation is good, as far as it goes. I KNOW I need help in improving it. I've gotten a fair amount of good and useful feedback on it to date as well as all the swill, and I've tried to discuss the feedback with people (and ignore the swill). Just because it isn't perfect, folks, don't throw it away. Make it better... Cooperation and discussion will help us bring the network forward. Rotten tomatoes don't solve anything...
-- Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui Take time to stop and count the ewoks...