[comp.society.development] How Usenet grew

wcs) (06/19/91)

Lots of you seem to be new to the net world.  The technology that
we have now has changed a lot in the last ten years or so, and
a number of you seem to be re-inventing the wheel or at least not
understanding the fundamental problems that shaped what we have now.
Some of the lessons we've learned over the years might help with the
problems you're trying to solve today.

Netnews Version A was first written about 10 years ago.
It wasn't the first of its kind - the PLATO Notes system had been
running for about 5-8 years on a big machine in Illinois (accessed nationwide)
and single-machine PC BBS's were invented around 1978.

At the time, "real" computers were expensive enough that they were
mostly at universities and big companies, though PCs were beginning to spread.
Email and networking worked like this:
- The ARPAnet was a government research network, mainly at
	universities, which supported mailing lists.
	You had to have special approval and dedicated access.
	It later grew into the Internet.
- A few X.25 networks like Telenet were around, which cost real money.
- IBM mainframe users probably had SNA by then, which used dedicated
	networks, but they were boring and not readily user-programmable.
- The rest of us had modems.  They were slow (usually 1200 baud), but
	they were cheaper than PCs are now, and almost anyone could buy one.
Unix was widespread at universities because AT&T (Bell Labs) licensed
it cheaply so everyone would buy it.  Unix came with uucp,
a file transfer program that worked with modems (or direct lines),
and a simple email system ("mail") that used uucp to ship messages.
Unix machines typically were minicomputers with dozens of users.

There WAS no central authority - all you needed to do was put somebody's
uucp information (phone number, password, etc.) in your systems file,
and have them put your information in theirs, and you could send mail.
Unix mail was store-and-forward - to send mail from your machine "A" 
to a machine "C" you didn't have information for, you could send it
through some machine "B" you both knew, or a string of machines.
Originally, you had to specify the whole path when you sent the message.
(This was annoying, and people wrote pathalias and smart mailers to do
this automatically.) 
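To illustrate what "doing this automatically" involves, here is a minimal sketch of least-cost route computation over a uucp connectivity map, producing a bang path with a "%s" placeholder for the user name, loosely in the style of pathalias output. The map format, site names, and link costs are invented for the example; the real pathalias has its own map syntax and cost model.

```python
import heapq

# Hypothetical uucp map: site -> {neighbor: link cost}. Lower cost means a
# cheaper or more frequently polled link. All names and costs are made up.
UUCP_MAP = {
    "siteA": {"siteB": 1, "siteD": 5},
    "siteB": {"siteA": 1, "siteC": 2},
    "siteC": {"siteB": 2, "siteD": 1},
    "siteD": {"siteA": 5, "siteC": 1},
}

def bang_paths(graph, home):
    """Dijkstra from `home`; return {site: "hop1!hop2!...!%s"} for each reachable site."""
    best = {home: 0}
    route = {home: []}
    queue = [(0, home)]
    while queue:
        cost, site = heapq.heappop(queue)
        if cost > best[site]:
            continue  # stale entry: a cheaper route was already found
        for neighbor, link_cost in graph.get(site, {}).items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                route[neighbor] = route[site] + [neighbor]
                heapq.heappush(queue, (new_cost, neighbor))
    # The sender substitutes the user name for %s when addressing mail.
    return {site: "!".join(hops + ["%s"]) for site, hops in route.items() if hops}

if __name__ == "__main__":
    for dest, path in sorted(bang_paths(UUCP_MAP, "siteA").items()):
        print(dest, path % "user")
```

Here the direct siteA-siteD link costs 5, so mail for siteD is routed siteB!siteC!siteD (total cost 4) instead; the user never has to work out the hops by hand.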

Delay and reliability were problems - a modem connection can't get
through if the phones are all busy at the destination, which was common
except at very rich or lightly-loaded sites, so UUCP would retry hourly
until the mail got through.  Mail would usually get to destinations
inside your company in a day or so, and to almost anywhere in the country
within a week, though usually faster.  (At night, most of the users
go away, so the ports become free for uucp.)   Because of the economics
of the U.S. telephone system, long-distance email was normally sent overnight.  

Typically, a message going to someone far away
would go through several "well-known" systems that talked to lots of
people directly, and didn't mind forwarding other people's mail for free.
Before uunet, most of these systems were at universities or big companies
where "funny-money" budgets could hide what the real phone bills were,
and usually they served as the "gateway" machine for the organization's
own email traffic, so they could afford the equipment and personnel.

Sometime around 1981, Netnews Version A was written, which exchanged news
between Duke University and the University of North Carolina, using uucp.
It grew and spread, especially among universities and Bell Labs.
Like uucp, it was easy to join in - all you needed was a modem and a
friendly person at another site that got news who could send YOU the news.
News used a "flooding" protocol - when you received or originated an article,
you sent it to all your neighbors, who sent it to all their neighbors, ...
The growth of the TCP/IP Internet and of Berkeley UNIX helped News spread further.
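A minimal sketch of the flooding idea, assuming each article carries a unique ID and each site remembers the IDs it has already seen (netnews does this with the Message-ID header and a history file). The topology and site names are hypothetical, and this sketch only avoids sending an article straight back to the site it came from; real implementations also consulted the Path: header to skip sites the article had already visited.

```python
from collections import deque

# Hypothetical news topology: site -> set of neighbors it exchanges news with.
NEIGHBORS = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

def flood(neighbors, origin, article_id):
    """Flood one article from `origin`; return (sites reached, link transmissions)."""
    history = {site: set() for site in neighbors}  # per-site store of seen IDs
    history[origin].add(article_id)
    queue = deque([(origin, None)])  # (site holding the article, site it came from)
    transmissions = 0
    while queue:
        site, came_from = queue.popleft()
        for nbr in neighbors[site]:
            if nbr == came_from:
                continue              # don't send it back where it came from
            transmissions += 1        # a copy crosses this link either way
            if article_id in history[nbr]:
                continue              # duplicate: receiver already has it and drops it
            history[nbr].add(article_id)
            queue.append((nbr, site))  # new to nbr, so nbr forwards it in turn
    reached = {s for s, seen in history.items() if article_id in seen}
    return reached, transmissions

if __name__ == "__main__":
    print(flood(NEIGHBORS, "A", "<1@A>"))
```

Every site gets the article without any central coordination, but 5 copies cross the wire for 4 sites; a tree such as a backbone would need only 3 (one per site other than the origin). That difference is the cost a backbone structure saves.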

It was strictly non-commercial, partly because it was an informal thing,
partly because the government-funded Arpanet didn't allow commercial use,
and partly because the companies that provided a lot of the core support
liked a lot of technical discussion (unix-wizards and the source groups 
were the major justification for carrying the news) but couldn't
justify providing free advertising use for the competition.

As the usenet got bigger, there were a bunch of informal efforts to
keep it organized - efficiency is a real problem in an anarchy,
and messages tended to get delayed a lot.  For a while, there was a relatively
clear backbone structure of 20-30 "important" machines,
and a cabal of people who made it work relatively well.
Eventually, the spread of the Internet and TCP/IP-based news transport
protocols reduced the dependence on a backbone.

However, because flooding is a radically decentralized protocol,
you don't NEED a backbone - it just cuts down on the average cost
and administrative load, and improves performance.

UUNET is a project started by the Usenix Association to provide a
central location where anybody could get a newsfeed, uucp service,
and email connections to the Internet for a reasonable usage charge;
I forget if it was not-for-profit or profit-making.
It may have been the first expressly commercial netnews service,
but cost-sharing with the people you get news from is not uncommon,
and the Europeans have done cost-sharing to support a trans-Atlantic
newsfeed for a long time (leading to some messy politics!).
It's worked well - if I wanted to get a newsfeed, and didn't have
access to the Internet or a local free-telephone-call feed,
I'd probably use them - having a well-managed site provide email is
really valuable, and having them provide news also smooths out the
traffic levels a lot.

Alternatives and Problems
-------------------------
The biggest problems with netnews are the high volume of traffic,
the low volume of interesting traffic, and the increasing number of
new immature users who haven't learned to be civilized.
Traffic has doubled every 1-2 years - the "Imminent Death Of The Net"
was predicted in a famous article back when the news was less
than a megabyte per day (~300 paper pages), and you COULD read it ALL.
It's now over 25 MB/day, and still growing exponentially.
Improved news reading software has made it easier to find the
subjects you're interested in, follow the discussions that are worthwhile,
discard the ones you don't care about, and generally survive.
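The growth figures above can be checked with a little arithmetic. This sketch treats "less than a megabyte per day" as 1 MB/day, which gives a lower bound on the number of doublings; the forward projection just extends the stated 1-2 year doubling period and is an illustration, not a prediction.

```python
import math

def doublings(start_mb_per_day, now_mb_per_day):
    """Number of doublings needed to grow from the start volume to the current one."""
    return math.log2(now_mb_per_day / start_mb_per_day)

def project(now_mb_per_day, years, doubling_period_years):
    """Volume after `years` if traffic keeps doubling every `doubling_period_years`."""
    return now_mb_per_day * 2 ** (years / doubling_period_years)

if __name__ == "__main__":
    n = doublings(1, 25)  # assumption: "under 1 MB/day" taken as exactly 1 MB/day
    print(f"{n:.1f} doublings, i.e. {n:.1f} to {2 * n:.1f} years at 1-2 years each")
    for period in (1, 2):
        print(f"5 years on, doubling every {period} yr: {project(25, 5, period):.0f} MB/day")
```

Going from about 1 MB/day to 25 MB/day takes roughly 4.6 doublings, or about 5-9 years at the stated rate.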

You don't NEED a UNIX system to run something like netnews.
The Fidonet system is a distributed BBS that runs on MS-DOS,
and has a similar style of relatively decentralized email and news.
Or you can get communication software like Waffle or UUPC, which run
UNIX-oriented protocols on MS-DOS, or build something with TCP/IP.

But there are two fundamental problems with this:
1) Disk Space - if you have a lot of people writing a lot of stuff,
	you need a LOT of disk space, and it will ALWAYS be full.
	Usenet systems typically keep discussions for 1-2 weeks,
	which takes 200-400 MB or so these days, and it's growing.
	And once in a while, the news burps, and you get a huge
	amount in one day.

Impact-of-technology note:
	Fortunately, disk drive capacity doubles at about the same
	speed Usenet traffic does.
	The PLATO Notesfile system kept ALL articles ever written.
	This affects the style of discussion significantly,
	and also makes archiving easier.

2) Single-tasking operating systems, like MS-DOS, mean your computer
	can only do one thing well at a time.  If it's doing
	communications, then it's not doing your other work,
	which means email never really developed in the PC world,
	except through server systems like MCI Mail, AT&T Mail, and Compuserve.
	After all, if you have to call someone on the phone to tell
	them to switch their PC to Email, you could just give them
	the message over the phone, or send it to their fax machine.
	There are background email programs for MS-DOS, including
	the system used by Reuters newswire, but they're only 
	really becoming common on networks.
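The disk-space estimate in item 1 above follows roughly from the 25 MB/day volume: multiply the daily volume by the retention period. The `headroom` factor here is an assumption standing in for filesystem overhead and the occasional day when the news "burps".

```python
def spool_mb(mb_per_day, retention_days, headroom=1.0):
    """Disk needed to hold `retention_days` of news, times a safety factor."""
    return mb_per_day * retention_days * headroom

if __name__ == "__main__":
    for days in (7, 14):
        print(f"{days} days of news: {spool_mb(25, days):.0f} MB before overhead")
```

One to two weeks at 25 MB/day is 175-350 MB of raw articles, which is in line with the 200-400 MB quoted once some headroom is added.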

Another alternative is big-server commercial networks, like Compuserve
for the higher-class folks, or Prodigy for the Home-Shopping-Network folks
who don't mind advertising, censorship, and really short message length limits.

Another option is the PC BBS world, which is far more anarchic than
the Usenet or Fido, simply because it's very decentralized,
with small disconnected communities instead of one big community.
I like the tied-together world better, but this at least solves the
traffic problem - if it's too full, you go somewhere else,
and you mostly talk to nearby people because it's cheaper.

COMMUNICATIONS
--------------
Usenet and Unix email grew up on modems, in a decentralized environment.
You can run a real world this way, but to make things work well, 
it really helps to have some sort of network management,
and a number of well-run systems that everyone can depend on.
Realistically, this either means systems that charge money,
or which run on someone else's large budget.

It's much more reliable, and somewhat more efficient, to use a dedicated
shared-use network of some sort, especially high-speed.

But 25 Megabytes per day of news is about 1 MB/hour of quasi-broadcast.
At 9600 baud, that's about 20 minutes per hour, per system.
Some people have jokingly accused AT&T of popularizing Usenet just so
everyone will use the telephones more :-)
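The "20 minutes per hour" figure can be worked out explicitly. This sketch assumes the baud rate equals bits per second and that asynchronous framing costs 10 bits per byte (8 data bits plus start and stop bits); it ignores compression and uucp protocol overhead, which pull in opposite directions.

```python
def link_minutes_per_hour(mb_per_day, baud, bits_per_byte=10):
    """Minutes of each hour a modem link spends moving one day's news volume."""
    bytes_per_hour = mb_per_day * 1_000_000 / 24
    bytes_per_second = baud / bits_per_byte  # async framing: start + 8 data + stop
    return bytes_per_hour / bytes_per_second / 60

if __name__ == "__main__":
    for baud in (9600, 1200):
        print(f"{baud} baud: {link_minutes_per_hour(25, baud):.1f} minutes per hour")
```

At 9600 baud this comes to about 18 minutes per hour, close to the estimate above; at the 1200 baud that was common earlier, a full feed would need more than an hour per hour, so it could not keep up at all.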

There have been some experiments with satellite broadcast,
including the Stargate project run by Usenix in the mid-1980s,
which used spare broadcast capability on a major cable-tv system.
The distribution technology was relatively inexpensive,
and satellite dishes have since become much cheaper and more common.

Another technology for communications is packet digital radio, which
in the US is largely run by amateur radio operators.
Obviously, anything involving radio bandwidth becomes a political issue,
but it's an appropriate technology for developing countries.
(It may be a bit tough for Nepal, since it mostly uses line-of-sight
radio frequencies.)

ECONOMICS
---------
I've alluded to some of the economic issues above, and I won't say much more.
Some fundamental issues are that
1) If it's easy and cheap to connect, decentralized decision-making
	lets lots of people connect and the network grows fast.
1A) If somebody implements a good-enough system, and gives it away "free",
	lots of people will take it, like Usenet or Prodigy.
2) If it's easy and cheap to add traffic to an existing network,
	without getting much permission, people will.
3) If you have to wait for the government telephone company to decide something,
	you may have to wait a long time.  Unless they really want
	to do it, like the French Minitel, which was mainly installed
	for telephone directory access, but seems to have surprised
	the French with how quickly it was taken up for chat-lines.
	Getting things done with most PTTs is rough.
4) Network management is relatively difficult, and systems work
	much better if someone does it, especially if they can make money.
	The problem is how to do this in a decentralized environment
	(UUNET did it well.)
5) If your system is successful, traffic will grow far beyond your plans.
	It will probably grow far beyond your ability to manage it.
	If you're managing it centrally this is a problem.
	For instance, Prodigy email is no longer free, because
	people are using it a lot.  Of course, they're using it a
	lot partly because the BBS service on Prodigy is censored.
-- 
				Pray for peace;		  Bill
# Bill Stewart 908-949-0705 erebus.att.com!wcs AT&T Bell Labs 4M-312 Holmdel NJ
# No, that's covered by the Drug Exception to the Fourth Amendment.
# You can read it here in the fine print.

cmf851@anu.oz.au (Albert Langer) (06/22/91)

Thanks for the historical background, Bill, in article
<1991Jun19.055256.14960@cbnewsh.att.com> wcs@cbnewsh.att.com 
(Bill Stewart 908-949-0705 erebus.att.com!wcs)

Maybe I'm prejudiced but I see most of the history as confirming my
earlier assertion that the Usenet design is heavily influenced by the
environment it grew up in (research/academic) and in particular that
extending far beyond that environment, especially to developing countries,
needs to focus on providing the network management and systems
administration.

Technology trends since Usenet started have continued to be in the
direction of declining costs for everything, with disk space and
CPU power continuing to decline much faster than communications
bandwidth (and all these declining relative to sysadmin labor).

The early architecture was based on terminals hanging off multi-user
computers (sometimes even using X.25 networks to connect the terminals
to a central mainframe).

That no longer makes sense, and a modern architecture should make use
of both the CPU power and mass storage capacity commonly available
on desktop computers, treating each user as a network node rather than
as a user of a much smaller number of network nodes.

Traffic can be expected to continue growing exponentially or faster
and filtering is a major issue. Classification into "newsgroups"
(with cross-posting) and further selection at the newsreader level
through kill files is already becoming inadequate.
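For readers who haven't used one, the two-stage selection described above (newsgroup subscription, then a kill file) can be sketched as follows. The `Article` fields, rule format, and patterns are hypothetical and only loosely in the spirit of newsreader kill files; real newsreaders each have their own kill-file syntax.

```python
import re
from dataclasses import dataclass

@dataclass
class Article:
    newsgroups: list  # a cross-posted article lists several groups
    subject: str
    author: str

# Hypothetical kill file: (header field, regex) pairs; any match discards the article.
KILL_RULES = [
    ("subject", re.compile(r"make money fast", re.IGNORECASE)),
    ("author", re.compile(r"^flamer@")),
]

def wanted(article, subscribed, rules):
    """Keep an article if it is in a subscribed group and no kill rule matches it."""
    if not set(article.newsgroups) & subscribed:
        return False  # stage 1: newsgroup classification
    # stage 2: kill-file patterns applied to individual headers
    return not any(pattern.search(getattr(article, field)) for field, pattern in rules)

if __name__ == "__main__":
    articles = [
        Article(["comp.society.development"], "How Usenet grew", "someone@site1"),
        Article(["comp.society.development", "misc.misc"], "MAKE MONEY FAST", "someone@site2"),
        Article(["rec.pets"], "cats", "someone@site3"),
    ]
    print([a.subject for a in articles if wanted(a, {"comp.society.development"}, KILL_RULES)])
```

Every rule has to be written and maintained by hand, one pattern at a time, which is part of why this approach scales poorly as traffic grows.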

Multi-media, groupware and other extensions beyond simple email and
news text are becoming important. (However for developing country
purposes the prime focus should be on cheap and reliable email and
news while keeping the future extensions in mind.)

Flooding algorithms and other inefficiencies that have resulted
from lack of centralized network administration are luxuries that
developing countries are unlikely to be able to afford. But
neither can they afford the overheads and delays of establishing
central bureaucracies or waiting for the existing PTT administrations
to take on the job.

Going through some specific points in your article:

>There have been some experiments with satellite broadcast,
>including the Stargate project run by Usenix in the mid-1980s,
>which used spare broadcast capability on a major cable-tv system.
>The distribution technology was relatively inexpensive,
>and satellite dishes have since become much cheaper and more common.

Seems to me this should be kept under review as DBS continues to 
become cheaper (I've even seen ads for using alfoil tape on window
blinds to pick up TV broadcasts in Europe using Fresnel patterns, and
9600 baud SCPC satellite receivers are available as PC cards for less
than high speed modems).

But modems on the PSTN will still be needed for the uplink and most
aspects of the system can be designed without much reference to what
the actual communications carrier will be (even carrier pigeons with
floppy disks). Let's focus on designing the email and news system
rather than the communications carrier.

>ECONOMICS
>---------
>I've alluded to some of the economic issues above, and I won't say much more.
>Some fundamental issues are that
>1) If it's easy and cheap to connect, decentralized decision-making
>	lets lots of people connect and the network grows fast.

Requirement 1. Must be easy and cheap to connect (and to use).

>1A) If somebody implements a good-enough system, and gives it away "free",
>	lots of people will take it, like Usenet or Prodigy.

Requirement 2. Give away "freely available" software.

>2) If it's easy and cheap to add traffic to an existing network,
>	without getting much permission, people will.

Requirement 3. Exploit any available opportunities to add traffic
easily and cheaply to existing email and news networks such as Usenet
e.g. by providing transparent gateways.

Requirement 4. Provide adequate metering and accounting facilities,
both for gateways to other systems and internally so that:

a) It is easy for people to offer and use available communications
links without bureaucratic overheads keeping accounts and arranging
permissions. (e.g. Clarinet's Dynafeed software to automatically
add and remove groups when users subscribe to them, without 
"operator" involvement, but with associated logging and charging
for traffic).

b) (Also a security related requirement). People offering access to
communications links need not fear them being abused or ending up
paying other peoples bills.
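One way to make Requirement 4 concrete is a per-neighbor feed that forwards only the groups a neighbor has subscribed to and meters the bytes it sends, so the link owner can see, and charge for, exactly what crosses the link. This is a hypothetical sketch: `MeteredFeed`, its methods, and the cents-per-kilobyte rate are invented for illustration and are not the design of Clarinet's Dynafeed, which is only named here.

```python
from collections import defaultdict

class MeteredFeed:
    """Hypothetical feed: forwards only subscribed groups and logs bytes per group."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # neighbor -> groups it wants
        self.usage = defaultdict(int)          # (neighbor, group) -> bytes sent

    def subscribe(self, neighbor, group):
        self.subscriptions[neighbor].add(group)      # no operator involvement needed

    def unsubscribe(self, neighbor, group):
        self.subscriptions[neighbor].discard(group)

    def offer(self, neighbor, group, size_bytes):
        """Send an article only if the neighbor subscribes to its group; meter it."""
        if group not in self.subscriptions[neighbor]:
            return False
        self.usage[(neighbor, group)] += size_bytes
        return True

    def bill(self, neighbor, cents_per_kb):
        """Charge for everything sent to `neighbor`, at a flat per-kilobyte rate."""
        total = sum(b for (n, _), b in self.usage.items() if n == neighbor)
        return total / 1000 * cents_per_kb

if __name__ == "__main__":
    feed = MeteredFeed()
    feed.subscribe("siteB", "comp.society.development")
    feed.offer("siteB", "comp.society.development", 3000)  # sent and metered
    feed.offer("siteB", "rec.pets", 2000)                  # not subscribed: not sent
    print(feed.bill("siteB", cents_per_kb=0.5), "cents")
```

Because the ledger is kept per neighbor and per group, it also addresses point b): whoever offers the link has a record of who used it and for what.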

Note: Fully developing these requirements could be critical to avoiding
inefficiencies such as duplicated traffic; use of slow, expensive links
when a high-speed modem is available at another site; and the inability
of any single site to justify a high-speed modem, leased line, or
satellite broadcast even when providing one collectively would reduce
costs for everyone. The design should also avoid the costs and other
difficulties of a bureaucratic organization to administer the
accounting. This requirement is itself a consequence of requirement 1
(cheap and easy connection and use) and of requirement 5 below.

>3) If you have to wait for the government telephone company to decide 
>something, you may have to wait a long time.

Requirement 5. Don't rely on any single source of sponsorship. If
funds are needed for development, seek them widely, perhaps including
commercial uses. (May require careful reconciliation with requirement
2 for "freely available" software.)

>4) Network management is relatively difficult, and systems work
>	much better if someone does it, especially if they can make money.
>	The problem is how to do this in a decentralized environment
>	(UUNET did it well.)

Requirement 6. Provide all network management for the decentralized
network. Design the architecture from the ground up with this problem
in mind since there may be NO local "sysadmins". (Tough one, focus on it).

>5) If your system is successful, traffic will grow far beyond your plans.
>	It will probably grow far beyond your ability to manage it.
>	If you're managing it centrally this is a problem.

Requirement 7. Design architecture to cope with massively larger
traffic than currently contemplated. Especially consider filtering
issues.

These of course are just notes about requirements, not properly
analysed and measurable requirements ready to hand over to specification
and design.

My point is we should be thinking now in terms of developing a
requirements statement.

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au