[comp.protocols.tcp-ip] Using the domain name system for locating software

jkp@cs.HUT.FI (Jyrki Kuoppala) (09/06/89)

Motivated by the Internet Crucible's mention of the lack of new
applications on the Internet, I'm posting this to the tcp-ip
newsgroup.  I think it could well be possible to implement, but it
certainly needs more work.  An RFC should probably be written about
the protocol enhancements.

One of the biggest problems, and not one that concerns only
applications like this, is the failure of the tcp/ip protocols in
general to cope well with the fact that the Internet is a combination
of several networks which are not necessarily globally
interconnected.  For example, there are already several commercial
sites to which direct tcp/ip connections are possible only to a few
hosts; yet these hosts are also connected to the company's own tcp/ip
networks.  Also, there are some hosts / groups of hosts on Nordunet
which for some reason or another are not allowed to access the US
side of the Internet, although they may be allowed to access the
Nordunet and European side of the network.  The domain name system
and the MX system don't offer tools to handle situations like this,
which are rapidly becoming more and more common as the network grows.

Well, here's the original document.  Take it as an idea put into
words for the first time; it may not be an ideal method, but I think
it could be quite useful if implemented even in its present form.

//Jyrki

Author: Jyrki Kuoppala (jkp@cs.hut.fi)
Last modified: Fri Aug  4 05:34:47 1989

The Software Location and Distribution Service

or

How to Get All the Software In the World Without Knowing Where It Is

Document version 0.001


You all probably know the situation: you've got a new machine to get
up and working, or you just heard about a great program that is
freely available.  But where can you get it?  You check your local
ftp server (if you are lucky enough to have one), then the one in
your state, then uunet.uu.net ... and you have to get `dir -R' from
all of them because you don't know the exact name of the software.
After an hour, you finally find the software; you ftp it home,
uncompress it and untar it.  Then you take a look at the dates; it's
from the year 1985, version 0.001.  Back to ftp'ing and consuming
valuable bandwidth from the network.

OK, so that's overdoing it a bit.  But there's got to be a better
way.  I'm proposing the following solution: let's use the Internet
domain name system.  It already does quite a good job as a very
widely distributed database.  Also, it isn't confined to resolving
host names to addresses; we already have MX records to handle the
routing of mail.
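
To give an idea of what this might look like, here is a sketch of a
hypothetical `SW' record paralleling the MX syntax in a name server
master file (the mnemonic, the preference values and the hosts are
just made-up examples):

; hypothetical master file entries for the zone gnu.software
emacs	IN	SW	12	prep.ai.mit.edu.
emacs	IN	SW	5	freja.diku.dk.
emacs	IN	SW	3	sauna.hut.fi.

Just as with MX, the record with the lowest preference value would be
tried first.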

That was the old days.

So let's take a time warp a year and a half into the future and see
how it works in the modern, well-networked world (any resemblance to
persons, machines and software living or dead is purely coincidental
;-):

Let's see, I just read that emacs version 19.21 was published.
Let's install it.

jkp@sauna.hut.fi '~' 6: getsoftware -n emacs.gnu
Transferred edist-19.21.tar.Z from sauna.hut.fi, 6942321 bytes in 327 seconds.

The flag `-n' stands for newest version.

We were lucky; it was found near enough that we didn't have to wait
for it or pay much for the transfer.  Of course, if the net was up
all the way we could have gotten it directly from its home, but now
the transfer is not `costly', so we have a good conscience: we don't
have to answer the `The cost to get the software is 514 units and
will probably be half that 10 hours from now, do you still want to
get it (y/n)?' question.

So, it seems that others have already fetched it somewhere in or near
Finland.

Just for curiosity, let's see why we got it from sauna and who else
has it.

jkp@sauna.hut.fi '~' 7: nslookup
Default Server:  hut.fi
Address:  128.214.3.1

> set type=software
> emacs.gnu
Server:  hut.fi
Address:  128.214.3.1

emacs.gnu.software preference = 12, ftp server = prep.ai.mit.edu
emacs.gnu.software preference = 5, ftp server = freja.diku.dk
emacs.gnu.software preference = 3, ftp server = sauna.hut.fi
sauna.hut.fi     inet address = 128.214.3.119  pathname=pub/gnu/edist-19.21.tar.Z
etc.
> 

Some history about how this system was taken into use and what
caused it to evolve into the one we all know and use every day now:

Most software has some kind of central archive place / clearinghouse
for patches etc.  After all, software is generally written by
someone, and even if the author doesn't have time to include bugfixes
and work on the software, someone else usually does.  The problem was
to coordinate who the `owner' of the software is; that is, who to
send bug reports to, and where to get the newest version of the
software and all the `official' bugfixes.

At first, the domain name server system wasn't altered at all.  It
helps a lot even just to know where to get the official version of a
software package.  Somebody just needed to register e.g. the
top-level domain `.software' and coordinate the domains under it.  It
was some trouble to coordinate which domains fall under software, but
many came easily to mind:

gnu.software		for the GNU project software
net.software		for software published on the Usenet
- comp.unix.net.software: the comp.sources.unix archives
- alt.net.software: alt.sources
- amiga.net.software: comp.sources.amiga
athena.software		for the Project Athena stuff
mail.software		for various mailers
editors.software	for editors

The advantage of the hierarchical system is that no one person needs
to manage the huge amount of information concerning ALL the software
available.  Just as with the domain name server system, one
organisation (or person) keeps up to date the information about how
to reach one particular organization (or, in this case, piece or
group of software).
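
For instance (a sketch; the name server assignments here are pure
guesses), the administrator of the `software' zone would only keep
delegations to the servers responsible for each subdomain, just as
with ordinary NS records:

; in the master file of the `software' zone
gnu	IN	NS	prep.ai.mit.edu.
net	IN	NS	uunet.uu.net.
athena	IN	NS	athena.mit.edu.

The GNU people would then maintain the contents of gnu.software
themselves, without the top-level coordinator seeing every change.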

Of course, you may think: what did this solve?  You must still know
the name of the program, so how does this differ from the old way of
distributing lists of where to get the software?  Even with this
system you still had to distribute lists - and you still do, even now
that the system is working quite well - but now they contain only the
names of the software in this domain system.

The important difference is that the list distributed (`The World
Software Catalog') no longer contains incorrect information, as it
only lists the software that has been written, not the places where
it can be gotten or the version number information.  Those can be
gotten from the `Software Location Service'.

Useful software quite rarely ceases to exist, and even if it changes
its name, the old name can be kept in the domain name system pointing
to the new name for some time.  As new software gets published, the
person writing the software allocates a name for it and writes a
short description of it to be added to the distributed `Software
Location Service'.  Also, a mention of the name and a very short
description of the purpose of the software is added to `The World
Software Catalog'.  After we implemented the type `software' in the
domain system, we could put that additional short description
(version number, author, patchlevel) along with the ftp server
address.
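
For example (again just a sketch, pressing the existing TXT record
type into service for the description; the wording is invented):

emacs.gnu.software.	IN	TXT	"GNU Emacs, version 19.21, patchlevel 0"

so the same query that returns the ftp servers can also return the
catalog entry.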


OK, so by now we had implemented the Ultimate Software Location
Service and had a few thousand persons in different parts of the
world keeping the world-wide distributed database up to date.  Also,
we have The World Software Catalog (all of it freely distributable of
course, is there any other kind of software??) with descriptions
posted monthly to Usenet.  Of course, the catalog isn't very
complete, but then, if you hear from a friend about a piece of
software or happen to read about it in a newsgroup, you can always
ask the software location service more about it, look at it, and even
easily download it.

Back to our original problem.  We want to have the latest and
greatest version of GNU Emacs, and just want to say a command like
`getsoftware -n emacs.gnu' and have edist-19.21.tar.Z magically
appear in the current directory after at most a few minutes.  So, we
now have an easy way to locate the places the software can be gotten
from.

So what?  The world still has distances, even though they are
diminishing rapidly.  With current technology, it wouldn't be very
nice to just grab emacs from prep across the pond (remember, we are
in Finland now) when it already happens to be stored in the next-door
department's computer - nobody happened to tell me because that
department has their own coffee room and ours has its own.

It may also be that for some reason or another it's not possible for
me to ftp to the States at all.  There are administrative and
political reasons why this might be so; for example, Eunet (roughly,
the organization responsible for the uucp network in Europe) is
planning to set up a European TCP/IP network which could also be open
to commercial sites.  For these sites, specific clearance has to be
arranged with the U.S. network people to connect to the U.S. side of
the world, although they might by default gain access to Nordunet,
the TCP/IP network of the universities in Scandinavia.

So the one thing we need is to decide which of the ftp servers is
closest to us, or to which the `cost' is cheapest.  (I damn well hope
that the `cost' isn't ever gonna be a literal cost - that is, being
charged by the packet in an internet; it would very quickly destroy
this great community of sharing information and software, the whole
idea behind the old anonymous ftp and the present `Software Location
and Distribution Service'.)  This was a bit of a problem in the
internet: the cost couldn't be easily determined.  Of course, you
could ping all the hosts which carry the software, but that's not a
very good use of the network.

We needed a server for calculating `distances' between different IP
addresses.  Of course, it should also be distributed, so that it fits
the rapidly changing network where, in practice, the distance from
place a to place b can go to eternity in a link failure, for example.
Ideally, this `distance server' should be integrated with the
`software location' domain server system, so that when you ask for
the place to get the software from, the priorities are adjusted
according to the distance between the server and your host.  This
way, you can still get the software if you happen to be in a
commercial company whose policy is to have only one internet gateway
- assuming that the software already is somewhere at your company.
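
As a rough sketch of the client's side of this (not of the distance
server itself), here is roughly what `getsoftware' might do; the
function names, the costs and the way the preference and the distance
are combined are all just assumptions for illustration:

# Sketch: pick the cheapest ftp server for a piece of software.
# lookup_software and query_distance stand in for real queries to
# the domain system and the distance server; both are hypothetical.

def lookup_software(name):
    # Would really be a type=software query, as in the nslookup
    # session above; hardwired here for illustration.
    return [(12, "prep.ai.mit.edu", "pub/gnu/edist-19.21.tar.Z"),
            (5,  "freja.diku.dk",   "pub/gnu/edist-19.21.tar.Z"),
            (3,  "sauna.hut.fi",    "pub/gnu/edist-19.21.tar.Z")]

def query_distance(host):
    # Stand-in for the distance server; made-up costs, with an
    # unreachable host getting an infinite cost.
    costs = {"prep.ai.mit.edu": 514, "freja.diku.dk": 40, "sauna.hut.fi": 2}
    return costs.get(host, float("inf"))

def pick_server(name):
    # Weigh the static preference by the current network distance
    # and take the cheapest candidate.
    candidates = [(pref + query_distance(host), host, path)
                  for pref, host, path in lookup_software(name)]
    return min(candidates)

print(pick_server("emacs.gnu"))   # -> (5, 'sauna.hut.fi', ...)

With the made-up numbers above this picks sauna.hut.fi, just as in
the session at the beginning.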

I'll skip over the implementation of the distance server, as many of
you have probably studied it in connection with other network
technology; it has many more uses than this software location
service.  It wouldn't be that hard to implement; with modern network
monitoring tools, much statistical information is collected
describing the connectivity of different networks, and that is easy
to turn into `distance data'.  Perhaps surprisingly, however, the
distance calculation was the most difficult single obstacle to
overcome on the way to getting the `Software Location and
Distribution Service' into reality.

Back to business (that is, getting the newest version of emacs).
Now we know that we could get it from prep, but that isn't very wise,
since prep is far, far away and the net.gods will be angry if
everybody overloads the network and prep by getting emacs from there
every day.  Also, now we know that we can get it elsewhere (remember,
the department next door has it).

But what if the next-door guy doesn't use emacs and just installed
it a few years ago to please some users, and those users have since
left?  Then the version of emacs he (pardon my sexism, I wish
everyone spoke Finnish, which doesn't have different words for she /
he) probably has is OLD.  You don't want that.

So, we again face the problem of old versions.  Why did old versions
stick around?  Or, to get to the root of the problem, how did the
next-door guy get emacs in the first place?  You guessed it: he
manually grabbed it from prep and after that just forgot it in his
anon. ftp area.  Why did he have to do it manually?  Yep, because
back then we didn't have this great location and distribution
service.

Back in the old days, before the `Software Location System' was
working, the main channel of distribution was that somebody just
heard during a coffee or lunch break (oh my, where would we be if we
didn't have to eat / drink coffee) about a great piece of software
and traced it to its origin.  Then, being a nice person, she also put
it up for anonymous ftp on her machine, after first having to
convince her boss that she wasn't just wasting the University's money
for nothing, and that it benefited all the universities in the
country (you didn't believe this was the Real World now, did you?).
That is, almost all anonymous ftp areas were managed by volunteers
doing it on the side of their Real jobs.

But that was in the old days.  Now, of course, there's no such thing
as an `old version' on anonymous ftp unless you specifically ask for
it.  What changed things?

Remember, we have the `Software Location Service'.  Also, it
calculates the distance between the software needer and the provider.
Now, we also have the unwritten law that every organization that
joins the network provides, as a routine matter, at least 200 megs
(or more for commercial organizations) of disk space for the `general
good', to keep the software service working.  So, every time somebody
asks for a piece of software, its priority is calculated as usual.
After that, a version number is asked from the one special `home
server' of the software, and if it differs from the version number at
the `cheapest' server, a message is sent to the `cheapest' server to
throw away that software.
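
A sketch of that check, with the same caveats as before (the function
names and message formats are invented):

# Sketch: before handing out a cached copy, compare its version
# with the software's one `home server' and ask a stale server
# to throw its copy away.

def home_server(name):
    # Stand-in: the one registered `home server' of the software.
    return "prep.ai.mit.edu"

def ask_version(host, name):
    # Stand-in for a version query to a server.
    return "19.21"

def send_discard(host, name):
    # Stand-in: tell a caching server its copy is out of date.
    print("telling %s to discard its copy of %s" % (host, name))

def check_freshness(name, cheapest_host, cached_version):
    current = ask_version(home_server(name), name)
    if current != cached_version:
        send_discard(cheapest_host, name)
        # Fall back to the home server for this request.
        return home_server(name), current
    return cheapest_host, cached_version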

Also, every time somebody asks for the software, a counter is
incremented to keep statistics on where the software is needed.
Based on these statistics, messages are sent to the ftp servers near
the asker's area telling them to get the software.  The servers may
decide to ignore the messages if other software is in more demand.
Anyhow, the idea is that the servers keep a cache of the software
needed by the clients.  If the distance to all the servers for a
certain piece of software is too big (in other words, the hosts are
not reachable), the service sends a message to a server near the
client with a flag saying that this order should be carried out; the
client gets a message to try again later, with an approximation of
how long it will take for the software to get near enough.
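
In code, that bookkeeping might look roughly like this; the threshold
and the data structures are, once more, pure guesswork:

# Sketch: count requests per (software, area) and hint a nearby
# server to fetch a copy once demand passes some threshold.

PREFETCH_THRESHOLD = 5   # made-up number of requests before hinting

demand = {}   # (software name, client area) -> request count

def nearby_server(client_area):
    # Stand-in: would really consult the distance server.
    return "sauna.hut.fi"

def send_prefetch_hint(host, name):
    # The server is free to ignore this if it is short of space.
    print("hinting %s to fetch a copy of %s" % (host, name))

def note_request(name, client_area):
    key = (name, client_area)
    demand[key] = demand.get(key, 0) + 1
    if demand[key] >= PREFETCH_THRESHOLD:
        send_prefetch_hint(nearby_server(client_area), name)
        demand[key] = 0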

That's all there is to it!

Stay tuned: now that we have this service working as well as it is,
we are taking a look at `The Worldwide Electronic Telephone Catalog'
and `The Worldwide Newspaper Archive Service'.  Of course, you have
already heard about `The Internet's Guide to Travelling', now in the
implementation phase: an almost-real-time travel planning system
which lets you plan your trip all across the world, calculates
current prices and even takes into account various strikes which
might be going on.

-------
Back to August of year 1989.

Don't tell me - there's OSI.  It probably has all this already
implemented, and in addition it can cook your morning coffee and wash
your dirty laundry, huh?  Please tell me if this is so, and where to
FTP it from ;-)

Happy hacking,
-- 
Jyrki Kuoppala    Helsinki University of Technology, Finland.
Internet :        jkp@cs.hut.fi           [128.214.3.119]
BITNET :          jkp@fingate.bitnet      Gravity is a myth, the Earth sucks!