jkp@cs.HUT.FI (Jyrki Kuoppala) (09/06/89)
Motivated by the Internet Crucible's mention of the lack of new applications on the Internet, I'm posting this to the tcp-ip newsgroup. I think it could well be possible to implement, but it certainly needs more work; an RFC should probably be written about the protocol enhancements.

One of the biggest problems, and one that doesn't only concern applications like this, is the failure of the TCP/IP protocols in general to cope well with the fact that the Internet is a combination of several networks which are not necessarily globally interconnected. For example, there already are several commercial sites to which direct TCP/IP connections are possible only to a few hosts; yet these hosts are also connected to the company's own TCP/IP networks. Also, there are some hosts / groups of hosts on Nordunet which for one reason or another are not allowed to access the US side of the Internet, although they may be allowed to access the Nordunet and European side of the network. The domain name system and the MX system don't offer tools to handle situations like this, which are rapidly becoming more and more common as the network grows.

Well, here's the original document. Take it as an idea put into words for the first time; it may not be an ideal method, but I think it could be quite useful if implemented even in its present form.

//Jyrki

Author: Jyrki Kuoppala (jkp@cs.hut.fi)
Last modified: Fri Aug 4 05:34:47 1989

The Software Location and Distribution Service
or
How to Get All the Software in the World Without Knowing Where It Is

Document version 0.001

You all probably know the situation: you've got a new machine to get up and working, or you just hear about a great program that is available freely. But where can you get it? You check your local ftp server (if you are lucky enough to have one), then the one in your state, then uunet.uu.net ... and you have to get `dir -R' from all of them, because you don't know the exact name of the software. After an hour, you finally find the software; you ftp it home, uncompress it and untar it. Then you take a look at the dates: it's from 1985, version 0.001. Back to ftp'ing and consuming valuable bandwidth from the network.

OK, so that's a bit of an exaggeration. But there's got to be a better way to do it. I'm proposing the following solution: let's use the Internet domain name server. It already does quite a good job as a very widely distributed database. Also, it isn't confined to resolving host names to addresses; we already have MX records to handle sending mail.

That was the old days. So let's take a time warp a year and a half into the future and see how it works in the modern, well-networked world (any resemblance to persons, machines and software living or dead is purely coincidental ;-):

Let's see, I just read that Emacs version 19.21 was published. Let's install it.

jkp@sauna.hut.fi '~' 6: getsoftware -n emacs.gnu
Transferred edist-19.21.tar.Z from sauna.hut.fi, 6942321 bytes in 327 seconds.

The flag `-n' stands for newest version. We were lucky: it was found near enough, so we didn't have to wait for it and the transfer was fast. Of course, if the net was up we could have gotten it directly from its home, but now the transfer is not `costly', so we have a good conscience and don't have to answer the question `The cost to get the software is 514 units and probably will be half that 10 hours from now, do you still want to get it (y/n)?'. So, it seems that others have already fetched it somewhere in or near Finland.
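To make this concrete, here is a minimal sketch (in present-day Python) of what the getsoftware client's fetch loop might look like. Everything here is an assumption: the hard-coded records stand in for answers from the not-yet-existing location service, and only the anonymous ftp fetch uses a real library (ftplib).

    import ftplib

    RECORDS = [
        # (preference, ftp server, pathname) - stand-ins for answers
        # from the hypothetical Software Location Service
        (12, "prep.ai.mit.edu", "pub/gnu/edist-19.21.tar.Z"),
        (5,  "freja.diku.dk",   "pub/gnu/edist-19.21.tar.Z"),
        (3,  "sauna.hut.fi",    "pub/gnu/edist-19.21.tar.Z"),
    ]

    def getsoftware(records):
        """Try the servers in order of preference (lowest = `cheapest')."""
        for pref, server, path in sorted(records):
            filename = path.split("/")[-1]
            try:
                ftp = ftplib.FTP(server)    # connect
                ftp.login()                 # anonymous login
                with open(filename, "wb") as out:
                    ftp.retrbinary("RETR " + path, out.write)
                ftp.quit()
                print("Transferred %s from %s" % (filename, server))
                return
            except ftplib.all_errors:
                continue                    # unreachable; try the next one
        print("No server reachable; try again later.")

    getsoftware(RECORDS)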
Just out of curiosity, let's see why we got it from sauna and who else has it.

jkp@sauna.hut.fi '~' 7: nslookup
Default Server:  hut.fi
Address:  128.214.3.1

> set type=software
> emacs.gnu
Server:  hut.fi
Address:  128.214.3.1

emacs.gnu.software    preference = 12, ftp server = prep.ai.mit.edu
emacs.gnu.software    preference = 5, ftp server = freja.diku.dk
emacs.gnu.software    preference = 3, ftp server = sauna.hut.fi
sauna.hut.fi          inet address = 128.214.3.119, pathname = pub/gnu/edist-19.21.tar.Z
etc.
>

Some history about how this system was taken into use and what caused it to evolve into the one we all know and use every day now:

Most software has some kind of central archive place / clearinghouse for patches etc. After all, software is generally written by someone, and even if the author doesn't have time to include bugfixes and do work on the software, someone else usually does. The problem was to coordinate who is the `owner' of the software; that is, who to send the bug reports to, and where to get the newest version of the software and all the `official' bugfixes.

At first, the domain name server system wasn't altered at all. It helps a lot even just to know where to get the official version of a software package. Somebody just needed to register e.g. the top-level domain `.software' and coordinate the domains under it. It was some trouble to coordinate what domains fall under software, but many came easily to mind:

  gnu.software       for the GNU project software
  net.software       for software published on the Usenet
    - comp.unix.net.software: comp.sources.unix archives
    - alt.net.software: alt.sources
    - amiga.net.software: comp.sources.amiga
  athena.software    for the project Athena stuff
  mail.software      for various mailers
  editors.software   for editors

The advantage of the hierarchical system is that one person needn't manage the huge amount of information concerning ALL the software available. Just as with the domain name server system, one organization (or person) keeps up to date the information about how to reach one particular organization (or, in this case, piece or group of software).

Of course, you may think: what did this solve? You must still know the name of the program, so how does this differ from the old way of distributing lists of where to get the software? Even with this system you still had to distribute lists - and we still do, with the system working quite well - but now they contain only the names of the software in this domain system. The important difference is that the distributed list (`The World Software Catalog') no longer contains incorrect information, as it only lists software that has been written, not the places where it can be gotten or version number information. Those can be gotten from the `Software Location Service'. Useful software quite rarely ceases to exist, and even if it changes its name, the old name can still be kept in the domain name system pointing to the new name for some time.

As new software gets published, the person writing the software allocates a name for it and writes a short description of it to be added to the distributed `Software Location Service'. A mention of the name and a very short description of the purpose of the software is also added to `The World Software Catalog'. After we implemented the type `software' in the domain system, we could put that additional short description (version number, author, patchlevel) alongside the ftp server address.
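The `software' record type doesn't exist yet, of course. One interim possibility - purely an assumption on my part, not anything today's name servers support - would be to pack the fields into the existing TXT record type and parse them on the client side. A minimal sketch:

    # Hypothetical interim encoding: until a real `software' resource
    # record type is assigned, pack the fields into a TXT-style string.
    # The field layout here is an assumption, not a standard:
    #   "preference ftp-server pathname version"
    def parse_software_record(txt):
        pref, server, path, version = txt.split()
        return {"preference": int(pref), "server": server,
                "pathname": path, "version": version}

    rec = parse_software_record("3 sauna.hut.fi pub/gnu/edist-19.21.tar.Z 19.21")
    print(rec["server"], rec["version"])    # sauna.hut.fi 19.21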
OK, so now we had implemented the Ultimate Software Location Service, with a few thousand people in different parts of the world keeping the world-wide distributed database up to date. We also had The World Software Catalog (all of it freely distributable of course, is there any other kind of software??) with descriptions posted monthly to Usenet. Of course, the catalog isn't very complete, but then, if you hear about a piece of software from a friend or happen to read about it in a newsgroup, you can always ask the software location service more about it, take a look at it, and easily download it.

Back to our original problem. We want the latest and greatest version of GNU Emacs; we just want to say a command like `getsoftware emacs.gnu' and after at most a few minutes edist-19.21.tar.Z magically appears in the current directory.

So, we now have an easy way to locate the places where the software can be gotten from. So what? The world still has distances, even though they are diminishing rapidly. With current technology, it wouldn't be very nice to just grab Emacs from prep across the pond (remember, we are in Finland now) when it already happens to be stored in the next-door department's computer - nobody happened to tell me because that department has its own coffee room and ours has its own.

It may also be that for some reason or another it's not possible for me to ftp to the States at all. There are administrative and political reasons this might be so; for example, EUnet (roughly, the organization responsible for the uucp network in Europe) is planning to set up a European TCP/IP network which could also be open to commercial sites. For these sites, specific clearance has to be arranged with the U.S. network people to connect to the U.S. side of the world, although they might by default gain access to Nordunet, the TCP/IP network of the universities in Scandinavia.

So the one thing we need is a way to decide which of the ftp servers is closest to us, or to which the `cost' is cheapest. (I damn well hope that the `cost' isn't ever going to be a literal cost - that is, being charged by the packet in an internet; that would very quickly destroy this great community of sharing information and software, the whole idea behind the old anonymous ftp and the present `Software Location and Distribution Service'.)

This was a bit of a problem in the internet: it couldn't be easily determined. Of course, you could ping all the hosts which carry the software, but that's not a very good use of the network. We needed a server for calculating `distances' between different IP addresses. It should also be distributed, so that it fits the rapidly changing network where, in practice, the distance from place A to place B can go to eternity in a link failure, for example. Ideally, this `distance server' should be integrated with the `software location' domain server system, so that when you ask where to get the software, the priorities are adjusted according to the distance between the server and your host. This way, you can still get the software if you happen to be in a commercial company whose policy is to have only one internet gateway - assuming the software already is somewhere at your company.

I'll skip over the implementation of the distance server, as many of you have probably studied it in connection with other network technology; it has many more uses than this software location service.
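Here is a minimal sketch of how a distance estimate could be folded into the static preferences so that the `cheapest' server sorts first. The distance() lookup is a stand-in for whatever a real distance server would compute from routing and monitoring data; the numbers below are made up.

    # Hypothetical: adjust location-record preferences by distance.
    DISTANCES = {"sauna.hut.fi": 0, "freja.diku.dk": 4, "prep.ai.mit.edu": 25}

    def distance(server):
        return DISTANCES.get(server, 100)   # unknown servers look far away

    def rank(records):
        """records: (preference, server, pathname) tuples."""
        return sorted(records, key=lambda r: r[0] + distance(r[1]))

    servers = [(12, "prep.ai.mit.edu", "pub/gnu/edist-19.21.tar.Z"),
               (5, "freja.diku.dk", "pub/gnu/edist-19.21.tar.Z"),
               (3, "sauna.hut.fi", "pub/gnu/edist-19.21.tar.Z")]
    print(rank(servers)[0][1])              # -> sauna.hut.fi, the nearest copy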
It wasn't that hard to implement; with modern network monitoring tools, a lot of statistical information is collected describing the connectivity of different networks, and that is easy to turn into `distance data'. Perhaps surprisingly, however, the distance calculation was the most difficult single obstacle on the way to getting the `Software Location and Distribution Service' into reality.

Back to business (that is, getting the newest version of Emacs). Now we know that we could get it from prep, but that isn't very wise, since prep is far, far away and the net.gods will be angry if everybody overloads the network and prep by getting Emacs from there every day. We also know that we can get it elsewhere (remember, the department next door has it). But what if the next-door guy doesn't use Emacs and just installed it a few years ago to please some users, and those users have since left? Then the version of Emacs he (pardon my sexism; I wish everyone spoke Finnish, which doesn't have different words for she / he) has is probably OLD. You don't want that.

So, we again face the problem of old versions. Why did old versions stick around? Or, to get to the root of the problem, how did the next-door guy get Emacs in the first place? You guessed it: he manually grabbed it from prep and after that just forgot it on his anonymous ftp area. Why did he have to do it manually? Yep, because back then we didn't have this great location and distribution service.

Back in the old days, before the `Software Location System' was working, the main channel of distribution was that somebody just heard over a coffee or lunch break (oh my, where would we be if we didn't have to eat / drink coffee) about a great piece of software and traced it to its origin. Then, being a nice folk, she also put it up for anonymous ftp on her machine, after first having to convince her boss that she wasn't just wasting the University's money for nothing, that it benefited all the universities in the country (you didn't believe this was the Real World now, did you?). That is, almost all anonymous ftp areas were managed by volunteers doing it on the side of their Real jobs.

But that was in the old days. Now, of course, there's no such thing as an `old version' on anonymous ftp unless you specifically ask for one. What changed things? Remember, we have the `Software Location Service', and it calculates the distance between the software needer and the provider. We also now have the unwritten law that every organization joining the network provides, as a routine matter, at least 200 megs (or more for commercial organizations) of disk space for the `general good', to keep the software service working.

So, every time somebody asks for a piece of software, its priority is calculated as usual. After that, a version number is asked from the software's one special `home server', and if it differs from the version number at the `cheapest' server, a message is sent to the `cheapest' server telling it to throw that copy away. Also, every time somebody asks for the software, a counter is incremented to keep statistics on where the software is needed. Based on these statistics, messages are sent to the ftp servers near the asker's area telling them to fetch the software. The servers may decide to ignore the messages if other software is in more demand. Anyhow, the idea is that the servers keep a cache of the software needed by the clients.
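Here is a minimal sketch of that cache-maintenance rule. The Server class, the purge / prefetch messages, and the threshold are all assumptions for illustration, not an existing protocol:

    from collections import Counter

    class Server:
        """Stand-in for an ftp server that answers version queries and
        accepts purge / prefetch messages (all hypothetical)."""
        def __init__(self, name, versions=None):
            self.name = name
            self.versions = versions or {}
        def version(self, software):
            return self.versions.get(software)
        def purge(self, software):
            self.versions.pop(software, None)
        def prefetch(self, software, source):
            self.versions[software] = source.version(software)

    demand = Counter()                  # requests per (software, area)
    PREFETCH_THRESHOLD = 10             # made-up tuning knob

    def on_request(software, area, cheapest, home, nearby):
        # Invalidate a stale cached copy at the `cheapest' server.
        if cheapest.version(software) != home.version(software):
            cheapest.purge(software)
        # Count demand per area; past the threshold, ask the nearby
        # servers to fetch a copy (they are free to ignore this).
        demand[(software, area)] += 1
        if demand[(software, area)] >= PREFETCH_THRESHOLD:
            for server in nearby:
                server.prefetch(software, source=home)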
If the distance to all the servers for a certain piece of software is too big (in other words, the hosts are not reachable), the service sends a message to a server near the client, flagged so that the order is carried out when possible; the client gets a message to try again later, with an estimate of how long it will take for the software to get near enough.

That's all there is to it!

Stay tuned: now that we have this service working as well as it is, we are taking a look at `The Worldwide Electronic Telephone Catalog' and `The Worldwide Newspaper Archive Service'. Of course, you already heard about `The Internet's Guide to Travelling', now in the implementation phase: an almost-real-time travel planning system which lets you plan your trip all across the world, calculates current prices, and even takes into account various strikes which might be going on.

-------

Back to August 1989. Don't tell me, there's OSI. It probably has all this already implemented, and in addition it can cook your morning coffee and wash your dirty laundry, huh? Please tell me if this is so, and where to FTP it from ;-)

Happy hacking,

--
Jyrki Kuoppala, Helsinki University of Technology, Finland.
Internet :  jkp@cs.hut.fi  [128.214.3.119]
BITNET :    jkp@fingate.bitnet
Gravity is a myth, the Earth sucks!