richv@hpinddu.cup.hp.com (Rich Van Gaasbeck) (02/07/91)
Here is a problem that is probably common to many large news-reading
organizations.

Scenario A: "Typical".  Say you have a large population of news readers
(10,000 people) and a large number of news machines (100), each with some
disk space (200 MB).  This might typically be arranged as 100 machines
running B or C news, with 100 users on each machine.  Let's say 200 MB
holds about one month of news, and that you decide one month's worth of
news is not enough: you want to keep "local" groups, and other groups your
organization thinks important, for several years.  Let's say you estimate
you will need 2 gigabytes to store one copy, and that your organization
doesn't want to buy 2 gigabytes for each of 100 machines.  Given the
resource limitation, and your realization that each machine is storing
pretty much the same information, you come up with a new scheme for your
local news network.

Scenario B: "Central".  You gather all the spool disk drives from all the
news machines and put them on one super-colossal machine.  Users read news
using an nntp-based news reader, or log in via telnet to use a local news
reader.  This doesn't work out very well either.  You may not find a
machine fast enough to handle 10,000 users.  If your organization is
world-wide there may be no time when it is night everywhere, so it will be
impossible to take the system down to do backups.  And if the system goes
down, all 10,000 users can't read their news.

Here is the way I would like to see C-news/nntp work.  I think it would be
possible to do, but I don't think it can today (could be wrong; I suppose
I should look at the docs and source).

Scenario C: "Ideal".  Start with Scenario A.  Take away 100 MB from each
machine and give it to a central machine (for a total of 10 gigabytes).
Configure the 99 "local" machines to automatically expire articles that
haven't been read recently (or just use the current expire mechanism with
parameters that keep the spool under 100 MB).
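[The "expire articles that haven't been read recently" step might be
sketched as below.  This is not part of any existing expire; it assumes,
as an illustration, that the file access time is a usable proxy for when
an article was last read, and the spool path and budget are invented.]

```python
# Sketch: keep the local cache under a byte budget by evicting the
# articles with the oldest access (last-read) times first.
import os

def candidates(spool):
    """Yield (atime, size, path) for every article file in the spool."""
    for dirpath, _, files in os.walk(spool):
        for name in files:
            if name.isdigit():                  # article files are numeric
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                yield st.st_atime, st.st_size, path

def expire_lru(spool, budget):
    """Remove least-recently-read articles until total size <= budget."""
    arts = sorted(candidates(spool))            # oldest access time first
    total = sum(size for _, size, _ in arts)
    for _, size, path in arts:
        if total <= budget:
            break
        os.remove(path)                         # evict least recently read
        total -= size
```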
Change the nntp daemon to list both local information and information from
the central machine when asked about active newsgroups, headers, etc.
When asked to retrieve an article, it would get it from the central server
if necessary, give it to the user, and also store it in the local spool.
Basically it would act as a giant cache.  To the news-reading programs it
would look like a single central machine, but with the advantages that the
central machine would be less busy, could be located on the far side of a
low-performance network, and the local machines could continue to allow
access to cached articles while the central machine is down.  Additionally,
a larger amount of news can be made available to a greater number of
people while using much less disk space.

Slight variations could also be useful.  If your caching algorithm has a
high hit rate you might be able to get by with a much smaller spool (5 or
10 MB).  You might also want to distribute the central "machine"; for
instance, one "central" machine might hold comp.sources.unix, a different
one rec.*.

As I mentioned above, I don't think the current C-news/nntp implementation
can handle Scenario C.  Several parts are missing.  I don't think C news
can expire articles based on readership patterns.  I also don't think
nntpd can keep track of a local spool plus several remote sources of
articles and present them to the remote news reader as if they came from
the same machine.

I would be interested in hearing any comments on the above.  Specifically:
does this seem like a common problem for large organizations?  Does this
mixture of nntp, C-news, and caching concepts look like a good solution to
the problem?  Are either the nntp or C-news authors doing any work in this
area?  What kinds of changes would be needed to the C-news and nntp
sources to make Scenario C work?

Richv
richv%hpinddu.cup.hp.com@hplabs.hpl.hp.com
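[The giant-cache behaviour proposed above can be modelled in miniature.
Everything here is illustrative: the class name is invented, and a dict
plus a fetch callback stand in for the local spool and the NNTP
connection to the central machine.]

```python
# Toy model of a caching news server: serve an article from the local
# spool if present; otherwise fetch it from the central machine, store
# it locally, and serve it.

class CachingServer:
    def __init__(self, central_fetch):
        self.spool = {}                  # message-id -> article text
        self.central_fetch = central_fetch

    def article(self, msg_id):
        if msg_id in self.spool:         # cache hit: serve locally
            return self.spool[msg_id]
        text = self.central_fetch(msg_id)   # miss: ask the central machine
        if text is not None:
            self.spool[msg_id] = text       # cache it for the next reader
        return text
```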
jerry@olivey.olivetti.com (Jerry Aguirre) (02/08/91)
In article <RICHV.91Feb6175707@hpinddr.cup.hp.com> richv@hpinddu.cup.hp.com
(Rich Van Gaasbeck) writes:
>Scenario C: "Ideal".  Start with Scenario A.  Take away 100 Meg from
>each machine and give it to a central machine (for a total of 10
>Gigabytes).  Configure the 99 "local" machines to automatically expire

Another point that would interfere with this scheme is that existing news
transmission does not preserve the article number, and current news
readers depend on it.  Thus if the user is reading along thru articles 98,
99, 100, and article 101 is not available on the local system, then when
the reader goes to the master system it is possible that its article 100
is the same as the local article 98, and article 105 is what was intended.
This is very evident if one switches from one news server to another.
Even if they start out aligned, the cancel messages alone will guarantee
that they gradually drift apart.

I have considered writing an "nntpslave" that would function like nntpxfer
but, instead of processing the article thru rnews, would just store it in
the appropriate place in the news spool partition.  (The "Xref" line would
come in handy for cross-posted articles.)  Given that, plus a copy of the
master system's active and history files, one would have a slave server
that could be used transparently with the master or other slaves.

But that does not, by itself, eliminate the duplicate storage.  One could
split the storage and servers by newsgroups, with one system handling alt,
another handling comp, etc.  There was a version of the "vn" news reader
set up to do that: it would switch NNTP servers based on the group being
read.  That would distribute the load and the storage among many systems.
Presumably a master system would deal with the external world and
redistribute to the partial slave systems.

If one was really spread out across time zones, such as America and
Europe, then I would strongly recommend different servers for each
geographical area.
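[The "Xref" filing step of the nntpslave idea could be sketched as
follows: parse the master's Xref: header to recover the article numbers
the master assigned, and file the article at those same numbers.  The
host name and spool path below are made up; the header format follows
the usual "Xref: host group:number ..." convention.]

```python
# Sketch: compute the spool paths where an article should be filed so
# that slave article numbers match the master's, using its Xref: line.
import os

def spool_paths(xref_line, spool="/usr/spool/news"):
    # e.g. "Xref: master news.admin:1234 news.misc:5678"
    entries = xref_line.split()[2:]     # skip "Xref:" and the host name
    paths = []
    for entry in entries:
        group, number = entry.rsplit(":", 1)
        # group dots become directories; the article number is the file
        paths.append(os.path.join(spool, group.replace(".", "/"), number))
    return paths
```

A real nntpslave would write the article to the first path and hard-link
the rest, the same trick the news system itself uses for cross-posts.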
As you say, there would never be a good time to take the system down, and
then there is the question of transmitting an article every time someone
reads it rather than once.  If the only problem is handling reading by a
large number of local users, then there are specialized servers that can
handle very large amounts of NFS and disk traffic.

Jerry Aguirre
gary@proa.sv.dg.com (Gary Bridgewater) (02/09/91)
In article <50331@olivea.atc.olivetti.com> jerry@olivey.olivetti.com
(Jerry Aguirre) writes:
>In article <RICHV.91Feb6175707@hpinddr.cup.hp.com> richv@hpinddu.cup.hp.com
>(Rich Van Gaasbeck) writes:
>>Scenario C: "Ideal".  Start with Scenario A.  Take away 100 Meg from
>>each machine and give it to a central machine (for a total of 10
>>Gigabytes).  Configure the 99 "local" machines to automatically expire
>
>Another point that would interfere with this scheme is that existing
>news transmission does not preserve the article number and current news
>readers depend on that.  Thus if the user is reading along thru article
>98, 99, 100, and article 101 is not available on the local system then
>when it goes to the master system it is possible that article 100 will
>be the same as article 98 and article 105 is what was intended.  This is
>very evident if one switches from one news server to another.  Even if
>they start out aligned the cancel messages alone will guarantee that
>they gradually drift apart.
>...

This might be an application for the broadcast packet technology being
experimented with by the tcp/ip folks.  Have multiple NNTP server systems,
each holding some portion of the hierarchy.  As the news is fed in, it is
sent out on the net and the appropriate server wakes up and stores it
locally.  Articles cross-posted across servers would just get saved more
than once, but that could possibly be worked around (or maybe that is a
feature).  When the user reads news, broadcast packets for the groups of
interest are sent out and the server(s) responsible respond.  The client
nntp then picks a server (if more than one responded) and makes a
"conventional" connection to transfer the groups.

In fact, if such redundancy were designed in, I expect a lot of sites
would jump on it.  You would never have to worry about losing your
~spool/news partition again.
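[The broadcast-discovery idea might look like this in miniature.  Real
code would use UDP broadcast on the local net; here, plain function
calls stand in for the packets, and every name is invented.]

```python
# Toy model: each server claims some top-level hierarchies; a
# "broadcast" query reaches all of them, the responsible ones answer,
# and the client picks one responder to connect to.

class GroupServer:
    def __init__(self, host, hierarchies):
        self.host = host
        self.hierarchies = set(hierarchies)

    def responds_to(self, group):
        # "comp.sources.unix" -> responsible if it claims "comp"
        return group.split(".", 1)[0] in self.hierarchies

def broadcast_query(servers, group):
    """Every server sees the query; those responsible answer."""
    return [s.host for s in servers if s.responds_to(group)]

def pick_server(servers, group):
    responders = broadcast_query(servers, group)
    return responders[0] if responders else None   # client picks one
```

With two servers claiming comp, either could answer, which is exactly
the redundancy Gary suggests designing in.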
--
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
C++ - it's the right thing to do.
kurt@rufus.almaden.ibm.com (Kurt Shoens) (02/12/91)
Rich Van Gaasbeck (richv@hpinddu.cup.hp.com) ... Scenario C: "Ideal" ...
Change the nntp daemon to list both local information and information
from the central machine when asked about active newsgroups, headers,
etc.  When asked to retrieve an article it would get it from the central
server if necessary, give it to the user and also store it in the local
spool.

This looks similar to the Andrew File System (AFS).  Perhaps you could
get the same effect by having your clients AFS-mount the news spool; AFS
caches recently read files locally on the clients.  Under such a scheme
there would be only a single news system (the one on the server) to
administer.  You could handle a great many news-reading clients by
building a central server with lots of disk space to hold all the
articles, plus a number of AFS clients connected to it, each running nntp
to handle the news-reading load.  The AFS clients would tend to cache the
articles that had appeared in the last few days ....

With respect to expire ... you can easily modify expire to use different
criteria for deleting articles.  The approximate form is: gather
information on all articles one might potentially delete, sort it in
order of desirability, and remove as many of the least desirable articles
as necessary to meet some goal (say, a particular amount of free space).

Now, if you had a few years of netnews online, it might be useful to
re-engineer the storage structure of the history file.  As it currently
stands, the history file has to be read and written completely each time
you expire.

--
Kurt Shoens
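[Kurt's approximate form of expire, as a sketch.  The scoring function
is deliberately left as a placeholder: any policy (age, readership,
group priority) could plug in.]

```python
# Sketch: score each article, sort by desirability, and delete from the
# least desirable end until enough space has been freed.

def expire(articles, score, bytes_to_free):
    """articles: list of (path, size); score(path): higher = keep longer.
    Returns the paths chosen for removal."""
    doomed = sorted(articles, key=lambda a: score(a[0]))  # worst first
    freed, removed = 0, []
    for path, size in doomed:
        if freed >= bytes_to_free:
            break
        removed.append(path)        # a real expire would unlink the file
        freed += size
    return removed
```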