page@ulowell.UUCP (Bob Page) (05/30/86)
I have been following the talk/backbone discussion from a distance, since we are lucky enough to have about a dozen Usenet neighbors that are all local calls. However, I *am* paying attention, as the net result (sorry, couldn't resist) matters greatly.

I'd like to know what you think of this proposal for sending news. I do not have the time to implement all of this, but I would be willing to coordinate the effort, as well as code where/when I could. Let me know.

-------------------------------------

            Sending News by Prioritized Batches
                        A Proposal

                  Bob Page (ulowell!page)
                       May 29, 1986

PART 0: INTRODUCTION:

One of the major problems with Usenet is the cost of sending it. Reducing the volume that gets sent is obviously the way to go. We seem to be heading in two directions: to reduce the size and number of postings (look for included text, junker, etc.), and reduce the total amount of time spent transferring the news (priorities, subgroups like talk.all, etc.). I think both ways should be explored and implemented for the optimal solution.

This proposal is of the latter persuasion. In fact, Dave Taylor's remarks on newsgroup priority are similar, since this is a system based on newsgroup priority. Here are some of the nuts and bolts.

You have just been shown the latest phone bill, and are told to cut it by x percent. Well, that x percent can be expressed as a number. We'll say you can only spend $400/month on phone charges, for the purposes of this proposal. I'm also neglecting non-news costs, but you can figure that in for your equation (say $50 for mail means $350 for news), and adjust where/when necessary.
------------------------------------
PART 1: THE SITE NEWS CONNECT FILE

You set up a Site News Connect file ("SNC file") for every site that you feed news to. It contains the following information:

    #S day-of-month-of-start-of-billing-information
    #B budget-for-this-billing-period
    #R time-period-one-and-phone-connect-rate
    #R time-period-two-and-phone-connect-rate
    #R time-period-three-and-phone-connect-rate
    (etc.)
    #G group-name-and-priority
    #G group-name-and-priority
    #G group-name-and-priority
    (etc.)

Order of lines is not relevant, as long as the info is there. A completed entry might look like:

    #S 5                        (Telco bill period: 5th to 4th)
    #B 350.00                   ($350 allotted for this period)
    #R 0800-1700Mo-Fr:0.24      (24c/min during 'day' time)
    #R 1700-2300Sa-Fr:0.17      (17c/min during 'evening' time)
    #R 2300-0800Sa-Fr:0.11      (11c/min during 'night' time)
    #R 0800-2300Sa-Su:0.17      (17c/min during 'weekend' time)
    #G mod.sources:A            (A is highest priority)
    #G net.sources:B            (B is lower than A, still high)
    #G net.abortion:X           (X is pretty low priority)
    #G net.jokes:U              (U is higher than X, still low)
    #G mod.amiga:B              (same prio as net.sources)
    #G net.news.group:C         (still rather high prio)
    (etc.)

The default for #G entries is M (Middle of the alphabet; think Middle or Medium priority): if a newsgroup is not specified, it is a "priority M" newsgroup. Defaults for the other entries can be decided on by local conventions when you build the news software.

All SNC files go in the /usr/lib/news/snc directory, one file for each site. The FEEDING site has this file; you'll have to send the site's News Admin your site's SNC if you poll them for news, so they know what your priorities are for news feeding. Abuse is possible here, since the system you poll for news may decide to change your SNC without your knowing, but between cooperating sites the risk seems too remote to worry about. After all, their transfer time will be down too.
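To make the SNC layout concrete, here is a rough sketch of how a feeding site might read one. Nothing here is existing news software: the names `SNC` and `parse_snc` are made up for illustration, the default billing start day of 1 is an assumption (the proposal leaves it to local convention), and a trailing "(...)" on a line is assumed to be an annotation rather than data. Only the M default for unlisted groups comes from the proposal itself.

```python
import re
from dataclasses import dataclass, field

DEFAULT_PRIORITY = "M"  # from the proposal: unlisted groups are priority M


@dataclass
class SNC:
    """Hypothetical in-memory form of one Site News Connect file."""
    start_day: int = 1      # assumed default, left to local convention
    budget: float = 0.0     # assumed default, left to local convention
    rates: list = field(default_factory=list)    # (period string, cost/minute)
    groups: dict = field(default_factory=dict)   # group name -> priority letter

    def priority(self, group):
        # Groups with no #G line fall back to the middle priority.
        return self.groups.get(group, DEFAULT_PRIORITY)


def parse_snc(text):
    """Parse the #S/#B/#R/#G lines of an SNC file; line order does not matter."""
    snc = SNC()
    for raw in text.splitlines():
        # Assumption: a trailing "(...)" is a human annotation, not data.
        line = re.sub(r"\s*\(.*\)\s*$", "", raw).strip()
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("#"):
            continue  # blank lines, "(etc.)", anything unrecognized
        tag, value = parts[0], parts[1].strip()
        if tag == "#S":
            snc.start_day = int(value)
        elif tag == "#B":
            snc.budget = float(value)
        elif tag == "#R":
            # The period string is kept unparsed; the rate follows the last colon.
            period, _, rate = value.rpartition(":")
            snc.rates.append((period, float(rate)))
        elif tag == "#G":
            group, _, prio = value.rpartition(":")
            snc.groups[group] = prio.upper()
    return snc
```

The time-period strings are stored as-is, since the proposal does not pin down their exact syntax; turning them into a "what is the rate right now" lookup would be a separate step.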
Maybe in the future we could add a control message that runs out to the sites in your sys file and checks/updates your SNC file upon request. In fact, it could be the first thing transferred before a news feed starts, I suppose, so it's always current.

Another piece of data needed is the amount of money that has actually been spent this billing period. This goes into a separate file from the SNC, since we don't want the news/uucp software messing with the SNC. So a "costs" file, such as /usr/lib/news/costs, has one line per site that looks like:

    sitename: cost-so-far-this-period cost-so-far-today

An example file could be:

    cbosgd:     43.12    0.00
    decvax:     26.73    3.90
    ucbvax:    531.67   45.28
    ulowell:   130.44   12.63

The cost fields are reset at the appropriate times. cost-so-far-today is needed when more than one news transmission takes place over the course of a day.

-----------------
PART 2: BATCHING

When the batcher runs, it batches like it does now, except for two things: the batched file name is different, and the batcher works on the group-priority scheme above. All "priority A" groups are batched together, all "priority B" groups together, and so on. Batched file names look like:

    sitenamPyddds

Where "sitenam" is the first seven characters of the name of the site you are feeding, P is the priority of the batch (A-Z), y is 0-9, indicating the last digit of the year (we'll have minor problems at the turn of each decade but they're easily handled), ddd is the day of the year (the number of days in the year so far), and s is the batch sequence character. A typical batch file would have a name like:

    ulowellM6150b

which would mean "batch to ulowell, this batch priority M, batch created May 30, 1986, second batch today for this site at this priority."

This naming scheme was chosen for a few reasons:

o It is easily understandable.

o It is easy to change the overall priority of the batch if one so desires. Renaming the file "ulowellL6150a" would put it essentially at priority L if that's what you want to do.
  Similar for dates, too. This "manual override" is what will make transitions between decades easy. A script could handle it if need be.

o It is no longer than 14 characters, the limit on some UN*X systems.

o A straight ASCII sort of the filenames will present the batches in priority order for transmission.

------------------------------
PART 3: SENDING THE BATCHES

Sendbatch (or an equivalent) grabs each batch file (already sorted by filename, so already in priority order) and transmits it. Sendbatch computes the cost of sending the batch, based on the time it takes to send each batch and the current connect rate from the SNC data. It then adds the cost of the transfer to the amount stashed away in the costs file. If the total is within 5% of the budget field (or greater than the budgeted amount!), no more batches will be sent, and a mail message is sent to user 'news' on both machines.

In the usual case, sendbatch will send

    (Budget - total-so-far) / number-of-days-left-this-period

units (dollars, francs, pounds, etc.) worth of news each day, so as to pace the amount of news transmitted. If a site goes down for a few days, that's no problem, since when it comes back up and news resumes, the amount/day to transfer will be larger.

News admins can muck with the SNC to fool sendbatch in special cases: their estimate was too low or too high, rates changed, etc.

-----------------
PART 4: CONCLUSIONS:

In operation, a site will not get all of its news today, and some of the lower-priority groups don't get sent. Tomorrow, the batches that weren't sent are still there, along with the new day's batches. The effect of this is (voila!) that within priority levels, the oldest batches are sent first. The result is increased delay for lower-priority groups, but the groups are still carried. Batches that do not get transmitted after a certain amount of time (default one week) get deleted silently.

This system, when implemented, will hold news costs to a budgeted level.
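For anyone who wants to see the Part 3 pacing and cutoff rules as arithmetic, here is a minimal sketch. It is not sendbatch code: every function name is hypothetical, transfer times are passed in as a plain dictionary of minutes per batch instead of being measured, a single connect rate is assumed for the whole run, and "days left" is assumed to count today. The filename sort is what gives priority order, with the oldest batches first inside each priority.

```python
import calendar
from datetime import date


def days_left_in_period(today, start_day):
    """Days remaining in the billing period, counting today (an assumption)."""
    if today.day < start_day:
        year, month = today.year, today.month
    else:
        # The period ends on start_day of the following month.
        year = today.year + (1 if today.month == 12 else 0)
        month = today.month % 12 + 1
    # Clamp start_day for short months (e.g. a start day of 31 in June).
    last_day = calendar.monthrange(year, month)[1]
    period_end = date(year, month, min(start_day, last_day))
    return (period_end - today).days


def daily_allowance(budget, spent, days_left):
    """(Budget - total-so-far) / number-of-days-left-this-period, never negative."""
    return max(0.0, (budget - spent) / days_left)


def within_cutoff(budget, spent, margin=0.05):
    """True once spending is within 5% of the budget, or over it."""
    return spent >= budget * (1.0 - margin)


def send_batches(batch_minutes, rate, budget, spent, allowance, spent_today=0.0):
    """Walk the batches in filename order until today's allowance or the
    budget cutoff is reached. Returns (names sent, new period total)."""
    sent = []
    for name in sorted(batch_minutes):  # ASCII sort: priority, then date, then sequence
        if within_cutoff(budget, spent) or spent_today >= allowance:
            break
        # Cost is known only after the transfer, so the check happens
        # before the next batch rather than before this one.
        cost = batch_minutes[name] * rate
        spent += cost
        spent_today += cost
        sent.append(name)
    return sent, spent
```

With the ulowell line from the costs file (130.44 spent of a 350.00 budget) and a billing period starting on the 5th, a run on May 30, 1986 has six days left in the period, so the allowance works out to roughly 36.59 per day.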
Any other advances in news transmission (higher speed transfers, less overall volume, etc.) in the future will not affect the batching operation as long as it knows transmission costs and budgeted amounts per site.

The entire operation is handled on a per-site basis. YOU say what your budget is, and what groups are important to you. If the less-important groups manage to make it through (because of low volume in the more important groups), that's fine, since you'll still be under budget.

--
UUCP: wanginst!ulowell!page     Bob Page, U of Lowell CS Dept
VOX:  +1 617 452 5000 x2233     Lowell MA 01854 USA