[news.admin] Map expiration dates

blm@cxsea.UUCP (Brian Matthews) (01/11/88)

I was wondering why the uucp maps posted to comp.mail.maps have such long
expiration periods, about a month and a half.

It seems to me that if someone is using the maps, they'll extract them within a
week or so, and not need the articles anymore.  If they aren't using the maps,
they don't need the articles at all, and certainly not for a month and a half.
Do people use the maps in such a way that they do need the articles for a long
period of time?

I realize I can explicitly expire the maps ignoring the expiration date.
However, if everyone has to do this, might it not make more sense to not have
the expiration dates in the first place?
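
(For instance, something blunt like the following, run from cron, ignores the
Expires: header entirely; the spool path and the two-week cutoff are only
examples, not a recommendation:)

	# throw away map articles untouched for two weeks, Expires: or no
	find /usr/spool/news/comp/mail/maps -type f -mtime +14 \
		-exec rm -f {} \;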

This isn't meant as a flame, I'm just curious as to why the maps have the
explicit expiration date.

-- 
Brian L. Matthews                               "A power tool is not a toy.
...{mnetor,uw-beaver!ssc-vax}!cxsea!blm          Unix is a power tool."
+1 206 251 6811
Computer X Inc. - a division of Motorola New Enterprises

pleasant@rutgers.rutgers.edu (Mel Pleasant) (01/12/88)

In article <2323@cxsea.UUCP> blm@cxsea.UUCP (Brian Matthews) writes:

> I was wondering why the uucp maps posted to comp.mail.maps have such long
> expiration periods, about a month and a half.
> . . . .

Back when I did postings only once a month, being late by one day meant that
sites would be without maps at all.  Like you, I thought that this wouldn't
be such a big deal.  As you point out, the maps are read by a program and
after that they aren't needed, right?  Well, wrongo!!  The number of people
who came out of the woodwork complaining of loss of access to map files was
astounding.  I still find it hard to believe!!  There are many more people,
more than one would imagine, that actually use the map files to generate
paths by hand.  When the map files expire they're at a loss.  You would
think that any one person wouldn't use the map files to manually generate
paths all that often.  After all, a person is likely to generate a path to
another site once and use the path until it fails, right?  Well, the number
of complaints I received would indicate that if this assertion is true then
the number of people manually generating paths is significant.

At any rate, the postings are built around a 30 day cycle.  Even with the
new distribution system in place, the cycle still exists but it isn't so
obvious.  We are now posting on the 1st of each month those map files that
have not been updated in 30 days.  Given the 45 day expiration period, this
should mean that the files are always present.  We could change the
expiration period by making it shorter, even shorter than 30 days.  This
will have an impact on those who manually generate paths.  Ultimately, we
may all decide that this is the way to go anyway.  If we do, I just wanted
to make sure that we did so with the knowledge that there would be persons
affected by such a change.
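
(To picture the cycle: the monthly pass amounts to something like the
fragment below.  The directory and the "post-map" command are stand-ins
for illustration only, not the scripts actually used:)

	# on the 1st of the month, repost every map file whose master
	# copy has not been touched in the last 30 days
	cd /usr/lib/uumap
	find . -name '[du].*' -mtime +30 -print |
	while read f
	do
		# post-map is a hypothetical posting command
		post-map $f
	done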

---

Given this response, there will no doubt be other comments vis-a-vis the map
posting procedures.  I'm all ears.  For those of you interested in lowering
the volume, here is a challenge for you.  How can it be done such that we
never have to make a full one-shot posting of all of the files ever again?
Well, let's make it slightly simpler ...  A change in the procedure *may*
require another full posting just to get it started.  Ok, given that we do
one more full posting, now answer the same question, i.e. no more full
postings beyond what's needed to start a new procedure....  The problem with
"diff" postings is that they don't do you any good if you don't have the
original file.  You're also sunk if "diff" postings arrive out of order or
not at all, something not unheard of in the netnews system.  Depending upon
"diff" postings in the general case would mean full postings at some larger
interval.  Yuck....

The UUCP Mapping Project
-- 

                                  Mel Pleasant
 {backbone}!rutgers!pleasant   pleasant@rutgers.edu     mpleasant@zodiac.bitnet

blm@cxsea.UUCP (Brian Matthews) (01/13/88)

Mel Pleasant (pleasant@rutgers.rutgers.edu) writes:
|In article <2323@cxsea.UUCP> blm@cxsea.UUCP (Brian Matthews) writes:
|> I was wondering why the uucp maps posted to comp.mail.maps have such long
|> expiration periods, about a month and a half.
|Back when I did postings only once a month, being late by one day meant that
|sites would be without maps at all.  Like you, I thought that this wouldn't
|be such a big deal.  As you point out, the maps are read by a program and
|after that they aren't needed, right?  Well, wrongo!!  The number of people
|who came out of the woodwork complaining of loss of access to map files was
|astounding.  I still find it hard to believe!!  There are many more people,
|more than one would imagine, that actually use the map files to generate
|paths by hand.

Gak!  Don't these people know what computers are for??  :-).

|Given this response, there will no doubt be other comments vis-a-vis the map
|posting procedures.
|...
|The problem with
|"diff" postings is that they don't do you any good if you don't have the
|original file.  You're also sunk if "diff" postings arrive out of order or
|not at all, something not unheard of in the netnews system.  Depending upon
|"diff" postings in the general case would mean full postings at some larger
|interval.

I was considering this very question, and decided to see just how much
smaller diff postings would be (ignoring for the moment the problems that
Mel points out with diff postings).  I took some map shell scripts from
comp.mail.maps, unpacked them in a different directory than my normal map
directory, and ran diff (note:  not a context diff) against the resulting
files and the previous versions of the same files.  On the average, the
diffs were about the same size!  And remember, this is a normal diff.
The much preferable context diffs were almost always larger than the
original map file!  Now, admittedly I only tested about 20 files, so they
may have been particularly bad examples.  It was still surprising.
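
(For anyone who wants to repeat the experiment, the comparison amounts to
something like the following; the two directories are whatever freshly
unpacked and previous copies you have on hand:)

	# compare the size of a plain diff against the size of the new file
	for f in /tmp/newmaps/*
	do
		b=`basename $f`
		old=/usr/lib/uumap/$b
		newsize=`wc -c < $f`
		diffsize=`diff $old $f | wc -c`
		echo "$b: new file $newsize bytes, plain diff $diffsize bytes"
	done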

-- 
Brian L. Matthews                               "A power tool is not a toy.
...{mnetor,uw-beaver!ssc-vax}!cxsea!blm          Unix is a power tool."
+1 206 251 6811
Computer X Inc. - a division of Motorola New Enterprises

jerry@oliveb.olivetti.com (Jerry Aguirre) (01/14/88)

In article <7815@rutgers.rutgers.edu> pleasant@rutgers.rutgers.edu (Mel Pleasant) writes:
>Given this response, there will no doubt be other comments vis-a-vis the map
>posting procedures.  I'm all ears.  For those of you interested in lowering
>the volume, here is a challenge for you.  How can it be done such that we
>never have to make a full one-shot posting of all of the files ever again?

There is an obvious solution that has been previously suggested.  Break
the d.* and u.* files into the individual sites.  This is the increment
that gets changed when a site sends in an update or a new entry.  It
also represents the minimum useful stand alone value.

Getting patches when you don't have the original does cause problems.
But if I get the entire new entry for a site then it is useful without
the other entries in that particular [du].* file.  It also becomes
simple to find and distribute entries that haven't been updated in N
months.

The only disadvantage that I can see is that it will use up more inodes
and disk space.  Right now we have ~140 [du].* files containing ~4000
site entries using about 1.7Meg.  Splitting this out would create 4000
files using about 3 Meg.  The directories could be kept reasonable by
using something like u/usa/ca/site to contain each entry.  It has also
been pointed out that a simple program could pack this into less space
and still provide individual site updates.  How about using "ar"?  It
seems to have the required functionality to replace and print entries.
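
(For instance, with the archive and member names invented for illustration:)

	ar r u.usa.ca.a olivetti	# replace (or add) one site's entry
	ar p u.usa.ca.a olivetti	# print that entry on standard output
	ar t u.usa.ca.a			# list the sites in the archive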

I, for one, would be willing to have 3 Meg of pathalias data if it
would result in more up to date maps.  (I can't seem to get an entry
published in less than 6 months.)  Any site that can't afford the
extra disk has several alternatives.  One is to grep out the '^#'
lines.  These are not used by pathalias and constitute 50% of the
disk storage!  This would cut their storage back to less than the
current method though they would still use more inodes.  Or they could
run an abbreviated pathalias and specify a nearby smart mailer.
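
(For the grep approach, something like this would do; the directory names
are assumptions:)

	# keep only the lines pathalias actually uses
	cd /usr/lib/uumap
	mkdir stripped
	for f in [du].*
	do
		grep -v '^#' $f > stripped/$f
	done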

I would suggest that we test breaking up the maps by splitting just the
d.* maps into their individual entries.  This represents an expansion
from 44 to 163 files and shouldn't break anybody.  Also, since the domain
sites are paying for their entry, they should get quicker updating.
This way "d.olivetti.com" could be distributed as soon as it is
received instead of waiting to bundle it with the other 30 d.usa.ca.2
entries.
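
(A rough cut at the split, assuming every entry in a [du].* file starts with
its "#N sitename" line as the current maps do, and assuming an awk new enough
to have close(); untested:)

	# break one regional file into one file per site, named d/<site>
	mkdir d
	awk '$1 == "#N" { if (out != "") close(out); out = "d/" $2 }
	     out != "" { print > out }' d.usa.ca.2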

					Jerry Aguirre
					Systems Administration
					Olivetti ATC

rsalz@bbn.com (Rich Salz) (01/14/88)

In news.admin jerry@oliveb.UUCP (Jerry Aguirre) suggests posting the maps
as one file per site, using the d. files as an experiment.  Systems
short on inodes could pack the files into an archive (map ranlib, anyone?)
or a tarfile.

This is a great idea, and I hope the map folks give it some thought.
-- 
For comp.sources.unix stuff, mail to sources@uunet.uu.net.

usenet@nusdhub.UUCP (USENET News Admin) (01/14/88)

In article <7815@rutgers.rutgers.edu>, pleasant@rutgers.rutgers.edu (Mel Pleasant) writes:
> How can it be done such that we
> never have to make a full one-shot posting of all of the files ever again?
> Well, let's make it slightly simpler ...  A change in the procedure *may*
> require another full posting just to get it started.  Ok, given that we do
> one more full posting, now answer the same question, i.e. no more full
> postings beyond what's needed to start a new procedure....

Hi,
	I am a relative novice at net-mail who has basically no experience
with pathalias [etc.], so I am going to give a "how I would do it from
scratch" off the top of my head.  My spelling sucks, so 'ware the
mess ;->  THIS TEXT HAS NOT BEEN PROOFED, AND MAY BE HAZER-DOS TO YOUR
MENIAL HELTH %^) [but the idea is serious, and, I think, quite good]

These are my "given"s:
	1)  Backbone sites re-route almost everything.
	2)  Backbone sites _always_ re-route something in the form:
		backbone!machine!user where machine is non-adjacent.
	3)  The maps at backbone sites are generally up to date.
	4)  A user will "never" send any mail to a site of which that
		user has never heard.  [no mail, no news, no phone
		conv. etc.]
	5)  If the conversation is by mail only, the user either knows
		the complete path, or at least one that works.
	6)  As news is expensive to pass around, each site admin tries
		to pass that news along the cheapest thread possible.
	7)  No site is really concerned with how it got the mail; what
		each site is interested in is getting rid of the
		mail.
	8)  It should not be difficult to have a "dummy batch listing"
		created which could be shipped off to another program.
		In fact all that would really be necessary is for a
		copy of every "Path:" header to be piped to this
		mapper.

Here is the mold for a news-based self-mapping scheme.:

	Each site will maintain two simple lists.  These lists will be
saved in a way similar to the way "history" info is saved under news.
[i.e.  the cheapest/most look-up-able way convenient]  The first list
will be a relatively short list of those sites which would like to be
considered backbones for mail, followed by the "easiest way to get
there" spelled out as basic text  [i.e. site!site!site!site]  This
field will be built from the main map.

	The construction of the next list sounds very complex, though it
should in fact be rather easy, if a little messy.  For the
purposes of the following I will assume infinite memory availability,
as I have no idea what the actual memory requirements would be.  These
are the structures [in C] required:

struct la_list {			/* Linked list of array segments */
	struct la_list	*next;		/* Link hook */
	struct site_d	*uss[10];	/* Array of pointers to upstream sites */
	unsigned long int lnnum[10];	/* "Line numbers" */
	int		weight[10];	/* Weighting index */
};

struct treent {				/* Tree node entry		*/
	struct site_d	*less, *great;	/* Lt and Gt branch pointers	*/
					/* NOTE: Each step must be double
					   indirected [through the site_d
					   entry itself]	*/
};

struct site_d {				/* Site description		*/
	struct treent	name, lnum;	/* Structures for 2 trees	*/
	struct la_list	*adjsys;	/* Pointer to head of linked list
					   describing adjacent systems	*/
	char		*sysname;	/* Name of system		*/
	int		weight;		/* System's "frequency of appearance",
					   portrayed as a "weight"	*/
	struct la_list	*bkrout;	/* Backwards routing info	*/
					/* for convenience in jumping around */
					/* [possibly optional ?????]	*/
};

Each line of the file looks like:

sysname	weight [line#	weight]...

i.e. one system name, followed by that system's "weight", optionally
followed by zero or more line-number/weight pairs [this assumes an
ASCII text representation], followed by a newline character.
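
	So a file of three sites might look like this [names and numbers
invented purely for illustration; alpha is line 1, beta line 2, gamma
line 3, and alpha's entry says it connects to beta with link weight 300
and to gamma with link weight 120]:

alpha	150	2	300	3	120
beta	800	1	300
gamma	101	1	120	2	950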

	Primary Loading:

	At the start of processing the old file is loaded into this
"infinite memory" by the following sequence:
	The file is read sequentially, filling up one site_d structure
per line.
	A number of la_list structures sufficient to hold all the line
number/weight pairs in the respective arrays are malloc(ed) and linked
into a list.  These are filled with values.  All the pointers are
set to NULL and the first free array cross-section [i.e. the same
index is always used on each array so that element 2 of each of the
three arrays refers to one conceptual lump, etc.] has its weight entry
set to -1 [an illegal weight] to signify the end of valid entries.
	When the entire file has been loaded, the tree structures are
used to resolve the line number entries in the la_list as pointers to
the appropriate site_d structures.  During this stage, the la_list
pointed to by *bkrout is filled with the weight from the referencing
*adjsys and a pointer to the referencing site_d structure.  [This builds
a "back track" of sorts, and may not be necessary.]
	[Logically each system entry now possesses a list of the
systems adjacent to it.  Because of the way this list is created, it is
a set of weights as to the "next" system "from here".  See the
routing method below.]

	Processing updates:

	In general this system is self updating.  The "Path:" headers
from all the news articles are pumped through the update system.  The
easiest way to generate this information is to set up rnews/inews to
"rip off" a copy of the "Path:" line and drop it in a file during
processing.
	As the system scans the "Path:" line it will always pre-fetch
two strings beyond the one it is currently working on.  Each line is
assumed to start with a "dummy" entry representing the current system.
For all intents and purposes the program will only deal with two cases:
1) str1!str2!str3
2) str1!str2

	In both cases the site_d->weight of "str1" is decremented,
while in case 1, "str2" is operated on relative to "str1"; that is
to say a) if necessary "str2" is issued a site_d, b) if necessary a
la_list entry is created in "str1"'s la_list with a weight of 800, c)
"str2"'s weight in "str1"'s la_list is decremented, d) if "str2"'s
entry in "str1"'s la_list is less than 100 it is re-set to 100.  In
case 2 the processing ends because "str2" is most likely a user, or
something equally specific and un-useful.
	Any time a new site_d is created it is merged into the linked
list of _names_ and given a weight of 800.
	As a special distinction to the above, NO site with a weight
greater than 1000 will EVER be decremented.  Period.
	After all the "Path:"s have been processed, execution
continues with "Closing Processing".


	Special Case Processing:
	At times the wonderful-world-of-mapping people will want to
inform the net of bogus and discontinued sites.  To do this they will
issue a special message, the body of which will be formatted as a
group of lines with the following format:

Sitename	[+|-]weight

where the [+|-] is infrequent.
	After a standard loading of the data file, the normal path
processing will commence.  On completion of the normal path processing,
for each site "Sitename" the weight "weight" will be substituted in
the site_d->weight entry.  If there is a "+" or "-" the signed quantity
will be applied as a modifier.
	When increments are used, no weight will be incremented above
1000 nor decremented below 100.
	Certain numbers have special meaning.  See "Closing Processing"
and "Ageing".

	Closing Processing:
	To end the updating of the map and write the map out to
disk, you follow this procedure:
	1) A dummy entry representing the local system is generated by
using the local system name, a site_d->weight of 0, the contents of the
/usr/lib/news/sys file, and a local routing file.  This entry
supersedes any entry already in place for the local system.
	2) The tree of site_d structures as referenced by Sitename is
walked least-to-greatest.  Each site with a non-negative number is
given a new line-number [in order from 1 to whatever].
	3) The tree is walked again, by Sitename, and all the la_list
entries are sorted from least-to-greatest current weight.  Step 4 may
also take place during this walk, but the sort MUST precede the write
for greatest efficiency.
	4) The tree is walked again, by Sitename [as the line-number
tree is long since ruined] and the Sitename and weight, followed by
all the line-numbers [fetched by pointer, not from the local array, as
the local array may no longer be valid] and weights are output to the
new "Map" file.  IMPORTANT!!!  Nodes possessing a negative weight are
NOT output.  They have no line number, and they are simply skipped.
This effectively deletes them.
	5) The tree and lists are walked, and freed.
	6) The update program exits.

	Ageing:
	Once a day, the map file is "aged" by adding one (1) to every
site-weight [column 2] which is greater than 100 and less than 1000.
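
	[As a one-pass sketch, with the map file name made up and awk
rebuilding each changed line with tab separators:]

awk 'BEGIN { OFS = "\t" }
     $2 > 100 && $2 < 1000 { $2 = $2 + 1 }
     { print }' mapfile > mapfile.new && mv mapfile.new mapfile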

	Weight Interpretation:

0		Local site.
1-99		Official backbones.
100		Backbone-by-courtesy [unofficial, busy sites].
101-999		Normal sites.
1000		Severely aged [generally disused] sites.
1001->		Restricted sites.  [No mail/unreliable/etc.]


	Figuring Routes:

	I have laid out a good mapping scheme; I do not presently
possess the math or algorithm for reducing the possible choices to
check, but the method of comparing one possible route to another is as
follows [a small worked example follows the list]:
	1) Perform the standard starting load.
For each path, or as you calculate, you must
	2) make a sum of:
	2a) the weight associated with each site multiplied by an
		urgency factor [3 to 7, with 7 applied to a most
		important message, as this will increase "detail"
		in otherwise identical results by increasing the
		importance of the connection information];
	2b) the weight of the applicable link [in column 4, 6, 8, etc.].
	3) The _LOWEST_ total is the best route.
	4) Unknown destinations should be routed to the closest
		backbone.
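
	As a made-up example [all numbers invented]: with an urgency
factor of 3, a route through sites weighted 150 and 300 with link
weights of 120 and 200 scores 3*(150+300) + (120+200) = 1670, while a
route through a single site weighted 400 with a link weight of 250
scores 3*400 + 250 = 1450, so the second route wins even though it
passes through a "heavier" site.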

	Rationale:

	This system is based not so much on the "cost" of the link,
but more on the "willingness" of each site to handle traffic.
Generally, news travels fastest by the cheapest link, and later
appearances of the same article at any site are ignored.  Adjacent
sites will tend to get _ALL_ the news from every site that even
sort-of feeds them, which is good, considering that direct mail
between adjacent sites should be encouraged.
	Because of this fastest-cheapest implication in the news
system, each site will not have to be informed about the "cost" of the
link explicitly.
	Because the mapping-gods can reset the weight of any site,
system wide, without "updating a map section", the network traffic is
reduced in all cases.
	Because the mapping-gods can delete a site by giving a single
negative number, again we lose the "new map section".
	Because the map is based solely on electronic topology, and is
generally referenced by line numbers, individuals can write relatively
simple programs to manipulate the information, using shell scripts and
"cut", "awk", and "sed".  Though these may not be as efficient,
sometimes they can be quite convenient.  [For instance, "extract a list
of all the leaf nodes" becomes "get all the entries with only 2 columns".]
	Because there is a large range of ranking possible, but the
expression of same is quite visual, the "master maps" [as in the
current archives] can be compared to the system-generated map in an easily
automated way [i.e. awk '$2 == 1000 {print $1}' | check_for_invalid_master |
send "where is your map entry" message] without costing the system in
general much in the way of accuracy and efficiency.
	Because the system is self-mapping in general, people end up
with only the parts of the maps they are likely to need.  [I.e. if we
only get the comp newsgroups, we probably aren't going to need to post
to a machine which only ever posted to rec newsgroups.]
	Because every group [or close enough to it] passes through a
backbone, and because the "master maps" should still be kept, when a
system needs to post to a machine which has never sent an article that
reached that machine, that machine can route to a backbone, and make
use of the master map [like systems can now].

Rob.
nusdhub!usenet
nusdhub!rwhite

Disclaimer:  This entire posting is off the top of my head.  I have not
	gotten into the guts of any other mail/mapping system to date,
	nor have I been on the net long [5 months maybe], so this can't
	be stolen.

Other Disclaimer:  I am not enough of a programmer [now] to implement
	this [now].  Programming, for me, [now] still takes me much too
	long on the simple things for me to get anything back out
	in a reasonable time.  I only do algorithms [now].

More Disclaimers:  1)  I know my spelling sucks, but I don't have
	enough time left today to correct it.
		   2)  If you didn't want to know you should not have
	asked.
		   3)  The Idea is free, the attitude is gon'a cost
	you.
		   4)  My constants are mine, and may not apply
outside my own little world.

rsalz@bbn.com (Rich Salz) (01/15/88)

In news.admin (<579@nusdhub.UUCP>), rob@nusdhub.UUCP gives an interesting
and detailed method for sites to maintain dynamic mapping information.

In his words,
>Here is the mold for a news-based self-mapping scheme.:

It's a cute idea, except that it has one fatal flaw.
	A news link does not imply a mail link.
I won't name specifics, but here's an example.  Our gateway machine bbn.com
exchanges articles with host.do.main, yet if you send mail to us with
the "path" ...!host.do.main!a!b!user, that mail stands a good chance
of getting rejected because...
	we don't send UUCP-like mail to host.do.main!

-- 
For comp.sources.unix stuff, mail to sources@uunet.uu.net.

syd@dsinc.UUCP (Syd Weinstein) (01/15/88)

In <579@nusdhub.UUCP> usenet@nusdhub.UUCP (USENET News Admin) suggests
an automatic routing scheme based on news paths.

I also would like to disagree with this scheme.  My complaints relate
to two points.  Use of the map files for other than automatic routing
and speed of delivery.

1.  Many times I have accessed the directory with the maps and done a
grep on a company name to see if they have a site on the net.  I don't
yet know their site name, and perhaps I am trying to contact someone
there for the first time.  It would be a real waste if I had to call
the company, then ask them their system node name and relative path.

Imagine calling a large company and getting their switchboard operator;
now ask her for their system node name, or even whom to ask for the
system node name.

I might not even know the exact address and phone of the company, nor
which division or location is their net postmaster.

2.  News and mail do take different paths.  At dsinc, we use one node
for our news feed and another node for our main mail feeds.  This
prevents us from getting our news when we post outbound mail.  Those
with HDB uucp have the grading feature, but not all sites do.  Thus if
we post an outbound mail message, I don't necessarily want to receive
the news on the same phone call.  I would like my mail to go out asap,
but the news to appear mostly off hours.  The mapping scheme here
doesn't take into account the off-hours idea of news, nor sites we don't
exchange news with.

Also in relation to news and mail, there are many sites in the maps
that don't get news at all.  We were in the maps for three to five years
before we got the news.  We have several sites off of us that are in
the maps that do not get news.  Your news-based mapping algorithm would
never find those sites.
-- 
=====================================================================
Sydney S. Weinstein, CDP, CCP
Datacomp Systems, Inc.				Voice: (215) 947-9900
{allegra,bellcore,bpa,vu-vlsi}!dsinc!syd	FAX:   (215) 938-0235

geoff@desint.UUCP (01/17/88)

In article <296@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:

> In news.admin jerry@oliveb.UUCP (Jerry Aguirre) suggests posting the maps
> as one file per site, using the d. files as an experiment.  Systems
> short on inodes could pack the files into an archive (map ranlib, anyone?)
> or a tarfile.
> 
> This is a great idea, and I hope the map folks give it some thought.

Unfortunately, those systems (like me) that are short on i-nodes are also
those which are short on disk space.  Packing files into a 'tar' archive
requires temp space equal to twice the size of the files and doesn't allow
individual file replacement.  'ar' is stupider about temp space, and requires
as much as three times the total map size in the temp space.  Grepping
out the '#' lines is possible, but unfortunately throws away useful
information like geographical locations and how to reach the administrator
at a particular site.

At the moment, my map files (compressed, but with '#' lines still in) take
1990 blocks; the file system they live on has 2124 blocks free at the moment,
but goes down as low as 1400 late each week.

If the net decides to go this way, I'm sure I'll manage to figure out a
way to live with it.  But I won't be real happy.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

pac@munsell.UUCP (Paul Czarnecki) (01/18/88)

In article <306@dsinc.UUCP> syd@dsinc.UUCP (Syd Weinstein) writes:
>Imagine calling a large company and getting their switchboard operator;
>now ask her for their system node name, or even whom to ask for the
>system node name.

Sorry...

Imagine calling a large company and getting their switchboard
operator, now ask to speak directly to the person you are trying to
contact.  You establish voice communication and you start talking.
Quickly and concisely you make your point, you say goodbye, and the
connection is broken.

Does everything have to be email?

If you are trying to contact somebody for the first time and you don't
know their mail address, try the *other* use of the telephone.

>I might not even know the exact address and phone of the company, nor
>which division or location is their net postmaster.
>
>Sydney S. Weinstein, CDP, CCP

All you need is the main number.  The corporate operator should be
able to connect you once you get through.  To get the main number, you can
call information, or you can look it up in the phone book.  If it is
long distance, remember that phone books are available at your local
library in the reference section.

					pZ
-- 
		       Paul Czarnecki -- Spam, spam, spam, Usenet, and spam
	{{harvard,ll-xn}!adelie,{decvax,allegra,talcott}!encore}!munsell!pz

edward@engr.uky.edu (Edward C. Bennett) (01/19/88)

In article <1560@ostwald.munsell.UUCP> pac@ostwald.UUCP (Paul Czarnecki) writes:
]In article <306@dsinc.UUCP> syd@dsinc.UUCP (Syd Weinstein) writes:
]>Imagine calling a large company and getting their switchboard operator;
]>now ask her for their system node name, or even whom to ask for the
]>system node name.
]
]Imagine calling a large company and getting their switchboard
]operator, now ask to speak directly to the person you are trying to
]contact.  You establish voice communication and you start talking.
]Quickly and concisely you make your point, you say goodbye, and the
]connection is broken.

If you don't have the map files, how are you supposed to get the
name of the contact person? ;-)

-- 
Edward C. Bennett				DOMAIN: edward@engr.uky.edu
					UUCP: {cbosgd|uunet}!ukma!ukecc!edward
"Goodnight M.A."				BITNET: edward%ukecc.uucp@ukma
	"He's become a growling, snarling white-hot mass of canine terror"

jerry@oliveb.olivetti.com (Jerry Aguirre) (01/19/88)

In article <1662@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
>Unfortunately, those systems (like me) that are short on i-nodes are also
>those which are short on disk space.  Packing files into a 'tar' archive
>requires temp space equal to twice the size of the files and doesn't allow
>individual file replacement.  'ar' is stupider about temp space, and requires
>as much as three times the total map size in the temp space.  Grepping
>out the '#' lines is possible, but unfortunately throws away useful
>information like geographical locations and how to reach the administrator
>at a particular site.

The point about temp space is a good one.  For this and other reasons I
would not recommend storing the maps in one huge maps.a file.

What I would suggest is maintaining some of the current organization so
that, for example, "u.usa.ca.a" could be an archive of all the
California entries.  Updates could do an "ar r u.usa.ca.a tempfile".
This would minimize the temp space and copy overhead because only that
fraction of the map would be handled.

The point is that the split into u.usa.ca is functional.  The split into
u.usa.ca.[1-6] is purely a result of uucp restrictions.

					Jerry Aguirre @ Olivetti ATC

rwhite@nusdhub.UUCP (Robert C. White Jr.) (01/20/88)

In article <300@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> In news.admin (<579@nusdhub.UUCP>), rob@nusdhub.UUCP gives an interesting
> and detailed method for sites to maintain dynamic mapping information.
> 
> In his words,
> >Here is the mold for a news-based self-mapping scheme.:
> 
> It's a cute idea, except that it has one fatal flaw.
> 	A news link does not imply a mail link.


Hi,
	It's me again, with an explanation of why that would not be a
problem.

	As you recall, in our last episode, it was mentioned that
any weight over 1000 would not be reduced under any circumstances.  The
net-map gods also had the ability to set high weights.  Further, any
"site weight" encountered was multiplied by a number between 3 and
7.  Now on with our story:

	Site "not-me", in their offical map entry declairs that they
will not carry mail.
	not particularly surprised, the net-map gods issue a warning
to the net in the form of a mapping-message containing the line

not-me	3001

making the link "to expensive" [by a power of three to seven] to
pass mail accross.

	As I said at first, the posting was entirely off the top of
my head, and written at posting time.  I may have to add an addendum: a
threshold of > 3000 means "no mail", and even allow a site to post its own
weight [with verification, and only to signify "no mail"].

	These problems are soluble.

Rob.

rwhite@nusdhub.UUCP (Robert C. White Jr.) (01/20/88)

I must first, and once again, state that that entire idea was off the top
of my head.  As you have brought up two valid difficulties, I will attempt
an addendum to address these problems here:

In article <306@dsinc.UUCP>, syd@dsinc.UUCP (Syd Weinstein) writes:
> In <579@nusdhub.UUCP> usenet@nusdhub.UUCP (USENET News Admin) suggests
> an automatic routing scheme based on news paths.
  
> I also would like to disagree with this scheme.  My complaints relate
> to two points.  Use of the map files for other than automatic routing
> and speed of delivery.
  
> 1.  Many times I have accessed the directory with the maps and done a
> grep on a company name to see if they have a site on the net.

	An "optionally empty" entry would have to be created in the
site_d structure of the form "char *org_n" to contain the organization
name.  The entry for this new info in the data file would take the form:

site	weight	[linenum	weight] ... 	:orgname

Note that the only change is the "\t:orgname", which is a free-form
field at the end of the record, and is totally optional.

	This material, which is totally optional, may also be gleaned
from the news [in general] by the use of processing options.  This
info is included in any postings originated by that site.

	To support this, and "full maps", the "mapping messages" from
the map-gods would come in a new flavor.  The old form of multiple lines
in the form "site	weight" remains basically the same, except that "full
maps" become:

site	weight	:orginfo
...

^FULL  (note that that "^" is literal, not a "^F")

sitea!siteb!map	weight		(ascribes a weight to the connection;
				 the literal "map" is discarded in the
				 normal processing.  If the links are all
				 to have the same weight, there
				 may be more than two valid sites on
				 the same line, i.e.)
sitea!siteb[[!sitename]...]!map	weight
...
 
> Imagine calling a large company and getting their switchboard operator;
> now ask her for their system node name, or even whom to ask for the
> system node name.
> I might not even know the exact address and phone of the company, nor
> which division or location is their net postmaster.
  
> 2.  News and mail do take different paths.  At dsinc, we use one node
> for our news feed and another node for our main mail feeds.  This
> prevents us from getting our news when we post outbound mail.  Those
> with HDB uucp have the grading feature, but not all sites do.  Thus if
> we post an outbound mail message, I don't necessarily want to receive
> the news on the same phone call.  I would like my mail to go out asap,
> but the news to appear mostly off hours.  The mapping scheme here
> doesn't take into account the off-hours idea of news, nor sites we don't
> exchange news with.
  
	The original intent was that sites which do not pass mail
would be ascribed _very high_ weights which, when combined with the
multiplier used during route calculations, would make the route
unacceptable.  Perhaps any weight over 3000 should be defined as
"no mail".

> Also in relation to news and mail, there are many sites in the maps
> that don't get news at all.  We were in the maps for three to five years
> before we got the news.  We have several sites off of us that are in
> the maps that do not get news.  Your news-based mapping algorithm would
> never find those sites.

	Sites interested in this extra information should simply 
receive and process the "^FULL" entries in the same way that the 
current maps are processed now.  It was always my intention that 
the map-gods and routing backbones would keep master maps in the
original form [or perhaps the new one, considering this addendum].

	The original issue was a method of doing the maps without
the large, long-expiring, redundant map postings.  With this system
there are no "diffs" and no "originals", just new information when
desired, and whatever any given site wishes to keep, minus much
of the overhead.


Rob.
nusdhub!rwhite
nusdhub!usenet

duncan@comp.vuw.ac.nz (Duncan McEwan) (01/27/88)

In article <7815@rutgers.rutgers.edu> pleasant@rutgers.rutgers.edu 
(Mel Pleasant) writes:

>For those of you interested in lowering
>the volume, here is a challenge for you.  How can it be done such that we
>never have to make a full one-shot posting of all of the files ever again?

This definitely includes us, given the relatively high communications
charges of receiving usenet in this part of the world.

I originally mailed a response to this directly to Mel.  I never got
any feedback from my suggestions/queries, so I don't know if he
received it (though I guess more likely, he was just too busy to
respond).  I have waited a week or so to see if anyone else has said
anything similar but I haven't seen anything, so here goes...

>The problem with "diff" postings is [possibility of losing parts or
>parts arriving out of sequence].

I have supported the idea of posting diffs in the past.  I don't think
Mel's reasons against them are insurmountable.

If the diff listings contained sequence numbers you could tell if you
missed something.  If various sites around the net offered an
automated "send me map file n" service in much the same way that some
sites operate source archive servers, those that missed part of the
map could get the bits they missed.

[As an aside - even with the current map posting scheme it is not possible
to tell if you have missed something.  I think it would be a good idea if
whatever scheme is used, the map postings contained a sequence number in
the subject].
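
(If the subject lines did carry something like "...  Part 123", spotting a
gap at the receiving end would be a one-liner.  Everything below is
hypothetical, since no such numbers exist yet, and the spool path is an
assumption:)

	# pull the sequence numbers out of the spooled map articles and
	# report any breaks in the sequence
	sed -n 's/^Subject:.* Part \([0-9][0-9]*\).*/\1/p' \
		/usr/spool/news/comp/mail/maps/* |
	sort -n |
	awk 'NR > 1 && $1 != prev + 1 { print "gap after part " prev }
	     { prev = $1 }'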

Anyway, a much more convincing argument against diff postings was in a
followup to Mel's original article by Brian Matthews in which he said
he had exerimented with diff, and on average found diff listings to be
larger than the new files!  A larger sample might be required to see
if this is generally the case.

A final (overwhelming?) argument against diff postings is that they
would make much more work for all us overworked sys admins :-).  If a
posting is missed, we would have to manually intervene before
subsequent updates can be used.  For many, maintaining an up to date
map is not a top priority, and with the current scheme, we can choose
to wait until the next update, and put up with a slightly inaccurate
map for a while.

So assuming diff postings will not work, we (finally) get to the point
of this article.

>We are now posting on the 1st of each month those map files that
>have not been updated in 30 days.

This is the part that doesn't make sense to me (if people really want the
map posting volume reduced).  While the new posting scheme is great for
ensuring quick update propagation, it will obviously result in a volume
increase as long as this 30 day reposting is carried out.

So why is it required?

It might reduce the problem of people missing map postings.  But
they may still miss parts of the periodic repost so it doesn't
eliminate it altogether.

It also allows sites new to usenet to get a full copy of the map.  But
it is quite likely that their usenet neighbour has a copy of the map
that they can pass on.  And for those sites for which this is not the
case, a monthly posting listing sites that have the map available for
anonymous uucp should suffice.

>As you point out, the maps are read by a program and after that they
>aren't needed, right?  Well, wrongo!!  The number of people [...]
>complaining of loss of access to map files was astounding.  There are
>many [...] that actually use the map files to generate paths by hand.
>When the map files expire they're at a loss.

To satisfy these (backward :-) people without a periodic repost, why
can't the maps be posted with a long expiry date (sometime in the
21st century should do)?

I have experimented a little with the effects of the Supersedes header
on articles that have such long expiry dates - checking in particular
that the history record for the original article isn't maintained for
ever (which would result in history files steadily getting larger).  As
far as I could tell, this was not the case.  Perhaps someone who knows
for sure could confirm/deny this.

Two other problems with long expiry dates spring to mind.

1) I believe in order to cut down on abuse of the expiry header some
   sites run expire with the options that cause it to ignore expiry dates.
   These sites would lose the map after their normal expiry period.

2) For sites that have not installed a news system that handles the
   "Supersedes: " header, the map directory will keep getting bigger.
   (with the current scheme they will temporarily have multiple copies
   of a particular map file, but eventually the older copies will
   disappear).

The solutions to both these problems are fairly obvious, and not that
unpalatable.  Others can decide whether they are worth a decrease in
map posting volume.  I think they are.

I am sure that without the periodic reposting of the entire map, the
current scheme would result in a reasonable map update volume.  If
this is not the case, perhaps some of the larger files could be
further split up to reduce the impact should they be modified too
frequently (though I doubt this will be necessary).

I would like to hear from anyone who can think of problems not mentioned
here, of either long expiry dates, or not having the periodic repost
of the unmodified map files.

---
Duncan

Internet: duncan@comp.vuw.ac.nz		Path: ...!uunet!vuwcomp!duncan

dhesi@bsu-cs.UUCP (Rahul Dhesi) (01/29/88)

A suggestion I made once before:

Stripped maps (all the lines beginning with # missing) posted every
week or so.  This is all you need for pathalias.

Complete maps posted at longer intervals, e.g. every 3 months with
updates every month.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

duncan@comp.vuw.ac.nz (Duncan McEwan) (02/01/88)

In article <1977@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>
>Stripped maps (all the lines beginning with # missing) posted every
>week or so.

I checked out the map files that we have here on comp.vuw.ac.nz (which,
being as far away from the rest of the world as we are, may not be the most
up to date), and found that there were approximately 1.3Mb of comment lines
out of a total of 1.7Mb.  So posting the maps with comments stripped, once
per week, would still result in a volume of approximately 1.6Mb per month,
with worse turnaround than we get with the UUCP Project's current posting
scheme.

> This is all you need for pathalias.

True.  But it ignores the other useful info that should be updated as well
(contact person, latitude/longitude if you happen to live on a fault line
like we do here in Wellington :-)).

---
Duncan

Domain: duncan@comp.vuw.ac.nz		Path: ...!uunet!vuwcomp!duncan

jack@swlabs.UUCP (Jack Bonn) (02/02/88)

In article <13226@comp.vuw.ac.nz>, duncan@comp.vuw.ac.nz (Duncan McEwan) writes:
> In article <7815@rutgers.rutgers.edu> pleasant@rutgers.rutgers.edu 
> (Mel Pleasant) writes:
> >We are now posting on the 1st of each month those map files that
> >have not been updated in 30 days.
> 
> This is the part that doesn't make sense to me (if people really want the
> map posting volume reduced).  While the new posting scheme is great for
> ensuring quick update propagation, it will obviously result in a volume
> increase as long as this 30 day reposting is carried out.
> 
> So why is it required?
> 
> It might reduce the problem of people missing map postings.  But
> they may still miss parts of the periodic repost so it doesn't
> eliminate it altogether.
> 
> It also allows sites new to usenet to get a full copy of the map.  But
> it is quite likely that their usenet neighbour has a copy of the map
> that they can pass on.  And for those sites for which this is not the
> case, a monthly posting listing sites that have the map available for
> anonymous uucp should suffice.

It also allows a site to miss a map posting and eventually get an up
to date map (even if they miss an occasional repost).  Why not increase
the 30 day reposting interval to a larger number (say 90 days), until the
number of maps reposted is reduced to almost none?  This would be
because, in the normal order of things, almost all (all?) the maps have 
a "refresh time" of less than 90 days.

> >As you point out, the maps are read by a program and after that they
> >aren't needed, right?  Well, wrongo!!  The number of people [...]
> >complaining of loss of access to map files was astounding.  There are
> >many [...] that actually use the map files to generate paths by hand.
> >When the map files expire they're at a loss.

I was under the impression that map entries would have an expiration
date such that a map for a given region would ALWAYS be in the news SPOOL 
directory.  Assuming this is the case, I wrote the following script:

	# For every map article in the news spool, strip the article
	# headers and shar wrapping: everything up to and including the
	# "cat ... SHAR_EOF" line, and everything from the closing
	# SHAR_EOF on.  What remains is the map data itself, which is
	# fed straight to pathalias and installed as ~/paths.
	for I in /usr/spool/news/comp/mail/maps/*
	do
	    sed '1,/cat.*SHAR_EOF/d
		/^SHAR_EOF/,$d' <$I
	done | pathalias -f | pathproc >/usr/tmp/path$$
	mv /usr/tmp/path$$ $HOME/paths
	exit 0

By running a script like this (from cron) I don't have to keep two copies 
of the map data around, one in the news SPOOL directory and another in the 
uucp map directory.  And as a side effect, all map data is available
for a SENDME, assuming that the needy site has access to a cross
reference between message-id's and geographic areas.  Certainly, a
script to generate a cross reference like this could easily be written.

Of course, the script will have to be modified if the here document 
terminator is quoted.  Or if any of the shars have more than one file 
in them.  But, if these things could be standardized, a convenient script 
like this could be ensured to work (most of the time:-)).

Anyone else doing something similar?  Or is everyone?
-- 
Jack Bonn, <> Software Labs, Ltd, Box 451, Easton CT  06612
uunet!swlabs!jack

duncan@comp.vuw.ac.nz (Duncan McEwan) (02/05/88)

In article <2056@swlabs.UUCP> jack@swlabs.UUCP (Jack Bonn) writes [in
response to my article about eliminating the periodic reposts of unmodified
map articles]

>It also allows a site to miss a map posting and eventually get an up
>to date map (even if they miss an occasional repost).  Why not increase
>the 30 day reposting to a larger number (say 90 days), until the
>number of maps reposted is reduced to almost none.  This would be ...

I don't think it is much of an advantage if it is going to take a site 3 months
or more before they get a full map again :-).  I think it would be far better
to incorporate some mechanism for allowing a sys admin to figure out that they
had missed a map part, and to have facilities available for them to obtain the
missing part.

At the moment it is not easy to tell if you are missing something.  In my
original posting, I suggested putting some kind of sequence number in the map
posting's subject line, which would help [I have seen no feedback about this
suggestion -- am I the only one that thinks this simple change could be useful
:-)]

An idea that I didn't include (the article was already too long :-) was to have
a periodic posting of a shell script by the map maintainer that would compare
the id's of map articles that had been posted with what the receiving site had,
and report to the admin the id's of anything that was missing.
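
(Roughly, with "posted-ids" standing in for whatever list of message-ids the
maintainer would distribute, and the usual history file location assumed:)

	# report any posted map article whose id is not in our history file
	while read id
	do
		grep "$id" /usr/lib/news/history > /dev/null ||
			echo "missing map article: $id"
	done < posted-ids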

>... because, in the normal order of things, almost all (all?) the maps have 
>a "refresh time" of less than 90 days.

I suggested having no repost, and "infinite" expiry dates because I wasn't sure
what proportion of map files actually got updated over any given period.

But if it is the case that almost all map files are modified over (say) a 90
day period, then having a repost of all unmodified files after that period
would be fine.  We would still have reduced map posting volume to about one
third of the old "once per month full repost" while gaining the advantages of
the new fast update turnaround time.

Some feedback from someone in the UUCP mapping project, who actually knows how
long it takes before "almost all" map files have been updated would be helpful.

>I was under the impression that map entries would have an expiration
>date such that a map for a given region would ALWAYS be in the news SPOOL 
>directory.

Well, it should be, given the current 30 day reposting.  I think Mel puts a 45
day expiration date on articles that get posted, which means if you receive
either an update, or the periodic repost, the file will always be present.
However, if for some reason you miss one, and that file is not updated again
soon enough, your news map directory will be missing part of the map (for a
little while at least).

For this reason, I think infinite expiry dates would still be useful, even if
a periodic reposting of unmodified articles is carried out.

>
> [Script to generate map via pathalias directly from news spool directory
> deleted]
>
>By running a script like this (from cron) I don't have to keep two copies 
>of the map data around,

I actually keep one and a half, since the one used to generate the map is
compressed.  The reason for keeping the other copy is because it is required
by John Quarterman's "uuhosts" shell script (automates unbatching, and also
provides a map lookup facility).

But thinking about it, now that the map is not being posted in a small number
of large articles, the unbatching facility of uuhosts may not be required, and
its lookup facility could be modified to work directly from the news map
directory.  (Or maybe it doesn't even need modifying, since it already has to
skip leading and trailing garbage before displaying the map entry you want).

Duncan

Domain: duncan@comp.vuw.ac.nz		Path: ...!uunet!vuwcomp!duncan