dplatt@coherent.com (Dave Platt) (10/04/90)
This morning, I found that my /etc/uucp/paths file had developed some serious anomalies. Routes to several sites appeared to have developed long, circular paths which looped repeatedly through a serquence of three sites. For example: alphacm ames!mimsy!uunet!mtndew!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!alphacm!%s 0 Examining the map files in question (e.g. u.usa.ca.3) didn't show any real anomalies. [There's a possibly-bogus "homebas=homebas" alias, but removing it didn't affect the problem]. Running pathalias manually produced inconsistent results... sometimes it would spit out the same looped paths, and sometimes would die with a "too many terminal links; call the authorities". Increasing the size of the heap didn't resolve the problem. Further investigation showed that pathalias was spitting out negative path costs for the erroneous routes. I installed some additional debugging code in mapit.c, and isolated the problem. The problem appears to have occurred when pathalias tried to walk through a series of (DEAD) terminal links between homebas, mbf, and dhw68k, and several other nodes. Each time it walked over one of these links, it would call "costof()", which would note that the link was dead and would add a very high cost (INF, defined as 100*MILLION) to the total cost for the path. After it worked its way through enough dead links, it would have added INF to the cost so many times that the cost overflowed and went negative. This caused total confusion in other portions of the program, and allowed the same links to be followed multiple times. The workaround for this problem is fairly simple... I added some code to "costof" which checks for overflow and deals with it in a somewhat more graceful fashion. With this code in place, the problem seems to have abated... pathalias runs to completion without running out of heap space, and produces meaningful-looking paths and valid path-costs for the systems in question. Here are the diffs against the version of pathalias distributed back in May of this year: *** mapit.c.old Wed Oct 3 15:36:31 1990 --- mapit.c Wed Oct 3 15:36:56 1990 *************** *** 269,281 **** register node *prev; register link *l; { register node *next; ! register Cost cost; if (l->l_flag & LALIAS) return prev->n_cost; /* by definition */ next = l->l_to; ! cost = prev->n_cost + l->l_cost; /* basic cost */ /* * heuristics: --- 269,281 ---- register node *prev; register link *l; { register node *next; ! register Cost cost, basecost; if (l->l_flag & LALIAS) return prev->n_cost; /* by definition */ next = l->l_to; ! basecost = cost = prev->n_cost + l->l_cost; /* basic cost */ /* * heuristics: *************** *** 298,303 **** --- 298,310 ---- || (NETDIR(l) == LRIGHT && (prev->n_flag & HASLEFT))) cost += INF; /* mixed syntax */ } + + if (cost < basecost) + { + fprintf(stderr, "Cost overflow from %s to %s!\n", + prev->n_name, next->n_name); + cost = basecost + 1; + } return cost; } It might also be useful to reduce the definition of INF from 100 million to some smaller number (10 million might work), in order to keep the overflow from occurring. I haven't tried this yet.