[news.admin] Pathalias loop... diagnosis and a patch

dplatt@coherent.com (Dave Platt) (10/04/90)

This morning, I found that my /etc/uucp/paths file had developed some
serious anomalies.  Routes to several sites appeared to have developed
long, circular paths which looped repeatedly through a serquence of
three sites.  For example:

alphacm ames!mimsy!uunet!mtndew!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!mbf!homebas!dhw68k!alphacm!%s  0

Examining the map files in question (e.g. u.usa.ca.3) didn't show any
real anomalies.  [There's a possibly-bogus "homebas=homebas" alias, but
removing it didn't affect the problem].

Running pathalias manually produced inconsistent results... sometimes it
would spit out the same looped paths, and sometimes would die with a
"too many terminal links;  call the authorities".  Increasing the size
of the heap didn't resolve the problem.

Further investigation showed that pathalias was spitting out negative
path costs for the erroneous routes.  I installed some additional
debugging code in mapit.c, and isolated the problem.

The problem appears to have occurred when pathalias tried to walk
through a series of (DEAD) terminal links between homebas, mbf, and
dhw68k, and several other nodes.  Each time it walked over one of these
links, it would call "costof()", which would note that the link was dead
and would add a very high cost (INF, defined as 100*MILLION) to the
total cost for the path.

After it worked its way through enough dead links, it would have added
INF to the cost so many times that the cost overflowed and went
negative.  This caused total confusion in other portions of the program,
and allowed the same links to be followed multiple times.

The workaround for this problem is fairly simple... I added some code to
"costof" which checks for overflow and deals with it in a somewhat more
graceful fashion.  With this code in place, the problem seems to have
abated... pathalias runs to completion without running out of heap
space, and produces meaningful-looking paths and valid path-costs for
the systems in question.

Here are the diffs against the version of pathalias distributed back in
May of this year:

*** mapit.c.old Wed Oct  3 15:36:31 1990
--- mapit.c     Wed Oct  3 15:36:56 1990
***************
*** 269,281 ****
        register node *prev;
        register link *l;
  {     register node *next;
!       register Cost cost;

        if (l->l_flag & LALIAS)
                return prev->n_cost;    /* by definition */

        next = l->l_to;
!       cost = prev->n_cost + l->l_cost;        /* basic cost */

        /*
         * heuristics:
--- 269,281 ----
        register node *prev;
        register link *l;
  {     register node *next;
!       register Cost cost, basecost;

        if (l->l_flag & LALIAS)
                return prev->n_cost;    /* by definition */

        next = l->l_to;
!       basecost = cost = prev->n_cost + l->l_cost;     /* basic cost */

        /*
         * heuristics:
***************
*** 298,303 ****
--- 298,310 ----
                 || (NETDIR(l) == LRIGHT && (prev->n_flag & HASLEFT)))
                        cost += INF;                    /* mixed syntax */
        }
+
+       if (cost < basecost)
+         {
+           fprintf(stderr, "Cost overflow from %s to %s!\n",
+                   prev->n_name, next->n_name);
+           cost = basecost + 1;
+         }

        return cost;
  }

It might also be useful to reduce the definition of INF from 100 million
to some smaller number (10 million might work), in order to keep the
overflow from occurring.  I haven't tried this yet.