[net.news.b] News bug -- a major black hole

chuqui@nsc.UUCP (Chuq Von Rospach) (08/02/85)

> I think I've found a major problem with news, one that is causing some of
> our black holes out there.

I did some more investigating. Yes, it IS a major black hole, caused by a
couple of misfeatures in the unbatcher and a bug in inews. 

Misfeatures:
    o unbatch.c sends its error messages to stderr. When run by uux as a
    process called by rnews, this means that all the error messages get
    thrown away. Should use log() and logerr() so that messages are
    actually seen by something other than /dev/null.

    o unbatch.c doesn't save the garbage it finds on an error for hand
    parsing and repair later. Of course, all of news is poor at error
    checking, but this is no excuse.
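
A minimal sketch of what the two misfeatures above are asking for, written in Python for illustration rather than in the C of unbatch.c. The function name `save_garbage`, the file prefix `bad.`, and the log line wording are all made up for this example; the real log() and logerr() routines in news are not reproduced here.

```python
import os
import tempfile
import time

def save_garbage(garbage, logfile, savedir):
    """Keep unparseable input for later hand repair and log where it went."""
    os.makedirs(savedir, exist_ok=True)
    # mkstemp picks a unique file name, so repeated failures never collide.
    fd, path = tempfile.mkstemp(prefix="bad.", dir=savedir)
    with os.fdopen(fd, "wb") as f:
        f.write(garbage)
    # Append to a log file instead of writing to stderr, which uux discards.
    with open(logfile, "a") as log:
        log.write("%s unbatch: out of sync, saved %d bytes in %s\n"
                  % (time.ctime(), len(garbage), path))
    return path
```

The point of the design is only that the error leaves two traces: a line in a log someone reads, and the offending bytes on disk.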

Bug:
    o control messages with no body are munged improperly by inews, causing
    messages to be sent out that aren't terminated by a newline. When these
    messages are batched, they then cause the batching to be generated
    incorrectly because the batch header (#! rnews) does not start at the
    beginning of a line.
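
To make the bug above concrete, here is a toy illustration in Python (not the actual batcher code). It assumes a batch is just each article preceded by a "#! rnews <count>" line; `make_batch` and `headers_at_line_start` are names invented for this sketch, and the sample articles are made up.

```python
def make_batch(articles):
    """Concatenate articles, each preceded by a '#! rnews <count>' header."""
    out = []
    for art in articles:
        out.append("#! rnews %d\n" % len(art))
        out.append(art)
    return "".join(out)

def headers_at_line_start(batch):
    """Return the batch headers that begin at the start of a line."""
    return [line for line in batch.split("\n") if line.startswith("#! rnews")]

good = "Subject: cmsg cancel <1@a>\nLines: 0\n\n"
bad = good.rstrip("\n")       # hypothetical article missing its final newline
follow = "Subject: hello\n\nbody\n"
```

With `[good, follow]` both headers start a line; with `[bad, follow]` the second header is glued onto the end of "Lines: 0" and only one header is found at a line start.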

As far as I can tell, every null-body control message causes one article to
be eaten PER SITE it passes through when sent by batching. Batching type
(sendbatch or csendbatch, regardless of revision of compress) is
irrelevant -- the damage is done after it is uncompressed. The logic of
this supposition is:

    o control message comes in, gets placed in F queue.
    o queue is batched, and article behind the control message gets munged.
    o queue is shipped, unpacked, and sent to rnews.
    o rnews reads the control message properly, stores it in F queue.
    o rnews barfs on next message, causes one 'out of sync error' to be
      sent to stderr for each line in the article until it finds the
      beginning of the next article, and then resyncs and starts saving
      articles again properly. The next article saved will go into the F
      queue and be eaten on the NEXT hop downstream. This continues
      indefinitely until the control message is fixed, not passed on, or
      you reach the final leaf (or run out of messages to eat, of course).
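
The sequence above can be modeled with a toy unbatcher in Python. This is a sketch of the supposition only, not the real unbatch.c: it ASSUMES a reader that takes the count from the header but then consumes whole lines until the count is used up, and that complains once per stray line until it sees a header at the start of a line. The name `line_unbatch` and the sample articles are invented for the example.

```python
def line_unbatch(batch):
    """Toy model: read a header, then whole lines until the count is used up."""
    lines = batch.splitlines(keepends=True)
    articles, errors, i = [], 0, 0
    while i < len(lines):
        if not lines[i].startswith("#! rnews "):
            errors += 1              # one 'out of sync' complaint per stray line
            i += 1
            continue
        remaining = int(lines[i].split()[2])
        i += 1
        body = []
        while remaining > 0 and i < len(lines):
            body.append(lines[i])    # a glued-on header is swallowed here
            remaining -= len(lines[i])
            i += 1
        articles.append("".join(body))
    return articles, errors

control = "Subject: cmsg cancel <1@a>\nLines: 0"   # no trailing newline
victim = "Subject: victim\n\nline one\nline two\n"
survivor = "Subject: survivor\n\nstill here\n"
batch = "".join("#! rnews %d\n%s" % (len(a), a)
                for a in (control, victim, survivor))
```

In this model the control message is delivered (with the next header stuck to its last line), the four lines of the victim article each produce an error and are dropped, and the reader resyncs on the survivor.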

On some occasions, and they seem to be a distinct minority, the message
gets to inews and inews throws up instead with an 'inbound news is garbled'
message (which thankfully does get logged, or I'd never have seen this). I
don't know what causes this to happen yet, and it may be incidental to the
main problem here -- a fortunate coincidence perhaps.

I just did a quick check of my control directory, and found that 15 of 65
articles showed the sign of the bug (0 in header line, and a ^? in the
body). These 15 articles passed through 172 sites (about 11.5 each on average).
If we assume that most sites are now using some form of batching, that is
at least 150 articles eaten, silently. Also, 25 of the other 50 control
articles I looked at showed a header with 0 lines in the body, and a body
with a '#' in it. I THINK that this is a sign of some site upstream having
the first character eaten off of a #! rnews line. This probably
disables the bug (maybe this is when the 'inbound news is garbled' message
shows up?), but until that happens things get eaten. Based on these figures
I can show that about 2.9% of the messages are not getting to my site (of
about 6000 articles in a two week period per seismo, about 172 can be shown
to be eaten by bugs on the way). When you realize that this doesn't include
the articles eaten before the bug was disabled (the '#' messages?) or the
articles that were eaten before they got to seismo, this is probably
conservative. Losing 2.9% may not sound like much, but it will when your
article gets eaten. And it will. And it already has. If you don't see this
article, BTW, it is because a God with a sense of humor gave it to this
bug....
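
As a check on the arithmetic, using only the figures quoted above (15 flagged control articles, 172 site hops, about 6000 articles in two weeks), a few lines of Python give the averages. The variable names are just labels for this sketch. As a fraction the loss comes to about 0.029, which is roughly 2.9 percent.

```python
flagged_articles = 15         # control articles showing the sign of the bug
site_hops = 172               # total sites those 15 articles passed through
articles_two_weeks = 6000     # approximate two-week volume per seismo

avg_hops = site_hops / flagged_articles           # sites per flagged article
loss_fraction = site_hops / articles_two_weeks    # one article eaten per hop
loss_percent = 100 * loss_fraction

print("average hops: %.1f" % avg_hops)
print("estimated loss: %.1f%%" % loss_percent)
```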

Until we can find a fix for this sucker, I STRONGLY suggest that all control
messages be sent out with non-null bodies. rn, bless its little heart,
already does this. This, dear net-friends, is a bugger, and I think it's
been around since 2.10, perhaps longer -- it seems inherent in the code
of unbatch.c, and we just never noticed before. As a famous celebrity might
say, oop, ack.
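
A rough sketch of the workaround in Python, for illustration only. The helper name `pad_control_body` and the filler text are invented here, and this is not how rn or inews actually do it. It gives an empty control message a one-line body and makes sure the article ends with a newline; it does NOT adjust a Lines: header, which a real fix would also need to do.

```python
def pad_control_body(article, filler="This message has no body.\n"):
    """Return the article with a non-empty, newline-terminated body."""
    head, sep, body = article.partition("\n\n")
    if not sep:
        # No blank line at all: the whole thing is header, the body is empty.
        head, body = article.rstrip("\n"), ""
    if not body.strip():
        body = filler
    if not body.endswith("\n"):
        body += "\n"
    return head + "\n\n" + body
```

For example, `pad_control_body("Control: cancel <1@a>")` yields the header, a blank line, and the filler line, so the next batch header is guaranteed to start on a fresh line.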

chuq


-- 
:From the carousel of the autumn carnival:        Chuq Von Rospach
{cbosgd,fortune,hplabs,ihnp4,seismo}!nsc!chuqui   nsc!chuqui@decwrl.ARPA

Your fifteen minutes are up. Please step aside!

bukys@rochester.UUCP (Liudvikas Bukys) (08/02/85)

The unbatcher just counts characters when it unbatches.  It doesn't
care whether the "#! rnews" starts at the beginning of the line.
(I verified this by looking at the code and with an experiment,
tacking some un-newlined-text onto a couple of articles before
they got batched.  They still arrived at the destination intact.)

--> So there is no black hole.  Go back to sleep. <--

Having the batcher produce stuff like this is aesthetically unpleasing,
but it's not hurting anything.
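
For illustration, a minimal Python sketch of a purely character-counting unbatcher of the kind described above. It is not the real unbatch.c; the name `count_unbatch` is made up, and the "#! rnews <count>" header layout is assumed. Because it jumps ahead by the count, it never looks at whether a header begins a line.

```python
def count_unbatch(batch):
    """Split a batch purely by the count in each '#! rnews <count>' header."""
    articles, pos = [], 0
    while pos < len(batch):
        eol = batch.index("\n", pos)          # end of the header line
        header = batch[pos:eol]
        if not header.startswith("#! rnews "):
            raise ValueError("out of sync at offset %d" % pos)
        count = int(header.split()[2])
        start = eol + 1
        articles.append(batch[start:start + count])
        pos = start + count                   # next header, wherever it falls
    return articles

no_newline = "Subject: a\n\ntext without newline"
following = "Subject: b\n\nbody\n"
batch = "".join("#! rnews %d\n%s" % (len(a), a)
                for a in (no_newline, following))
```

Here `count_unbatch(batch)` returns both articles intact even though the second header sits in the middle of a line.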

Liudvikas Bukys
rochester!bukys (uucp) via allegra, decvax, seismo
bukys@rochester (arpa)

chuqui@nsc.UUCP (Chuq Von Rospach) (08/04/85)

In article <10855@rochester.UUCP> bukys@rochester.UUCP (Liudvikas Bukys) writes:
>The unbatcher just counts characters when it unbatches.  It doesn't
>care whether the "#! rnews" starts at the beginning of the line.
>(I verified this by looking at the code and with an experiment,
>tacking some un-newlined-text onto a couple of articles before
>they got batched.  They still arrived at the destination intact.)
>
>--> So there is no black hole.  Go back to sleep. <--

I disagree, since I've been able to reproduce the bug here on actual runs
of news batches. Things do get out of sync, and data is being lost.

chuq

randy@bcsaic.UUCP (randy groves) (08/14/85)

There are also a considerable number of control messages at our site that have
'Lines: 0' and some number, as well as the '#' and '^?' messages.

-- 
===========================================================================
... only a hollygram, but one more is gone.
===========================================================================
randy groves
...!uw-beaver!uw-june!bcsaic!randy