chuqui@nsc.UUCP (Chuq Von Rospach) (08/02/85)
I think I've found a major problem with news, one that is causing some of our black holes out there. It is a very bizarre combination of events. The basic components are control messages, compress (cunbatch), and error messages dropped on the floor. I don't have a fix (yet), but at least I know what I'm looking for now.

What happens is this: someone sends a control message, such as a cancel, without a text body. A lot of sites out there have a known bug that causes these messages to go out with a '^?' as the message body, because the software ends up writing the EOF char before realizing it is EOF. This message is then stored somewhere by the F command in the sys file. Later on, all of these wonderful things are batched together, sent through compress (I have 3.0, but I don't think the problem is limited to this version), and shipped across. When the batch gets to the other side, it is decompressed and handed to rnews for de-batching. Unfortunately, you end up with the following sequence of lines:

#! rnews 999
[control message header]
^?#!rnews 9999
[next message header]

The rnews unbatcher doesn't seem to be able to handle this. One of two things happens:

o The well-known 'Inbound news is garbled' message, since the '#! rnews' line isn't where it is expected. You end up permanently losing the rest of the batch unless you get lucky. At least you know you lost something, and can ask your feed to reship it, if they save the batching files for you.

o The not-so-well-known problem I just found. Instead of getting a garbled message, rnews recognizes its problem and starts issuing 'Out of sync -- skipping: [lost data]' lines. This actually seems to be more likely to happen than 'Inbound news is garbled', and for some reason these messages don't get logged or mailed to anyone; uucp seems to throw them away. The end result, unfortunately, is a number of messages that simply get thrown away somewhere with no error message recorded in any log.
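To make the 'Out of sync' path concrete, here is a minimal C sketch of a line-oriented unbatcher. It assumes (an assumption, not a reading of the actual rnews source) that the unbatcher reads a '#! rnews <count>' line and then consumes whole lines with fgets() until <count> bytes are used up; the function name unbatch() is hypothetical. Because the control message ends in a bare ^? with no newline, the line that finishes it also contains the next '#! rnews' header, so that header is swallowed and the next article's lines are reported as out of sync.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line-oriented unbatcher (not the real rnews code).
 * A real unbatcher would pipe each article to the news system; this
 * sketch only counts articles found and lines skipped. */
void unbatch(FILE *in, int *articles, int *skipped)
{
    char buf[1024];

    *articles = 0;
    *skipped = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strncmp(buf, "#! rnews ", 9) != 0) {
            /* Not a batch header: report it and throw the line away. */
            fprintf(stderr, "Out of sync -- skipping: %s", buf);
            (*skipped)++;
            continue;
        }
        long size = atol(buf + 9);  /* byte count from the header */
        (*articles)++;
        /* Consume whole lines until the byte count is used up.  A body
         * ending in "^?" with no newline makes the final fgets() also
         * swallow the next "#! rnews" line, leaving size negative. */
        while (size > 0 && fgets(buf, sizeof buf, in) != NULL)
            size -= (long)strlen(buf);
    }
}
```

Fed a batch of three articles where the first is such a control message, this sketch counts two articles and skips three lines: the whole second article is lost, and only stderr ever hears about it.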
I don't know how much data is being lost, but I don't like seeing data lost silently.

I found this quite accidentally. I happened to be mucking with news while my feed was coming in, and saw the garbled message in the log. I grabbed the compressed file from uucp, decompressed it by hand, and ran it through rnews to find the garbled spot. It was the spot immediately after the control message. I edited out the part of the file before the problem and ran the rest through rnews again. All of a sudden, rnews was complaining on stdout (perhaps stderr) about being out of sync. Checking, I found it was immediately after ANOTHER control message. The rnews unbatcher does NOT log these errors, meaning the data is lost completely.

What to do? I don't know yet, but I'm going to explore the following:

o Make sure that when you store a control message, it ends with a newline. This should be done for locally posted stuff AND anything that passes through. This doesn't protect you from upstream sites that have this problem, but will keep you from screwing your downstream sites. Since this bug seems to be degenerative (each site can pass on a control message that comes through fine, and have a new set of messages get eaten every time news gets shipped), this is a really NASTY problem. If you thought the line-eater was bad, realize it only mucked up a single article, and only the poster's own article at that. This bug is a virus, and eats random articles at random sites.

o Modify the rnews unbatcher to do two things: LOG these error messages, and store ANY data that it can't deal with somewhere to be unpacked by hand later. This will cause more work for an SA, but at least messages won't get lost.

Any suggestions are more than welcome. I found this on a fluke, and frankly, I don't know if we can ever quantify how much data is getting eaten by this thing.
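Both proposals are small. The C sketch below shows one way to do each; store_article() and unbatch_logged() are hypothetical names, and the copy loop assumes the upstream bug is the common pattern of writing a character before testing it against EOF. The second function assumes a line-oriented reading loop (again an assumption about how rnews unbatches) but, instead of dropping unrecognized lines, logs them and appends them to a save file for hand unpacking. It also logs when an article overruns its byte count, which is where a swallowed header shows up; that extra check goes slightly beyond the two proposals.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Proposal 1 (hypothetical helper): copy an article to 'out' with a
 * correct EOF test, and guarantee the stored copy ends in a newline.
 * The buggy pattern writes first and checks later, e.g.
 *     do { c = getc(in); putc(c, out); } while (!feof(in));
 * which emits one stray byte at the end of the copy. */
void store_article(FILE *in, FILE *out)
{
    int c;            /* int, not char, so EOF stays distinguishable */
    int last = '\n';  /* an empty body needs no extra newline */

    while ((c = getc(in)) != EOF) {  /* test for EOF before writing */
        putc(c, out);
        last = c;
    }
    if (last != '\n')
        putc('\n', out);
}

/* Proposal 2 (hypothetical): unbatch, but log every problem to
 * 'errlog' and keep every unrecognized line in 'saved' for an SA.
 * Returns the number of lines saved. */
int unbatch_logged(FILE *in, FILE *errlog, FILE *saved)
{
    char buf[1024];
    int lost = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strncmp(buf, "#! rnews ", 9) != 0) {
            fprintf(errlog, "unbatch: out of sync -- saved: %s", buf);
            fputs(buf, saved);
            lost++;
            continue;
        }
        long size = atol(buf + 9);
        while (size > 0 && fgets(buf, sizeof buf, in) != NULL)
            size -= (long)strlen(buf);
        if (size < 0)  /* the last line ran past the byte count */
            fprintf(errlog, "unbatch: article overran its count by %ld bytes\n",
                    -size);
    }
    return lost;
}
```

Because unbatch_logged() returns the number of lines it saved, a wrapper could notify the SA whenever that number is nonzero instead of letting uucp discard the complaint.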
If I see it properly, it can even attack (silently) a site that isn't batching with sendbatch at all, so turning off compression doesn't seem to be a solution. With the success we've had eradicating the line-eater, I'm really scared about what this does to the net.

chuq
--
:From the carousel of the autumn carnival:
Chuq Von Rospach
{cbosgd,fortune,hplabs,ihnp4,seismo}!nsc!chuqui    nsc!chuqui@decwrl.ARPA

Your fifteen minutes are up. Please step aside!
howard@cyb-eng.UUCP (Howard Johnson) (08/13/85)
> I think I've found a major problem with news, one that is causing some of
> our black holes out there. It is a very bizarre combination of events.

Bizarre, yes. (But I won't cross-post to net.bizarre just yet. :-))

> What happens is this: Someone sends a control message, such as a cancel,
> without a text body. [...] ends up writing the EOF char before realizing
> it is EOF. [...]
>
> #! rnews 999
> [control message header]
> ^?#!rnews 9999
> [next message header]

Fortunately, this doesn't seem to happen on the most widely distributed version of 2.10.2 (the 9/18/84 version, which I have).

> unfortunately, the rnews batcher doesn't seem to be able to handle this.
> o make sure that when you store a control message, it ends with a
> newline. [...]
>
> o modify the rnews unbatcher to do two things: LOG these error messages
> and store ANY data that it can't deal with somewhere to be unpacked by
> hand later. This will cause more work for an SA, but at least messages
> won't get lost.

The version of news I have does the first of these, but not the second.