em@dce.ie (Eamonn McManus) (03/26/91)
rob@mtdiablo.Concord.CA.US (Rob Bernardo) writes:
>I've also had some difficulty lately with 'out of sync' unbatching
>problems. Unfortunately, Eamonn McManus's patchbatch didn't work.
>Below is a shell archive for a more robust program to fix batches
>with bad article character counts.

There are advantages and disadvantages to each of our programs.
Patchbatch is designed to be run automatically on all incoming
batches, whereas Rob's program (rebatch) is meant to be run by hand on
known bad batches. Running automatically from newsrun means that the
fixer doesn't have to worry about decompression and the like.

The reason I wrote patchbatch to fish around in the vicinity of the
supposed article end, rather than scanning through every line as
rebatch does, was that it provides a greater degree of transparency.
If an article happens to contain the string "#! rnews" at the
beginning of a line, rebatch will assume the article ends there.
Patchbatch is only susceptible to this problem if an article contains
such a string very near its end. Also, if an article is truncated in
the middle of a line, so that the "#! rnews" of the following article
is not preceded by a newline, rebatch will not find that article. Of
course, if it were changed to look for "#! rnews" anywhere in a line,
it would go ape on articles like this one.

The problem with hacks like these is striking a balance between
fixing corrupt batches and leaving correct ones alone. Patchbatch
stays closer to the latter at the expense of sometimes failing to do
the former. However, I think people should try increasing the value
of FUDGE before resorting to a more promiscuous program like rebatch.
You might also need to change the size of the buf[] array when doing
this; I can't remember whether the version I posted used a magic
constant 64 as the size (ugh).

Another noteworthy difference between the programs is that patchbatch
modifies the batch in place rather than creating a replacement, which
makes it much faster.
In particular, if you only occasionally get corrupt batches, you can
afford to run patchbatch over every incoming batch, since there is
very little overhead in checking through a correct batch. There is a
theoretical problem: the corrected size of an article may need one
more digit than the count in its header (an n-digit number becoming
an (n+1)-digit one), in which case patchbatch will fail, since it
rewrites the batch in place. I never saw this happen in practice.

Eamonn