paul@vixie.UUCP (04/15/87)
I'm willing to do some of the work for this, but I want to see what else has been done or thought about it before I start. So.......... What about compressing the news data in the spool directories? If compress can save half the transmission time, it ought to be able to save almost that much in storage costs as well. It isn't quite the same win, since there is the file system's frag size to consider -- but it's a win, just the same.

The hard part is the headers -- they should not be compressed, because they are examined independently of the (much longer) data quite often -- in expire, in subject searches, and so on. In my view, the headers are better left uncompressed. So we can either put compressed and uncompressed data in the same file -- not simple, but possible -- or we can separate the data into other files. This aggravates the waste from the file system frag size, but I think we would gain more from the compression than we would lose in the half-empty blocks this scheme would add.

Perhaps an additional header line is in order -- starting with 'X-' in the tradition of lines which should not be passed out of the current system. How about 'X-Data-File: xxx [-c]', where 'xxx' is the name of the file where the data is stored, and '-c' indicates that the data is compressed? Alternately, we could use the 'magic number' of the data file to determine whether it is compressed -- but uncompressed data begins with arbitrary bytes that could happen to match the magic number, so this may be problematic.

The last architectural problem is in somehow tying the data files back to the header files -- since the header files will inevitably become corrupted at some point, 'orphaning' any associated data files. They could be given similar names -- perhaps the same 'article number'-style name, but in a '.data/' directory?
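To make the split concrete, here is a minimal sketch of what inews might do per article: keep the headers (everything up to the first blank line) in the spool as plain text, and file the body under the same article number in a '.data/' directory, compressed. All paths and names here are hypothetical illustrations of the scheme, not any existing news code; gzip is used as a stand-in when 'compress' is not installed.

```shell
#!/bin/sh
# Sketch: split spool article 123 into an uncompressed header file
# and a compressed body in '.data/' (hypothetical layout).
set -e
mkdir -p demo/.data

# A stand-in article, headers separated from body by a blank line.
cat > demo/123 <<'EOF'
From: someone@somewhere.UUCP
Subject: test article

body line one
body line two
EOF

# Headers: everything up to and including the first blank line.
sed '/^$/q' demo/123 > demo/123.hdr

# Body: everything after the first blank line, compressed in place.
sed '1,/^$/d' demo/123 > demo/.data/123
if command -v compress >/dev/null 2>&1; then
    compress demo/.data/123        # yields demo/.data/123.Z
else
    gzip demo/.data/123            # stand-in; yields demo/.data/123.gz
fi
```

Expire and subject searches then touch only the small '123.hdr' files, while the bulk of the bytes sit compressed in '.data/'. Note that compress output always begins with the two magic bytes 0x1F 0x9D, which is what the magic-number detection idea above would key on.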
This would remove some of the need for the 'X-Data-File:' header, but there would still have to be something in the headers to tell the various news reading and transmission programs that the data is elsewhere -- otherwise they would think the article had no text. If every stored article had its data stored separately, this would not be a problem; however, as a news administrator, I would prefer to have some things compressed and others not -- during the conversion, mainly, but there are other possibilities. A new flag in the LIBDIR/active file could tell whether to compress new articles added to a particular newsgroup.

It occurs to me that there would be a terrible waste of CPU time if the articles were batched and compressed in the current format, only to be uncompressed, sorted out, then recompressed for storage. A new batching format is called for -- perhaps the headers and data could travel under different covers, or at a minimum, the '#! cunbatch nnn' lines could have a second argument added -- the length of the headers. Leaving the first argument as it is would keep the format portable to older software on the receiving end -- but newer software could send the first 'nnn' bytes through compress, yielding the headers, then send the remainder directly to a file somewhere, to be uncompressed later by the news readers.

This may all be moot, given the onset of C news. I don't know whether C news does this or not. It's also possible that I'm overlooking something -- there could be very good reasons why this scheme won't work, but if so, they elude me. All the news readers would have to change -- I don't know if their various implementors and maintainers would go for it. I'm aware of 'readnews', 'vnews', 'rn', and 'vn'. Then there are 'notesfiles' and all their myriad gateway software. Changing 'inews' and 'batch' starts to look easy in comparison. Perhaps this is why it hasn't been done yet?
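The backward-compatibility argument for the extended batch line can be sketched as follows. The second number (the header length) is this article's proposal, not any existing cunbatch format; old software that only looks at the first argument keeps working, since it never reads the extra field.

```shell
#!/bin/sh
# Hypothetical parse of the proposed two-argument batch line.
line='#! cunbatch 2048 512'

# Split on whitespace: $1='#!' $2='cunbatch' $3='2048' $4='512'.
set -- $line
batchlen=$3   # existing argument: unchanged, so old rnews still works
hdrlen=$4     # proposed argument: byte length of the headers

# New software would run the first $hdrlen bytes through compress to
# recover the headers, then file the rest away still compressed.
echo "batch=$batchlen headers=$hdrlen"
# prints: batch=2048 headers=512
```

An old receiver that does `set -- $line; len=$3` parses the same line identically, which is the whole point of appending rather than reordering the arguments.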
If you reply via mail, let me know whether I can quote you in the summary that I will post if I get lots of responses. Somehow I think it's more likely that this one will be either (a) shot down quickly by one of those 'good reasons' that are eluding me, or (b) hashed out severely in this and other newsgroups. Send mail if you want to, though -- I started it, so I'll take on the task of summarizing your comments.
-- 
Paul A. Vixie             {ptsfa, crash, winfree}!vixie!paul
329 Noe Street            dual!ptsfa!vixie!paul@ucbvax.Berkeley.EDU
San Francisco CA 94116    paul@vixie.UUCP  (415) 864-7013