steve@gapos.bt.co.uk (Steve Rooke) (02/05/91)
I have two sites, one sending a file, the other receiving. There is a lot of corruption at the receiving end caused by line noise. I am not able to use any standard form of error correction on the line but I can request retransmission of the file, as many times as needed, over another link.

I need to be able to compare these files and reconstruct the original with reasonable confidence. By that I mean that if two, or more, files have the same text at a certain point then I am reasonably confident that the text is OK.

I originally thought of basing this on diff by trying to understand its output and collecting up the lines that are not different in each file, but the error rate may be high enough to cause some corruption on each line. I expect the routine would have to look for substrings and be able to re-sync when characters are lost/gained.

Has anyone tried to do this sort of thing before and how did you do it, please? Solutions based upon diff would do, I guess, for a majority of the time.

Thanks a lot,
Steve
--
Steve Rooke  steve@gapos.bt.co.uk (...mcsun!ukc!gapos!steve)  UK + 394 693595
BT, CSD/AS, Area 106, Anzani House,  | "You roll the dice with your heart
Trinity Ave, FELIXSTOWE, Suffolk, UK |  and soul, But some times you
#include <std/disclaimer>            |  just don't know." - Sam Brown
hunt@dg-rtp.rtp.dg.com (Greg Hunt) (02/06/91)
In article <steve.665748651@paddy>, steve@gapos.bt.co.uk (Steve Rooke) writes:
> I have two sites, one sending a file, the other receiving. There is a lot
> of corruption at the receiving end caused by line noise. I am not able to
> use any standard form of error correction on the line but I can request
> retransmission of the file, as many times as needed, over another link.
>
> I need to be able to compare these files and reconstruct the original with
> reasonable confidence. By that I mean that if two, or more, files have the
> same text at a certain point then I am reasonably confident that the text
> is OK.
>
> Has anyone tried to do this sort of thing before and how did you do it,
> please? Solutions based upon diff would do, I guess, for a majority of
> the time.

When I've had file transmission problems, I've used sum(1) to produce a checksum of the file on both the sending side machine and the receiving side machine and compared the results. If they weren't the same, then I knew that something got corrupted in the transmission and I got the file again.

If the systems you're working with have sum(1), that might be an easy thing to use. Also, sum(1) will work for any sort of file; it doesn't just have to be text (which is the only thing diff(1) can look at).

--
Greg Hunt                        Internet: hunt@dg-rtp.rtp.dg.com
DG/UX Kernel Development         UUCP: {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC, USA  These opinions are mine, not DG's.
meissner@osf.org (Michael Meissner) (02/07/91)
In article <1991Feb6.142829.20725@dg-rtp.dg.com> hunt@dg-rtp.rtp.dg.com (Greg Hunt) writes:
| When I've had file transmission problems, I've used sum(1) to produce
| a checksum of the file on both the sending side machine and the
| receiving side machine and compared the results. If they weren't the
| same, then I knew that something got corrupted in the transmission and
| I got the file again.
|
| If the systems you're working with have sum(1) that might be an easy
| thing to use. Also, sum(1) will work for any sort of file, it doesn't
| just have to be text (which is the only thing diff(1) can look at).

The only hitch is that sum(1) produces different results on System V based systems and Berkeley based systems. I think sum -r on System V gives the BSD behavior.

--
Michael Meissner  email: meissner@osf.org  phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?
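To make the System V versus Berkeley difference concrete, here is a minimal Python sketch of the 16-bit rotating checksum usually attributed to BSD sum (and to `sum -r`). The function name `bsd_sum` and the 1024-byte block size are assumptions for illustration; check the result against the sum(1) on your own machines before relying on it.

```python
def bsd_sum(data: bytes):
    """Return (checksum, blocks) in the style of BSD sum.

    Assumption: 16-bit checksum that is rotated right by one bit before
    each byte is added, and a block count in 1024-byte units.
    """
    checksum = 0
    for byte in data:
        # Rotate the 16-bit value right by one bit.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        # Add the next byte and keep only the low 16 bits.
        checksum = (checksum + byte) & 0xFFFF
    blocks = (len(data) + 1023) // 1024
    return checksum, blocks


if __name__ == "__main__":
    print(bsd_sum(b"ab"))
```

Because the rotation makes the result depend on byte order, this gives a different number from a plain sum of bytes, which is why the two flavors of sum(1) disagree on the same file.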
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (02/07/91)
In article <MEISSNER.91Feb6163623@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
: In article <1991Feb6.142829.20725@dg-rtp.dg.com>
: hunt@dg-rtp.rtp.dg.com (Greg Hunt) writes:
:
: | When I've had file transmission problems, I've used sum(1) to produce
: | a checksum of the file on both the sending side machine and the
: | receiving side machine and compared the results. If they weren't the
: | same, then I knew that something got corrupted in the transmission and
: | I got the file again.
: |
: | If the systems you're working with have sum(1) that might be an easy
: | thing to use. Also, sum(1) will work for any sort of file, it doesn't
: | just have to be text (which is the only thing diff(1) can look at).
:
: The only hitch is that sum(1) produces different results on System V
: based systems and Berkeley based systems. I think sum -r on System V
: gives the BSD behavior.

Since this is cross-posted to comp.lang.perl, I suppose it's okay for me to mention that you can emulate System V sum with

    #!/usr/bin/perl
    undef $/;
    while (<>) {
        print unpack("%32C*", $_) % 65535, " ", int((length()+511)/512), " $ARGV\n";
    }

The Book, by the way, is wrong when it says you can emulate sum with "%16C*". That is only guaranteed to work on files less than 256 bytes long (512 if there are no eighth bits). Teach me to choose my test cases better...

No, I didn't have any sources to consult. The man page says sum does a 16-bit checksum, and it lies. It does modulo 65535 (not 65536). Ah well.

The above code will only work on files up to 2**24 bytes long or so.

Some machines may need to change the "%32C*" to "%31C*" until 4.0 comes out, since some machines think that 1 << 32 == 1, GRRR! I won't mention any names, because I don't want to get sun4's into trouble... :-)

Larry
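For readers without Perl, the same calculation can be sketched in Python. This only mirrors what the Perl code in this post computes (byte sum modulo 65535, size in 512-byte blocks); the function name `sysv_sum` is made up for illustration, and the result has not been checked against any particular System V sum(1).

```python
def sysv_sum(data: bytes):
    """Return (checksum, blocks) the way the Perl snippet computes them.

    Assumption: the checksum is the sum of all bytes, held in 32 bits
    (as unpack("%32C*") does), then reduced modulo 65535.
    """
    total = sum(data) & 0xFFFFFFFF      # 32-bit byte sum, like "%32C*"
    checksum = total % 65535            # modulo 65535, not 65536
    blocks = (len(data) + 511) // 512   # size rounded up to 512-byte blocks
    return checksum, blocks


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            checksum, blocks = sysv_sum(handle.read())
        print(checksum, blocks, name)
```

The 32-bit mask is why the approach is limited to files of roughly 2**24 bytes: beyond that a sum of bytes can exceed 32 bits and wrap.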
raja@bombay.cps.msu.edu (Narayan S. Raja) (02/07/91)
In article <1991Feb6.142829.20725@dg-rtp>, (Greg Hunt) writes:
< When I've had file transmission problems, I've used sum(1) to produce
< a checksum of the file on both the sending side machine and the
< receiving side machine and compared the results. If they weren't the
< same, then I knew that something got corrupted in the transmission and
< I got the file again.
I've also used sum for the same purpose.
However, according to the man page, sum
may give different checksums if sizeof(int)
is different on the two machines.
Narayan Sriranga Raja.
les@chinet.chi.il.us (Leslie Mikesell) (02/08/91)
In article <steve.665748651@paddy> steve@gapos.bt.co.uk (Steve Rooke) writes:
>I have two sites, one sending a file, the other receiving. There is a lot
>of corruption at the receiving end caused by line noise. I am not able to
>use any standard form of error correction on the line but I can request
>retransmission of the file, as many times as needed, over another link.

Unless one or both of the sites are truly arcane or mismanaged, you should be able to get a version of kermit working to provide error correction during the transfer. If you have a problem with getting appropriate permissions for outbound calls with kermit, you might consider using a PC as an intermediate, placing the calls into both machines. Both the unix and PC versions can be script-driven so you can probably make it run unattended.

Les Mikesell
les@chinet.chi.il.us
worley@compass.com (Dale Worley) (02/08/91)
> When I've had file transmission problems, I've used sum(1) to produce
> a checksum of the file on both the sending side machine and the
> receiving side machine and compared the results. If they weren't the
> same, then I knew that something got corrupted in the transmission and
> I got the file again.

But you're forgetting that in this application there are so many errors that one cannot expect that more than a few lines get through without error. The probability that the entire file gets through without error is infinitesimal, and waiting for it to happen twice would take forever.

Here's an idea: Break up lines into, say, ten-character lines. (In effect, you are using newlines in the file to resynchronize the line-breaking algorithm.) The line length should be chosen so that at least 3/4 of the created lines have no errors in them. Then apply GNU diff or diff3 (for speed) to the resulting files. Since most of the ten-character lines get through uncorrupted, diff should be able to discern how the two files correspond. Then you can integrate the output of one or more diffs to reconstruct the file.

Dale Worley
Compass, Inc.
worley@compass.com
--
PHOTOVOLTAICS: safe and clean (but not cheap) electricity from the SUN.
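The chunk-and-diff idea can be tried out without diff itself. The following Python sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in for GNU diff, which is a substitution and not what the post proposes to run; the helper names `chunk` and `agreed_chunks` and the sample text are made up for illustration.

```python
from difflib import SequenceMatcher


def chunk(text, width=10):
    """Split text into pieces of at most `width` characters.

    Chunking restarts after every newline, so a lost or gained character
    only disturbs the chunks on its own line.
    """
    pieces = []
    for line in text.splitlines(keepends=True):
        for start in range(0, len(line), width):
            pieces.append(line[start:start + width])
    return pieces


def agreed_chunks(copy_a, copy_b, width=10):
    """Return the chunks of copy_a that also appear, in order, in copy_b."""
    a = chunk(copy_a, width)
    b = chunk(copy_b, width)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    agreed = []
    for block in matcher.get_matching_blocks():
        agreed.extend(a[block.a:block.a + block.size])
    return agreed


if __name__ == "__main__":
    first = "the quick brown fox\njumps over\n"
    second = "the quick brXwn fox\njumps over\n"
    print(agreed_chunks(first, second))
```

In the sample, only the chunk containing the corrupted character drops out; the chunks on either side still match, so the two copies stay aligned. A third copy would be needed to decide what the missing chunk should be.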
guy@auspex.auspex.com (Guy Harris) (02/08/91)
>Some machines may need to change the "%32C*" to "%31C*" until 4.0 comes
>out, since some machines think that 1 << 32 == 1, GRRR! I won't mention
>any names, because I don't want to get sun4's into trouble... :-)

Or PCs and clones or other 386-based machines, or 3B{2,5,15}s, or perhaps DECstations, or MIPS boxes, or MIPS-based SGI boxes, or....

SPARC is hardly unique in that regard; maybe MIPS's compiler generates instructions to compensate for the fact that the shift count is taken modulo 32, but other compilers don't.
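The "1 << 32 == 1" effect can be imitated in Python, whose integers do not have a fixed width. This is only a simulation of a 32-bit shifter that looks at the low five bits of the shift count; the function names are hypothetical and nothing here measures what any real CPU or compiler does.

```python
MASK32 = 0xFFFFFFFF


def shl32_masked(value, count):
    """Left shift on a machine that takes the shift count modulo 32."""
    return (value << (count & 31)) & MASK32


def shl32_ideal(value, count):
    """Left shift where bits pushed past bit 31 are simply lost."""
    return (value << count) & MASK32


if __name__ == "__main__":
    print(shl32_masked(1, 32))  # count 32 is treated as 0, so the result is 1
    print(shl32_ideal(1, 32))   # the bit falls off the top, so the result is 0
```

With a count of 31 the two versions agree, which is consistent with the "%31C*" workaround mentioned in the quoted text.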
steve@gapos.bt.co.uk (Steve Rooke) (02/08/91)
In article <steve.665748651@paddy>, steve@gapos.bt.co.uk (Steve Rooke) writes:
> I have two sites, one sending a file, the other receiving. There is a lot
> of corruption at the receiving end caused by line noise. I am not able to
> use any standard form of error correction on the line but I can request
> retransmission of the file, as many times as needed, over another link.
>
> I need to be able to compare these files and reconstruct the original with
> reasonable confidence. By that I mean that if two, or more, files have the
> same text at a certain point then I am reasonably confident that the text
> is OK.
>
> Has anyone tried to do this sort of thing before and how did you do it,
> please? Solutions based upon diff would do, I guess, for a majority of
> the time.

Thanks for all your initial replies about using file xfer protocols (kermit and such), checksumming the file at each end, and splitting the file into short line lengths to enable diff to at least produce some matches.

I guess I should have expanded on the real problem. As I stated, I cannot use any standard xfer protocols, which is due to the sending and receiving equipment being dumb (ie LIKE telex [no flames please!]). The file is then passed on to a U*IX system where it is checked and acted upon. The feedback path, for error retransmission, is by voice as, up to now, a human has checked the file and pieced the contents together from a number of transmissions.

As you can see, I have no way of running kermit (or the like) or checksums, or of altering the standard xmission of the file, but I can request a resending as many times as necessary. The only way I can see of rebuilding the correct contents is to compare sets of files and select matching sub-strings in them.

If you have any further ideas or some code fragments then please let me know.

Thanks again,
Steve
--
Steve Rooke  steve@gapos.bt.co.uk (...mcsun!ukc!gapos!steve)  UK + 394 693595
BT, CSD/AS, Area 106, Anzani House,  | "You roll the dice with your heart
Trinity Ave, FELIXSTOWE, Suffolk, UK |  and soul, But some times you
#include <std/disclaimer>            |  just don't know." - Sam Brown
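One way to experiment with "select matching sub-strings" is sketched below in Python. It takes one received copy as the reference and marks a character as confirmed only if it lies in a run of at least `min_run` characters that some other copy also contains in the same order. The function names, the `min_run` threshold, and the "?" marker are all assumptions for illustration; `difflib.SequenceMatcher` is a general-purpose matcher from the standard library, not something anyone in this thread proposed.

```python
from difflib import SequenceMatcher


def confirmed_mask(reference, others, min_run=4):
    """Return one boolean per character of `reference`.

    A character is True when it sits inside a matching run of at least
    `min_run` characters shared with at least one of the other copies.
    """
    mask = [False] * len(reference)
    for other in others:
        matcher = SequenceMatcher(None, reference, other, autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size >= min_run:
                for index in range(block.a, block.a + block.size):
                    mask[index] = True
    return mask


def flag_unconfirmed(reference, others, min_run=4, marker="?"):
    """Return `reference` with every unconfirmed character replaced by `marker`."""
    mask = confirmed_mask(reference, others, min_run)
    return "".join(ch if ok else marker for ch, ok in zip(reference, mask))


if __name__ == "__main__":
    print(flag_unconfirmed("SEND 1X0 UNITS", ["SEND 100 UNITS"]))
```

Because the matcher realigns after each mismatch, a lost or gained character costs only the short run around it. Short runs are deliberately left unconfirmed, so the output errs toward asking for another retransmission rather than accepting a coincidental match.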
louie@sayshell.umd.edu (Louis A. Mamakos) (02/11/91)
This is (probably) not a perl solution, but you might investigate using some sort of forward error correcting code, since the "cost" of retransmitting parts of the data is "high."

louie
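As a rough illustration of the forward error correction idea, the simplest such code is a repetition code: send the message several times and take a majority vote at each character position. The Python sketch below assumes every copy has the same length, that is, characters are corrupted but never lost or gained, which the original post says cannot be relied on here. The function name `majority_decode` and the sample messages are made up.

```python
from collections import Counter


def majority_decode(copies, unknown="?"):
    """Decode a repetition code by voting on each character position.

    Positions without a strict majority are returned as `unknown`.
    Raises ValueError unless all copies have the same length.
    """
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("copies must all have the same length")
    decoded = []
    for column in zip(*copies):
        char, count = Counter(column).most_common(1)[0]
        decoded.append(char if count > len(copies) / 2 else unknown)
    return "".join(decoded)


if __name__ == "__main__":
    print(majority_decode(["PAY 500", "PAY 5X0", "PQY 500"]))
```

Real forward error correcting codes add far less redundancy than sending everything three times, but they have to be applied by the sender before transmission.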