5132ts2@hound.UUCP (T.SCHONFELD) (12/18/85)
I am sure you have seen this before, but if anyone has info describing the Xmodem protocol, I would greatly appreciate it if you could send me the info. ihnp4!hound!5132ts2
reintom@rocky2.UUCP (Tom Reingold) (12/21/85)
>I am sure you have seen this before, but if anyone has info describing
>the Xmodem protocol, I would greatly appreciate it if you could send
>me the info.
> ihnp4!hound!5132ts2
>

Here it is. I got it from a bulletin board. I don't remember which one.

Tom Reingold
New York City

--------------------------------------------------------------

XMODEM File Transfer Protocol
By Larry Jordan

When transferring files between computers using the telephone system, there is always the chance that electrical noise will result in data transmission errors. To ensure proper transfer of files it is necessary to detect data transmission errors and to retransmit data that contains errors. Most people think that asynchronous parity error detection provides that capability. It does not. Parity error detection does tell you when a data transfer error has occurred, but it is up to you to retransmit the data to correct errors. The problem is that parity error detection is not actually performed by most IBM PC communication packages. If a package does perform the error detection, it may not inform you of errors in such a way that you know to immediately retransmit the data. ASCOM, for example, places an asterisk in a file where parity errors are detected, but you may not realize the errors occurred until long after the file is transferred.

To ensure "error-free" data transfer you need a protocol file transfer technique. Andrew Fluegelman has added such a technique to PC-TALK.III called the XMODEM protocol. A protocol is a set of rules and conventions that apply to a specific area of communications that allow participants to properly communicate regardless of the hardware brand or software package being used. The protocol file transfer is a set of rules for transferring files which specifies a set of ASCII handshaking characters and the sequence of handshaking required to perform certain file transfer functions.
Protocol handshaking signals allow communication software to transfer text, data and machine code files, and to perform sophisticated error-checking. The handicap in using protocol file transfer techniques is that the computers on both ends of the communications link must be using compatible software; there is no standard that controls these protocols and almost all communication packages that have a protocol file transfer option use a protocol unique to that package. This means that a business or group of people must standardize its microcomputer communications software to take advantage of protocol transfers.

The Ward Christensen XMODEM protocol is one specific file transfer protocol that may become a default standard in personal communications because of its widespread use on bulletin boards and because of its inclusion in low cost personal computer communication packages such as PC-TALK. It has not gained widespread acceptance in business communication packages partly because the protocol is public domain; most business communication package designers use unique protocols to force businesses to use their software on both ends of communication links.

By providing you with this insight into protocol transfer and explaining in detail the operation of the XMODEM protocol, I hope to add momentum to the development of a "standard protocol" whether it be the XMODEM model or some other model. Users of communication software deserve a standard protocol that will allow them to use the technique with any microcomputer regardless of the software packages employed.

The XMODEM protocol is illustrated in Figure 1. As you can see from that figure, XMODEM does not begin the transfer of data until the receiving computer signals the transmitting computer that it is ready to receive data. The Negative Acknowledge (NAK) character is used for this signal and is sent to the transmitting computer every 10 seconds until the file transfer begins.
If the file transfer does not begin after 9 NAKs are sent, the process has to be manually restarted.

After a NAK is received, the transmitting computer uses a Start of Header (SOH) character and two block numbers (a true block number followed by a 1's complement of the number) to signal the start of a 128-byte block of data to be transferred, then sends the block followed by an error-checking checksum. The checksum is calculated by adding the ASCII values of each character in the 128-character block; the sum is then divided by 256 and the remainder is retained as the checksum. After each block of data is transferred, the receiving computer computes its own checksum and compares the result to the checksum received from the transmitting computer. If the two values are the same, the receiving computer sends an Acknowledge (ACK) character to tell the transmitter to send the next sequential block. If the two values are not the same, the receiving computer sends the transmitter a NAK to request a retransmission of the last block. This retransmission process is repeated until the block of data is properly received or until 9 attempts have been made to transmit the block. If the communications link is noisy, resulting in improper block transmission after 9 attempts, the file transfer is aborted.

XMODEM uses two block numbers at the start of each block to be sure the same block is not transmitted twice because of a handshake character loss during the transfer. The receiving computer checks the transmitted block to be sure that it is the one requested, and blocks that are retransmitted by mistake are thrown away. When all data has been successfully transmitted, the transmitting computer sends the receiver an End of Transmission (EOT) character to indicate the end of file.

The XMODEM protocol offers the IBM PC several advantages over other protocols and file transfer methods.
First, the protocol is in the public domain, which makes it readily available for software designers to incorporate into a communications package. Second, the protocol is easy to implement using high level languages such as BASIC or Pascal. Third, the protocol only requires a 256-byte communication receive buffer, which makes it attractive for IBM PC owners who only have 64K systems. Fourth, the protocol allows a user to transfer non-ASCII 8-bit data files (e.g., COM, EXE and tokenized BASIC) between microcomputers because it calculates the end of a file based on file size and uses handshake signals to indicate the end of a file instead of relying on an end of file marker character (control-Z) to terminate a file transfer. Fifth, XMODEM error-checking is superior to normal asynchronous parity error checking. The parity method of error-checking is 95% effective if the software on the receiving end checks for parity errors. XMODEM error-checking is 99.6% effective, and the software on the receiving end must check for errors. Parity errors detected also do not result in automatic retransmission of the bad data; XMODEM detected errors result in data retransmission until no errors are detected or until 9 retransmissions have been attempted. Finally, the protocol is used by many CP/M bulletin boards, and having the protocol in a communications package allows the IBM PC user to receive error-checked files from these bulletin boards.

Andrew Fluegelman has given the XMODEM protocol a real boost in the IBM PC world by including it in his package. He has also added significant power to the package by including the protocol. Rumor has it that Don Withrow will soon add to the XMODEM momentum by adding it to his HOSTCOMM software package. Keep up the good work guys -- we will get a standard one way or the other!

[This article was derived from material contained in a book written by Larry Jordan and Bruce Churchill to be published this Summer by The Brady Company.
The article will also be in the 5th issue of PC World magazine.]

          XMODEM Protocol File Transfer

Receiving                             Transmitting
Computer                                  Computer

Ready to                                  Ready to
Receive                                   Transmit
|                                                |
|---------------------\NAK\--------------------->|
|                                                |
|<------/SOH/Blk #1/Blk #1/Good Data/CkSum/------|
|                                                |
|---------------------\ACK\--------------------->|
|                                                |
|<------/SOH/Blk #2/Blk #2/Good Data/CkSum/------|
|                                                |
|---------------------\ACK\--------------------->|
|                                                |
|<------/SOH/Blk #3/Blk #3/Garbled Data/CkSum/---|
|                                                |
|---------------------\NAK\--------------------->|
|                                                |
|<------/SOH/Blk #3/Blk #3/Good Data/CkSum/------|
|                                                |
|---------------------\ACK\--------------------->|
|                                                |
|<--------------------/EOT/----------------------|
|                                                |
|---------------------\ACK\--------------------->|
|                                                |
V                                                V
File                                          File
Receipt                                   Transmit
Ends                                          Ends

                     Figure 1

-------------------------------------------------------------------

MODEM PROTOCOL OVERVIEW

1/1/82 by Ward Christensen. I will maintain a master copy of this. Please pass on changes or suggestions via CBBS/Chicago at (312) 545-8086, or by voice at (312) 849-6279.

NOTE this does not include things which I am not familiar with, such as the CRC option implemented by John Mahr.

Last Rev: (none)

At the request of Rick Mallinak on behalf of the guys at Standard Oil with IBM P.C.s, as well as several previous requests, I finally decided to put my modem protocol into writing. It had been previously formally published only in the AMRAD newsletter.

Table of Contents
1. DEFINITIONS
2. TRANSMISSION MEDIUM LEVEL PROTOCOL
3. MESSAGE BLOCK LEVEL PROTOCOL
4. FILE LEVEL PROTOCOL
5. DATA FLOW EXAMPLE INCLUDING ERROR RECOVERY
6. PROGRAMMING TIPS.

-------- 1. DEFINITIONS.

<soh>  01H
<eot>  04H
<ack>  06H
<nak>  15H
<can>  18H

-------- 2. TRANSMISSION MEDIUM LEVEL PROTOCOL

Asynchronous, 8 data bits, no parity, one stop bit.

The protocol imposes no restrictions on the contents of the data being transmitted. No control characters are looked for in the 128-byte data messages.
Absolutely any kind of data may be sent - binary, ASCII, etc. The protocol has not formally been adapted to a 7-bit environment for the transmission of ASCII-only (or unpacked-hex) data, although it could be simply by having both ends agree to AND the protocol-dependent data with 7F hex before validating it. I specifically am referring to the checksum, and the block numbers and their ones-complement.

Those wishing to maintain compatibility of the CP/M file structure, i.e. to allow modemming ASCII files to or from CP/M systems, should follow this data format:

* ASCII tabs used (09H); tabs set every 8.
* Lines terminated by CR/LF (0DH 0AH)
* End-of-file indicated by ^Z, 1AH. (one or more)
* Data is variable length, i.e. should be considered a continuous stream of data bytes, broken into 128-byte chunks purely for the purpose of transmission.
* A CP/M "peculiarity": If the data ends exactly on a 128-byte boundary, i.e. CR in 127, and LF in 128, a subsequent sector containing the ^Z EOF character(s) is optional, but is preferred. Some utilities or user programs still do not handle EOF without ^Zs.
* The last block sent is no different from others, i.e. there is no "short block".

-------- 3. MESSAGE BLOCK LEVEL PROTOCOL

Each block of the transfer looks like:

<SOH><blk #><255-blk #><--128 data bytes--><cksum>

in which:

<SOH>       = 01 hex
<blk #>     = binary number, starts at 01, increments by 1, and
              wraps 0FFH to 00H (not to 01)
<255-blk #> = blk # after going thru 8080 "CMA" instr, i.e.
              each bit complemented in the 8-bit block number.
              Formally, this is the "ones complement".
<cksum>     = the sum of the data bytes only. Toss any carry.

-------- 4. FILE LEVEL PROTOCOL

---- 4A. COMMON TO BOTH SENDER AND RECEIVER:

All errors are retried 10 times. For versions running with an operator (i.e. NOT with XMODEM), a message is typed after 10 errors asking the operator whether to "retry or quit".

Some versions of the protocol use <can>, ASCII ^X, to cancel transmission.
This was never adopted as a standard, as having a single "abort" character makes the transmission susceptible to false termination due to an <ack> <nak> or <soh> being corrupted into a <can> and canceling transmission.

The protocol may be considered "receiver driven", that is, the sender need not automatically re-transmit, although it does in the current implementations.

---- 4B. RECEIVE PROGRAM CONSIDERATIONS:

The receiver has a 10-second timeout. It sends a <nak> every time it times out. The receiver's first timeout, which sends a <nak>, signals the transmitter to start. Optionally, the receiver could send a <nak> immediately, in case the sender was ready. This would save the initial 10 second timeout. However, the receiver MUST continue to timeout every 10 seconds in case the sender wasn't ready.

Once into receiving a block, the receiver goes into a one-second timeout for each character and the checksum. If the receiver wishes to <nak> a block for any reason (invalid header, timeout receiving data), it must wait for the line to clear. See "programming tips" for ideas.

Synchronizing: If a valid block number is received, it will be:

1) the expected one, in which case everything is fine; or
2) a repeat of the previously received block. This should be considered OK, and only indicates that the receiver's <ack> got glitched, and the sender re-transmitted;
3) any other block number indicates a fatal loss of synchronization, such as the rare case of the sender getting a line-glitch that looked like an <ack>. Abort the transmission, sending a <can>.

---- 4C. SENDING PROGRAM CONSIDERATIONS.

While waiting for transmission to begin, the sender has only a single very long timeout, say one minute. In the current protocol, the sender has a 10 second timeout before retrying. I suggest NOT doing this, and letting the protocol be completely receiver-driven. This will be compatible with existing programs.
When the sender has no more data, it sends an <eot>, and awaits an <ack>, resending the <eot> if it doesn't get one. Again, the protocol could be receiver-driven, with the sender only having the high-level 1-minute timeout to abort.

-------- 5. DATA FLOW EXAMPLE INCLUDING ERROR RECOVERY

Here is a sample of the data flow, sending a 3-block message. It includes the two most common line hits - a garbaged block, and an <ack> reply getting garbaged. <xx> represents the checksum byte.

SENDER                                  RECEIVER
                                times out after 10 seconds,
                        <---            <nak>
<soh> 01 FE -data- <xx> --->
                        <---            <ack>
<soh> 02 FD -data- xx   --->    (data gets line hit)
                        <---            <nak>
<soh> 02 FD -data- xx   --->
                        <---            <ack>
<soh> 03 FC -data- xx   --->
   (ack gets garbaged)  <---            <ack>
<soh> 03 FC -data- xx   --->            <ack>
<eot>                   --->
                        <---            <ack>

-------- 6. PROGRAMMING TIPS.

* The character-receive subroutine should be called with a parameter specifying the number of seconds to wait. The receiver should first call it with a time of 10, then <nak> and try again, 10 times.

  After receiving the <soh>, the receiver should call the character receive subroutine with a 1-second timeout, for the remainder of the message and the <cksum>. Since they are sent as a continuous stream, timing out of this implies a serious line glitch that caused, say, 127 characters to be seen instead of 128.

* When the receiver wishes to <nak>, it should call a "PURGE" subroutine, to wait for the line to clear. Recall the sender tosses any characters in its UART buffer immediately upon completing sending a block, to ensure no glitches were misinterpreted.

  The most common technique is for "PURGE" to call the character receive subroutine, specifying a 1-second timeout, and looping back to PURGE until a timeout occurs. The <nak> is then sent, ensuring the other end will see it.

* You may wish to add code recommended by John Mahr to your character receive routine - to set an error flag if the UART shows framing error, or overrun.
This will help catch a few more glitches - the most common of which is a hit in the high bits of the byte in two consecutive bytes. The <cksum> comes out OK since counting in 1-byte produces the same result of adding 80H + 80H as with adding 00H + 00H.
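
The block layout from section 3 and the checksum weakness described in the last tip can be sketched in a few lines of Python. This is only an illustration, not code from any MODEM/XMODEM program: the names `checksum`, `build_block` and `block_is_valid` are hypothetical, and the test data is arbitrary.

```python
SOH = 0x01          # start-of-header byte from section 1
BLOCK_SIZE = 128    # data bytes per block

def checksum(data: bytes) -> int:
    """Sum of the data bytes only, with any carry tossed (modulo 256)."""
    return sum(data) & 0xFF

def build_block(blk: int, data: bytes) -> bytes:
    """Frame one block as <SOH><blk #><255-blk #><128 data bytes><cksum>."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly 128 bytes")
    blk &= 0xFF     # block numbers wrap 0FFH -> 00H
    return bytes([SOH, blk, 0xFF - blk]) + data + bytes([checksum(data)])

def block_is_valid(block: bytes) -> bool:
    """Check the header, the block number against its complement, and the checksum."""
    if len(block) != BLOCK_SIZE + 4 or block[0] != SOH:
        return False
    if block[1] + block[2] != 0xFF:
        return False
    return checksum(block[3:-1]) == block[-1]

data = bytes(range(128))            # arbitrary test data, all high bits clear
block = build_block(1, data)

# A line hit in the high bit of two consecutive data bytes adds
# 80H + 80H = 100H to the sum, which vanishes when the carry is tossed.
damaged = bytearray(block)
damaged[3] ^= 0x80
damaged[4] ^= 0x80

print(block[:3].hex())                   # 0101fe
print(block_is_valid(block))             # True
print(block_is_valid(bytes(damaged)))    # True, although the data differs
```

A hit in the high bit of a single byte changes the sum by 80H and is caught; only the paired hit slips through, which is why checking the UART's framing and overrun flags is worth the extra code.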
SECRIST%OAK.SAINET.MFENET@lll-mfe.arpa (12/23/85)
Date: Mon, 23-DEC-1985 08:11 EST
To: INFO-MICRO@BRL.Arpa
Message-ID: <[OAK.SAINET.MFENET].B244CBA0.008E7ED5.SECRIST>
Organization: Science Applications Int'l. Corp., Oak Ridge, Tenn.
Geographic-Location: 36 01' 42" N, 84 14' 14" W
CompuServe-ID: [71636,52]
X-VMS-Mail-To: ARPA%"INFO-MICRO@BRL.Arpa"

+--------------------------------------------+
|   The Christensen File Transfer Protocol   |
+--------------------------------------------+

by Richard C. Secrist
Science Applications International Corp.
Oak Ridge, Tenn.
SECRIST%OAK.SAInet.MFEnet@LLL-MFE.Arpa

After CP/M had made floppy-based computer systems a reality in the late 1970s, public domain software written by computer enthusiasts was in great demand. In response to this demand, one hobbyist, Ward Christensen of Chicago, Illinois, developed a simple, error-checking communications protocol to reliably transfer files between different computers via modems. Today, this frame-oriented protocol is known by the names given to Ward's original public domain implementations: XMODEM or MODEM7, or simply as "the Christensen protocol".

When trying to pass data from one computer to another, talk is not usually cheap - particularly if one builds custom data transfer hardware for each computer device involved. For this reason, the connections between many computer devices such as modems and terminals were standardized some years ago. Most computers today are connected to terminals in an almost universal manner, in both hardware (the EIA RS-232 standard) and software (the ASCII character code). The XMODEM protocol takes advantage of the fact that most computers have been made to adhere to these standards. By tying two computers together through their RS-232 ports, and tricking each computer into believing that the other one is a terminal, two computers with cooperating programs can transfer data to each other by speaking in a communications protocol.

A communications protocol is needed for several reasons: 1) just because the computers are plugged together doesn't mean they know how to send or receive data to the other machine, 2) one computer may be able to send data faster than the other can take it in, and some kind of flow control must be employed to prevent data overruns, 3) the data can become garbled during transmission because of electrical interference or other causes, corrupting the information one desires to be transferred. To define how the machines will speak to each other and solve all of these problems is what a communications protocol is all about.

To prevent corruption of data and to synchronize communication, the cooperating computers mix this flow control and error-detecting information with the actual data one wishes to transfer. A data communications protocol is defined by the way this mix of data is formatted, and sent between computers. The XMODEM protocol formats its data into individual packets of information that contain synchronization information, 128 bytes of data, and a checksum for error detection. Since the information is tucked away into packets that are interpreted by the XMODEM software and not by the command level of the host operating system directly, it is possible to send even binary data without having the host operating system misinterpret the control characters that may be part of the data itself.

An XMODEM packet looks like this:

   1     2     3    4                                   131   132
+-----+-----+-----+-----------------------------------------+-----+
! SOH ! BLK ! BLC ! 128 bytes of 8 bit data padded with 0   ! CHK !
+-----+-----+-----+-----------------------------------------+-----+

Where:

SOH - Start of header, ASCII 1
BLK - Block number, 0 to 255 (the first block sent is number 1)
BLC - 1s complement of the block number
CHK - Checksum of the data field mod 256

The first character is an ASCII SOH (Start-Of-Header, ^A) which delimits the beginning of the packet. The second byte is a sequential block number from 0 to 255.
The block number is used to make sure the receiver gets everything in the right order. Since the block number could be corrupted by line noise, the third byte contains the one's complement of the block number. If the block number summed with its one's complement is not equal to 255 (FF hex), the XMODEM packet has been corrupted and needs to be re-transmitted. Bytes 4 through 131 are 128 bytes of file data. This number of bytes was chosen to coincide with a CP/M disk sector, which makes sense if you consider XMODEM's CP/M heritage. The final byte is a checksum of the 128 data bytes. If the transmitted checksum does not equal the checksum the receiver calculates from the data area, the packet is bad and would need to be re-sent.

The flow control mechanism serves to keep both computers synchronized, as well as provide a means to re-transmit bad packets. The flow control information is represented by certain ASCII characters:

o Control-F (ASCII ACK, $06) is an ACKnowledgment that the packet has been transferred without error
o Control-U (ASCII NAK, $15) is a Negative AcKnowledgement, meaning the packet was not received correctly or at all
o Control-D (ASCII EOT, $04) signifies end-of-transmission
o Control-X (ASCII CAN, $18) is a request to CANCEL the transfer
o Control-A (ASCII SOH, $01) signifies Start-Of-Header, the beginning of an XMODEM packet

A sample transaction between two computers provides the best illustration of the Christensen protocol in action:

Typically the XMODEM programs have a dumb terminal mode to call up the other machine and go through any required login sequence. After that, XMODEM is invoked on the other machine by the caller, and is told to send or receive a file. Then you return to your machine with some magic control sequence and issue a complementary send or receive command to your own machine. After that, the transfer is all up to the computers.

The first thing the machines have to do is get into synch.
The receiving computer looks for a packet for ten seconds. If it hasn't seen a packet after that time it "times out", sending a NAK to the sender, and then starts waiting again with a read posted. This NAK is the cue for the sending machine to start, and off goes the packet. If the receiver gets a good packet -- if the block number is okay and the data passes the checksum -- the receiver sends an ACK to the sender. The sender interprets this as "okay, he said he got that alright, so I'll ship him another one", and promptly transmits another packet. Of course if he got a NAK that means "say again, something isn't right", and the sender would obligingly re-transmit the same packet. This process is repeated until the end-of-file, at which point the sender transmits an EOT (end of transmission). After a final ACK from the receiver, the transfer is complete. At this point the XMODEM program usually returns to "dumb terminal" mode.

Trying to send the data over noisy lines of course complicates matters. For example, if the sender transmits a block and then misses the ACK from the receiver, the sender will time-out. Upon time-out, the sender will re-transmit the packet - even though the receiver has already got it. When the receiver checks the block number, it will know it already has got this packet, drop it on the floor in disgust, and ACK the sender, putting them back in synch. There are numerous different error cases one can experience. Every time one of them happens a counter gets bumped. If the sender misses the ACK over and over again the XMODEM protocol gives up after 10 tries. In fact, 10 is sort of the magic error number: 10 seconds between tries up to 10 tries. Many implementations also send an ASCII CAN to the sender if the receiver aborts.

Under scrutiny, this protocol has some holes in it. In practice, it is fairly reliable and effective.
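
The exchange described above (ACK to advance, NAK to repeat, a duplicate packet dropped but still ACKed, giving up after 10 tries) can be sketched as a small in-memory simulation in Python. This is an illustration only, not code from any XMODEM program: `Receiver`, `make_packet`, `send_file` and the `garble` and `lose_ack` parameters are hypothetical names, no real serial line or timer is involved (a lost ACK is modeled as a reply of `None`), and padding the last packet with ^Z is an assumption borrowed from the CP/M convention.

```python
SOH, ACK, NAK = 0x01, 0x06, 0x15
MAX_TRIES = 10      # the "magic error number"

def make_packet(blk, data):
    """SOH, block number, its complement, 128 data bytes, checksum."""
    return bytes([SOH, blk, 0xFF - blk]) + data + bytes([sum(data) & 0xFF])

class Receiver:
    def __init__(self):
        self.expected = 1           # first block is number 1
        self.received = bytearray()

    def handle(self, packet):
        """Return ACK or NAK for one incoming packet."""
        blk, comp = packet[1], packet[2]
        data, chk = packet[3:-1], packet[-1]
        if packet[0] != SOH or blk + comp != 0xFF or (sum(data) & 0xFF) != chk:
            return NAK              # corrupted: ask for it again
        if blk == self.expected:
            self.received += data
            self.expected = (self.expected + 1) & 0xFF
            return ACK
        if blk == (self.expected - 1) & 0xFF:
            return ACK              # duplicate: drop it, ACK to get back in synch
        raise RuntimeError("lost synchronization")

def send_file(data, receiver, garble=(), lose_ack=()):
    """Send data packet by packet; garble and lose_ack hold (block, attempt)
    pairs on which the simulated line damages the data or eats the reply."""
    data += b"\x1a" * (-len(data) % 128)    # pad last packet with ^Z (assumption)
    log = []
    for i in range(0, len(data), 128):
        blk = (i // 128 + 1) & 0xFF
        packet = make_packet(blk, data[i:i + 128])
        for attempt in range(1, MAX_TRIES + 1):
            wire = bytearray(packet)
            if (blk, attempt) in garble:
                wire[3] ^= 0x01             # line hit in the first data byte
            reply = receiver.handle(bytes(wire))
            if (blk, attempt) in lose_ack:
                reply = None                # sender never sees the reply
            log.append((blk, attempt, reply))
            if reply == ACK:
                break
        else:
            raise RuntimeError("giving up after 10 tries")
    return log

rx = Receiver()
log = send_file(b"x" * 300, rx, garble={(2, 1)}, lose_ack={(3, 1)})
for entry in log:
    print(entry)
```

In this run block 2 is garbled once and NAKed, and the ACK for block 3 is lost once, so the receiver sees block 3 twice but stores it only once.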

Over the years people have started doing strange things with the protocol and as it gets bent and twisted there are some logical consequences of all this.

First off, when transferring machine-specific files from one machine to another you have to think through what you're doing. Remember when transferring ASCII text data you are going to take all of the environmentally-dependent features of how the text is stored on the source system to the destination computer. This may require some neat hacking on your own part to set matters straight. For example, moving a text file from CP/M-80 to a VAX under VAX/VMS will make the file look rather scatterbrained when you get it there. Simply jumping into TECO and exiting back out will clean it up for the most part, except for the ^Z's all over the end, which can simply be deleted with your favorite editor.

Other things to watch for include XMODEM implementations that also offer CRC-support instead of data checksums (make sure you're both in the same mode before you start the transfer - if you "time out" 10 times on the first try, mismatched modes are frequently the cause), transferring BASIC files that may be stored in a tokenized format, or porting binaries over that include operating system specific calls to things like the CP/M BIOS.

Pseudo-code of the SEND algorithm (ref. 2)
------------------------------------------

    open the file to be sent;
    initialize the modem;
    while (there are still sectors to be sent) {
        repeat {
            send an SOH;
            send the sector number;
            send the complemented sector number;
            send the data and compute a checksum;
            send the checksum;
            wait for a response with timeout;
        } until (the response is an ACK);
    }
    send an EOT character;
    wait for an acknowledgement;
    close the file;

Pseudo-code of the RECEIVE algorithm (ref. 2)
---------------------------------------------

    create the new file in the directory;
    initialize the modem;
    repeat {
        wait for an initial SOH, EOT, or TIMEOUT;
        if (the character is an SOH) {
            get the sector number;
            get the complemented sector number;
            get the data and compute a checksum;
            get the checksum;
            if (checksum = computed checksum)
                send an ACK;
            else
                send a NAK;
        }
        if (the character is an EOT) {
            close the new file;
            send an ACK;
        }
    } until (the initial character was an EOT);

References:

1) Kermit Users' Guide, Third Edition; Catchings, et al.; Columbia University, 1983
2) Lmodem: A Small Remote-Communication Program; Clark, David D.; BYTE magazine, Nov. 1983
3) Chapter 16: Protocol Transfers; Blue, et al.; ASCII Express: The Professional, Instruction Manual; Southwestern Data Systems, 1982