mark@unisec.UUCP (03/30/87)
Here is the definition of the Punter C1 file transfer protocol, as written
by the man himself. You will note that there are a few garbled lines in this
document. They are due to the fact that this document was actually a series
of "bulletins" on Punter's main BBS. Since they were not available in the
download area, I had to use the "capture buffer" technique to collect them.
No essential information was lost.
-------------------------------------------------------------------------------
C1 Protocol
by
Steve Punter
The following document, describing the C1 (new Punter) protocol
was 'captured' from Steve Punter's BBS (PNET Node 1).
Inception
During the summer of 1981, when I first got the idea of
putting up a BBS, I started work on a simple protocol for
transferring programs to and from the BBS. This protocol was
similar in structure to XMODEM, and had about the same
reliability. Under good line conditions, it would give error free
transfers (this was to be expected). Under moderate noise
conditions, the protocol would hold up, and would still give
error free transmissions. It was under poor line conditions that
it, and XMODEM, would fall apart.
In the summer of 1984, I started work on a very ambitious
project; to produce a protocol that was both fast, and extremely
reliable, even under the worst of line conditions. From this work
came the "C1" protocol; not a simple block/checksum affair, but a
complete communication system for the computer.
Be warned, therefore, that understanding the ins and outs
of "C1" will not be easy, but with enough patience, there's no
reason why even the least skilled programmer cannot be
comfortable with it.
Concepts
The concept behind the "C1" protocol was simple; to allow
two computers to "talk" with one another (while transferring
data) in such a way that nothing short of a complete distortion
of the transmission line could result in a misunderstanding. If
this concept could be realized, then files could be transferred
between computers without fear of line noise causing a breakdown
in the protocol, or that the received data would differ, in any
way, from that which was sent.
Nothing is perfect though, and I don't, for a minute, claim
that "C1" is completely infallible, but I can say, with
reasonable comfort, that "C1" can deliver bad line accuracy not
found in any other microcomputer transfer protocols. For this
accuracy though, there is a price to pay, and it is complexity;
the protocol is extremely difficult to duplicate without a
complete and utter understanding of the intricate workings of
"C1". This document will attempt to give you that required
understanding.
A Simple Conversation
In first deciding how the protocol would function, I thought
of how two people could carry on a conversation under high noise
conditions, where misunderstanding would be the norm. The scenario
I'm going to give differs from the protocol in that the people
talking have no way of verifying the accuracy of what they
believe they have heard. What it is meant to demonstrate is how
the two computers "talk" with one another, and discuss the
necessary repetition, or non-repetition, of each block of data
(the cornerstone of a checksum based transfer protocol).
Ken and John are attempting to assemble a machine in the middle
of a very noisy machine shop. John reads the instructions to Ken,
who carries them out. Even at close proximity, the two have
difficulty hearing one another, so they adopt a form of banter
which allows each instruction to be verified and acknowledged.
Here is how the conversation might go:
John: Put part "A" in hole "D".
Ken:  Understood, putting part "A" in hole "D".
John: Acknowledged, let me know when you are ready for the next
      instruction.
Ken:  Go ahead, what do I do next?
John: Put screw "E" through slot "T".
Ken:  I didn't understand that, could you please repeat.
John: Oh, ok, tell me when you're ready for that instruction again.
Ken:  Ready now.
The conversation continues on in this fashion, guaranteeing
that both John and Ken are fully aware of what the other is
doing. In real life, [text garbled in capture] but
that's why they make more mistakes than a computer.
It is just this sort of "conversation" that the two
computers have between each other, only the language is
different; the instruction is replaced by the block of data, and
all other statements by special codes.
Communication Codes
One of the areas where simple protocols fall apart is in the
transmission of "handshaking codes". It's called handshaking
because it implies that the two computers are having a dialogue,
rather than a monologue. These other protocols rely on single
byte (8 bit) words for their communication codes, and that could
spell trouble, since the likelihood of any one 8 bit code being
transposed into another is greater than for multiple byte codes.
For this reason, "C1" uses 3 byte (24 bit) codes which are
sufficiently different that the likelihood of a transposition is
extremely low. Not only that, but as you will soon learn, the
method of receiving 3 byte codes is designed such that if there
is sufficient line noise to make the necessary transpositions,
there would most likely be extra characters sent; "C1" can avoid
this situation.
Five distinct codes are used in the protocol; "GOO", "BAD",
"ACK", "S/B", and "SYN". Each has its own meaning, just like any
English word, and all are used in a specific sequence such that
synchronization difficulties would be automatically identified
and corrected.
Checksums
When a block of data is sent, we must have a way of
determining if it is correctly received or not. This is accomplished
by using what is known as a checksum. Quite simply, a checksum is
a number which is mathematically derived from all the bytes
within the block. The receiving computer recalculates the sum and
compares it with the sum it received along with the block.
Theoretically, any fault in the transmitted data will result in
the two checksums not matching; but that's theory. In reality,
the accuracy of the checksum is based on the type of mathematical
operation used to calculate it, and what kind of noise it
encounters.
The simplest way to create a checksum is to add up all the
ASCII values of the bytes contained in the block. This is fine
for many types of errors, but not the type which inverts a
particular bit. Should two identical inversions occur on two
opposite bits, the sum will remain the same. For example, take
the following two bytes:
          11010011 = 211
     Plus 01101101 = 109
          --------   ---
                     320
Now assume that the fourth bit from the right of both of these
bytes becomes inverted by line noise:
          11011011 = 219
     Plus 01100101 = 101
          --------   ---
                     320
As you can see, the sum remains 320, even though line noise has
made obvious changes to the bytes. A better system is one called
"Cyclic Redundancy", which works on a somewhat different
principle. The checksum is 16 bits long, and is created in the
following fashion; each byte from the block is Exclusive OR'ed
with the low order part of the checksum. The checksum is then
ROTATED one bit to the left, and the procedure repeated with the
next byte.
Even this highly superior method can be tripped up, so I
have combined BOTH an additive checksum and Cyclic Redundancy
checksum to create one very hard to beat 32 bit "super" checksum.
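As a rough illustration of the two checksums and of the collision worked through above, here is a minimal Python sketch. It is not taken from the original bulletins: the function names are invented for this example, and the rotate is assumed to be a 16 bit circular rotate applied after each Exclusive OR, which is how the description reads.

```python
def additive_checksum(data: bytes) -> int:
    """Sum of all byte values, kept to 16 bits."""
    return sum(data) & 0xFFFF


def cyclic_checksum(data: bytes) -> int:
    """XOR each byte into the low byte, then rotate the 16 bit value left by one."""
    value = 0
    for byte in data:
        value ^= byte
        # Circular rotate: the bit shifted out of the top re-enters at the bottom.
        value = ((value << 1) | (value >> 15)) & 0xFFFF
    return value


original = bytes([0b11010011, 0b01101101])   # 211 and 109
corrupted = bytes([0b11011011, 0b01100101])  # 219 and 101, fourth bit flipped in each

# The additive sums collide; the XOR-and-rotate values do not.
print(additive_checksum(original), additive_checksum(corrupted))
print(hex(cyclic_checksum(original)), hex(cyclic_checksum(corrupted)))
```

Because the rotate moves each byte's contribution to a different bit position, the two opposite inversions no longer cancel.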
Listening For Code Words
Although 3 byte code words are more reliable than 1 byte
code words, nothing is perfect. It was once said that if you let
an infinite number of monkeys bash away at typewriters for an
infinite amount of time, one of them would eventually type "To be
or not to be, that is the question". Although this stretches
statistical probability to its limit, this kind of thing can
easily happen on a smaller scale; the letters "GOO" could quite
conceivably be produced by purely random line noise.
To try and eliminate ALL possible errors isn't feasible, but
"C1" makes an attempt at trying to eliminate as many as possible.
One reasonably probable fact is that any noise capable of
randomly producing "GOO", would not stop there; more likely, it
would produce a string of characters, something like "HGOOEK".
Were we to allow the protocol to listen exclusively for three
letter combinations, it would most assuredly pick out the "GOO"
in that string.
My specifications for "C1" call for a code recognition
routine which will ONLY make code word comparisons on the LAST 3
RECEIVED bytes. This is accomplished in my coding by going back
and testing for further characters after I have identified a
three byte code word. Should another byte be present, the
identified code word is thrown away, and the search will
continue.
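The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `received` stands for everything that arrived before the line went quiet, and the name `recognize` is made up for this example.

```python
from typing import Optional

CODE_WORDS = (b"GOO", b"BAD", b"ACK", b"S/B", b"SYN")


def recognize(received: bytes) -> Optional[bytes]:
    """Return a code word only when it is exactly the LAST three bytes received."""
    tail = received[-3:]
    return tail if tail in CODE_WORDS else None


print(recognize(b"GOO"))     # a clean code word is accepted
print(recognize(b"HGOOEK"))  # "GOO" followed by more noise is thrown away
print(recognize(b"GOE"))     # a damaged code word is not recognized either
```

Comparing only the tail has the same effect as identifying a code word and then discarding it when further bytes turn up.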
Statement and Listen Loops
One immediate drawback to the system described above is that
a REAL code word, masked within some random noise, would be
rejected by the receiving computer. This would also be true of a
code word simply damaged by noise (like "GOE"). For a protocol to
be impervious to this sort of corruption, it must be capable of
restating code words over and over until the receiving computer
can understand, yet it must also have a way of knowing whether
the receiving computer got the code word or not. This was a fact
that eluded me when I wrote the original protocol.
When we talk to other people, the cornerstone of
understanding is recognition. If we ask "What do you think?", yet
get no reply, we ask again. Only when we receive a reply from the
person to whom we are talking do we continue on with our next
statement. It would be pointless wasting our breath on someone
who isn't listening.
Within "C1", communication between computers is handled
through a similar system which I call the "Statement and Listen
Loop". It's quite simple really; when one computer has to "say"
something to the other, it does so, then waits for a
predetermined time for a known response. Should it fail to
receive a response within that period of time, the code word is
said again, and the computer listens for the reply. This
continues until the required response is heard. The system is
further enhanced by the fact that both computers are ALWAYS
engaged in a "Statement and Listen Loop".
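A "Statement and Listen Loop" might be sketched as follows. The `send` and `listen` callables are hypothetical stand-ins for whatever modem routines an implementation has (here `listen` is assumed to return a recognized code word, or `None` if the wait ran out), and the loop simply stops when the supplied list of wait times is exhausted.

```python
from typing import Callable, Iterable, Optional


def statement_and_listen(
    statement: bytes,
    expected: Iterable[bytes],
    send: Callable[[bytes], None],
    listen: Callable[[float], Optional[bytes]],
    waits: Iterable[float],
) -> Optional[bytes]:
    """Say `statement`, listen for one wait period, and repeat until an
    expected reply is heard. Returns None if `waits` runs out first."""
    wanted = set(expected)
    for wait in waits:
        send(statement)
        reply = listen(wait)
        if reply in wanted:
            return reply
    return None


# Simulated line: the first two statements go unanswered, the third gets "S/B".
replies = iter([None, None, b"S/B"])
sent = []
heard = statement_and_listen(
    b"ACK", [b"S/B"], sent.append, lambda wait: next(replies), [0.5, 0.5, 0.5]
)
print(sent, heard)
```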
Synchronization Lock
That rather ominous sounding title is actually rather
simple; it refers to a condition whereby the "Statement and
Listen Loops" of each computer become locked together. This is
analogous to two people speaking at the same time, over and over,
such that no effective communication takes place. In order to
guarantee that the two computers never get into this state, the
wait times of the loops are altered slightly.
Assume that the fixed wait loop time was 0.5 seconds; this
is called a "Short" wait. We also have a "Long" wait, which would
be slightly longer, say 0.6 seconds (actually, the delay within a
"Statement and Listen Loop" is not particularly critical, but
should be somewhere in the neighbourhood of one half second).
Each time the computer goes through an SLL, a counter would
determine which type of wait to use; Long or Short. The sequence
is broken into three; the transmitting computer will use a
Long-Long-Short, while the receiving computer will use a
Short-Short-Long.
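The two wait sequences can be written down directly. In this sketch the 0.5 and 0.6 second figures are the example values from the text, and the role names "transmit" and "receive" are just labels chosen for the illustration.

```python
import itertools
from typing import Iterator

SHORT_WAIT = 0.5  # seconds, the example "Short" wait
LONG_WAIT = 0.6   # seconds, the example "Long" wait


def wait_cycle(role: str) -> Iterator[float]:
    """Endless Long-Long-Short (transmitter) or Short-Short-Long (receiver) cycle."""
    if role == "transmit":
        pattern = (LONG_WAIT, LONG_WAIT, SHORT_WAIT)
    elif role == "receive":
        pattern = (SHORT_WAIT, SHORT_WAIT, LONG_WAIT)
    else:
        raise ValueError(f"unknown role: {role!r}")
    return itertools.cycle(pattern)


tx = list(itertools.islice(wait_cycle("transmit"), 6))
rx = list(itertools.islice(wait_cycle("receive"), 6))
print(tx)
print(rx)
```

If both loops happen to start in step, the two sides use a different wait on every pass, so they cannot stay locked together.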
Block Structure
Each block of data contains somewhat more than just a
collection of characters taken from disk, it also contains a
"header". The header is 7 bytes long, and contains the following
information:
Byte 1: Low part of ADDITIVE checksum
Byte 2: High part of ADDITIVE checksum
Byte 3: Low part of CLC checksum
Byte 4: High part of CLC checksum
Byte 5: Size of NEXT block
Byte 6: Low part of Block Number
Byte 7: High part of Block Number
As you remember from the section on "checksums", there are
two distinctly different, 16 bit (2 byte) checksums. One is an
additive checksum, composed of the mathematical sum of the
CBMASCII values of all the DATA bytes (and bytes 5 through 7 of
the header). The other checksum is calculated using Cyclic (CLC)
Redundancy (on the same bytes). These 32 checksum bits are placed
in the first 4 bytes of the header.
The 5th byte is the length of the NEXT block. This may seem
odd to some, but consider the difficulties in sending the size of
the current block in that self same block. You need to know the
block size to calculate the checksum, but you can't know for sure
that the block size is correct unless you have verified the
checksum. We call this a Catch-22. By sending the size of any
given block in the PREVIOUS block, the size is known for a fact
BEFORE the checksum is calculated.
The 6th and 7th bytes hold the block number. This was added
quite early on in the development of "C1" under the assumption
that it would be necessary (as it is in XMODEM). As it turned
out, "C1" uses a method of handshaking which makes this
unnecessary. Nonetheless, my specifications call for its
inclusion, as certain uses of the block number could be made.
Also, the high order part of the block number (byte 7 of the
header) is used to flag the last block.
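Putting the header layout together, a block could be assembled and checked roughly like this. It is a sketch only: the helper names are invented, the cyclic checksum is assumed to be the XOR-then-rotate-left procedure described earlier, and the "next size" byte is passed in by the caller.

```python
HEADER_SIZE = 7


def additive_checksum(data: bytes) -> int:
    return sum(data) & 0xFFFF


def cyclic_checksum(data: bytes) -> int:
    value = 0
    for byte in data:
        value ^= byte
        value = ((value << 1) | (value >> 15)) & 0xFFFF
    return value


def build_block(payload: bytes, next_size: int, block_number: int) -> bytes:
    """Header plus payload; both checksums cover header bytes 5-7 and the payload."""
    covered = bytes([
        next_size & 0xFF,             # byte 5: size of NEXT block
        block_number & 0xFF,          # byte 6: low part of block number
        (block_number >> 8) & 0xFF,   # byte 7: high part of block number
    ]) + payload
    add = additive_checksum(covered)
    cyc = cyclic_checksum(covered)
    return bytes([add & 0xFF, add >> 8, cyc & 0xFF, cyc >> 8]) + covered


def verify_block(block: bytes) -> bool:
    """Recalculate both checksums and compare them with header bytes 1-4."""
    if len(block) < HEADER_SIZE:
        return False
    covered = block[4:]
    add = block[0] | (block[1] << 8)
    cyc = block[2] | (block[3] << 8)
    return add == additive_checksum(covered) and cyc == cyclic_checksum(covered)


block = build_block(b"HELLO", next_size=12, block_number=1)
damaged = bytearray(block)
damaged[8] ^= 0x08  # flip one bit of a data byte
print(verify_block(block), verify_block(bytes(damaged)))
```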
Varying Block Size
The reason that block size was included in the header was
originally to allow the last block only to vary in size (one can
never guarantee that the amount of data to be sent will divide
nicely into a preset block size). It quickly dawned on me that
"C1" was set up in such a way that ANY block size could be used
for ANY block in the transmission.
Varying block size has its advantages; under reasonably
clean line conditions, large blocks transmit the most data with
the least handshaking (which is mildly time consuming). Smaller
blocks are superior under bad noise conditions, since smaller
blocks run a higher chance of making it through the noise
unscathed; and should it still fail to make it, less time is
required to repeat a smaller block.
My current implementation of "C1" allows the user to pick a
fixed block size between 40 and 255 bytes, but in other
implementations, there is no reason why block size couldn't be
varied DURING transmission to adapt to CHANGING line conditions.
One final thing concerning block structure is how would one
presume to know the size of the FIRST BLOCK if that is revealed
only in the block that came before it (quite a paradox). "C1"
requires that the first block contain ONLY a header, which would
make that block 7 bytes long. This header would do little more
than supply the receiving computer with the size of the first REAL
block. Accuracy of this first "dummy" block is guaranteed since
it must still pass the checksum tests. You must make the block
number for this dummy block "0".
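The way each block announces the size of the one after it, starting from the header-only dummy block, can be sketched as a planning step. Several details here are assumptions made for the illustration and are not stated in the text: block sizes are taken to include the 7 header bytes, data blocks are numbered 1, 2, 3 and so on, and the final block announces a next size of 0.

```python
from typing import List, Tuple

HEADER_SIZE = 7


def plan_blocks(data: bytes, block_size: int) -> List[Tuple[int, int, bytes]]:
    """Return (block_number, next_size, payload) for the dummy block and each data block."""
    if not HEADER_SIZE < block_size <= 255:
        raise ValueError("block_size must fit in one byte and leave room for data")
    step = block_size - HEADER_SIZE
    payloads = [data[i:i + step] for i in range(0, len(data), step)]
    # Size of every data block, plus a trailing 0 announced by the last block (assumption).
    sizes = [HEADER_SIZE + len(p) for p in payloads] + [0]
    plan = [(0, sizes[0], b"")]  # dummy block: number 0, header only
    for number, payload in enumerate(payloads, start=1):
        plan.append((number, sizes[number], payload))
    return plan


for entry in plan_blocks(b"ABCDEFGH", block_size=12):
    print(entry)
```

With 8 bytes of data and a block size of 12, the dummy block announces 12, the first data block announces 10 for the short final block, and the final block announces 0.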
Communication Syntax
Now that you understand the block structure, handshaking
methods, and code word vocabulary, it comes time to find out how
this all comes together.
Most protocols have very simple handshaking between blocks
which is easy to trip up, given sufficiently noisy conditions.
Usually, the transmitting computer sends the block, then waits
for a response from the receiving computer; either "good" or
"bad". The transmitting computer then proceeds to send the next
block (if "good") or resend the last block (if "bad"). This
system falls apart the moment the transmitting computer receives
a false indication of "good" or "bad" and goes on to transmit the
wrong block (and whether the receiving computer likes it or not,
it has to contend with another block). Should things get out of
sync, and the transmitting computer sends the next block when it
should have sent the last one again, XMODEM attempts to make
corrections by use of the block number encoded within each block.
"C1" does nothing so crude; its very communication syntax
guarantees that neither computer will get out of phase with the
other. Whereas XMODEM uses a single statement monologue between
each block, "C1" uses a multiple part dialogue. This makes "C1"
about 3% slower than XMODEM, but this small trade-off in speed
for accuracy will be well worth it the first time you run into
trouble with XMODEM.
XMODEM communications would look something like this:
Xmit: Transmits Block
Rec : "Good"
Xmit: Transmits Next Block
Rec : "Bad"
Xmit: Transmits Same Block Again
In "C1", the transmission would look something like this:
Xmit: Transmits Block
Rec : "Good"
Xmit: Good block acknowledged
Rec : Send next block for me
Xmit: Transmits Next Block
Rec : "Bad"
Xmit: Bad block acknowledged
Rec : Send that block again
Xmit: Transmits Same Block Again
In this type of transmission dialogue, neither computer can
get out of sync, since should it receive the opposite response
than it expects, it goes back to give the correct code word for
the response it DID RECEIVE, thus regaining proper
synchronization. Couple this with the "Statement and Listen
Loops", and you can readily see that communication would be hard
to break down.
Syntax Description
The following diagram should give you an understanding of
the flow of information between blocks:
For a Good Block:
  Xmit: [Block]         "ACK"         [Next Block]
  Rec :          "GOO"         "S/B"
For a Bad Block:
  Xmit: [Block]         "ACK"         [Same Block]
  Rec :          "BAD"         "S/B"
Actually, the two are identical; the only difference is the
substitution of either "GOO" or "BAD" as the response to the
received block.
Immediately after receiving the block, the receiving
computer recalculates the checksum to determine validity of the
data. In the meantime, the transmitting computer starts to wait
for a "GOO" or "BAD" signal. Since it can "say" nothing until it
receives one of these codes, it merely waits. That may sound
suspiciously like a good place to "hang up" the protocol, but the
receiving end is eventually going to finish receiving the block,
either because it timed out waiting, or it finished collecting
the correct number of bytes from the transmitting computer.
At that time, the receiving computer sends the appropriate
code word ("GOO" or "BAD") and begins to wait for an
acknowledgement ("ACK"). If it doesn't receive the "ACK" in about
one half second, it sends the "GOO" or "BAD" code word once
again. Meanwhile, the transmitting computer has been patiently
awaiting the reception of the "GOO" or "BAD" code. Once it
receives it, it transmits an "ACK" and starts to wait for a
"send block" signal ("S/B"). If it doesn't get the "S/B" within
about one half second, it sends "ACK" again.
Back at the receiving computer, which is waiting for this
"ACK" signal, it receives it and sends the "S/B" signal and
begins to wait for the block. Should it receive an "ACK" while
waiting for the block, or receive nothing at all for
approximately 5 seconds, it assumes that the transmitting
computer hasn't heard the "S/B" and transmits it again. In the
meantime, the transmitting computer is waiting for the "S/B", and
upon reception, starts sending the block. The process has now
started all over again.
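The exchange just described can be written as two small transition functions, one per side. This is a simplified sketch rather than a full implementation: the state names are invented, `heard` is `None` when a wait runs out (about half a second in most states, about five seconds while the receiver waits for a block), and the token "BLOCK" stands in for a whole data block.

```python
from typing import Optional, Tuple


def transmitter_step(state: str, index: int, heard: Optional[str]) -> Tuple[str, int, Optional[str]]:
    """One step of the sender. Returns (new_state, block_index, what_to_send)."""
    if state == "AWAIT_VERDICT":      # block sent; say nothing until GOO or BAD
        if heard == "GOO":
            return "AWAIT_SB", index + 1, "ACK"
        if heard == "BAD":
            return "AWAIT_SB", index, "ACK"
        return "AWAIT_VERDICT", index, None
    if state == "AWAIT_SB":           # ACK sent; restate it until S/B arrives
        if heard == "S/B":
            return "AWAIT_VERDICT", index, "BLOCK"
        return "AWAIT_SB", index, "ACK"
    raise ValueError(f"unknown state: {state!r}")


def receiver_step(state: str, verdict: str, heard: Optional[str]) -> Tuple[str, Optional[str]]:
    """One step of the receiver after it has judged a block as `verdict`."""
    if state == "AWAIT_ACK":          # restate GOO or BAD until ACK arrives
        if heard == "ACK":
            return "AWAIT_BLOCK", "S/B"
        return "AWAIT_ACK", verdict
    if state == "AWAIT_BLOCK":        # restate S/B on a stray ACK or on silence
        if heard == "BLOCK":
            return "JUDGE_BLOCK", None
        return "AWAIT_BLOCK", "S/B"
    raise ValueError(f"unknown state: {state!r}")


# Sender trace: good block, one lost S/B, then a bad block that gets repeated.
state, index, log = "AWAIT_VERDICT", 0, []
for heard in ["GOO", None, "S/B", "BAD", "S/B"]:
    state, index, out = transmitter_step(state, index, heard)
    log.append((out, index))
print(log)
```

Note that the block index only advances on "GOO", so a "BAD" followed by "S/B" causes the same block to be sent again.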
A quick analysis of this system will reveal that it's damned
near impossible to get any type of noise which could possibly
mimic the code sequences required. Also, no noise could stop the
eventual completion of the above sequence, since each computer is
always "sending and waiting". If two people keep repeating their
sentences over and over, and continue to listen to the other
person, even a noisy room couldn't stop them from hearing one
another EVENTUALLY.
Of course, some line noise is just so horrendous, that even
this method of communication could fail. Then again, this type of
noise would make it damned near impossible for the user to be
online in the first place, so it can be considered an unlikely
event.
But, should one of the computers go offline for any reason,
we wouldn't want the other computer to keep looping and looping
until it died of old age. Although I haven't built in such
protection into the terminal program I distribute in the public
domain, my BBS program does have abort code. Should the
protocol on the BBS have to go through the "Statement and Listen
Loop" more than 12 times in a row (which is highly unlikely if the
other computer is still online), it will abort the transfer.
Similar code could be used in your implementation.
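One way such a limit might look is sketched below. The names are hypothetical, `one_pass` stands for a single "say it, then listen" pass that returns `None` when nothing useful was heard, and the reading of "more than 12 times" as "give up once 12 passes have failed" is an assumption.

```python
from typing import Callable, Optional


class TransferAborted(Exception):
    """Raised when the other side stays silent for too many passes in a row."""


def repeat_with_abort(one_pass: Callable[[], Optional[bytes]], max_passes: int = 12) -> bytes:
    """Run `one_pass` until it returns a reply, giving up after `max_passes` failures."""
    for _ in range(max_passes):
        reply = one_pass()
        if reply is not None:
            return reply
    raise TransferAborted(f"no reply after {max_passes} passes")


# Simulated line that answers on the third pass.
answers = iter([None, None, b"ACK"])
print(repeat_with_abort(lambda: next(answers)))
```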
The End-Off Situation
When the final block is transmitted, the high order part of
the block number should be made HEX "FF" (255 decimal). This will
inform the receiving computer that this is the last block of
data, and to expect no more. The question now arises; how can
both computers be 100% sure that the other is fully aware of the
file completion? A fair question, but not one with a simple
answer.
When the transmitting computer receives the "GOO" for the
last block, it can be fairly certain that the receiving computer
has received the final block, but it must inform the receiving
computer that it knows this. It does so by sending an "ACK", but
cannot be sure the receiving computer has received the "ACK"
unless it gets the "S/B" signal back. Now, the transmitting
computer must acknowledge the reception of the "S/B", but under
the normal communications syntax, it would now have to send a block.
This is where the "End-Off" syntax comes into play; after
receiving the "S/B", the transmitting computer sends back a "SYN"
signal. In response to that, the receiving computer sends its own
"S/B" signal, then waits for the final "S/B" from the
transmitting computer. Since it will not be responding to this
code, it simply goes into a wait cycle for approximately 5
seconds. If it does get the "S/B" within that 5 seconds, it ends
immediately, but otherwise doesn't really care if it receives the
code or not since at this stage, there is a 100% assurance of
both computers knowing things are Ok.
The transmitting computer need only send three copies of the
"S/B" code at this point, since, as stated above, there is full
assurance that both computers are finished. NOTE that the code
words chosen for the End-Off situation are not necessarily
related to their apparent function.
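The transmitting side of the End-Off might be sketched like this, following the text as captured. Treat it as an illustration only: `send` and `listen` are hypothetical modem routines, the reply code the sender waits for is left as a parameter that defaults to the "S/B" named above, and repeating "SYN" until that reply arrives is an assumption based on the "Statement and Listen Loop" rule.

```python
from typing import Callable, Iterable, Optional


def end_off_transmit(
    send: Callable[[bytes], None],
    listen: Callable[[float], Optional[bytes]],
    waits: Iterable[float],
    reply_code: bytes = b"S/B",  # the receiver's answer, as the captured text reads
    final_copies: int = 3,
) -> bool:
    """Sender side of End-Off, entered once "S/B" has answered the last "ACK"."""
    for wait in waits:
        send(b"SYN")
        if listen(wait) == reply_code:
            # The receiver has answered; send the closing copies and stop.
            for _ in range(final_copies):
                send(b"S/B")
            return True
    return False


# Simulated line: the first "SYN" is lost, the second is answered.
answers = iter([None, b"S/B"])
sent = []
finished = end_off_transmit(sent.append, lambda wait: next(answers), [0.5, 0.5, 0.5])
print(finished, sent)
```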
Transferring File Type
When transferring files from one computer to another it is
often necessary to also transfer the file type, but this must be
known BEFORE the file is opened, and, therefore, before the
protocol begins. "C1" does not impose any strict rules on what
sort of information you transfer about the files, if any, but
when writing a terminal program to communicate with one of my
bulletin boards, the following should be done:
Using a full implementation of the "C1" protocol (first
dummy block, data block, and End-Off), transmit a single byte of
data corresponding to the following file types:
1 = Program File
2 = SEQ File
3 = WordPro File
Transmitting this single piece of data would require that
TWO blocks be sent; the initial dummy block to set up the size of
the first data block (of which there will be only one, size 8),
and the data block itself, consisting of 7 header bytes and the
single file type byte.
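The sizes involved can be checked with a short sketch. The dictionary keys and the function name are invented for this illustration; the type codes, the 7 byte header, and the use of HEX "FF" in the high part of the block number to mark the last block come from the text.

```python
from typing import Dict, List

HEADER_SIZE = 7
FILE_TYPE_CODES = {"program": 1, "seq": 2, "wordpro": 3}


def file_type_transfer(kind: str) -> List[Dict[str, object]]:
    """Describe the two blocks that carry a single file type byte."""
    payload = bytes([FILE_TYPE_CODES[kind]])
    data_size = HEADER_SIZE + len(payload)  # 7 header bytes + 1 data byte = 8
    dummy = {"size": HEADER_SIZE, "block_number": 0, "next_size": data_size, "payload": b""}
    data = {"size": data_size, "block_number_high": 0xFF, "payload": payload}
    return [dummy, data]


for block in file_type_transfer("seq"):
    print(block)
```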
For other applications, one could conceivably transfer much
more information, including file name, file type, computer type,
etc. It could even be possible to transfer multiple files,
specifying the number and name of each file in this first
transmission. Alternately, no one said you HAVE to use this first
separate transmission; if no information other than the file needs to
be transmitted, you just send the file and nothing more.
--
| Mark R. Rinfret, SofTech, Inc. mark@unisec.usi.com |
| Guest of UniSecure Systems, Inc., Newport, RI |
| UUCP: {gatech|mirror|cbosgd|uiucdcs|ihnp4}!rayssd!unisec!mark |
| work: (401)-849-4174 home: (401)-846-7639 |