[comp.sys.cbm] Punter C1 Protocol

mark@unisec.UUCP (03/30/87)
Here is the definition of the Punter C1 file transfer protocol, as written
by the man himself.  You will note that there are a few garbled lines in this
document.  They are due to the fact that this document was actually a series
of "bulletins" on Punter's main BBS.  Since they were not available in the
download area, I had to use the "capture buffer" technique to collect them.
No essential information was lost.
-------------------------------------------------------------------------------

                                     C1 Protocol
                                          
                                          by
                                          
                                    Steve Punter

          The following document, describing the C1 (new Punter) protocol,
          was 'captured' from Steve Punter's BBS (PNET Node 1).


          
          Inception

               During the summer of 1981, when I  first  got  the  idea  of
          putting  up  a  BBS,  I  started  work  on  a simple protocol for
          transferring programs to and from the BBS.  This protocol was
          similar   in   structure  to  XMODEM,  and  had  about  the  same
          reliability. Under good line conditions, it would give error free
          transfers  (this  was  to  be  expected).  Under  moderate  noise
          conditions,  the  protocol  would  hold  up, and would still give
          error free transmissions. It was under poor line conditions  that
          it, and XMODEM, would fall apart.

               In  the  summer  of 1984, I started work on a very ambitious
          project; to produce a protocol that was both fast, and  extremely
          reliable, even under the worst of line conditions. From this work
          came the "C1" protocol; not a simple block/checksum affair, but a
          complete communication system for the computer.

               Be warned, therefore, that understanding the ins and outs
          of "C1" will not be easy, but with enough  patience,  there's  no
          reason   why   even   the  least  skilled  programmer  cannot  be
          comfortable with it.
          


          Concepts

               The concept behind the "C1" protocol was  simple;  to  allow
          two  computers  to  "talk"  with  one another (while transferring
          data) in such a way that nothing short of a  complete  distortion
          of  the  transmission line could result in a misunderstanding. If
          this concept could be realized, then files could  be  transferred
          between  computers without fear of line noise causing a breakdown
          in the protocol, or that the received data would differ,  in  any
          way, from that which was sent.

               Nothing  is perfect though, and I don't, for a minute, claim
          that "C1" is completely infallible, but I can say, with
          reasonable comfort, that "C1" delivers accuracy on bad lines not
          found in any other microcomputer transfer protocol. For this
          accuracy  though,  there is a price to pay, and it is complexity;
          the protocol  is  extremely  difficult  to  duplicate  without  a
          complete  and  utter  understanding  of the intricate workings of
          "C1". This document  will  attempt  to  give  you  that  required
          understanding.


          A Simple Conversation

               In first deciding how the protocol would function, I thought
          of  how two people could carry on a conversation under high noise
          conditions, where misunderstanding would be the norm. The scenario
          I'm going to give differs from the protocol in that the people
          talking have no way of verifying the accuracy of what they
          believe they have heard. What it is meant to demonstrate is how
          the two computers "talk" with one another, and discuss the
          necessary repetition, or non-repetition, of each block of data
          (the cornerstone of a checksum based transfer protocol).

          Ken and John are attempting to assemble a machine in the middle
          of a very noisy machine shop. John reads the instructions to Ken,
          who carries them out. Even at close proximity, the two have
          difficulty hearing one another, so they adopt a form of banter
          which allows each instruction to be verified and acknowledged.
          Here is how the conversation might go:
           John: Put part "A" in hole "D".
          
           Ken: Understood, putting part "A"
              in hole "D".
          
           John: Acknowledged, let me know when
              you are ready for the next
              instruction.
          
           Ken: Go ahead, what do I do next?
          
           John: Put screw "E" through slot
              "T".
          
           Ken: I  didn't  understand that,
              could you please repeat.
          
           John: Oh, ok, tell me when you're
              ready  for that instruction
              again.
          
           Ken: Ready now.

               The  conversation continues on in this fashion, guaranteeing
          that both John and Ken are fully  aware  of  what  the  other  is
          doing. In real life, ... but
          that's why they make more mistakes than a computer.

               It  is  just  this  sort  of  "conversation"  that  the  two
          computers   have   between  each  other,  only  the  language  is
          different; the instruction is replaced by the block of data,  and
          all other statements by special codes.


          Communication Codes

               One of the areas where simple protocols fall apart is in the
          transmission  of  "handshaking  codes".  It's  called handshaking
          because it implies that the two computers are having a dialogue,
          rather  than  a  monologue.  These other protocols rely on single
          byte (8 bit) words for their communication codes, and that  could
          spell  trouble,  since the likelihood of any one 8 bit code being
          transposed into another is greater than for multiple byte  codes.
          For  this  reason,  "C1"  uses  3  byte  (24 bit) codes which are
          sufficiently different that the likelihood of a transposition  is
          extremely  low.  Not  only  that, but as you will soon learn, the
          method of receiving 3 byte codes is designed such that  if  there
          is sufficient line noise to make the necessary transpositions,
          there would most likely be extra characters sent; "C1" can  avoid
          this situation.

               Five distinct codes are used in the protocol; "GOO", "BAD",
          "ACK", "S/B", and "SYN". Each has its own meaning, just like any
          English word, and all are used in a specific sequence  such  that
          synchronization  difficulties  would  be automatically identified
          and corrected.


          Checksums

               When a block of  data  is  sent,  we  must  have  a  way  of
          determining if it is correctly received or not. This is accomplished
          by using what is known as a checksum. Quite simply, a checksum is
          a  number  which  is  mathematically  derived  from all the bytes
          within the block. The receiving computer recalculates the sum and
          compares it with the  sum  it  received  along  with  the  block.
          Theoretically,  any  fault in the transmitted data will result in
          the two checksums not matching; but that's  theory.  In  reality,
          the accuracy of the checksum is based on the type of mathematical
          operation  used  to  calculate  it,  and  what  kind  of noise it
          encounters.

               The simplest way to create a checksum is to add up  all  the
          ASCII  values  of  the bytes contained in the block. This is fine
          for many types of errors,  but  not  the  type  which  inverts  a
          particular bit. Should the same bit position be inverted in two
          bytes, going from 0 to 1 in one and from 1 to 0 in the other, the
          sum will remain the same. For example, take
          the following two bytes:

               11010011 = 211
          Plus 01101101 = 109
               --------   ---
                          320

          Now assume that the fourth bit from the right of both of these
          bytes becomes inverted by line noise:

               11011011 = 219
          Plus 01100101 = 101
               --------   ---
                          320
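               The collision is easy to confirm. A minimal Python sketch,
          assuming the additive checksum is a plain sum of byte values kept
          to 16 bits (the 16-bit wrap is an assumption, not stated here);
          `additive_checksum` is a hypothetical name:

```python
def additive_checksum(data: bytes) -> int:
    """Plain sum of the byte values, truncated to 16 bits (assumed width)."""
    return sum(data) & 0xFFFF

original = bytes([0b11010011, 0b01101101])          # 211, 109
# Invert the fourth bit from the right (value 8) in both bytes.
damaged = bytes(b ^ 0b00001000 for b in original)   # 219, 101

print(list(original), additive_checksum(original))  # [211, 109] 320
print(list(damaged), additive_checksum(damaged))    # [219, 101] 320
```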

          As you can see, the sum remains 320, even though line  noise  has
          made  obvious changes to the bytes. A better system is one called
          "Cyclic  Redundancy",  which  works  on  a   somewhat   different
          principle.  The  checksum  is 16 bits long, and is created in the
          following fashion; each byte from the block  is  Exclusive  OR'ed
          with  the  low  order  part of the checksum. The checksum is then
          ROTATED one bit to the left, and the procedure repeated with  the
          next byte.

               Even  this  highly  superior  method can be tripped up, so I
          have combined BOTH an additive  checksum  and  Cyclic  Redundancy
          checksum to create one very hard to beat 32 bit "super" checksum.
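               Both checksums can be sketched in Python. The rotate-and-XOR
          loop follows the "Cyclic Redundancy" step exactly as described
          above (it is not a polynomial CRC). The 16-bit wrap of the
          additive sum and the little-endian packing of the four bytes are
          assumptions, chosen to match the byte order of the block header;
          all function names are hypothetical.

```python
import struct

def additive16(data: bytes) -> int:
    """Additive checksum: sum of byte values, kept to 16 bits."""
    total = 0
    for b in data:
        total = (total + b) & 0xFFFF
    return total

def clc16(data: bytes) -> int:
    """XOR each byte into the low-order part, then rotate left one bit."""
    clc = 0
    for b in data:
        clc ^= b                                   # b < 256: low byte only
        clc = ((clc << 1) | (clc >> 15)) & 0xFFFF  # 16-bit rotate left
    return clc

def super_checksum(data: bytes) -> bytes:
    """Combined 32-bit check: additive low/high, then CLC low/high."""
    return struct.pack("<HH", additive16(data), clc16(data))

original = bytes([211, 109])
damaged = bytes([219, 101])  # same bit inverted in both bytes
print(additive16(original) == additive16(damaged))  # True: the sum is fooled
print(clc16(original), clc16(damaged))              # 918 934: the CLC is not
```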


          Listening For Code Words

               Although  3  byte  code  words are more reliable than 1 byte
          code words, nothing is perfect. It was once said that if you  let
          an  infinite  number  of  monkeys bash away at typewriters for an
          infinite amount of time, one of them would eventually type "To be
          or not to be, that is  the  question".  Although  this  stretches
          statistical probability to its limit, this kind of thing can
          easily happen on a smaller scale; the letters "GOO"  could  quite
          conceivably be produced by purely random line noise.

               To try and eliminate ALL possible errors isn't feasible, but
          "C1" makes an attempt at trying to eliminate as many as possible.
          One  reasonably  probable  fact  is  that  any  noise  capable of
          randomly producing "GOO", would not stop there; more  likely,  it
          would  produce  a  string of characters, something like "HGOOEK".
          Were we to allow the protocol to  listen  exclusively  for  three
          letter  combinations,  it would most assuredly pick out the "GOO"
          in that string.

               My specifications for  "C1"  call  for  a  code  recognition
          routine  which will ONLY make code word comparisons on the LAST 3
          RECEIVED bytes. This is accomplished in my coding by  going  back
          and  testing  for  further  characters  after I have identified a
          three byte  code  word.  Should  another  byte  be  present,  the
          identified  code  word  is  thrown  away,  and  the  search  will
          continue.
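               The "last 3 received bytes" rule can be sketched as follows,
          assuming the code words travel as the plain ASCII bytes shown and
          that `received` holds everything that arrived before the line
          went quiet (how "quiet" is detected is left to the caller).
          `recognize` is a hypothetical name.

```python
CODE_WORDS = {b"GOO", b"BAD", b"ACK", b"S/B", b"SYN"}

def recognize(received: bytes):
    """Return the code word formed by the LAST three bytes received, or None.

    Any byte arriving after a would-be code word pushes it out of the final
    three positions, so "HGOOEK" is rejected rather than read as "GOO".
    """
    tail = received[-3:]
    return tail.decode("ascii") if tail in CODE_WORDS else None

for line in (b"GOO", b"HGOOEK", b"GOE", b"\x93\x07S/B"):
    print(line, recognize(line))
```

          Note that leading noise is tolerated ("\x93\x07S/B" is accepted)
          while trailing noise is not, which is exactly the trade-off the
          next section deals with.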


          Statement and Listen Loops

               One immediate drawback to the system described above is that
          a REAL code word, masked  within  some  random  noise,  would  be
          rejected  by the receiving computer. This would also be true of a
          code word simply damaged by noise (like "GOE"). For a protocol to
          be impervious to this sort of corruption, it must be  capable  of
          restating  code  words over and over until the receiving computer
          can understand, yet it must also have a way  of  knowing  whether
          the  receiving computer got the code word or not. This was a fact
          that eluded me when I wrote the original protocol.

               When  we  talk  to  other   people,   the   cornerstone   of
          understanding is recognition. If we ask "What do you think?", yet
          get no reply, we ask again. Only when we receive a reply from the
          person  to  whom  we  are talking do we continue on with our next
          statement. It would be pointless wasting our  breath  on  someone
          who isn't listening.

               Within  "C1",  communication  between  computers  is handled
          through a similar system which I call the "Statement  and  Listen
          Loop".  It's  quite simple really; when one computer has to "say"
          something  to  the  other,  it  does  so,  then   waits   for   a
          predetermined  time  for  a  known  response.  Should  it fail to
          receive a response within that period of time, the code  word  is
          said  again,  and  the  computer  listens  for  the  reply.  This
          continues until the required response is  heard.  The  system  is
          further  enhanced  by  the  fact  that  both computers are ALWAYS
          engaged in a "Statement and Listen Loop".
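               A minimal sketch of one "Statement and Listen Loop", assuming
          two hypothetical hooks into the serial line: `say()` sends the
          statement and `listen(seconds)` returns a recognized code word or
          None. The 12-loop limit mirrors the abort rule the author
          describes for his BBS; the fixed half-second wait simplifies the
          Long/Short timing.

```python
from typing import Callable, Optional, Set

def statement_and_listen(say: Callable[[], None],
                         listen: Callable[[float], Optional[str]],
                         expected: Set[str],
                         wait: float = 0.5,
                         max_loops: int = 12) -> str:
    """Say something, listen for a known reply, and repeat until heard."""
    for _ in range(max_loops):
        say()
        reply = listen(wait)
        if reply in expected:
            return reply  # the other side heard us; move on
        # Nothing we were listening for within `wait`: say it again.
    raise TimeoutError(f"no reply after {max_loops} loops; abort the transfer")

# Simulated line: two silent listen periods, then the reply we want.
sent = []
replies = iter([None, None, "ACK"])
heard = statement_and_listen(lambda: sent.append("GOO"),
                             lambda seconds: next(replies), {"ACK"})
print(heard, sent)  # ACK ['GOO', 'GOO', 'GOO']
```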


          Synchronization Lock

               That  rather  ominous  sounding  title  is  actually  rather
          simple;  it  refers  to  a  condition  whereby the "Statement and
          Listen Loops" of each computer become locked  together.  This  is
          analogous to two people speaking at the same time, over and over,
          such  that  no  effective  communication takes place. In order to
          guarantee that the two computers never get into this  state,  the
          wait times of the loops are altered slightly.

               Assume  that  the fixed wait loop time was 0.5 seconds; this
          is called a "Short" wait. We also have a "Long" wait, which would
          be slightly longer, say 0.6 seconds (actually, the delay within a
          "Statement and Listen Loop" is not particularly critical, but
          should  be  somewhere  in  the neighbourhood of one half second).
          Each time the computer goes  through  an  SLL,  a  counter  would
          determine  which type of wait to use; Long or Short. The sequence
          is broken into  three;  the  transmitting  computer  will  use  a
          Long-Long-Short,   while   the  receiving  computer  will  use  a
          Short-Short-Long.
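               The alternating waits can be sketched as a repeating
          three-step schedule. Times are in milliseconds and use the example
          values above (500 ms Short, 600 ms Long); the names are
          hypothetical. Each full cycle takes the transmitter 1700 ms and
          the receiver 1600 ms, so the two loops cannot keep restating
          their code words at the same moments.

```python
from itertools import accumulate, cycle, islice

SHORT_MS, LONG_MS = 500, 600  # example values from the text

PATTERNS = {
    "transmitter": (LONG_MS, LONG_MS, SHORT_MS),  # Long-Long-Short
    "receiver": (SHORT_MS, SHORT_MS, LONG_MS),    # Short-Short-Long
}

def wait_schedule(role: str):
    """Endless sequence of wait times, one per pass through the SLL."""
    return cycle(PATTERNS[role])

def loop_end_times(role: str, loops: int) -> list:
    """Elapsed time (ms) at the end of each of the first `loops` passes."""
    return list(accumulate(islice(wait_schedule(role), loops)))

print(loop_end_times("transmitter", 6))  # [600, 1200, 1700, 2300, 2900, 3400]
print(loop_end_times("receiver", 6))     # [500, 1000, 1600, 2100, 2600, 3200]
```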


          Block Structure

               Each block of  data  contains  somewhat  more  than  just  a
          collection  of  characters  taken  from  disk, it also contains a
          "header". The header is 7 bytes long, and contains the  following
          information:

          Byte 1: Low part of ADDITIVE checksum
          Byte 2: High part of ADDITIVE checksum
          Byte 3: Low part of CLC checksum
          Byte 4: High part of CLC checksum
          Byte 5: Size of NEXT block
          Byte 6: Low part of Block Number
          Byte 7: High part of Block Number

               As  you  remember from the section on "checksums", there are
          two distinctly different, 16 bit (2 byte) checksums.  One  is  an
          additive  checksum,  composed  of  the  mathematical  sum  of the
          CBMASCII values of all the DATA bytes (and bytes 5 through  7  of
          the  header). The other checksum is calculated using Cyclic (CLC)
          Redundancy (on the same bytes). These 32 checksum bits are placed
          in the first 4 bytes of the header.

               The 5th byte is the length of the NEXT block. This may  seem
          odd to some, but consider the difficulties in sending the size of
          the  current  block in that self same block. You need to know the
          block size to calculate the checksum, but you can't know for sure
          that the block size is  correct  unless  you  have  verified  the
          checksum.  We  call  this  a Catch-22. By sending the size of any
          given block in the PREVIOUS block, the size is known for  a  fact
          BEFORE the checksum is calculated.

               In the 6th and 7th byte are the block number. This was added
          quite  early  on  in the development of "C1" under the assumption
          that it would be necessary (as it is in  XMODEM).  As  it  turned
          out,   "C1"  uses  a  method  of  handshaking  which  makes  this
          unnecessary. Nonetheless, my specifications call for its
          inclusion,  as  certain  uses  of the block number could be made.
          Also, the high order part of the block  number  (byte  7  of  the
          header) is used to flag the last block.
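               The header layout can be sketched in Python. This is an
          illustration under stated assumptions, not Punter's code: the
          checksums run over header bytes 5 to 7 followed by the data in
          transmission order, the additive sum wraps at 16 bits, and on the
          last block only the low byte of the block number is treated as
          meaningful. `build_block`, `parse_block` and `Block` are
          hypothetical names.

```python
import struct
from typing import NamedTuple, Optional, Tuple

HEADER_SIZE = 7
LAST_BLOCK_FLAG = 0xFF  # high byte of the block number on the final block

class Block(NamedTuple):
    next_size: int  # header byte 5: size of the NEXT block
    block_no: int   # header bytes 6-7
    last: bool
    data: bytes

def c1_checksums(body: bytes) -> Tuple[int, int]:
    """Additive and CLC checksums over header bytes 5-7 plus the data."""
    add = clc = 0
    for b in body:
        add = (add + b) & 0xFFFF
        clc ^= b
        clc = ((clc << 1) | (clc >> 15)) & 0xFFFF
    return add, clc

def build_block(data: bytes, next_size: int, block_no: int,
                last: bool = False) -> bytes:
    hi = LAST_BLOCK_FLAG if last else (block_no >> 8) & 0xFF
    body = bytes([next_size, block_no & 0xFF, hi]) + data
    # Bytes 1-4: additive low, additive high, CLC low, CLC high.
    return struct.pack("<HH", *c1_checksums(body)) + body

def parse_block(raw: bytes, expected_size: int) -> Optional[Block]:
    """Decode a received block; None means the receiver should answer BAD.

    `expected_size` is the size announced in byte 5 of the previous block.
    """
    if len(raw) != expected_size or len(raw) < HEADER_SIZE:
        return None
    if struct.unpack("<HH", raw[:4]) != c1_checksums(raw[4:]):
        return None
    last = raw[6] == LAST_BLOCK_FLAG
    block_no = raw[5] if last else raw[5] | (raw[6] << 8)
    return Block(raw[4], block_no, last, raw[HEADER_SIZE:])

raw = build_block(b"HELLO", next_size=40, block_no=3)
print(len(raw), parse_block(raw, 12))
print(parse_block(raw[:7] + b"J" + raw[8:], 12))  # None: checksum mismatch
```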


          Varying Block Size

               The  reason  that  block size was included in the header was
          originally to allow the last block only to vary in size (one  can
          never  guarantee  that  the amount of data to be sent will divide
          nicely into a preset block size). It quickly dawned  on  me  that
          "C1"  was  set up in such a way that ANY block size could be used
          for ANY block in the transmission.

               Varying block size has its advantages; under reasonably
          clean  line  conditions, large blocks transmit the most data with
          the least handshaking (which is mildly time  consuming).  Smaller
          blocks  are  superior  under  bad noise conditions, since smaller
          blocks run a  higher  chance  of  making  it  through  the  noise
          unscathed;  and  should  it  still  fail to make it, less time is
          required to repeat a smaller block.

               My current implementation of "C1" allows the user to pick  a
          fixed  block  size  between  40  and  255  bytes,  but  in  other
          implementations, there is no reason why block  size  couldn't  be
          varied DURING transmission to adapt to CHANGING line conditions.

               One  final thing concerning block structure is how would one
          presume to know the size of the FIRST BLOCK if that  is  revealed
          only  in  the  block  that came before it (quite a paradox). "C1"
          requires that the first block contain ONLY a header, which  would
          make  that  block  7 bytes long. This header would do little more
          than supply the receiving computer with the size of the first REAL
          block.  Accuracy  of this first "dummy" block is guaranteed since
          it must still pass the checksum tests. You must  make  the  block
          number for this dummy block "0".
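               Combining the size-chaining rule with the dummy block, a file
          could be cut into blocks as sketched below. Assumptions: the
          checksums run over header bytes 5 to 7 and then the data, the
          additive sum wraps at 16 bits, `block_size` counts the 7 header
          bytes, and the "size of next block" byte in the final block is
          set to 0 because nothing follows it (the text does not say what
          goes there). `make_blocks` is a hypothetical name.

```python
import struct

HEADER_SIZE = 7

def _block(payload: bytes, next_size: int, block_no: int,
           last: bool = False) -> bytes:
    hi = 0xFF if last else (block_no >> 8) & 0xFF
    body = bytes([next_size, block_no & 0xFF, hi]) + payload
    add = clc = 0
    for b in body:
        add = (add + b) & 0xFFFF
        clc ^= b
        clc = ((clc << 1) | (clc >> 15)) & 0xFFFF
    return struct.pack("<HH", add, clc) + body

def make_blocks(data: bytes, block_size: int) -> list:
    """Dummy block 0, then data blocks of at most `block_size` bytes."""
    if not HEADER_SIZE < block_size <= 255:
        raise ValueError("block size must be 8..255 bytes")
    room = block_size - HEADER_SIZE
    chunks = [data[i:i + room] for i in range(0, len(data), room)] or [b""]
    sizes = [HEADER_SIZE + len(c) for c in chunks]
    # Block 0 carries only a header; its byte 5 announces block 1's size.
    blocks = [_block(b"", sizes[0], 0)]
    for n, chunk in enumerate(chunks, start=1):
        last = n == len(chunks)
        next_size = 0 if last else sizes[n]  # sizes[n] is block n+1's size
        blocks.append(_block(chunk, next_size, n, last))
    return blocks

blocks = make_blocks(bytes(range(100)), block_size=40)
print([len(b) for b in blocks])  # [7, 40, 40, 40, 8]
print([b[4] for b in blocks])    # [40, 40, 40, 8, 0]
```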


          Communication Syntax

               Now  that  you  understand  the block structure, handshaking
          methods, and code word vocabulary, it comes time to find out  how
          this all comes together.

               Most protocols have very simple handshaking between blocks
          which is easy to trip up, given  sufficiently  noisy  conditions.
          Usually,  the  transmitting  computer sends the block, then waits
          for a response from the  receiving  computer;  either  "good"  or
          "bad".  The  transmitting computer then proceeds to send the next
          block (if "good") or resend  the  last  block  (if  "bad").  This
          system  falls apart the moment the transmitting computer receives
          a false indication of "good" or "bad" and goes on to transmit the
          wrong block (and whether the receiving computer likes it or  not,
          it has to deal with another block). Should things get out of
          sync, and the transmitting computer sends the next block when  it
          should  have  sent  the  last  one again, XMODEM attempts to make
          corrections by use of the block number encoded within each block.

               "C1" does nothing so crude; its very communication syntax
          guarantees  that  neither computer will get out of phase with the
          other. Whereas XMODEM uses a single statement  monologue  between
          each  block,  "C1" uses a multiple part dialogue. This makes "C1"
          about 3% slower than XMODEM, but this small  trade-off  in  speed
          for  accuracy  will  be well worth it the first time you run into
          trouble with XMODEM.

               XMODEM communications would look something like this:

           Xmit: Transmits Block
          
           Rec : "Good"
          
           Xmit: Transmits Next Block
          
           Rec : "Bad"
          
           Xmit: Transmits Same Block Again

          In "C1", the transmission would look something like this:

           Xmit: Transmits Block
          
           Rec : "Good"
          
           Xmit: Good block acknowledged
          
           Rec : Send next block for me
          
           Xmit: Transmits Next Block
          
           Rec : "Bad"
          
           Xmit: Bad block acknowledged
          
           Rec : Send that block again
          
           Xmit: Transmits Same Block Again

               In this type of transmission dialogue, neither computer can
          get out of sync, since should it receive the opposite response
          from the one it expects, it goes back to give the correct code
          word for the response it DID RECEIVE, thus regaining proper
          synchronization. Couple this with the "Statement and Listen
          Loops", and you can readily see that communication would be hard
          to break down.


          Syntax Description

               The following diagram should give you an understanding of
          the flow of information between blocks:

          For a Good Block:

           Xmit: [Block]  "ACK"  [Next Block]
          
           Rec :    "GOO"  "S/B"

          For a Bad Block:

           Xmit: [Block]  "ACK"  [Same Block]
          
           Rec :    "BAD"  "S/B"

               Actually,  the two are identical; the only difference is the
          substitution of either "GOO" or "BAD"  as  the  response  to  the
          received block.

               Immediately   after   receiving  the  block,  the  receiving
          computer recalculates the checksum to determine validity  of  the
          data.  In  the meantime, the transmitting computer starts to wait
          for a "GOO" or "BAD" signal. Since it can "say" nothing until  it
          receives  one  of  these  codes,  it merely waits. That may sound
          suspiciously like a good place to "hang up" the protocol, but the
          receiving end is eventually going to finish receiving the  block,
          either  because  it  timed out waiting, or it finished collecting
          the correct number of bytes from the transmitting computer.

               At that time, the receiving computer sends  the  appropriate
          code   word   ("GOO"   or  "BAD")  and  begins  to  wait  for  an
          acknowledgement ("ACK"). If it doesn't receive the "ACK" in about
          one half second, it sends the  "GOO"  or  "BAD"  code  word  once
          again.  Meanwhile,  the  transmitting computer has been patiently
          awaiting the reception of  the  "GOO"  or  "BAD"  code.  Once  it
          receives it, it transmits an "ACK" and starts to wait for a
          "send block" signal ("S/B"). If it doesn't get the "S/B" within
          about one half second, it sends "ACK" again.

               The receiving computer, which has been waiting for this
          "ACK" signal, receives it, sends the "S/B" signal, and
          begins to wait for the block. Should it receive an "ACK" while
          waiting  for  the  block,  or  receives  nothing   at   all   for
          approximately   5  seconds,  it  assumes  that  the  transmitting
          computer hasn't heard the "S/B" and transmits it  again.  In  the
          meantime, the transmitting computer is waiting for the "S/B", and
          upon  reception,  starts  sending  the block. The process has now
          started all over again.
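               The transmitter's side of this exchange can be written as a
          small state machine that always reacts to the code word it
          actually heard. A Python sketch: blocks are represented by the
          string "BLOCK n" rather than real data, `react(None)` stands for
          a listen period expiring, and resending the block when an "S/B"
          arrives while a verdict is awaited is an assumption the text does
          not spell out. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transmitter:
    block: int = 1                 # number of the block currently on offer
    phase: str = "WAIT_VERDICT"    # the block has just been sent
    verdict: Optional[str] = None  # last "GOO"/"BAD" heard

    def react(self, heard: Optional[str]) -> Optional[str]:
        """Return what to say next, or None to keep waiting silently."""
        if heard in ("GOO", "BAD"):
            # First verdict, or a repeat because our ACK was lost:
            # acknowledge whichever verdict we actually heard.
            self.verdict, self.phase = heard, "WAIT_SB"
            return "ACK"
        if self.phase == "WAIT_SB":
            if heard == "S/B":
                if self.verdict == "GOO":
                    self.block += 1          # good: on to the next block
                self.phase = "WAIT_VERDICT"  # bad: same block again
                return f"BLOCK {self.block}"
            return "ACK"                     # no S/B yet: restate ACK
        if heard == "S/B":
            # Assumption: an S/B while awaiting a verdict means the block
            # never arrived, so offer it again.
            return f"BLOCK {self.block}"
        return None                          # nothing to say until GOO/BAD

tx = Transmitter()  # block 1 has just been sent
script = [None, "GOO", None, "GOO", "S/B", "BAD", "S/B"]
said = [tx.react(code) for code in script]
print(said)
```

          The lost "ACK" in the script (the receiver repeats "GOO") costs
          one extra exchange but never moves the transmitter past the
          block.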

               A quick analysis of this system will reveal that it's damned
          near impossible to get any type of  noise  which  could  possibly
          mimic the code sequences required. Also, no noise could stop the
          eventual completion of the above sequence, since each computer is
          always "sending and waiting". If two people keep repeating their
          sentences over and over, and continue  to  listen  to  the  other
          person,  even  a  noisy  room couldn't stop them from hearing one
          another EVENTUALLY.

               Of course, some line noise is just so horrendous, that  even
          this method of communication could fail. Then again, this type of
          noise  would  make  it  damned near impossible for the user to be
          online in the first place, so it can be  considered  an  unlikely
          event.

               But,  should one of the computers go offline for any reason,
          we wouldn't want the other computer to keep looping  and  looping
          until  it  died  of  old  age.  Although  I haven't built in such
          protection into the terminal program I distribute in the public
          domain, my BBS program does have abort code. Should the
          protocol on the BBS have to go through the "Statement and Listen
          Loop" more than 12 times in a row (which is highly unlikely if the
          other  computer  is  still  online),  it will abort the transfer.
          Similar code could be used in your implementation.


          The End-Off Situation

               When the final block is transmitted, the high order part  of
          the block number should be made HEX "FF" (255 decimal). This will
          inform  the  receiving  computer  that  this is the last block of
          data, and to expect no more. The question  now  arises;  how  can
          both  computers be 100% sure that the other is fully aware of the
          file completion? A fair question,  but  not  one  with  a  simple
          answer.

               When  the  transmitting  computer receives the "GOO" for the
          last block, it can be fairly certain that the receiving  computer
          has  received  the  final block, but it must inform the receiving
          computer that it knows this. It does so by sending an "ACK",  but
          cannot  be  sure  the  receiving  computer has received the "ACK"
          unless it gets the  "S/B"  signal  back.  Now,  the  transmitting
          computer  must  acknowledge the reception of the "S/B", but under
          the normal communications syntax, it would now have to send a block.

               This is where the "End-Off" syntax comes  into  play;  after
          receiving the "S/B", the transmitting computer sends back a "SYN"
          signal. In response to that, the receiving computer sends its own
          "S/B"  signal,  then  waits  for  the  final   "S/B"   from   the
          transmitting  computer.  Since  it will not be responding to this
          code, it simply goes  into  a  wait  cycle  for  approximately  5
          seconds.  If it does get the "S/B" within that 5 seconds, it ends
          immediately, but otherwise doesn't really care if it receives the
          code or not since at this stage, there is  a  100%  assurance  of
          both computers knowing things are Ok.

               The transmitting computer need only send three copies of the
          "S/B"  code  at this point, since, as stated above, there is full
          assurance that both computers are finished. NOTE  that  the  code
          words  chosen  for  the  End-Off  situation  are  not necessarily
          related to their apparent function.
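               The transmitter's half of the End-Off can be sketched the
          same way. Assumptions: `heard` lists recognized code words in
          arrival order, with None for a listen period that expired, and
          "SYN" is restated whenever nothing useful is heard (the Statement
          and Listen Loop applied to the new code word). The function name
          is hypothetical.

```python
from typing import List, Optional

def end_off_transmitter(heard: List[Optional[str]]) -> List[str]:
    """What the transmitter says after sending the last block (flag 0xFF)."""
    said: List[str] = []
    phase = "WAIT_GOO"
    for code in heard:
        if phase == "WAIT_GOO":
            if code == "GOO":
                said.append("ACK")
                phase = "WAIT_SB"
            # otherwise keep waiting silently for the verdict
        elif phase == "WAIT_SB":
            if code == "S/B":
                said.append("SYN")        # instead of sending a block
                phase = "WAIT_FINAL_SB"
            else:
                said.append("ACK")        # restate ACK until S/B arrives
        elif phase == "WAIT_FINAL_SB":
            if code == "S/B":
                said.extend(["S/B"] * 3)  # three copies, then finish
                break
            said.append("SYN")            # assumption: restate SYN
    return said

print(end_off_transmitter(["GOO", "S/B", "S/B"]))
print(end_off_transmitter(["GOO", None, "S/B", None, "S/B"]))
```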


          Transferring File Type

               When transferring files from one computer to another, it is
          often  necessary to also transfer the file type, but this must be
          known BEFORE the file  is  opened,  and,  therefore,  before  the
          protocol  begins.  "C1"  does not impose any strict rules on what
          sort of information you transfer about the  files,  if  any,  but
          when  writing  a  terminal  program to communicate with one of my
          bulletin boards, the following should be done:

               Using a full implementation of the "C1" protocol (first
          dummy  block, data block, and End-Off), transmit a single byte of
          data corresponding to the following file types:

           1 = Program File
           2 = SEQ File
           3 = WordPro File

               Transmitting this single piece of data  would  require  that
          TWO blocks be sent; the initial dummy block to set up the size of
          the  first  data block (of which there will be only one, size 8),
          and the data block itself, consisting of 7 header bytes  and  the
          single file type byte.
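               A sketch of the two blocks for this exchange, assuming the
          checksums cover header bytes 5 to 7 followed by the data and the
          additive sum wraps at 16 bits. The 0 written into the data
          block's "size of next block" byte is a placeholder chosen here,
          since no block follows it. `file_type_transfer` is a hypothetical
          name.

```python
import struct

FILE_TYPES = {1: "Program File", 2: "SEQ File", 3: "WordPro File"}

def _block(payload: bytes, next_size: int, block_no: int, last: bool) -> bytes:
    """Header bytes 1-4 (checksums) + bytes 5-7 + payload."""
    body = bytes([next_size, block_no & 0xFF,
                  0xFF if last else (block_no >> 8) & 0xFF]) + payload
    add = clc = 0
    for b in body:
        add = (add + b) & 0xFFFF
        clc ^= b
        clc = ((clc << 1) | (clc >> 15)) & 0xFFFF
    return struct.pack("<HH", add, clc) + body

def file_type_transfer(file_type: int) -> list:
    """Dummy block 0 (announces size 8), then one 8-byte data block."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type {file_type}")
    dummy = _block(b"", next_size=8, block_no=0, last=False)
    data = _block(bytes([file_type]), next_size=0, block_no=1, last=True)
    return [dummy, data]

for blk in file_type_transfer(2):  # SEQ file
    print(len(blk), blk.hex())
```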

               For other applications, one could conceivably transfer much
          more information, including file name, file type, computer  type,
          etc.  It  could  even  be  possible  to  transfer multiple files,
          specifying the number  and  name  of  each  file  in  this  first
          transmission. Alternatively, no one said you HAVE to use this first
          separate transmission; if no information other than the file needs to
          be transmitted, you just send the file and nothing more.


-- 
| Mark R. Rinfret, SofTech, Inc.		mark@unisec.usi.com |
| Guest of UniSecure Systems, Inc., Newport, RI                     |
| UUCP:  {gatech|mirror|cbosgd|uiucdcs|ihnp4}!rayssd!unisec!mark    |
| work: (401)-849-4174	home: (401)-846-7639                        |