[comp.unix.questions] UUCP protocol information reposted from comp.doc

root@wittsend.LBP.HARRIS.COM (Administrator) (11/04/88)

In article <1616@netmbx.UUCP> big.t@netmbx.UUCP (Thomas L.) writes:
>In article <1214@vsedev.VSE.COM> James Logan III writes:
>>In article <5961@killer.DALLAS.TX.US> Robert Johnson writes:
>>>[...]
>>>[Talking 'bout UUCP...]    I need to know everything about it
>>I am also interested in the information.  Please email to me also, or
>>post to the net.
> ^^^^^^^^^^^^^^^
>Better post to the net, please. I'm VERY interested, too.

     Ok maybe I'll take a shot at this.  WARNING - This is the first of two long
postings that are reposts from comp.doc.  If someone thinks it's too much, well
flame on.  I saved these when they were posted to comp.doc some time ago.  I
guess the gang here either missed them or they don't read comp.doc.  As you can
tell from the dates, some of this has been rattling around for quite a while.

     Now for article one!

--------snip snip snip--------

>Article 66 of comp.doc:
>Path: galbp!gatech!udel!rochester!cornell!uw-beaver!mit-eddie!ll-xn!ames!sdcsvax!brian
>From: brian@sdcsvax.UCSD.EDU (Brian Kantor)
>Newsgroups: comp.doc
>Subject: UUCP Protocol Information Potpourri
>Message-ID: <4513@sdcsvax.UCSD.EDU>
>Date: 20 Jan 88 02:04:02 GMT
>Organization: UCSD wombat breeding society
>Lines: 1047
>Approved: brian@cyberpunk.ucsd.edu

The following is a collection of stuff that John Gilmore posted to the net
some time ago; with renewed interest in making nearly everything under
the sun talk uucp, I figured it was time this document appeared somewhere 
that it would get archived for future inquiries.

From ucsdhub!hp-sdd!hplabs!decwrl!sun!hoptoad!gnu Tue Feb  3 13:10:08 PST 1987

[This information came from Tanner Andrews's uucpinfo mailing list.
This is a collection of people interested in writing a version
of uucp in the public domain.  Contact ihnp4!akgua!ucf-cs!ki4pv!tanner
to be added to the mailing list.  There have only been three messages
sent to the list; all are below.

	John Gilmore, hoptoad!gnu]

-----

>Subject: UUCP Protocol Information (issue #1)
>Date: Tue Nov 19 22:04:56 1985

Greetings.  First order of business is the fact that I probably have
a lousy or a slow address for some of you all.  Please complain, and
things will be corrected.  Those of you not receiving this because your
names have been omitted will please inform me, giving a good address.
Those who provided replies but who are not interested in receiving
further information please warn me; maybe things won't cross in the
mail.

Now that we're over that, welcome to the first issue.  There will most
likely be more, as more information is received.  Anyone with comments,
information, or clean suggestions to be shared should please write to
me at the return address given below.  I'll keep the copy of this
mailing list around, and make required additions, deletions, &c.  This
issue is just a "welcome" and mailing-error-finder.  Sorry about the
delay between your "me-too" mailing and the actual goodstuff.

This is being issued as a mailing list because, while I have some of
the required information, there is still rather a shortage.  Research
is expected to improve the situation.

The second issue of this will be coming out almost immediately, and
will contain the bulk of the preliminary information which I have.
It will also include a summary which has been tempered by experience
on this system (type ~uucp_adm/uucico on my terminal, watch the fun
begin).

My address is:
	uucp:	...{decvax|akgua}!ucf-cs!ki4pv!tanner
	csnet:	ki4pv!tanner@ucf-cs.csnet
	arpa:	ki4pv!tanner%ucf-cs.csnet@csnet-relay.arpa

						Tanner Andrews, systems
						CompuData South,
						P.O.Box 3636,
						DeLand, FLA   32723.

>From: ihnp4!akgua!ucf-cs!ki4pv!tanner
>To: ucf-cs!ki4pv-uucpinfo2, ucf-cs!ki4pv-uucpinfo1
>Subject: UUCP Information Issue #02
>Date: Wed Dec 11 23:39:26 1985

This is the second issue; the information below is the start of
what has been collected here.  It is expected that more information
will be collected in the next few weeks, and that information will
be forwarded when/if it becomes available.

 =====================================================
 -- part 1
 =====================================================
This information came via several people, most of whom sent this
exact message (probably from their news archives from before we
joined the net):

	I am posting this over the network because I believe that
	others are interested in knowing the protocols of UUCP.
	Below is listed all the information that I have acquired
	to date. This includes the initial handshaking phase, though
	not the login phase. It also doesn't include information
	about the data transfer protocol for non-packet networks
	(the -G option left off the uucico command line). But, just
	hold on - I am working on that stuff.

	For a point of information : the slave is the UUCP site being
	dialed, and the master is the one doing the calling up. The
	protocols listed in the handshaking and termination phase are
	independent of any UUCP site : they are universal. The stuff in
	the work phase depends on the specific protocol chosen. The
	concepts in the work phase are independent of protocol, ie. the
	sequences are the same. It is just the lower level stuff that
	changes from protocol to protocol. I have access only to level
	g and will document it as I begin to understand it.

	Most of the stuff you see here is gotten from the debug phase
	of the current BSD UUCP system.

	I hope this is useful. Maybe this will get some of the real
	'brains' in UUCP to get off their duffs and provide some real
	detail. In any case, if you have any questions please feel
	free to contact me. I will post any questions and answers over
	the network.


				Chuck Wegrzyn
				{allegra,decvax,ihnp4}!encore!wegrzyn

				(617) 237-1022



			UUCP Handshake Phase
			====================

Master							Slave
------							-----

					<-----		\020Shere\0     (1)


(2)  \020S<mastername> <switches>\0	----->


					<-----		\020RLCK\0      (3)
							\020RCB\0
							\020ROK\0
							\020RBADSEQ\0

					<-----		\020P<protos>\0 (4)


(5) \020U<proto>\0			----->
    \020UN\0


(6) ...


(0) This communication happens outside of the packet communication that
	is supported. If the -G flag is sent on the uucico line, all
	communications will occur without the use of the packet
	simulation software. The communication at this level is just
	the characters listed above.

(1) The slave sends the sequence indicated, while the master waits for
	the message.

(2) The slave waits for the master to send a response message. The message
	is composed of the master's name and some optional switches.
	The switch field can include the following

			-g		(set by the -G switch on the
					 master's uucico command line.
					 Indicates that communication
					 occurs over a packet switch net.)
			-xN		(set by the -x switch on the
					 master's uucico command line.
					 The number N is the debug level
					 desired.)
			-QM		(M is really a sequence number
					 for the communication.)

	Each switch is separated from the others by a 'blank' character.

(3) The slave will send one of the many responses. The meanings appear to
	be :

	RLCK

		This message implies that a 'lock' failure occurred:
		a file called LCK..mastername couldn't be created since
		one already exists. This seems to imply that the master
		is already in communication with the slave.

	RCB

		This message will be sent out if the slave requires a
		call back to the master - the slave will not accept a
		call from the master but will call the master instead.

	ROK

		This message will be returned if the sequence number that
		was sent in the message, attached to the -Q switch, from 
		the master is the same as that computed on the slave.

	RBADSEQ

		Happens if the sequence numbers do not match.

	(Notes on the sequence number - if a machine does not keep
	 sequence numbers, the value is set to 0. If no -Q switch
	 is given in the master's line, the sequence number is
	 defaulted to 0.

	 The sequence file, SQFILE, has the format

		<remotename> <number> <month>/<day>-<hour>:<min>

	 where <remotename> is the name of a master and <number>
	 is the previous sequence number. If the <number> field
	 is not present, or if it is greater than 9998, it is
	 set to 0. The <number> field is an ascii representation
	 of the number. The stuff after the <number> is the time
	 the sequence number was last changed, this information
	 doesn't seem important.)

(4) The slave sends a message that identifies all the protocols that
	it supports. It seems that BSD supports 'g' as the normal case.
	Some sites, such as Allegra, support 'e' and 'g', and a few
	sites support 'f' as well. I have no information about these
	protocols. The exact message sent might look like

		\020Pefg\0

	where efg indicates that this slave supports the e,f and g 
	protocols.

(5) The slave waits for a response from the master with the chosen
	protocol. If the master has a protocol that is in common the
	master will send the message

		\020U<proto>\0

	where <proto> is the protocol (letter) chosen. If no protocol
	is in common, the master will send the message

		\020UN\0

(6) At this point both the slave and master agree to use the designated
	protocol. The first thing that now happens is that the master
	checks for work.
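
A minimal sketch of the message framing and protocol choice described in
steps (1) through (5), for anyone writing their own handshake code.  The
names uu_frame and uu_choose_proto are made up for illustration and are
not taken from any uucico source.  The text does not say how the master
picks among several common protocols, so taking the first match from the
master's own preference list is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define UU_DLE 0x10  /* octal 020, the byte that opens each handshake message */

/*
 * Frame a handshake message as DLE + text + NUL into out.
 * Returns the number of bytes written (including the NUL), or 0 if
 * the buffer is too small.  Hypothetical helper.
 */
size_t uu_frame(const char *text, unsigned char *out, size_t outlen)
{
    size_t n = strlen(text);

    if (n + 2 > outlen)
        return 0;
    out[0] = UU_DLE;
    memcpy(out + 1, text, n);
    out[n + 1] = 0;
    return n + 2;
}

/*
 * Pick the letter for the U reply.  offered is the list from the slave's
 * P message (e.g. "efg"); mine lists the protocols this side implements,
 * most preferred first (an assumption, see above).
 * Returns the chosen letter, or 'N' when nothing is in common.
 */
char uu_choose_proto(const char *offered, const char *mine)
{
    for (; *mine != '\0'; mine++)
        if (strchr(offered, *mine) != NULL)
            return *mine;
    return 'N';
}
```

A master that only speaks 'g' and receives Pefg would call
uu_choose_proto("efg", "g"), get 'g' back, and frame the reply with
uu_frame("Ug", ...).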

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

			UUCP Work Phase
			===============


Master							Slave
------							-----

(a) Master has UUCP Work

	(1) X file1 file2 	----->

					<-----		XN		(2)
							XY

	When the master wants the slave to do a 'uux' command
	it sends the X message. If the slave can't or won't
	do it, the slave will send an XN message. Otherwise
	it will send an XY message.

(b) Master wants to send a file

	(1) S file1 file2 user options  ----->

					<-----		SN2		(2)
							SN4
							SY

			<---- <data exchanged>---->			(3)


					<-----		CY		(4)
							CN5

	If the master wishes to send a file to the slave, it will
	send a S message to the slave. If the slave can or will do
	the transfer, it sends a SY message. If the slave has a
	problem creating work files, it sends a SN4 message. If
	the target file can't be created (because of priv's etc)
	it sends a SN2 message.

	The file1 argument is the source file, and file2 is the
	(almost) target filename. If file2 is a directory, then
	the target filename is composed of file2 concatenated
	with the "last" part of the file1 argument. Note, if the
	file2 argument begins with X, the request is targeted to
	UUX and not the normal send.

	The user argument indicates who, if anyone, is to be notified
	if the file has been copied. This user must be on the slave
	system.

	I am not sure what the options argument does.

	After the data has been exchanged the slave will send one of
	two messages to the master. A CY message indicates that every-
	thing is ok. The message CN5 indicates that the slave had
	some problem moving the file to its permanent location. This
	is not the same as a problem during the exchange of data : this
	causes the slave to terminate operation.

(c) Master wishes to receive a file.

	(1) R file1 file2 user	----->

						<-----	RN2		(2)
							RY mode

	(3)		<---- <data exchanged> ---->

	(4)	CY		----->
		CN5

	If the master wishes the slave to send a file, the master sends
	a R message. If the slave has the file and can send it, the
	slave will respond with the RY message. If the slave can't find
	the file, or won't send it the RN2 message is sent. It doesn't
	appear that the 'mode' field of the RY message is used.

	The argument file1 is the file to transfer, unless it is a
	directory. In this case the file to be transferred is built
	of a concatenation of file1 with the "last" part of the file2
	argument.

	If anything goes wrong with the data transfer, it results in
	both the slave and the master terminating.

	After the data has been transferred, the master will send an
	acknowledgement to the slave. If the transfer and copy to the
	destination file has been successful, the master will send the
	CY message. Otherwise it will send the CN5 message.

(d) Master has no work, or no more work.

	(1) H			----->

				<-----				HY	(2)
								HN

	(3) HY			----->

				<----				HY	(4)

	(5) ...

	The transfer of control is initiated with the master sending
	a H message. This message tells the slave that the master has
	no work, and the slave should look for work.

	If the slave has no work it will respond with the HY message.
	This will tell the master to send an HY message, and turn off
	the selected protocol. When the HY message is received by the
	slave, it turns off the selected protocol as well. Both the
	master and slave enter the UUCP termination phase.

	If the slave does have work, it sends the HN message to the
	master. At this point, the slave becomes the master. After
	the master receives the HN message, it becomes the slave.
	The whole sequence of sending work starts over again. Note,
	the transmission of HN doesn't force the master to send any
	other H messages : it waits for stuff  from the new master.
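
The replies in cases (a) through (d) all share one shape: the request
letter, then Y or N, then sometimes a digit.  The sketch below decodes
that shape.  The struct and function names are invented for this
example, and the reason strings only paraphrase what the text above
says about SN2, SN4, RN2 and CN5; other digits may exist that this
posting does not document.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One decoded work-phase reply such as "SY", "SN4" or "CN5". */
struct uu_reply {
    char cmd;     /* request letter being answered: S, R, X, C or H */
    int  yes;     /* 1 for a Y reply, 0 for an N reply */
    int  reason;  /* digit following N (2, 4 or 5 in the text), else -1 */
};

/* Decode msg into *r.  Returns 0 on success, -1 if msg is malformed. */
int uu_parse_reply(const char *msg, struct uu_reply *r)
{
    if (msg == NULL || msg[0] == '\0' || strchr("SRXCH", msg[0]) == NULL)
        return -1;
    if (msg[1] != 'Y' && msg[1] != 'N')
        return -1;
    r->cmd = msg[0];
    r->yes = (msg[1] == 'Y');
    r->reason = -1;
    if (!r->yes && isdigit((unsigned char)msg[2]))
        r->reason = msg[2] - '0';
    return 0;
}

/* Text for the failure reasons this posting documents; NULL otherwise. */
const char *uu_reason_text(const struct uu_reply *r)
{
    if (r->yes)
        return NULL;
    if (r->cmd == 'S' && r->reason == 2) return "target file can't be created";
    if (r->cmd == 'S' && r->reason == 4) return "can't create work files";
    if (r->cmd == 'R' && r->reason == 2) return "file not found or refused";
    if (r->cmd == 'C' && r->reason == 5) return "can't move file into place";
    return NULL;
}
```

Anything after the second character of a Y reply (the mode in "RY mode",
for instance) is left for the caller to pick apart.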

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


			UUCP Termination Sequence
			=========================

 Master								Slave
 ------								-----

 (1) \020OOOOOO\0		----->

				<-----			\020OOOOOOO\0 (2)



	At this point all conversation has completed normally.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

			UUCP Data Transfers
			===================

	After the initial handshake the systems send messages in one
	of two styles : packet and not packet. A Packet protocol is
	just raw data transfers : there is no protocol or acknowledgements;
	this appears to assume that the lower level is a packet network
	of some type. If the style is not Packet, then extra work is
	done. I am still working on this stuff.

 =====================================================
 -- part 2
 =====================================================
 ** summary of UUCP packets ** 
(this is much like part 1, but shortened and compared against a
live UUCP ~uucp_adm/uucico)

note that all transmissions end with a null, not shown here


(master)		(slave)

 ... dials up ...	<DLE>Shere		says "hello"

<DLE>S<sysname> <opts>				says who he is

		|	<DLE>ROK		says ok to talk
		|	<DLE>RLCK		says locked out
		|	<DLE>RCB		says will call back
		|	<DLE>RBADSEQ		says bad seq num

			<DLE>P<protos>		what protocols he has

<DLE>U<proto>	|				which to use
<DLE>UN		|				use none, hang up


packet driver is turned on at this time, if not told otherwise

 -- if master has work --

to send file to slave...
S <mfilenm> <sfilenm> <user> <opts>		request to send file

		|	SY			ok -- i'll take it
		|	SN2			not permitted
		|	SN4			can't make workfile

<data>						the file is transmitted

		|	CY			finished OK
		|	CN5			can't move into place


to recv file from slave...
R <sfilenm> <mfilenm> <user>			request to recv file

		|	RY<mode>		ok -- here is prot mode
		|	RN2			not permitted

			<data>			file is transmitted

CY		|				worked
CN5		|				can't move into place


to do UUX on slave...
X <file1> <file2>				request to exec file

		|	XY			ok -- will do
		|	XN			nopers

to indicate that he has no more work...
H						no more work

		|	HN			reverse roles
		|	HY			no work here either

to accept slave's claim of no more work...

HY						agrees to hang up

the rest of the hang-up is done OUTSIDE of packet driver
<DLE>OOOOOO					signs off (6*'O')

			<DLE>OOOOOOO		signs off (7*'O')
	

If the slave has work, then the roles are reversed, and the
session proceeds from the label 'loop1' above.  The system
which was the slave is now the master, and the old master is
just the slave.

The <opts> which follow the system name for the start-up sequence
include:
	-g		don't use packet driver (command line -G)
	-xN		debug level (command line -xN)
	-QN		seq number (if systems use this)
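
For what it's worth, here is a rough sketch of how the slave side might
pull these three switches out of the <opts> text.  The struct and the
function name are invented for the example.  Unknown switches are simply
skipped, which is an assumption; nothing above says what a real uucico
does with them.

```c
#include <stdlib.h>

struct uu_opts {
    int  g_flag;  /* 1 if -g was present */
    long debug;   /* N from -xN, 0 if absent */
    long seq;     /* N from -QN; 0 if absent, the default given in part 1 */
};

/* Scan a blank-separated switch list such as "-g -x4 -Q123". */
void uu_parse_opts(const char *opts, struct uu_opts *o)
{
    const char *p = opts;

    o->g_flag = 0;
    o->debug = 0;
    o->seq = 0;
    while (*p != '\0') {
        while (*p == ' ')               /* skip separating blanks */
            p++;
        if (p[0] == '-') {
            if (p[1] == 'g')
                o->g_flag = 1;
            else if (p[1] == 'x')
                o->debug = strtol(p + 2, NULL, 10);
            else if (p[1] == 'Q')
                o->seq = strtol(p + 2, NULL, 10);
        }
        while (*p != '\0' && *p != ' ') /* step over this switch */
            p++;
    }
}
```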

The filenames for <mfilenm> should be complete filenames with
path information; otherwise they are assumed to be in /usr/spool/uucp.
The filenames for <sfilenm> should be either complete filenames
or directory names.  If directory names are used, then the final
component of <mfilenm> is appended to form the complete filename.
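
The directory rule can be sketched as below.  This is only an
illustration: uu_target_name is a made-up name, and the caller is
assumed to already know whether <sfilenm> names a directory (a real
implementation would presumably ask the filesystem).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the final target name in out.  src is <mfilenm>, dst is
 * <sfilenm>, and dst_is_dir says whether dst names a directory.
 * Returns 0 on success, -1 if out is too small.
 */
int uu_target_name(const char *src, const char *dst, int dst_is_dir,
                   char *out, size_t outlen)
{
    const char *base = strrchr(src, '/');
    int n;

    base = (base != NULL) ? base + 1 : src;   /* final component of src */
    if (!dst_is_dir)
        n = snprintf(out, outlen, "%s", dst);
    else if (dst[0] != '\0' && dst[strlen(dst) - 1] == '/')
        n = snprintf(out, outlen, "%s%s", dst, base);
    else
        n = snprintf(out, outlen, "%s/%s", dst, base);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

So sending /usr/spool/uucp/D.x123 to the directory /tmp would give
/tmp/D.x123 under this reading.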

The 'X' command to do UUX on a slave is more than a little unclear.
It doesn't seem to work here, but that may be a microsoft "feature".


Protocol "g", which seems to be the one most commonly used, is supposed
to be a slightly munged version of level 2 of X.25; an article was just
posted in net.unix-wizards (which you probably have already seen) to
this effect.  The article didn't provide any details on the protocol,
but merely mentioned the modifications.

The "packet" mode, with no protocol, does not work under microsoft
implementations, and may have *lots* of trouble working anywhere
else as well.  It evidently requires that zero-length reads happen
every so often to delimit things, such as files being transferred.
This of course can't happen without the packet driver, which was long
gone by the time sys-3 or sys-5 or <your current version> came along.

**********************************
** end of issue #2
**********************************


>From: ihnp4!akgua!ucf-cs!ki4pv!tanner
>To: ucf-cs!ki4pv-uucpinfo, allegra!mp
>Subject: UUCP INFO mailing list issue #03
>Date: Sun Jan 12 19:11:18 1986

The following information, describing the uucp 'g' protocol, was
provided as "nroff" source.  Cut the header and this text off of
the message, and run it through "nroff".
.ce
.B
Packet Driver Protocol
.R
.sp 1
.ce
G. L. Chesson
.br
.ce
Bell Laboratories
.SH
Abstract
.in +.5i
.PP
These notes describe the packet driver link
protocol that was supplied
with the
Seventh Edition of
.UX
and is used by the UUCP program.
.in -.5i
.SH
General
.PP
Information flow between a pair of machines
may be regulated by
first
representing the data 
as sequence-numbered 
.I
packets
.R
of data 
and then establishing conventions that
govern the use of sequence numbers.
The
.I
PK,
.R
or
.I
packet driver,
.R
protocol
is a particular instance of this type of
flow-control discipline.
The technique depends on the notion of a transmission
.I
window
.R
to determine upper and lower bounds for valid
sequence numbers.
The transmitter is allowed to retransmit packets
having sequence numbers
within the window until the receiver indicates that
packets have been correctly received.
Positive acknowledgement from the receiver moves the
window;
negative acknowledgement or no acknowledgement
causes retransmission.
The receiver must ignore duplicate transmission, detect
the various errors that may occur,
and inform the transmitter when packets are 
correctly or incorrectly received.
.PP
The following paragraphs describe the packet formats,
message exchanges,
and framing
used by the protocol as coded
in the UUCP program and the
.UX
kernel.
Although no attempt will be made here to present
internal details of the algorithms that were used,
the checksum routine is supplied
for the benefit of other implementors.
.SH
Packet Formats
.PP
The protocol is defined in terms of message
transmissions of 8-bit bytes.
Each message includes one
.I
control
.R
byte plus a
.I
data segment
.R
of zero or more information bytes.
The allowed data segment sizes range
between 32 and 4096 as determined by the formula
32(2\uk\d) where
k is a 3-bit number.
The packet sequence numbers are likewise constrained
to 3-bits; i.e. counting proceeds modulo-8.
.PP
The control byte is partitioned into three fields as
depicted below.
.bp
.nf
.sp 
.in 1i
.ls 1
bit	7	6	5	4	3	2	1	0
	t	t	x	x	x	y	y	y
.ls 1
.in -1i
.fi
.sp
The
.I
t
.R
bits indicate a packet type and
determine the interpretation to be placed on
the
.I
xxx
.R
and
.I
yyy
.R
fields.
The various interpretations are as follows:
.in +1i
.sp
.nf
.ls 1
.I
tt	interpretation
.sp
.R
00	control packet
10	data packet
11	`short' data packet
01	alternate channel
.ls 1
.fi
.sp
.in -1i
A data segment accompanies all non-control packets.
Each transmitter is constrained to observe the maximum
data segment size
established during initial synchronization by the
receiver that it sends to.
Type 10 packets have maximal size data segments.
Type 11, or `short', packets have zero or more data
bytes but less than the maximum.
The first one or two bytes of the data segment of a
short packet are `count' bytes that
indicate the difference between the
maximum size and the number of bytes in the short
segment.
If the difference is 127 or less, one count
byte is used.
If the difference exceeds 127,
then the low-order seven bits of the difference
are put in the first data byte and the high-order
bit is set as an indicator that the remaining
bits of the difference are in the second byte.
Type 01 packets are never used by UUCP
and need not be discussed in detail here.
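.PP
A minimal sketch of the count-byte rule for short packets follows.
The function names are invented for illustration.
It assumes that a difference of exactly 127 still fits in one byte,
since seven bits are enough to hold it, and that the second byte
carries the difference shifted right by seven bits.
.nf
```c
#include <stddef.h>

/*
 * Write the count byte(s) for a short packet at the start of seg.
 * diff is (maximum segment size) - (bytes of real data).
 * Returns the number of count bytes written, 1 or 2.
 */
size_t pk_put_count(unsigned diff, unsigned char *seg)
{
    if (diff <= 127) {
        seg[0] = (unsigned char)diff;
        return 1;
    }
    seg[0] = (unsigned char)((diff & 0x7Fu) | 0x80u); /* low 7 bits + flag */
    seg[1] = (unsigned char)(diff >> 7);              /* remaining bits */
    return 2;
}

/* Read the count back; stores it in *diff, returns count bytes used. */
size_t pk_get_count(const unsigned char *seg, unsigned *diff)
{
    if ((seg[0] & 0x80u) == 0) {
        *diff = seg[0];
        return 1;
    }
    *diff = (seg[0] & 0x7Fu) | ((unsigned)seg[1] << 7);
    return 2;
}
```
.fi
With a 64-byte segment and 10 bytes of data the difference is 54 and
one count byte is written; with a 4096-byte segment and 96 bytes of
data the difference is 4000 and two are needed.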
.PP
The sequence number of a non-control packet is
given by the
.I
xxx
.R
field.
Control packets are not sequenced.
The newest sequence number,
excluding duplicate transmissions,
accepted by a receiver is placed in the
.I
yyy
.R
field of non-control packets sent to the
`other' receiver.
.PP
There are no data bytes associated with a control packet,
the
.I
xxx
.R
field is interpreted as a control message,
and the
.I
yyy
.R
field is a value accompanying the control message.
The control messages are listed below in decreasing priority.
That is, if several control messages are to be sent,
the lower-numbered ones are sent first.
.in +1i
.nf
.ls 1
.sp
.I
xxx	name		yyy
.R

1	CLOSE	n/a
2	RJ		last correctly received sequence number
3	SRJ		sequence number to retransmit
4	RR		last correctly received sequence number
5	INITC	window size
6	INITB	data segment size
7	INITA	window size
.in -1i
.ls 1
.fi
.sp
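.PP
The control byte layout and the message numbers in the table above
can be written down as follows.
This is a sketch with invented names, not code from the packet driver.
.nf
```c
/* Packet types carried in the two tt bits of the control byte. */
enum pk_type { PK_CONTROL = 0, PK_ALTCHAN = 1, PK_DATA = 2, PK_SHORT = 3 };

/* Control messages carried in xxx when tt is 00. */
enum pk_ctl {
    PK_CLOSE = 1, PK_RJ = 2, PK_SRJ = 3, PK_RR = 4,
    PK_INITC = 5, PK_INITB = 6, PK_INITA = 7
};

/* Pack tt, xxx and yyy into one control byte laid out as ttxxxyyy. */
unsigned char pk_pack(unsigned tt, unsigned xxx, unsigned yyy)
{
    return (unsigned char)(((tt & 3u) << 6) | ((xxx & 7u) << 3) | (yyy & 7u));
}

/* Field extractors for a received control byte. */
unsigned pk_tt(unsigned char c)  { return (c >> 6) & 3u; }
unsigned pk_xxx(unsigned char c) { return (c >> 3) & 7u; }
unsigned pk_yyy(unsigned char c) { return c & 7u; }
```
.fi
An RR acknowledging sequence number 5 would be
pk_pack(PK_CONTROL, PK_RR, 5), and a full-size data packet with
sequence number 3 that acknowledges packet 2 would be
pk_pack(PK_DATA, 3, 2).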
.PP
The CLOSE message indicates that the communications channel
is to be shut down.
The RJ, or
.I
reject,
.R
message indicates that the receiver has detected an error
and the sender should retransmit after using the 
.I
yyy
.R
field to update the window.
This mode of retransmission is usually
referred to as a
`go-back-N' procedure.
The SRJ, or
.I
selective reject,
.R
message carries with it the sequence number of
a particular packet to be retransmitted.
The RR, or
.I
receiver ready,
.R
message indicates that the receiver has detected
no errors; the
.I
yyy
.R
field updates the sender's window.
The INITA/B/C messages are used
to set window and data segment sizes.
Segment sizes are calculated by the formula
32(2\uyyy\d)
as mentioned above,
and window sizes may range between 1 and 7.
.PP
Measurements of the protocol running on communication
links at rates up to 9600 baud showed that
a window size of 2 is optimal
given a packet size greater than 32 bytes.
This means that the link bandwidth can be fully utilized
by the software.
For this reason the SRJ message is not as important as it
might otherwise be.
Therefore the
.UX
implementations no longer generate or respond to SRJ
messages.
It is mentioned here for historical accuracy only,
and one may assume that SRJ is no longer part of the protocol.
.SH
Message Exchanges
.SH
	Initialization
.PP
Messages are exchanged between four cooperating
entities: two senders and two receivers.
This means that the communication channel is thought of
as two independent half-duplex data paths.
For example the window and segment sizes need not
be the same in each direction.
.PP
Initial synchronization is accomplished
with two 3-way handshakes: two each of
INITA/INITB/INITC.
Each sender transmits INITA messages repeatedly.
When an INITA message is received, INITB is
sent in return.
When an INITB message is received
.I
and
.R
an INITB message has been sent,
an INITC message is sent.
The INITA and INITB messages carry 
with them the packet and window size that
each receiver wants to use,
and the senders are supposed to comply.
When a receiver has seen all three
INIT messages, the channel is 
considered to be open.
.PP
It is possible to design a protocol that starts up using
fewer messages than the interlocked handshakes described above.
The advantage of the more complicated design lies in its use as
a research vehicle:
the initial handshake sequence is completely symmetric,
a handshake
can be initiated by one side of the link while the
connection is in use, and the software to do this can
utilize code that would ordinarily be used only once
at connection setup time.
These properties were used in experiments with dynamically
adjusted parameters.
That is, attempts were made to adapt the window and segment
sizes to changes observed in traffic while a link was in use.
Other experiments used the initial
handshake  in a different way
for restarting the protocol without data loss
after machine crashes.
These experiments never worked well in the packet driver and
basically provided the impetus for other protocol designs.
The result 
as far as UUCP is concerned is that initial synchronization
uses the two 3-way handshakes, and the INIT
messages are ignored elsewhere.
.SH
	Data Transport
.PP
After initial synchronization each receiver
sets a modulo-8 incrementing counter R to 0;
each sender sets a similar counter S to 1.
The value of R is always the number of the most recent
correctly received packet.
The value of S is always the first sequence number in
the output window.
Let W denote window size.
Note that the value of W may be different for each sender.
.PP
A sender may transmit packets with sequence numbers
in the range S to (S+W-1)\ mod-8.
At any particular time a receiver expects
arriving packets to have numbers in the range
(R+1)\ mod-8 to (R+W)\ mod-8.
Packets must arrive in sequence number order
and are only acknowledged in order.
That is,
the `next' packet a receiver
will acknowledge must have
sequence number (R+1)\ mod-8.
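.PP
The modulo-8 window arithmetic above can be sketched as below.
The function names are invented for illustration.
A sender would test a sequence number against a window starting at S,
and a receiver against a window starting at (R+1) mod 8.
.nf
```c
/* Next sequence number, counting modulo 8. */
unsigned pk_next(unsigned n)
{
    return (n + 1u) & 7u;
}

/*
 * Nonzero when seq lies among the w sequence numbers that start at
 * first, counting modulo 8.  Unsigned subtraction wraps, so masking
 * with 7 gives the distance from first to seq modulo 8.
 */
int pk_in_window(unsigned seq, unsigned first, unsigned w)
{
    return ((seq - first) & 7u) < w;
}
```
.fi
With S = 6 and W = 3 the sender may transmit 6, 7 and 0;
a receiver with R = 7 will acknowledge packet 0 next.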
.PP
A receiver acknowledges receipt of data packets
by arranging for the value of its R counter to be
sent across the channel
where it will be used to update an S counter.
This is done in two ways.
If data is flowing in both directions across a
channel then each receiver's current R value is
carried in the
.I
yyy
.R
field of non-control packets.
Otherwise when there is no bidirectional
data flow,
each receiver's R value is transmitted across the link
as the
.I
yyy
.R
field of an RR control packet.
.PP
Error handling is up to the discretion
of the receiver.
It can ignore all errors in which case
transmitter timeouts must provide for
retransmission.
The receiver may also generate RJ 
error control packets.
The
.I
yyy
.R
field of an incoming RJ message replaces
the S value of the local sender and
constitutes a request for retransmission to start
at that sequence number.
The
.I
yyy
.R
field of an incoming SRJ message selects a particular
packet for retransmission.
.PP
The resemblance between the flow control procedure in the
packet driver and that defined for X.25 is no accident.
The packet driver protocol began life as an attempt at
cleaning up X.25.
That is why, for example,
control information is uniform in length (one byte),
there is no RNR message (not needed),
and there is but one timeout defined
in the sender.
.SH
	Termination
.PP
The CLOSE message is used to terminate communications.
Software on either or both ends of the communication
channel may initiate termination.
In any case when one end wants to terminate it sends
CLOSE messages until one is received from the other end
or until a programmable limit on the number of CLOSE
messages is reached.
Receipt of a CLOSE message causes a CLOSE message to be sent.
In the 
.UX
environment
it also causes the SIGPIPE or
`broken pipe' signal to be sent to
the local process using the communication channel.
.SH
	Framing
.PP
The term
.I
framing
.R
is used to denote the technique by which the
beginning and end of a message are detected
in a byte stream;
.I
error control
.R
denotes the method by which transmission
errors are detected.
Strategies for framing and error control depend
upon
additional information being transmitted along
with the control byte and data segment,
and the choice of a particular strategy usually
depends on characteristics of input/output
devices and transmission media.
.PP
Several framing techniques are in use in support
of PK protocol implementations,
not all of which can be described in detail here.
The technique used on asynchronous serial lines
will be described.
.PP
A six byte
framing
.I
envelope
.R
is constructed using the control byte
C of a packet and five other bytes as
depicted below.
.in +1i
<DLE><k><c0><c1><C><x>
.in -1i
The <DLE> symbol denotes the ASCII ctrl/P character.
If the envelope is to be followed by a data segment,
<k> has the value
log\d2\u(size)-4;
i.e. 1 \(<= k \(<= 8.
If k is 9, then the envelope represents a control packet.
The <c0> and <c1> bytes are the low-order and high-order
bytes respectively of a 16-bit checksum of the data segment,
if there is one.
For control packets <c1> is zero and <c0> is the same
as the control byte C.
The <x> byte is the exclusive-or of <k><c0><c1><C>.
Error control is accomplished by checking 
a received framing envelope for compliance with the definition,
and comparing a checksum function of the data segment
with <c0><c1>.
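.PP
A sketch of building and checking the six-byte envelope, following
the description above literally, is given here.
All names are invented for illustration.
The check value is passed in by the caller; for a control packet the
text says to pass the control byte itself.
This has not been checked against a running uucico, and a deployed
implementation may derive the check bytes differently, so treat it
as a starting point only.
.nf
```c
#define PK_DLE       0x10  /* ASCII ctrl/P */
#define PK_K_CONTROL 9     /* k value that marks a control packet */

/*
 * k for a data segment of the given size: log2(size) - 4, so that
 * 32 gives 1 and 4096 gives 8.  Returns 0 for any other size.
 */
unsigned pk_k_for_size(unsigned size)
{
    unsigned k;

    for (k = 1; k <= 8; k++)
        if (size == (16u << k))
            return k;
    return 0;
}

/* Fill env with DLE, k, c0, c1, C and the xor byte x. */
void pk_build_envelope(unsigned k, unsigned check, unsigned char c,
                       unsigned char env[6])
{
    env[0] = PK_DLE;
    env[1] = (unsigned char)k;
    env[2] = (unsigned char)(check & 0xFFu);         /* c0, low byte  */
    env[3] = (unsigned char)((check >> 8) & 0xFFu);  /* c1, high byte */
    env[4] = c;
    env[5] = (unsigned char)(env[1] ^ env[2] ^ env[3] ^ env[4]);
}

/* Nonzero when env starts with DLE, has a legal k, and x matches. */
int pk_envelope_ok(const unsigned char env[6])
{
    if (env[0] != PK_DLE || env[1] < 1 || env[1] > PK_K_CONTROL)
        return 0;
    return (unsigned char)(env[1] ^ env[2] ^ env[3] ^ env[4]) == env[5];
}
```
.fi
A receiver would call pk_envelope_ok first and only then compare the
checksum of the data segment, if any, against the c0 and c1 bytes.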
.PP
This particular framing strategy assumes data segments
are constant-sized:
the `unused' bytes in a short packet are actually
transmitted.
This creates a certain amount of overhead which
can be eliminated by a more complicated framing technique.
The advantage of this strategy is that i/o
devices can be programmed to take advantage of the
constant-sized framing envelopes and data segments.
.bp
.PP
The checksum calculation is displayed below as a C function.
Note that the code is not truly portable because
the definitions of
.I short
and
.I char
are not necessarily uniform across all machines
that might support this language.
This code assumes that
.I short
and
.I char
are 16 and 8-bits respectively.
.PP
.in +.5i
.nf
.ft CW
.ls 1
/* [Original document's version corrected to actual version] */
chksum(s,n)
register char *s;
register n;
{
	register short sum;
	register unsigned short t;
	register short x;

	sum = -1;
	x = 0;

	do {
		if (sum<0) {
			sum <<= 1;
			sum++;
		} else
			sum <<= 1;
		t = sum;
		sum += (unsigned)*s++ & 0377;
		x += sum^n;
		if ((unsigned short)sum <= t) {
			sum ^= x;
		}
	} while (--n > 0);

	return(sum);
}
.fi
.in -.5i
.ft R
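.PP
Since the routine above depends on the sizes of short and char,
here is a restatement using the fixed-width types from stdint.h.
It is a sketch that is meant to compute the same 16-bit value as the
function above on a machine with 16-bit short; the name pk_chksum
and the use of uint16_t are choices made for this example.
As in the original, n must be at least 1 because the loop always
consumes one byte before testing the count.
.nf
```c
#include <stdint.h>

/* 16-bit checksum of the n bytes at s; n must be 1 or more. */
uint16_t pk_chksum(const unsigned char *s, int n)
{
    uint16_t sum = 0xFFFF;  /* the original starts from -1 */
    uint16_t x = 0;
    uint16_t t;

    do {
        /* rotate left one bit; the original does this with a sign test */
        sum = (uint16_t)((sum << 1) | (sum >> 15));
        t = sum;
        sum = (uint16_t)(sum + *s++);
        x = (uint16_t)(x + (sum ^ (uint16_t)n));
        /* fold x in whenever adding the byte did not increase sum */
        if (sum <= t)
            sum ^= x;
    } while (--n > 0);

    return sum;
}
```
.fi
The result would then be split into its low and high bytes for the
framing envelope.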
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   gnu@ingres.berkeley.edu
Love your country but never trust its government.
		     -- from a hand-painted road sign in central Pennsylvania

---

root@wittsend.LBP.HARRIS.COM (Administrator) (11/04/88)

     Ok maybe I'll take a shot at this.  WARNING - This is the second of two
long postings that are reposts from comp.doc.  If someone thinks it's too
much, well flame on.  I saved these when they were posted to comp.doc some
time ago.  I guess the gang here either missed them or they don't read
comp.doc.  As you can tell from the dates, some of this has been rattling
around for quite a while.

     Now for article two!

--------snip snip snip--------

>Article 92 of comp.doc:
>Path: galbp!dcatla!gatech!bloom-beacon!husc6!think!ames!ucsd!brian
>From: brian@ucsd.EDU (Brian Kantor)
>Newsgroups: comp.doc
>Subject: UUCP Packet Protocol
>Message-ID: <815@ucsd.EDU>
>Date: 6 May 88 11:31:00 GMT
>Distribution: na
>Organization: The Avant-Garde of the Now, Ltd.
>Lines: 391
>Approved: brian@cyberpunk.ucsd.edu


[This is the formatted version of the nroff -ms source posted to
comp.std.unix on May 5, 1988 by JSQ.  If you missed that, you can
get a copy from the uunet archives of that group.  -brian]





                   Packet Driver Protocol

                       G. L. Chesson
                     Bell Laboratories

Abstract

          These notes describe the packet driver link proto-
     col that was supplied with the Seventh Edition of UNIX*
     and is used by the UUCP program.

General

     Information flow between a  pair  of  machines  may  be
regulated  by  first  representing  the  data  as  sequence-
numbered packets of data and then  establishing  conventions
that  govern the use of sequence numbers.  The PK, or packet
driver, protocol is a particular instance of  this  type  of
flow-control  discipline.   The  technique  depends  on  the
notion of a transmission window to determine upper and lower
bounds  for  valid  sequence  numbers.   The  transmitter is
allowed to retransmit packets having sequence numbers within
the  window  until  the receiver indicates that packets have
been correctly received.  Positive acknowledgement from  the
receiver  moves  the  window; negative acknowledgement or no
acknowledgement causes retransmission.   The  receiver  must
ignore  duplicate  transmission,  detect  the various errors
that may occur, and inform the transmitter when packets  are
correctly or incorrectly received.

     The following paragraphs describe the  packet  formats,
message exchanges, and framing used by the protocol as coded
in the UUCP  program  and  the  UNIX  kernel.   Although  no
attempt will be made here to present internal details of the
algorithms that were used, the checksum routine is  supplied
for the benefit of other implementors.

Packet Formats

     The protocol is defined in terms of  message  transmis-
sions  of  8-bit  bytes.   Each message includes one control
byte plus a data segment of zero or more information  bytes.
The  allowed data segment sizes range between 32 and 4096 as
determined by the formula 32(2^k) where k is a 3-bit number.
The  packet  sequence numbers are likewise constrained to 3-
bits; i.e. counting proceeds modulo-8.

     The control byte is partitioned into  three  fields  as
depicted below.


_________________________
* UNIX is a trademark of Bell Laboratories.




          bit     7       6       5       4       3       2       1       0
                  t       t       x       x       x       y       y       y

The  t  bits  indicate  a  packet  type  and  determine  the
interpretation  to be placed on the xxx and yyy fields.  The
various interpretations are as follows:

          tt      interpretation

          00      control packet
          10      data packet
          11      `short' data packet
          01      alternate channel
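
The field layout above can be sketched in C as  follows.   This
is only an illustration: the names pk_type, pk_ctl, pk_split and
pk_join are invented here and do not come from the UUCP sources.

```c
#include <stdint.h>

/* Packet types carried in the two high-order "tt" bits. */
enum pk_type {
    PK_CONTROL = 0,   /* 00: control packet      */
    PK_ALTCHAN = 1,   /* 01: alternate channel   */
    PK_DATA    = 2,   /* 10: data packet         */
    PK_SHORT   = 3    /* 11: `short' data packet */
};

/* The three fields of a control byte (hypothetical struct). */
struct pk_ctl {
    unsigned tt;      /* bits 7-6 */
    unsigned xxx;     /* bits 5-3 */
    unsigned yyy;     /* bits 2-0 */
};

/* Split a control byte into its tt, xxx and yyy fields. */
struct pk_ctl pk_split(uint8_t c)
{
    struct pk_ctl f;

    f.tt  = (c >> 6) & 03;
    f.xxx = (c >> 3) & 07;
    f.yyy = c & 07;
    return f;
}

/* Assemble a control byte from its three fields. */
uint8_t pk_join(unsigned tt, unsigned xxx, unsigned yyy)
{
    return (uint8_t)(((tt & 03) << 6) | ((xxx & 07) << 3) | (yyy & 07));
}
```

For instance, the byte 0x8A (binary 10 001 010) splits into tt=2,
xxx=1, yyy=2.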

A data segment accompanies all  non-control  packets.   Each
transmitter  is constrained to observe the maximum data seg-
ment size established during initial synchronization by  the
receiver  that  it  sends  to.  Type 10 packets have maximal
size data segments.  Type 11, or `short', packets have  zero
or more data bytes but less than the maximum.  The first one
or two bytes of the data  segment  of  a  short  packet  are
`count'  bytes that indicate the difference between the max-
imum size and the number of bytes in the short segment.   If
the difference is less than 127, one count byte is used.  If
the difference exceeds 127, then the low-order seven bits of
the  difference are put in the first data byte and the high-
order bit is set as an indicator that the remaining bits  of
the  difference are in the second byte.  Type 01 packets are
never used by UUCP and need not be discussed in detail here.
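
A receiver might recover the length of a  short  segment  along
the  lines  of the sketch below.  It assumes that the difference
counts data bytes only, so that the data  follows  the  count
byte(s); the function name pk_short_len is invented here.  Only
the decoding side is shown,  since  the  indicator  bit  in  the
first byte settles whether a second count byte is present.

```c
#include <stddef.h>

/*
 * Decode the count byte(s) at the start of a short packet's data
 * segment.  seg points at the first byte of the segment and segsize
 * is the negotiated maximum segment size.  The number of count bytes
 * (1 or 2) is stored in *hdr; the return value is segsize minus the
 * encoded difference.
 *
 * Assumption: the difference refers to data bytes, which begin
 * right after the count byte(s).
 */
size_t pk_short_len(const unsigned char *seg, size_t segsize, size_t *hdr)
{
    size_t diff = seg[0] & 0x7F;        /* low-order seven bits */

    *hdr = 1;
    if (seg[0] & 0x80) {                /* indicator: second byte follows */
        diff |= (size_t)seg[1] << 7;    /* remaining high-order bits */
        *hdr = 2;
    }
    return segsize - diff;
}
```

With a 64-byte segment and a first byte of 0x36 (54), the  short
packet carries 10 data bytes after one count byte.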

     The sequence number of a non-control packet is given by
the  xxx  field.   Control  packets  are not sequenced.  The
newest sequence number, excluding  duplicate  transmissions,
accepted  by  a  receiver is placed in the yyy field of non-
control packets sent to the `other' receiver.

     There are no  data  bytes  associated  with  a  control
packet,  the  xxx field is interpreted as a control message,
and the yyy field is a value accompanying the  control  mes-
sage.   The  control messages are listed below in decreasing
priority.  That is, if several control messages  are  to  be
sent, the lower-numbered ones are sent first.

          xxx     name    yyy

          1       CLOSE   n/a
          2       RJ      last correctly received sequence number
          3       SRJ     sequence number to retransmit
          4       RR      last correctly received sequence number
          5       INITC   window size
          6       INITB   data segment size
          7       INITA   window size
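
As a rough illustration of the table, the sketch  below  builds
a  control  byte  (tt = 00) and picks the next message to send
from a set of pending ones, lowest  number  first.   The  names
pk_msg,  pk_control  and  pk_next_msg,  and the use of a bitmask
for pending messages, are assumptions made for this example.

```c
/* Control message codes carried in the xxx field. */
enum pk_msg {
    PK_CLOSE = 1,
    PK_RJ    = 2,
    PK_SRJ   = 3,
    PK_RR    = 4,
    PK_INITC = 5,
    PK_INITB = 6,
    PK_INITA = 7
};

/* Build a control byte: tt = 00, xxx = message, yyy = its value. */
unsigned char pk_control(enum pk_msg msg, unsigned yyy)
{
    return (unsigned char)((((unsigned)msg & 07) << 3) | (yyy & 07));
}

/*
 * Given a bitmask with bit m set for each pending message m,
 * return the highest-priority (lowest-numbered) one, or 0 if
 * nothing is pending.
 */
int pk_next_msg(unsigned pending)
{
    int m;

    for (m = PK_CLOSE; m <= PK_INITA; m++)
        if (pending & (1u << m))
            return m;
    return 0;
}
```

An RR carrying sequence number 3 is thus the byte 0x23.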




     The CLOSE message  indicates  that  the  communications
channel  is  to  be  shut  down.  The RJ, or reject, message
indicates that the receiver has detected an  error  and  the
sender should retransmit after using the yyy field to update
the window.  This mode of retransmission is usually referred
to  as  a  `go-back-N'  procedure.   The  SRJ,  or selective
reject, message carries with it the  sequence  number  of  a
particular  packet to be retransmitted.  The RR, or receiver
ready, message indicates that the receiver has  detected  no
errors;  the  yyy  field  updates  the sender's window.  The
INITA/B/C messages are used to set window and  data  segment
sizes.  Segment sizes are calculated by the formula 32(2^yyy)
as mentioned above, and window sizes may range between 1 and
7.
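
The segment size formula amounts to a  left  shift.   A  minimal
sketch, with invented function names:

```c
/* Segment size in bytes selected by a 3-bit yyy value: 32 * 2^yyy. */
unsigned pk_segsize(unsigned yyy)
{
    return 32u << (yyy & 07);
}

/*
 * Inverse mapping: return the yyy value for a legal segment size,
 * or -1 if the size is not of the form 32 * 2^yyy with 0 <= yyy <= 7.
 */
int pk_segcode(unsigned size)
{
    unsigned yyy;

    for (yyy = 0; yyy < 8; yyy++)
        if (pk_segsize(yyy) == size)
            return (int)yyy;
    return -1;
}
```

The eight possible sizes are 32, 64, 128, 256, 512, 1024,  2048
and 4096 bytes.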

     Measurements of the protocol running  on  communication
links  at rates up to 9600 baud showed that a window size of
2 is optimal given a packet  size  greater  than  32  bytes.
This  means that the link bandwidth can be fully utilized by
the software.  For this reason the SRJ  message  is  not  as
important  as  it  might  otherwise  be.  Therefore the UNIX
implementations no longer generate or respond  to  SRJ  mes-
sages.   It  is mentioned here for historical accuracy only,
and one may assume that SRJ is no longer part of the  proto-
col.

Message Exchanges

        Initialization

     Messages are exchanged between four  cooperating  enti-
ties:  two  senders  and two receivers.  This means that the
communication channel  is  thought  of  as  two  independent
half-duplex  data paths.  For example the window and segment
sizes need not be the same in each direction.

     Initial synchronization is accomplished with two  3-way
handshakes:  two  each  of  INITA/INITB/INITC.   Each sender
transmits INITA messages repeatedly.  When an INITA  message
is received, INITB is sent in return.  When an INITB message
is received and an INITB message has  been  sent,  an  INITC
message  is  sent.   The INITA and INITB messages carry with
them the packet and window size that each receiver wants  to
use,  and  the  senders  are  supposed  to  comply.   When a
receiver has seen all three INIT messages,  the  channel  is
considered to be open.
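
The reply rules for one end of the link can be  sketched  as  a
small  state machine.  This is an illustration of the paragraph
above, not the code used in UUCP: the struct and function names
are  invented,  and  the repeated transmission of INITA and the
recording of the window and segment sizes are left out.

```c
/* INIT message codes (xxx field of a control packet). */
enum { INITC = 5, INITB = 6, INITA = 7 };

/* Handshake state for one end (hypothetical struct). */
struct pk_init {
    int      sent_b;   /* nonzero once we have sent INITB   */
    unsigned seen;     /* bit m set once message m was seen */
};

/*
 * Feed one received INIT message to the state machine.
 * Returns the message to send in reply, or 0 for no reply.
 */
int pk_init_step(struct pk_init *st, int msg)
{
    st->seen |= 1u << msg;

    if (msg == INITA) {             /* INITA received: answer INITB */
        st->sent_b = 1;
        return INITB;
    }
    if (msg == INITB && st->sent_b) /* INITB received and INITB sent */
        return INITC;
    return 0;
}

/* The channel is open once all three INIT messages have been seen. */
int pk_init_open(const struct pk_init *st)
{
    unsigned all = (1u << INITA) | (1u << INITB) | (1u << INITC);

    return (st->seen & all) == all;
}
```

Each end runs the same logic, which is what makes the  handshake
symmetric.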

     It is possible to design  a  protocol  that  starts  up
using   fewer   messages  than  the  interlocked  handshakes
described above.  The  advantage  of  the  more  complicated
design  lies  in  its use as a research vehicle: the initial
handshake sequence is completely symmetric, a handshake  can
be initiated by one side of the link while the connection is
in use, and the software to do this can  utilize  code  that
would ordinarily be used only once at connection setup time.
These properties were used in experiments  with  dynamically
adjusted  parameters.   That  is, attempts were made to adapt
the window and segment sizes to changes observed in  traffic
while a link was in use.  Other experiments used the initial
handshake  in a different way for  restarting  the  protocol
without  data loss after machine crashes.  These experiments
never worked well in the packet driver  and  basically  pro-
vided the impetus for other protocol designs.  The result as
far as UUCP is concerned  is  that  initial  synchronization
uses  the  two  3-way  handshakes, and the INIT messages are
ignored elsewhere.

        Data Transport

     After initial  synchronization  each  receiver  sets  a
modulo-8  incrementing  counter  R  to 0; each sender sets a
similar counter S to 1.  The value of R is always the number
of  the most recent correctly received packet.  The value of
S is always the first sequence number in the output  window.
Let  W  denote window size.  Note that the value of W may be
different for each sender.

     A sender may transmit packets with sequence numbers  in
the  range  S  to  (S+W-1) mod-8.   At any particular time a
receiver expects arriving packets to  have  numbers  in  the
range  (R+1) mod-8  to  (R+W) mod-8.  Packets must arrive in
sequence number order and are only  acknowledged  in  order.
That  is, the `next' packet a receiver will acknowledge must
have sequence number (R+1) mod-8.
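
The modulo-8 window tests can be written compactly with unsigned
arithmetic.  A sketch, with invented function names:

```c
/*
 * True if seq lies in the window of w sequence numbers that starts
 * at first, counting modulo 8.  For a sender, first is S; for a
 * receiver, first is (R+1) mod 8.
 */
int pk_in_window(unsigned first, unsigned w, unsigned seq)
{
    /* Unsigned subtraction wraps, and masking with 07 reduces mod 8. */
    return ((seq - first) & 07) < w;
}

/* True if seq is the one packet a receiver may acknowledge next. */
int pk_is_next(unsigned r, unsigned seq)
{
    return seq == ((r + 1) & 07);
}
```

With S = 6 and W = 3, the sender may transmit packets  6,  7  and
0, but not 1.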

     A receiver acknowledges  receipt  of  data  packets  by
arranging  for  the value of its R counter to be sent across
the channel where it will be used to update  an  S  counter.
This is done in two ways.  If data is flowing in both direc-
tions across a channel then each receiver's current R  value
is  carried in the yyy field of non-control packets.  Other-
wise  when  there  is  no  bidirectional  data  flow,   each
receiver's R value is transmitted across the link as the yyy
field of an RR control packet.
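
How a sender might apply an acknowledgement is sketched  below.
The  rule  S = (R+1) mod 8 is inferred from the initial values
(R = 0, S = 1) and the definitions of R and S given earlier; the
function  name  and the `outstanding' count are invented for the
example.

```c
/*
 * Apply an acknowledgement carrying the peer's R value (from the yyy
 * field of a non-control packet or of an RR message).
 *
 * *s is the first sequence number in the output window and
 * outstanding is the number of packets sent but not yet
 * acknowledged.  Returns the number of packets released and
 * advances *s; an acknowledgement outside the outstanding range
 * is ignored and 0 is returned.
 */
unsigned pk_ack(unsigned *s, unsigned outstanding, unsigned r)
{
    unsigned next  = (r + 1) & 07;      /* new start of the window */
    unsigned freed = (next - *s) & 07;  /* distance moved, mod 8   */

    if (freed > outstanding)
        return 0;
    *s = next;
    return freed;
}
```

If packets 1 and 2 are outstanding (S = 1) and an  RR  with  yyy
= 2 arrives, both are released and S becomes 3.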

     Error handling is up to the discretion of the receiver.
It  can ignore all errors in which case transmitter timeouts
must provide for retransmission.  The receiver may also gen-
erate  RJ error control packets.  The yyy field of an incom-
ing RJ message replaces the S value of the local sender  and
constitutes  a  request  for retransmission to start at that
sequence number.  The yyy field of an incoming  SRJ  message
selects a particular packet for retransmission.

     The resemblance between the flow control  procedure  in
the  packet driver and that defined for X.25 is no accident.
The packet driver protocol  began  life  as  an  attempt  at
cleaning  up  X.25.   That  is  why,  for  example,  control
information is uniform in length (one byte), there is no RNR
message  (not  needed), and there is but one timeout defined
in the sender.

        Termination

     The CLOSE message is used to terminate  communications.
Software on either or both ends of the communication channel
may initiate termination.  In any case when one end wants to
terminate it sends CLOSE messages until one is received from
the other end or until a programmable limit on the number of
CLOSE  messages  is  reached.   Receipt  of  a CLOSE message
causes a CLOSE message to be sent.  In the UNIX  environment
it  also  causes  the  SIGPIPE or `broken pipe' signal to be
sent to the local process using the communication channel.

        Framing

     The term framing is used to  denote  the  technique  by
which  the  beginning  and end of a message is detected in a
byte stream; error  control  denotes  the  method  by  which
transmission  errors  are  detected.  Strategies for framing
and error control depend upon additional  information  being
transmitted  along  with  the control byte and data segment,
and the choice of a particular strategy usually  depends  on
characteristics  of  input/output  devices  and transmission
media.

     Several framing techniques are in use in support of  PK
protocol  implementations, not all of which can be described
in detail here.  The technique used on  asynchronous  serial
lines will be described.

     A six byte framing envelope is  constructed  using  the
control  byte C of a packet and five other bytes as depicted
below.
          <DLE><k><c0><c1><C><x>
The <DLE> symbol denotes the ASCII ctrl/P character.  If the
envelope  is  to  be followed by a data segment, <k> has the
value log2(size)-4; i.e. 1 <= k <= 8.  If k  is  9,  then  the
envelope  represents  a  control  packet.  The <c0> and <c1>
bytes are the low-order and high-order bytes respectively of
0xAAAA  minus  a 16-bit checksum.  For control packets, this
16-bit checksum is the same as the control byte C.  For data
packets,  the  checksum  is calculated by the program below.
The <x> byte is the exclusive-or of  <k><c0><c1><C>.   Error
control  is  accomplished  by  checking  a  received framing
envelope for compliance with the definition, and comparing a
checksum function of the data segment with <c0><c1>.
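
The envelope definition can be illustrated with the sketch below.
The  function  names  are invented here.  For control packets the
16-bit checksum is the control byte itself, as stated above;  for
data  packets  the caller is assumed to supply the checksum value
computed over the data segment.

```c
#include <stdint.h>

#define PK_DLE    0x10      /* ASCII ctrl/P */
#define PK_MAGIC  0xAAAAu   /* checksum is subtracted from this */

/* Fill in a six-byte envelope <DLE><k><c0><c1><C><x>. */
void pk_frame(uint8_t env[6], unsigned k, uint8_t c, uint16_t sum)
{
    uint16_t chk = (uint16_t)(PK_MAGIC - sum);

    env[0] = PK_DLE;
    env[1] = (uint8_t)k;
    env[2] = (uint8_t)(chk & 0xFF);     /* <c0>: low-order byte  */
    env[3] = (uint8_t)(chk >> 8);       /* <c1>: high-order byte */
    env[4] = c;
    env[5] = (uint8_t)(env[1] ^ env[2] ^ env[3] ^ env[4]);
}

/* Envelope for a control packet: k is 9, checksum is C itself. */
void pk_frame_control(uint8_t env[6], uint8_t c)
{
    pk_frame(env, 9, c, c);
}

/* Check a received envelope against the definition. */
int pk_frame_ok(const uint8_t env[6])
{
    if (env[0] != PK_DLE || env[1] < 1 || env[1] > 9)
        return 0;
    return (env[1] ^ env[2] ^ env[3] ^ env[4]) == env[5];
}

/* Bytes of data segment that follow the envelope: 2^(k+4), or 0. */
unsigned pk_seglen(unsigned k)
{
    return (k >= 1 && k <= 8) ? 1u << (k + 4) : 0;
}
```

For an RR control byte of 0x23, pk_frame_control  produces  the
bytes 10 09 87 AA 23 07 (hex).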

     This particular framing strategy assumes data  segments
are constant-sized: the `unused' bytes in a short packet are
actually transmitted.  This  creates  a  certain  amount  of
overhead  which  can  be  eliminated  by  a more complicated
framing technique.  The advantage of this strategy  is  that
i/o  devices  can  be  programmed  to  take advantage of the
constant-sized framing envelopes and data segments.


     The checksum calculation is  displayed  below  as  a  C
function.   Note that the code is not truly portable because
the definitions of short and char are not  necessarily  uni-
form  across  all machines that might support this language.
This code assumes that short and  char  are  16  and  8-bits
respectively.

     /* [Original document's version corrected to actual version] */
     chksum(s,n)
     register char *s;
     register n;
     {
             register short sum;
             register unsigned short t;
             register short x;

             sum = -1;
             x = 0;

             do {
                     if (sum<0) {
                             sum <<= 1;
                             sum++;
                     } else
                             sum <<= 1;
                     t = sum;
                     sum += (unsigned)*s++ & 0377;
                     x += sum^n;
                     if ((unsigned short)sum <= t) {
                             sum ^= x;
                     }
             } while (--n > 0);

             return(sum);
     }

The checksum routine used in  gnuucp  has  been  updated  to
avoid  depending  on  the particular sizes of char and short
variables.  As long as a char holds 8 bits or  more,  and  a
short  holds  16  bits or more, the code will work.  To test
it, uncomment the ``#define short long'' below.  A good com-
piler  produces the same code from this function as from the
less portable version.

     #define HIGHBIT16       0x8000
     #define JUST16BITS      0xFFFF
     #define JUST8BITS       0x00FF
     #define MAGIC           0125252         /* checksum is subtracted from this */

     int
     pktchksum(msg, bytes)
             unsigned char *msg;
             int bytes;
     {
             return (JUST16BITS &
                     (MAGIC - (chksum(&msg[6], bytes) ^ (JUST8BITS & msg[4]))));
     }


     int
     chksum(s,n)
     register unsigned char *s;
     register n;
     {
     /* #define short long   /* To make sure it works with shorts > 16 bits */
             register short sum;
             register unsigned short t;
             register short x;

             sum = (-1) & JUST16BITS;
             x = 0;
             do {
                     /* Rotate "sum" left by 1 bit, in a 16-bit barrel */
                     if (sum & HIGHBIT16)
                     {
                             sum = (1 + (sum << 1)) & JUST16BITS;
                     }
                     else
                             sum <<= 1;
                     t = sum;
                     sum = (sum + (*s++ & JUST8BITS)) & JUST16BITS;
                     x += sum ^ n;
                     if ((unsigned short)sum <= t)
                             sum = (sum ^ x) & JUST16BITS;
             } while (--n > 0);

             return(sum);
     #undef short            /* End of debugging check */
     }


---