[net.dcom] XON/XOFF Deadlock

dennis@rlgvax.UUCP (Dennis Bednar) (12/13/85)

multiplexor(MUX), flow control, out of band data


Introduction - Overview

I recently came across a very interesting situation which can
cause deadlocks over an asynchronous serial computer link when
both machines use the xon/xoff protocol, and the traffic is full-duplex
(both machines can send data to each other at the same time).
The deadlock causes both machines to cease transmission - permanently.
The discussion below is dependent on how the tty driver
in the kernel interfaces to the terminal multiplexor (MUX) hardware:
the assumption being that part of the XON/XOFF processing is done
in the host, and part of the XON/XOFF processing is done in the MUX.
The problem described below occurs when the MUX itself processes XON/XOFF's
received from the network, i.e., the host interfaces to a semi-intelligent MUX.
If the MUX is dumb, that is, the host tty driver has full control
of the MUX, then the problem described below *may* not appear.


			Figure 1 below - Overview

Application	host tty driver		 tty hardware	asynch serial line
-----------	--------------------	-------------	-----------------
Host-send >---> Send Half tty driver --> MUX Send ---> to network tty device
Host-rcv  <---  Rcv  Half tty driver <-- MUX Rcv  <--- from tty



		Figure 2 (below) - Data Received Too fast from network

Host-rcv  <--- 	Rcv Half generates XOFF <-- MUX Rcv <-- data too fast
		to send out of MUX         Passes All
		(to net) when rcv queue    Data Up to HOST
		goes above high water mark,
		and generates XON to send
		out of MUX when rcv queue
		goes below low water mark




		Figure 3 (below) - Data Sent Too fast to network

Host-send >---> Send Half tty driver --> MUX Send ---> too fast to network
					 MUX Eats <--- XOFF
					XOFF and does
					NOT pass up to
					host and now
					causes output to
					network to be
					blocked
						.... later
						  <--- XON
						 ---> resume sending


Assumptions:
	- no sophisticated Data Link Control protocol is used.
	- that only the XON/XOFF protocol is used for flow-control.
	- that both machines can send an XON/XOFF (to net), and that both
	  machines can understand an XON/XOFF if received (from net).
	- both machines may decide to send data to each other
	  at the same time (ie full duplex).
	- the model is a sending process that sends data over
	  the link to a receiving process at the other end, and
	  that this is symmetric for both machines.
	- that transmission of the XON/XOFF to the network is done
	  in the *host* receive half of the tty driver, to limit
	  the rate of data received from the network (System 3
	  IXOFF ioctl()), see figure 2.
	- that the *MUX* stops/starts sending data to the network
	  if XOFF/XON is received from the net (S3 IXON ioctl()).
	  In this case, the MUX discards the XON/XOFF, and never
	  passes the characters up to the host receive half of the
	  tty driver, see figure 3.
	- *KEY POINT* of this argument:
	  the MUX refuses to send any data character to the network
	  while the last flow-control character received from the network
	  was an XOFF.  The assumption is that all 256 byte values are
	  "data" to the MUX, *including* XON and XOFF.
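
The host receive half assumed above (figure 2) can be sketched as follows.
This is only an illustration in Python, not code from any real tty driver:
the class ReceiveHalf, its method names, and the default water marks are
hypothetical.

```python
XON, XOFF = 0x11, 0x13  # ASCII DC1 and DC3


class ReceiveHalf:
    """Toy model of the host receive half of a tty driver (hypothetical)."""

    def __init__(self, high_water=200, low_water=50):
        self.queue = []             # bytes waiting for the application
        self.high_water = high_water
        self.low_water = low_water
        self.peer_stopped = False   # True after we have asked for an XOFF

    def byte_from_mux(self, byte):
        """Queue one received byte; return XOFF if the peer must be stopped."""
        self.queue.append(byte)
        if not self.peer_stopped and len(self.queue) > self.high_water:
            self.peer_stopped = True
            return XOFF
        return None

    def read(self, count):
        """Application read; return (data, XON or None)."""
        data, self.queue = self.queue[:count], self.queue[count:]
        if self.peer_stopped and len(self.queue) < self.low_water:
            self.peer_stopped = False
            return data, XON
        return data, None
```

The control character returned by either method is what the receive half
hands to the MUX for transmission to the net.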


Scenario of Problem:
	Both machines begin sending data to each other.  Suppose that
	the application receive process on each machine cannot process
	the data received from the other machine fast enough.
	The receive process gets behind, the rcv queue goes above
	the high water mark, which causes the network-receive
	half of the tty driver to transmit an XOFF. Now suppose
	that both machines decide to send XOFF to each other at
	the same time.  Later, the receive process on both machines
	will consume the data buffered in the network-receive half
	of the tty driver, which causes the network-receive half
	of the tty driver to send an XON to the MUX (for transmission
	to net). But what if the MUX is programmed not to send the "data"
	character XON, because it is "XOFF'ed on output to the network"?
	If both MUX'es are implemented this way, then neither
	side can transmit an XON to the other, so neither side can
	transmit any data.


Result:
	The sending and receiving processes on both machines
	become deadlocked.
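
The scenario can be reproduced in a toy model.  The Python sketch below is
an illustration only, not the firmware of any real MUX: SemiIntelligentMux,
its methods, and the exchange() helper are hypothetical names.  Each MUX is
assumed to behave as in the KEY POINT above, so an XON or XOFF written by
the host is queued as ordinary data and held while output is blocked.

```python
XON, XOFF = 0x11, 0x13  # ASCII DC1 and DC3


class SemiIntelligentMux:
    """Obeys XON/XOFF from the line, treats every host byte as data."""

    def __init__(self):
        self.tx_blocked = False  # set when XOFF arrives from the line
        self.tx_queue = []       # bytes handed down by the host

    def host_write(self, byte):
        self.tx_queue.append(byte)

    def line_receive(self, byte):
        """Eat XON/XOFF; return any other byte so it can go up to the host."""
        if byte == XOFF:
            self.tx_blocked = True
            return None
        if byte == XON:
            self.tx_blocked = False
            return None
        return byte

    def line_transmit(self):
        """Return the bytes put on the line; nothing at all while blocked."""
        if self.tx_blocked:
            return []
        sent, self.tx_queue = self.tx_queue, []
        return sent


def exchange(a, b):
    """One tick: both sides transmit at the same time, then both receive."""
    a_out, b_out = a.line_transmit(), b.line_transmit()
    for byte in a_out:
        b.line_receive(byte)
    for byte in b_out:
        a.line_receive(byte)
    return a_out, b_out


def simulate_crossed_xoff():
    a, b = SemiIntelligentMux(), SemiIntelligentMux()
    a.host_write(XOFF)
    b.host_write(XOFF)
    exchange(a, b)               # the two XOFFs cross on the wire
    a.host_write(XON)            # later, both receive queues drain
    b.host_write(XON)
    a_out, b_out = exchange(a, b)
    return a.tx_blocked, b.tx_blocked, a_out, b_out
```

simulate_crossed_xoff() returns (True, True, [], []): both MUXes are still
blocked, and neither XON ever reaches the line.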


Solution to Problem:
	There are two different solutions.  The first is to use a dumb
	MUX which can always be controlled by the host CPU.  The second
	solution is to keep the "semi-intelligent" MUX, but the host
	should have the ability to send "out-of-band" data to the MUX
	in addition to sending normal data, which can be thought of as
	"high priority" data. The "out-of-band" data is the XON and
	XOFF characters. The MUX sends the "out-of-band" (XON/XOFF)
	data to the network, even when the MUX is blocked on sending
	to the network (i.e., XOFF was last received from the net).
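
A minimal sketch of the second solution, assuming the host has a separate
write path for flow-control characters.  OobMux, host_write_oob() and tick()
are hypothetical names chosen for this illustration; a real MUX interface
would look different.

```python
XON, XOFF = 0x11, 0x13  # ASCII DC1 and DC3


class OobMux:
    """Semi-intelligent MUX with a priority lane for flow-control bytes."""

    def __init__(self):
        self.tx_blocked = False
        self.data_queue = []  # normal data, held while tx_blocked
        self.oob_queue = []   # XON/XOFF from the host, never held

    def host_write(self, byte):
        self.data_queue.append(byte)

    def host_write_oob(self, byte):
        self.oob_queue.append(byte)

    def line_receive(self, byte):
        if byte == XOFF:
            self.tx_blocked = True
            return None
        if byte == XON:
            self.tx_blocked = False
            return None
        return byte

    def line_transmit(self):
        # Out-of-band bytes always go out, ahead of any normal data.
        sent, self.oob_queue = self.oob_queue, []
        if not self.tx_blocked:
            sent += self.data_queue
            self.data_queue = []
        return sent


def tick(a, b):
    """Both sides transmit simultaneously, then both receive."""
    a_out, b_out = a.line_transmit(), b.line_transmit()
    for byte in a_out:
        b.line_receive(byte)
    for byte in b_out:
        a.line_receive(byte)
    return a_out, b_out


def simulate_oob_recovery():
    a, b = OobMux(), OobMux()
    a.host_write_oob(XOFF)
    b.host_write_oob(XOFF)
    tick(a, b)                    # crossed XOFFs: both sides now blocked
    a.host_write(ord("a"))
    b.host_write(ord("b"))
    held = tick(a, b)             # data is held on both sides
    a.host_write_oob(XON)
    b.host_write_oob(XON)
    released = tick(a, b)         # XONs get through despite the block
    resumed = tick(a, b)          # held data now flows
    return held, released, resumed
```

In this model the crossed XOFFs no longer deadlock the link: the XONs are
transmitted while both sides are blocked, and the held data follows on the
next tick.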
-- 
Dennis Bednar	Computer Consoles Inc.	Reston VA	703-648-3300
{decvax,ihnp4,harpo,allegra}!seismo!rlgvax!dennis
dennis@rlgvax.UUCP

jon@altos86.UUCP (Jonathan Stern) (12/16/85)

In article <848@rlgvax.UUCP> dennis@rlgvax.UUCP (Dennis Bednar) writes:
>multiplexor(MUX), flow control, out of band data
>
>
>Introduction - Overview
>
>I recently came across a very interesting situation which can
>cause deadlocks over an asynchronous serial computer link when
>both machines use the xon/xoff protocol, and the traffic is full-duplex
>(both machines can send data to each other at the same time).
>The deadlock causes both machines to cease transmission - permanently.

>Solution to Problem:
>	There are two different solutions.  The first is to use a dumb
>	MUX which can always be controlled by the host CPU.  The second
>	solution is to keep the "semi-intelligent" MUX, but the host
>	should have the ability to send "out-of-band" data to the MUX
>	in addition to sending normal data, which can be thought of as
>	"high priority" data. The "out-of-band" data is the XON and
>	XOFF characters. The MUX sends the "out-of-band" (XON/XOFF)
>	data to the network, even when the MUX is blocked on sending
>	to the network (ie XOFF last received from net.)

Even this solution *may* not solve the problem.  Each machine has presumably
XOFFed the other because it was not ready to receive more data.  If one machine 
is significantly faster than the other or one machine is in a state where it
is not unblocking the input data stream it may actually lose the XON and stay
deadlocked. The solution here is to process the XON/XOFF protocol within the MUX,
but many machines do not do this.  What this underscores is that XON/XOFF is
not a practical flow control method for full duplex, high speed, data transfer.
I would be very interested to hear how others have attacked this problem.

-------
Jonathan Stern Altos Computer Systems -- ucbvax!dual!vecpyr!altos86!jon

henry@utzoo.UUCP (Henry Spencer) (12/18/85)

> Even this solution *may* not solve the problem.  Each machine has presumably
> XOFFed the other because it was not ready to receive more data.  If one machine 
> is significantly faster than the other or one machine is in a state where it
> is not unblocking the input data stream it may actually lose the XON and stay
> deadlocked...

The solution to this is the "persistent" kind of xon/xoff that some devices,
notably the ones from HP, do.  If you sent an XOFF and the other end is still
sending data, send another XOFF after a little while.  (A reasonable strategy
is to send one when you're down to [say] 128 characters of buffer, another at
64, another at 32, etc.)  And if you sent an XON and the other end is silent,
send another XON after, say, 15 seconds.  And persist.  XON/XOFF is not a very
good protocol, but it's the best we have right now, and "persistent" versions
of it are much more robust than simplistic ones.
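
A sketch of the persistent strategy described above, in Python.  The class
PersistentFlowControl and its method names are hypothetical, and this is not
a description of how HP devices implement it.  The 128/64/32 thresholds and
the 15 second retry come from the text; the caller is assumed to supply the
free buffer space and the current time.

```python
XON, XOFF = 0x11, 0x13  # ASCII DC1 and DC3


class PersistentFlowControl:
    """Repeat XOFF as the buffer fills; repeat XON while the peer is silent."""

    def __init__(self, first_threshold=128, xon_retry_secs=15.0):
        self.first_threshold = first_threshold
        self.next_xoff_at = first_threshold
        self.xon_retry_secs = xon_retry_secs
        self.last_xon_time = None  # None when no XON is awaiting a response

    def on_byte_received(self, free_space):
        """Call for each data byte received; returns XOFF or None."""
        self.last_xon_time = None  # the peer is sending, so it heard our XON
        if self.next_xoff_at > 0 and free_space <= self.next_xoff_at:
            self.next_xoff_at //= 2  # 128, then 64, then 32, ...
            return XOFF
        return None

    def on_buffer_drained(self, now):
        """Call when the buffer has emptied enough to resume; returns XON."""
        self.next_xoff_at = self.first_threshold
        self.last_xon_time = now
        return XON

    def on_timer(self, now):
        """Call periodically; returns XON if the peer is still silent."""
        if (self.last_xon_time is not None
                and now - self.last_xon_time >= self.xon_retry_secs):
            self.last_xon_time = now
            return XON
        return None
```

Passing the time in explicitly keeps the sketch independent of any clock;
a driver would call on_timer() from whatever periodic tick it already has.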
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

kbb@faron.UUCP (Kenneth B. Bass) (12/18/85)

In article <176@altos86.UUCP> jon@gateway.UUCP (Jonathan Stern) writes:
>In article <848@rlgvax.UUCP> dennis@rlgvax.UUCP (Dennis Bednar) writes:
>>multiplexor(MUX), flow control, out of band data
>>
>>I recently came across a very interesting situation which can
>>cause deadlocks over an asynchronous serial computer link when
>>both machines use the xon/xoff protocol, and the traffic is full-duplex
>>(both machines can send data to each other at the same time).
>>The deadlock causes both machines to cease transmission - permanently.
>
>                                   What this underscores is that XON/XOFF is
>not a practical flow control method for full duplex, high speed, data transfer.
>I would be very interested to hear how others have attacked this problem.
>
>-------
>Jonathan Stern Altos Computer Systems -- ucbvax!dual!vecpyr!altos86!jon


I have to disagree.  XON/XOFF flow controlling is practical for full duplex,
high speed data transfer.  This is assuming, of course, that the XON/XOFF
characters are truly "out-of-band" characters.  It seems that the problem
above occurs because this MUX is only "semi-intelligent".  That is, it
sounds like the MUX is doing some of the flow controlling, but not all of
it.

The MUX should either: 1) do full flow controlling at both sides
(to/from network, and to/from host); or 2) do no flow controlling at all.
If case 1 is chosen, then the MUX would trap and process XON/XOFF's
it receives from the network, as well as from the host; but it would
not pass these characters through.  For the other case, the MUX would
be "dumb" and just pass the XON/XOFF's it receives - either from the host,
or from the network - through to the network or host.

The only major problem I have ever encountered with any type of flow
control is deciding when the receiving end should send the XOFF.
In the latter case above, where the MUX is dumb and does not do any
flow controlling, the XOFF must travel through the local MUX,
through the network, and through the remote MUX to the remote host.  The
remote host will be sending data throughout this time, and will not stop
until it sees the XOFF.  The problem, then, is how many characters
at most the receiving host must be able to buffer AFTER it sends out the
XOFF.
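
One rough way to estimate that number: data keeps arriving for the time the
XOFF takes to reach the remote host, plus the time the remote host takes to
react, plus the time the last characters already on the line take to arrive.
The Python sketch below does this arithmetic.  The function name and
parameters are hypothetical, and it assumes 10 bits per character on the
wire (8 data bits plus start and stop) and that the XOFF is not delayed
behind other queued output.

```python
def xoff_headroom(baud, one_way_delay_ms, peer_reaction_ms=0, bits_per_char=10):
    """Characters that can still arrive after the XOFF is sent (rounded up)."""
    # XOFF travel time + peer reaction + travel time of data already in flight
    exposure_ms = 2 * one_way_delay_ms + peer_reaction_ms
    bits_in_flight_x1000 = baud * exposure_ms
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-bits_in_flight_x1000 // (bits_per_char * 1000))
```

For example, at 9600 baud with a 50 ms one-way delay, xoff_headroom(9600, 50)
gives 96 characters, so the XOFF would have to be sent while at least that
much buffer space remains.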


				"Tell me why"
				ken bass
				linus!faron!kbb