[comp.protocols.tcp-ip] Error Control and ATM

tds@cbnewsh.ATT.COM (antonio.desimone) (12/12/89)

From article <6796@shlump.nac.dec.com>, by goldstein@delni.enet.dec.com:
> In article <[A.ISI.EDU].9-Dec-89.11:15:42.CERF>, CERF@A.ISI.EDU writes...
>>It is by no means clear that reordering at the cell level will be
>>detected by the ATMs or by link level algorithms sending the
>>packets assembled from cells to the hosts since the link level
>>checksums would be recomputed AFTER reassembly, most likely.

I'm not sure if I understand this point.  A host worried about data
integrity would compute a packet checksum before segmentation into
cells and after reassembly into a packet.  Shouldn't the checksum
detect reordering of cells within the packet?
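
For the sake of argument, here is a small C sketch (the CRC-16/CCITT
polynomial and the four-cell test packet are arbitrary illustrative
choices, not taken from any standard) showing that an order-sensitive
check computed before segmentation and re-checked after reassembly
does catch a pair of swapped 48-byte cells:

#include <stdio.h>
#include <string.h>

#define CELL   48
#define NCELLS 4

/* bitwise CRC-16/CCITT; the polynomial is an arbitrary illustrative choice */
static unsigned short crc16(const unsigned char *p, int len)
{
    unsigned short crc = 0xFFFF;
    int i, b;

    for (i = 0; i < len; i++) {
        crc ^= (unsigned short)(p[i] << 8);
        for (b = 0; b < 8; b++)
            crc = (unsigned short)((crc & 0x8000) ? (crc << 1) ^ 0x1021
                                                  : (crc << 1));
    }
    return crc;
}

int main(void)
{
    unsigned char pkt[CELL * NCELLS], tmp[CELL];
    unsigned short before, after;
    int i;

    for (i = 0; i < CELL * NCELLS; i++)   /* make up a test packet         */
        pkt[i] = (unsigned char)i;
    before = crc16(pkt, CELL * NCELLS);   /* checksum before segmentation  */

    memcpy(tmp, pkt + CELL, CELL);        /* "misorder" cells 1 and 2      */
    memcpy(pkt + CELL, pkt + 2 * CELL, CELL);
    memcpy(pkt + 2 * CELL, tmp, CELL);

    after = crc16(pkt, CELL * NCELLS);    /* re-check after reassembly     */
    printf("before %04x  after %04x  -> %s\n", before, after,
           before == after ? "NOT detected" : "detected");
    return 0;
}
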
> 
> Indeed, the current proposals for B-ISDN use ATM cell transfer and
> state, in its service description, that it _will_ preserve order.  Now
> doesn't that make you confident that the implementations will never,
> ever, ever, ever mis-order a cell?  The ATM cells do not contain any
> sort of sequence numbers. 

It seems to me that people expect both too little and too much from
ATM.  Here's an example of expecting too much.  One may conceive of a
"malfunctioning" time-slot-interchange switch that reorders bytes in a
T1 frame, even though the "service description" says it won't.  Should
we put sequence numbers on individual bytes?  Similarly sequence
numbers for individual cells are overkill since the architecture of ATM
switches preserves the order of cells.

> BTW I have proposed (and gotten rebuffed by the Powers That Be at 
> T1S1.5, namely the AT&T & Bellcore leadership) an end-to-end datalink
> protocol that has cell sequence numbers and bulk cell ack's, loosely
> based on NETBLT, with a 2^14 modulus.  You can always run it or 

A modulus of 2^14 with 48-byte cells allows roughly 6 Mbits outstanding.
A cross-country circuit at 150 Mbits/second can have 9 Mbits in flight
for a 60 msec propagation delay.  Your proposal reduces throughput at
the speeds being contemplated for the next few years, to say nothing of
higher speeds in the future.  (BTW: I had nothing to do with AT&T's
positions in the standards bodies.)
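
For what it's worth, here's that arithmetic as a quick C sketch, using
the figures above:

#include <stdio.h>

int main(void)
{
    double window = 16384.0 * 48 * 8;      /* 2^14 cells of 48 bytes      */
    double pipe   = 150e6 * 0.060;         /* 150 Mbit/s over 60 msec     */

    printf("window          %.2f Mbit\n", window / 1e6);   /* about 6.29 */
    printf("bits in flight  %.2f Mbit\n", pipe / 1e6);     /* 9.00       */
    printf("window/pipe     %.0f%%\n", 100.0 * window / pipe);
    return 0;
}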

> something else you prefer end to end across the ATM and ignore their
> recommendations for adaptation, if you're not talking to a
> network-provided service (i.e., the "B-ISDN CLNS", which is an L3
> datagramme service running over the adaptation layer).  My impetus,

Doesn't B-ISDN CLNS include a checksum or CRC at the frame level?

> though, is to handle cell loss without having to retransmit the whole
> frame.  (Can you spell 'congestion collapse'?)  Misordering detection 
> comes along with that.

Think for a minute about what losses will look like in a congested ATM
network.  When buffers overflow losses will occur in clusters, just
because of the long time congested queues take to recover.  Also,
errors on transmission lines typically occur in bursts: isolated random
bit errors that we all like to use for modelling are in fact a poor
representation of reality.  All this says that the loss of isolated
cells is rare, and that cell retransmission promises little gain at a
great cost in complexity.  With an understanding of loss patterns,
frame retransmission seems quite reasonable.
-- 
Tony DeSimone
AT&T Bell Laboratories
Holmdel, NJ 07733
att!tds386e!tds

CERF@A.ISI.EDU (12/12/89)

Tony,

Let's see. One might take the view that there is a
tradeoff between sequence numbering of cells and strong
checksumming to detect misordering (followed by
frame retransmission).  When cell sizes get very small
(e.g. your one byte T1 example) then sequence numbers
are silly and checksums are necessary. The current 48 byte cell
size is pretty small - perhaps small enough that sequence
numbering is too expensive. This motivates the interest in
checksumming of a stronger variety than TCP currently supports.

Vint

What I meant about link checksumming not catching the problem
is based on the idea that if cell reassembly happens in the
ATM and THEN a link level checksum is computed to "secure"
the transmission of the frame to the host, the checksum would
not detect the reassembly of misordered cells. If the checksum
is computed end to end, then it covers more of the intervening
transmission and switching plant and thus allows potential
detection of misordering (if the checksum is strong enough).

goldstein@delni.enet.dec.com (Fred R. Goldstein) (12/13/89)

In article <6528@cbnewsh.ATT.COM>, tds@cbnewsh.ATT.COM (antonio.desimone) writes...
>From article <6796@shlump.nac.dec.com>, by goldstein@delni.enet.dec.com:
>> In article <[A.ISI.EDU].9-Dec-89.11:15:42.CERF>, CERF@A.ISI.EDU writes...
>>>It is by no means clear that reordering at the cell level will be
>>>detected by the ATMs or by link level algorithms sending the
>>>packets assembled from cells to the hosts since the link level
>>>checksums would be recomputed AFTER reassembly, most likely.
> 
>I'm not sure if I understand this point.  A host worried about data
>integrity would compute a packet checksum before segmentation into
>cells and after reassembly into a packet.  Shouldn't the checksum
>detect reordering of cells within the packet?

If you're referring to the use of an end-to-end checksum at Transport
or higher (i.e., a TCP checksum that could detect misordered cells),
then that would do it.  The AT&T-Bellcore AAL purports to detect
errors, but does not detect misordering.  In a pure OSI environment,
where there's no checksum above datalink, it wouldn't be caught.

>> Indeed, the current proposals for B-ISDN use ATM cell transfer and
>> state, in its service description, that it _will_ preserve order.  Now
>> doesn't that make you confident that the implementations will never,
>> ever, ever, ever mis-order a cell?  The ATM cells do not contain any
>> sort of sequence numbers. 

>It seems to me that people expect both too little and too much from
>ATM.  Here's an example of expecting too much.  One may conceive of a
>"malfunctioning" time-slot-interchange switch that reorders bytes in a
>T1 frame, even though the "service description" says it won't.  Should
>we put sequence numbers on individual bytes?  Similarly sequence
>numbers for individual cells are overkill since the architecture of ATM
>switches preserves the order of cells.

You're being silly.  We know from practice that S-TDM switches do not
reorder bytes.  And if they did, your basic CRC would detect it.  (SLIP 
users deserve what they get.)  The current TCP checksum would detect 
frogged bytes, but wouldn't detect frogged 16-bit words.  We do not know
that all manufacturers of ATM switches will build switches that are
unable to reorder cells.  There are no commercial B-ISDNs yet, so it's
purely an extrapolation that the service description will be met.  This
thread began when someone pointed out that the existing TCP checksum may
be inadequate to detect misordered cells.  You simply assert that there
won't be any.  The reader is left to trust AT&T or to cover his hide.
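
To make that concrete, here is a small C sketch (the test data and the
choice of Fletcher-16 as the stronger contrast are mine, purely for
illustration): reordering whole 16-bit words, and hence whole 48-byte
cells, leaves the one's-complement sum used by TCP unchanged, while a
position-sensitive sum like Fletcher's changes.

#include <stdio.h>
#include <string.h>

/* RFC-1071-style one's-complement sum over 16-bit words */
static unsigned short inet_cksum(const unsigned char *p, int len)
{
    unsigned long sum = 0;
    int i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (unsigned long)((p[i] << 8) | p[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);   /* fold the carries */
    return (unsigned short)~sum;
}

/* Fletcher-16: the second, position-weighted sum is what catches reordering */
static unsigned short fletcher16(const unsigned char *p, int len)
{
    unsigned int a = 0, b = 0;
    int i;

    for (i = 0; i < len; i++) {
        a = (a + p[i]) % 255;
        b = (b + a) % 255;
    }
    return (unsigned short)((b << 8) | a);
}

int main(void)
{
    unsigned char buf[96], tmp[48];
    int i;

    for (i = 0; i < 96; i++)
        buf[i] = (unsigned char)(i * 7);      /* made-up packet contents  */

    printf("in order:   inet %04x  fletcher %04x\n",
           inet_cksum(buf, 96), fletcher16(buf, 96));

    memcpy(tmp, buf, 48);                     /* swap two 48-byte "cells" */
    memcpy(buf, buf + 48, 48);
    memcpy(buf + 48, tmp, 48);

    printf("misordered: inet %04x  fletcher %04x\n",
           inet_cksum(buf, 96), fletcher16(buf, 96));
    /* the one's-complement sum is unchanged; the Fletcher sum differs */
    return 0;
}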

>> BTW I have proposed (and gotten rebuffed by the Powers That Be at 
>> T1S1.5, namely the AT&T & Bellcore leadership) an end-to-end datalink
>> protocol that has cell sequence numbers and bulk cell ack's, loosely
>> based on NETBLT, with a 2^14 modulus.  You can always run it or 
> 
>A modulus of 2^14 with 48-byte cells allows roughly 6 Mbits outstanding.
>A cross-country circuit at 150 Mbits/second can have 9 Mbits in flight
>for a 60 msec propagation delay.  Your proposal reduces throughput at
>the speeds being contemplated for the next few years, to say nothing of
>higher speeds in the future.  (BTW: I had nothing to do with AT&T's
>positions in the standards bodies.)

A window allowing 6 Mbits outstanding is adequate for the typical
FDDI bridge application.  BLINKBLT's (my proposal's) 50-100 Mbps typical 
maximum rate seems adequate for a number of applications, and I don't
claim it's a panacea.

>> something else you prefer end to end across the ATM and ignore their
>> recommendations for adaptation, if you're not talking to a
>> network-provided service (i.e., the "B-ISDN CLNS", which is an L3
>> datagramme service running over the adaptation layer).  My impetus,
> 
>Doesn't B-ISDN CLNS include a checksum or CRC at the frame level?

NO!  It did, before YOUR COMPANY, Tony, made a big stink and holler
about how hard it would be to compute 32-bit checksums.  They insisted
that it be taken out.  Now the only CRCs are the 10-bitters in each
cell.  Compare Draft 5 of 802.6 with the current Draft 9, since this
changed in lockstep!

If you don't think AT&T Did The Right Thing, then call Harvey Rubin
(hr@edsel) and tell him.  Or do you value your job?  Recall that AT&T
delegates at ANSI meetings are company reps, while DEC delegates are
sponsored contributors.  (I forget the exact terms.)  I'm allowed to 
speak my mind at T1S1.

Incidentally, the main impetus for the change seems to be the 
"data pipelining" service, which is there (solely, I think) to allow
Datakit VCS (tm, AT&T) cells to be stuffed into ATM cells without
waiting to fill the 48-octet ATM cells.  I'm sure the Internet
community is concerned about preserving their Datakits beyond 1999!

>> though, is to handle cell loss without having to retransmit the whole
>> frame.  (Can you spell 'congestion collapse'?)  Misordering detection 
>> comes along with that.
> 
>Think for a minute about what losses will look like in a congested ATM
>network.  When buffers overflow losses will occur in clusters, just
>because of the long time congested queues take to recover.  

Yes, that's likely.  But the bursts occur as multiple virtual channels
are "funneled" into one output queue which fills.  A moderate-speed
virtual channel (say, under T1 rate) will lose maybe one or two cells
during the burst event, but many virtual channels will each lose a cell
or two.  It all depends upon the traffic mix, and on whether everyone's
traffic is carried as close-together cell bursts (packet trains?) or
smoothed flows.  The former cause the effect you mention; the latter
reduce funneling loss.
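
A toy model, just to make the funneling effect visible (the channel
count, the burst length, and the round-robin interleaving are invented
assumptions, not measurements):

#include <stdio.h>

#define NVC 50          /* virtual channels sharing the output port   */
#define K   60          /* consecutive cells dropped during overflow  */

int main(void)
{
    int lost[NVC] = {0};
    int i, hit = 0, worst = 0;

    for (i = 0; i < K; i++)     /* smoothed flows arrive interleaved,  */
        lost[i % NVC]++;        /* so consecutive drops hit many VCs   */

    for (i = 0; i < NVC; i++) {
        if (lost[i] > 0)
            hit++;
        if (lost[i] > worst)
            worst = lost[i];
    }
    printf("%d of %d channels hit; worst loses %d cell(s)\n",
           hit, NVC, worst);
    return 0;
}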

>Also,
>errors on transmission lines typically occur in bursts: isolated random
>bit errors that we all like to use for modelling are in fact a poor
>representation of reality.  All this says that the loss of isolated
>cells is rare, and that cell retransmission promises little gain at a
>great cost in complexity.  With an understanding of loss patterns frame
>retransmission seems quite reasonable.

Like I say, we don't know traffic patterns, so we can't prove loss 
patterns.  Frame retransmission makes sense in some circumstances.
Cell retransmission makes sense in some.  Since BLINKBLT uses exactly 
the same number of bits/data-cell (32) as AAL-VBR, and has relatively
similar protocol overhead, there's no higher transmission cost for it.
Just a different TE implementation.  Buffer lots of little cells (a 
round-trip's worth of bits) or buffer a smaller number of frames (a 
round-trip's worth of bits).
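
Rough unit counts for that comparison (the 9 Mbit pipe figure is from
earlier in the thread; the 4096-byte frame size is just an assumed
example):

#include <stdio.h>

int main(void)
{
    double pipe_bits = 9e6;   /* a round-trip's worth, per the figure above */

    printf("48-byte cells    %.0f\n", pipe_bits / (48 * 8));    /* ~23400 */
    printf("4096-byte frames %.0f\n", pipe_bits / (4096 * 8));  /* ~275   */
    return 0;
}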

Either form of retransmission can be done with cell resequence
detection at some layer.  Those who choose to use the proposed BISDN
protocols may wish to develop transport or other checksums that detect
it, in case the network isn't perfect.
     fred

ddeutsch@BBN.COM (Debbie Deutsch) (12/14/89)

We are planning to look at this problem here at BBN, as part of a
research project.  Our approach is to use checksums in a
(non-standard) adaptation layer to detect potential problems (e.g.
misordering of cells, missing cells, bit errors) and to signal any
problem to higher protocol(s), which can then decide whether
corrective action is necessary.  After all, the applications vary 
widely in their sensitivity to errors.
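
Purely as a sketch of the detect-and-signal idea (the structure, the
field names, and the placeholder check below are invented for
illustration; they are not the BBN design):

#include <stdio.h>
#include <stddef.h>

enum aal_status { AAL_OK, AAL_SUSPECT };   /* verdict passed up with the data */

struct frame {
    unsigned char  data[4096];     /* reassembled payload                     */
    size_t         len;
    unsigned short rcvd_check;     /* check from an assumed adaptation trailer */
};

/* Placeholder order-sensitive check (rotate and xor); a real design would
 * pick a CRC or Fletcher sum so misordered and missing cells are caught. */
static unsigned short check16(const unsigned char *p, size_t n)
{
    unsigned short s = 0;
    size_t i;

    for (i = 0; i < n; i++)
        s = (unsigned short)(((s << 3) | (s >> 13)) ^ p[i]);
    return s;
}

/* Called once the last cell of a frame has arrived.  The higher protocol
 * gets the frame either way, plus a verdict it can act on (or ignore)
 * according to its own sensitivity to errors. */
static enum aal_status aal_deliver(const struct frame *f)
{
    return check16(f->data, f->len) == f->rcvd_check ? AAL_OK : AAL_SUSPECT;
}

int main(void)
{
    struct frame f;
    size_t i;

    for (i = 0; i < 96; i++)
        f.data[i] = (unsigned char)i;
    f.len = 96;
    f.rcvd_check = check16(f.data, f.len);  /* as the sender would have set it */

    f.data[50] ^= 0x01;                     /* simulate a damaged cell         */
    printf("%s\n", aal_deliver(&f) == AAL_OK ? "clean" : "flag it upward");
    return 0;
}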

Of course, one of the central issues here is just what kind of errors
will be experienced, and what pattern they will follow.  Until we know
that, it will be hard to determine the best way to deal with errors.

Debbie

zweig@brutus.cs.uiuc.edu (Johnny Zweig) (12/14/89)

ddeutsch@BBN.COM (Debbie Deutsch) writes:

>We are planning to look at this problem here at BBN, as part of a
>research project.
    ....
>After all, the applications vary widely in their sensitivity to errors.
>
>Of course, one of the central issues here is just what kind of errors
>will be experienced, and what pattern they will follow.  Until we know
>that, it will be hard to determine the best way to deal with errors.
>
>Debbie

Which is precisely why I advocated that there be some standard mechanism
to augment the corrective/detective features of the checksum being used on
a TCP connection when I started this Fletcher-checksum string.  We can't
look into a crystal ball and see all the things that people are ever going
to want to do.  I think it would be better for the transport protocol to
roll with the punches than to have it be inflexible, forcing users to
reinvent the wheel (say by using UDP to send adequately-checksummed packets).

-Johnny Checksum