[comp.protocols.tcp-ip] human factors aspects of echo delay

barns@GATEWAY.MITRE.ORG (Bill Barns) (04/27/89)

Someone associated with NASA wrote a paper a few years ago in which he
claimed that the key parameter affecting annoyance level associated
with full-duplex typein echo delay was the variance and not the
mean/median.  (I think he was at JPL and I think his name was
Callender, but I don't remember the title or where it appeared.  Echo
delay was not the main subject of the paper but it came up in passing.)
I believe this was a qualitative/impressionistic evaluation, not a
controlled experiment.

His hypothesis would seem plausible given that there is already some
mental compensation for neurological timing skews.  I don't remember
specific numbers but have this impression of having heard that the
feedback loop time to the eye is on the order of a few milliseconds,
whereas to the toes it is somewhere in the 50+ milliseconds area.  This
is a distinguishable difference in the brain - if it weren't, auditory
direction discrimination wouldn't work.  It sounds plausible to me that
it would be easier to adapt to a high mean skew than to a high variance
of skew in echoing.  It might be amusing to speculate on whether the
echo-variance annoyance factor is due to the presence of a variance
estimator in a human user's brain processing, or to its absence.  Both
answers seem to have implicit epistemological ramifications in other
areas: the former in social sciences, the latter in physical sciences
(in which category I place protocol engineering).

Bill Barns / barns@gateway.mitre.org

amanda@lts.UUCP (Amanda Walker) (05/02/89)

I'm not sure where I picked up this figure (probably psychology readings
for a course or something similar), but as I remember, the basic "cycle
time" of conscious processing is about 1/20th of a second, i.e. events
occurring at intervals of 50ms or greater can be perceived as separate
events, whereas events occurring at shorter intervals are perceived
as simultaneous, depending somewhat on what kinds of events are being
correlated.  For example, a video terminal running at 300 baud in
full duplex gives most people the illusion that the letters are appearing
as they type (66ms).  However, motor skills (such as tracking moving
objects) involve much more fine-grained timing (hmm... hardware buffers :-)?).
For example, animation at 60 or 120 frames/sec will look much smoother and
more "realistic" than at 30 frames/sec, even if there's no consciously
perceptible flicker...

Part of the point of this is that how fast the feedback needs to be depends
a lot on what you're feeding back.  Keystrokes can probably get by with
50-80ms.  Rubber-band lines need to be 15-30ms, and so on.
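
To make that concrete, here's a rough C sketch (purely illustrative; the
budget numbers are just the ones above, and the remote-vs-local echo decision
is my own strawman, not something any real terminal driver does):

	#include <stdio.h>

	enum feedback { KEYSTROKE, RUBBER_BAND };

	/* rough per-event-type latency budgets, in milliseconds */
	static int budget_ms(enum feedback kind)
	{
		switch (kind) {
		case KEYSTROKE:   return 80;	/* 50-80 ms, per above */
		case RUBBER_BAND: return 30;	/* 15-30 ms, per above */
		}
		return 50;
	}

	/* echo remotely only if the measured round trip fits the budget */
	static int remote_echo_ok(enum feedback kind, int rtt_ms)
	{
		return rtt_ms <= budget_ms(kind);
	}

	int main(void)
	{
		printf("keystrokes over a 120 ms round trip: %s echo\n",
		       remote_echo_ok(KEYSTROKE, 120) ? "remote" : "local");
		return 0;
	}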

--
Amanda Walker
InterCon Systems Corporation
amanda@lts.UUCP / lts!amanda@uunet.uu.net

Mills@UDEL.EDU (05/04/89)

Amanda,

In my obstreperous youth I happened to be a real live disk jockey for
commercial radio working my way through school. We had an initiation
rite for new guys that involved earphones, a tape machine and a live
news broadcast. The victim, wearing cans and listening to himself
on a live broadcast, was switched without warning to a tape playback
delayed maybe 250 milliseconds, which of course instantly discombobulated
him. It happened to me, of course, with my reaction ripping the cans
off my head after stumblebumming the six o'clock news and with the
control-room guys laughing their heads off. My conclusion is that about
250 ms is just about the resonance point of the human feedback control
system.

Dave

ron@ron.rutgers.edu (Ron Natalie) (05/05/89)

Van Jacobson claims 100-200 ms and cites Ben Shneiderman's "Designing the
User Interface" as a reference.

-Ron

royc@ami.UUCP (roy crabtree) (05/07/89)

In article <01-May-89.183043@192.41.214.2>, amanda@lts.UUCP (Amanda Walker) writes:
[elided]
> time" of conscious processing is about 1/20th of a second, i.e. events
> occurring at intervals of 50ms or greater can be perceived as separate
> events, whereas events occurring at shorter intervals are perceived
> as simultaneous, depending somewhat on what kinds of events are being
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> correlated.  For example, a video terminal running at 300 baud in
> full duplex gives most people the illusion that the letters are appearing
> as they type (66ms).  However, motor skills (such as tracking moving

The event sync rate on output must be about 2* the event input rate if more
than one event is to be tracked, or if parallel events are being tracked;
otherwise, a single serial event on output will be perceived under _low_stress_
conditions as 'simultaneous' if it falls within around 1-1.5 input event intervals
after the _end_ of the input event correlated with it.

Several military studies corroborate this; sorry, no refs.

> objects) involve much more fine-grained timing (hmm... hardware buffers :-)?)

Yep, they may be needed for speed.  Again, the reason is that the event being
perceived is correlated not against the keyboard input event, but against the
eye/screen coordinative cognition that _follows_ it; since the granularity of
the eye is spatially finer, it tends to be temporally finer as well (udderwise
ya caint do nuthin wid it anyways so why bother seein' it?)

> For example, animation at 60 or 120 frames/sec will look much smoother and
> more "realistic" than at 30 frames/sec, even if there's no consciously
> perceptible flicker...

Hurray, Disney!  (Frame rate minimums of 31-36 FPS preferred)

> 
> Part of the point of this is that how fast the feedback needs to be depends
> a lot on what you're feeding back.  Keystrokes can probably get by with
> 50-80ms.  Rubber-band lines need to be 15-30ms, and so on.

This is true.  However, the rates are still too slow, I would think.
The rationale I have is as follows:

	- You can perceive what you can do.
	- A musician can play (i.e., do) music
	  at tempos of up to 240-300 beats per minute:  5 times per second.
	- The beat may be subdivided 2-4 times (or more!) at those
	  rates for individual notes: 20 times per second (from which
	  comes the 50 millisecond figure; see the sketch below)
	- Positive correlative events (things you have to perceive
	  before taking actions that depend on them for correct function
	  or response) usually should have no more than a 5-10% transit
	  delay against the response interval, to avoid upsetting the
	  goal-oriented response of the operator involved.  This should also
	  apply in a spatial perception sense:  the position of the
	  item being viewed or tracked or clicked or stretched should be no more
	  than 5-10% of the _immediate_significant_visual_field_ off in terms
	  of timing.

		So, for character IO, since you only correlate
		every so often (you don't read every character!)
		50 msec is probably OK (but better 25!)

		But for rubber band lines, 10-25 msec under rapid mouse
		drag motions.

		And for supercritical events, such as a popdown or "gotcha!"
		notification for mouse clicks or state transition, probably
		3-8 msec is what is needed for "smooth" perception.

If anybody doubts this, try using any old mouse-driven terminal with a
polled mouse at 60 Hertz clock rate; it is easily possible to click and
release the mouse in 1/60th of a second:  if you can, then the basic event
interval is 1/120 of a second, and half that (about 4 msec) is the minimum
needed to achieve _perceptibly_continuous_motion_.
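
To spell out the arithmetic (the numbers are the ones above; the code is just
an illustration, not taken from any study):

	#include <stdio.h>

	int main(void)
	{
		/* 300 beats/min, subdivided 4 ways -> 20 notes/sec -> 50 ms */
		double bpm = 300.0, subdiv = 4.0;
		double notes_per_sec = bpm / 60.0 * subdiv;
		printf("note interval: %.0f ms\n", 1000.0 / notes_per_sec);

		/* click+release inside 1/60 s -> a basic event interval of
		   1/120 s; half of that is the continuous-motion bound */
		double event_interval = (1.0 / 60.0) / 2.0;
		printf("continuous-motion bound: ~%.1f ms\n",
		       1000.0 * event_interval / 2.0);
		return 0;
	}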


roy a. crabtree uunet!ami!royc 201-566-8584

karn@ka9q.bellcore.com (Phil Karn) (05/07/89)

Dave,

Just to illustrate the amazing adaptability of the human brain, users of the
AMSAT Oscar-13 satellite routinely monitor their own signals coming back
from the satellite as they speak. The round trip delay is just about 1/4
second, the same as your tape-delay trick, since at apogee the satellite is
at about geostationary altitude.
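
(For the record, the arithmetic behind that quarter second, as a throwaway
sketch of my own -- roughly 36,000 km up to the transponder and 36,000 km
back down at the speed of light:)

	#include <stdio.h>

	int main(void)
	{
		double altitude_km = 36000.0;	/* approx. geostationary */
		double c_km_per_s  = 300000.0;	/* speed of light, rounded */
		/* your own signal goes up to the transponder and back down */
		printf("%.2f s\n", 2.0 * altitude_km / c_km_per_s);	/* ~0.24 s */
		return 0;
	}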

It takes some getting used to (the first night AO-10's transponder switched
on was VERY amusing, to say the least) but it's surprising how quickly you
adapt. Just as I've gotten used to the echo delay over my SLIP line. Not
that I *prefer* it that way, of course...

Phil

bob@tinman.cis.ohio-state.edu (Bob Sutterfield) (05/08/89)

Though delayed sidetone (feedback of one's own speech through
headphones) is tough on an individual, delayed full-duplex vocal
transmission causes all sorts of new social conventions to arise.

I've held conversations over circuits that the telephone company
generously routed via satellite, even though the call only went
between Columbus and Boston.  None of the normal socially-induced
timings that we Americans use to decide when the other person is done
speaking seemed to work, because we'd both jump into the silence at
the same time that we heard the other person start speaking.  This
caused lots of awkward backoffs and retries until we figured out what
was happening.  At first, I just figured it was normal east-coast
asocial rudeness :-)

The line quality was good enough that we couldn't tell from the
transmission carrier itself whether the other person had "lifted his
thumb from the mike".  So we both, fairly naturally it turned out,
fell back into the old half-duplex radio conventions of terminating
each thought-chunk with "Over".  This got us through several calls in
the course of a week and finally that vendor's hardware and my
backplane were happy with each other.  The other people in the office
got quite a kick out of overhearing my side of the conversation.

Perhaps this means that interactive clients and servers will need
better characters-per-packet batching algorithms for higher-delay
networks, with local half-duplex feedback and appropriate "I'm done
now" signals on both ends...

Mills@UDEL.EDU (05/12/89)

Phil,

For my second trick, I was expected to read a live, one-minute commercial
spot during a one-minute network spot and make the ends come out so you
didn't know the network spot was there. The Mutual Broadcasting System
may not be thought of in the same breath as "Auntie" BBC, but you sure
learned some strange skills working that network. 'Nuff said; I seem to
remember from various studies of teleconferencing systems that on the
order of 100 ms is the most demanding regime for echo cancellers, room
acoustics and discombobulated broadcasters.

Dave

Mills@UDEL.EDU (05/12/89)

Bob,

On occasional travel overseas I often happen onto satellite circuits that
provoke the quirky social behavior you describe. Like you, my correspondents have
learned to live with half-duplex radiospeak with very happy results. I have
my wife trained so well that I can phone home with a position report from
a Paris streetside telephone and get all the data across before the first
franc has expired. Discipline, all it takes is discipline.

Dave

rpw3@amdcad.AMD.COM (Rob Warnock) (05/12/89)

And lest we forget, the main reason Ethernet-as-a-packet-voice-PBX didn't
fly so well was that to get good efficiency you need to bundle up several
milliseconds of speech per packet. [256 data bytes per packet means a 36ms
delay *minimum*, plus elastic buffer delay.] As long as all parties to the
conversation were on the same network *and* all had "4-wire" phones, it worked
just fine, even with a 100ms round-trip time, since there were no echoes.
[Note that you did *not* have any delay in your own sidetone!]
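
To put a number on the packetization piece of that -- a throwaway sketch
assuming 64 kb/s PCM (8-bit samples at 8 kHz); headers and elastic buffering
come on top of this:

	#include <stdio.h>

	#define PCM_BYTES_PER_SEC 8000	/* 8 kHz sampling, 8 bits/sample */

	/* speech time accumulated while filling one packet's payload */
	static double packetization_ms(int payload_bytes)
	{
		return 1000.0 * payload_bytes / PCM_BYTES_PER_SEC;
	}

	int main(void)
	{
		/* 256 bytes -> 32 ms of speech, in the ballpark of the
		   36 ms minimum quoted above once framing is added */
		printf("%.0f ms\n", packetization_ms(256));
		return 0;
	}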

But when a call had to go "off net" to a phone "out there" somewhere, the
inevitable imbalances in the 2-wire/4-wire hybrids exposed the delay to
the parties on the packet-voice side [the party on the 2-wire analog side
never heard the problem], and the packet-phone users started getting echo
with delays right in that most irritating range!

[Given recent advances in adaptive echo-cancellation processors, and also
with end-to-end ISDN coming, maybe it's time to try "EtherPBX" again...]


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403