barns@GATEWAY.MITRE.ORG (Bill Barns) (04/27/89)
Someone associated with NASA wrote a paper a few years ago in which he claimed that the key parameter affecting annoyance level associated with full-duplex typein echo delay was the variance and not the mean/median. (I think he was at JPL and I think his name was Callender, but I don't remember the title or where it appeared. Echo delay was not the main subject of the paper but it came up in passing.) I believe this was a qualitative/impressionistic evaluation, not a controlled experiment. His hypothesis would seem plausible given that there is already some mental compensation for neurological timing skews. I don't remember specific numbers but have this impression of having heard that the feedback loop time to the eye is on the order of a few milliseconds, whereas to the toes it is somewhere in the 50+ milliseconds area. This is a distinguishable difference in the brain - if it weren't, auditory direction discrimination wouldn't work. It sounds plausible to me that it would be easier to adapt to a high mean skew than to a high variance of skew in echoing. It might be amusing to speculate on whether the echo-variance annoyance factor is due to the presence of a variance estimator in a human user's brain processing, or to its absence. Both answers seem to have implicit epistemological ramifications in other areas: the former in social sciences, the latter in physical sciences (in which category I place protocol engineering). Bill Barns / barns@gateway.mitre.org
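Barns's hypothesis is easy to state numerically: two echo-delay streams can share a mean while differing wildly in variance, and it is the latter quantity the hypothesis says drives annoyance. A minimal sketch (the delay numbers are invented for illustration):

```python
import random
import statistics

random.seed(1)

# Two hypothetical echo-delay streams (milliseconds) with the same mean:
steady  = [200.0] * 100                                      # constant 200 ms delay
jittery = [random.uniform(50.0, 350.0) for _ in range(100)]  # mean ~200 ms, high jitter

# Same central tendency, very different spread -- the hypothesis above
# says the second stream is the more annoying one to type against.
print(statistics.mean(steady), statistics.pstdev(steady))
print(statistics.mean(jittery), statistics.pstdev(jittery))
```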
amanda@lts.UUCP (Amanda Walker) (05/02/89)
I'm not sure where I picked up this figure (probably psychology readings for a course or something similar), but as I remember, the basic "cycle time" of conscious processing is about 1/20th of a second, i.e. events occurring at 50ms intervals or greater can be perceived as separate events, whereas events occurring at shorter intervals are perceived as simultaneous, depending somewhat on what kinds of events are being correlated. For example, a video terminal running at 300 baud in full duplex gives most people the illusion that the letters are appearing as they type (66ms). However, motor skills (such as tracking moving objects) involve much more fine-grained timing (hmm... hardware buffers :-)?). For example, animation at 60 or 120 frames/sec will look much smoother and more "realistic" than at 30 frames/sec, even if there's no consciously perceptible flicker... Part of the point of this is that how fast the feedback needs to be depends a lot on what you're feeding back. Keystrokes can probably get by with 50-80ms. Rubber-band lines need to be 15-30ms, and so on. -- Amanda Walker InterCon Systems Corporation amanda@lts.UUCP / lts!amanda@uunet.uu.net
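The 66 ms figure above can be reproduced with a little line arithmetic, assuming the usual 10 bits on the wire per character (start bit, 8 data bits, stop bit): in full duplex the character travels to the host and the echo travels back before anything appears on screen.

```python
def char_time_ms(baud, bits_per_char=10):
    """Time to transmit one character at a given line rate."""
    return bits_per_char / baud * 1000.0

one_way = char_time_ms(300)   # ~33.3 ms to ship one character at 300 baud
echo    = 2 * one_way         # ~66.7 ms round trip: keystroke out, echo back
print(round(echo, 1))         # 66.7
```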
Mills@UDEL.EDU (05/04/89)
Amanda, In my obstreperous youth I happened to be a real live disk jockey for commercial radio working my way through school. We had an initiation rite for new guys that involved earphones, a tape machine and a live news broadcast. The victim, wearing cans and listening to himself on a live broadcast, was switched without warning to a tape playback delayed maybe 250 milliseconds, which of course instantly discombobulated him. It happened to me, of course, with my reaction ripping the cans off my head after stumblebumming the six o'clock news and with the control-room guys laughing their heads off. My conclusion is that about 250 ms is just about the resonance point of the human feedback control system. Dave
ron@ron.rutgers.edu (Ron Natalie) (05/05/89)
Van Jacobson claims 100-200 ms and cites Ben Shneiderman's "Designing the User Interface" as a reference. -Ron
royc@ami.UUCP (roy crabtree) (05/07/89)
In article <01-May-89.183043@192.41.214.2>, amanda@lts.UUCP (Amanda Walker) writes:
[elided]
> time" of conscious processing is about 1/20th of a second, i.e. events
> occurring at 50ms intervals or greater can be perceived as separate
> events, whereas events occurring at shorter intervals are perceived
> as simultaneous, depending somewhat on what kinds of events are being
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> correlated. For example, a video terminal running at 300 baud in
> full duplex gives most people the illusion that the letters are appearing
> as they type (66ms). However, motor skills (such as tracking moving

The event sync rate on output must be about 2x the event input rate if more than one event is to be tracked, or if parallel events are being tracked; otherwise, a single serial event on output will be perceived under _low_stress_ conditions as 'simultaneous' if it falls within around 1-1.5 input event intervals after the _end_ of the input event correlated with it. Several military studies corroborate this; sorry, no refs.

> objects) involve much more fine-grained timing (hmm... hardware buffers :-)?)

Yep, may be needed for speed. Again, the reason is that the event being perceived is correlated, not against the keyboard input event, but against the eye/screen coordinative cognition that _follows_ it; since the granularity of the eye is spatially higher, it tends to be temporally higher as well (udderwise ya caint do nuthin wid it anyways, so why bother seeing it?)

> For example, animation at 60 or 120 frames/sec will look much smoother and
> more "realistic" than at 30 frames/sec, even if there's no consciously
> perceptible flicker...

Hurray, Disney! (Frame rate minimums of 31-36 frames/sec preferred)

> Part of the point of this is that how fast the feedback needs to be depends
> a lot on what you're feeding back. Keystrokes can probably get by with
> 50-80ms. Rubber-band lines need to be 15-30ms, and so on.

This is true.
However, the rates are still too slow, I would think. The rationale I have is as follows:

- You can perceive what you can do.
- A musician can play (i.e., do) music at tempos of up to 240-300 beats per minute: 5 times per second.
- The beat may be subdivided 2-4 times (or more!) at those rates for individual notes: 20 times per second (from which comes the 50 millisecond figure).
- Positive correlative events (things you have to perceive prior to actions subsequent to and dependent on them for correct function or response) should usually have no more than a 5-10% transit delay against the response interval, to avoid upsetting the goal-oriented response of the operator involved.

This should also apply in a spatial perception sense: the position of the item being viewed or tracked or clicked or stretched should be off by no more than 5-10% of the _immediate_significant_visual_field_ in terms of timing.

So, for character I/O, since you only correlate every so often (you don't read every character!), 50 msec is probably OK (but better 25!) But for rubber-band lines, 10-25 msec under rapid mouse drag motions. And for supercritical events, such as a popdown or "gotcha!" notification for mouse clicks or state transitions, probably 3-8 msec is what is needed for "smooth" perception.

If anybody doubts this, try using any old mouse-driven terminal with a polled mouse at a 60 Hertz clock rate; it is easily possible to click and release the mouse in 1/60th of a second. If you can, then 1/120th of a second is the basic event rate, and 1/2 that (4 msec) is the minimum to achieve _perceptibly_continuous_motion_.

> --
> Amanda Walker
> InterCon Systems Corporation
> amanda@lts.UUCP / lts!amanda@uunet.uu.net

roy a. crabtree uunet!ami!royc 201-566-8584
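The tempo arithmetic above can be checked directly; the helper below just converts beats per minute and subdivisions into an inter-note interval, and the last line shows the sampling period of the 60 Hz polled mouse mentioned at the end.

```python
def note_interval_ms(bpm, subdivisions=1):
    """Interval between successive notes at a given tempo and subdivision."""
    return 60_000.0 / (bpm * subdivisions)

print(note_interval_ms(300))      # 200.0 ms between beats at 300 bpm (5 per second)
print(note_interval_ms(300, 4))   # 50.0 ms per note -- the 50 ms figure above

# A 60 Hz polled mouse samples every 1000/60 ~= 16.7 ms, so a click-and-release
# completed within one tick can fall entirely between two polls and be lost.
print(round(1000.0 / 60, 1))      # 16.7
```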
karn@ka9q.bellcore.com (Phil Karn) (05/07/89)
Dave, Just to illustrate the amazing adaptability of the human brain, users of the AMSAT Oscar-13 satellite routinely monitor their own signals coming back from the satellite as they speak. The round trip delay is just about 1/4 second, the same as your tape-delay trick, since at apogee the satellite is at about geostationary altitude. It takes some getting used to (the first night AO-10's transponder switched on was VERY amusing, to say the least) but it's surprising how quickly you adapt. Just as I've gotten used to the echo delay over my SLIP line. Not that I *prefer* it that way, of course... Phil
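Phil's "just about 1/4 second" checks out from geometry alone, assuming the signal path at apogee is simply ground up to geostationary altitude and straight back down:

```python
C_KM_PER_S      = 299_792.458   # speed of light
GEO_ALTITUDE_KM = 35_786        # geostationary altitude above the surface

# Hearing your own downlink: ground -> satellite -> ground.
delay_s = 2 * GEO_ALTITUDE_KM / C_KM_PER_S
print(round(delay_s, 3))        # 0.239 -- "just about 1/4 second"
```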
bob@tinman.cis.ohio-state.edu (Bob Sutterfield) (05/08/89)
Though delayed sidetone (feedback of one's own speech through headphones) is tough on an individual, delayed full-duplex vocal transmission causes all sorts of new social conventions to arise. I've held conversations over circuits that the telephone company generously routed via satellite, even though the call only went between Columbus and Boston. None of the normal socially-induced timings that we Americans use to decide when the other person is done speaking seemed to work, because we'd both jump into the silence at the same time that we heard the other person start speaking. This caused lots of awkward backoffs and retries until we figured out what was happening. At first, I just figured it was normal east-coast asocial rudeness :-) The line quality was good enough that we couldn't tell from the transmission carrier itself whether the other person had "lifted his thumb from the mike". So we both, fairly naturally it turned out, fell back into the old half-duplex radio conventions of terminating each thought-chunk with "Over". This got us through several calls in the course of a week and finally that vendor's hardware and my backplane were happy with each other. The other people in the office got quite a kick out of overhearing my side of the conversation. Perhaps this means that interactive clients and servers will need better characters-per-packet batching algorithms for higher-delay networks, with local half-duplex feedback and appropriate "I'm done now" signals on both ends...
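Bob's closing suggestion, batching keystrokes locally and shipping them in larger packets on high-delay paths, can be sketched as a toy flush policy. Everything here (the function name, the thresholds, the flush rule) is illustrative, not any real client's algorithm:

```python
def batch_keystrokes(events, max_delay_ms=50, max_chars=128):
    """Group (timestamp_ms, char) keystroke events into packets.

    A pending packet is flushed when it is full, or when the next
    keystroke arrives more than max_delay_ms after the packet's
    first character (illustrative policy, not a real protocol).
    """
    packets, current, first_t = [], [], None
    for t, ch in events:
        if current and (len(current) >= max_chars or t - first_t > max_delay_ms):
            packets.append(current)
            current = []
        if not current:
            first_t = t
        current.append(ch)
    if current:
        packets.append(current)
    return packets

# Three keystrokes: two within 50 ms, one 100 ms later -> two packets.
print(batch_keystrokes([(0, 'h'), (10, 'i'), (100, '!')]))
# [['h', 'i'], ['!']]
```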
Mills@UDEL.EDU (05/12/89)
Phil, For my second trick, I was expected to read a live, one-minute commercial spot during a one-minute network spot and make the ends come out so you didn't know the network spot was there. The Mutual Broadcasting System may not be thought of in the same breath as "Auntie" BBC, but you sure learned some strange skills working that network. 'Nuff said; I seem to remember from various studies of teleconferencing systems that on the order of 100 ms is the most demanding regime for echo cancellers, room acoustics and discombobulated broadcasters. Dave
Mills@UDEL.EDU (05/12/89)
Bob, On occasional travel overseas I often happen onto satellite circuits with quirky societal behavior as you describe. Like you, my correspondents and I have learned to live with half-duplex radiospeak with very happy results. I have my wife trained so well that I can phone home with a position report from a Paris streetside telephone and get all the data across before the first franc has expired. Discipline, all it takes is discipline. Dave
rpw3@amdcad.AMD.COM (Rob Warnock) (05/12/89)
And lest we forget, the main reason Ethernet-as-a-packet-voice-PBX didn't fly so well was that to get good efficiency you need to bundle up several milliseconds of speech per packet. [256 data bytes per packet means a 36ms delay *minimum*, plus elastic buffer delay.] As long as all parties to the conversation were on the same network *and* all had "4-wire" phones, it worked just fine, even with a 100ms round-trip time, since there were no echoes. [Note that you did *not* have any delay in your own sidetone!] But when a call had to go "off net" to a phone "out there" somewhere, the inevitable imbalances in the 2-wire/4-wire hybrids exposed the delay to the parties on the packet-voice side [the party on the 2-wire analog side never heard the problem], and the packet-phone users started getting echo with delays right in that most irritating range! [Given recent advances in adaptive echo-cancellation processors, and also with end-to-end ISDN coming, maybe it's time to try "EtherPBX" again...] Rob Warnock Systems Architecture Consultant UUCP: {amdcad,fortune,sun}!redwood!rpw3 DDD: (415)572-2607 USPS: 627 26th Ave, San Mateo, CA 94403
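Rob's 36 ms minimum is just the time to accumulate 256 bytes of speech before the packet can be sent. The figure is consistent with roughly 56 kbit/s voice coding; the actual codec rate is my assumption, since the post doesn't state it:

```python
def packetization_delay_ms(payload_bytes, codec_bps):
    """Time to fill a packet with payload_bytes of speech sampled at codec_bps."""
    return payload_bytes * 8 / codec_bps * 1000.0

print(round(packetization_delay_ms(256, 56_000), 1))  # 36.6 ms -- near the 36 ms quoted
print(round(packetization_delay_ms(256, 64_000), 1))  # 32.0 ms at 64 kbit/s PCM
```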