henry@utzoo.UUCP (Henry Spencer) (01/01/84)
The following is sort of a report on what I thought was interesting at CHI 83. "Sort of" because in many places I have omitted things that are covered adequately in the proceedings. Treat this as random observations rather than a complete summary of the conference. Note also that this is, to some extent, my interpretations rather than an attempt at literal and unbiased reporting. I apologize in advance to anyone whose views I have inadvertently misrepresented. Also as usual, this article represents my own views and not those of U of T. All statements of results should be understood to have the usual hedgings about limited applicability and the need for cautious interpretation.

*Don Norman's keynote address*

Hitting on the right new concept can make the user interface a secondary detail in user acceptance. The user interface for the original Visicalc was rotten, but the idea was such a win that it sold regardless. Newer spreadsheets do better on user interface. Unix is another example. [Norman is the author of the infamous Datamation article criticizing Unix.]

A sure sign of poor design is Dymo label tape and Scotch-taped notices. [Norman showed some slides of nicely-labelled controls for hall lights, the only gotcha being that none of the labels could be read in the dark!]

We should think in terms of tradeoffs, not "solutions", and interactions, not "errors". Increases in information presentation have a cost: they eat into work area on the screen [his example was a message system whose "main menu" covered the entire screen!], and updating the information can hurt system responsiveness.

*Dumais, Landauer: Using Examples to Describe Categories*

Menu systems for selection of information have various problems, in particular the difficulty of getting good and unambiguous category names, problems with overlapping and fuzzy categories, and the need to classify by only a few dimensions in a very complex space. [They showed some slides of videotex-system "main menus", asking "which category would you look under for, say, today's weather?".] They used Yellow-Pages headings in classification experiments, with subjects asked to put a number of headings into a few general categories. Specific findings:

- In the presence of vague categories with minimal information, people are overly cautious and a lot of stuff ends up classed under "Miscellaneous". Deleting the "Miscellaneous" category greatly increases classification accuracy without increasing time and effort.
- 3 examples from a category are actually a bit better than category names chosen by experts! Systematically-chosen examples are better than randomly-chosen ones, but not much. A slightly larger number of random examples (3 was the most they tested) might eliminate the difference altogether.
- The best performance was with category names (which show the central tendency of a category) plus 3 examples (which show the variability of the category).
- Even the best performance is still strikingly poor: they never saw classification accuracy better than about 50%.

*Kraut, Hanson, Farber: Command Use and Interface Design*

"In a rich computer system, most users don't use most commands most of the time." It's important to make the few most common commands very easy to get at. They got statistics on command use by experienced Unix users. Their paper has some nice stuff on command-sequence frequency and some nice command-transition diagrams.

Fully 10% of commands are syntax errors. This might be as high as 20-30% if you counted commands aborted during typing and semantic errors as well. Three classes of Unix commands were particularly error-prone:

- commands with odd syntax, e.g. find, at. No surprise.
- "programming" commands, e.g. awk, cc, for, while. These commands require extensive advance planning and give little intermediate feedback. Users often do the same job in smaller steps for the sake of feedback and as-you-go error correction.
- commands requiring memory of current status, e.g. cat, rmdir, cp.

The most frequent command both before and after "cd" was "ls". Clearly most people would work more efficiently with a system that displayed position and status on screen continuously rather than requiring the user to constantly ask for it.

General conclusions were the importance of the "core" commands, the importance of orienting information and feedback when moving through complex tasks, and the importance of intelligent error treatment. They suggested the "generic editor" model for interaction as a useful way of having status displayed at all times.
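[To give a feel for the sort of statistics involved, here is a minimal sketch -- my own illustration, not their instrumentation -- of tabulating command-to-command transitions from a log with one command line per line. The log format and the printed example are assumptions made for the illustration.]

    import sys
    from collections import Counter

    # Tabulate how often each command follows each other command, given a
    # log containing one command line per input line (an assumed format;
    # the actual study instrumented the shell itself).
    def transition_counts(lines):
        commands = [line.split()[0] for line in lines if line.strip()]
        return Counter(zip(commands, commands[1:]))

    if __name__ == "__main__":
        counts = transition_counts(sys.stdin.readlines())
        # Print the 20 most common transitions, e.g. "cd -> ls  137".
        for (prev, nxt), n in counts.most_common(20):
            print("%-10s -> %-10s %d" % (prev, nxt, n))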
*De Leon, Harris, Evens: Is There Really Trouble with UNIX?*

The first attempt to actually confirm one of Don Norman's complaints about Unix (cryptic command names making life difficult) yielded strongly negative results. Subjects were high-school girls with high scores on math and science assessment tests but zero previous computer experience. Tasks were simple file manipulation. One group got our old friends cat, grep, cp, etc. The other group got English words for the same jobs. (The English words were generally longer, but spelling errors were not frequent for either group, and the test criterion was error rate rather than time.) Both groups had a "help" command, printing a "cheat sheet" with one-line summaries of all commands. The tests were run on three successive Saturdays:

1. Both groups making heavy use of "help". Cryptic-names group makes about half as many errors as English-names group!
2. Cryptic-names group still using "help" heavily, English-names group much less so. But cryptic-names group is *still* making half as many errors!
3. Experimenters get nasty, take "help" command away. Average error rate for cryptic-names group goes way up, but so does the variance, preventing statistically-significant conclusions.

Tentative hypothesis is that cryptic names are harder to learn and remember, but their lack of pre-existing meanings results in much less ambiguity and confusion.

"Was there any difference in user satisfaction?" "We didn't learn much about user satisfaction, because they were all so thrilled to be using a computer for the first time..."

*Gould, Lewis: Designing for Usability...*

Their group evolved some basic principles for the design of user interfaces:

- early focus on users
- user involvement in design
- early user testing
- iterative design based on test results

These sound like motherhoods, but when designers are actually *asked* about their design approach, very few ever mention the word "user" at all! Even under the loosest interpretation criteria, only a modest fraction of designers mention anything on the above list. They elaborated:

- early focus on users: direct contact and observation; try doing the job yourself?
- user involvement: real users ON THE DESIGN TEAM, not just their managers!
- early testing: useful experiments can precede prototypes; documentation, etc. must be tested too
- iterative design: changes fed back from tests; presupposes ease of change

They observed that these things do not necessarily lengthen the design process; in fact they can shorten it.

*Bannon, Cypher, Greenspan, Monty: ...Users' Activity Organization*

"It's the exception, not the rule, for a task to be completed without an interruption." They classified reasons for task switching into five categories:

- "while I'm at it", e.g. housecleaning in an infrequently-visited directory
- concurrent demands, e.g. urgent mail
- lengthy or boring tasks, e.g. compiling
- subtasks, e.g. learning programming methods
- snags, e.g. "file not found"

*Hammond et al: ...Interviews with Designers*

Designers' theories of users are often over-generalized, not sufficiently context-sensitive. Designers show a strong tendency to be overly preoccupied with conceptual models and elegance rather than usability.

*Tennant, Ross, Thompson: ...Menu-Based Natural Language...*

User interfaces using natural language are (a) nice in concept, but (b) terrible to implement. Things like near-miss sentences are a major pain for users, because it's not obvious which ones are understood or why. They tried a different approach for a database query system: natural language input by picking words, names, and phrases from a set of menus. The result is incremental construction of an English-language query which is guaranteed syntactically correct. They found four major wins:

- no spelling errors
- obvious about language coverage: even expert users of a more ordinary query system, confronted with this one, sometimes discovered possible queries that they'd been unaware of
- encourages exploration
- development and execution resource requirements are small, particularly in comparison to "real" natural-language input

...and two major warts:

- Particularly for long queries, the English gets quite stilted.
- Complex data structures can lead to excessively-large menus.

With these reservations, they thought the technique a major win for nontrivial query work by fairly-naive users.
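[To make the flavor concrete, here is a rough sketch of menu-guided query construction -- my own toy illustration, not their system. The user is only ever offered words that can legally come next, so the finished query is grammatical by construction; the grammar and vocabulary below are invented.]

    # Toy menu-based query construction: at each step the user picks from
    # a menu of legal continuations, so the assembled query is
    # syntactically correct by construction.  Grammar/vocabulary invented.
    GRAMMAR = {
        "start":     (["find", "count"], "attribute"),
        "attribute": (["parts", "suppliers"], "filter"),
        "filter":    (["whose price exceeds", "whose city is", "(done)"], "value"),
        "value":     (["100", "500", "Toronto", "Boston"], "filter"),
    }

    def build_query(choose):
        """choose(options) returns one of options; result is the query string."""
        words, state = [], "start"
        while True:
            options, next_state = GRAMMAR[state]
            pick = choose(options)
            if pick == "(done)":
                return " ".join(words)
            words.append(pick)
            state = next_state

    if __name__ == "__main__":
        # Simulate a user stepping through the menus with a fixed script.
        script = iter(["find", "parts", "whose price exceeds", "100", "(done)"])
        print(build_query(lambda options: next(script)))
        # -> find parts whose price exceeds 100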
*Bewley et al: Human Factors Testing for the Xerox Star*

"Keep the designers out of the room when you're videotaping user tests." Designers find watching the resulting videotapes very uncomfortable.

"Did you test iconic vs. non-iconic user interfaces?" "No." "Why not?" "A variety of practical and political considerations."

Japanese engineers, confronted with the Star and told about its ease-of-use emphasis, asked "Why are you doing this? We will *train* our people to use the system!" Mind you, they are starting to change their minds now.

*Wright: ...Computer Documentation*

Writing-style guidelines alone have little effect. When editors were given a list of guidelines, they still didn't consistently pick out things that flagrantly violated them. Guidelines encourage people to tinker with the text, but don't help with specifics. Worse, they tend to be a substitute for thinking: people miss tradeoffs by blindly applying guidelines. "Avoiding catastrophes is more important than attaining perfection."

People seldom read the manuals, even when they clearly should. Thus it is important to optimize manuals for quick search rather than smooth sequential reading. People tend to thumb through rather than using the index or the table of contents. Informative running headings are important.

Users are not good at guessing performance on document-related tasks. For speed of completion, the best way to lay out a questionnaire with boxes to be ticked is to put each set of boxes immediately after the question it relates to, i.e. *not* to have a nice neat column of boxes off to the right of the questions. But people see a questionnaire laid out like this as being more difficult to complete than a neatly-organized one. This can matter considerably when, for example, filling out the questionnaire is voluntary.

*Poller, Garter: ...Moded and Modeless Text Editing...*

They tested experienced users of vi and emacs on various text-processing chores, all users using the editor they preferred. There was no significant speed difference. The vi (moded) group made fewer errors (!). The claim that modeless editors make editing during (as opposed to after) composing easier appears to be true, but there was no difference in the end results after completion of both composition and editing.

*Rosson: Patterns of Experience in Text Editing*

Subjects were various users of an IBM editor which had a very large command set and extensive macro facilities. There was extensive individual variation. The second-smallest command repertoire was that of a very experienced user. The well-known increase in work rate with experience can be *entirely* accounted for by:

- a higher fraction of alteration, as opposed to viewing, commands, i.e. the experienced user spends less time looking around while changing text
- a higher command rate

Note that this list does *not* include "a larger command repertoire". The average scope of commands (i.e. amount of text affected), the repertoire of programmed-function keys, and the use of fancy macro facilities do *not* correlate with experience. One limiting factor is that the editor in question gives no real indication of the coverage of its command set (e.g. no elaborate "help" facility). Suggestions for making systems easier to use well:

- People need to recognize useful strategies. The manuals should explain why a command is useful, not just how to use it.
- One might want to have the system monitor users and offer suggestions in some cases.
- Sophisticated facilities should be easier to discover (menus, on-line help stuff, etc.).
- Learning should be easy and safe. (The editor in question had no "undo" command, a serious deficiency.)

*Gomez et al: Interface Design vs Text Editor Use*

They gave subjects a battery of psychological tests and then tested them on editing tasks, looking for factors that might predict editing skill. They first worked with a line editor. Results:

- editing skill correlates positively with spatial memory
- editing skill correlates negatively with age (!)
- editing skill correlates weakly positively with typing speed

The spatial-memory effect is "obviously" a result of using a line editor. Seems reasonable. So they tried with a screen editor. Results:

- editing skill *still* correlates positively with spatial memory
- but the age effect is gone
- overall editing speed is roughly doubled

The general result from human-factors work on age effects is that more complex tasks are more age-sensitive. They suggest that the age effect in line editing is simply because line editing is more complicated than screen editing, from a beginner's point of view. Conclusions:

- Spatial memory seems to be important for text editing, regardless of type of editor.
- Evaluation of designs should be done using older people, to show age effects.
- The superiority of screen editors over line editors may be largely just a question of simplicity.

*Halasz, Moran: Mental Models and Problem Solving...*

This was an attempt to experimentally validate the common folklore that people cope better with complex systems if you give them a mental model of what's going on inside. The system in question was an RPN calculator (actually simulated on a workstation screen for ease of monitoring). One group got mental-model training, the other got cookbook training.

Mental models win, heavily, provided the problem being solved is complex enough to go beyond familiar examples. There is little difference if the problem is of a familiar type. Models definitely are used when solving unfamiliar problems. The big win of mental models is that they provide a complete framework for reasoning about unfamiliar problems. Non-model subjects had to use a much less efficient framework involving trying to modify methods for known problems; the incompleteness of this problem space led to frequent failures.
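[For readers who haven't met RPN: a minimal sketch of the stack model that such mental-model training describes. This is my own illustration of an RPN evaluator; the study's calculator was a simulation on a workstation screen, not this code.]

    # Minimal reverse-Polish (RPN) evaluator illustrating the stack
    # "mental model": operands are pushed, each operator pops its two
    # arguments and pushes the result.
    OPS = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }

    def rpn(tokens):
        stack = []
        for tok in tokens:
            if tok in OPS:
                b = stack.pop()
                a = stack.pop()
                stack.append(OPS[tok](a, b))
            else:
                stack.append(float(tok))
        return stack.pop()

    # (3 + 4) * 2 is keyed in as "3 4 + 2 *"
    print(rpn("3 4 + 2 *".split()))   # -> 14.0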
*my own comments from watching videotape demos*

The Interlisp-D system "grays out" a window (overlays it with a fairly dark stipple pattern) when changes to source have made the information in that window (e.g. a calling-structure graph) obsolete.

A quick way of placing many symbols from a small vocabulary into a drawing space is to use stroke recognition on cursor motion after a button hit, with different directions and shapes of strokes specifying the different symbols.
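[A rough sketch of the simplest form of this -- directions only, no shapes, and entirely my own illustration with an invented symbol vocabulary: bucket the angle of the net cursor motion after the button hit into one of four compass directions, and map each direction to a symbol.]

    import math

    # Crude stroke recognition by direction only: after a button hit, the
    # net cursor motion (dx, dy) is bucketed into one of four compass
    # directions, each of which places a different symbol.  The symbol
    # names are invented placeholders; y is taken as increasing upward.
    STROKE_SYMBOLS = {
        "east":  "resistor",
        "north": "capacitor",
        "west":  "ground",
        "south": "junction",
    }

    def stroke_direction(dx, dy):
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        return ("east", "north", "west", "south")[int(((angle + 45.0) % 360.0) // 90.0)]

    def symbol_for_stroke(x0, y0, x1, y1):
        return STROKE_SYMBOLS[stroke_direction(x1 - x0, y1 - y0)]

    # A short rightward stroke drops a resistor, an upward one a capacitor, etc.
    print(symbol_for_stroke(100, 100, 140, 102))   # -> resistor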
"Futuristic, yes, but the future has a habit of happening next Wednesday afternoon." -- Robert Spence

The Mesa system has a "sample tool", which is a "boilerplate" source file containing one example of each kind of interaction (e.g. pop-up menu, create new window, etc.). This is used as a starting point for building real tools. The result is easier editing (less typing from scratch) and much more standardization of interaction methods.

The Xerox internal systems are much less spectacular, in a user-interface sense, than things like the Star. The contrast is quite striking, in fact. Cryptic command languages abound and very little use is made of the graphics capabilities of the hardware, beyond a few standard things like pop-up menus and multiple overlapping windows. At least, that's the way it looked from watching the demos.

Rob Pike, describing the Blit software in another context, observed that he was a heavy user of the Blit debugger but had never found any need to read the manual. I can't imagine the same statement being made about the Mesa debugging environment that I saw on the videotape.

The Mesa debugging environment has another real horror arising from the decision to use a single address space and rely on strong typing (etc.) for protection. The debugging system has a nice display with all sorts of windows showing interesting things, but it can't run at the same time as the process being debugged because of the possibility that the data structures might get smashed. So when you hit "run", the screen blanks for *several* *seconds* while all of main memory is dumped to disk and the process being debugged is loaded. It runs, you interact with it, it hits a breakpoint that you set... and *again* the screen blanks for several seconds while the process is "swapped out" and the debugger is brought back in. *Then* you can examine variables, paw through the source, make changes, compile things, etc. Lordy. *This* is the major software development environment for the people who built the Star?!? If any of the developers are reading this, nothing personal folks, but that's a fearful botch.

*Brown: When User Hits Machine...*

The Xerox 8200 copier (a big, fancy thing that collates, staples, etc.) looked good to both the developers and the Xerox human-factors people, but casual users encountering it for the first time "collapsed into a twitching heap". There was considerable interest in finding out why.

There is an important distinction between the ease of use of a device once understood and the problem-solving effort needed to make sense out of the thing for the first time. The 8200 had the usual flipchart instruction booklet, a wall chart of instructions, plus the usual detailed instructions inside various parts of the machine. The *first* job with something like this is making sense out of the structure of the *instructions*. "Fixing" problems by adding more instructions makes this *worse*.

Using two-person teams as subjects leads to people "thinking out loud" without being explicitly asked to. They found this very useful.

Conclusion: the "idiot-proofing" approach is *fundamentally wrong*, because it's impossible for a complex system. One must design instead for immediate detection and repair of trouble, i.e. for local management of trouble rather than for absence of trouble. This means giving users an explicit model of the machine plus designing the machine so that it makes its workings and troubles self-evident.

Local management of trouble seems to be the way humans normally work. One videotape of an experienced user using the 8200 showed him hitting a long succession of minor hitches and obstacles, but coping well with all of them. When he was asked about it afterward, he said that he'd had "no trouble", and was most insistent about this even when told that the videotape showed otherwise.

[Brown talked for a while about the sociological effects of new equipment. He observed that once a machine has a reputation as a lemon, it's dead, even if the problems get fixed. But you can eliminate the reputation by just wheeling the thing out the door, changing the nameplate, and wheeling it back in.]

[In a similar sociological vein, at a party once a Xerox executive approached him about a mail system he'd helped build. The man complained that in the old days, his secretary proofread the stuff he sent out, but with the new electronic mail system he found himself going to great lengths to proofread the stuff for typos etc. before it went out. He asked for things like spelling checkers. Brown half-seriously replied that the real solution was to change the system to deliberately inject spelling errors (without changing the sense of a message -- that's the hard part) so that people would get used to the idea that this was a less formal method of interaction than inter-office memos!]

*END*
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry