henry@utzoo.UUCP (Henry Spencer) (01/01/84)
The following is sort of a report on what I thought was interesting at CHI 83. "Sort of" because in many places I have omitted things that are covered adequately in the proceedings. Treat this as random observations rather than a complete summary of the conference. Note also that this is, to some extent, my interpretations rather than an attempt at literal and unbiased reporting. I apologize in advance to anyone whose views I have inadvertently misrepresented. Also as usual, this article represents my own views and not those of U of T. All statements of results should be understood to have the usual hedgings about limited applicability and the need for cautious interpretation.

*Don Norman's keynote address*

Hitting on the right new concept can make the user interface a secondary detail in user acceptance. The user interface for the original Visicalc was rotten, but the idea was such a win that it sold regardless. Newer spreadsheets do better on user interface. Unix is another example. [Norman is the author of the infamous Datamation article criticizing Unix.]

A sure sign of poor design is Dymo label tape and Scotch-taped notices. [Norman showed some slides of nicely-labelled controls for hall lights, the only gotcha being that none of the labels could be read in the dark!]

We should think in terms of tradeoffs, not "solutions", and interactions, not "errors". Increases in information presentation have a cost: they eat into work area on the screen [his example was a message system whose "main menu" covered the entire screen!], and updating the information can hurt system responsiveness.

*Dumais, Landauer: Using Examples to Describe Categories*

Menu systems for selection of information have various problems, in particular the difficulty of getting good and unambiguous category names, problems with overlapping and fuzzy categories, and the need to classify by only a few dimensions in a very complex space. [They showed some slides of videotex-system "main menus", asking "which category would you look under for, say, today's weather?".] They used Yellow-Pages headings in classification experiments, with subjects asked to put a number of headings into a few general categories. Specific findings:

- In the presence of vague categories with minimal information, people are overly cautious and a lot of stuff ends up classed under "Miscellaneous". Deleting the "Miscellaneous" category greatly increases classification accuracy without increasing time and effort.
- 3 examples from a category are actually a bit better than category names chosen by experts! Systematically-chosen examples are better than randomly-chosen ones, but not much. A slightly larger number of random examples (3 was the most they tested) might eliminate the difference altogether.
- The best performance was with category names (which show the central tendency of a category) plus 3 examples (which show the variability of the category).
- Even the best performance is still strikingly poor: they never saw classification accuracy better than about 50%.

*Kraut, Hanson, Farber: Command Use and Interface Design*

"In a rich computer system, most users don't use most commands most of the time." It's important to make the few most common commands very easy to get at. They got statistics on command use by experienced Unix users. Their paper has some nice stuff on command-sequence frequency and some nice command-transition diagrams.

Fully 10% of commands are syntax errors. This might be as high as 20-30% if you counted commands aborted during typing and semantic errors as well. Three classes of Unix commands were particularly error-prone:

- commands with odd syntax, e.g. find, at. No surprise.
- "programming" commands, e.g. awk, cc, for, while. These commands require extensive advance planning and give little intermediate feedback. Users often do the same job in smaller steps for the sake of feedback and as-you-go error correction.
- commands requiring memory of current status, e.g. cat, rmdir, cp.

The most frequent command both before and after "cd" was "ls". Clearly most people would work more efficiently with a system that displayed position and status on screen continuously rather than requiring the user to constantly ask for it.

General conclusions were the importance of the "core" commands, the importance of orienting information and feedback when moving through complex tasks, and the importance of intelligent error treatment. They suggested the "generic editor" model for interaction as a useful way of having status displayed at all times.
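[To give a feel for the sort of statistics involved, here is a minimal sketch -- my own illustration, not their instrumentation -- of tabulating command-to-command transitions from a log with one command line per line. The log format and the printed example are assumptions made for the illustration.]

    import sys
    from collections import Counter

    # Tabulate how often each command follows each other command, given a
    # log containing one command line per input line (an assumed format;
    # the actual study instrumented the shell itself).
    def transition_counts(lines):
        commands = [line.split()[0] for line in lines if line.strip()]
        return Counter(zip(commands, commands[1:]))

    if __name__ == "__main__":
        counts = transition_counts(sys.stdin.readlines())
        # Print the 20 most common transitions, e.g. "cd -> ls  137".
        for (prev, nxt), n in counts.most_common(20):
            print("%-10s -> %-10s %d" % (prev, nxt, n))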
*De Leon, Harris, Evens: Is There Really Trouble with UNIX?*

The first attempt to actually confirm one of Don Norman's complaints about Unix (cryptic command names making life difficult) yielded strongly negative results. Subjects were high-school girls with high scores on math and science assessment tests but zero previous computer experience. Tasks were simple file manipulation. One group got our old friends cat, grep, cp, etc. The other group got English words for the same jobs. (The English words were generally longer, but spelling errors were not frequent for either group, and the test criterion was error rate rather than time.) Both groups had a "help" command, printing a "cheat sheet" with one-line summaries of all commands. The tests were run on three successive Saturdays:

1. Both groups making heavy use of "help". Cryptic-names group makes about half as many errors as English-names group!
2. Cryptic-names group still using "help" heavily, English-names group much less so. But cryptic-names group is *still* making half as many errors!
3. Experimenters get nasty, take "help" command away. Average error rate for cryptic-names group goes way up, but so does the variance, preventing statistically-significant conclusions.

Tentative hypothesis is that cryptic names are harder to learn and remember, but their lack of pre-existing meanings results in much less ambiguity and confusion.

"Was there any difference in user satisfaction?" "We didn't learn much about user satisfaction, because they were all so thrilled to be using a computer for the first time..."

*Gould, Lewis: Designing for Usability...*

Their group evolved some basic principles for the design of user interfaces:

- early focus on users
- user involvement in design
- early user testing
- iterative design based on test results

These sound like motherhoods, but when designers are actually *asked* about their design approach, very few ever mention the word "user" at all! Even under the loosest interpretation criteria, only a modest fraction of designers mention anything on the above list. They elaborated:

- early focus on users: direct contact and observation; try doing the job yourself?
- user involvement: real users ON THE DESIGN TEAM, not just their managers!
- early testing: useful experiments can precede prototypes; documentation, etc. must be tested too
- iterative design: changes fed back from tests; presupposes ease of change

They observed that these things do not necessarily lengthen the design process; in fact they can shorten it.

*Bannon, Cypher, Greenspan, Monty: ...Users' Activity Organization*

"It's the exception, not the rule, for a task to be completed without an interruption." They classified reasons for task switching into five categories:

- "while I'm at it", e.g. housecleaning in an infrequently-visited directory
- concurrent demands, e.g. urgent mail
- lengthy or boring tasks, e.g. compiling
- subtasks, e.g. learning programming methods
- snags, e.g. "file not found"

*Hammond et al: ...Interviews with Designers*

Designers' theories of users are often over-generalized, not sufficiently context-sensitive. Designers show a strong tendency to be overly preoccupied with conceptual models and elegance rather than usability.

*Tennant, Ross, Thompson: ...Menu-Based Natural Language...*

User interfaces using natural language are (a) nice in concept, but (b) terrible to implement. Things like near-miss sentences are a major pain for users, because it's not obvious which ones are understood or why. They tried a different approach for a database query system: natural language input by picking words, names, and phrases from a set of menus. The result is incremental construction of an English-language query which is guaranteed syntactically correct. They found four major wins:

- no spelling errors
- obvious about language coverage: even expert users of a more ordinary query system, confronted with this one, sometimes discovered possible queries that they'd been unaware of
- encourages exploration
- development and execution resource requirements are small, particularly in comparison to "real" natural-language input

...and two major warts:

- Particularly for long queries, the English gets quite stilted.
- Complex data structures can lead to excessively-large menus.

With these reservations, they thought the technique a major win for nontrivial query work by fairly-naive users.
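[To make the flavor concrete, here is a rough sketch of menu-guided query construction -- my own toy illustration, not their system. The user is only ever offered words that can legally come next, so the finished query is grammatical by construction; the grammar and vocabulary below are invented.]

    # Toy menu-based query construction: at each step the user picks from
    # a menu of legal continuations, so the assembled query is
    # syntactically correct by construction.  Grammar/vocabulary invented.
    GRAMMAR = {
        "start":     (["find", "count"], "attribute"),
        "attribute": (["parts", "suppliers"], "filter"),
        "filter":    (["whose price exceeds", "whose city is", "(done)"], "value"),
        "value":     (["100", "500", "Toronto", "Boston"], "filter"),
    }

    def build_query(choose):
        """choose(options) returns one of options; result is the query string."""
        words, state = [], "start"
        while True:
            options, next_state = GRAMMAR[state]
            pick = choose(options)
            if pick == "(done)":
                return " ".join(words)
            words.append(pick)
            state = next_state

    if __name__ == "__main__":
        # Simulate a user stepping through the menus with a fixed script.
        script = iter(["find", "parts", "whose price exceeds", "100", "(done)"])
        print(build_query(lambda options: next(script)))
        # -> find parts whose price exceeds 100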
*Bewley et al: Human Factors Testing for the Xerox Star*

"Keep the designers out of the room when you're videotaping user tests." Designers find watching the resulting videotapes very uncomfortable.

"Did you test iconic vs. non-iconic user interfaces?" "No." "Why not?" "A variety of practical and political considerations."

Japanese engineers, confronted with the Star and told about its ease-of-use emphasis, asked "Why are you doing this? We will *train* our people to use the system!" Mind you, they are starting to change their minds now.

*Wright: ...Computer Documentation*

Writing-style guidelines alone have little effect. When editors were given a list of guidelines, they still didn't consistently pick out things that flagrantly violated them. Guidelines encourage people to tinker with the text, but don't help with specifics. Worse, they tend to be a substitute for thinking: people miss tradeoffs by blindly applying guidelines. "Avoiding catastrophes is more important than attaining perfection."

People seldom read the manuals, even when they clearly should. Thus it is important to optimize manuals for quick search rather than smooth sequential reading. People tend to thumb through rather than using the index or the table of contents. Informative running headings are important.

Users are not good at guessing performance on document-related tasks. For speed of completion, the best way to lay out a questionnaire with boxes to be ticked is to put each set of boxes immediately after the question it relates to, i.e. *not* to have a nice neat column of boxes off to the right of the questions. But people see a questionnaire laid out like this as being more difficult to complete than a neatly-organized one. This can matter considerably when, for example, filling out the questionnaire is voluntary.

*Poller, Garter: ...Moded and Modeless Text Editing...*

They tested experienced users of vi and emacs on various text-processing chores, all users using the editor they preferred. There was no significant speed difference. The vi (moded) group made fewer errors (!). The claim that modeless editors make editing during (as opposed to after) composing easier appears to be true, but there was no difference in the end results after completion of both composition and editing.

*Rosson: Patterns of Experience in Text Editing*

Subjects were various users of an IBM editor which had a very large command set and extensive macro facilities. There was extensive individual variation. The second-smallest command repertoire was that of a very experienced user. The well-known increase in work rate with experience can be *entirely* accounted for by:

- a higher fraction of alteration, as opposed to viewing, commands, i.e. the experienced user spends less time looking around while changing text
- a higher command rate

Note that this list does *not* include "a larger command repertoire". The average scope of commands (i.e. amount of text affected), the repertoire of programmed-function keys, and the use of fancy macro facilities do *not* correlate with experience. One limiting factor is that the editor in question gives no real indication of the coverage of its command set (e.g. no elaborate "help" facility). Suggestions for making systems easier to use well:

- People need to recognize useful strategies. The manuals should explain why a command is useful, not just how to use it.
- One might want to have the system monitor users and offer suggestions in some cases.
- Sophisticated facilities should be easier to discover (menus, on-line help stuff, etc.).
- Learning should be easy and safe. (The editor in question had no "undo" command, a serious deficiency.)

*Gomez et al: Interface Design vs Text Editor Use*

They gave subjects a battery of psychological tests and then tested them on editing tasks, looking for factors that might predict editing skill. They first worked with a line editor. Results:

- editing skill correlates positively with spatial memory
- editing skill correlates negatively with age (!)
- editing skill correlates weakly positively with typing speed

The spatial-memory effect is "obviously" a result of using a line editor. Seems reasonable. So they tried with a screen editor. Results:

- editing skill *still* correlates positively with spatial memory
- but the age effect is gone
- overall editing speed is roughly doubled

The general result from human-factors work on age effects is that more complex tasks are more age-sensitive. They suggest that the age effect in line editing is simply because line editing is more complicated than screen editing, from a beginner's point of view. Conclusions:

- Spatial memory seems to be important for text editing, regardless of type of editor.
- Evaluation of designs should be done using older people, to show age effects.
- The superiority of screen editors over line editors may be largely just a question of simplicity.

*Halasz, Moran: Mental Models and Problem Solving...*

This was an attempt to experimentally validate the common folklore that people cope better with complex systems if you give them a mental model of what's going on inside. The system in question was an RPN calculator (actually simulated on a workstation screen for ease of monitoring). One group got mental-model training, the other got cookbook training.

Mental models win, heavily, provided the problem being solved is complex enough to go beyond familiar examples. There is little difference if the problem is of a familiar type. Models definitely are used when solving unfamiliar problems. The big win of mental models is that they provide a complete framework for reasoning about unfamiliar problems. Non-model subjects had to use a much less efficient framework involving trying to modify methods for known problems; the incompleteness of this problem space led to frequent failures.
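[For readers who haven't met RPN: a minimal sketch of the stack model that such mental-model training describes. This is my own illustration of an RPN evaluator; the study's calculator was a simulation on a workstation screen, not this code.]

    # Minimal reverse-Polish (RPN) evaluator illustrating the stack
    # "mental model": operands are pushed, each operator pops its two
    # arguments and pushes the result.
    OPS = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }

    def rpn(tokens):
        stack = []
        for tok in tokens:
            if tok in OPS:
                b = stack.pop()
                a = stack.pop()
                stack.append(OPS[tok](a, b))
            else:
                stack.append(float(tok))
        return stack.pop()

    # (3 + 4) * 2 is keyed in as "3 4 + 2 *"
    print(rpn("3 4 + 2 *".split()))   # -> 14.0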
*my own comments from watching videotape demos*

The Interlisp-D system "grays out" a window (overlays it with a fairly dark stipple pattern) when changes to source have made the information in that window (e.g. a calling-structure graph) obsolete.

A quick way of placing many symbols from a small vocabulary into a drawing space is to use stroke recognition on cursor motion after a button hit, with different directions and shapes of strokes specifying the different symbols.
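[A rough sketch of the simplest form of this -- directions only, no shapes, and entirely my own illustration with an invented symbol vocabulary: bucket the angle of the net cursor motion after the button hit into one of four compass directions, and map each direction to a symbol.]

    import math

    # Crude stroke recognition by direction only: after a button hit, the
    # net cursor motion (dx, dy) is bucketed into one of four compass
    # directions, each of which places a different symbol.  The symbol
    # names are invented placeholders; y is taken as increasing upward.
    STROKE_SYMBOLS = {
        "east":  "resistor",
        "north": "capacitor",
        "west":  "ground",
        "south": "junction",
    }

    def stroke_direction(dx, dy):
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        return ("east", "north", "west", "south")[int(((angle + 45.0) % 360.0) // 90.0)]

    def symbol_for_stroke(x0, y0, x1, y1):
        return STROKE_SYMBOLS[stroke_direction(x1 - x0, y1 - y0)]

    # A short rightward stroke drops a resistor, an upward one a capacitor, etc.
    print(symbol_for_stroke(100, 100, 140, 102))   # -> resistor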
"Futuristic, yes, but the future has a habit of happening next Wednesday afternoon." -- Robert Spence

The Mesa system has a "sample tool", which is a "boilerplate" source file containing one example of each kind of interaction (e.g. pop-up menu, create new window, etc.). This is used as a starting point for building real tools. The result is easier editing (less typing from scratch) and much more standardization of interaction methods.

The Xerox internal systems are much less spectacular, in a user-interface sense, than things like the Star. The contrast is quite striking, in fact. Cryptic command languages abound and very little use is made of the graphics capabilities of the hardware, beyond a few standard things like pop-up menus and multiple overlapping windows. At least, that's the way it looked from watching the demos.

Rob Pike, describing the Blit software in another context, observed that he was a heavy user of the Blit debugger but had never found any need to read the manual. I can't imagine the same statement being made about the Mesa debugging environment that I saw on the videotape.

The Mesa debugging environment has another real horror arising from the decision to use a single address space and rely on strong typing (etc.) for protection. The debugging system has a nice display with all sorts of windows showing interesting things, but it can't run at the same time as the process being debugged because of the possibility that the data structures might get smashed. So when you hit "run", the screen blanks for *several* *seconds* while all of main memory is dumped to disk and the process being debugged is loaded. It runs, you interact with it, it hits a breakpoint that you set... and *again* the screen blanks for several seconds while the process is "swapped out" and the debugger is brought back in. *Then* you can examine variables, paw through the source, make changes, compile things, etc. Lordy. *This* is the major software development environment for the people who built the Star?!? If any of the developers are reading this, nothing personal folks, but that's a fearful botch.

*Brown: When User Hits Machine...*

The Xerox 8200 copier (a big, fancy thing that collates, staples, etc.) looked good to both the developers and the Xerox human-factors people, but casual users encountering it for the first time "collapsed into a twitching heap". There was considerable interest in finding out why.

There is an important distinction between the ease of use of a device once understood and the problem-solving effort needed to make sense out of the thing for the first time. The 8200 had the usual flipchart instruction booklet, a wall chart of instructions, plus the usual detailed instructions inside various parts of the machine. The *first* job with something like this is making sense out of the structure of the *instructions*. "Fixing" problems by adding more instructions makes this *worse*.

Using two-person teams as subjects leads to people "thinking out loud" without being explicitly asked to. They found this very useful.

Conclusion: the "idiot-proofing" approach is *fundamentally wrong*, because it's impossible for a complex system. One must design instead for immediate detection and repair of trouble, i.e. for local management of trouble rather than for absence of trouble. This means giving users an explicit model of the machine plus designing the machine so that it makes its workings and troubles self-evident.

Local management of trouble seems to be the way humans normally work. One videotape of an experienced user using the 8200 showed him hitting a long succession of minor hitches and obstacles, but coping well with all of them. When he was asked about it afterward, he said that he'd had "no trouble", and was most insistent about this even when told that the videotape showed otherwise.

[Brown talked for a while about the sociological effects of new equipment. He observed that once a machine has a reputation as a lemon, it's dead, even if the problems get fixed. But you can eliminate the reputation by just wheeling the thing out the door, changing the nameplate, and wheeling it back in.]

[In a similar sociological vein, at a party once a Xerox executive approached him about a mail system he'd helped build. The man complained that in the old days, his secretary proofread the stuff he sent out, but with the new electronic mail system he found himself going to great lengths to proofread the stuff for typos etc. before it went out. He asked for things like spelling checkers. Brown half-seriously replied that the real solution was to change the system to deliberately inject spelling errors (without changing the sense of a message -- that's the hard part) so that people would get used to the idea that this was a less formal method of interaction than inter-office memos!]

*END*
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry