[comp.cog-eng] Images vs. Text: Syntactic Issues

ralex@tigger.Colorado.EDU (Repenning Alexander) (04/08/91)

Hi there,

It appears that there's a spectrum of application domains regarding
the applicability of picture-oriented representations. On one side
most people agree that having an iconic tool palette in MacDraw-like
programs is a good idea. On the other side I could hardly imagine
anybody trying to "translate" a telephone book into a bunch of icons.

So, there are definitely domains that call for textual representations
simply because we're used to dealing with them (right, this is specific to
our culture and therefore not generally true). The question is: can we
elaborate on this spectrum? Let me try to come up with a very
simplistic classification. I realize this is VERY naive and limited to
English and similar languages. The following list is sorted from simple
to hard:

nouns representing real objects: e.g., a horse
  simple because these nouns can be mapped to a (stereotypical)
  pictorial representation of the object

abstract nouns representing concepts: e.g., a loan
  cannot be mapped to a single "real" object

adjectives
  simple if property is intrinsically visual, e.g., color
  otherwise, .. hmmm..

verbs
  e.g., describe relationships between agent and patient. Without
  using some sort of animation it is very hard to convey the action
  associated with a pictorial representation. Agent and/or patient which
  typically have to be part of the pictorial representation can be
  misinterpreted as the noun(s) to be represented, e.g., how would you
  represent the action of sawing? Drawing a saw would most likely imply
  the noun saw.

I'd like to get your opinions. Does it make any sense at all?

Maybe part of the problem with picture-oriented representations is
that, in contrast to English, picture-oriented representations do not
have any defined syntax. You can of course make up a syntax using
color, spatial features, explicit relationships (e.g., arcs), etc. 

Have there been attempts to capture a "natural" syntax for images used by people
(I'm not talking about the syntax imposed by individual visual programming
systems)? What does it mean on an abstract level when somebody draws a
bunch of circles, boxes and lines? What is the minimal set of
graphical primitives?

  Cheers, Alex "no signature" Repenning

mcgregor@hemlock.Atherton.COM (Scott McGregor) (04/09/91)

In article <1991Apr7.184708.22888@colorado.edu>,
ralex@tigger.Colorado.EDU (Repenning Alexander) writes:

> abstract nouns representing concepts: e.g., a loan
>  cannot be mapped to a single "real" object

If an abstract noun has a direct effect on the real world, then it is often
sensible to indicate the abstract noun by an aspect of it which appears in
the real world.  For instance, a loan contract is a physical world thing,
the loan an abstract concept.  Yet it is very common for people to treat,
in WORD, in ACTION, and in PICTURES the physical world representation as
the loan itself (e.g. "we need your signature on the loan", where the paper
form is meant but not stated explicitly).  This works because people understand
discourse in terms of context.  A picture of a loan form, or a check, or
of the collateral might be sufficient in a given context either separately
or together to indicate the abstract loan.  Now the word Loan may well
do this just as well.  One of the nice things about "graphical" representations
is that they support both text and pictures, and these can be combined
to help convey information more effectively, to more people, than either
alone could.  Of course this is dependent upon the skills of the designer.

> adjectives
>   simple if property is intrinsically visual, e.g., color
>  otherwise, .. hmmm..

Here again, convention and context stand in for grammar, and allow 
adjectives to be represented in other ways.  Standard alternatives
are to use a different visual or temporal attribute.  For instance
a tool for exploring particle physics might represent quarks as circles
and show their "color" by using a corresponding "color" and their "spin"
by a combination of arrows and/or animation of the spin.  However,
the terms color and spin, as applied to quarks, do not REALLY denote
properties in the sense that apples are red (i.e. electro-magnetic
radiation reflection characteristics) or that wheels spin (rotation of
a body about an axis).  Rather, in the case of quarks, they are made-up
terms for abstract qualities having to do only with conservation and
quantum numbers, which have no other macro world meaning.  As you can see
from this example, this use of substitute terms, for things that we
cannot "see" is common not only in graphical renderings, but also in
our verbal language itself. 

In the case of adjectives, color, shape, border characteristics,
physical juxtaposition or arrangement on the screen, connection with
lines or arrows, and temporal juxtaposition including animation are
common techniques for showing adjectival characteristics nonverbally.

>verbs
>  e.g., describe relationships between agent and patient. Without
>  using some sort of animation it is very hard to convey the action
>  associated with a pictorial representation. Agent and/or patient which
>  typically have to be part of the pictorial representation can be
>  misinterpreted as the noun(s) to be represented, e.g., how would you
>  represent the action of sawing? Drawing a saw would most likely imply
>  the noun saw.

Note that your example "saw" is a good case of the ambiguity of text
in isolation from context.  Does the word

		"saw"

all by itself mean: 1) the noun for something that cuts wood?  2)
a command for the reader to begin cutting wood? or 3) the past tense
of to see?

Context is critical.  In logic class the statement "Two is blue" is
a category mistake, in poetry class it is a metaphor, and on the
billiards table it is a true statement!  Context is just as critical
for graphic or pictorial images, and the fact that images are ambiguous
in isolation does not make them useless any more than the fact that
words in isolation are ambiguous makes them useless.  The best solution
in both cases is to get them out of isolation and back into a clarifying
context as quickly as possible!

When the need for action is obvious from context, a picture of the tool
for doing that action often conveys the verb quite clearly.  When faced with
a scissors symbol and a paste pot symbol in a situation where cutting and
pasting are obviously desirable alternatives, people will quite naturally
select the scissors to indicate their desire "to cut" something, or 
the paste pot "to paste" something.  Dragging (a limited form of animation)
can also indicate a verb form. People understand the notion of
pulling a representation of a page to a trash can meaning to delete (throw
away) the underlying object, or pulling it to the printer icon meaning to
print it, or pulling it to the mail box meaning to mail it.  The question
of whether such dragging is superior to doing textual operations is a
complex one that also has to do with competency with keyboards vs. pointing
devices.  But indicating verbs is quite possible in pictorial representations
when chosen well in a limited context or domain of discourse.
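
The drag examples above can be sketched as a small lookup from drop
target to implied verb. This is only an illustration under stated
assumptions: the table DROP_TARGET_VERBS and the function verb_for_drag
are hypothetical names, not part of any real toolkit, and the three
targets are simply the ones mentioned in the paragraph above.

```python
# Hypothetical mapping from drop-target icon to the verb it implies.
# The targets and verbs follow the examples in the text; nothing here
# comes from a real user-interface library.
DROP_TARGET_VERBS = {
    "trash can": "delete",
    "printer": "print",
    "mail box": "mail",
}


def verb_for_drag(dragged, target):
    """Return a (verb, object) pair for dragging `dragged` onto `target`.

    Returns None when the target has no conventional verb in this
    limited domain, the pictorial analogue of a word with no clear
    meaning out of context.
    """
    verb = DROP_TARGET_VERBS.get(target)
    if verb is None:
        return None
    return (verb, dragged)


print(verb_for_drag("page", "trash can"))  # ('delete', 'page')
```

The table is deliberately tiny: the verb is recoverable only because the
domain of discourse limits the possible targets to a handful.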

> I'd like to get your opinions. Does it make any sense at all?

> Maybe part of the problem with picture-oriented representations is
> that, in contrast to English, picture-oriented representations do not
> have any defined syntax. You can of course make up a syntax using
> color, spatial features, explicit relationships (e.g., arcs), etc. 

In fact, though, most textual computer interfaces don't rely on English
syntax, but rather, like pictorial representations, get most of their
meaning from their context, or domain of discourse.  This area, which
is the focus of Wittgenstein's later work, as well as the work of Austin
and Searle, is often called Pragmatics (i.e. Language = Syntactics + Semantics
+ Pragmatics).  Pragmatics is so powerful that in common working discourse
syntax is often abandoned, yet the users fully understand what is meant.
Wittgenstein gives the example of the bricklayer who says only "brick"
to his assistant, but the assistant knows that this is a request that
a brick be retrieved and placed where the mason is working. No syntax
was involved, but no ambiguity was felt by the individuals either.
I think this is one of the reasons that English syntax is not widely used
in computer interfaces; it is just as possible to understand things by a
keyword alone, given the work context.  In fact, when syntax is introduced
into textual computer interfaces, it is likely to be a made-up syntax
(e.g. "(verb/command) -(adjectives/modifier-flags) (nouns/files)")
and not traditional English syntax at all!
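
A minimal sketch of how little machinery such a made-up syntax needs,
assuming the simplest reading of the pattern above: the first word is
the verb, words beginning with "-" are modifiers, and everything else is
a noun. The function name parse_command is hypothetical, and no real
shell parses exactly this way (quoting and escaping are ignored).

```python
def parse_command(line):
    """Split a command line into verb, modifier flags, and nouns.

    Assumes the made-up syntax "verb -flags nouns". Quoting, escaping,
    and flags that take arguments are deliberately not handled.
    """
    words = line.split()
    if not words:
        raise ValueError("empty command")
    verb, rest = words[0], words[1:]
    # Words starting with "-" play the role of adjectives/modifiers.
    flags = [w[1:] for w in rest if w.startswith("-")]
    # Everything else is treated as a noun (e.g. a file name).
    nouns = [w for w in rest if not w.startswith("-")]
    return {"verb": verb, "flags": flags, "nouns": nouns}


print(parse_command("ls -l notes.txt"))
# {'verb': 'ls', 'flags': ['l'], 'nouns': ['notes.txt']}
```

Note that a bare keyword such as "brick" also parses: it yields a verb
with no flags and no nouns, and the work context has to supply the rest.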

> Have there been attempts to capture a "natural" syntax for images used by 
> people.

Yes, it is formally taught in graphic design classes and informally learned
by most people. See below for an example.

> What does it mean on an abstract level when somebody draws a
> bunch of circles, boxes and lines? 

There are also many graphical representation "conventions" that play the
part of a pre-known syntax for their interpreters. Many of us may have
gone through school without learning these conventions formally, the way
we may have learned our written language grammar.  But we know them just
the same, just as illiterate speakers of the language often speak it
quite well with no syntax errors despite their lack of formal education.
For just one example, consider the principles of composition, as applied
to advertising. Consider 3 soaps. The creators of "Brand I" want it to
have cerebral/traditional appeal.  "Brand C" is designed to give the
feeling of a sensual experience.  The object for "Brand Z" is to come across
as dramatic/modern/dynamic.

If you ask a hundred people to design packaging for these soaps based on
these descriptions (and I have done this!) you will find that almost 
everyone will come up with designs that follow these rules, even if they have
had no art training:

Brand I will have vertical and/or horizontal lines on it and all text
on the package will be written horizontally.  Text fonts will be roman,
not italic (i.e. vertical ascenders and descenders, not slanted).
The package will probably be a rectangular prism.

Brand C will feature curving lines, swirls and/or circles.  Type font
selection is very likely to be a script or cursive style (i.e. curved
ascenders and descenders).  The package is likely to be an oval or
circle.

Brand Z will have lots of diagonal (nonhorizontal, nonvertical) lines 
and shapes (e.g. starburst polygons).  Text is written in italics
(slanting ascenders and descenders) and is often written on an
angle with respect to the natural base of the packaging.  Angular package
shapes, such as triangles, trapezoids, or rhomboids, are often selected.

Why do people come up with these designs over and over again?  The
principles of composition in graphic art dictate exactly these choices.
Without going into whether graphic principles are any more innate than
textual or verbal syntax, let us observe that in both cases people are
free to break the rules, but almost always choose not to. They may 
learn these rules formally in a classroom or may pick them up informally
through observation.  But they do learn them, internalize them, and use
them, and by doing so increase their ability to communicate in the
chosen medium.
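
To make the point concrete, the three package descriptions above can be
written down as a small lookup table. This is a sketch only: the names
COMPOSITION_CONVENTIONS and conventions_for are hypothetical, and the
entries just restate the descriptions of Brands I, C, and Z rather than
any formal catalog of design rules.

```python
# Hypothetical table restating the three package descriptions above.
# Keys are the intended appeal; values are the conventions people
# tend to choose for it.
COMPOSITION_CONVENTIONS = {
    "cerebral/traditional": {        # "Brand I"
        "lines": "vertical and/or horizontal",
        "type": "roman, set horizontally",
        "package": "rectangular prism",
    },
    "sensual": {                     # "Brand C"
        "lines": "curves, swirls, and/or circles",
        "type": "script or cursive",
        "package": "oval or circle",
    },
    "dramatic/modern/dynamic": {     # "Brand Z"
        "lines": "diagonal",
        "type": "italic, set at an angle",
        "package": "triangle, trapezoid, or rhomboid",
    },
}


def conventions_for(appeal):
    """Look up the conventions associated with an intended appeal."""
    return COMPOSITION_CONVENTIONS[appeal]


print(conventions_for("sensual")["type"])  # script or cursive
```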

> What is the minimal set of graphical primitives?

It is hard to say what this means.  Minimal to what context?  What
can be said is that the graphical conventions understood by individuals
are very rich indeed, perhaps as rich as syntactic conventions used
in language. There aren't just a few things to learn if you want
to learn them formally, just as there is no short catalog of things
to learn to formally understand grammar.  We take years to learn these
things informally (especially as children), and it takes a long time
to learn them formally too.  But if you want to learn more about these,
I suggest a course of study in "speech act" analysis and pragmatics,
plus a course of study in principles of graphic design. 

Scott McGregor
Atherton Technology