[comp.ai.digest] Visual Decoding

Laws@STRIPE.SRI.COM (Ken Laws) (06/18/87)

  From: ihnp4!homxb!houxm!houdi!marty1@ucbvax.Berkeley.EDU  (M.BRILLIANT)

  For monaural
  sound, decoding can be done with Fourier methods that are in principle
  continuous.  For monocular vision, Fourier methods are used for image
  enhancement to aid in human decoding, but I think machine decoding
  depends on making the spatial dimensions discontinuous and comparing the
  content of adjacent cells.

Marty is right; one must be specific about the types of signals that are
carrying the information.  Information theorists tend to work with
particular types of modulation (e.g., radar returns), but are interested
in the general principles of information transmission.  Some of the
spread spectrum work is aimed at concealing evidence of modulation while
still being able to recover the encoded information.

Fourier techniques are particularly appropriate for speech processing
because sinusoidal waveforms (the basis of Fourier analysis) are the
eigenfunctions of linear, time-invariant acoustic channels.  In other
words, the sinusoidal components of speech are transmitted relatively
unharmed, although the phase relationships between the components can be
scrambled.  Any process that decodes acoustic signals must be prepared to
deal with a little phase spreading.  Other
1-D signals (e.g., spectrographic signatures of chemicals) may be composed
of Gaussian pulses or other basis forms.  Yet others may be generated by
differential equations rather than composition or modulation of basis
functions.  Decoding generally requires models of the generating process
and of the channel or sensing transformations, particularly if the latter
are invertible.
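
The eigenfunction property can be checked numerically.  The sketch below
is an illustration only, assuming the channel behaves as a linear,
time-invariant FIR filter; the impulse response h, the frequency omega,
and all function names are made up for the example.  A complex sinusoid
sent through the filter comes out as the same sinusoid multiplied by one
complex number H, whose magnitude is the gain and whose angle is the
phase shift.

```python
import cmath
import math

def channel_output(h, x, n):
    """Sample n of the output of an FIR channel with impulse response h,
    driven by the input x (a function of the integer sample index)."""
    return sum(h[k] * x(n - k) for k in range(len(h)))

def frequency_response(h, omega):
    """Complex gain the channel applies to a sinusoid of frequency omega
    (radians per sample)."""
    return sum(h[k] * cmath.exp(-1j * omega * k) for k in range(len(h)))

# Hypothetical channel: a direct path plus two weaker echoes.
h = [0.5, 0.3, 0.2]
omega = 2 * math.pi * 0.05   # assumed test frequency, 0.05 cycles/sample

def sinusoid(n):
    return cmath.exp(1j * omega * n)

H = frequency_response(h, omega)

# If the sinusoid is an eigenfunction, output == H * input at every sample.
max_error = max(abs(channel_output(h, sinusoid, n) - H * sinusoid(n))
                for n in range(50))

gain = abs(H)                 # amplitude scaling of this component
phase_shift = cmath.phase(H)  # phase change of this component, in radians

print(max_error, gain, phase_shift)
```

Because the phase shift depends on omega, the components of a compound
signal come out with altered phase relationships even though each
component keeps its shape.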

Images are typically captured in discrete arrays, although we know that
biological retinas are neither limited to one kind of detector/resolution
nor so spatially regular.  Discrete arrays are convenient, and the Nyquist
theorem (combined with the limited spatial resolution of typical imaging
systems) gives us assurance that we lose nothing below a specific cutoff
frequency (half the sampling rate) -- we can, if we wish, reconstruct the
true image intensity at any point in the image plane, regardless of its
relationship to the pixel centers.  (In practice this interpolation is
exceedingly difficult and is almost never done -- but enough pixels are
sampled to make interpolation unnecessary for the types of discrimination
we need to perform.)  The discrete pixel grid is often convenient but is
not fundamental to the enterprise of image analysis.
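
As a rough 1-D illustration of reconstructing a value between sample
centers, the sketch below applies Whittaker-Shannon (sinc) interpolation
to a band-limited test signal.  The signal, the sample count, and the
function names are assumptions made for the example.  A real image would
need the 2-D form, and the ideal sinc kernel has infinite extent, so any
finite sum leaves a small truncation error.

```python
import math

def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def reconstruct(samples, t):
    """Estimate the signal at position t (in units of the sample spacing)
    from samples taken at integer positions 0, 1, 2, ..."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

def signal(t):
    """Hypothetical test signal; both components lie below the Nyquist
    limit of 0.5 cycles per sample."""
    return (math.cos(2 * math.pi * 0.10 * t)
            + 0.5 * math.sin(2 * math.pi * 0.23 * t))

samples = [signal(n) for n in range(2001)]

t = 1000.37                      # a point between two sample centers
estimate = reconstruct(samples, t)
error = abs(estimate - signal(t))

print(estimate, signal(t), error)
```

Every sample contributes to every reconstructed point, and the kernel
weights fall off only as 1/distance, which is one reason exact
interpolation of this kind is costly.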

A difficulty in image analysis is that we rarely know the shapes of the
basis functions that carry the information; that, after all, is what we
are trying to determine by parsing a scene into objects.  We do have
models of the optical channels, but they are generally noninvertible.
Our models of the generating processes (e.g., real-world scenes) are
exceedingly weak.  We have some approaches to decoding these signals,
but nothing approaching the power of the human visual system except in
very special tasks (such as analysis of bubble chamber photographs).

					-- Ken
-------