Laws@STRIPE.SRI.COM (Ken Laws) (06/18/87)
From: ihnp4!homxb!houxm!houdi!marty1@ucbvax.Berkeley.EDU (M.BRILLIANT)

  For monaural sound, decoding can be done with Fourier methods that
  are in principle continuous. For monocular vision, Fourier methods
  are used for image enhancement to aid in human decoding, but I think
  machine decoding depends on making the spatial dimensions
  discontinuous and comparing the content of adjacent cells.

Marty is right; one must be specific about the types of signals that are carrying the information. Information theorists tend to work with particular types of modulation (e.g., radar returns), but are interested in the general principles of information transmission. Some of the spread spectrum work is aimed at concealing evidence of modulation while still being able to recover the encoded information.

Fourier techniques are particularly appropriate for speech processing because sinusoidal waveforms (the basis of Fourier analysis) are the eigenforms of acoustic channels. In other words, the sinusoidal components of speech are transmitted relatively unharmed, although the phase relationships between the components can be scrambled. Any process that decodes acoustic signals must be prepared to deal with a little phase spreading.

Other 1-D signals (e.g., spectrographic signatures of chemicals) may be composed of Gaussian pulses or other basis forms. Yet others may be generated by differential equations rather than by composition or modulation of basis functions. Decoding generally requires models of the generating process and of the channel or sensing transformations, particularly if the latter are invertible.

Images are typically captured in discrete arrays, although we know that biological retinas are neither limited to one kind of detector/resolution nor so spatially regular.
Discrete arrays are convenient, and the Nyquist theorem (combined with the limited spatial resolution of typical imaging systems) gives us assurance that we lose nothing below a specific cutoff frequency, namely half the sampling rate -- we can, if we wish, reconstruct the true image intensity at any point in the image plane, regardless of its relationship to the pixel centers. (In practice this interpolation is exceedingly difficult and is almost never done -- but enough pixels are sampled to make interpolation unnecessary for the types of discrimination we need to perform.) The discrete pixel grid is often convenient but is not fundamental to the enterprise of image analysis.

A difficulty in image analysis is that we rarely know the shapes of the basis functions that carry the information; that, after all, is what we are trying to determine by parsing a scene into objects. We do have models of the optical channels, but they are generally noninvertible. Our models of the generating processes (e.g., real-world scenes) are exceedingly weak. We have some approaches to decoding these signals, but nothing approaching the power of the human visual system except in very special tasks (such as analysis of bubble chamber photographs).

					-- Ken
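
The reconstruction claim above (recovering intensity between pixel centers) can be sketched in one dimension with Whittaker-Shannon (sinc) interpolation. This is only an illustration under stated assumptions, not anything from the original discussion: the test signal, its two frequencies, the sample count, and the names `signal` and `sinc_interpolate` are all made up here, and a single scan line stands in for an image. Because a real pixel row is finite, the sum is truncated and the result is approximate rather than exact.

```python
import numpy as np

def signal(t):
    """Hypothetical band-limited scan line (pixel spacing = 1).

    Frequencies 0.11 and 0.23 cycles/pixel are both below the
    Nyquist limit of 0.5 cycles/pixel, so sampling at integer
    positions loses nothing in principle.
    """
    return np.sin(2 * np.pi * 0.11 * t) + 0.5 * np.cos(2 * np.pi * 0.23 * t)

def sinc_interpolate(samples, t):
    """Estimate the signal at real-valued position t from integer-spaced samples.

    np.sinc is the normalized sinc, sin(pi*x) / (pi*x), which is the
    interpolation kernel for unit sample spacing.
    """
    n = np.arange(len(samples))
    return float(np.sum(samples * np.sinc(t - n)))

# Sample at pixel centers 0..2000, then query a point between centers.
pixels = signal(np.arange(2001))
t = 1000.37
estimate = sinc_interpolate(pixels, t)
error = abs(estimate - signal(t))

print(f"true value  : {signal(t):.6f}")
print(f"interpolated: {estimate:.6f}")
print(f"abs error   : {error:.2e}")
```

Each estimate sums over every sample, and the sinc tails fall off only as 1/distance, so accuracy near the ends of the row degrades and the cost per point grows with the row length. That gives some feel for why this interpolation is rarely carried out in practice.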