sullivan@asuvax.asu.edu (G. Allen Sullivan) (11/07/89)
In 1986-1987, an autonomous wheelchair existed at Arizona State Univ. It navigated by tracking the black baseboards along the hallways [which nicely contrasted with the off-white floors and walls], used a sonar to detect alcoves and person-sized obstacles, and looked for jagged patterns in the baseboards around intersections to determine when to turn 90 degrees. WHACKY [yep! it put a hole in a wall] used 2 on-board IBM PCs to execute everything, including analyzing 2.5 images of 128x128 pixels per second to generate steering info.

You can imagine that with roughly 0.1 MIPS of compute spread across 2.5 frames per second, you can't do any sort of prediction of where image features will next appear, etc. Indeed, we [Prof Richard Madasraz, Loren Heiny, Rob Lovell, me] only had the power to perform binary image processing.

That bothered me. Hardware accelerators could be cheap and effective. I investigated bit slices, DSPs, systolic arrays, and grids of 8088s. After years of pondering the challenge, I've settled on an accelerator with several types of concurrent processing nodes: ALU, integer math, DO-LOOP increments, DO-LOOP terminations, and general lookup tables. All the processing will occur between registers; memory interfaces only to registers. Having some of the processing nodes implement lookups is justified since, e.g., X * cos(theta) can be executed in ONE step by concatenating X with theta to form an address. (Yes, this gets expensive for high accuracy.)

The mode of processing is to SHUFFLE operands as needed from where values were generated to where a value is needed, and then, with values in place, trigger the processing nodes required to TRANSFORM params into new values. Operands will be shuffled between registers at a rate of 30-60 MHz. I'd use 74Fxxx parts, keeping the highest-speed clocks within a 3x3 inch area. The TRANSFORM phase, to keep down cost, will allow about 200 nsec so that I can use cheap static RAM for the big lookup tables. The other activities, such as the ALU, DO-LOOP incrementing, etc., could be much faster, but there is no reason.

WHAT GOOD IS ALL THIS?? I've written various vision algorithms in this architecture. The code often keeps 5 activities going concurrently. Assuming 8 params need to be moved to set up the 5 activities, at 30 nsec per move, that's a 240 nsec SHUFFLE phase on top of the 200 nsec TRANSFORM phase; call it 480 nsec per instruction with some margin. Thus each instruction, effectively Multiple-Instruction-Multiple-Data, takes about 480 nsec, or roughly 2.1 million instructions/sec. This is conservative. Note that the actual rate of doing something useful, for these inner-loop instructions, is 5 * 2.1 MIPS, or better than 10 MIPS. With 200 nsec RAM.

COMMENTS?? FLAMES? I need cheap, powerful, simple-to-build hardware, for 8-bit gray-scale images, yet supporting linked lists for WIRE-FRAME modeling of 3-D spaces. (Some rough C sketches of the lookup trick, the instruction format, and the wire-frame lists follow below.)

Allen Sullivan
Arizona State University
Tempe/Phoenix
sullivan@asuvax.eas.asu.edu
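To make the lookup-table node concrete, here is a software sketch of the addressing trick. Quantizing X to 8 bits and theta to 256 steps around the circle (my assumptions, not part of the original design), concatenating the two fields gives a 16-bit address into a 64K-entry table of precomputed X * cos(theta) products, so one 200 nsec SRAM read stands in for a multiply and a cosine:

    #include <stdio.h>
    #include <math.h>

    #define PI 3.14159265358979323846
    #define THETA_STEPS 256

    /* Sketch of the table-lookup processing node: X * cos(theta) in one
     * memory read.  8-bit X (0..255) and theta quantized to 256 steps
     * are illustrative assumptions.  Entries are signed 16-bit products,
     * i.e. 64K x 16 bits of cheap SRAM. */
    static short product_table[256 * THETA_STEPS];

    static void build_table(void)
    {
        int x, t;
        for (x = 0; x < 256; x++)
            for (t = 0; t < THETA_STEPS; t++) {
                double theta = 2.0 * PI * t / THETA_STEPS;
                /* address = X concatenated with the theta index */
                product_table[(x << 8) | t] = (short)(x * cos(theta));
            }
    }

    /* What the node does in one TRANSFORM step: form the address by
     * concatenation, read the SRAM, latch the result into a register. */
    static short x_cos_theta(unsigned char x, unsigned char t)
    {
        return product_table[((unsigned)x << 8) | t];
    }

    int main(void)
    {
        build_table();
        /* theta index 32 is 32/256 of a turn, i.e. 45 degrees */
        printf("100 * cos(45 deg) ~ %d\n", x_cos_theta(100, 32));
        return 0;
    }

Higher accuracy means wider X and theta fields, and the table grows accordingly; that is the cost noted above.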
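A toy C model of how I read the SHUFFLE/TRANSFORM instruction: a short list of register-to-register moves executed serially at the shuffle clock, plus a mask naming which processing nodes fire together during the transform window. The register numbering, node names, and encoding are guesses for illustration only; note the raw timing sum is 440 nsec, which the figures above round up to about 480 nsec for margin:

    #include <stdio.h>

    /* One 30 nsec shuffle step: copy a value between two registers. */
    struct move { unsigned char src, dst; };

    /* Which processing nodes fire during the 200 nsec transform window. */
    enum node {
        NODE_ALU = 1, NODE_MUL = 2, NODE_LOOP_INC = 4,
        NODE_LOOP_TEST = 8, NODE_TABLE = 16
    };

    /* One MIMD instruction: shuffle operands into place, then trigger
     * the selected nodes concurrently. */
    struct instr {
        struct move shuffle[8];   /* up to 8 operand moves */
        int         n_moves;
        unsigned    node_mask;    /* nodes triggered this instruction */
    };

    /* Serial moves plus one SRAM-limited transform window; the raw sum
     * is 440 nsec, budgeted as ~480 nsec in the post for margin. */
    static double instr_nsec(const struct instr *i)
    {
        return i->n_moves * 30.0 + 200.0;
    }

    int main(void)
    {
        struct instr i = {
            { {0,4},{1,5},{2,6},{3,7},{8,9},{10,11},{12,13},{14,15} },
            8,
            NODE_ALU | NODE_MUL | NODE_LOOP_INC | NODE_LOOP_TEST | NODE_TABLE
        };
        double ns = instr_nsec(&i);
        printf("raw instruction time: %.0f nsec (%.2f M instr/sec)\n",
               ns, 1000.0 / ns);
        printf("5 concurrent activities: %.1f M useful ops/sec\n",
               5.0 * 1000.0 / ns);
        return 0;
    }

The point of the model is only that the instruction rate is set by the shuffle count and the SRAM access time, while the useful-work rate is that figure multiplied by however many nodes fire per instruction.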
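For the wire-frame requirement, one plausible software reading of "linked lists for WIRE-FRAME modeling of 3-D spaces" is a vertex list plus an edge list, as below. The layout is purely illustrative; the post does not specify a format:

    #include <stdlib.h>

    /* Illustrative linked-list wire-frame model: a chain of vertices and
     * a chain of edges referring to vertex pairs.  Integer coordinates
     * keep the walking/transforming work in the accelerator's integer
     * and table nodes. */
    struct vertex {
        short x, y, z;              /* integer world coordinates */
        struct vertex *next;
    };

    struct edge {
        struct vertex *a, *b;       /* the two endpoints of this wire */
        struct edge *next;
    };

    struct wireframe {
        struct vertex *vertices;
        struct edge   *edges;
    };

    /* Push a new vertex onto the model's vertex list. */
    static struct vertex *add_vertex(struct wireframe *m, short x, short y, short z)
    {
        struct vertex *v = malloc(sizeof *v);
        if (!v) return NULL;
        v->x = x; v->y = y; v->z = z;
        v->next = m->vertices;
        m->vertices = v;
        return v;
    }

    /* Link two existing vertices with an edge (a wire). */
    static struct edge *add_edge(struct wireframe *m, struct vertex *a, struct vertex *b)
    {
        struct edge *e = malloc(sizeof *e);
        if (!e) return NULL;
        e->a = a; e->b = b;
        e->next = m->edges;
        m->edges = e;
        return e;
    }

    int main(void)
    {
        struct wireframe m = { NULL, NULL };
        struct vertex *p = add_vertex(&m, 0, 0, 0);
        struct vertex *q = add_vertex(&m, 100, 0, 0);
        struct vertex *r = add_vertex(&m, 0, 100, 0);
        add_edge(&m, p, q);
        add_edge(&m, q, r);
        add_edge(&m, r, p);
        return 0;
    }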