sullivan@asuvax.asu.edu (G. Allen Sullivan) (11/07/89)
In 1986-1987, an autonomous wheelchair existed at Arizona State Univ. It navigated by tracking the black baseboards along the hallways [which nicely contrasted with the off-white floors and walls], used a sonar to detect alcoves and person-sized obstacles, and looked for jagged patterns in the baseboards around intersections to determine when to turn 90 degrees. WHACKY [yep! it put a hole in a wall] used 2 on-board IBM PCs to execute everything, including analyzing 2.5 images of 128x128 pixels per second to generate steering info.

You can imagine that with roughly 0.1 MIPS of compute spread across 2.5 frames per second, you can't do any sort of prediction of where image features will next appear, etc. Indeed, we [Prof Richard Madasraz, Loren Heiny, Rob Lovell, me] only had the power to perform binary image processing.

That bothered me. Hardware accelerators could be cheap and effective. I investigated bit slices, DSPs, systolic arrays, and grids of 8088s. After years of pondering the challenge, I've settled on an accelerator with several types of concurrent processing nodes: ALU, integer math, DO-LOOP increments, DO-LOOP terminations, and general lookup tables. All the processing will occur between registers; memory interfaces only to registers. Having some of the processing nodes implement lookups is justified since, e.g., X * cos(theta) can be executed in ONE step by concatenating X with theta to form an address. (Yes, this gets expensive for high accuracy.)

The mode of processing is to SHUFFLE operands as needed from where values were generated to where a value is needed, and then, with values in place, trigger the processing nodes required to TRANSFORM params into new values. Operands will be shuffled between registers at a rate of 30-60 MHz. I'd use 74Fxxx parts, keeping the highest-speed clocks within a 3x3 inch area. The TRANSFORM phase, to keep down cost, will allow about 200 nsec so that I can use cheap static RAM for the big lookup tables. The other activities, such as the ALU, DO-LOOP incrementing, etc., could be much faster, but there is no reason.

WHAT GOOD IS ALL THIS?? I've written various vision algorithms in this architecture. The code often keeps 5 activities going concurrently. Assuming 8 params need to be moved to set up the 5 activities, at 30 nsec per move, that's a 240 nsec SHUFFLE phase on top of the 200 nsec TRANSFORM phase; call it 480 nsec per instruction with some margin. Thus each instruction, effectively Multiple-Instruction-Multiple-Data, takes about 480 nsec, or roughly 2.1 million instructions/sec. This is conservative. Note that the actual rate of doing something useful, for these inner-loop instructions, is 5 * 2.1 MIPS, or better than 10 MIPS. With 200 nsec RAM.

COMMENTS?? FLAMES? I need cheap, powerful, simple-to-build hardware, for 8-bit gray-scale images, yet supporting linked lists for WIRE-FRAME modeling of 3-D spaces. (Some rough C sketches of the lookup trick, the instruction format, and the wire-frame lists follow below.)

Allen Sullivan
Arizona State University
Tempe/Phoenix
sullivan@asuvax.eas.asu.edu
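To make the lookup-table node concrete, here is a software sketch of the addressing trick. Quantizing X to 8 bits and theta to 256 steps around the circle (my assumptions, not part of the original design), concatenating the two fields gives a 16-bit address into a 64K-entry table of precomputed X * cos(theta) products, so one 200 nsec SRAM read stands in for a multiply and a cosine:

    #include <stdio.h>
    #include <math.h>

    #define PI 3.14159265358979323846
    #define THETA_STEPS 256

    /* Sketch of the table-lookup processing node: X * cos(theta) in one
     * memory read.  8-bit X (0..255) and theta quantized to 256 steps
     * are illustrative assumptions.  Entries are signed 16-bit products,
     * i.e. 64K x 16 bits of cheap SRAM. */
    static short product_table[256 * THETA_STEPS];

    static void build_table(void)
    {
        int x, t;
        for (x = 0; x < 256; x++)
            for (t = 0; t < THETA_STEPS; t++) {
                double theta = 2.0 * PI * t / THETA_STEPS;
                /* address = X concatenated with the theta index */
                product_table[(x << 8) | t] = (short)(x * cos(theta));
            }
    }

    /* What the node does in one TRANSFORM step: form the address by
     * concatenation, read the SRAM, latch the result into a register. */
    static short x_cos_theta(unsigned char x, unsigned char t)
    {
        return product_table[((unsigned)x << 8) | t];
    }

    int main(void)
    {
        build_table();
        /* theta index 32 is 32/256 of a turn, i.e. 45 degrees */
        printf("100 * cos(45 deg) ~ %d\n", x_cos_theta(100, 32));
        return 0;
    }

Higher accuracy means wider X and theta fields, and the table grows accordingly; that is the cost noted above.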
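A toy C model of how I read the SHUFFLE/TRANSFORM instruction: a short list of register-to-register moves executed serially at the shuffle clock, plus a mask naming which processing nodes fire together during the transform window. The register numbering, node names, and encoding are guesses for illustration only; note the raw timing sum is 440 nsec, which the figures above round up to about 480 nsec for margin:

    #include <stdio.h>

    /* One 30 nsec shuffle step: copy a value between two registers. */
    struct move { unsigned char src, dst; };

    /* Which processing nodes fire during the 200 nsec transform window. */
    enum node {
        NODE_ALU = 1, NODE_MUL = 2, NODE_LOOP_INC = 4,
        NODE_LOOP_TEST = 8, NODE_TABLE = 16
    };

    /* One MIMD instruction: shuffle operands into place, then trigger
     * the selected nodes concurrently. */
    struct instr {
        struct move shuffle[8];   /* up to 8 operand moves */
        int         n_moves;
        unsigned    node_mask;    /* nodes triggered this instruction */
    };

    /* Serial moves plus one SRAM-limited transform window; the raw sum
     * is 440 nsec, budgeted as ~480 nsec in the post for margin. */
    static double instr_nsec(const struct instr *i)
    {
        return i->n_moves * 30.0 + 200.0;
    }

    int main(void)
    {
        struct instr i = {
            { {0,4},{1,5},{2,6},{3,7},{8,9},{10,11},{12,13},{14,15} },
            8,
            NODE_ALU | NODE_MUL | NODE_LOOP_INC | NODE_LOOP_TEST | NODE_TABLE
        };
        double ns = instr_nsec(&i);
        printf("raw instruction time: %.0f nsec (%.2f M instr/sec)\n",
               ns, 1000.0 / ns);
        printf("5 concurrent activities: %.1f M useful ops/sec\n",
               5.0 * 1000.0 / ns);
        return 0;
    }

The point of the model is only that the instruction rate is set by the shuffle count and the SRAM access time, while the useful-work rate is that figure multiplied by however many nodes fire per instruction.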
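For the wire-frame requirement, one plausible software reading of "linked lists for WIRE-FRAME modeling of 3-D spaces" is a vertex list plus an edge list, as below. The layout is purely illustrative; the post does not specify a format:

    #include <stdlib.h>

    /* Illustrative linked-list wire-frame model: a chain of vertices and
     * a chain of edges referring to vertex pairs.  Integer coordinates
     * keep the walking/transforming work in the accelerator's integer
     * and table nodes. */
    struct vertex {
        short x, y, z;              /* integer world coordinates */
        struct vertex *next;
    };

    struct edge {
        struct vertex *a, *b;       /* the two endpoints of this wire */
        struct edge *next;
    };

    struct wireframe {
        struct vertex *vertices;
        struct edge   *edges;
    };

    /* Push a new vertex onto the model's vertex list. */
    static struct vertex *add_vertex(struct wireframe *m, short x, short y, short z)
    {
        struct vertex *v = malloc(sizeof *v);
        if (!v) return NULL;
        v->x = x; v->y = y; v->z = z;
        v->next = m->vertices;
        m->vertices = v;
        return v;
    }

    /* Link two existing vertices with an edge (a wire). */
    static struct edge *add_edge(struct wireframe *m, struct vertex *a, struct vertex *b)
    {
        struct edge *e = malloc(sizeof *e);
        if (!e) return NULL;
        e->a = a; e->b = b;
        e->next = m->edges;
        m->edges = e;
        return e;
    }

    int main(void)
    {
        struct wireframe m = { NULL, NULL };
        struct vertex *p = add_vertex(&m, 0, 0, 0);
        struct vertex *q = add_vertex(&m, 100, 0, 0);
        struct vertex *r = add_vertex(&m, 0, 100, 0);
        add_edge(&m, p, q);
        add_edge(&m, q, r);
        add_edge(&m, r, p);
        return 0;
    }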