[comp.compilers] Compiling for DSP chips

tatge@m2.csc.ti.com (Reid Tatge) (09/07/90)

Concerning the article by tommyp@isy.liu.se (Tommy Pedersen) 
in which the moderator wrote:
> People I know in the DSP biz tell me that although there are many C
> compilers for DSP chips, nobody uses them because they're all too slow.

I've spent the last few years writing compilers for our (TI's) DSP chips, so
I thought I should respond to this.  There are really two classes of DSP
chips on the market today: fixed-point and floating-point.  The TMS320C25
is fixed-point, so I'll talk about that first.

In general, the DSP fixed-point processors across the industry are
"compiler-hostile", or in other words, very difficult to map general-purpose
HLLs onto.  The `C25 is an accumulator machine with no offset addressing
and very little support for arbitrary address arithmetic.  Consequently,
compilers tend to generate sequences of instructions for operations that
would be single instructions on any more conventional CPU.

Why such an apparently contorted ISA?  The reason is simple: DSP fixed-point
processors are designed to optimize price/performance for DSP algorithms,
often at the expense of general purpose performance.  They are targeted at
real-time applications where the time-critical kernels are very small and
arithmetically intensive.  Most DSP folks are more than happy to code these
in assembly.  However, this is changing - algorithms are becoming
increasingly complex, and the need for high-quality compilers cannot be
discounted.
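
As a concrete illustration (plain C, not `C25 code; the Q15 scaling is just
for the sake of the example), the classic time-critical kernel is the inner
loop of an N-tap FIR filter.  A hand coder collapses the multiply-accumulate
into a tight repeat/MAC sequence, while a straightforward compilation spends
most of its cycles on address arithmetic:

	int fir(const int *x, const int *h, int n)
	{
		long acc = 0;
		int i;

		/* multiply-accumulate over samples and coefficients */
		for (i = 0; i < n; i++)
			acc += (long) x[i] * h[i];

		return (int) (acc >> 15);	/* Q15 scaling, illustrative */
	}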

Concerning comments by mhorne@ka7axd.wv.tek.com (Mike Horne)
> Generally speaking, few people use compilers to generate code for DSP chips
> for *time critical* code sections.  Note that this includes just about all
> signal processing algorithms.  However, you can use a high-level language
> (such as C) to build the *structure* of the program and use in-line assembly
> for time critical sections.....

I agree.  This is generally an excellent approach.  However, by applying more
sophisticated optimization strategies, we plan to narrow the gap between
hand-coded assembly and C performance.  This is particularly true in the
floating-point CPU arena (the TMS320C3x).  The new generation of floating-point
DSPs is moving towards more conventional, compiler-friendly
architectures.  As the compilers become more sophisticated, they enable
people to code their entire program in C, with very little, if any,
performance penalty (with all the obvious benefits).  Over time, the
improved technology will also be applied to the fixed-point processors.

Reid Tatge

kuusama@news.funet.fi.tut.fi (Kuusama Juha) (09/11/90)

While it is true that currently used HLLs produce poor code for currently
used DSP processors, things are changing.  I will not talk about code
generators that produce optimized code from filter specifications, flow
graphs, etc., although I've seen several.  But I would like to point out:

At ICASSP-90 (the International Conference on Acoustics, Speech and Signal
Processing), K. Leary (from Analog Devices, Inc.) gave an excellent talk on
DSP/C: "DSP/C is a structured procedural programming language that solves
the problems of using C for DSP, while retaining the benefits of C."  My
personal view is that the claim may well prove to be true.  As far as I can
see, DSP/C can be compiled to _optimum_ code for the DSP processor, given a
smart enough compiler.  Have a look; the article is in book 2 of the
proceedings.

(I can't resist: if the language does indeed become popular, why not call it 'D'?)
--
Juha Kuusama, kuusama@korppi.tut.fi

avi@taux01.nsc.com (Avi Bloch) (09/24/90)

I realize that I'm a little late on this topic but I just saw the tail end of
this discussion and I thought I'd add what National Semiconductor has to offer
in this field.

National Semiconductor recently announced three microprocessors: the
ns32fx16, ns32cg160, and ns32gx320.  All of these processors have a
general-purpose processor core with additions for DSP and fax applications.
These additions are accessed using either special instructions or
memory-mapped I/O.

In order to allow the user to access these special instructions from a HLL (in
our case, C), we invented a mechanism which we call Application Specific
Instruction Set (ASIS) Support.  What this entails is a group of functions and
procedures whose prototypes are supplied in an 'include' file and are
recognized by the compiler.  These functions are then inlined by the compiler.
The compiler (including the optimizer) has intimate knowledge of how these
instructions work, e.g., which parameters are changed by the instruction or
in which register each parameter must reside, and it uses this knowledge to
allocate registers and generate code in the most efficient manner.  I'm not
saying that it will be as good as if it were written in assembly, but in most
cases it's good enough.
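
To make that concrete, here is a hedged sketch of what the user-visible side
might look like; the header and intrinsic names (asis.h, __mac16) are made up
for illustration, not the actual interface:

	#include "asis.h"	/* hypothetical file of recognized prototypes */

	long dot(const short *a, const short *b, int n)
	{
		long acc = 0;
		int i;

		/* each call is inlined by the compiler as the special
		   multiply-accumulate instruction, not as a function call */
		for (i = 0; i < n; i++)
			acc = __mac16(acc, a[i], b[i]);

		return acc;
	}

Because the compiler knows which registers __mac16 expects its operands in, it
can allocate acc and the array elements accordingly instead of shuffling them
around at the call site.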

I'm willing to add more details for anyone interested.

BTW, if anyone knows of any other compiler that does something similar, I'd be
interested to hear about it.
-- 
	Avi Bloch
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel		Tel: (972) 52-522263
avi@taux01.nsc.com
[GCC lets you in-line assembler, frequently hidden inside macros, that is
often used to get to features like sin and cos instructions. -John]

pardo@cs.washington.edu (David Keppel) (09/27/90)

In article <4751@taux01.nsc.com> avi@taux01.nsc.com (Avi Bloch) writes:
>[Compiler that optimizes for special instructions.]

The moderator writes:
>[GCC lets you in-line assembler, frequently hidden inside macros, that is
>often used to get to features like sin and cos instructions. -John]

In particular, you can tell GCC that certain hard registers are
clobbered, so GCC can perform register allocation around those
instructions.  If the machine description knows about those
instructions, then I think that it is also possible to define
optimizations over those instructions, even if the compiler itself
doesn't ``know'' how to emit them.
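
As a small, contrived illustration of the clobber-list form (386 target
assumed; the loop is just filler), the names after the third colon are what
let gcc allocate around the asm:

	static inline void
	delay_loop(unsigned int count)	/* assumes count > 0 */
	{
		/* "ecx" and "cc" tell gcc the asm destroys that register
		   and the condition codes, so no live value is kept in
		   %ecx across the statement. */
		__asm__ volatile ("movl %0, %%ecx\n1:\tloop 1b"
			: /* no outputs */
			: "g" (count)
			: "ecx", "cc");
	}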

	;-D on  ( A compile of things to do )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

seanf@sco.COM (Sean Fagan) (09/30/90)

In article <13148@june.cs.washington.edu> pardo@cs.washington.edu (David Keppel) writes:
>In particular, you can tell GCC that certain hard registers are
>clobbered, so GCC can perform register allocation around those
>instructions.  

>If the machine description knows about those
>instructions, then I think that it is also possible to define
>optimizations over those instructions, even if the compiler itself
>doesn't ``know'' how to emit them.

For example:

static inline void *
_inline_memcpy (void *dst, void *src, unsigned int len) {
	void *t1, *t2;
	unsigned int t3;

	/* 'rep;movsb' implicitly uses %edi, %esi, and %ecx; the output
	   constraints tell gcc those registers get clobbered, and the
	   matching "0"/"1"/"2" inputs say where their initial values come
	   from.  The "memory" clobber keeps gcc from caching values in
	   registers across the copy.  */
	__asm volatile ("rep;movsb"
		: "=D" (t1), "=S" (t2), "=c" (t3)
		: "0" (dst), "1" (src), "2" (len)
		: "memory");
	return (dst);		/* memcpy returns the destination pointer */
}

Although gcc does not know what the 'rep;movsb' string means, from the
information I've told it, it knows that it needs to set up three registers
(edi, esi, and ecx), and that they will be clobbered.  I have also told it
that the values for those registers will initially be in the parameters dst,
src, and len; but, after the instruction completes, to move them into t1,
t2, and t3, respectively.

Where the optimization comes into effect is that gcc can (and will) emit the
code such that register movement is kept to a minimum (i.e., it will
try to make sure that the values I want to memcpy are already in the
respective registers, if it can).  Also, without the 'volatile,' gcc is
likely to get rid of the instruction completely if the modified values are
never used (I just tried it, and my trivial case ended up losing the movsb,
since I never used the values!).

Where this can come into play is something like:

	static __inline double
	__inline_sin(double x) {
		double temp;
		/* "t" is the top of the 387 register stack, %st(0), which is
		   where fsin expects its operand and leaves its result */
		__asm ("fsin" : "=t" (temp) : "0" (x));
		return (temp);
	}

where gcc will get rid of the entire inline function, if it can determine
that none of the values are used.

Incidentally, I have yet to see a commercial compiler where I can do this.
It's really a pity, too, since, although the inline assembly syntax is a bit
bizarre, it's far more powerful than the "normal" method of doing it.

-- 
Sean Eric Fagan
seanf@sco.COM
uunet!sco!seanf
(408) 458-1422

pardo@cs.washington.edu (David Keppel) (10/02/90)

>pardo@cs.washington.edu (David Keppel) writes:
>>[Tell gcc hard regs clobbered -> optimize around them.]
>>[If the md knows about magic instructions, it can optimize over them,
>> even if the compiler doesn't know how to emit them.]

In article <7949@scolex.sco.COM> Sean Fagan <seanf@sco.COM> writes:
>[For example, `_inline_memcpy', `__inline_sin'.]

Gcc has the capability to do two things:

* Register allocation and code motion of `asm'-ed stuff.  That's
  what Sean described.

* Optimization of instructions that the compiler doesn't know
  how to emit, provided the instructions are in the machine
  description.

I'm much fuzzier on the latter, but I think it works something
like this:

* The machine description contains information about how to
  emit the "div" and "mod" instructions.

* The machine description contains a description of a peephole
  optimization that says something like ``if there's a "div"
  instruction next to a "rem" instruction, and they operate on
  the same operands, then trash the "div" instruction and get
  the results from the "rem", which computes "div" as a side
  effect''.

* The compiler has no way of producing a "rem" instruction.

* The user defines an "asm" that emits a "rem" instruction.

* If the peephole matches, the optimization occurs, even tho'
  the compiler never emitted the "rem".

I don't think this feature is used often on most targets because C
and the gcc IR are pretty well matched.  However, I could imagine
the optimizations being useful on, e.g., DSP machines, where there are
machine primitives that match poorly with C semantics but for
which various optimizations could be done with neighboring
instructions.
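
As a hedged sketch of the scenario above (the target and its "rem"
instruction are hypothetical, and the constraints are just placeholders),
the user-visible side might look like:

	static inline int
	machine_rem(int a, int b) {
		int r;
		/* hypothetical instruction: computes a % b and leaves the
		   quotient in a known register as a side effect */
		__asm__ ("rem %0, %1, %2" : "=r" (r) : "r" (a), "r" (b));
		return (r);
	}

	void
	divmod(int a, int b, int *q, int *r) {
		*q = a / b;		/* compiler-emitted "div" */
		*r = machine_rem(a, b);	/* user-written "rem" via asm */
	}

If the two end up adjacent, the peephole in the machine description can
delete the "div" and read the quotient out of the "rem"'s side effect.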

	;-D on  ( Looking for a few good digital signals )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.