[comp.arch] Endian reversing with high level languages

RWilson@acorn.co.uk (02/21/89)

So we now combine two activities: endian-reversing moves and "my compiler
generates worse/better code than your compiler".

Here is John's program shoved into the most recent, safest, combat proven
etc. version of our C compiler.

>struct x { unsigned char a, b, c, d; };
>unsigned y,z;
>main(p)
>        struct x *p;
>{
>        y = p->d << 24 | p->c << 16 | p->b << 8 | p->a;
>        z = (y << 24) | ((y & 0xff00) << 8) | ((y << 8) & 0xff00) | (y >> 24);
>}

; generated by Norcroft RISC OS ARM C 317/318/319/ASD Feb  3 1989, 16:48:50

    AREA |C$$code|, CODE, READONLY

        EXPORT  main
main
        LDR     a4, addressofstaticdata

 #   6          y = p->d << 24 | p->c << 16 | p->b << 8 | p->a;
        LDRB    a3, [a1, #3]
        MOV     a3, a3, ASL #24
        LDRB    a4, [a1, #2]
        ORR     a3, a3, a4, ASL #16
        LDRB    a4, [a1, #1]
        ORR     a3, a3, a4, ASL #8
        LDRB    a1, [a1, #0]
        ORR     a1, a3, a1

        STR     a1, [a4, #0]

 #   7          z = (y << 24) | ((y & 0xff00) << 8) | ((y << 8) & 0xff00) | (y >> 24);
        MOV     a3, a1, ASL #24
        AND     a4, a1, #&ff00
        ORR     a3, a3, a4, ASL #8
        MOV     a4, #&ff00
        AND     a4, a4, a1, ASL #8
        ORR     a3, a3, a4
        ORR     a1, a3, a1, LSR #24

        STR     a1, [a4, #4]

        MOVS    pc, lk

(Note: lots of superfluous stuff has been deleted, in case anyone out there
tries it on their own ARM machine. Of course, you can't get this compiler yet!)

First case is 4 * load byte (each 3 cycles with the current external DRAM
system) + 4 cycles = 16.

Second case is 7 cycles.
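
For anyone wanting to check what the two statements compute on a machine
without an ARM, here is a minimal C sketch. The names load_word,
swap_as_quoted, swap_full and check_swaps are made up for illustration, and
uint32_t stands in for the 32-bit "unsigned" of the quoted program. Note that
the third term as quoted is (y << 8) & 0xff00, which puts byte 0 into bits
8-15 a second time; a full byte reversal would take that byte from y >> 8.
The instruction count should not change, since only the shift direction in
one operand differs.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the struct in the quoted program: four bytes in memory order. */
struct x { unsigned char a, b, c, d; };

/* First statement: build a word with p->a as the least significant byte.
   The casts keep the shifts unsigned; a bare p->d << 24 promotes to int. */
static uint32_t load_word(const struct x *p)
{
    return (uint32_t)p->d << 24 | (uint32_t)p->c << 16 |
           (uint32_t)p->b << 8  | (uint32_t)p->a;
}

/* Second statement exactly as quoted (third term shifts left). */
static uint32_t swap_as_quoted(uint32_t y)
{
    return (y << 24) | ((y & 0xff00) << 8) | ((y << 8) & 0xff00) | (y >> 24);
}

/* Full byte reversal: the third term shifts right instead. */
static uint32_t swap_full(uint32_t y)
{
    return (y << 24) | ((y & 0xff00) << 8) | ((y >> 8) & 0xff00) | (y >> 24);
}

/* Hypothetical self-check, not part of the original post. */
static void check_swaps(void)
{
    struct x bytes = { 0x44, 0x33, 0x22, 0x11 };
    uint32_t y = load_word(&bytes);

    assert(y == 0x11223344u);
    assert(swap_as_quoted(y) == 0x44334411u);  /* byte 0 appears twice */
    assert(swap_full(y) == 0x44332211u);       /* all four bytes reversed */
}
```

Calling check_swaps() from a test harness exercises all three functions.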

It is a great relief to me (instruction set designer) that the compiler did
OK - managed to use shift and operate instructions, did not go berserk with
register usage or data transfer etc. The compiler writers were "quietly
confident". Optimisation (on/off) irrelevant for first line, but did help
second line.

A good job? Only if I forget that a more complex algorithm can do this in
4 cycles (3 when repeated, since the constant is set up only once):

        MVN     a3,#&FF00 ; a3=0xFFFF00FF - set up constant

        EOR     a4,a1,a1,ROR #16
        AND     a4,a3,a4,LSR #8
        EOR     a1,a4,a1,ROR #8

Although I can imagine a compiler generating this, C doesn't let me specify
the rotates that make the algorithm work.
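
As a rough illustration, the rotate-based sequence can be written in C by
spelling each rotate as a pair of shifts. This is only a sketch: ror32,
swap_rot and check_swap_rot are hypothetical names, uint32_t is assumed to be
the 32-bit word, and nothing guarantees that a given compiler turns the
two-shift idiom back into a single ROR operand.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, for 0 < n < 32. C has no rotate operator,
   so the rotate is expressed as two shifts ORed together. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Byte-reverse a 32-bit word, one C statement per ARM instruction. */
static uint32_t swap_rot(uint32_t x)
{
    const uint32_t mask = ~(uint32_t)0xFF00;  /* MVN a3,#&FF00 -> 0xFFFF00FF */
    uint32_t t = x ^ ror32(x, 16);            /* EOR a4,a1,a1,ROR #16 */
    t = mask & (t >> 8);                      /* AND a4,a3,a4,LSR #8  */
    return t ^ ror32(x, 8);                   /* EOR a1,a4,a1,ROR #8  */
}

/* Hypothetical self-check, not part of the original post. */
static void check_swap_rot(void)
{
    assert(swap_rot(0x11223344u) == 0x44332211u);
    assert(swap_rot(0x12345678u) == 0x78563412u);
}
```

With bytes ABCD in the word, the first EOR yields (A^C)(B^D)(C^A)(D^B), the
shift and mask leave 0,(A^C),0,(C^A), and the final EOR with the rotated word
DABC cancels the unwanted terms to give DCBA.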

.....Roger Wilson (RWilson@Acorn.co.uk)

DISCLAIMER: (I speak for me only, etc.)