[comp.parallel] Sorting on NxN Mesh.

KOLOUTSAKIS%grcrvax1.bitnet@RELAY.CS.NET (Math Dept U of Crete Greece Michalis Kolountzakis) (01/25/89)

[ I found this on comp.theory.  Thought it might interest the group.
  -- steve
]


I would appreciate any suggestions, references or solutions for the following
problem:
        We have a SIMD, 4-connected, NxN Mesh Connected Computer (MCC) and
        the following
        algorithm for sorting on it. The algorithm is an extension of the
        so called "odd-even transposition sorting" (OETS) algorithm for a
        sequence of N processors:

        P <--> P <--> P <--> ... <--> P
         0      1      2               N-1

        OETS works by coupling all processors and performing a compare-and-
        then-possibly-swap operation on every pair of processors. In even
        time moments the set P={0,...,N-1} is partitioned in
        {(0,1), (2,3), ..., (2i,2i+1), ...} and in odd time moments P is parti-
        tioned in {(1,2), (3,4), ..., (2i-1,2i) ,...} (P  and P   are not always
                                                        0      N-1
        in some pair). It is almost obvious that OETS is a sorting algorithm
        for the sequence of processors but it takes some time to prove that it
        is linear in N. So, in OETS, each prossesor compares its contents, first
        with its left neighbour and then with its right neighbour and so on.
        In two dimensions the most reasonable extension of OETS would be to
        couple each processor first with its upper neighbour, then with its
        lower, its left and finally irs right neighbour and perform a compare-
        and-then-possibly-swap operation. More precisely in time T the NxN
        grid G={(i,j): i, j in P /* i for rows */} is partitioned in
        { <(2i-1,j), (2i, j)> for all i, j } when T mod 4 = 0
        { <(2i,j), (2i+1,j)> for all i, j} when T mod 4 = 1
        { <(i,2j-1), (i,2j)> for all i, j} when T mod 4 = 2 and
        { <(i,2j), (i,2j+1)> for all i, j} when T mod 4 = 3,
        and all these pairs of processors compare their contents and possibly
        swap them.
        It is obvious again that it is a sorting algorithm but what is its
        time complexity? Is it linear ? is the question. I suspect it is not,
        because if it were no other sorting alg. for the NxN mesh should have
        been invented; this seems far simpler than any other I have seen.
        But I do not have a bad example for this alg. Do you?
Thanks in advance

Michalis Kolountzakis,
Mathematics Dept, Univ. of Crete, Greece

e-mail: koloutsa@grcrvax1.bitnet
------------------------------------------------------------------------
Date:     Tue, 24 Jan 89 21:28 O
From:     Michalis Kolountzakis(Math Dept U of Crete Greece)
          <KOLOUTSAKIS@GRCRVAX1>
Subject:  More on sorting on the NxN mesh
To:       theorynt@ndsuvm1

On the problem I have just mailed to the list (Sorting on the Mesh Connected
Computer) I have not given the desirable order on the Mesh for which the
algorithm I have described is supposed to work. I cannot expect this order
to be any one for which "local order" does not imply "global order", e.g. it
cannot be the row-major (lexicographic) order. If we want big elements to go
up and left, then the following configuration of values
                        4  2
                        3  1
for a 2x2 Mesh is not ordered (2 and 3 should be swapped) but it is locally
consistent with the row-major order (that is, every element in the Mesh is
consistent with its neighbours.)
  An order for the Mesh which is good for our purposes is the "snake-like"
order for which such a configuration cannot exist and so we can prove that the
proposed algorithm is indeed a sorting algorithm.

        Michalis Kolountzakis,
        Math. Dept, Univ. of Crete, Greece





I would appreciate any suggestions, references or solutions for the following
problem:
        We have a SIMD, 4-connected, NxN Mesh Connected Computer (MCC) and
        the following
        algorithm for sorting on it. The algorithm is an extension of the
        so called "odd-even transposition sorting" (OETS) algorithm for a
        sequence of N processors:

        P <--> P <--> P <--> ... <--> P
         0      1      2               N-1

        OETS works by coupling all processors and performing a compare-and-
        then-possibly-swap operation on every pair of processors. In even
        time moments the set P={0,...,N-1} is partitioned in
        {(0,1), (2,3), ..., (2i,2i+1), ...} and in odd time moments P is parti-
        tioned in {(1,2), (3,4), ..., (2i-1,2i) ,...} (P  and P   are not always
                                                        0      N-1
        in some pair). It is almost obvious that OETS is a sorting algorithm
        for the sequence of processors but it takes some time to prove that it
        is linear in N. So, in OETS, each prossesor compares its contents, first
        with its left neighbour and then with its right neighbour and so on.
        In two dimensions the most reasonable extension of OETS would be to
        couple each processor first with its upper neighbour, then with its
        lower, its left and finally irs right neighbour and perform a compare-
        and-then-possibly-swap operation. More precisely in time T the NxN
        grid G={(i,j): i, j in P /* i for rows */} is partitioned in
        { <(2i-1,j), (2i, j)> for all i, j } when T mod 4 = 0
        { <(2i,j), (2i+1,j)> for all i, j} when T mod 4 = 1
        { <(i,2j-1), (i,2j)> for all i, j} when T mod 4 = 2 and
        { <(i,2j), (i,2j+1)> for all i, j} when T mod 4 = 3,
        and all these pairs of processors compare their contents and possibly
        swap them.
        It is obvious again that it is a sorting algorithm but what is its
        time complexity? Is it linear ? is the question. I suspect it is not,
        because if it were no other sorting alg. for the NxN mesh should have
        been invented; this seems far simpler than any other I have seen.
        But I do not have a bad example for this alg. Do you?
Thanks in advance

Michalis Kolountzakis,
Mathematics Dept, Univ. of Crete, Greece

e-mail: koloutsa@grcrvax1.bitnet
------------------------------------------------------------------------
Date:     Tue, 24 Jan 89 21:28 O
From:     Michalis Kolountzakis(Math Dept U of Crete Greece)
          <KOLOUTSAKIS@GRCRVAX1>
Subject:  More on sorting on the NxN mesh
To:       theorynt@ndsuvm1

On the problem I have just mailed to the list (Sorting on the Mesh Connected
Computer) I have not given the desirable order on the Mesh for which the
algorithm I have described is supposed to work. I cannot expect this order
to be any one for which "local order" does not imply "global order", e.g. it
cannot be the row-major (lexicographic) order. If we want big elements to go
up and left, then the following configuration of values
                        4  2
                        3  1
for a 2x2 Mesh is not ordered (2 and 3 should be swapped) but it is locally
consistent with the row-major order (that is, every element in the Mesh is
consistent with its neighbours.)
  An order for the Mesh which is good for our purposes is the "snake-like"
order for which such a configuration cannot exist and so we can prove that the
proposed algorithm is indeed a sorting algorithm.

        Michalis Kolountzakis,
        Math. Dept, Univ. of Crete, Greece

halldors@paul.rutgers.edu (Magnus M Halldorsson) (01/28/89)

Preliminary, empirical investigation of the problem indicates that the
algorithm given is linear. It runs consistently in around 5n parallel
compare-exchange (cmp-xchg) operations. I conjecture a 6n upper bound
(cmp-xchg, or 12n routing operations).

This is quite good for such a simple algorithm. I am surprised of not
having seen this algorithm in the literature, despite a fairly
exhaustive search. Has somebody seen it before?

However,it is nowhere near optimal, especially considering that it
requires alternating comparisons (odd rows sort left-right, even rows
right-left).

Magnus

landman@Sun.COM (Howard A. Landman) (01/30/89)

In article <4196@hubcap.UUCP> KOLOUTSAKIS%grcrvax1.bitnet@RELAY.CS.NET (Math Dept U of Crete Greece Michalis Kolountzakis) writes:
>I would appreciate any suggestions, references or solutions for the following
>problem:
>        We have a SIMD, 4-connected, NxN Mesh Connected Computer (MCC) and
>        the following
>        algorithm for sorting on it. ...
>	It is almost obvious that OETS is a sorting algorithm
>        for the sequence of processors but it takes some time to prove that it
>        is linear in N.

However, proving the lower bound is trivial.  If we assume the reverse-ordered
case as input, the sum of the distances from initial to final position is
quadratic in the number of processors, and the rate of decrease in this sum
is at best linear in the number of processors; so the run time must be at best
linear.

>        It is obvious again that it is a sorting algorithm but what is its
>        time complexity? Is it linear ? is the question. I suspect it is not,
>        because if it were no other sorting alg. for the NxN mesh should have
>        been invented; this seems far simpler than any other I have seen.

It's pretty clear that this is no worse than the previous algorithm.  And,
when we attempt the above lower bound, we find that since the sum may decrease
by as much as N*sqrt(N), we can only establish a lower bound of order sqrt(N).
(This bound, by the way, applies to ANY CONCEIVABLE algorithm on the MCC,
not just this one.)  I suspect that 2D OETS actually achieves this lower
bound complexity, which would make it SUB-linear in N.

	Howard A. Landman
	landman@hanami.sun.com

jfbuss%water.waterloo.edu@RELAY.CS.NET (Jonathan Buss) (01/31/89)

See Schnorr and Shamir, "An optimal sorting algorithm for
mesh-connected computers," Eighteenth Ann. Symp. on Theory of
Computing (1986) p. 255.  They show that 3n steps is optimal, and
give a simple algorithm to achieve the bound.

Jonathan Buss

dan@csun1.cs.uga.edu (Dan Gordon) (01/31/89)

In article <4225@hubcap.UUCP> halldors@paul.rutgers.edu (Magnus M Halldorsson) writes:
>Preliminary, empirical investigation of the problem indicates that the
>algorithm given is linear. It runs consistently in around 5n parallel
>compare-exchange (cmp-xchg) operations. I conjecture a 6n upper bound
>(cmp-xchg, or 12n routing operations).
>

Similar algorithms have been looked at before, sorting on a mesh by alternately
sorting in rows and columns (although in these, one step usually consists
of completely sorting a row or column using the odd-even transposition sort,
not just doing one transposition).  The most recent reference is an article
by Marberg and Gafni in Algorithmica last year.

I'd be surprised if your conjecture is correct.  Most such algorithms end
up at least a factor of log n below optimal.  Marberg and Gafni gave an
algorithm which sorted in O(n) steps, but it required several new ideas to
achieve that.  I recently wrote a paper which analyzes the same type of
algorithm on a 2^n hypercube, where it requires O(log^2 n) time, also
a factor of log n worse than optimal

Dan Gordon
Depts. of CS and Math
University of Georgia

halldors@athos.rutgers.edu (Halldorsson) (02/03/89)

I wrote:

> Preliminary, empirical investigation of the problem indicates that the
> algorithm given is linear.  I conjecture a 6n upper bound
> (cmp-xchg, or 12n routing operations).

Unfortunately, this has turned out not to be true. The algorithm has a
worst case complexity of O(n^2).
   Consider these two kinds of input:

1) Fill the upper triangular matrix of 1's, and the lower of 0's.

               1 1 1 1 1 1 1 1
               0 1 1 1 1 1 1 1
               0 0 1 1 1 1 1 1
               0 0 0 1 1 1 1 1
               0 0 0 0 1 1 1 1
               0 0 0 0 0 1 1 1
               0 0 0 0 0 0 1 1
               0 0 0 0 0 0 0 1

The algorithm sorts it in  n^2/2 + 3n cmp-xchg operations.

2) Divide the columns into four sections, fill the middle two with
1's, and the left and the right sections with 0's.

               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0
               0 0 1 1 1 1 0 0

The algorithm will sort it in  n(n+1)/2 + 4  cmp-xchg operations.

> It runs consistently in around 5n parallel
> compare-exchange (cmp-xchg) operations. 

That is true, and the average case appears to be somewhere between 4n
and 5n, hence the algorithm might actually be quite useful in practice.

Magnus

Note: These results have been gotten by simulation. They have not been
      proven.