KOLOUTSAKIS%grcrvax1.bitnet@RELAY.CS.NET (Math Dept U of Crete Greece Michalis Kolountzakis) (01/25/89)
[ I found this on comp.theory. Thought it might interest the group. -- steve ] I would appreciate any suggestions, references or solutions for the following problem: We have a SIMD, 4-connected, NxN Mesh Connected Computer (MCC) and the following algorithm for sorting on it. The algorithm is an extension of the so called "odd-even transposition sorting" (OETS) algorithm for a sequence of N processors: P <--> P <--> P <--> ... <--> P 0 1 2 N-1 OETS works by coupling all processors and performing a compare-and- then-possibly-swap operation on every pair of processors. In even time moments the set P={0,...,N-1} is partitioned in {(0,1), (2,3), ..., (2i,2i+1), ...} and in odd time moments P is parti- tioned in {(1,2), (3,4), ..., (2i-1,2i) ,...} (P and P are not always 0 N-1 in some pair). It is almost obvious that OETS is a sorting algorithm for the sequence of processors but it takes some time to prove that it is linear in N. So, in OETS, each prossesor compares its contents, first with its left neighbour and then with its right neighbour and so on. In two dimensions the most reasonable extension of OETS would be to couple each processor first with its upper neighbour, then with its lower, its left and finally irs right neighbour and perform a compare- and-then-possibly-swap operation. More precisely in time T the NxN grid G={(i,j): i, j in P /* i for rows */} is partitioned in { <(2i-1,j), (2i, j)> for all i, j } when T mod 4 = 0 { <(2i,j), (2i+1,j)> for all i, j} when T mod 4 = 1 { <(i,2j-1), (i,2j)> for all i, j} when T mod 4 = 2 and { <(i,2j), (i,2j+1)> for all i, j} when T mod 4 = 3, and all these pairs of processors compare their contents and possibly swap them. It is obvious again that it is a sorting algorithm but what is its time complexity? Is it linear ? is the question. I suspect it is not, because if it were no other sorting alg. for the NxN mesh should have been invented; this seems far simpler than any other I have seen. But I do not have a bad example for this alg. Do you? Thanks in advance Michalis Kolountzakis, Mathematics Dept, Univ. of Crete, Greece e-mail: koloutsa@grcrvax1.bitnet ------------------------------------------------------------------------ Date: Tue, 24 Jan 89 21:28 O From: Michalis Kolountzakis(Math Dept U of Crete Greece) <KOLOUTSAKIS@GRCRVAX1> Subject: More on sorting on the NxN mesh To: theorynt@ndsuvm1 On the problem I have just mailed to the list (Sorting on the Mesh Connected Computer) I have not given the desirable order on the Mesh for which the algorithm I have described is supposed to work. I cannot expect this order to be any one for which "local order" does not imply "global order", e.g. it cannot be the row-major (lexicographic) order. If we want big elements to go up and left, then the following configuration of values 4 2 3 1 for a 2x2 Mesh is not ordered (2 and 3 should be swapped) but it is locally consistent with the row-major order (that is, every element in the Mesh is consistent with its neighbours.) An order for the Mesh which is good for our purposes is the "snake-like" order for which such a configuration cannot exist and so we can prove that the proposed algorithm is indeed a sorting algorithm. Michalis Kolountzakis, Math. Dept, Univ. of Crete, Greece I would appreciate any suggestions, references or solutions for the following problem: We have a SIMD, 4-connected, NxN Mesh Connected Computer (MCC) and the following algorithm for sorting on it. The algorithm is an extension of the so called "odd-even transposition sorting" (OETS) algorithm for a sequence of N processors: P <--> P <--> P <--> ... <--> P 0 1 2 N-1 OETS works by coupling all processors and performing a compare-and- then-possibly-swap operation on every pair of processors. In even time moments the set P={0,...,N-1} is partitioned in {(0,1), (2,3), ..., (2i,2i+1), ...} and in odd time moments P is parti- tioned in {(1,2), (3,4), ..., (2i-1,2i) ,...} (P and P are not always 0 N-1 in some pair). It is almost obvious that OETS is a sorting algorithm for the sequence of processors but it takes some time to prove that it is linear in N. So, in OETS, each prossesor compares its contents, first with its left neighbour and then with its right neighbour and so on. In two dimensions the most reasonable extension of OETS would be to couple each processor first with its upper neighbour, then with its lower, its left and finally irs right neighbour and perform a compare- and-then-possibly-swap operation. More precisely in time T the NxN grid G={(i,j): i, j in P /* i for rows */} is partitioned in { <(2i-1,j), (2i, j)> for all i, j } when T mod 4 = 0 { <(2i,j), (2i+1,j)> for all i, j} when T mod 4 = 1 { <(i,2j-1), (i,2j)> for all i, j} when T mod 4 = 2 and { <(i,2j), (i,2j+1)> for all i, j} when T mod 4 = 3, and all these pairs of processors compare their contents and possibly swap them. It is obvious again that it is a sorting algorithm but what is its time complexity? Is it linear ? is the question. I suspect it is not, because if it were no other sorting alg. for the NxN mesh should have been invented; this seems far simpler than any other I have seen. But I do not have a bad example for this alg. Do you? Thanks in advance Michalis Kolountzakis, Mathematics Dept, Univ. of Crete, Greece e-mail: koloutsa@grcrvax1.bitnet ------------------------------------------------------------------------ Date: Tue, 24 Jan 89 21:28 O From: Michalis Kolountzakis(Math Dept U of Crete Greece) <KOLOUTSAKIS@GRCRVAX1> Subject: More on sorting on the NxN mesh To: theorynt@ndsuvm1 On the problem I have just mailed to the list (Sorting on the Mesh Connected Computer) I have not given the desirable order on the Mesh for which the algorithm I have described is supposed to work. I cannot expect this order to be any one for which "local order" does not imply "global order", e.g. it cannot be the row-major (lexicographic) order. If we want big elements to go up and left, then the following configuration of values 4 2 3 1 for a 2x2 Mesh is not ordered (2 and 3 should be swapped) but it is locally consistent with the row-major order (that is, every element in the Mesh is consistent with its neighbours.) An order for the Mesh which is good for our purposes is the "snake-like" order for which such a configuration cannot exist and so we can prove that the proposed algorithm is indeed a sorting algorithm. Michalis Kolountzakis, Math. Dept, Univ. of Crete, Greece
halldors@paul.rutgers.edu (Magnus M Halldorsson) (01/28/89)
Preliminary, empirical investigation of the problem indicates that the algorithm given is linear. It runs consistently in around 5n parallel compare-exchange (cmp-xchg) operations. I conjecture a 6n upper bound (cmp-xchg, or 12n routing operations). This is quite good for such a simple algorithm. I am surprised of not having seen this algorithm in the literature, despite a fairly exhaustive search. Has somebody seen it before? However,it is nowhere near optimal, especially considering that it requires alternating comparisons (odd rows sort left-right, even rows right-left). Magnus
landman@Sun.COM (Howard A. Landman) (01/30/89)
In article <4196@hubcap.UUCP> KOLOUTSAKIS%grcrvax1.bitnet@RELAY.CS.NET (Math Dept U of Crete Greece Michalis Kolountzakis) writes: >I would appreciate any suggestions, references or solutions for the following >problem: > We have a SIMD, 4-connected, NxN Mesh Connected Computer (MCC) and > the following > algorithm for sorting on it. ... > It is almost obvious that OETS is a sorting algorithm > for the sequence of processors but it takes some time to prove that it > is linear in N. However, proving the lower bound is trivial. If we assume the reverse-ordered case as input, the sum of the distances from initial to final position is quadratic in the number of processors, and the rate of decrease in this sum is at best linear in the number of processors; so the run time must be at best linear. > It is obvious again that it is a sorting algorithm but what is its > time complexity? Is it linear ? is the question. I suspect it is not, > because if it were no other sorting alg. for the NxN mesh should have > been invented; this seems far simpler than any other I have seen. It's pretty clear that this is no worse than the previous algorithm. And, when we attempt the above lower bound, we find that since the sum may decrease by as much as N*sqrt(N), we can only establish a lower bound of order sqrt(N). (This bound, by the way, applies to ANY CONCEIVABLE algorithm on the MCC, not just this one.) I suspect that 2D OETS actually achieves this lower bound complexity, which would make it SUB-linear in N. Howard A. Landman landman@hanami.sun.com
jfbuss%water.waterloo.edu@RELAY.CS.NET (Jonathan Buss) (01/31/89)
See Schnorr and Shamir, "An optimal sorting algorithm for mesh-connected computers," Eighteenth Ann. Symp. on Theory of Computing (1986) p. 255. They show that 3n steps is optimal, and give a simple algorithm to achieve the bound. Jonathan Buss
dan@csun1.cs.uga.edu (Dan Gordon) (01/31/89)
In article <4225@hubcap.UUCP> halldors@paul.rutgers.edu (Magnus M Halldorsson) writes: >Preliminary, empirical investigation of the problem indicates that the >algorithm given is linear. It runs consistently in around 5n parallel >compare-exchange (cmp-xchg) operations. I conjecture a 6n upper bound >(cmp-xchg, or 12n routing operations). > Similar algorithms have been looked at before, sorting on a mesh by alternately sorting in rows and columns (although in these, one step usually consists of completely sorting a row or column using the odd-even transposition sort, not just doing one transposition). The most recent reference is an article by Marberg and Gafni in Algorithmica last year. I'd be surprised if your conjecture is correct. Most such algorithms end up at least a factor of log n below optimal. Marberg and Gafni gave an algorithm which sorted in O(n) steps, but it required several new ideas to achieve that. I recently wrote a paper which analyzes the same type of algorithm on a 2^n hypercube, where it requires O(log^2 n) time, also a factor of log n worse than optimal Dan Gordon Depts. of CS and Math University of Georgia
halldors@athos.rutgers.edu (Halldorsson) (02/03/89)
I wrote: > Preliminary, empirical investigation of the problem indicates that the > algorithm given is linear. I conjecture a 6n upper bound > (cmp-xchg, or 12n routing operations). Unfortunately, this has turned out not to be true. The algorithm has a worst case complexity of O(n^2). Consider these two kinds of input: 1) Fill the upper triangular matrix of 1's, and the lower of 0's. 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 The algorithm sorts it in n^2/2 + 3n cmp-xchg operations. 2) Divide the columns into four sections, fill the middle two with 1's, and the left and the right sections with 0's. 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 The algorithm will sort it in n(n+1)/2 + 4 cmp-xchg operations. > It runs consistently in around 5n parallel > compare-exchange (cmp-xchg) operations. That is true, and the average case appears to be somewhere between 4n and 5n, hence the algorithm might actually be quite useful in practice. Magnus Note: These results have been gotten by simulation. They have not been proven.