[alt.sources.wanted] matrix invert routine

rkc@XN.LL.MIT.EDU (rkc) (11/09/89)

I have "spreadsheet-like" data that looks like this:
	a1 b1 c1
	a2 b2 c2
	a3 b3 c3
and I want to get it in a form like:
	a1 a2 a3
	b1 b2 b3
	c1 c2 c3
Before I re-invent the wheel, does anyone have anything that will do this?  Is
there perhaps a unix tool that does this sort of thing?
Email preferred.
	Thanks,
	-Rob

root@cca.ucsf.edu (Systems Staff) (11/10/89)

In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes:
> I have "spreadsheet-like" data that looks like this:
> 	a1 b1 c1
> 	a2 b2 c2
> 	a3 b3 c3
> and I want to get it in a form like:
> 	a1 a2 a3
> 	b1 b2 b3
> 	c1 c2 c3

A small matter of terminology:

      This operation is transposing, not inverting.

Inverting a matrix is analogous to taking the reciprocal of
a scalar.

Transposition would apply perfectly well to a matrix whose elements
are not numeric.

If your data is in the form of a multi-column text file you could do
the transpose with standard Unix tools (I would look at awk first)
but if it is in some internal form for a spreadsheet program I would
look at the speadsheet first.

 Thos Sumner       Internet: thos@cca.ucsf.edu
 (The I.G.)        UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos
                   BITNET:  thos@ucsfcca

 U.S. Mail:  Thos Sumner, Computer Center, Rm U-76, UCSF
             San Francisco, CA 94143-0704 USA

I hear nothing in life is certain but death and taxes -- and they're
working on death.

#include <disclaimer.std>


 Thos Sumner       Internet: thos@cca.ucsf.edu
 (The I.G.)        UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos
                   BITNET:  thos@ucsfcca

 U.S. Mail:  Thos Sumner, Computer Center, Rm U-76, UCSF
             San Francisco, CA 94143-0704 USA

I hear nothing in life is certain but death and taxes -- and they're
working on death.

#include <disclaimer.std>

bzs@world.std.com (Barry Shein) (11/10/89)

>I have "spreadsheet-like" data that looks like this:
>	a1 b1 c1
>	a2 b2 c2
>	a3 b3 c3
>and I want to get it in a form like:
>	a1 a2 a3
>	b1 b2 b3
>	c1 c2 c3

I'm too lazy to do all the work but here's the basic attack, it can
easily be done in a shell loop:

Assume you have 3 columns as shown:

	#!/bin/sh
	cp /dev/null outfile
	for i in 1 2 3
	do
		cut -f$i datafile | tr '\012' '\011' >> outfile
		echo '' >> outfile
	end
	ed - outfile <<'EOF'
	1,$s/.$//
	w
	q
	EOF

Explanation: "cut" cuts vertical fields separated by tabs (or a
specified character, see man page), the rest is pretty easy. "Tr"
replaces all the newlines with tabs thus flattening the column you cut
into a row, the echo puts a newline on the end of each line.  The ed
script fixes the fact that you ended up with an extra tab on each line
(the very last newline got changed to a tab.)

There's probably some other way to do this (avoiding the final ed
script) but this will certainly work and I think that's what you were
after.
-- 
        -Barry Shein

Software Tool & Die, Purveyors to the Trade         | bzs@world.std.com
1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs

sun@me.utoronto.ca (Andy Sun Anu-guest) (11/11/89)

In article <4573@emory.mathcs.emory.edu> arnold@emory.UUCP (Arnold D. Robbins {EUCC}) writes:
>>In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes:
>>> I have "spreadsheet-like" data that looks like this:
>>> 	a1 b1 c1
>>> 	a2 b2 c2
>>> 	a3 b3 c3
>>> and I want to get it in a form like:
>>> 	a1 a2 a3
>>> 	b1 b2 b3
>>> 	c1 c2 c3
>
>In article <2560@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Systems Staff) writes:
>>the transpose with standard Unix tools (I would look at awk first) ....
>
>Aha! No sooner said than done.  From the 2.11 gawk.texinfo manual:
>
>-- Begin Quote --
>The following example treats its input as a two-dimensional array of
>fields; it rotates this array 90 degrees clockwise and prints the
>result.  It assumes that all lines have the same number of
>elements.
> [program deleted]
>
>When given the input:
>
>	1 2 3 4 5 6
>	2 3 4 5 6 1
>	3 4 5 6 1 2
>	4 5 6 1 2 3
>
>it produces:
>
>	4 3 2 1
>	5 4 3 2
>	6 5 4 3
>	1 6 5 4
>	2 1 6 5
>	3 2 1 6
>
>-- End Quote --

Someone have brought it up already but I might as well clarify this again
(since it seems to wander further and further from the original thing). 
What the original poster wanted is called TRANSPOSE, or in other words, 
row-to-column, column-to-row operations. It's like reflecting the matrix about 
its diagonal. Matrix inversion is a totally different thing.

Now, the above "quote" was doing yet another thing, neither inverse or
transpose. It simply rotates the matrix clockwise 90 degrees (as it said
up there). Do we have an agreement on what "invert" or "transpose" are
or the definitions vary from place to place?

Andy
-- 

_______________________________________________________________________________
Andy Sun                        | Internet: sun@me.utoronto.ca
University of Toronto, Canada   | UUCP    : csri.toronto.edu!me.utoronto.ca!sun
Dept. of Mechanical Engineering | BITNET  : sun@me.utoronto.BITNET

tchrist@convex.COM (Tom Christiansen) (11/11/89)

In article <1989Nov9.233341.14788@world.std.com> bzs@world.std.com (Barry Shein) writes:

|>I have "spreadsheet-like" data that looks like this:
|>	a1 b1 c1
|>	a2 b2 c2
|>	a3 b3 c3
|>and I want to get it in a form like:
|>	a1 a2 a3
|>	b1 b2 b3
|>	c1 c2 c3
|

|Assume you have 3 columns as shown:
|
|	#!/bin/sh
|	cp /dev/null outfile
|	for i in 1 2 3
|	do
|		cut -f$i datafile | tr '\012' '\011' >> outfile
|		echo '' >> outfile
|	end
|	ed - outfile <<'EOF'
|	1,$s/.$//
|	w
|	q
|	EOF

I had to change the "end" to "done" (too much csh, barry?  :-)
and add "-d' '" to the cut line to make this work.  This is a nice
example of using small, dedicated UNIX tools to do your job.  It
is also a good example of why doing this is terribly slow.  

Here's the perl version.  It's bit more general than the sh version
because it doesn't have to be hacked to work for different sized
matrices or different files, but does basically the same thing:

    #!/usr/local/bin/perl

    $[ = 1;    # array should start at 1 not 0

    while (<>) {
	split;
	$rows++;
	$cols = $#_ if $#_ > $cols;
	for ($j = 1; $j <= $#_; $j++) {
	    $matrix{$rows,$j} = $_[$j];
	} 
    } 

    for ($i = 1; $i <= $rows; $i++) {
	for ($j = 1; $j <= $cols; $j++) {
	    print $matrix{$j,$i}, " ";
	} 
	print "\n";
    } 


Here are the timings for the original dataset.  I ran both tests
twice to get everything in memory.  All timings were run on a Convex-C1.  

% /bin/time xpose.sh 
        2.2 real        0.2 user        1.6 sys
% /bin/time xpose.perl < mat > outfile
        0.4 real        0.0 user        0.2 sys

That's of 5.5 times longer on real time, call it twice as long system,
and 8 times as long on system for the sh version.

This is not a great test because of the resolution.  Let's
try it on this dataset:

% cat mat2
00 01 02 03 04 05 06 07 08 09 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 

Now I'll go hack on xpose.sh to make it work for square matrices of
order 10 (and coming from file mat2 not mat) and rerun the timings.

% /bin/time xpose.sh   
        6.6 real        0.8 user        4.4 sys
% /bin/time xpose.perl < mat2 > outfile
        0.9 real        0.5 user        0.2 sys

That's more than 7.3x the real, 1.6x the user, and 22x on system
time for the sh version.

'Nuff said.

--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

tchrist@convex.COM (Tom Christiansen) (11/11/89)

In article <1989Nov9.233341.14788@world.std.com> bzs@world.std.com (Barry Shein) writes:

|>I have "spreadsheet-like" data that looks like this:
|>	a1 b1 c1
|>	a2 b2 c2
|>	a3 b3 c3
|>and I want to get it in a form like:
|>	a1 a2 a3
|>	b1 b2 b3
|>	c1 c2 c3
|

|Assume you have 3 columns as shown:
|
|	#!/bin/sh
|	cp /dev/null outfile|	for i in 1 2 3
|	do
|		cu| -f$i datafile | tr '\012' '\011' >> outfile
|		echo '' >> outfmle
|	end
|	ed - outfile <<'EOF'|	1,$s/.$//
|	w|	q
|	EOF

I hal to change the &end" to "done" ,too much csl, barry?  :-)
and add "-d' '" to the cut line to make this work.  This is a nice
example of using small, dedicated UNIX tools to do your job.  It
is also a good example of why doing this is terribly slow.  

Here's the perl version.  It's bit more general than the sh version
because it doesn't have to be hacked to work for different sized
matrices or different files, but does basically the same thing:

    #!/usr/local/bin/perl

    $[ = 1;    # array should start at 1 not 0

    while (<>) {
	split;
	$rows++;
	$cols = $#_ if $#_ > $cols;
	for ($j = 1; $j <= $#_; $j++) {
	    $matrix{$rows,$j} = $_[$j];
	} 
    } 

    for ($i = 1; $i <= $rows; $i++) {
	for ($j = 1; $j <= $cols; $j++) {
	    print $matrix{$j,$i}, " ";
	} 
	print "\n";
    } 


Here are the timings for the original dataset.  I ran both tests
twice to get everything in memory.  All timings were run on a Convex-C1.  

% /bin/time xpose.sh 
        2.2 real        0.2 user        1.6 sys
% /bin/time xpose.perl < mat > outfile
        0.4 real        0.0 user        0.2 sys

That's of 5.5 times longer on real time, call it twice as long system,
and 8 times as long on system for the sh version.

This is not a great test because of the resolution.  Let's
try it on this dataset:

% cat mat2
00 01 02 03 04 05 06 07 08 09 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 

Now I'll go hack on xpose.sh to make it work for square matrices of
order 10 (and coming from file mat2 not mat) and rerun the timings.

% /bin/time xpose.sh   
        6.6 real        0.8 user        4.4 sys
% /bin/time xpose.perl < mat2 > outfile
        0.9 real        0.5 user        0.2 sys

That's more than 7.3x the real, 1.6x the user, and 22x on system
time for the sh version.

'Nuff said.

--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) (11/11/89)

>In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes:
>> I have "spreadsheet-like" data that looks like this:
>> 	a1 b1 c1
>> 	a2 b2 c2
>> 	a3 b3 c3
>> and I want to get it in a form like:
>> 	a1 a2 a3
>> 	b1 b2 b3
>> 	c1 c2 c3

In article <2560@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Systems Staff) writes:
>the transpose with standard Unix tools (I would look at awk first) ....

Aha! No sooner said than done.  From the 2.11 gawk.texinfo manual:

-- Begin Quote --
The following example treats its input as a two-dimensional array of
fields; it rotates this array 90 degrees clockwise and prints the
result.  It assumes that all lines have the same number of
elements.

	awk '{
	     if (max_nf < NF)
	          max_nf = NF
	     max_nr = NR
	     for (x = 1; x <= NF; x++)
	          vector[x, NR] = $x
	}
	
	END {
	     for (x = 1; x <= max_nf; x++) {
	          for (y = max_nr; y >= 1; --y)
	               printf("%s ", vector[x, y])
	          printf("\n")
	     }
	}'


When given the input:

	1 2 3 4 5 6
	2 3 4 5 6 1
	3 4 5 6 1 2
	4 5 6 1 2 3

it produces:

	4 3 2 1
	5 4 3 2
	6 5 4 3
	1 6 5 4
	2 1 6 5
	3 2 1 6

-- End Quote --
-- 
Arnold Robbins -- Emory U. Information Technology Div.  | Laundry increases
DOMAIN: arnold@emoryu1.cc.emory.edu			| exponentially in the
UUCP: gatech!emoryu1!arnold  PHONE: +1 404 727-7636	| number of children.
BITNET: arnold@emoryu1	     FAX:   +1 404 727-2599	|   -- Miriam Hartholz

merlyn@iwarp.intel.com (Randal Schwartz) (11/11/89)

In article <1612@xn.LL.MIT.EDU>, rkc@XN (rkc) writes:
| I have "spreadsheet-like" data that looks like this:
| 	a1 b1 c1
| 	a2 b2 c2
| 	a3 b3 c3
| and I want to get it in a form like:
| 	a1 a2 a3
| 	b1 b2 b3
| 	c1 c2 c3
| Before I re-invent the wheel, does anyone have anything that will do this?  Is
| there perhaps a unix tool that does this sort of thing?

Perl, anyone?  (Eeekkk... it's the Perl monster!)

#!/usr/bin/perl

while (<>) {
	@F = split;
	$fn = 0;
	for $f (@F) {
		$final{++$fn} .= "$f ";
	}
	$maxfn = $fn if $maxfn < $fn;
}

for ($fn = 1; $fn <= $maxfn; $fn++) {
	($_ = $final{$fn}) =~ s/ $//;
	print "$_\n";
}

There, not quite one line, but it'll do. :-)

Just another Perl hacker,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III  |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/

karish@forel.stanford.edu (Chuck Karish) (11/11/89)

In article <1612@xn.LL.MIT.EDU>, rkc@XN (rkc) writes:
| I have "spreadsheet-like" data that looks like this:
| 	a1 b1 c1
| 	a2 b2 c2
| 	a3 b3 c3
| and I want to get it in a form like:
| 	a1 a2 a3
| 	b1 b2 b3
| 	c1 c2 c3
| Before I re-invent the wheel, does anyone have anything that will do this?  Is
| there perhaps a unix tool that does this sort of thing?

The `rs' utility distributed with 4.2 BSD UNIX does exactly this
sort of thing.  Pipe your data through `rs -t' and it will come
out transposed.  It was in /usr/src/new on the distribution tape.

	Chuck Karish		karish@mindcraft.com
	(415) 493-9000		karish@forel.stanford.edu

peter@ontmoh.UUCP (Peter Renzland) (11/14/89)

awk '{for (i=1;i<=NF;i++) a[NR,i]=$i}  # transpose a matrix: a[i,j] <--> a[j,i]
END {for (j=1;j<=NF;j++) {for (i=1;i<=NR;i++) printf("%s ",a[i,j]); print}}' $*