[comp.unix.questions] matrix invert routine

rkc@XN.LL.MIT.EDU (rkc) (11/09/89)

I have "spreadsheet-like" data that looks like this:
	a1 b1 c1
	a2 b2 c2
	a3 b3 c3
and I want to get it in a form like:
	a1 a2 a3
	b1 b2 b3
	c1 c2 c3
Before I re-invent the wheel, does anyone have anything that will do this?  Is
there perhaps a unix tool that does this sort of thing?
Email preferred.
	Thanks,
	-Rob

root@cca.ucsf.edu (Systems Staff) (11/10/89)

In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes:
> I have "spreadsheet-like" data that looks like this:
> 	a1 b1 c1
> 	a2 b2 c2
> 	a3 b3 c3
> and I want to get it in a form like:
> 	a1 a2 a3
> 	b1 b2 b3
> 	c1 c2 c3

A small matter of terminology:

      This operation is transposing, not inverting.

Inverting a matrix is analogous to taking the reciprocal of
a scalar.

Transposition would apply perfectly well to a matrix whose elements
are not numeric.

If your data is in the form of a multi-column text file you could do
the transpose with standard Unix tools (I would look at awk first)
but if it is in some internal form for a spreadsheet program I would
look at the speadsheet first.

 Thos Sumner       Internet: thos@cca.ucsf.edu
 (The I.G.)        UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos
                   BITNET:  thos@ucsfcca

 U.S. Mail:  Thos Sumner, Computer Center, Rm U-76, UCSF
             San Francisco, CA 94143-0704 USA

I hear nothing in life is certain but death and taxes -- and they're
working on death.

#include <disclaimer.std>

 Thos Sumner       Internet: thos@cca.ucsf.edu
 (The I.G.)        UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos
                   BITNET:  thos@ucsfcca

 U.S. Mail:  Thos Sumner, Computer Center, Rm U-76, UCSF
             San Francisco, CA 94143-0704 USA

I hear nothing in life is certain but death and taxes -- and they're
working on death.

#include <disclaimer.std>

bzs@world.std.com (Barry Shein) (11/10/89)

>I have "spreadsheet-like" data that looks like this:
>	a1 b1 c1
>	a2 b2 c2
>	a3 b3 c3
>and I want to get it in a form like:
>	a1 a2 a3
>	b1 b2 b3
>	c1 c2 c3

I'm too lazy to do all the work but here's the basic attack, it can
easily be done in a shell loop:

Assume you have 3 columns as shown:

	#!/bin/sh
	cp /dev/null outfile
	for i in 1 2 3
	do
		cut -f$i datafile | tr '\012' '\011' >> outfile
		echo '' >> outfile
	end
	ed - outfile <<'EOF'
	1,$s/.$//
	w
	q
	EOF

Explanation: "cut" cuts vertical fields separated by tabs (or a
specified character, see man page), the rest is pretty easy. "Tr"
replaces all the newlines with tabs thus flattening the column you cut
into a row, the echo puts a newline on the end of each line.  The ed
script fixes the fact that you ended up with an extra tab on each line
(the very last newline got changed to a tab.)

There's probably some other way to do this (avoiding the final ed
script) but this will certainly work and I think that's what you were
after.
-- 
        -Barry Shein

Software Tool & Die, Purveyors to the Trade         | bzs@world.std.com
1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs

tchrist@convex.COM (Tom Christiansen) (11/11/89)

In article <1989Nov9.233341.14788@world.std.com> bzs@world.std.com (Barry Shein) writes:

|>I have "spreadsheet-like" data that looks like this:
|>	a1 b1 c1
|>	a2 b2 c2
|>	a3 b3 c3
|>and I want to get it in a form like:
|>	a1 a2 a3
|>	b1 b2 b3
|>	c1 c2 c3
|

|Assume you have 3 columns as shown:
|
|	#!/bin/sh
|	cp /dev/null outfile
|	for i in 1 2 3
|	do
|		cut -f$i datafile | tr '\012' '\011' >> outfile
|		echo '' >> outfile
|	end
|	ed - outfile <<'EOF'
|	1,$s/.$//
|	w
|	q
|	EOF

I had to change the "end" to "done" (too much csh, barry?  :-)
and add "-d' '" to the cut line to make this work.  This is a nice
example of using small, dedicated UNIX tools to do your job.  It
is also a good example of why doing this is terribly slow.  

Here's the perl version.  It's bit more general than the sh version
because it doesn't have to be hacked to work for different sized
matrices or different files, but does basically the same thing:

    #!/usr/local/bin/perl

    $[ = 1;    # array should start at 1 not 0

    while (<>) {
	split;
	$rows++;
	$cols = $#_ if $#_ > $cols;
	for ($j = 1; $j <= $#_; $j++) {
	    $matrix{$rows,$j} = $_[$j];
	} 
    } 

    for ($i = 1; $i <= $rows; $i++) {
	for ($j = 1; $j <= $cols; $j++) {
	    print $matrix{$j,$i}, " ";
	} 
	print "\n";
    } 


Here are the timings for the original dataset.  I ran both tests
twice to get everything in memory.  All timings were run on a Convex-C1.  

% /bin/time xpose.sh 
        2.2 real        0.2 user        1.6 sys
% /bin/time xpose.perl < mat > outfile
        0.4 real        0.0 user        0.2 sys

That's of 5.5 times longer on real time, call it twice as long system,
and 8 times as long on system for the sh version.

This is not a great test because of the resolution.  Let's
try it on this dataset:

% cat mat2
00 01 02 03 04 05 06 07 08 09 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 

Now I'll go hack on xpose.sh to make it work for square matrices of
order 10 (and coming from file mat2 not mat) and rerun the timings.

% /bin/time xpose.sh   
        6.6 real        0.8 user        4.4 sys
% /bin/time xpose.perl < mat2 > outfile
        0.9 real        0.5 user        0.2 sys

That's more than 7.3x the real, 1.6x the user, and 22x on system
time for the sh version.

'Nuff said.

--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

tchrist@convex.COM (Tom Christiansen) (11/11/89)

In article <1989Nov9.233341.14788@world.std.com> bzs@world.std.com (Barry Shein) writes:

|>I have "spreadsheet-like" data that looks like this:
|>	a1 b1 c1
|>	a2 b2 c2
|>	a3 b3 c3
|>and I want to get it in a form like:
|>	a1 a2 a3
|>	b1 b2 b3
|>	c1 c2 c3
|

|Assume you have 3 columns as shown:
|
|	#!/bin/sh
|	cp /dev/null outfile|	for i in 1 2 3
|	do
|		cu| -f$i datafile | tr '\012' '\011' >> outfile
|		echo '' >> outfmle
|	end
|	ed - outfile <<'EOF'|	1,$s/.$//
|	w|	q
|	EOF

I hal to change the &end" to "done" ,too much csl, barry?  :-)
and add "-d' '" to the cut line to make this work.  This is a nice
example of using small, dedicated UNIX tools to do your job.  It
is also a good example of why doing this is terribly slow.  

Here's the perl version.  It's bit more general than the sh version
because it doesn't have to be hacked to work for different sized
matrices or different files, but does basically the same thing:

    #!/usr/local/bin/perl

    $[ = 1;    # array should start at 1 not 0

    while (<>) {
	split;
	$rows++;
	$cols = $#_ if $#_ > $cols;
	for ($j = 1; $j <= $#_; $j++) {
	    $matrix{$rows,$j} = $_[$j];
	} 
    } 

    for ($i = 1; $i <= $rows; $i++) {
	for ($j = 1; $j <= $cols; $j++) {
	    print $matrix{$j,$i}, " ";
	} 
	print "\n";
    } 


Here are the timings for the original dataset.  I ran both tests
twice to get everything in memory.  All timings were run on a Convex-C1.  

% /bin/time xpose.sh 
        2.2 real        0.2 user        1.6 sys
% /bin/time xpose.perl < mat > outfile
        0.4 real        0.0 user        0.2 sys

That's of 5.5 times longer on real time, call it twice as long system,
and 8 times as long on system for the sh version.

This is not a great test because of the resolution.  Let's
try it on this dataset:

% cat mat2
00 01 02 03 04 05 06 07 08 09 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 

Now I'll go hack on xpose.sh to make it work for square matrices of
order 10 (and coming from file mat2 not mat) and rerun the timings.

% /bin/time xpose.sh   
        6.6 real        0.8 user        4.4 sys
% /bin/time xpose.perl < mat2 > outfile
        0.9 real        0.5 user        0.2 sys

That's more than 7.3x the real, 1.6x the user, and 22x on system
time for the sh version.

'Nuff said.

--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) (11/11/89)

>In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes:
>> I have "spreadsheet-like" data that looks like this:
>> 	a1 b1 c1
>> 	a2 b2 c2
>> 	a3 b3 c3
>> and I want to get it in a form like:
>> 	a1 a2 a3
>> 	b1 b2 b3
>> 	c1 c2 c3

In article <2560@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Systems Staff) writes:
>the transpose with standard Unix tools (I would look at awk first) ....

Aha! No sooner said than done.  From the 2.11 gawk.texinfo manual:

-- Begin Quote --
The following example treats its input as a two-dimensional array of
fields; it rotates this array 90 degrees clockwise and prints the
result.  It assumes that all lines have the same number of
elements.

	awk '{
	     if (max_nf < NF)
	          max_nf = NF
	     max_nr = NR
	     for (x = 1; x <= NF; x++)
	          vector[x, NR] = $x
	}
	
	END {
	     for (x = 1; x <= max_nf; x++) {
	          for (y = max_nr; y >= 1; --y)
	               printf("%s ", vector[x, y])
	          printf("\n")
	     }
	}'


When given the input:

	1 2 3 4 5 6
	2 3 4 5 6 1
	3 4 5 6 1 2
	4 5 6 1 2 3

it produces:

	4 3 2 1
	5 4 3 2
	6 5 4 3
	1 6 5 4
	2 1 6 5
	3 2 1 6

-- End Quote --
-- 
Arnold Robbins -- Emory U. Information Technology Div.  | Laundry increases
DOMAIN: arnold@emoryu1.cc.emory.edu			| exponentially in the
UUCP: gatech!emoryu1!arnold  PHONE: +1 404 727-7636	| number of children.
BITNET: arnold@emoryu1	     FAX:   +1 404 727-2599	|   -- Miriam Hartholz

merlyn@iwarp.intel.com (Randal Schwartz) (11/11/89)

In article <1612@xn.LL.MIT.EDU>, rkc@XN (rkc) writes:
| I have "spreadsheet-like" data that looks like this:
| 	a1 b1 c1
| 	a2 b2 c2
| 	a3 b3 c3
| and I want to get it in a form like:
| 	a1 a2 a3
| 	b1 b2 b3
| 	c1 c2 c3
| Before I re-invent the wheel, does anyone have anything that will do this?  Is
| there perhaps a unix tool that does this sort of thing?

Perl, anyone?  (Eeekkk... it's the Perl monster!)

#!/usr/bin/perl

while (<>) {
	@F = split;
	$fn = 0;
	for $f (@F) {
		$final{++$fn} .= "$f ";
	}
	$maxfn = $fn if $maxfn < $fn;
}

for ($fn = 1; $fn <= $maxfn; $fn++) {
	($_ = $final{$fn}) =~ s/ $//;
	print "$_\n";
}

There, not quite one line, but it'll do. :-)

Just another Perl hacker,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III  |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/

peter@ontmoh.UUCP (Peter Renzland) (11/14/89)

awk '{for (i=1;i<=NF;i++) a[NR,i]=$i}  # transpose a matrix: a[i,j] <--> a[j,i]
END {for (j=1;j<=NF;j++) {for (i=1;i<=NR;i++) printf("%s ",a[i,j]); print}}' $*

ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (11/17/89)

In article <6979@convex.UUCP>, tchrist@convex.COM (Tom Christiansen) writes:
> This is a nice example of using small, dedicated UNIX tools to do your job.
> It is also a good example of why doing this is terribly slow.  

> Here's [a] perl version.  It's bit more general than the sh version
> because it doesn't have to be hacked to work for different sized
> matrices or different files, but does basically the same thing:
	[deleted]

> This is not a great test because of the resolution.  Let's
> try it on [a 10 by 10 matrix.] [The result is] more than 7.3x the real,
> 1.6x the user, and 22x on system ... on a Convex-C1. ... 'Nuff said.

> 	shell cut	6.6 real 0.8 user 4.4 sys
>	perl		0.9 real 0.5 user 0.2 sys

Not _quite_ enough said.  I timed three versions of this in the Bourne
shell on a discless Sun-3/50 serving off an Encore.

	echo `cut`	7.1 real 1.4 user 4.1 sys
	read|sort|read	2.8 real 0.6 user 1.0 sys
	read|nsort|read	1.9 real 0.4 user 0.7 sys

echo `cut`	: the version Christiansen criticised
read|sort|read	: copy the file as <colno><linemark><item> using a
		  while read ... loop, sort the result using sort(1),
		  flatten the result using another while read ... loop.
		  (Transposing by sorting is old hat, and while read is
		  the standard way to read things in the shell.)
read|nsort|read	: same as above, except using a faster sort program.

Given the difference in I/O systems, we can't compare the times of the
two machines.  What we _do_ notice is that a different shell program
using out-of-the-box SysV stuff ran twice as fast as the version that
Christiansen criticised, and that simply plugging in a faster version
of one of the standard commands chopped another third off the time.

To compare _one_ shell program with _one_ Perl program tells us little
about either the shell or about Perl.

gwc@root.co.uk (Geoff Clare) (11/17/89)

In article <4573@emory.mathcs.emory.edu> arnold@emory.UUCP (Arnold D. Robbins {EUCC}) writes:
>>In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes:
>>> [ request for matrix transposing utility ]
>
>In article <2560@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Systems Staff) writes:
>>the transpose with standard Unix tools (I would look at awk first) ....
>
>Aha! No sooner said than done.  From the 2.11 gawk.texinfo manual:
>
>	awk '{
>	     if (max_nf < NF)
>	          max_nf = NF
>	     max_nr = NR
>	     for (x = 1; x <= NF; x++)
>	          vector[x, NR] = $x
>	}
>	
>	END {
>	     for (x = 1; x <= max_nf; x++) {
>	          for (y = max_nr; y >= 1; --y)
>	               printf("%s ", vector[x, y])
>	          printf("\n")
>	     }
>	}'

There are a number of problems with this answer.  Firstly the request
was for a program to *transpose* a matrix, not *rotate* it.  Secondly I
wouldn't call gawk a "standard" UNIX tool, however the program can
easily be made portable to all versions of awk by replacing vector[a, b]
with vector[a "," b] as necessary.  Thirdly the two dimensional array is
actually unnecessary and slows the execution down considerably.  A more
efficient method is to build the rows of the new matrix on the fly.

Putting all this together, here is my solution for a matrix transposing
utility using (portable) awk.

exec awk '{
	if (NF > n)
		n = NF
	for (i = 1; i <= NF; i++)
		row[i] = row[i] " " $i
}
END {
	for (i = 1; i <= n; i++)
		print row[i]
}' ${1+"$@"}

A modification so that it doesn't put an extra space character on the
front of each line is left as an exercise for the reader.

-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600