rkc@XN.LL.MIT.EDU (rkc) (11/09/89)
I have "spreadsheet-like" data that looks like this: a1 b1 c1 a2 b2 c2 a3 b3 c3 and I want to get it in a form like: a1 a2 a3 b1 b2 b3 c1 c2 c3 Before I re-invent the wheel, does anyone have anything that will do this? Is there perhaps a unix tool that does this sort of thing? Email preferred. Thanks, -Rob
root@cca.ucsf.edu (Systems Staff) (11/10/89)
In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes: > I have "spreadsheet-like" data that looks like this: > a1 b1 c1 > a2 b2 c2 > a3 b3 c3 > and I want to get it in a form like: > a1 a2 a3 > b1 b2 b3 > c1 c2 c3 A small matter of terminology: This operation is transposing, not inverting. Inverting a matrix is analogous to taking the reciprocal of a scalar. Transposition would apply perfectly well to a matrix whose elements are not numeric. If your data is in the form of a multi-column text file you could do the transpose with standard Unix tools (I would look at awk first) but if it is in some internal form for a spreadsheet program I would look at the speadsheet first. Thos Sumner Internet: thos@cca.ucsf.edu (The I.G.) UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos BITNET: thos@ucsfcca U.S. Mail: Thos Sumner, Computer Center, Rm U-76, UCSF San Francisco, CA 94143-0704 USA I hear nothing in life is certain but death and taxes -- and they're working on death. #include <disclaimer.std> Thos Sumner Internet: thos@cca.ucsf.edu (The I.G.) UUCP: ...ucbvax!ucsfcgl!cca.ucsf!thos BITNET: thos@ucsfcca U.S. Mail: Thos Sumner, Computer Center, Rm U-76, UCSF San Francisco, CA 94143-0704 USA I hear nothing in life is certain but death and taxes -- and they're working on death. #include <disclaimer.std>
bzs@world.std.com (Barry Shein) (11/10/89)
>I have "spreadsheet-like" data that looks like this: > a1 b1 c1 > a2 b2 c2 > a3 b3 c3 >and I want to get it in a form like: > a1 a2 a3 > b1 b2 b3 > c1 c2 c3 I'm too lazy to do all the work but here's the basic attack, it can easily be done in a shell loop: Assume you have 3 columns as shown: #!/bin/sh cp /dev/null outfile for i in 1 2 3 do cut -f$i datafile | tr '\012' '\011' >> outfile echo '' >> outfile end ed - outfile <<'EOF' 1,$s/.$// w q EOF Explanation: "cut" cuts vertical fields separated by tabs (or a specified character, see man page), the rest is pretty easy. "Tr" replaces all the newlines with tabs thus flattening the column you cut into a row, the echo puts a newline on the end of each line. The ed script fixes the fact that you ended up with an extra tab on each line (the very last newline got changed to a tab.) There's probably some other way to do this (avoiding the final ed script) but this will certainly work and I think that's what you were after. -- -Barry Shein Software Tool & Die, Purveyors to the Trade | bzs@world.std.com 1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs
sun@me.utoronto.ca (Andy Sun Anu-guest) (11/11/89)
In article <4573@emory.mathcs.emory.edu> arnold@emory.UUCP (Arnold D. Robbins {EUCC}) writes: >>In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes: >>> I have "spreadsheet-like" data that looks like this: >>> a1 b1 c1 >>> a2 b2 c2 >>> a3 b3 c3 >>> and I want to get it in a form like: >>> a1 a2 a3 >>> b1 b2 b3 >>> c1 c2 c3 > >In article <2560@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Systems Staff) writes: >>the transpose with standard Unix tools (I would look at awk first) .... > >Aha! No sooner said than done. From the 2.11 gawk.texinfo manual: > >-- Begin Quote -- >The following example treats its input as a two-dimensional array of >fields; it rotates this array 90 degrees clockwise and prints the >result. It assumes that all lines have the same number of >elements. > [program deleted] > >When given the input: > > 1 2 3 4 5 6 > 2 3 4 5 6 1 > 3 4 5 6 1 2 > 4 5 6 1 2 3 > >it produces: > > 4 3 2 1 > 5 4 3 2 > 6 5 4 3 > 1 6 5 4 > 2 1 6 5 > 3 2 1 6 > >-- End Quote -- Someone have brought it up already but I might as well clarify this again (since it seems to wander further and further from the original thing). What the original poster wanted is called TRANSPOSE, or in other words, row-to-column, column-to-row operations. It's like reflecting the matrix about its diagonal. Matrix inversion is a totally different thing. Now, the above "quote" was doing yet another thing, neither inverse or transpose. It simply rotates the matrix clockwise 90 degrees (as it said up there). Do we have an agreement on what "invert" or "transpose" are or the definitions vary from place to place? Andy -- _______________________________________________________________________________ Andy Sun | Internet: sun@me.utoronto.ca University of Toronto, Canada | UUCP : csri.toronto.edu!me.utoronto.ca!sun Dept. of Mechanical Engineering | BITNET : sun@me.utoronto.BITNET
tchrist@convex.COM (Tom Christiansen) (11/11/89)
In article <1989Nov9.233341.14788@world.std.com> bzs@world.std.com (Barry Shein) writes: |>I have "spreadsheet-like" data that looks like this: |> a1 b1 c1 |> a2 b2 c2 |> a3 b3 c3 |>and I want to get it in a form like: |> a1 a2 a3 |> b1 b2 b3 |> c1 c2 c3 | |Assume you have 3 columns as shown: | | #!/bin/sh | cp /dev/null outfile | for i in 1 2 3 | do | cut -f$i datafile | tr '\012' '\011' >> outfile | echo '' >> outfile | end | ed - outfile <<'EOF' | 1,$s/.$// | w | q | EOF I had to change the "end" to "done" (too much csh, barry? :-) and add "-d' '" to the cut line to make this work. This is a nice example of using small, dedicated UNIX tools to do your job. It is also a good example of why doing this is terribly slow. Here's the perl version. It's bit more general than the sh version because it doesn't have to be hacked to work for different sized matrices or different files, but does basically the same thing: #!/usr/local/bin/perl $[ = 1; # array should start at 1 not 0 while (<>) { split; $rows++; $cols = $#_ if $#_ > $cols; for ($j = 1; $j <= $#_; $j++) { $matrix{$rows,$j} = $_[$j]; } } for ($i = 1; $i <= $rows; $i++) { for ($j = 1; $j <= $cols; $j++) { print $matrix{$j,$i}, " "; } print "\n"; } Here are the timings for the original dataset. I ran both tests twice to get everything in memory. All timings were run on a Convex-C1. % /bin/time xpose.sh 2.2 real 0.2 user 1.6 sys % /bin/time xpose.perl < mat > outfile 0.4 real 0.0 user 0.2 sys That's of 5.5 times longer on real time, call it twice as long system, and 8 times as long on system for the sh version. This is not a great test because of the resolution. Let's try it on this dataset: % cat mat2 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 Now I'll go hack on xpose.sh to make it work for square matrices of order 10 (and coming from file mat2 not mat) and rerun the timings. % /bin/time xpose.sh 6.6 real 0.8 user 4.4 sys % /bin/time xpose.perl < mat2 > outfile 0.9 real 0.5 user 0.2 sys That's more than 7.3x the real, 1.6x the user, and 22x on system time for the sh version. 'Nuff said. --tom Tom Christiansen {uunet,uiucdcs,sun}!convex!tchrist Convex Computer Corporation tchrist@convex.COM "EMACS belongs in <sys/errno.h>: Editor too big!"
tchrist@convex.COM (Tom Christiansen) (11/11/89)
In article <1989Nov9.233341.14788@world.std.com> bzs@world.std.com (Barry Shein) writes: |>I have "spreadsheet-like" data that looks like this: |> a1 b1 c1 |> a2 b2 c2 |> a3 b3 c3 |>and I want to get it in a form like: |> a1 a2 a3 |> b1 b2 b3 |> c1 c2 c3 | |Assume you have 3 columns as shown: | | #!/bin/sh | cp /dev/null outfile| for i in 1 2 3 | do | cu| -f$i datafile | tr '\012' '\011' >> outfile | echo '' >> outfmle | end | ed - outfile <<'EOF'| 1,$s/.$// | w| q | EOF I hal to change the &end" to "done" ,too much csl, barry? :-) and add "-d' '" to the cut line to make this work. This is a nice example of using small, dedicated UNIX tools to do your job. It is also a good example of why doing this is terribly slow. Here's the perl version. It's bit more general than the sh version because it doesn't have to be hacked to work for different sized matrices or different files, but does basically the same thing: #!/usr/local/bin/perl $[ = 1; # array should start at 1 not 0 while (<>) { split; $rows++; $cols = $#_ if $#_ > $cols; for ($j = 1; $j <= $#_; $j++) { $matrix{$rows,$j} = $_[$j]; } } for ($i = 1; $i <= $rows; $i++) { for ($j = 1; $j <= $cols; $j++) { print $matrix{$j,$i}, " "; } print "\n"; } Here are the timings for the original dataset. I ran both tests twice to get everything in memory. All timings were run on a Convex-C1. % /bin/time xpose.sh 2.2 real 0.2 user 1.6 sys % /bin/time xpose.perl < mat > outfile 0.4 real 0.0 user 0.2 sys That's of 5.5 times longer on real time, call it twice as long system, and 8 times as long on system for the sh version. This is not a great test because of the resolution. Let's try it on this dataset: % cat mat2 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 Now I'll go hack on xpose.sh to make it work for square matrices of order 10 (and coming from file mat2 not mat) and rerun the timings. % /bin/time xpose.sh 6.6 real 0.8 user 4.4 sys % /bin/time xpose.perl < mat2 > outfile 0.9 real 0.5 user 0.2 sys That's more than 7.3x the real, 1.6x the user, and 22x on system time for the sh version. 'Nuff said. --tom Tom Christiansen {uunet,uiucdcs,sun}!convex!tchrist Convex Computer Corporation tchrist@convex.COM "EMACS belongs in <sys/errno.h>: Editor too big!"
arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) (11/11/89)
>In article <1612@xn.LL.MIT.EDU>, rkc@XN.LL.MIT.EDU (rkc) writes: >> I have "spreadsheet-like" data that looks like this: >> a1 b1 c1 >> a2 b2 c2 >> a3 b3 c3 >> and I want to get it in a form like: >> a1 a2 a3 >> b1 b2 b3 >> c1 c2 c3 In article <2560@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Systems Staff) writes: >the transpose with standard Unix tools (I would look at awk first) .... Aha! No sooner said than done. From the 2.11 gawk.texinfo manual: -- Begin Quote -- The following example treats its input as a two-dimensional array of fields; it rotates this array 90 degrees clockwise and prints the result. It assumes that all lines have the same number of elements. awk '{ if (max_nf < NF) max_nf = NF max_nr = NR for (x = 1; x <= NF; x++) vector[x, NR] = $x } END { for (x = 1; x <= max_nf; x++) { for (y = max_nr; y >= 1; --y) printf("%s ", vector[x, y]) printf("\n") } }' When given the input: 1 2 3 4 5 6 2 3 4 5 6 1 3 4 5 6 1 2 4 5 6 1 2 3 it produces: 4 3 2 1 5 4 3 2 6 5 4 3 1 6 5 4 2 1 6 5 3 2 1 6 -- End Quote -- -- Arnold Robbins -- Emory U. Information Technology Div. | Laundry increases DOMAIN: arnold@emoryu1.cc.emory.edu | exponentially in the UUCP: gatech!emoryu1!arnold PHONE: +1 404 727-7636 | number of children. BITNET: arnold@emoryu1 FAX: +1 404 727-2599 | -- Miriam Hartholz
merlyn@iwarp.intel.com (Randal Schwartz) (11/11/89)
In article <1612@xn.LL.MIT.EDU>, rkc@XN (rkc) writes: | I have "spreadsheet-like" data that looks like this: | a1 b1 c1 | a2 b2 c2 | a3 b3 c3 | and I want to get it in a form like: | a1 a2 a3 | b1 b2 b3 | c1 c2 c3 | Before I re-invent the wheel, does anyone have anything that will do this? Is | there perhaps a unix tool that does this sort of thing? Perl, anyone? (Eeekkk... it's the Perl monster!) #!/usr/bin/perl while (<>) { @F = split; $fn = 0; for $f (@F) { $final{++$fn} .= "$f "; } $maxfn = $fn if $maxfn < $fn; } for ($fn = 1; $fn <= $maxfn; $fn++) { ($_ = $final{$fn}) =~ s/ $//; print "$_\n"; } There, not quite one line, but it'll do. :-) Just another Perl hacker, -- /== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\ | on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn | \== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
karish@forel.stanford.edu (Chuck Karish) (11/11/89)
In article <1612@xn.LL.MIT.EDU>, rkc@XN (rkc) writes: | I have "spreadsheet-like" data that looks like this: | a1 b1 c1 | a2 b2 c2 | a3 b3 c3 | and I want to get it in a form like: | a1 a2 a3 | b1 b2 b3 | c1 c2 c3 | Before I re-invent the wheel, does anyone have anything that will do this? Is | there perhaps a unix tool that does this sort of thing? The `rs' utility distributed with 4.2 BSD UNIX does exactly this sort of thing. Pipe your data through `rs -t' and it will come out transposed. It was in /usr/src/new on the distribution tape. Chuck Karish karish@mindcraft.com (415) 493-9000 karish@forel.stanford.edu
peter@ontmoh.UUCP (Peter Renzland) (11/14/89)
awk '{for (i=1;i<=NF;i++) a[NR,i]=$i} # transpose a matrix: a[i,j] <--> a[j,i] END {for (j=1;j<=NF;j++) {for (i=1;i<=NR;i++) printf("%s ",a[i,j]); print}}' $*