@xlab1.uucp (01/19/91)
I am not sure if this question has been asked before...

Suppose I have two files with three columns in each. How do I merge the files and generate a single file with six or more columns using a shell script? For example, if File A has columns a, c, e and File B has columns b, d, f, I want to generate File C with columns a,b,c,d,e,f. Also it would be nice to be able to use the arithmetic features of awk...

Finally, how do u specify the "rest of the line" in awk??

thanks
ashok
tchrist@convex.COM (Tom Christiansen) (01/20/91)
From the keyboard of @xlab1.uucp ():
: I am not sure if this question has been asked before...
:
: Suppose I have two files with three columns in each. How do
: I merge the files and generate a single file with six or more
: columns using a shell script? For example, if File A has columns a, c, e
: and File B has columns b, d, f, I want to generate File C
: with columns a,b,c,d,e,f. Also it would be nice to be able to
: use the arithmetic features of awk...

This was originally cross-posted to comp.unix.internals as well. I sure wouldn't say an awk question is a unix internal.

Someone out there may have a paste solution, but I didn't see one.

In old, standard awk it's quite cumbersome, as you have to read in all of the first file, then all of the second file, and pair the lines up at the end:

    #!/bin/awk -f
    # Old awk: store every line of both files, then pair line i of the
    # first file with line i of the second (assumes equal line counts).
    {
        a[NR] = $1; b[NR] = $2; c[NR] = $3;
    }
    END {
        count = NR/2;
        for (i = 1; i <= count; i++) {
            print a[i], a[i+count], b[i], b[i+count], c[i], c[i+count];
        }
    }

In gawk (and nawk if you're rich), it's a little easier because you can redirect getline from a file, effectively reading two lines and writing one line each iteration.
    #!/usr/gnu/bin/gawk -f
    # Read one line from each file per iteration; stop at EOF of either.
    BEGIN {
        for (;;) {
            if ((getline < ARGV[1]) <= 0) break;
            a = $1; c = $2; e = $3;
            if ((getline < ARGV[2]) <= 0) break;
            b = $1; d = $2; f = $3;
            print a, b, c, d, e, f;
        }
    }

It's also pretty easy in perl:

    #!/usr/bin/perl
    $, = " "; $\ = "\n";    # awk emulation: output field and record separators
    open(F1, $ARGV[0]);
    open(F2, $ARGV[1]);
    while ( (@a = split(' ', <F1>)) && (@b = split(' ', <F2>)) ) {
        print $a[0], $b[0], $a[1], $b[1], $a[2], $b[2];
    }

Other advantages of perl are:

    1) you get better error messages for syntax errors
    2) you can symbolically debug your program
    3) no limits on lines/fields (gawk is better than nawk at this)
    4) can often be made to run faster than awk
    5) better usage and i/o failure error messages (i didn't do this here)

If you have only awk and neither gawk nor perl, you should get them, because they are both free and compile on a vast array (list? :-) of platforms. Find them wherever GNUware is stored.

: Finally, how do u specify the "rest of the line" in awk??

I'm not really sure what you mean. The whole line is $0. What's the rest of the line? Do you mean the fields past the third one?

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el? Now he can finally have the whole O/S built-in to his editor like he always wanted!" --me (Tom Christiansen <tchrist@convex.com>)
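If "rest of the line" means the fields past the third one, as Tom guesses, a loop from field 4 to NF is enough. A minimal sketch, assuming blank-separated input; the shell function name rest_of_line is hypothetical. It rejoins the fields with single blanks, so the original spacing inside the rest of the line is not preserved.

```shell
# Print fields 4..NF of each input line, joined by single blanks.
# rest_of_line is a hypothetical name used only for this sketch.
rest_of_line() {
    awk '{
        rest = ""; sep = ""
        for (i = 4; i <= NF; i++) {
            rest = rest sep $i
            sep = " "
        }
        print rest
    }'
}

printf '33.5 ZZZ 4564.334 foo   bar\n' | rest_of_line    # prints: foo bar
```

A line with three or fewer fields simply produces an empty line.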
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/21/91)
In article <1991Jan19.194124.2335@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> : For example, if File A has columns a, c, e
> : and File B has columns b, d, f, I want to generate File C
> : with columns a,b,c,d,e,f. Also it would be nice to be able to
> : use the arithmetic features of awk...
> Someone out there may have a paste solution, but I didn't see one.

Is that a challenge?

    #!/bin/sh
    # untested, but too simple to fail in strange ways
    # type X as tab
    awk '{ print $1; print $2; print $3 }' < "$1" > /tmp/file1.$$
    awk '{ print $1; print $2; print $3 }' < "$2" > /tmp/file2.$$
    paste /tmp/file1.$$ /tmp/file2.$$ | (
        while read i
        do
            read j; read k
            echo "$iX$jX$k"
        done
    )
    rm /tmp/file1.$$ /tmp/file2.$$

---Dan
mrd@ecs.soton.ac.uk (Mark Dobie) (01/22/91)
In <3404@d75.UUCP> @xlab1.uucp writes:
> Finally, how do u specify the "rest of the line" in awk??

I am a relative beginner with awk, but I ran into this problem too. My solution was to set the fields I wasn't interested in to "" and then use $0, e.g.

    $1 = "" ; print "rest of line is " $0

Is this a good way?

Mark.
--
Mark Dobie                   M.Dobie@uk.ac.soton.ecs (JANET)
University of Southampton    M.Dobie@ecs.soton.ac.uk (Bitnet)
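Mark's trick does work, with one side effect worth knowing: assigning to any field makes awk rebuild $0 from the fields joined by OFS (a single blank by default), so each blanked field leaves a blank behind at the front, and runs of blanks inside the line are squeezed to one. A minimal sketch that blanks the first three fields and trims the leftover blanks; rest_after_three is a hypothetical name.

```shell
# Blank fields 1..3, then strip the leading blanks the rebuilt $0
# starts with. Spacing inside the rest is squeezed, because $0 is
# rejoined with OFS after the field assignments.
rest_after_three() {
    awk '{ $1 = ""; $2 = ""; $3 = ""; sub(/^ +/, ""); print }'
}

printf '33.5 ZZZ 4564.334 foo   bar\n' | rest_after_three    # prints: foo bar
```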
gwc@root.co.uk (Geoff Clare) (01/23/91)
In article <1991Jan19.194124.2335@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> : For example, if File A has columns a, c, e
> : and File B has columns b, d, f, I want to generate File C
> : with columns a,b,c,d,e,f. Also it would be nice to be able to
> : use the arithmetic features of awk...
> Someone out there may have a paste solution, but I didn't see one.

In <25041:Jan2017:21:1491@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
}Is that a challenge?
}    #!/bin/sh
}    # untested, but too simple to fail in strange ways
}    # type X as tab
}    awk '{ print $1; print $2; print $3 }' < "$1" > /tmp/file1.$$
}    awk '{ print $1; print $2; print $3 }' < "$2" > /tmp/file2.$$
}    paste /tmp/file1.$$ /tmp/file2.$$ | (
}        while read i
}        do
}            read j; read k
}            echo "$iX$jX$k"
}        done
}    )
}    rm /tmp/file1.$$ /tmp/file2.$$

That is really gross! Try this:

    paste "$1" "$2" | awk '{print $1, $4, $2, $5, $3, $6}'

--
Geoff Clare <gwc@root.co.uk> (Dumb American mailers: ...!uunet!root.co.uk!gwc)
UniSoft Limited, London, England. Tel: +44 71 729 3773 Fax: +44 71 729 3273
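To see the paste approach end to end, here is a small sketch on made-up data; merge_cols is a hypothetical name. paste joins line i of the first file and line i of the second with a tab, so awk sees the first file's columns as $1..$3 and the second file's as $4..$6. Since the original question also asked about arithmetic, the sketch appends a seventh column holding the sum of the six values (an illustrative choice, not something from the thread).

```shell
# Interleave two 3-column files (a c e + b d f -> a b c d e f)
# and append the sum of the six numbers as a seventh column.
merge_cols() {
    paste "$1" "$2" |
    awk '{ print $1, $4, $2, $5, $3, $6, $1 + $2 + $3 + $4 + $5 + $6 }'
}

dir=$(mktemp -d)
printf '1 3 5\n10 30 50\n' > "$dir/A"
printf '2 4 6\n20 40 60\n' > "$dir/B"
merge_cols "$dir/A" "$dir/B"    # prints: 1 2 3 4 5 6 21
                                #         10 20 30 40 50 60 210
rm -r "$dir"
```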
martin@mwtech.UUCP (Martin Weitzel) (01/30/91)
In article <3404@d75.UUCP> @xlab1.uucp () writes:
> Suppose I have two files with three columns in each. How do
> I merge the files and generate a single file with six or more
> columns using a shell script? For example, if File A has columns a, c, e
> and File B has columns b, d, f, I want to generate File C
> with columns a,b,c,d,e,f. Also it would be nice to be able to
> use the arithmetic features of awk...

IMHO this is not feasible with OLD "awk" for LARGE files. Small files could be saved in an associative array.

    awk '
    FILENAME == "first"  { line[NR] = $0 }
    FILENAME == "second" { print line[++i] " " $0 }
    ' first second

Of course, UNIX has enough friendly commands to help you, e.g.:

    pr -tm first second | awk '{ whatever you like }'

With NEW "awk" (nawk) merging is feasible, e.g.:

    nawk '{ printf "%s ", $0
            getline < "second"
            print
    }' first

> Finally, how do u specify the "rest of the line" in awk??

I don't quite understand this. Do you mean the following:

    33.5 ZZZ 4564.334 foo bar
                      ^^^^^^^--- processed as "rest of line"
    ^^^^ ^^^ ^^^^^^^^ ---------- processed as $1, $2, $3

In this case there are several solutions:

If in your input data the first three fields always occupy the same space, say 18 chars, you can access the "rest of line" as substr($0, 19).

If $1..$3 do not have equal width, but you are sure that there is exactly one separator after each of them (and none before $1), you may sum up their lengths and get the rest of the line with substr($0, length($1) + length($2) + length($3) + 4).

In any case my advice would be, if possible, to re-design your input data, e.g. to put some unique separator before the "rest of the line", say:

    33.5 ZZZ 4564.334 !foo bar
                      ^------------ unique, i.e. must not appear as part
                                    of $1, $2, $3 or the rest of the line.

Then you can use split($0, xx, "!") and access the rest of the line with xx[2].
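The two "rest of line" techniques above can be sketched as follows on made-up input. In rest_by_length (a hypothetical name), the three fields plus one separator after each occupy length($1) + length($2) + length($3) + 3 characters, so the rest begins one position later; this assumes exactly one separator after each of the first three fields and none before $1. rest_by_bang (also hypothetical) assumes "!" occurs nowhere else on the line. Unlike blanking fields, both keep the original spacing inside the rest of the line.

```shell
# Rest of line by arithmetic on field lengths: needs exactly one
# separator after each of $1, $2, $3 and no leading separator.
rest_by_length() {
    awk '{ print substr($0, length($1) + length($2) + length($3) + 4) }'
}

# Rest of line after a unique "!" marker in the input data.
rest_by_bang() {
    awk '{ split($0, xx, "!"); print xx[2] }'
}

printf '33.5 ZZZ 4564.334 foo  bar\n' | rest_by_length    # prints: foo  bar
printf '33.5 ZZZ 4564.334 !foo  bar\n' | rest_by_bang     # prints: foo  bar
```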
My general observation is that "awk" is a real "power tool", but to get the most out of it without overly complicated programs you should obey certain design criteria for your input data, e.g. you should use unique separators in a hierarchical way:

    XXX:a,b,c:YYYYYYY ZZZZZ
                      ^^^^^---- $2
    ^^^^^^^^^^^^^^^^^ --------- $1
        ^^^^^ ----------------- split($1, xx, ":") -> xx[2]
            ^ ----------------- split(xx[2], yy, ",") -> yy[3]

Some other small hint: It's trivial to design a "comment feature" for your input data using the familiar style that every line starting with a "#" (possibly after some blanks or tabs) is thrown away. The following is an excerpt which can be found in many of my "awk"-programs:

    awk '
    /^[ \t]*#/ { next; }
    ......
    ...... rest of program
    ......
    '

--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
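Putting the hierarchical layout and the comment rule together, a minimal sketch on the sample record above might look like this (parse_records is a hypothetical name). Each split() peels off one level of the hierarchy.

```shell
# Skip comment lines, then take the record apart level by level:
# blanks separate $1 and $2, ":" splits $1, "," splits the middle part.
parse_records() {
    awk '
    /^[ \t]*#/ { next }
    {
        split($1, xx, ":")      # XXX | a,b,c | YYYYYYY
        split(xx[2], yy, ",")   # a | b | c
        print xx[1], yy[3], $2
    }'
}

printf '# a comment line\nXXX:a,b,c:YYYYYYY ZZZZZ\n' | parse_records    # prints: XXX c ZZZZZ
```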
pbiron@weber.ucsd.edu (Paul Biron) (02/02/91)
In article <1070@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>In article <3404@d75.UUCP> @xlab1.uucp () writes:
[stuff deleted]
>> Finally, how do u specify the "rest of the line" in awk??
>
>I don't quite understand this. Do you mean the following:
>
>    33.5 ZZZ 4564.334 foo bar
>                      ^^^^^^^--- processed as "rest of line"
>    ^^^^ ^^^ ^^^^^^^^ ---------- processed as $1, $2, $3
>
>first three fields always occupy the same space, say 18 chars, you
>can access the "rest of line" as substr($0, 19).
>
>If $1..$3 do not have equal width, but you are sure that there is
>exactly one separator after each of them (and none before $1), you may
>sum up their lengths and get the rest of the line with
>substr($0, length($1) + length($2) + length($3) + 4).
>
[stuff deleted]

Another, albeit data-dependent, way to do "rest of line" in {n,g}awk is the following:

    #!/usr/local/bin/gawk -f
    {
        start = index ($0, $3)
        print "first is", $1, $2
        print "rest is", substr ($0, start)
    }

This assumes that what you want to process as the "rest of line" does not occur in the "first" part of the line.

While in general I agree with Martin that structuring your input data makes your life a lot easier when it comes to writing awk scripts, that is not always possible. I don't know whether it *is* possible in the case of the original poster.

Hope this helps,
--------------------------------------------------------------------------------
STOP THE WAR IN THE GULF --- NOW !!!!!!!
--------------------------------------------------------------------------------
Paul Biron pbiron@ucsd.edu (619) 534-5758
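Paul's caveat is easy to trip over, because index() returns the position of the first occurrence of $3's text anywhere in the line, including inside $1 or $2. A small sketch with made-up lines (rest_by_index is a hypothetical name) shows one case where it works and one where it does not:

```shell
# Rest of line starting at the first occurrence of the text of $3.
rest_by_index() {
    awk '{ print substr($0, index($0, $3)) }'
}

printf 'x y zz tail\n' | rest_by_index      # prints: zz tail
printf 'zz1 y zz tail\n' | rest_by_index    # prints: zz1 y zz tail ("zz" is found inside $1)
```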
sleepy@wybbs.mi.org (Mike Faber) (02/04/91)
In article <1070@mwtech.UUCP> you write:
>In article <3404@d75.UUCP> @xlab1.uucp () writes:
>> Suppose I have two files with three columns in each. How do
>> I merge the files and generate a single file with six or more
>> columns using a shell script? For example, if File A has columns a, c, e
>> and File B has columns b, d, f, I want to generate File C
>> with columns a,b,c,d,e,f. Also it would be nice to be able to
>> use the arithmetic features of awk...
>
>IMHO this is not feasible with OLD "awk" for LARGE files.
[Good discussion of old/new awk and solution]

Aren't we overlooking the easy solution here?

    paste filea fileb | awk '
        { printf("%s %s %s %s %s %s\n", $1, $4, $2, $5, $3, $6) }
    ' > outputfile

OK, it's brute force, but it's simple, easy to read, and flexible, in case the file changes.
--
Michael Faber
sleepy@wybbs.uucp
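Separating with "|" pays off when the columns themselves may contain blanks. A minimal sketch, assuming both input files already use "|" between their three columns (an assumption; the thread does not say how the columns are separated). paste -d"|" joins the two lines with one more "|", so awk -F"|" sees file A's columns as $1..$3 and file B's as $4..$6, which is why the interleaved order is $1, $4, $2, $5, $3, $6. merge_bar is a hypothetical name.

```shell
# Merge two "|"-separated 3-column files, interleaving their columns.
merge_bar() {
    paste -d"|" "$1" "$2" |
    awk -F"|" '{ printf("%s|%s|%s|%s|%s|%s\n", $1, $4, $2, $5, $3, $6) }'
}

dir=$(mktemp -d)
printf 'a 1|c 1|e 1\n' > "$dir/A"
printf 'b 1|d 1|f 1\n' > "$dir/B"
merge_bar "$dir/A" "$dir/B"    # prints: a 1|b 1|c 1|d 1|e 1|f 1
rm -r "$dir"
```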