root@csoftec.csf.com (Cliff Manis (cmanis@csoftec)) (12/14/89)
I am needing help with data conversion, and would appreciate help in AWK or SED and/or awk & sed. Or whatever.... While attempting to convert thie data for sometime with 'awk', I have learned a lot, but still need some help with this one. (here is the ASCII datafile I have) AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1| AA1|name1|4you|5th ST||||||||||2| AA1|name1|4you|Go4it, TX||||||||||3| ZZ1|name2|5ght|SED Co.|4|88|b|ccc|c|d|eee2|ff2|h|1| JJ1|name3|6ghi|AWK Inc.|4|98|c|ddd|e|f|gggg|ff3|i|1| JJ1|name3|6ghi|2nd St||||||||||2| JJ1|name3|6ghi|POB 34||||||||||3| JJ1|name3|6ghi|Town, USA||||||||||4| BB1|name4|7mob|B H Co|2|56|d|eee|f|g|hhhh|ii4|j|1| BB1|name4|7mob|Athens, TN||||||||||2| Please note that some records have one line to 4 lines of data, and the address field is field four. The last field is the sub-record number. (and this is the datafile configuration which I need from the data above) AA1|name1|4you|ABC Co|5th ST|Go4it, TX||3|56|a|bbb|c|d|eeee|fff|g|1 ZZ1|name2|5ght|SED Co.||||4|88|b|ccc|c|d|eee2|ff2|h|1 JJ1|name3|6ghi|AWK Inc.|2nd St|POB 34|Town, USA|4|98|c|ddd|e|f|gggg|ff3|i|1 BB1|name4|7mob|B H Co|Athens, TN|||2|56|d|eee|f|g|hhhh|ii4|j|1 Thanks in advance for any help sent to me or over the net. As anyone can readily see, the top datafile is a BIG waste of space.....cliff ______________________________________________________ + Unix & Xenix today for a better tommorrow! + + Cliff Manis ARPA: hcscm@afbs.af.mil + + INTERNET: cmanis@csoftec.csf.com + + UUCP: {dpmizar|texbell|swrinde}!csoftec!cmanis + -
rock@rancho.uucp (Rock Kent) (12/14/89)
<note I have deleted comp.unix.wizards . . . I'm sure they already know this.> On 13 Dec 89 23:53:07 GMT, Cliff Manis(cmanis@csoftec) said: Cliff> I am needing help with data conversion, and would appreciate help Cliff> in AWK or SED and/or awk & sed. [to convert from] Cliff> AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1| Cliff> AA1|name1|4you|5th ST||||||||||2| Cliff> AA1|name1|4you|Go4it, TX||||||||||3| [to] Cliff> AA1|name1|4you|ABC Co|5th ST|Go4it, TX||3|56|a|bbb|c|d|eeee|fff|g|1 Try: awk -f reform.awk inputfile where reform.awk is as follows: ---------snip-snip-snip-snip--------------------------------------------------- BEGIN{FS="|"} { if (prior != $1 $2 $3) { if (prior != ""){ for (j=1;j<17;j++) {printf "%s|", line[j]}; printf "%s\n", line[17]; }; prior = $1 $2 $3; split($0,line,"|"); for (i=17;i>7;i--){line[i]=line[i-3]}; for (i=5;i<8;i++){line[i]=""}; l=4} else { l=l+1; line[l]=$4;} } END{ for (j=1;j<17;j++) {printf "%s|", line[j]}; printf "%s\n", line[17]; } ---------snip-snip-snip-snip--------------------------------------------------- *************************************************************************** *Rock Kent rock@rancho.uucp POB 8964, Rancho Sante Fe, CA. 92067* ***************************************************************************
tchrist@convex.COM (Tom Christiansen) (12/14/89)
In article <539@csoftec.csf.com> root@csoftec.csf.com (Cliff Manis (cmanis@csoftec)) writes: >I am needing help with data conversion, and would appreciate help >in AWK or SED and/or awk & sed. Or whatever.... If you'll accept perl (version 3.0 because I like the array slices), this code does it for you, assuming the input files are really as shown: #!/usr/bin/perl while (<>) { chop; split(/\|/); if ( $_[$#_] == 1 ) { &print_line unless $#addrs < 0; @preamble = @_[0..2]; @addrs = @_[3]; @postamble = @_[4..$#_]; } else { push(@addrs, $_[3]); } } &print_line; sub print_line { push(@addrs,'') while $#addrs < 3; print join ('|', @preamble, @addrs, @postamble), "\n"; } I thought it a bit odd that the input data had a trailing pipe and the requested output data didn't, but that's what the guy wanted. If he really does want a trailing pipe, change the last print line to: print join ('|', @preamble, @addrs, @postamble, "\n"); Tom Christiansen {uunet,uiucdcs,sun}!convex!tchrist Convex Computer Corporation tchrist@convex.COM "EMACS belongs in <sys/errno.h>: Editor too big!"
johnson@ncrons.StPaul.NCR.COM (Wayne D. Johnson) (12/15/89)
In article <539@csoftec.csf.com> root@csoftec.csf.com (Cliff Manis (cmanis@csoftec)) writes: >(here is the ASCII datafile I have) > >AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1| >AA1|name1|4you|5th ST||||||||||2| >AA1|name1|4you|Go4it, TX||||||||||3| >ZZ1|name2|5ght|SED Co.|4|88|b|ccc|c|d|eee2|ff2|h|1| >JJ1|name3|6ghi|AWK Inc.|4|98|c|ddd|e|f|gggg|ff3|i|1| >JJ1|name3|6ghi|2nd St||||||||||2| >JJ1|name3|6ghi|POB 34||||||||||3| >JJ1|name3|6ghi|Town, USA||||||||||4| >BB1|name4|7mob|B H Co|2|56|d|eee|f|g|hhhh|ii4|j|1| >BB1|name4|7mob|Athens, TN||||||||||2| > Interesting little bugger... The first problem is to identify the records. The easiest way to do that is with awk. Try search paremeters like: /1|$/ /2|$/ This will seperate the records so you can process them. The 1| part matches the 1| in the record and the $ guarentees that it will only match when the 1| is the last characters of the line. The processing should be somthing like: /1|$/ { addr1=$4; f1=$1; f2=$2; f3=$3...} /2|$/ { addr2=$4; ...} /3|$/ { addr3=$4; ...} /4|$/ { addr4=$4; ...} This will save off the contents of the address(?) field from record 1 and 2, you will need to have a process for each record type. This allows you to process all the records but how about outputing them. You first need to detect when the line read is part of a new record. I like to use somthing like: NR == 1 { old = $1} This will set the variable old to the first field of the first record ($1 != old) || END { print f1 f2 f3 addr1 addr2 addr3 addr4 ... addr1="" addr2="" addr3="" addr4="" old=$1 } This will detect when you have read the first line of the next record. When this happens, you print out the information you have gathered from the last record and set your old variable to the new key. Lastly, you need to define your field seperator character as | so you put IFS="|" and OFS="|" on the awk command line. I've been pretty general and breif about this whole thing, if you need more help or have a question about anything, just email me. Disclaimer: I have not run any of this code, but it should work. note that the code listed has some generalities to it (i.e. "...") that means additional code could be placed here. I wasn't shure what your knowlege of awk was so I tried to be pretty simple. -- Wayne Johnson | Is a baby's life worth more than the right to NCR Comten, Inc. | make a choice? Babies are people too. Roseville MN 55113 +----------------------------------------------------- (Voice) 612-638-7665 (E-MAIL) W.Johnson@StPaul.NCR.COM
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (12/15/89)
In article <539@csoftec.csf.com> root@csoftec.csf.com (Cliff Manis (cmanis@csoftec)) writes: : I am needing help with data conversion, and would appreciate help : in AWK or SED and/or awk & sed. Or whatever.... Here is, of course, the answer in "whatever": :-) #!/usr/bin/perl @key = ('','',''); while (<>) { ($key,$addr,$rest,$sub) = /^([^|]*\|[^|]*\|[^|]*)\|([^|]*)\|(.*(\d))/; &printone if $key ne $lastkey; $addr[$sub-1] = $addr; } &printone; sub printone { print join('|',$lastkey,@addr,$lastrest),"\n" if $lastkey ne ''; $lastkey = $key; $lastrest = $rest; @addr = ('','','',''); } I think you should find this fairly readable. If you're uncomfortable with the hairy regular expression, here's one that uses split instead: #!/usr/bin/perl @key = ('','',''); while (<>) { (@key[0..2],$addr,@remainder) = split(/[|]/,$_,14); $sub = pop(@remainder); $key = join('|',@key); &printone if $key ne $lastkey; $addr[$sub-1] = $addr; } &printone; sub printone { print join('|',$lastkey,@addr,@lastremainder,"1\n") if $lastkey ne ''; $lastkey = $key; @lastremainder = @remainder; @addr = ('','','',''); } Larry Wall lwall@jpl-devvax.jpl.nasa.gov