mario@wjvax.UUCP (Mario Dona) (04/20/91)
HELP! I have a situation that just cries out for an awk solution, however
I'm at a loss over some minor, but important details. I have a list of
companies that need to be preprocessed before sending them to our typing
department. A simplified portion of the input file as follows:

COMPANY1  2800 FULLING       P O BOX 3608     HARRISBURG PA           17105
COMPANY2  500 ELM                             MILWAUKEE WI            53122
COMPANY3  13500 CENTRAL      P O BOX 655303   DALLAS TX               75265-5303
^         ^                  ^                ^                       ^
|         |                  |                |                       |
1         11                 30               47                      71

My mission, which I chose to accept, was to reformat the list so that it looks
like this:

Company1
28 Fulling
P O Box 3608
Harrisburg PA 17105

Company2
500 Elm
Milwaukee WI 53122

Company3
13500 Central
P O Box 655303
Dallas TX 75265-5303

Using the SUBSTR function to get the parts I want was trivial; the problem is,
I can't figure out:

1. How to prevent blank lines from printing if there is nothing to print
   (e.g. the second address in COMPANY2 above).
2. How to concatenate the city and zip fields as shown.
3. If a word is greater than 2 characters, lowercase all letters
   except for the first character (this is to keep state capitols
   capitalized).

My feeble attempt so far is shown below. If anyone has any ideas, I'd be
much obliged.

BEGIN { RS="\n" }
{
    name=substr($0,1,10)
    address1=substr($0,11,19)
    address2=substr($0,30,17)
    city=substr($0,47,24)
    zip=substr($0,71,10)
    printf("%s\n%s\n%s\n%s\n%s\n\n", name, address1, address2, city, zip)
}

Mario Dona
...!{ !decwrl!qubix, ames!oliveb!tymix, pyramid}!wjvax!mario
The above opinions are mine alone and not, in any way, those of WJ.
goer@ellis.uchicago.edu (Richard L. Goerwitz) (04/20/91)
In article <1817@wjvax.UUCP> mario@wjvax.UUCP (Mario Dona) writes:
>
>HELP! I have a situation that just cries out for an awk solution, however
>I'm at a loss over some minor, but important details. I have a list of
>companies that need to be preprocessed before sending them to our typing
>department. A simplified portion of the input file as follows:
>
>COMPANY1  2800 FULLING       P O BOX 3608     HARRISBURG PA           17105
>^         ^                  ^                ^                       ^
>|         |                  |                |                       |
>1         11                 30               47                      71
>
>My mission, which I chose to accept, was to reformat the list so that it looks
>like this:
>
>Company1
>28 Fulling
>P O Box 3608
>Harrisburg PA 17105
>...

Here is one Icon solution. Note that it omits blank lines, capitalizes
multi-word city, street, and company names, and removes the annoying
space between the P and O in "P O Box." It also inserts three spaces
between the state abbreviation and the zipcode (2 or 3 spaces is standard
these days). A slight alteration (one line) would be all that you'd need
to add in to force all-uppercase company names.

Note that I split the line based on the column positions you gave,
although I can't imagine how the gatherers of these statistics managed
to fit everything into such tight spaces!

procedure main()
    every line := trim(!&input,'\t ') do {
        line ? {
            every i := 11|30|47 do
                write("" ~== capitalize_words(tab(i) \ 1))
            writes(capitalize_words(tab(71), 1), "   ")
            write(tab(0), "\n")
        }
    }
end

procedure capitalize(s)
    s ? (return (move(1) || map(tab(upto('\t ') | 0)) || tab(0)) | "")
end

procedure capitalize_words(s, sw)
    s2 := ""
    trim(s,'\t ') ? {
        while chunk := capitalize(tab(upto('\t '))) do {
            s2 ||:= chunk || {
                if chunk == "P" & =" O "
                then "O " else " "
            }
            tab(many('\t '))
        }
        if \sw & s2 ~== ""
        then s2 ||:= tab(0)
        else s2 ||:= capitalize(tab(0))
    }
    return s2
end

--
-Richard L. Goerwitz goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) (04/21/91)
To prevent printing blank lines, simply do (in awk):

    address = substr(... )
    if (length(address) > 0)
        print address

etc., etc. The upper/lower case problem is more difficult.
merlyn@iwarp.intel.com (Randal L. Schwartz) (04/21/91)
In article <1817@wjvax.UUCP>, mario@wjvax (Mario Dona) writes:
| HELP! I have a situation that just cries out for an awk solution, however
| I'm at a loss over some minor, but important details.

Well, to *me* it just cries out for a Perl solution. Try this:

while (<>) {
    s/([A-Z]{3,})/\u\L$1$2/g;
    ($name,$address1,$address2,$city,$zip) = unpack("A10A19A17A24A*",$_);
    print "$name\n";
    print "$address1\n";
    print "$address2\n" if $address2;
    print "$city $zip\n";
    print "\n";
}

Works just fine on your test data.

print "Just another Perl hacker,"
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
goer@ellis.uchicago.edu (Richard L. Goerwitz) (04/21/91)
merlyn@iwarp.intel.com (Randal L. Schwartz) writes:
>
>while (<>) {
>    s/([A-Z]{3,})/\u\L$1$2/g;
>    ($name,$address1,$address2,$city,$zip) = unpack("A10A19A17A24A*",$_);
>    print "$name\n";
>    print "$address1\n";
>    print "$address2\n" if $address2;
>    print "$city $zip\n";
>    print "\n";
>}
>
>Works just fine on your test data.

No, no! The gentleman said quite plainly that he wanted his data to
look like this:

Company1
28 Fulling
P O Box 3608
Harrisburg PA 17105

Company2
500 Elm
Milwaukee WI 53122

Company3
13500 Central
P O Box 655303
Dallas TX 75265-5303

Did you actually try running your perl code?

--
-Richard L. Goerwitz goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
merlyn@iwarp.intel.com (Randal L. Schwartz) (04/22/91)
In article <1991Apr21.045226.16050@midway.uchicago.edu>, goer@ellis (Richard L. Goerwitz) writes:
| merlyn@iwarp.intel.com (Randal L. Schwartz) writes:
| >
| >while (<>) {
| >    s/([A-Z]{3,})/\u\L$1$2/g;
| >    ($name,$address1,$address2,$city,$zip) = unpack("A10A19A17A24A*",$_);
| >    print "$name\n";
| >    print "$address1\n";
| >    print "$address2\n" if $address2;
| >    print "$city $zip\n";
| >    print "\n";
| >}
| >
| >Works just fine on your test data.
|
| No, no! The gentleman said quite plainly that he wanted his data to
| look like this:
|
| Company1
| 28 Fulling
| P O Box 3608
| Harrisburg PA 17105
|
| Company2
| 500 Elm
| Milwaukee WI 53122
|
| Company3
| 13500 Central
| P O Box 655303
| Dallas TX 75265-5303
|
| Did you actually try running your perl code?

Yes. That's exactly what came out. If it didn't come out on *your*
Perl, you have an old Perl. (I used the new \u\L operators, if that's
what you're objecting to.)

I did make a silly typo in the first one. The line:

    s/([A-Z]{3,})/\u\L$1$2/g;

should read:

    s/([A-Z]{3,})/\u\L$1/g;

The $2 was a leftover from doing it as two partial expressions, but
then I realized I didn't need to do that. But the code worked in
either case, which is why I didn't catch it. :-)

This line finds 3 or more letters, and then lowercases all letters
after the first. That was part of the spec.

(I hope I'm responding to your criticism. It wasn't very specific.
But believe me, the code *does* work as requested.)

print "Just another Perl hacker,"
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
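The rule described above (find runs of three or more capital letters and lowercase everything after the first) can also be sketched in awk, the language of the original question. This is only an illustration and not Randal's code: the wrapper name capitalize_long_words and the awk function capitalize are made up here, and the sketch assumes an awk that has match(), tolower() and user-defined functions (nawk, gawk, mawk, or any POSIX awk). The pattern is spelled out as [A-Z][A-Z][A-Z]+ because not every awk accepts the {3,} interval syntax.

```shell
# capitalize_long_words: hypothetical wrapper, reads stdin and writes stdout
capitalize_long_words() {
    awk '
    # capitalize() is an illustrative helper, not code from the thread.
    function capitalize(s,    out, w) {
        out = ""
        # Repeatedly find the leftmost run of three or more capitals.
        while (match(s, /[A-Z][A-Z][A-Z]+/)) {
            w = substr(s, RSTART, RLENGTH)
            # Copy the text before the run, then the run with its tail lowercased.
            out = out substr(s, 1, RSTART - 1) substr(w, 1, 1) tolower(substr(w, 2))
            s = substr(s, RSTART + RLENGTH)
        }
        return out s    # the remainder contains no further runs
    }
    { print capitalize($0) }'
}
```

It could be used as `capitalize_long_words < companies.txt`, where companies.txt is a placeholder file name. Two-letter words such as the state abbreviations are left alone because they never match the pattern.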
goer@quads.uchicago.edu (Richard L. Goerwitz) (04/22/91)
merlyn@iwarp.intel.com (Randal L. Schwartz) writes (in response to my
objection that his Perl code didn't do what's expected):
>
>Yes. That's exactly what came out. If it didn't come out on *your*
>Perl, you have an old Perl. (I used the new \u\L operators, if that's
>what you're objecting to.)

Understood. I tried it out on perl 3.0 pl 18. I wouldn't call perl 3.0
an "old" perl by any stretch of the imagination, especially since 4.0
just finished coming over the net a couple of days ago!

-Richard

--
-Richard L. Goerwitz goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
harrison@necssd.NEC.COM (Mark Harrison) (04/23/91)
In article <1817@wjvax.UUCP>, mario@wjvax.UUCP (Mario Dona) writes:
> HELP! I have a situation that just cries out for an awk solution

[converting from]
> COMPANY1 2800 FULLING P O BOX 3608 HARRISBURG PA 17105
[to]
> Company1
> 28 Fulling
> P O Box 3608
> Harrisburg PA 17105

> 1. How to prevent blank lines from printing if there is nothing to print

Add this line after your BEGIN rule:

    /^$/ {next}          #skip blank lines

If you want to skip lines that may have white space:

    /^[ \t]*$/ {next}    #skip blank (non-text) lines

> 2. How to concatenate the city and zip fields as shown.

To concatenate:

    city_and_zip = city " " zip

To strip trailing space from city before concatenating:

    while (substr(city, length(city)) == " ")
        city = substr(city, 1, length(city) - 1)

> 3. If a word is greater than 2 characters, lowercase all letters
> except for the first character (this is to keep state capitols
> capitalized).

This is doable, but not enjoyable. There is more of a chance if you use
nawk or gawk. Otherwise, make an array:

    uc["a"] = "A" ... uc["z"] = "Z"
    lc["A"] = "a" ... lc["Z"] = "z"

and loop for the length of the string:

    if (uc[substr(str, i, 1)] == "")
        newstr = newstr substr(str, i , 1)
    else
        newstr = newstr uc[substr(str, i, 1)]
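The lookup-table idea above can be filled in without typing out every assignment by hand. What follows is a rough sketch, not Mark's code: it assumes one word per input line, keeps the first character, and lowercases the rest through an lc table built in a loop from two alphabet strings. The wrapper name lowercase_tail is hypothetical. It sticks to substr(), length() and arrays, so it should not need tolower() or user-defined functions, though that has not been checked against a 1991-vintage awk.

```shell
# lowercase_tail: hypothetical wrapper, one word per input line
lowercase_tail() {
    awk '
    BEGIN {
        upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        lower = "abcdefghijklmnopqrstuvwxyz"
        # Build the lookup table in a loop instead of 26 assignments.
        for (i = 1; i <= 26; i++)
            lc[substr(upper, i, 1)] = substr(lower, i, 1)
    }
    {
        str = $0
        newstr = substr(str, 1, 1)        # first character is kept as is
        for (i = 2; i <= length(str); i++) {
            c = substr(str, i, 1)
            if (lc[c] != "")
                newstr = newstr lc[c]     # capital letter: append its lowercase form
            else
                newstr = newstr c         # anything else: append unchanged
        }
        print newstr
    }'
}
```

Splitting a field into words and skipping the ones of two characters or fewer would still have to be added around this loop.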
martin@mwtech.UUCP (Martin Weitzel) (05/09/91)
In article <1991Apr20.220114.8727@colorado.edu> lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) writes:
>To prevent printing blank lines, simply do (in awk):
>
>    address = substr(... )
>    if (length(address) > 0)
>        print address

Still simpler:

    address = substr(... )
    if (address) print address

Or:

    # assign substring to address and print if not empty
    if (address = substr(... )) print address

BE WARNED: I explicitly wrote a comment in front of the if statement.
So don't start a discussion thread whether it is obscure, good, bad,
professional or whatever programming style to write assignments
within conditional contexts :-)
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
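One caveat when applying either test to the fixed-width records from the original question: substr() returns the column together with its space padding, so an all-blank address column is a string of spaces and counts as non-empty. A minimal sketch, assuming the column positions from the original post and assuming trailing blanks are stripped before testing, might look like this. The names format_records and rtrim are made up for illustration, case conversion is left aside, and joining city and zip with a single space is an assumption about the desired output.

```shell
# format_records: hypothetical wrapper, reads fixed-width records on stdin
format_records() {
    awk '
    # rtrim() is an illustrative helper, not taken from any post in the thread.
    function rtrim(s) {
        sub(/[ \t]+$/, "", s)
        return s
    }
    {
        name     = rtrim(substr($0, 1, 10))
        address1 = rtrim(substr($0, 11, 19))
        address2 = rtrim(substr($0, 30, 17))
        city     = rtrim(substr($0, 47, 24))
        zip      = rtrim(substr($0, 71, 10))

        print name
        if (address1 != "") print address1   # skip the line when the column was blank
        if (address2 != "") print address2
        print city " " zip                   # city and zip joined on one line
        print ""                             # blank line between records
    }'
}
```

Run as `format_records < companies.txt` (a placeholder file name), this prints three lines for a record with no second address and four lines otherwise.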