plantz@manta.NOSC.MIL (Glen W. Plantz) (01/01/91)
I posted an "awk" question several weeks ago that I'm still having
trouble with.  I have a "modified" version of the same question here.
Any help would be appreciated.

I need to use awk to scan a file that consists of lines, each line
starting with an integer, followed by a _LONG_ (paragraph) line of text
that should have a period at the end of the paragraph, and then a
newline character following that.  The script should save the number at
the beginning of the line and, if the line does not have a period
before the newline, print out a message with the integer that was at
the beginning of the line.

The problem I've had so far with our version of "awk" is that the lines
(paragraphs) that have too many fields cause an error of the type:

    547
    awk: record ` 479 Provide techn...' has too many fields
    	record number 4

These _LONG_ lines could have several hundred words on them.  How can I
get awk or another unix utility to process this text?
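[A minimal sketch of the check being described, using grep and sed,
which impose no per-record field limit at all; the two sample lines
below are invented for illustration:]

```shell
# Sample data (hypothetical): an integer, then a long paragraph that
# should end in a period; the second line deliberately lacks one.
printf '101 First paragraph ends properly.\n202 Second one is missing its period\n' |
  # grep never splits a line into fields, so word count is irrelevant;
  # keep only the lines that do NOT end in a period.
  grep -v '\.$' |
  # Pull out the leading integer and report it.
  sed 's/^ *\([0-9]*\).*/paragraph \1 lacks a final period/'
```

This prints one report line per offending paragraph, e.g.
"paragraph 202 lacks a final period".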
tchrist@convex.COM (Tom Christiansen) (01/01/91)
From the keyboard of plantz@manta.NOSC.MIL (Glen W. Plantz):
:The problem I've had so far with our version of "awk" is that the lines
:(paragraphs) that have too many fields cause an error of the type:
:    547
:    awk: record ` 479 Provide techn...' has too many fields
:    	record number 4
:
:These "_LONG_" lines could have several hundred words on them.  How can
:I get awk or another unix utility to process this text?

Run your awk script through the awk-to-perl translator, a2p, then run
perl on the resulting script, as perl has no such limitations.  By
default, the translator will convert awk splits into perl splits with a
maximum of 999 resulting fields, but you can easily increase or remove
that restriction.

--tom
-- 
Tom Christiansen                tchrist@convex.com      convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
to look at yourself in the mirror the next morning."  -me
skwu@boulder.Colorado.EDU (WU SHI-KUEI) (01/02/91)
In article <1990Dec31.200723.7929@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>From the keyboard of plantz@manta.NOSC.MIL (Glen W. Plantz):
>:The problem I've had so far with our version of "awk" is that the lines
>:(paragraphs) that have too many fields cause an error of the type:
>:    547
>:    awk: record ` 479 Provide techn...' has too many fields
>:    	record number 4
>:
>:These "_LONG_" lines could have several hundred words on them.  How can
>:I get awk or another unix utility to process this text?
>
>Run your awk script through the awk-to-perl translator, a2p, then run perl
>on the resulting script ......

No need for 'perl', a boon to the majority of UNIX users who do not use
it.  Simply replace the first, whitespace field separator with some
otherwise unused glyph (e.g. @) using 'sed', and then set the awk FS to
that glyph.
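[A sketch of that sed-plus-FS workaround, assuming "@" never occurs in
the text and that lines have no leading blanks; the sample data is
invented.  With FS set to "@", each record splits into exactly two
fields no matter how many words the paragraph holds:]

```shell
printf '101 First paragraph ends properly.\n202 Second one is missing its period\n' |
  # Replace only the FIRST space on each line with "@".
  sed 's/ /@/' |
  # Now $1 is the integer and $2 is the whole paragraph -- two fields,
  # so the field-count limit is never reached.
  awk -F'@' '$2 !~ /\.$/ { print "no period on record " $1 }'
```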
tchrist@convex.COM (Tom Christiansen) (01/02/91)
From the keyboard of skwu@spot.Colorado.EDU (WU SHI-KUEI), quoting me:
:>Run your awk script through the awk-to-perl translator, a2p, then run perl
:>on the resulting script ......
:
:No need for 'perl', a boon to the majority of UNIX users who do not use it.
:Simply replace the first, whitespace field separator with some otherwise
:unused glyph (e.g. @) using 'sed', and then set the awk FS to that glyph.

While for this particular application it may well be that this solution
suffices, there remain all kinds of internal limits you're going to run
into with awk.  Eventually these will annoy you enough to stop using it
for large and/or complex problems.  For example, if the application were
to build an associative array of word frequencies and you had the
tremendously long lines described by the original poster, then awk
wouldn't be able to handle it, causing you to go through brain-twisting
and gut-wrenching contortions to pound the data back into something awk
can handle.

Although perl isn't really new anymore, it's still generally perceived
to be so, and the resistance to new, useful tools in the community is so
high that some people will insist on shooting themselves in the foot
using old, limited (and even brain-damaged) software for years to come.
(Yes, I know it's hard to get things standardized across millions of
systems, but that shouldn't stop us from striving to forge ahead.)

My suspicion is that this is just a manifestation in Unixdom of a
principle familiar to sociologists and historians.  While the desire to
embrace better technology may be somewhat higher amongst computer users
than in the general populace, there will always be some who wish to live
(if you can call that living) in a totally static environment where
nothing ever changes, where no improvement is ever radically different
from previous practice, and where the JCL scripts from 25 years ago
still function.

Use awk while you can.  When you can't, be aware that there's an easy,
portable, freely available upgrade path that doesn't require recoding
everything in C, and is a lot easier than trying to get AT&T to invest
the time in fixing awk.  You could reasonably argue that there are
actually two such paths, since gawk comes close to meeting these
criteria: it has greatly increased the limits on things like line
length and number of fields.  However, those limits still exist even in
gawk, whereas in perl they're entirely removed, so gawk may not be
enough.  It all depends on the problem.  Different problems are often
best solved by employing different tools, even if perl is the Swiss
army chainsaw of UNIX.

--tom
-- 
Tom Christiansen                tchrist@convex.com      convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
to look at yourself in the mirror the next morning."  -me
skwu@boulder.Colorado.EDU (WU SHI-KUEI) (01/03/91)
In article <1991Jan02.133911.24428@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
[...quoting my posting, which quoted his posting, and then continues...]
>While for this particular application, it may well be that this solution
>suffices, there remain all kinds of internal limits you're going to run into
>with awk.......
>
>Although perl isn't really new anymore, it's still generally perceived to
>be so, and the resistance to new, useful tools in the community is so high
>that some people will insist on shooting themselves in the foot using old,
>limited (and even brain-damaged) software for years in the future.....
......
>While the desire to embrace better
>technology may be somewhat higher amongst computer users than in the
>general populace, there will always be some who wish to live (if you can
>call that living) in a totally static environment where nothing ever
>changes, where no improvement is ever radically different from previous
>practice, and where the JCL scripts from 25 years ago still function.
........
>        Different problems are often best solved by employing
>different tools, even if perl is the Swiss army chainsaw of UNIX.

Has it ever struck you that perl scripts and JCL code are painfully
similar precisely because perl is a Swiss army chainsaw?
tchrist@convex.COM (Tom Christiansen) (01/03/91)
From the keyboard of skwu@spot.Colorado.EDU (WU SHI-KUEI):
:Has it ever struck you that perl scripts and JCL code are painfully similar
:precisely because perl is a Swiss army chainsaw?

Nope, not in the least.  Perl highly resembles its predecessors: awk, C,
and sed.  Pain is a matter of one's own making and perception.  You
should compare JCL with its UNIX equivalent, the original shell; you
know, the one where glob was a separate command.  Perl is far more
analogous to REXX on modern VM/CMS systems.  Only history will tell for
sure, of course.

--tom
-- 
Tom Christiansen                tchrist@convex.com      convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
to look at yourself in the mirror the next morning."  -me
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (01/03/91)
In article <1991Jan2.164006.24557@csn.org> skwu@spot.Colorado.EDU (WU SHI-KUEI) writes:
: Has it ever struck you that perl scripts and JCL code are painfully similar
: precisely because perl is a Swiss army chainsaw?
No, I haven't stopped beating my wife, but her backhand is improving.
Larry Wall
lwall@jpl-devvax.jpl.nasa.gov
kimcm@diku.dk (Kim Christian Madsen) (01/04/91)
skwu@boulder.Colorado.EDU (WU SHI-KUEI) writes:
>No need for 'perl', a boon to the majority of UNIX users who do not use it.
>Simply replace the first, whitespace field separator with some otherwise
>unused glyph (e.g. @) using 'sed', and then set the awk FS to that glyph.

That (awk) solution will not work, at least not on most System V
systems, since the maximum number of fields in each record is hardcoded
into the source code.  If you have the source code, you can increase
the number and recompile.  If not, I suggest you find another tool; Tom
Christiansen has provided a pointer to one of the more useful ones.

					Best Regards
					Kim Chr. Madsen
alex@am.sublink.org (Alex Martelli) (01/04/91)
tchrist@convex.COM (Tom Christiansen) writes on awk vs perl:
...
>that this is just a manifestation in Unixdom of a principle familiar to
>sociologists and historians.  While the desire to embrace better

I rather believe it's a principle more familiar to booksellers - we're
just waiting for THE BOOK to get into our little grabby hands!-)

I know I don't speak for all Unix-lovers, but I wouldn't use awk, ksh,
icon, and so on, so willingly, if each did not have a good-to-great
book about it.  Great-to-good books ain't all (I *do* use dmake, and
*don't* use ratfor, for example...) - but they surely DO help!
-- 
Alex Martelli - (home snailmail:) v. Barontini 27, 40138 Bologna, ITALIA
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434;
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).
oz@yunexus.yorku.ca (Ozan Yigit) (01/05/91)
In article <1991Jan02.133911.24428@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>... and the resistance to new, useful tools in the community is so high
>that some people will insist on shooting themselves in the foot using old,
>limited (and even brain-damaged) software for years in the future.

You may want to remind yourself of this when the replacement for perl
is out.  We, too, don't like limited (and even brain-damaged) software.

oz
---
Good design means less design. Design  | Internet: oz@nexus.yorku.ca
must serve users, not try to fool them.| UUCP: utzoo/utai!yunexus!oz
-- Dieter Rams, Chief Designer, Braun. | phonet: 1+ 416 736 5257
john@basho.uucp (John Lacey) (01/07/91)
alex@am.sublink.org (Alex Martelli) writes:
>I know I don't speak for all Unix-lovers, but I wouldn't use awk, ksh,
>icon, and so on, so willingly, if each did not have a good-to-great
>book about it.  Great-to-good books ain't all (I *do* use dmake, and
>*don't* use ratfor, for example...) - but they surely DO help!

Ditto here.  In fact, I find that a program that comes with _any_
documentation is better than one that comes with none.  And better
documentation seems to be a good indicator of a better program.  These
are generalizations, of course, broken from time to time.  My favorite
examples are TeX (ahh, bliss) and AWK.  The best counter-example I know
is Microsoft Word for the Macintosh, which has well above average
documentation....
-- 
John Lacey  614 436 3773  73730,2250
john@basho.uucp  or  basho!john@cis.ohio-state.edu
david@cs.dal.ca (David Trueman) (01/08/91)
In article <1991Jan02.133911.24428@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>You could reasonably argue that there are
>actually two such paths, since gawk comes close to meeting these criteria:
>it has greatly increased the limits of things like line length and number
>of fields.  However, these limits still exist even in gawk, whereas in
>perl they're entirely removed, so gawk may not be enough.  It all depends

As the current primary developer of gawk, I would like to say that I am
unaware of any such limits in gawk (except the size of an int and the
size of your swap space -- limits that I am sure perl also has).  If
there is any limit, it is an unknown bug that will be fixed -- yes,
like perl and unlike some unnamed commercial products, we do fix
bugs!!  As they say, you get what you pay for.
-- 
{uunet watmath}!dalcs!david    or    david@cs.dal.ca