djo7613@milton.u.washington.edu (Dick O'Connor) (03/06/91)
In order to follow an ancient bitpath left over from the days of HASP and
card punch queues, I need to split a file of 150 character lines into a
file of lines 80 characters wide or less. Back when our Cyber was active,
we did this with a simple Fortran program; now I've got Ultrix and even
perl (!) at my disposal, and this problem cries out for a filter, not a pgm.

Our old program copied the first 78 characters on a line to the output file
after prepending and appending the character 'A'. Characters 79-150 were
written to line 2 of the output file with prepended and appended 'B'. A
simple routine glued things back together at the other end.

Trouble is, I'm enough of a novice that I can't see the simple Unix way.
All of the utilities I've looked at would be a lot happier if I could move
entire lines or fields. But this is a file of typical scientific data; no
tabs, blanks may or may not separate fields depending on the width of the
(right-justified) number in that field, and an overall adherence to a
pre-defined "format."

After searching the perl man page, I'm stuck. I keep thinking split can do
what I need, but the examples don't make it clear. I'd prefer a perl
solution to one using "other" utilities; any takers? It's that or I use
f77, and we wouldn't want to do that, would we?? :)

Thanks!

"Moby" Dick O'Connor                         djo7613@u.washington.edu
Washington Department of Fisheries           *I brake for salmonids*
merlyn@iwarp.intel.com (Randal L. Schwartz) (03/07/91)
In article <17806@milton.u.washington.edu>, djo7613@milton (Dick O'Connor) writes:
| Our old program copied the first 78 characters on a line to the output file
| after prepending and appending the character 'A'. Characters 79-150 were
| written to line 2 of the output file with prepended and appended 'B'. A
| simple routine glued things back together at the other end.

To split'em:

	perl -pe 'chop; ($a,$b) = unpack("a78a*",$_); $_ = "A${a}A\nB${b}B\n";'

To join'em (presuming alternating lines of A's and B's from above):

	perl -pe 's/^A(.*)A\n$/$1/ || s/^B(.*)B\n$/$1\n/;'

If you can have an A line without a B line, you will need to maintain
state between lines. That's an exercise for you.

print "Just another Perl hacker,"
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
eichin@athena.mit.edu (Mark W. Eichin) (03/07/91)
[My apologies for the tutorial style below; I'm writing this for the reader
that doesn't know perl at all, but needs to use it. I welcome technical
corrections publicly, and style comments privately...]

pack/unpack does exactly what you want. The man page isn't all that clear
on this, though I think the Camel Book has examples which make it clear...
the pack string is almost exactly analogous to the FORMAT statement in
Fortran (or rather, FORTRAN, since I mean the "classic" versions as opposed
to the new standards effort) to the extent that someone could probably
write a translator with little difficulty.

As for your particular example, the one-liner:

perl -ne '@two=unpack("a78a*",$_); print "A",$two[0],"\nB",$two[1];'

should do it. Data follows.

unpack is taking the current line ($_) and unpacking it into a string of 78
chars and a string of "the rest" (*) and leaving the results in an array
called "two" (@two). Then it's printing the A, the first element of @two
($two[0] - arrays start at zero, like C, by default, though you can set a
variable to adjust that), then the newline and the B ("\nB"), and then the
second element of two (which *already* contains the trailing newline... $_
is the *entire* line, and we never did a chop to split off the newline so
it is still there. Using "a78a72" would have also chopped off the newline,
as it is the 151st character...)

The -n wraps a loop around the whole thing, the -e indicates that we're
putting the line right here instead of off in a script.

I hope this helps; I didn't really want to provide a naked one-liner, thus
the windy explanation.
The *important* thing, of course, is that running the above line, then
feeding it the following three lines of data (78 equals + 72 stars each):

==============================================================================************************************************************************
==============================================================================************************************************************************
==============================================================================************************************************************************

yields:

A==============================================================================
B************************************************************************
A==============================================================================
B************************************************************************
A==============================================================================
B************************************************************************

Hmmm. Double checking your note, you want the A and B *appended* as well -
Ok, fine, I'll leave the above because it makes a point about newlines, and
submit:

perl -ne '@two=unpack("a78a72",$_); print "A",$two[0],"A\nB",$two[1],"B\n";'

A==============================================================================A
B************************************************************************B
A==============================================================================A
B************************************************************************B
A==============================================================================A
B************************************************************************B

Items for further exploration:
a) the reassembly could be done with pack.
b) if an input line is shorter than 150 columns, the output will be short
   as well. I suspect the fortran code had the same problem - and that the
   data *doesn't* have that problem. See what pack("A78") does, and note
   how it would solve that problem.
c) There is a substr function, but you'd have to use it twice; would that
   be slower? [probably, since it would still have to create the temporary
   values - but it might be more memory efficient, though not by enough to
   matter in this example.]

Enjoy...
				_Mark_ <eichin@athena.mit.edu>
				MIT Student Information Processing Board
				Watchmaker Computing <eichin@watch.com>
marcl@ESD.3Com.COM (Marc Lavine) (03/08/91)
djo7613@milton.u.washington.edu (Dick O'Connor) writes:
>Our old program copied the first 78 characters on a line to the output file
>after prepending and appending the character 'A'. Characters 79-150 were
>written to line 2 of the output file with prepended and appended 'B'. A
>simple routine glued things back together at the other end.

eichin@athena.mit.edu (Mark W. Eichin) writes:
> As for your particular example, the one-liner:
>perl -ne '@two=unpack("a78a72",$_); print "A",$two[0],"A\nB",$two[1],"B\n";'

I just started hacking with Perl last week (and think it's great -- thanks
for another wonderful tool, Larry), but I'm a long-time fan of regular
expressions (in the distant past, I used to edit files with "ex"). I
really like having Perl's "fancy" regular expressions available. So,
here's a different solution to the problem using only regular expressions
(which should be quite fast):

To split the lines use:

perl -pe 's/^(.{78})(.{72})$/A\1A\nB\2B/'

And to join them use:

perl -pe 's/^A(.{78})A\n/\1/; s/^B(.{72})B$/\1/;'

(which came out very similar to Randal Schwartz's suggestion of:

perl -pe 's/^A(.*)A\n$/$1/ || s/^B(.*)B\n$/$1\n/;'
)

BTW, I came up with the following motto for Perl:

	Perl: Kitchen sink included.
--
Marc Lavine
Broken: marcl%3Com.Com@sun.com
Smart:  marcl@3Com.Com
UUCP:   ...{sun|decwrl}!3com.3com!marcl