[comp.unix.questions] Splitting up a too-wide text file

thomson@zazen.macc.wisc.edu (Don Thomson) (03/14/91)

I've got a file of ASCII text that has lines that are too long to easily
print, formatted in columns.  I'd like to run the file through a filter that
will essentially break each page in half horizontally at a column break and
place the right-hand side of the broken-off text on a new following page,
resulting in a new file of reasonable width.  I've got a few relatively
inelegant solutions in mind, but am interested in suggestions on how other
people might approach the problem with an appropriate combination of UNIX
tools.  Any ideas?
--
 
----- Don Thomson ----- MACC, 1210 W. Dayton, Madison, WI  53706 -------------
    (608) 262-0138      thomson@macc.wisc.edu / thomson@wiscmacc.bitnet

jik@athena.mit.edu (Jonathan I. Kamens) (03/15/91)

  If you've got "cut", you can use that.  RTFM cut.

  If you've got "colrm", you can use that.  RTFM colrm.
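A minimal sketch of the cut(1) route follows. WIDTH and the two function names are made up for illustration, and 80 columns is only an assumed page width. With colrm(1), the same halves come from `colrm 81` (keep columns 1-80) and `colrm 1 80` (drop them).

```shell
# Hypothetical helpers: split every line at a fixed column with cut(1).
# WIDTH is an assumed page width; adjust to taste.
WIDTH=80

# Columns 1..WIDTH of every line (same as: colrm $((WIDTH + 1)))
left_part()  { cut -c "1-$WIDTH"; }

# Columns WIDTH+1..end of every line; short lines come out empty
# (same as: colrm 1 $WIDTH)
right_part() { cut -c "$((WIDTH + 1))-"; }
```

Running `left_part < wide.txt > left.txt` and `right_part < wide.txt > right.txt` produces two files to print one after the other. It does not interleave the pages.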

  If you don't have "cut" or "colrm", then you can use "sed" to remove the
first part of each line, like

    sed 's/^................................................//'

or to print the last part of each line, like

    sed 's/^.*\(..............................................\)/\1/'

You get the idea, I hope.
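The dot-counting sed commands above only match lines at least that many characters long, so a shorter line passes through unchanged and would show up whole on the right-hand page too. Below is a sketch that uses POSIX interval expressions instead of counting dots. WIDTH and the function names are again hypothetical.

```shell
# Hypothetical helpers: the same split with sed(1), using BRE intervals.
WIDTH=80

# Keep at most WIDTH characters of each line (empty lines stay empty).
sed_left()  { sed "s/^\(.\{1,$WIDTH\}\).*/\1/"; }

# Delete up to WIDTH leading characters; lines of WIDTH or fewer
# characters become empty instead of passing through whole.
sed_right() { sed "s/^.\{1,$WIDTH\}//"; }
```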

  You can use perl, although perhaps that's overkill in this case (or perhaps
perl is *always* overkill :-).

  You can use "substr" inside awk to do it.
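With awk, substr() does the job. Here is a sketch with the width passed as an argument; the default of 80 and the function names are assumptions.

```shell
# Hypothetical helpers: $1 is the split column (default 80).
# substr() returns "" when the start is past the end of the line.
awk_left()  { awk -v w="${1:-80}" '{ print substr($0, 1, w) }'; }
awk_right() { awk -v w="${1:-80}" '{ print substr($0, w + 1) }'; }
```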

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

tchrist@convex.COM (Tom Christiansen) (03/15/91)

From the keyboard of jik@athena.mit.edu (Jonathan I. Kamens) come various
solutions for truncating lines.  But I don't think that's what Don wanted.
He has things like this:

            (col)

        0       80
1       a b c d e f g h i j
2       a b c d e f g h i j
3       a b c d e f g h i j
4       a b c d e f g h i j
....

and wants this:
1       a b c d e
2       a b c d e
3       a b c d e
4       a b c d e
(now at page break, the stuff you chopped)
1       f g h i j
2       f g h i j
3       f g h i j
4       f g h i j


I'm not sure how best to do that.  He wants to chop at fields, not columns,
but doesn't want any piece to run past column 80, and he wants the stuff
that gets chopped off placed on the next page.  I could certainly do it with
a perl script, but was hoping for a quick cut and paste solution.
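One way to "chop at fields" is to pick, for each page, the rightmost column that is blank (or past the end of the line) on every line of that page, while still keeping the left half no wider than 80 characters. The page is then cut at that column. If no such column exists, the code falls back to a hard cut after column 80. This awk sketch is not from the thread. The function name, the default width of 80, and the page length of 60 are assumptions. Only spaces count as blanks, so tabs should be expanded first (for example with expand(1)).

```shell
# Hypothetical helper: split each page at a blank column, not a fixed one.
# $1 = maximum width of the left half (default 80)
# $2 = lines per page (default 60)
split_on_blank_column() {
    awk -v w="${1:-80}" -v p="${2:-60}" '
    function flush(   c, i, ok, llen, rstart) {
        llen = w; rstart = w + 1          # fallback: hard cut after column w
        # Find the rightmost column c (with c - 1 <= w) that is a space,
        # or past the end of the line, on every line of this page.
        for (c = w + 1; c > 1; c--) {
            ok = 1
            for (i = 1; i <= n; i++)
                if (length(buf[i]) >= c && substr(buf[i], c, 1) != " ") {
                    ok = 0
                    break
                }
            if (ok) { llen = c - 1; rstart = c + 1; break }
        }
        for (i = 1; i <= n; i++) print substr(buf[i], 1, llen)
        printf "\f"                       # page break before the right halves
        for (i = 1; i <= n; i++) print substr(buf[i], rstart)
        printf "\f"
        n = 0
    }
    { buf[++n] = $0; if (n >= p) flush() }
    END { if (n > 0) flush() }'
}
```

Cutting at a column that is blank on the whole page keeps every field intact and keeps the columns aligned on both halves.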

--tom

jik@athena.mit.edu (Jonathan I. Kamens) (03/15/91)

  Um, I thought it would be obvious that you use the programs I mentioned in
order to create a new file containing the portions of the lines that you want
printed on separate pages, and then print out that file.

  Now, if you want the "continuation pages" to appear right after their
corresponding pages in the original file, then you can write a script to
process the file sixty lines at a time (or however many lines fit on a page)
and write out a new file containing two pages for each original page.

  If that's really what you want to do, then perhaps perl *isn't* overkill,
because it will allow you to do this more easily than any of the other
utilities I mentioned.

  I guess you could go through contortions to get sed and awk to keep enough
state to be able to write out a whole page in one glob and then the right side
of that page immediately afterward.  With sed, I would print the first part of
each line as I encountered it, adding the second page to the hold space, and
then print the hold space and clear it at the end of every sixty lines.  With
awk, I would do pretty much the same thing, but I would keep the second page
constructed thus far in a variable, not the hold space (which awk doesn't
have).

  If the sed hold space is limited in length, you lose.  If awk limits the
length of a string variable, you lose too, unless you use an array to store
each line of the second page rather than storing the whole page in one string.
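Here is a sketch of the awk approach described above, using an array as suggested. The width w, the page length p, and the function name are assumptions. Each left half is printed as soon as it arrives, and the right halves are held until the page is full.

```shell
# Hypothetical helper: $1 = split column (default 80),
# $2 = lines per page (default 60).
two_pages_per_page() {
    awk -v w="${1:-80}" -v p="${2:-60}" '
    function flush(   i) {
        printf "\f"                     # end of the left-hand page
        for (i = 1; i <= n; i++) print right[i]
        printf "\f"                     # end of the continuation page
        n = 0
    }
    {
        print substr($0, 1, w)          # left half goes out immediately
        right[++n] = substr($0, w + 1)  # right half waits for the page end
        if (n >= p) flush()
    }
    END { if (n > 0) flush() }'
}
```

For example, `two_pages_per_page 80 60 < wide.txt > narrow.txt` would give each original page followed by its continuation page. Only one page's worth of right halves is ever held in memory.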

  In either case, I think perl would be easier, if you know perl.


lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (03/15/91)

In article <THOMSON.91Mar14072844@zazen.macc.wisc.edu> thomson@zazen.macc.wisc.edu (Don Thomson) writes:
: I've got a file of ASCII text that has lines that are too long to easily
: print, formatted in columns.  I'd like to run the file through a filter that
: will essentially break each page in half horizontally at a column break and
: place the right-hand side of the broken-off text on a new following page,
: resulting in a new file of reasonable width.  I've got a few relatively
: inelegant solutions in mind, but am interested in suggestions on how other
: people might approach the problem with an appropriate combination of UNIX
: tools.  Any ideas?

The sed/cut/colrm solutions are okay unless you really do want the pages
to alternate, in which case you'll have to program it somehow.  I wouldn't
try to program it in shell, though it's certainly possible.  Here's a crack
at it in (of all things) Perl.  :-)

#!/usr/bin/perl

$LINES = 55;			# put 55 lines on a page
$TEMPLATE = 'A80A*';		# split after 80 columns

while (<>) {
    chop;
    ($a,$b) = unpack($TEMPLATE, $_);
    push(@a, $a . "\n");
    push(@b, $b . "\n");
    &dopage if @a >= $LINES;
}
&dopage if @a;

sub dopage {
    print @a, "\f", @b, "\f";
    @a = @b = ();
}

I suppose this could be generalized to print n pages up.

Overkill, eh?
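"n pages up" can be read as cutting each page into as many fixed-width strips as its widest line needs, then printing the strips one after another. Under that reading, an awk sketch might look like this. The interpretation, the function name, and the defaults of 80 columns and 60 lines are all assumptions.

```shell
# Hypothetical helper: $1 = strip width (default 80),
# $2 = lines per page (default 60).
strips_per_page() {
    awk -v w="${1:-80}" -v p="${2:-60}" '
    function flush(   i, s, k, max) {
        max = 0
        for (i = 1; i <= n; i++)
            if (length(buf[i]) > max) max = length(buf[i])
        k = int((max + w - 1) / w)      # strips needed for the widest line
        if (k < 1) k = 1
        for (s = 0; s < k; s++) {       # strip s holds columns s*w+1 .. (s+1)*w
            for (i = 1; i <= n; i++) print substr(buf[i], s * w + 1, w)
            printf "\f"
        }
        n = 0
    }
    { buf[++n] = $0; if (n >= p) flush() }
    END { if (n > 0) flush() }'
}
```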

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

npl@cbnewsi.att.com (nickolas.landsberg) (03/15/91)

How about the following for those who don't have access to perl and such:
(pseudo Bourne sh, ksh script)

LENGTH=whateveryourlinelength
split -${LENGTH} <file>
for ii in x*  # assumes no other files in directory starting with "x"
do
	A="your_favorite_awk_or_sed_or_cut_or_other_script"
	${A} "args to get first 80 cols" ${ii}
	${A} "args to get next 80 cols" ${ii}
	# more if needed
done
rm -f x??

OK.... I know that it takes mucho disk space if the file is large, but,
honestly, where is there a hole in the above?

Nick Landsberg

jik@athena.mit.edu (Jonathan I. Kamens) (03/16/91)

In article <1991Mar15.032648.21@cbnewsi.att.com>, npl@cbnewsi.att.com (nickolas.landsberg) writes:
|> LENGTH=whateveryourlinelength

  Surely you mean "whateveryourpagelength" here and not
"whateveryourlinelength?"  Otherwise, I don't see how your script makes any
sense, since the "split" command isn't going to split the file in columns,
it's going to split it in lines.
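If LENGTH is taken as the page length, Nick's loop can be made runnable roughly as follows. This is a sketch, not Nick's script. cut(1) stands in for "your favorite" filter, and the function name and defaults are hypothetical. mktemp(1) is common but is not available on every older UNIX. Working in a private directory also removes the "no other files starting with x" caveat.

```shell
# Hypothetical helper: $1 = input file, $2 = split column (default 80),
# $3 = lines per page (default 60). Note: sets globals file/width/length/tmp.
split_pages() {
    file=$1 width=${2:-80} length=${3:-60}
    tmp=$(mktemp -d) || return 1
    # split(1) writes the pages as $tmp/xaa, $tmp/xab, ... in order
    split -l "$length" "$file" "$tmp/x" || { rm -rf "$tmp"; return 1; }
    for chunk in "$tmp"/x*; do
        [ -e "$chunk" ] || continue          # empty input: no chunks at all
        cut -c "1-$width" "$chunk"           # left half of this page
        printf '\f'
        cut -c "$((width + 1))-" "$chunk"    # right half on the next page
        printf '\f'
    done
    rm -rf "$tmp"
}
```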

  If that's what you meant, then sure, you can get things to work the way you
specified.  But personally, it seems to me that using the hold space in sed or
a variable/array in awk to store text would be a better solution, since it
would require no temporary files and would use far fewer forks and execs in
order to do the job (and therefore would be significantly faster).
