agw@broadway.columbia.edu (Art Werschulz) (07/02/87)
Hi all.

I have a file consisting of some lines that have 80 chars or fewer,
and some with more than 80 characters (in fact, some may have more
than 160 characters).  I wish to break up only the >80-char lines, so
that no line has more than 80 chars.

Here's the catch: I want to break the long lines only at whitespace.
Thus, I can't use

    % fold -80 foo.tex

since a space in the middle of a word would be disastrous.

Any suggestions?

Art Werschulz

ARPAnet: agw@columbia.edu
USEnet:  ... seismo!columbia!agw
BITnet:  agw%columbia.edu@wiscvm
CCNET:   agw@columbia
ATTnet:  Columbia University (212) 280-3610 280-2736
         Fordham University  (212) 841-5323 841-5396
wrp@burdvax.PRC.Unisys.COM (William R. Pringle) (07/03/87)
in article <4780@columbia.UUCP>, agw@broadway.columbia.edu (Art Werschulz) says:
>
> Hi all.
>
> I have a file consisting of some lines that have 80 chars or fewer,
> and some with more than 80 characters (in fact, some may have more
> than 160 characters).  I wish to break up only the >80-char lines, so
> that no line has more than 80 chars.
>
> Here's the catch: I want to break the long lines only at whitespace.
> Thus, I can't use
>
>     % fold -80 foo.tex
>
> since a space in the middle of a word would be disastrous.
>
> Any suggestions?
>
> Art Werschulz
>
> ARPAnet: agw@columbia.edu
> USEnet:  ... seismo!columbia!agw
> BITnet:  agw%columbia.edu@wiscvm
> CCNET:   agw@columbia
> ATTnet:  Columbia University (212) 280-3610 280-2736
>          Fordham University  (212) 841-5323 841-5396

=====================

Here is a little script that I use to do that.  Hope it helps.

---------------- Cut Here ---------
#!/bin/sh
#
#   fold lines at whitespace
#
#   by Bill Pringle
#      burdvax!wrp
#
# The sed pattern below is one literal tab; the replacement is 8 spaces.
sed -e 's/	/        /g' $* |    # convert tabs to 8 spaces
awk - '
BEGIN { N = 80 }
{   if ((n = length($0)) <= N)
        print
    else {
        LEN = N
        for (i = 1; n > N; n -= LEN) {
            while (substr($0,LEN+i-1,1) != " ") {
                LEN -= 1
            }
            if (i==1) {
                printf "%s\\\n", substr($0, i, LEN)
            } else {
                printf "> %s\\\n", substr($0, i, LEN)
            }
            i += LEN;
        }
        printf "> %s\\\n", substr($0,i)
    }
} '
mwm@eris.BERKELEY.EDU (Mike (My watch has windows) Meyer) (07/04/87)
<in article <4780@columbia.UUCP>, agw@broadway.columbia.edu (Art Werschulz) says:

[A request for a sed or awk tool to break 80-character lines at
whitespace.]

Some problems just aren't amenable to tackling with sed/awk. I think
this is one of them. It may be doable with sed, but I'm not sure
how. Any awk script to do this will be almost as complicated as a C
program to do the same thing. For example:

In article <somearticle@somehost> someone writes:
<Here is a little script that I use to do that.  Hope it helps.
<
<---------------- Cut Here ---------
<#!/bin/sh
<#
<#   fold lines at whitespace
<#
<
<sed -e 's/	/        /g' $* |    # convert tabs to 8 spaces
<awk - '
<BEGIN { N = 80 }
<{   if ((n = length($0)) <= N)
<        print
<    else {
<        LEN = N
<        for (i = 1; n > N; n -= LEN) {
<            while (substr($0,LEN+i-1,1) != " ") {
<                LEN -= 1
<            }
<            if (i==1) {
<                printf "%s\\\n", substr($0, i, LEN)
<            } else {
<                printf "> %s\\\n", substr($0, i, LEN)
<            }
<            i += LEN;
<        }
<        printf "> %s\\\n", substr($0,i)
<    }
<} '

This is what I mean. First, converting tabs directly to 8 spaces has
*got* to be wrong. Secondly, this fails on files with lines longer
than awk's internal buffer for records (minor, and usually
acceptable).

The loose problem spec doesn't help much, of course. But that just
means the problem is a "real-life" problem, and not a classroom
exercise.

The C code to solve the problem has some differences (no tags on
folded lines, and the whitespace where the fold is doesn't get
printed). It's also a pure filter, but allows for user-specified fold
columns, instead of wiring it to 80.

The main loop of the C code is 26 lines, not counting comments. The
awk script is 19 lines. The C code would shrink to 22 lines by using
printfs instead of fputs/putchar, and formatting if/else the same way
the awk script is.

Since (as far as I'm concerned) sed and awk are for quickly building
programs that would be difficult in C, the small difference between
the two programs - which hopefully indicates a small difference in
construction time - shows that this is a problem for which awk isn't
really suited.

On the other hand, some simple test cases (the first n integers on a
single line, separated by a single space) show the C version can
handle n = 10000 in about the same sys and user times (as reported by
/bin/time on a Sun 3/50 running SunOS 3.3) as the sed/awk version for
n = 100. The sed/awk version drops core for n >= 1000, and the C
version takes less than 1/10th of a second of sys and user time for
n <= 1000, so I didn't do direct comparisons.

The shell script to emulate the awk/sed script user interfaces, and
the more complex script to combine the two, is left as an exercise for
the reader.

    <mike

/*
 * wfold - fold stdin on column n, n being the first (and only) argument.
 *      If unspecified, n is 80. A throwaway for demo purposes on the
 *      net.
 */

#include <stdio.h>
#include <stdlib.h>     /* atoi, exit */
#include <string.h>     /* strlen, memmove */

/*
 * MAXFOLD is the largest fold column we're willing to accept. All others
 * rejected.
 */
#define MAXFOLD 160

int
main(int argc, char **argv) {
    register int    foldc = 80 ;
    char            buffer[MAXFOLD + 2] ;
    register char   *fold_point, *leftovers ;

    /* Argument processing */
    if (argc > 2) {
        fprintf(stderr, "usage: %s [n]\n", argv[0]) ;
        exit(1) ;
        }
    if (argc == 2) foldc = atoi(argv[1]) ;
    if (foldc <= 0 || foldc > MAXFOLD) {
        fprintf(stderr, "%s: only fold columns between 1 and %d supported\n",
            argv[0], MAXFOLD) ;
        exit(1) ;
        }

    /*
     * The plan is to treat each line + leftovers from last read as
     * a new line. leftovers indicates where the leftovers end.
     * Initially set to the beginning of the buffer, it's set up
     * correctly each time through the loop.
     *
     * We need to get one more character than the maximum fold, as
     * the first character past the fold column might be whitespace,
     * and that's a legit fold point. Since fgets reads at most n-1
     * characters (n is the second argument), we need to ask for foldc+2
     * characters, minus however much leftovers there are from last loop.
     */
    leftovers = buffer ;
    while (fgets(leftovers, foldc+2-(leftovers-buffer), stdin) != NULL) {
        /*
         * If we got a complete line, print it. A short read with no
         * newline means fgets hit end of file, so print that as is too
         * rather than scanning past the end of the string.
         */
        if (buffer[strlen(buffer) - 1] == '\n'
        || strlen(buffer) <= (size_t) foldc) {
            fputs(buffer, stdout) ;
            leftovers = buffer ;
            }
        /*
         * Got a long line. Find the fold point, print up to the fold,
         * then shuffle the remaining characters forward and try again.
         */
        else {
            fold_point = buffer + foldc ;
            while (*fold_point != ' ' && *fold_point != '\t'
            && fold_point > buffer)
                fold_point -= 1 ;
            /* Test for lines with no whitespace */
            if (fold_point == buffer) {
                fputs(buffer, stdout) ;
                putchar('\n') ;
                leftovers = buffer ;
                }
            else {
                /* Dump up to fold point */
                *fold_point = '\0' ;
                fputs(buffer, stdout) ;
                putchar('\n') ;
                /* Now, deal with the leftovers */
                fold_point += 1 ;
                /* source and destination overlap, so memmove, not strcpy */
                memmove(buffer, fold_point, strlen(fold_point) + 1) ;
                leftovers = &buffer[strlen(buffer)] ;
                }
            }
        }
    exit(0) ;
    }
--
I'm gonna lasso you with my rubberband lazer,           Mike Meyer
Pull you closer to me, and look right to the moon.      mwm@berkeley.edu
Ride side by side when worlds collide,                  ucbvax!mwm
And slip into the Martian tide.                         mwm@ucbjade.BITNET
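
On the objection in this post that turning every tab into exactly 8
spaces is wrong: a tab moves the cursor to the next tab stop, so it
stands for anywhere from 1 to 8 columns depending on where it falls in
the line. A minimal C sketch of that expansion follows. The name
expand_tabs, its parameters, and the -1 error return are assumptions
made for illustration; none of them come from the posted programs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expand tabs in src into dst (capacity dstsize bytes, including the
 * terminating NUL), padding each tab to the next multiple of tabstop
 * instead of emitting a fixed run of spaces.
 * Returns the expanded length, or -1 if dst is too small.
 * (expand_tabs is a hypothetical helper, not part of wfold.)
 */
long expand_tabs(const char *src, char *dst, size_t dstsize, size_t tabstop)
{
    size_t col = 0;     /* current output column, also index into dst */

    assert(tabstop > 0);
    assert(dstsize > 0);
    for (; *src != '\0'; src++) {
        if (*src == '\t') {
            /* distance to the next tab stop: between 1 and tabstop */
            size_t pad = tabstop - (col % tabstop);
            if (col + pad >= dstsize)
                return -1;
            while (pad-- > 0)
                dst[col++] = ' ';
        } else {
            if (col + 1 >= dstsize)
                return -1;
            dst[col++] = *src;
        }
    }
    dst[col] = '\0';
    return (long) col;
}
```

With tabstop 8, the input "ab", tab, "c" expands to "ab", six spaces,
and "c", for a length of 9; a fixed 8-space substitution would give 11.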
lied@ihuxy.ATT.COM (Bob Lied) (07/06/87)
In article <4780@columbia.UUCP>, agw@broadway.columbia.edu (Art Werschulz) writes:
>
> Here's the catch: I want to break the long lines only at whitespace.

All us well-trained C programmers immediately look for a way
to find that last space before column 80.  That was my first impulse.
Here's an awk script which uses split() in a fairly clever way to
try to find the last word before column 80.  Besides not handling
lines over 160 characters, it also has a fatal flaw if that last
word appears earlier in the string.  Other than that, it might have
at least tutorial value:

newform -i file |   # Convert those tabs to equivalent spaces!
awk 'length > 80 {
        str = substr($0, 1, 80)
        n = split(str, words)
        divide = index(str, words[n])
        left = substr($0, 1, divide-1)
        right = substr($0, divide)
        print left "\n" right
        }
length <= 80 { print }'

Then I had a brainstorm!  What well-known text processing program
already knows how to break text at white space?  Why, of course:
nroff (or its much faster local cousin, sroff).  Use awk to insert
formatter commands, then run the bugger through nroff!  (Use grep -v
to eliminate the extra blank lines.)

newform -i file |
awk 'BEGIN {print ".nf" ; print ".na"; print ".nh"; print ".ll 80"}
length>80  {print ".fi" ; print ; print ".nf" }
length<=80 {print}' | nroff | grep -v '^$'

See!  All the caffeine and Nutrasweet eventually pays off.

        Bob Lied        ihnp4!ihuxy!lied
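
The "fatal flaw" mentioned in this post comes from index(), which
returns the first occurrence of the last word rather than the last one.
For "to be or not to be" cut to its first 15 characters, the last word
is "to", and index() finds it at the very start of the line. Scanning
backward by position sidesteps this. The C function below is a sketch
of that idea; the name last_break and its 0-based limit argument are
assumptions for illustration, not something taken from the thread.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the 0-based index of the last blank or tab at or before
 * position `limit` in `line`, or -1 if there is none.
 * (last_break is a hypothetical helper name.)
 */
int last_break(const char *line, int limit)
{
    int len = (int) strlen(line);
    int i;

    assert(limit >= 0);
    /* never start the scan past the last character of the line */
    i = (limit < len) ? limit : len - 1;
    for (; i >= 0; i--)
        if (line[i] == ' ' || line[i] == '\t')
            return i;
    return -1;
}
```

For the example line, last_break("to be or not to be", 14) is 12, the
blank just before the second "to", so the line would split there
rather than at the start.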
PAAAAAR%CALSTATE.BITNET@wiscvm.wisc.EDU (07/08/87)
Art Werschulz <agw@broadway.columbia.EDU> wrote asking for a way to
'word wrap' text.

About 2 months ago I needed to do something similar when preparing
handouts etc for a class.  I designed, coded, documented, and ported a
program 'br.c' as an example for the class.

It is more useful than I expected.  In BSD Mail I use '~br -70' to
quickly format messages.  In vi you can ignore layout when editing and
then reformat a paragraph with '!{ br'.  The speed is magical.  Another
use is for multicolumn printing of a list:
    '!20! sortcat -nbr 20pr -3 -w70 -o10'

I am now working on joining up lines that have been word-wrapped.
Given 'jn' (join) and 'br' I will have a kind of poor person's nroff by
    'jn oldbr newpr whatever'
Has anyone got a speedy program to unwrap word-wrapped text?

I won't post 'br.c', 'br.1', etc as they are longer than previous
examples.  If anyone wants source, manual pages, and/or full design
documentation - contact me directly and I'll send the stuff back by
email (inshallah).

Here is a list of features/bugs:
1. 'br width' reads standard input and produces standard output (only).
2. No output line is longer than w characters, where w is the width.
3. Whole groups of whitespace characters are replaced by a newline
   (LF in ASCII).
4. It interprets tabs as 6 characters wide.
5. 'br' treats nonprinting characters as having zero width.  Backspace
   causes problems.
6. Words that are longer than 'w' are hyphenated before the break.
7. 'br -width' and 'br width' do the same thing.
8. The default width is 76.

Dick Botting, CSU San Ber'do
5500, State University Pkwy, San Bernardino, CA 92407
714-887-7368 (voice), 714-887-7365 (modem - login as guest)
paaaar@calstate.bitnet
paaaaar@ccs.csussc.edu
paaaaar%calstate.bitnet@wiscvm.wisc.edu
Disclaimer - I am only an egg.
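
Items 4 and 5 in the feature list describe a width rule: tabs count as
a fixed number of columns and nonprinting characters count as zero.
Since br.c itself is not posted, the C function below is only a guess
at what such a rule could look like. The name display_width, the
tabwidth parameter, and in particular the choice to let a backspace
take one column back are all assumptions, the last one being one
possible answer to the backspace problem the list mentions.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Estimate how many columns `s` occupies: a tab counts as `tabwidth`
 * columns, printable characters count as 1, and other control
 * characters count as 0. A backspace takes back one column (never
 * below zero), which is only an approximation after a tab.
 * (display_width is a hypothetical helper, not taken from br.c.)
 */
int display_width(const char *s, int tabwidth)
{
    int width = 0;

    assert(tabwidth > 0);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char) *s;

        if (c == '\t')
            width += tabwidth;
        else if (c == '\b') {
            if (width > 0)
                width--;
        } else if (isprint(c))
            width++;
        /* any other nonprinting character adds nothing */
    }
    return width;
}
```

A wrapping loop would compare this width, not the byte count, against
the target line length when deciding whether the next word still fits.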