[comp.unix.questions] Stripping "hard returns" from UNIX mail files

patrick@casbs.Stanford.EDU (Patrick Goebel) (10/24/90)

Dear UNIX gurus--

Given that I have saved a mail file, I'd like to download it and
reformat it in my favorite PC wordprocessor.  Problem is, saved mail
files end up with a "hard return" at the end of every line and this
prevents most PC wordprocessors from doing their reformatting thing.

It is relatively easy to write a macro within the PC wordprocessor to
"clean up" these unwanted hard returns, but I was wondering if there is
a UNIX utility that would do the job before the file is downloaded.

Thanks!

--patrick

peter@ontmoh.UUCP (Peter Renzland) (10/29/90)

patrick@casbs.Stanford.EDU (Patrick Goebel) asks for a UNIX utility
to remove "hard returns" from mail messages for subsequent processing
by MS-DOS wordprocessors.

Unix considers it natural for text to be made up of lines, and all
programs that do useful things with text assume that such lines are
within some reasonable limit.  This corresponds to things that naturally
contain lines (text in books or on your display, or on typewriter, or
a line printer), and those things, naturally, have limits on the line
length.

The RETURN key, and its code, is an implementation of the typewriter's
"carriage" return.

Text which is thus made up of lines can easily be formatted in all sorts
of ways.  But, if we format it so that we have (limitless) multiline
paragraphs and no longer any line separators, some of our programs that
are so handy with lines of text may break in the face of possibly huge
paragraphs.

Having said that, you could try something like this little program:

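awk '
# Blank line: flush the paragraph collected so far, then keep the blank line.
NF==0	{ if(LINE != "") { print LINE ; LINE="" } ; print ; next}
# Any other line: append it to the paragraph, separated by one space.
	{ if(LINE != "") LINE=LINE " " $0 ; else LINE=$0 }
# End of input: flush the last paragraph, if there is one.
END	{ if(LINE != "") print LINE }
' "$@"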

I would prefer to use the PC wordprocessor's text import facilities to
take standard line-oriented text and convert it to its own paragraph
format.

-- 
Peter Renzland @ Ontario Ministry of Health  416/964-9141  peter@ontmoh.UUCP

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/30/90)

In article <657182670.22483@ontmoh.UUCP> peter@ontmoh.UUCP (Peter Renzland) writes:
: patrick@casbs.Stanford.EDU (Patrick Goebel) asks for a UNIX utility
: to remove "hard returns" from mail messages for subsequent processing
: by MS-DOS wordprocessors.
: 
: Unix considers it natural for text to be made up of lines, and all
: programs that do useful things with text assume that such lines are
: within some reasonable limit.

Painting with a broad brush here, aren't you?  Both GNU Emacs and Perl
agree that the only "reasonable limit" on line length is the amount of
swap space available on your machine.

: This corresponds to things that naturally
: contain lines (text in books or on your display, or on typewriter, or
: a line printer), and those things, naturally, have limits on the line
: length.
:
: The RETURN key, and its code, is an implementation of the typewriter's
: "carriage" return.

Fair enough.  But someday we have to escape the typewriter/punchcard metaphor.
Word processors are just beginning to get us out of this straitjacket.

: Text which is thus made up of lines can easily be formatted in all sorts
: of ways.  But, if we format it so that we have (limitless) multiline
: paragraphs and no longer any line separators, some of our programs that
: are so handy with lines of text may break in the face of possibly huge
: paragraphs.

So rewrite the programs so they aren't busted.

: Having said that, you could try something like this little program:
: 
: awk '
: NF==0	{ if(LINE) { print LINE ; LINE="" } ; print ; next}
: 	{ if(LINE) LINE=LINE " " $0 ; else LINE=LINE $0 }
: END	{ print LINE }
: ' $*

I think gawk will now handle "infinite" lines, but older awks will blow
up on longer paragraphs.  It would be a tad nicer if it threw in an
extra space after lines that end a sentence.
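
A minimal sketch of that extra-space idea, assuming a POSIX awk whose
line-length limit is generous enough for your paragraphs.  The function
name unwrap and the variables para and sep are made up for illustration,
and "ends in . ! or ?" is only a rough guess at what ends a sentence; it
will miss lines that end in a closing quote or in trailing blanks.

```shell
# unwrap: join the lines of each paragraph into one long line.
# A line ending in . ! or ? is followed by two spaces, any other by one.
unwrap() {
    awk '
    NF == 0 {                           # blank line: paragraph break
        if (para != "") print para      # flush the paragraph so far
        para = ""; sep = " "
        print ""                        # keep the break itself
        next
    }
    {
        if (para == "") para = $0       # first line of a paragraph
        else            para = para sep $0
        # choose the separator that will follow this line
        if ($0 ~ /[.!?]$/) sep = "  "
        else               sep = " "
    }
    END { if (para != "") print para }  # flush the last paragraph
    ' "$@"
}
```

It works as a filter or on named files, for example
"unwrap saved-mail >for-pc" (both file names hypothetical).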

: I would prefer to use the PC wordprocessor's text import facilities to
: take standard line-oriented text and convert it to its own paragraph
: format.

Some of us don't have such a clever importer.  Yeah, I know, rewrite the
programs...

Sigh.

Larry

tronix@polari.UUCP (David Daniel) (10/31/90)

Just a reminder:

Unix doesn't use a 'CR' (ASCII 13) to terminate a line; it uses an 'LF'
(ASCII 10) instead.
Check the docs on your word processor to see whether it expects a CR
rather than an LF.

If you need to do the conversion, the easiest way is to use the 'tr' program
resident on most Unix machines:

tr '\015' '\012' <CR-file >LF-file
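
Note that the command above maps CR to LF, which brings a file toward
the Unix convention.  MS-DOS text files end each line with CR LF, so a
file headed for the PC needs a CR added in front of every LF instead.
A rough sketch of both directions, assuming a POSIX shell with awk and
tr; the function names are invented here for illustration:

```shell
# unix2crlf: end every line with CR LF (Unix text -> MS-DOS text).
unix2crlf() {
    awk '{ printf "%s\r\n", $0 }' "$@"
}

# crlf2unix: delete every CR (MS-DOS text -> Unix text).
# Reads standard input only, since tr takes no file arguments.
crlf2unix() {
    tr -d '\015'
}
```

For example, "unix2crlf saved-mail >mail.txt" before downloading (file
names hypothetical).  Many transfer programs do this conversion
themselves in text mode, so check that first to avoid doubled CRs.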

-- 
David Daniel (The man with no disclaimer)  tronix@polari.UUCP
"Beware the Truth. If you find a Truth it can demand that you make painful
changes."  - Frank Herbert

buck@sct60a.sunyct.edu (Jesse R. Buckley Jr.) (11/20/90)

A hard return is an actual <CR>, while a soft return is put in by a
word processor and can be reformatted.


-- 
-Buck                    ! User n.: A programmer who will believe 
(buck@sct60a.sunyct.edu) !          anything you tell him.