[comp.sys.mac] How to convert text

chuq@Apple.COM (Chuq Von Rospach) (12/22/89)

dgp0@bunny.gte.com (Dennis Pratt) writes:

>Mr. James Scott writes that the solution of removing paragraph marks at 
>the end of a line is to use the "change" command to replace ^p 
>(paragraph mark) with a space.

>That's how I do it, but what a PAIN!!  It takes me hours to convert a 
>document taken off a line oriented system to a WORD document.  I want 
>headings to be marked as headings.  I want indents to be marked as tabs, 
>not spaces.  I want paragraph breaks at paragraphs, not at line breaks.  I 
>want extraneous spaces removed.

That's funny. I do it all the time. It doesn't take me hours. Maybe you
aren't using the right tools? Or working efficiently?

>So, how about it MicroSoft??  Why not improve WORD in order to have it 
>efficiently alter the clunky stuff we pull out of these network services 
>into beautiful WORD documents?

Since there seems to be some need for it, a quick "How to convert text from
Unix to Mac (and back again)" tutorial:

Those tools already exist. There's no need to hack Word to do it. 

Three conversion tools I know of:

o McSink (shareware)/Vantage (commercial): text hacking DA. Convenient,
  powerful, flexible, macro-programmable.

o Macify: Application that does paragraph formatting, quote curling, etc.

o Add/Strip: ditto. Also has the power to uncurl, unformat and make stuff
  ready to go from a Mac to a Unix box, which the other two don't have.

I use Add/Strip a fair amount. Same with McSink. Macify's not as powerful as
Add/Strip, so I don't keep it around any more. (All of the above are
available on CompuServe, probably GEnie, and I don't know where else.)

There's also the manual method. For paragraphs it's quick. For curling
quotes it's horrid (if you're going to curl quotes, you definitely want
Add/Strip).
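
To show what curling and uncurling quotes involves, here is a rough Python
sketch. It is not how Add/Strip or Macify work internally; the function
names (curl, uncurl) are made up for illustration, and the rule used for
curling (a quote at the start of the text or after whitespace or an opening
bracket is an opening quote, anything else closes) is a simplifying
assumption that will get cases like a leading apostrophe wrong.

```python
import re

# Map each curly quote back to its plain typewriter equivalent.
UNCURL = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})


def uncurl(text):
    """Replace curly quotes with straight ones (the Mac-to-Unix direction)."""
    return text.translate(UNCURL)


def curl(text):
    """Naively curl straight quotes (the Unix-to-Mac direction)."""
    # A quote at the very start, or after whitespace or an opening bracket,
    # is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    # Every double quote still left is treated as a closing quote.
    text = text.replace('"', "\u201d")
    # Same two passes for single quotes; leftovers double as apostrophes.
    text = re.sub(r"(^|[\s(\[{])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

Uncurling is a plain character-for-character swap, which is why it is the
easy direction. Curling needs context for every quote, which is what makes
doing it by hand so tedious.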

If your text is in reasonable form (paragraphs delimited by a blank line, by
a <cr><tab> or <cr><spaces>), any of these tools will convert them over
pretty cleanly. I find there's very little manual hacking needed.

If you're in a hurry or you don't have these tools handy, here's how to
quickly convert paragraphs in Word:

o Load the unconverted file into Word (3 or 4)
o Select the text to change (or the entire document)
o First we take all the real paragraph marks and convert them to a holding
  marker. To do this, select "Change". The "Find What" string is whatever it
  takes to match the paragraph end: "^p^p" for blank lines, "^p^t" for a
  tab, or "^p     " for five spaces (or whatever). The "Change To" is set to
  some strange string that doesn't occur in the text. I use "&foo&". Click
  on "Change All".
o All of the real paragraph endings are now flagged. Now zap all the line
  endings. "Find What" is "^p". "Change To" is " " (space). Click on "Change
  All".
o You now have one massive paragraph. Unflag the paragraph endings. "Find
  What" is "&foo&", "Change To" is "^p". Click on "Change All".
o Clean up: "Find What" is "  ", "Change To" is " " (two spaces to one
  space). Change all until you get no changes.

Your text is now in word-processing paragraph format. Takes very little
time, works fine.
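
For anyone who would rather script it than click through the dialog, the
same four steps can be sketched in Python. This is only an illustration:
the function name rewrap_paragraphs and its parameters are made up here,
and it assumes line endings have already been normalized to "\n", which
stands in for Word's ^p.

```python
def rewrap_paragraphs(text, para_break="\n\n", marker="&foo&"):
    """Join hard-wrapped lines into paragraphs using a holding marker."""
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another")
    # Step 1: flag the real paragraph endings with the holding marker.
    text = text.replace(para_break, marker)
    # Step 2: zap every remaining line ending by turning it into a space.
    text = text.replace("\n", " ")
    # Step 3: unflag the paragraph endings.
    text = text.replace(marker, "\n")
    # Step 4: two spaces to one space, repeated until nothing changes.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


sample = ("First line of one\nparagraph, wrapped.\n\n"
          "Second  paragraph,\nalso wrapped.\n")
print(rewrap_paragraphs(sample))
```

For tab-indented text, pass para_break="\n\t"; for five-space indents,
para_break="\n     ". As with the Word steps, the final line ending
becomes a trailing space, so trim it if that matters.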

The *problem* with asking Microsoft to automate this is that how people
start paragraphs isn't really standard. With OtherRealms, I get lots of
e-mail submissions, and everyone seems to have a different format. I use
blank lines between paragraphs. Blank lines plus tabs are popular. Tabs with
no blank lines are, too. So is indenting with five spaces, and indenting
with three spaces, and I've seen indenting with 2, 4 and eight spaces as
well. There are already good utilities to do these kinds of conversions, so
why try to wedge a special purpose function into an already complex program?
It's one of those things that really ought to be a separate program. Maybe
Microsoft can buy the rights to Add/Strip (I really like that program), or
bundle Vantage with the program, but if you need this kind of capability,
you ought to be using a program designed to do it. 

-- 

Chuq Von Rospach   <+>   chuq@apple.com   <+>   [This is myself speaking]

For herein may be seen noble chivalry, courtesy, humanity, friendliness,
cowardice, murder, hate, virtue and sin. Do after the good and leave the
evil, and it shall bring you to good fame and renown. -- Malory

urlichs@smurf.ira.uka.de (12/22/89)

In comp.sys.mac chuq@Apple.COM (Chuq Von Rospach) writes:
< dgp0@bunny.gte.com (Dennis Pratt) writes:
< 
< >Mr. James Scott writes that the solution of removing paragraph marks at 
< >the end of a line is to use the "change" command to replace ^p 
< >(paragraph mark) with a space.
< 
< >That's how I do it, but what a PAIN!!  It takes me hours to convert a 
< >document taken off a line oriented system to a WORD document.  I want 
< >headings to be marked as headings.  I want indents to be marked as tabs, 
< >not spaces.  I want paragraph breaks at paragraphs, not at line breaks.  I 
< >want extraneous spaces removed.
< 
< That's funny. I do it all the time. It doesn't take me hours. Maybe you
< aren't using the right tools? Or working efficiently?
< 
Chuq, you are using Word 4.  Dennis is using Word 3.
Replacing N instances of X with Y, with N >>100, is impossible in Word 3
without plenty of time, frequent saves, and (depending on how much memory you
have) a bomb shelter.

< (Method everybody seems to use to convert text to real text deleted)
This would be a whole lot easier if only Word had regular expressions.
I'll probably buy Nisus as soon as I can find _any_ dealer around here who
actually has it on display. Fat chance. :-(

< For herein may be seen noble chivalry, courtesy, humanity, friendliness,
< cowardice, murder, hate, virtue and sin. Do after the good and leave the
< evil, and it shall bring you to good fame and renown. -- Malory

Put that into one of the news.announce.newusers postings. It seems to be
sorely needed. (Not that anyone will pay attention to it, of course. :-( )
-- 
Matthias Urlichs

chuq@Apple.COM (Chuq Von Rospach) (12/23/89)

urlichs@smurf.ira.uka.de writes:

>Chuq, you are using Word 4.  Dennis is using Word 3.

There's no reason why it shouldn't work in Word 3. Most of what I do now I
did in Word 3 before the upgrade. 

>This would be a whole lot easier if only Word had regular expressions.

True. Better wildcards would help. If I'm doing something fairly complex, I
sometimes pop the text into Lightspeed C so I can use grep mode.
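
To give an idea of what grep-style matching buys you, here is a rough
sketch using Python's re module rather than Lightspeed C's grep mode. The
function name rewrap_regex is invented for the example, and it assumes "\n"
line endings and that a paragraph starts at a blank line or at any line
indented with tabs or spaces. A wrapped line that merely happens to begin
with a space will be mistaken for a new paragraph.

```python
import re


def rewrap_regex(text):
    """Rewrap text whose paragraphs start with a blank line or an indent."""
    # Split at one or more blank lines, or at a line ending that is
    # followed by indentation (tabs or spaces).
    pieces = re.split(r"\n(?:[ \t]*\n)+[ \t]*|\n[ \t]+", text.strip())
    # Inside each paragraph, squeeze every run of whitespace (including
    # the old line endings) down to a single space.
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in pieces]
    return "\n".join(p for p in paragraphs if p)


mixed = "One\ntwo.\n\nThree\nfour.\n\tFive\nsix.\n     Seven.\n"
print(rewrap_regex(mixed))
```

One pattern covers the blank-line, tab, and N-space conventions at once,
which is exactly the part that takes a separate Change pass per convention
in Word.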


-- 

Chuq Von Rospach   <+>   chuq@apple.com   <+>   [This is myself speaking]

An argument requires two voices. Without the opposition, it's just a
whine.  To argue, you have to listen to and rebut the opposition. Most
USENET arguments aren't. They're simply two monologues happening at once.

bill@ut-emx.UUCP (Bill Jefferys) (12/23/89)

In article <1330@smurf.ira.uka.de> urlichs@smurf.ira.uka.de (Matthias Urlichs) writes:
#In comp.sys.mac chuq@Apple.COM (Chuq Von Rospach) writes:
#< dgp0@bunny.gte.com (Dennis Pratt) writes:
#< 
#< >Mr. James Scott writes that the solution of removing paragraph marks at 
#< >the end of a line is to use the "change" command to replace ^p 
#< >(paragraph mark) with a space.
#< 
#< That's funny. I do it all the time. It doesn't take me hours. Maybe you
#< aren't using the right tools? Or working efficiently?
#< 
#Chuq, you are using Word 4.  Dennis is using Word 3.
#Replacing N instances of X with Y, with N >>100, is impossible in Word 3
#without plenty of time, frequent saves, and (depending on how much memory you
#have) a bomb shelter.

I used to do this on Word 3 with long documents, but there's
a trick. Suppose ^p^p is the end of a paragraph, and ^p the
end of a line. Then I would first replace ^p with ^p\. 
Then ^p\^p\ --> ^p; finally ^p\ --> <space>. This gets
around the problem with Word 3 that when you get rid of
all the paragraph marks to get one long paragraph (as Chuq
suggested) it takes FOREVER.
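
The three replacements can be sketched in Python to check the logic. This
is only an illustration: "\n" stands in for ^p, the function name
rewrap_with_tagged_marks is made up, and the check for a backslash already
in the text is an added assumption, not part of the trick as described.

```python
def rewrap_with_tagged_marks(text, tag="\\"):
    """Rewrap text without ever collapsing it into one giant paragraph."""
    if tag in text:
        raise ValueError("tag already occurs in the text; pick another")
    mark = "\n" + tag
    # Step 1: tag every paragraph mark:  ^p  ->  ^p\
    text = text.replace("\n", mark)
    # Step 2: two tagged marks in a row were a blank line, so they become
    # one real paragraph mark:  ^p\^p\  ->  ^p
    text = text.replace(mark + mark, "\n")
    # Step 3: any tagged mark still left was a plain line end:  ^p\  ->  space
    text = text.replace(mark, " ")
    return text


print(rewrap_with_tagged_marks("a\nb\n\nc\nd\n"))
```

Because real paragraph marks are restored in step 2, before the line ends
are removed in step 3, no intermediate stage holds the whole document as a
single paragraph.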

But I wish that Word had special tools for this.

Bill Jefferys

gillies@p.cs.uiuc.edu (12/24/89)

dgp0@bunny.gte.com (Dennis Pratt) writes:

>Mr. James Scott writes that the solution of removing paragraph marks at 
>the end of a line is to use the "change" command to replace ^p 
>(paragraph mark) with a space.

>That's how I do it, but what a PAIN!!  ...Microsoft, how about adding
>this as a feature....

Big deal.  Use find/replace and MacroMaker or the supplied AutoMac to
add the feature yourself.  The tools are within your grasp, if you'd
just reach for them...

I want MS-Word to incorporate things that are currently impossible,
not oodles of shortcuts for the couch potato.