[net.micro] Unsofting WordStar

apratt@iuvax.UUCP (02/17/85)

The previous response about high bits being used for print control was
not quite right.  WordStar sets the high bit of the last character of a
"word".  This is for various esoteric paragraph-formatting reasons.
Also, a "soft" carriage return will be a ^J with the high bit set
(decimal 138). If you want to "unsoft" a WordStar file, you have to
zero all the high bits in it.
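
As a rough illustration of the "zero all the high bits" step, here is a
minimal sketch in Python. The function name clear_high_bits is made up
for this example; it simply masks every byte with 0x7F, which is one
straightforward way to do what is described above.

```python
def clear_high_bits(data: bytes) -> bytes:
    """Return a copy of data with bit 7 of every byte cleared."""
    # 0x7F is binary 0111 1111, so the AND keeps the low seven bits only.
    return bytes(b & 0x7F for b in data)


# "word" with the high bit set on its last letter, a space, then
# byte 138 (^J with the high bit set, as described in the post).
sample = bytes([ord("w"), ord("o"), ord("r"), ord("d") | 0x80, ord(" "), 138])
print(clear_high_bits(sample))  # b'word \n'
```

Masking is applied to the whole file, so it only makes sense on a
WordStar Document file, not on data where the eighth bit is meaningful.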

But there's more: Page breaks are flagged with a special character, I
think, and print controls are embedded just like they are when you
entered them:  ^S for underscore, ^B for bold, etc. You should remove
these characters completely, or translate them to something printable
(_^Hx is the UNIX convention for underscoring a letter, where x is that
letter).
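
One possible way to handle the two print controls named above, sketched
in Python. Several things here are assumptions, not facts from the post:
that ^S works as an on/off toggle around the underscored text, that
spaces inside an underscored run are left alone, and that the high bits
have already been zeroed. The name translate_print_controls is invented
for the example. ^B is simply dropped; ^S runs become _^Hx.

```python
CTRL_B = 0x02     # ^B, bold
CTRL_S = 0x13     # ^S, underscore
BACKSPACE = 0x08  # ^H


def translate_print_controls(data: bytes) -> bytes:
    """Drop ^B and turn ^S-delimited text into _^Hx overstrikes."""
    out = bytearray()
    underscoring = False  # assumption: ^S toggles underscoring on and off
    for b in data:
        if b == CTRL_S:
            underscoring = not underscoring
        elif b == CTRL_B:
            continue  # removed completely
        elif underscoring and 0x20 < b < 0x7F:
            # printable, non-space character inside an underscored run
            out += bytes([ord("_"), BACKSPACE, b])
        else:
            out.append(b)
    return bytes(out)


print(translate_print_controls(b"a \x13hi\x13 \x02b\x02"))
# b'a _\x08h_\x08i b'
```

Any other control characters pass through untouched in this sketch, so
a page-break flag or other print control would still need its own rule.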

You're not done yet! Soft hyphens which appear in the middle of a line
are stored as one character, and soft hyphens which are at the END of a
line (and therefore should be preserved as true hyphens) are
represented by another.  Last, if you want a truly printable, but
unformatted, document, you should remove all lines beginning with a
period.
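
A sketch of these last two steps in Python. The post does not give the
two soft-hyphen character codes, so the function takes them as
arguments, and the values used in the sample call are placeholders, not
WordStar's real codes. Dropping the mid-line soft hyphen entirely is my
reading of the paragraph above; the function name is invented.

```python
def strip_dot_lines_and_soft_hyphens(
    data: bytes, mid_line_hyphen: int, end_of_line_hyphen: int
) -> bytes:
    """Drop lines starting with '.', fix up the two kinds of soft hyphen."""
    kept = []
    # keepends=True preserves each line's own CR/LF terminator.
    for line in data.splitlines(keepends=True):
        if line.startswith(b"."):
            continue  # line beginning with a period: not printable text
        line = line.replace(bytes([mid_line_hyphen]), b"")      # unused hyphen
        line = line.replace(bytes([end_of_line_hyphen]), b"-")  # real hyphen
        kept.append(line)
    return b"".join(kept)


# 0x01 and 0x03 are PLACEHOLDER codes chosen only for this demonstration.
text = b".pa\r\nre\x01port un\x03\r\nsoft\r\n"
print(strip_dot_lines_and_soft_hyphens(text, 0x01, 0x03))
# b'report un-\r\nsoft\r\n'
```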

Note that all this assumes you are dealing with WordStar Document files,
not the output you get from Printing a Document file to disk.  If you
do that, you will not have to contend with soft hyphens or page breaks
(page breaks are ^L if you use Form Feeds, or just blank lines if not).
The Print Controls will change, too, depending on what printer your WS
is set up for.

Note also that you MUST do some of this before you can transmit a WS file
to a host computer: specifically, the high bits may not transfer (or they
may cause parity errors), and the ^S for underscore is FATAL to most hosts,
which treat it as XOFF and stop sending output until they see a ^Q.
I wrote a program in 8088 Assembly to do all this un-softing, and I have one
for CP/M, but I don't care to regenerate the code...
----
						-- Allan Pratt
					...ihnp4!inuxc!iuvax!apratt