[comp.text.tex] MS-Word <-->

matsuda@linc.cis.upenn.edu (Kenjiro Matsuda) (02/12/91)

Hi, sorry for cross-posting this lazy novice question.  Does anyone know of any
programs that can convert MS-Word formatted files to LaTeX format and vice
versa automatically, maybe on the Mac?  I was just informed that there is one
that does this sort of thing between MacWrite <--> troff, so I gather there
must be a similar kind of program somewhere in cyberspace.

Thanks,

Ken Matsuda
matsuda@linc.cis.upenn.edu

stan@dnlunx.pttrnl.nl (Stan van de Burgt) (02/12/91)

matsuda@linc.cis.upenn.edu (Kenjiro Matsuda) writes:

>Hi, sorry for cross-posting this lazy novice question.  Does anyone know any
>programs that can convert the MS-Wordly formatted files to LaTex format ones
>automatically and vice versa, maybe on Mac?   I was just informed that there is
>one that does this sort of stuff for between MacWrite <--> troff, so I gather
>there must be a similar kind of program somewhere in the cyberspace.  

I've seen this question before on the mac groups and on this group. Up to now
I've seen no answer. I'd like to know more about this question. Are you looking
for such a utility just for printing purposes, i.e. should the (La)TeX output
just resemble the Word output as closely as possible? Or should the LaTeX source
be as well-structured and readable as possible? The latter is not as
trivial as you might think! Also, how should things like pictures, tables,
formulas, etc. be processed?

			Stan.

-- 
   S.P. van de Burgt                       PTT Research, Neher Labs
                                           PO Box 421, Leidschendam
   E-mail: SP_vdBurgt@pttrnl.nl            the Netherlands

grodan@cyklop.nada.kth.se (Mats G L|fdahl) (02/13/91)

stan@dnlunx.pttrnl.nl (Stan van de Burgt) writes:

 matsuda@linc.cis.upenn.edu (Kenjiro Matsuda) writes:

 >Hi, sorry for cross-posting this lazy novice question.  Does anyone know any
 >programs that can convert the MS-Wordly formatted files to LaTex format ones
 >automatically and vice versa, maybe on Mac? I was just informed that there is
 >one that does this sort of stuff for between MacWrite <--> troff, so I gather
 >there must be a similar kind of program somewhere in the cyberspace.  

 I've seen this question before on the mac groups and on this group. Up to now
 I've seen no answer. I'd like to know more about this question. Are you 
 looking for such a utility just for printing purposes, i.e. should the output 
 of La(TeX) just resemble the word output as good as possible? Or should the 
 LaTeX source should be as well-structured and readable as possible?
 The latter is not as trivial as you might think! Also, how should
 things like pictures, tables, formulas, etc be processed? 


I've been looking for this kind of program, too. I would like it to
try to be as smart as possible about logical constructs. It doesn't
need to finish the job, just to make the translation process easier
for me.

I think the proper medium to start from is a file output from MS-Word
in Rich Text Format (RTF, called "interchange format" in the MS-Word menu).

The basic capabilities of the translation program should be something
like:

 1) Identifying paragraphs and, if possible, list items.
 2) Finding TeX/LaTeX control sequences for special characters.
 3) Identifying section headings, if possible with the proper
    section/subsection/subsubsection nesting.
 4) Identifying figures, producing figure environments with captions.
 5) Identifying tables, producing table environments with captions,
    and if possible some rudimentary table with the entries in the
    right positions, and in math mode if needed.
 6) Finding and replacing RTF logical constructs with the appropriate
    LaTeX logical constructs. 
 7) Finding font changes, especially to italics, that could be
    translated into {\em ...}.
 8) Doing its best with formulas. Apart from translating special
    characters, as mentioned above, it would be nice if it could
    distinguish between inline formulas and displayed ones. Fractions
    and roots might also be handled correctly.
 9) Deleting all other MS-Word control sequences.
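
A minimal sketch of items 1, 2, 7 and 9 in Python, to make the idea
concrete. It assumes a very small RTF subset: \par ends a paragraph, \i and
\i0 (or \plain) switch italics, braces scope the formatting, and every other
control word is dropped. The names rtf_to_latex, TOKEN and LATEX_SPECIALS
are made up for this illustration. Headings, tables, formulas, hex escapes
and destinations such as font tables are not handled, so this is a starting
point under those assumptions rather than a working translator.

```python
import re

# Item 2: LaTeX replacements for characters that are special in TeX.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# A token is a control word with optional numeric parameter (\par, \i0),
# a control symbol (\{ or \\), a brace, or a run of plain text.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.DOTALL)


def escape_latex(text):
    """Replace TeX special characters with their LaTeX control sequences."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def rtf_to_latex(rtf):
    """Translate a tiny RTF subset to LaTeX (hypothetical helper)."""
    out, italic, stack = [], False, []

    def set_italic(on):
        nonlocal italic
        if on and not italic:
            out.append(r"{\em ")       # item 7: italics become {\em ...}
        elif italic and not on:
            out.append("}")
        italic = on

    for match in TOKEN.finditer(rtf):
        word, param, symbol, brace, text = match.groups()
        if word == "par":              # item 1: paragraph break
            was_italic = italic
            set_italic(False)          # never let {\em ...} span a blank line
            out.append("\n\n")
            set_italic(was_italic)
        elif word == "i":
            set_italic(param != "0")   # \i turns italics on, \i0 turns them off
        elif word == "plain":
            set_italic(False)
        elif word is not None:
            pass                       # item 9: drop all other control words
        elif symbol is not None:
            if symbol in "\\{}":       # escaped literal backslash or brace
                out.append(escape_latex(symbol))
            # other control symbols (e.g. hex escapes) are ignored here
        elif brace == "{":
            stack.append(italic)       # remember formatting outside the group
        elif brace == "}":
            set_italic(stack.pop() if stack else False)
        elif text:
            # line ends in the RTF source are not part of the text
            out.append(escape_latex(text.replace("\r", "").replace("\n", "")))

    set_italic(False)
    return "".join(out).strip()


sample = r"{\rtf1 Cost: 5% & up.\par An {\i important} point.\par}"
print(rtf_to_latex(sample))
```

The state stack is what lets a group like {\i ...} end its italics at the
closing brace, which is how RTF scopes formatting.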

Figures and complicated tables could be left out. They can be input
by hand in LaTeX, or as PostScript files via \special, or in any other
way the user chooses. With long tables, however, there would be no
harm done if the program did its best. At least one would not have to
type in all the entries in a long table a second time. If translating
into the correct table structure is too difficult for the program, just
lines with all the entries of a row, preceded by % characters, would be a
great help.
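
The fallback for tables could look roughly like the following Python
sketch. It assumes the table has already been pulled out of the RTF as a
list of rows, each a list of cell strings; that extraction step is not shown.
The function name table_as_comments and the " & " separator are my own
choices for illustration.

```python
def table_as_comments(rows, sep=" & "):
    """Emit each table row as one LaTeX comment line (hypothetical helper)."""
    lines = []
    for row in rows:
        # Collapse whitespace so a cell can never break the comment line.
        cells = [" ".join(str(cell).split()) for cell in row]
        lines.append("% " + sep.join(cells))
    return "\n".join(lines)


rows = [["Name", "Value"], ["alpha", "1.5"], ["beta", "two\nlines"]]
print(table_as_comments(rows))
```

Separating the cells with " & " means that removing the leading % and
adding \\ at the end of each line gives something close to the body of a
tabular environment, although special characters in the cells would still
need escaping by hand.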


If you or anyone else were to write such a program, I'd be very
interested in the result, and would be happy to assist with testing.


--
 -----------------------------------------------------------------------------
 Mats Lofdahl, Stockholm Observatory, S-133 36 Saltsjobaden | +46 - 8 16 44 75 
 -----------------------------------------------------------------------------
 Internet: lofdahl@astro.su.se | Bitnet: grodan@sekth | Sunet: royacs::lofdahl
 -----------------------------------------------------------------------------

extel@quagga.ru.ac.za (Dr. Eberhard Lisse) (02/13/91)

In <6787@dnlunx.pttrnl.nl> stan@dnlunx.pttrnl.nl (Stan van de Burgt) writes:

>matsuda@linc.cis.upenn.edu (Kenjiro Matsuda) writes:

>>Hi, sorry for cross-posting this lazy novice question.  Does anyone know any
>>programs that can convert the MS-Wordly formatted files to LaTex format ones
>>automatically and vice versa, maybe on Mac?   I was just informed that there is
>>one that does this sort of stuff for between MacWrite <--> troff, so I gather
>>there must be a similar kind of program somewhere in the cyberspace.  

>I've seen this question before on the mac groups and on this group. Up to now
>I've seen no answer. I'd like to know more about this question. Are you looking
>for such a utility just for printing purposes, i.e. should the output of La(TeX)
>just resemble the word output as good as possible? Or should the LaTeX source
>should be as well-structured and readable as possible? The latter is not as 
>trivial as you might think! Also, how should things like pictures, tables, 
>formulas, etc be processed?

>			Stan.

>-- 
>   S.P. van de Burgt                       PTT Research, Neher Labs
>                                           PO Box 421, Leidschendam
>   E-mail: SP_vdBurgt@pttrnl.nl            the Netherlands

I would just like to be able to LaTeX the text someone has entered
into Word, because the output is much nicer on the same printer.

I don't care how it looks in the .TEX file; I can read it no matter
what. I write them all the time, in uEmacs 3.10.

emTeX has its own \special commands, and perhaps a translator can leave
them out. And then there is gnuplot, which does my little pictures,
thank you very much, and outputs LaTeX, PiCTeX, emTeX and HPGL, which
one can include in Word.


Please, someone just write one. RTF to LaTeX (and back?)


regards, el
-- 
Dr. Eberhard W. Lisse, Katatura State Hospital
Private Bag 13260
Windhoek
Namibia

jaap@mtxinu.COM (Jaap Akkerhuis) (02/14/91)

In article <6787@dnlunx.pttrnl.nl> stan@dnlunx.pttrnl.nl (Stan van de Burgt) writes:
 > matsuda@linc.cis.upenn.edu (Kenjiro Matsuda) writes:
 > 
 > >Hi, sorry for cross-posting this lazy novice question.  Does anyone know any
 > >programs that can convert the MS-Wordly formatted files to LaTex format ones
 > >automatically and vice versa, maybe on Mac?
 > 
 > I've seen this question before on the mac groups and on this group. Up to now
 > I've seen no answer. I'd like to know more about this question. Are you looking
 > for such a utility just for printing purposes, i.e. should the output of La(TeX)
 > just resemble the word output as good as possible? Or should the LaTeX source
 > should be as well-structured and readable as possible? The latter is not as 
 > trivial as you might think! Also, how should things like pictures, tables, 
 > formulas, etc be processed?
 > 

There is going to be an obvious plug at the end of this message, but yes,
I've seen the same type of question before and also wondered whether the
various requesters actually realized what they were getting into. The whole
problem of translating documents from one format to another is not
trivial. And as Stan points out, the question of what needs to be
translated for which purpose, and how to deal with non-textual matter,
is important.

A discussion of the basic problems and a description of our experience
with interchanging processable documents can be found in our book:

	Multi-media Document Translation:
	ODA and the EXPRESS Project
	Jonathan Rosenberg, Mark Sherman, Ann Marks, ...
	Springer-Verlag, New York, ISBN 0-387-97397-4
	Springer-Verlag, Berlin, ISBN 3-540-97397-4

Sorry for the plug, but since this question seems to pop up on a
regular basis, I thought that people dealing with document interchange
might be interested in this book.

	jaap