[comp.text.tex] Glossed examples in TeX

malouf@acsu.buffalo.edu (Robert Malouf) (11/01/90)

Does anyone know of a macro for formatting glossed examples in TeX?  I want to
include examples like (1) or (2) in a paper.

(1) Mosimane o-   bed- its-w-   e    ke monna.
    boy      SUBJ-beat-PRV-PASS-MOOD by man
    `The/*a boy was beaten by a/the man.'

(2) Mosimane o-bed-its-w-e           ke monna.
    boy      SUBJ-beat-PRV-PASS-MOOD by man
    `The/*a boy was beaten by a/the man.'

Thanks in advance for any advice.

Rob Malouf
malouf@acsu.buffalo.edu

neubauer@bsu-ucs.uucp (Paul Neubauer) (11/01/90)

In article <43647@eerie.acsu.Buffalo.EDU>, malouf@acsu.buffalo.edu 
  (Robert Malouf) asks:
> Does anyone know of a macro for formatting glossed examples in TeX?  I want to
> include examples like (1) or (2) in a paper.

Well, now, this is a question that I have been worrying about since I started
to get interested in TeX (and LaTeX).  Unfortunately, that has not been real
long and I'm still not very good.  However, I have got something that appears
to more or less work.  I am posting it rather than mailing it to Robert in the
hope that I can get some useful feedback from more competent TeXnicians.

Here are some macros that I have cobbled together.  (I have put them into a
.sty file that I am still working on, but that I am by no means committed to
sticking with.  Any improvements will be gratefully accepted.)  (These are for
LaTeX, BTW, though I assume that something similar could be arranged for plain
TeX, using some other construct instead of "list".)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% linguistics macros for examples

\newcounter{exampleno}
\newcounter{subexampleno}
\def\theexampleno{\arabic{exampleno}.}
\def\thesubexampleno{\theexampleno \alph{subexampleno})}
%  Note that this last def produces examples labelled:
%       1.a)  fkdjslfkdsjl;asjldsjh
%       1.b)  kewjhtrlehjodsfsdfljkhsdf
% etc., which is NOT exactly optimal.  I haven't got anything better, so this is
% one of the places where I am hoping for help.  I want:
%       1.a)  lkjfdslafjd;lkjaslkfjd;s
%         b)  fdjlsjfdlfdjksl;jfl
% yet I also would like to have cross-references in the text to (1b) or (1.b)
% rather than to 1.b) or, worse, b)

% This next environment seems to work more or less reasonably for plain
% (English-language) examples.
\newenvironment{example}{
  \refstepcounter{exampleno}% so that a \label inside the example refers to it
  \typeout{Example number \theexampleno}
  \def\*{\strut\kern-\starhang*}
  \begin{list}{\theexampleno\hfill}{
    \leftmargin 3.5em
    \labelwidth 2.5em
  }\item
 }{\end{list}}

\newenvironment{abexample}{
  \stepcounter{exampleno}
  \typeout{Example number \theexampleno}
  \begin{list}{\thesubexampleno\hfill}{
    \usecounter{subexampleno} 
    \def\*{\strut\kern-\starhang*}
    \leftmargin 3.5em
    \labelwidth 2.5em
  }
 }{\catcode\lq*=12 \end{list}}

\newenvironment{glossedexample}{
  \refstepcounter{exampleno}% so that a \label inside the example refers to it
  \typeout{Example number \theexampleno}
  \def\*{\strut\kern-\starhang*}
  \begin{list}{\raisebox{\arraystretch \ht\strutbox}{\theexampleno}\hfill}{
    \leftmargin 3.5em
    \labelwidth 2.5em 
    \raggedbottom
    \@itempenalty 10000
    \interlinepenalty 10000
  }
 }{\end{list}}

\def\@gobblelbracket{\@ifnextchar [{\@gobble}{\ignorespaces}}
\def\@gobblerbracket{\@ifnextchar ]{\@gobble}{\ignorespaces}}
\def\stacktext#1{\begin{tabular}{l}#1 \end{tabular}}
\def\translation{\item[]}
\def\gloss{\item\@glossbegin}
\def\@glossbegin#1{
 \@gobblelbracket#1\@gloss
 }
\def\@gloss#1{
 \stacktext{#1}\rule[-3ex]{1ex}{0pt}\@glossend
 }
\def\@glossend{
 \@ifnextchar]{\@gobblerbracket}{\@gloss}
 }
\newdimen\starhang
\setbox0=\hbox{*}% measure an asterisk in the current font
\starhang=\wd0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Some more notes on the preceding macros:
The \typeout line in each macro is just to help me keep track of what LaTeX is
doing and has no other significance.  I may well delete these lines after I am
finished debugging.

\begin{example}
This is an ordinary linguistic example in English with no gloss or other fancy
formatting to worry about.
\end{example}

Note that each of the example macros contains the line:
    \def\*{\strut\kern-\starhang*}

The function of this local definition of \* is to allow the TEXT of all
examples on the page to line up, regardless of a preceding asterisk.
Unfortunately, I have been able to do this only with a \* macro, rather than
the a priori more desirable alternative of making * active and having it cause
a kern to the left.  (Note that this is only more desirable IF we assume that
there will be no asterisks in the text of the example.)
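
One way the active-asterisk idea might be sketched is shown below.  This is
an untested assumption rather than part of the posted style file, and
\activestar is a hypothetical name.  Calling \activestar at the start of an
example environment (in place of the \def\* line) would make every * in the
example body kern to the left, including asterisks in mid-sentence, and
starred forms such as \\* would no longer be recognized there.

    \begingroup
      \catcode`\*=\active
      % * is active while this definition is read, so \def* below
      % defines the active character; \string* typesets an ordinary *.
      \gdef\activestar{%
        \catcode`\*=\active
        \def*{\strut\kern-\starhang\string*}}
    \endgroup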

For example, consider a pair of examples like:
\begin{abexample}
\item\label{ex:thumb} I caught my thumb in the door. 
\item\label{ex:finger} \*I caught a finger in the door. 
\end{abexample}

We would not use \ref{ex:finger} if what got caught was a thumb; nor
\ref{ex:thumb} if what got caught was one of the four non-thumbs.  
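
A possible approach to the labelling wish stated in the macro comments, given
as a sketch under assumptions rather than a tested fix: keep the punctuation
out of the counter representations and let LaTeX's reference prefix
\p@subexampleno supply the example number.  \ref then yields text such as
"1b", which can be wrapped in parentheses by hand.

    % in the .sty file, where @ is already a letter
    \def\theexampleno{\arabic{exampleno}}
    \def\thesubexampleno{\alph{subexampleno}}
    \def\p@subexampleno{\theexampleno}  % prefix added by \ref only
    % the list labels then carry the punctuation themselves, e.g.
    %   \begin{list}{\theexampleno.\hfill}{...}                  in example
    %   \begin{list}{\theexampleno.\thesubexampleno)\hfill}{...} in abexample
    % and in the text:   ... as in (\ref{ex:finger}) ...

This still prints the example number on every item; suppressing it after the
first item would need additional logic.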

The glossedexample macro is presumably quite opaque (at least it is to me :-).
The intended usage is something like:

\begin{glossedexample}
\gloss[{Mosimane\\boy}{o-\\SUBJ-}{bed-\\beat-}{its-\\PRV}{w-\\PASS}{e\\MOOD}
	{ke\\by}{monna.\\man}]
\translation{`The/*a boy was beaten by a/the man.'}
\end{glossedexample}

Note that the single quotes around the translation could presumably be added to
the macro without much difficulty.  I have not done so yet because I have not
decided whether the style I am creating should set the translation in a
different typeface (e.g., italics) or what.

Note that the preceding puts a fair amount of space between the glossed
elements, so you may well prefer:

\begin{glossedexample}
\gloss[{Mosimane\\boy}{o-bed-its-w-e\\SUBJ-beat-PRV-PASS-MOOD}
	{ke\\by}{monna.\\man}]
\translation{`The/*a boy was beaten by a/the man.'}
\end{glossedexample}
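
Much of that space comes from the tabular environment inside \stacktext,
which pads each column with \tabcolsep on both sides.  A sketch of a tighter
variant, assuming nothing else relies on the padding, suppresses it with @{}:

    \def\stacktext#1{\begin{tabular}{@{}l@{}}#1\end{tabular}}

The 1ex-wide \rule in \@gloss still separates the stacks, and the line ends
after the opening braces of \@gloss and \@glossend each add an interword
space as well; ending those lines with % would remove those spaces.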

The other (another?) real bad hack in the glossedexample environment is the
\raisebox business for the example number.  LaTeX will otherwise put the
example number *between* the lines of the stacked text, and it will look sort
of like:
 
     Mosimane o-bed-its-w-e           ke monna.
 (2)
     boy      SUBJ-beat-PRV-PASS-MOOD by man
     `The/*a boy was beaten by a/the man.'

Naturally, the actual TeX output is not quite this ugly, but after all, this
was done on a terminal, with fixed line spacing.
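
A possible alternative to the \raisebox hack, offered as an untested sketch:
tabular accepts an optional position argument, and [t] puts the reference
point of each stack on the baseline of its first row, so the example number
would line up with the top line without being raised.

    \def\stacktext#1{\begin{tabular}[t]{l}#1\end{tabular}}
    % glossedexample could then use the plain label:
    %   \begin{list}{\theexampleno\hfill}{...}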

Happy TeXing.  I hope to get back to improving this sometime soon, so I really
would appreciate any suggestions.


====
    Paul Neubauer    00prneubauer@bsu-ucs.uucp  or  00PRNEUBAUER@BSUVAX1.BITNET
                     neubauer@bsu-cs.uucp
                     {backbones}!iuvax!bsu-cs!neubauer

steve@txsil.lonestar.org (Steve McConnel) (11/04/90)

In article <43647@eerie.acsu.Buffalo.EDU> malouf@acsu.buffalo.edu
 (Robert Malouf) writes:
>Does anyone know of a macro for formatting glossed examples in TeX?  I want to
>include examples like (1) or (2) in a paper.
>
>(1) Mosimane o-   bed- its-w-   e    ke monna.
>    boy      SUBJ-beat-PRV-PASS-MOOD by man
>    `The/*a boy was beaten by a/the man.'
>
>(2) Mosimane o-bed-its-w-e           ke monna.
>    boy      SUBJ-beat-PRV-PASS-MOOD by man
>    `The/*a boy was beaten by a/the man.'
>
>Thanks in advance for any advice.


We've been working on TeX macros for typesetting interlinear text for
a couple of years, and are almost ready to publish a book describing
how to use them.  (``Almost ready'' as in ``goes to press next week''.)
I'll try to make the macros and programs available on the net sometime
in December when the books are available.  Before that, "beta-test"
versions are available on request.

The primary focus of our work has been on a set of macros for plain
TeX to format books of annotated text, but a LaTeX style has also been
developed.  An environment named `interlinear' is the focal point of
the LaTeX macros.  This environment creates a box containing aligned
text.  (Because it's all in a box, each interlinear example must fit on
a page, and examples will never split across pages.)  If the examples
are short enough to fit on one line, it is possible to include the
freeform annotation in the interlinear environment.  Before using the
interlinear environment, each annotation field must be defined by an
\aligning command to establish font selection, leading, and so on.
For the examples given by Robert Malouf, the following commands would
set things up appropriately:

    \aligning{text}{cmr10}{}{}{}{}
    \aligning{gloss}{cmr10}{}{}{}{}
    \aligning{free}{cmr10}{}{}{}{}

(Optional parameters are perhaps not handled in true LaTeX style, as the
macro definition was copied verbatim from the plain TeX package.)

Within the interlinear environment, \[ and \] delimit a \vbox containing
stacked annotations.  \< and \> delimit a \hbox, allowing nesting, for
example, to align morphemes rather than words.  \+ causes the nested
\vbox's to butt up against each other rather than being separated by a
modest amount of space.  { and } at the innermost level delimit the
actual data, handling the setup (font switches and such) established
by \aligning commands earlier.
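
For readers unfamiliar with TeX boxes, the nesting described above can be
pictured with primitive boxes.  The sketch below is plain TeX and is not the
ITF macros; \pair is a hypothetical helper, and the \quad spacing is an
arbitrary choice.

    % \pair{text}{gloss}: a two-row vertical box as wide as its wider row
    \def\pair#1#2{\vtop{\halign{##\hfil\cr #1\cr #2\cr}}}
    \vtop{%
      \hbox{\pair{Mosimane}{boy}\quad
            \pair{o-bed-its-w-e}{SUBJ-beat-PRV-PASS-MOOD}\quad
            \pair{ke}{by}\quad
            \pair{monna.}{man}}%
      \hbox{\strut `The/*a boy was beaten by a/the man.'}}

Each \pair plays the role of a \[...\] stack, the inner \hbox that of \<...\>,
and the outer \vtop stacks the aligned line over the free translation.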

With that background, here's the encoding of Robert Malouf's first example:

    \begin{interlinear}
    \[ \< \[       {Mosimane} {boy}          \]
          \[ \< \[ {o--}      {SUBJ--} \]\+
                \[ {bed}      {beat}   \]\+
                \[ {--its}    {--PRV}  \]\+
                \[ {--w}      {--PASS} \]\+
                \[ {--e}      {--MOOD} \] \> \]
          \[       {ke}       {by}           \]
          \[       {monna.}   {man}          \] \>
                   {`The/*a boy was beaten by a/the man.'} \]
    \end{interlinear}

Of course, if you want to use word-aligned examples, that's also possible.
Here is how it would be encoded:

    \begin{interlinear}
    \[ \< \[ {Mosimane}          {boy}                         \]
          \[ {o--bed--its--w--e} {SUBJ--beat--PRV--PASS--MOOD} \]
          \[ {ke}                {by}                          \]
          \[ {monna.}            {man}                         \] \>
             {`The/*a boy was beaten by a/the man.'}                 \]
    \end{interlinear}

Note that the whitespace in these examples is purely to make the nesting
and alignment easier to decipher.  Within the interlinear environment,
newlines are given the same \catcode as spaces, so that empty lines will
not trigger paragraphing.  Also, within a \[...\] pair, spaces are thrown
away except for the actual data inside the {...} pairs.

In addition to the commands described above, there are parameters for
adjusting margins and spacing, and a couple of commands for accessing
multiple fonts within a single annotation field.  An `interlinear*'
environment is also provided for unnumbered examples.  For those who
are working with multiple languages, the \itfreset command erases the
information stored by prior \aligning commands.

The TeX macros and associated software will be free, but we'll have to
charge for the book and floppies. :-( (but probably less than $20) :-)
Since we aren't on the Internet, I can't put things up on `anonymous
FTP' directly.  Posting to USENET is the obvious alternative.

What newsgroup would it be appropriate to post the TeX macros to?
Currently, ITF.TEX (the plain TeX macro file) is 79K, ITFL.STY (the
LaTeX style file) is 15K, and a pair of auxiliary TeX/LaTeX macro
files add another 10K to the total.  Then, adding the C source code
to a couple of programs for translating data into ITF/TeX format...
It adds up to a fair amount of data.
-- 
Stephen McConnel
Summer Institute of Linguistics  PHONE: 214-709-2418
7500 W. Camp Wisdom Road          UUCP: ...!{convex|utafll}!txsil!steve
Dallas, TX 75236              Internet: steve@txsil.lonestar.org

marcel@cs.caltech.edu (Marcel van der Goot) (11/07/90)

In <43647@eerie.acsu.Buffalo.EDU> Robert Malouf
(malouf@acsu.buffalo.edu) asks

> Does anyone know of a macro for formatting glossed examples in TeX?

In <46341@bsu-ucs.uucp> Paul Neubauer (neubauer@bsu-ucs.uucp)
commiserates

> Well, now, this is a question that I have been worrying about since
> I started to get interested in TeX (and LaTeX).

And in <373@txsil.lonestar.org> Steve McConnel (steve@txsil.lonestar.org)
answers

> We've been working on TeX macros for typesetting interlinear text for
> a couple of years, and are almost ready to publish a book ...

It was not entirely clear to me what exactly the macros are supposed
to do (really, if you want people who know about TeX but not about your
field to answer questions, then the least you can do is clearly describe
what the problem is), but as I understand it you want to typeset two
sentences atop each other, with the individual words vertically
aligned. (If that's wrong, you can hit 'n' now ...) I assume that the
big problem is not how to number your examples.

I have a macro \gloss that, when used as follows
	\gloss This is an example
	       Dit is een voorbeeld
will create an hbox with contents
	This is an  example
	Dit  is een voorbeeld
You can then do whatever you want with this hbox.

Your examples could be done as
	\gloss Mosimane o-   bed-  its- w-    e     ke monna.
	       boy     subj- beat- prv- pass- mood- by man.
and
	\gloss Mosimane o-bed-its-w-e ke monna.
	       boy      subj-beat-prv-pass-mood by man.
This seems a bit more readable than the other macros that were suggested.

The file gloss.tex is available via anonymous ftp from
csvax.cs.caltech.edu [131.215.131.131] in directory pub/tex.
The macros were written for TeX, but I don't see compelling reasons
why they couldn't be used in combination with LaTeX.
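
To show the kind of hbox such a macro can build, here is a small independent
sketch.  It is not the code in gloss.tex: \wordpairs, \dopairs, \rest, and
the \endgloss delimiter are hypothetical names, the two sentences are passed
as braced arguments rather than as input lines, and both are assumed to
contain the same number of words separated by spaces.

    % \wordpairs{line one}{line two} -> \hbox of vertically stacked pairs
    \def\wordpairs#1#2{\hbox{\dopairs#1 \endgloss #2 \endgloss}}
    % peel the first word off each line, stack them, recurse on the rest
    \def\dopairs#1 #2\endgloss #3 #4\endgloss{%
      \vtop{\halign{##\hfil\cr#1\cr#3\cr}}%
      \def\rest{#2}%
      \ifx\rest\empty\else \quad \dopairs#2\endgloss #4\endgloss \fi}

    \wordpairs{This is an example}{Dit is een voorbeeld}

Each word pair becomes a small two-row alignment, so the wider of the two
words sets the column width.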

				    Marcel van der Goot
				    marcel@vlsi.cs.caltech.edu