[comp.text] Bi-directional text and Unicode

tut@cairo.Sun.COM (Bill "Bill" Tuthill) (09/14/90)

Now that the US is falling into a military entanglement in the mideast,
I thought comp.texters might be interested in a discussion of BiDi,
taken from the minutes of a recent Unicode meeting.  (Note: Arabic and
Hebrew are written from right to left, so mixed text must be BiDi.)

--------------------------------------------------------------------
Scheinberg then turned to the final IBM area of concerns:  Bidi
Architecture.  He proposed that Unicode should:
"	* Remove BIDI thingies from Unicode R1.0
	* Remove (implicit) string direction from Unicode R1.0 "

The discussion started with a review of the three major Bidi text
models:	
	visual (store order is the same as presentation order)
	logical explicit (store order is logical, but presentation
		order is controlled with direction change controls)
	logical implicit (store order is logical, and presentation
		order is controlled algorithmically based on
		implicit directionality rules)
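
To make the difference between visual and logical store order concrete,
here is a minimal Python sketch. It is not from the minutes and it is not
the Unicode algorithm: logical_to_visual is a made-up helper that assumes
a left-to-right line and reverses each run of strong right-to-left
characters, ignoring digits, punctuation, and spaces between RTL words.

```python
import unicodedata

def is_rtl(ch):
    # Strong right-to-left classes: "R" (e.g. Hebrew) and "AL" (Arabic letters).
    return unicodedata.bidirectional(ch) in ("R", "AL")

def logical_to_visual(text):
    """Toy reordering (an assumption, not the real algorithm): on a
    left-to-right line, reverse each maximal run of RTL characters."""
    out, run = [], []
    for ch in text:
        if is_rtl(ch):
            run.append(ch)
        else:
            out.extend(reversed(run))  # flush the pending RTL run, reversed
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)

ALEF, BET, GIMEL = "\u05d0", "\u05d1", "\u05d2"

# Logical store: characters kept in the order they are read and typed.
logical = "abc " + ALEF + BET + GIMEL + " def"

# Visual store: characters kept in left-to-right screen order.
visual = logical_to_visual(logical)
```

In this toy the reversal is its own inverse, so applying it to the visual
string gives back the logical one. Real text with numbers and punctuation
needs the extra rules the implicit model supplies.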

Unicode clearly has adopted the logical implicit model of bidirectional
text.  Scheinberg noted that "all IBM host data bases are in VISUAL",
and that "All IBM terminals have VISUAL user interface", and proposed
that, to accommodate this, Unicode should remove all reference to
implicit string direction.  The requirement to remove "BIDI thingies"
refers to U+200E and U+200F in particular, the LEFT-TO-RIGHT MARK
and RIGHT-TO-LEFT MARK (context markers, not direction controls;
used by implicit algorithms to force the marked layout order on a
bidi run of text when the unmarked layout order is not the desired order).
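
As a rough illustration of how such a mark steers an implicit algorithm,
the Python sketch below resolves a neutral character from its strong
neighbours. The resolve function is hypothetical and covers only one
simplified rule (a neutral between two strong characters of the same
direction takes that direction, otherwise it takes the paragraph
direction). The unicodedata calls are standard-library facts.

```python
import unicodedata

LRM, RLM = "\u200e", "\u200f"

def strong_dir(ch):
    # Map a character to "L", "R", or None (no strong direction).
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None

def resolve(text, base="L"):
    """Simplified neutral resolution (an assumption, not the full rules)."""
    dirs = [strong_dir(c) for c in text]
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            # Nearest strong direction on each side; paragraph edges count as base.
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        resolved.append(d)
    return resolved

word = "\u05d0\u05d1\u05d2"             # three Hebrew letters
unmarked = resolve(word + "!")          # "!" falls back to the LTR paragraph
marked = resolve(word + "!" + RLM)      # "!" now sits between two R characters
```

Without the mark the "!" resolves left-to-right and would be laid out to
the right of the Hebrew word. With a trailing RIGHT-TO-LEFT MARK it
resolves right-to-left and joins the Hebrew run. The mark itself is
invisible and carries only its direction.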

Someone was heard to remark "Visual is the most logical order..."

A number of examples of problematical cases were worked through, with
the general consensus remaining that logical order was clearly
specifiable, and that the implicit algorithm does the best job of
handling bidi presentation.  All (including Scheinberg) agreed that
logical order was necessary for general transmission of mixed
Arabic/Hebrew and Roman plain text out of context.  But he and
Gera argued strongly for not disallowing visual order text encoding
(in support of the preexisting database stores and software in Israel,
etc.).

Whistler argued that there is a difference between 1) generic Unicode
(including bidi), which is textual data to be picked up (in principle)
at random by a receiver and interpreted, and 2) contractual Unicode,
where software implemented in Unicode is involved in a "contractual"
intercommunication with a communicating source which has certain
storage and interface conventions.  For generic Unicode bidi plain
text there is no reasonable alternative to logical text order.
However, for contractual Unicode bidi text, this need not be the
case.  For example, a Unicode database front end hooked up to an
Israeli database containing visual bidi data fields should have
no problem doing the correct parsing (field by field--since the
fields themselves provide the necessary context for relating
visual to logical order).  All communication to the database can
be exactly in the form expected by the database, and communication
with the user can be maintained with a visual order editing interface,
if that is the expected behavior.  The only expectation which cannot
be met is that this can be done without having to change any software.
However, it CAN be done without having to change the databases or
the data stored in them.
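
One way to picture the "contractual" case is a front end that knows, field
by field, which store order the database uses and converts at the
boundary. The Python sketch below is purely illustrative: FIELD_ORDER, the
field names, and flip_rtl_runs are assumptions invented for this example,
and the conversion is a toy that only reverses runs of strong
right-to-left characters.

```python
import unicodedata
from itertools import groupby

# Hypothetical contract: how each field is stored in the existing database.
FIELD_ORDER = {"name": "visual", "sku": "logical"}

def flip_rtl_runs(text):
    """Toy conversion (an assumption): reverse each run of strong RTL
    characters. It is its own inverse, so it serves both directions."""
    parts = []
    for rtl, run in groupby(
        text, key=lambda ch: unicodedata.bidirectional(ch) in ("R", "AL")
    ):
        chars = list(run)
        parts.append("".join(reversed(chars) if rtl else chars))
    return "".join(parts)

def from_database(record):
    # Convert visual-order fields to logical order for internal processing.
    return {
        field: flip_rtl_runs(value) if FIELD_ORDER[field] == "visual" else value
        for field, value in record.items()
    }

def to_database(record):
    # Convert back so the database receives exactly the form it expects.
    return {
        field: flip_rtl_runs(value) if FIELD_ORDER[field] == "visual" else value
        for field, value in record.items()
    }

stored = {"name": "\u05d2\u05d1\u05d0", "sku": "A-17"}  # name held in visual order
working = from_database(stored)
round_trip = to_database(working)
```

The stored record is never rewritten in place. Only the software at the
boundary changes, which is the trade-off described above.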

Scheinberg argued that the implicit algorithm for bidi text must
be published in the Unicode document, so that all implementers will
be doing the same thing.  This was generally viewed as a desirable
thing to do, but Becker and Davis claimed that the publication
should not be held up for a 100% algorithm.  The 99% case is relatively
easy and should be published now.  Davis agreed to write up a
strawman statement of the relevant algorithm for discussion and
comment.

The meeting adjourned, but the arguments continued.....

							--Ken Whistler