tut@cairo.Sun.COM (Bill "Bill" Tuthill) (09/14/90)
Now that the US is falling into a military entanglement in the mideast, I thought comp.texters might be interested in a discussion of BiDi (bidirectional text), taken from the minutes of a recent Unicode meeting. (Note: Arabic and Hebrew are written from right to left, so text that mixes them with Roman script must be BiDi.)

--------------------------------------------------------------------

Scheinberg then turned to the final IBM area of concerns: Bidi Architecture. He proposed that Unicode should:

  "
  * Remove BIDI thingies from Unicode R1.0
  * Remove (implicit) string direction from Unicode R1.0
  "

The discussion started with a review of the three major Bidi text models:

  visual (store order is the same as presentation order)

  logical explicit (store order is logical, but presentation order is controlled with direction change controls)

  logical implicit (store order is logical, and presentation order is controlled algorithmically based on implicit directionality rules)

Unicode clearly has adopted the logical implicit model of bidirectional text. Scheinberg noted that "all IBM host data bases are in VISUAL", and that "All IBM terminals have VISUAL user interface", and proposed that to accommodate this Unicode should remove all reference to implicit string direction.

The requirement to remove "BIDI thingies" refers to U+200E and U+200F in particular, the LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK (context markers, not direction controls; used by implicit algorithms to force the marked layout order on a bidi run of text when the unmarked layout order is not the desired order).

Someone was heard to remark "Visual is the most logical order..."

A number of examples of problematical cases were worked through, with the general consensus remaining that logical order was clearly specifiable, and that the implicit algorithm does the best job of handling bidi presentation. All (including Scheinberg) agreed that logical order was necessary for general transmission of mixed Arabic/Hebrew and Roman plain text out of context.

But he and Gera argued strongly for not disallowing visual order text encoding (in support of the preexisting database stores and software in Israel, etc.).

Whistler argued that there is a difference between 1) generic Unicode (including bidi), which is textual data to be picked up (in principle) at random by a receiver and interpreted, and 2) contractual Unicode, where software implemented in Unicode is involved in a "contractual" intercommunication with a communicating source which has certain storage and interface conventions. For generic Unicode bidi plain text there is no reasonable alternative to logical text order. However, for contractual Unicode bidi text, this need not be the case.

For example, a Unicode database front end hooked up to an Israeli database containing visual bidi data fields should have no problem doing the correct parsing (field by field--since the fields themselves provide the necessary context for relating visual to logical order). All communication to the database can be exactly in the form expected by the database, and communication with the user can be maintained with a visual order editing interface, if that is the expected behavior. The only expectation which cannot be met is that this can be done without having to change any software. However, it CAN be done without having to change the databases or the data stored in them.

Scheinberg argued that the implicit algorithm for bidi text must be published in the Unicode document, so that all implementers will be doing the same thing. This was generally viewed as a desirable thing to do, but Becker and Davis claimed that the publication should not be held up for a 100% algorithm. The 99% case is relatively easy and should be published now. Davis agreed to write up a strawman statement of the relevant algorithm for discussion and comment.

The meeting adjourned, but the arguments continued.....

--Ken Whistler
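
To make the difference between logical and visual order concrete, here is a rough Python sketch of a toy "implicit" rule. It is not the algorithm discussed at the meeting and not a conforming Unicode bidi implementation: it assumes a left-to-right paragraph, ignores digits and embeddings, and uses one simplified neutral rule (a neutral character goes right-to-left only when the strong characters on both sides are right-to-left). The function names are made up for this illustration; the only library call is Python's standard `unicodedata.bidirectional`.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK (invisible, strong left-to-right)
RLM = "\u200f"  # RIGHT-TO-LEFT MARK (invisible, strong right-to-left)


def strong_direction(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):  # Hebrew letters are R, Arabic letters AL
        return "R"
    return None  # spaces, punctuation, digits: treated as neutral here


def resolve_directions(logical):
    """Assign 'L' or 'R' to every character of a logical-order string.

    Simplifying assumption: the paragraph is left-to-right, and a run of
    neutrals becomes 'R' only if it sits between two 'R' characters.
    """
    strong = [strong_direction(ch) for ch in logical]
    resolved = list(strong)
    n = len(logical)
    i = 0
    while i < n:
        if strong[i] is not None:
            i += 1
            continue
        j = i
        while j < n and strong[j] is None:
            j += 1
        before = strong[i - 1] if i > 0 else "L"  # paragraph start counts as L
        after = strong[j] if j < n else "L"       # paragraph end counts as L
        fill = "R" if before == after == "R" else "L"
        for k in range(i, j):
            resolved[k] = fill
        i = j
    return resolved


def logical_to_visual(logical):
    """Return the characters in left-to-right display order.

    Each right-to-left run is reversed in place; the invisible marks are
    dropped from the result because they have no glyph.
    """
    out, run = [], []
    for ch, direction in zip(logical, resolve_directions(logical)):
        if direction == "R":
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(ch for ch in out if ch not in (LRM, RLM))
```

With the Hebrew letters alef (U+05D0) and bet (U+05D1), `logical_to_visual("abc \u05d0\u05d1!")` gives `"abc \u05d1\u05d0!"`: the Hebrew run is reversed for display and the exclamation mark stays at the right edge, attached to the left-to-right paragraph. Appending a RIGHT-TO-LEFT MARK in storage, `"abc \u05d0\u05d1!\u200f"`, gives `"abc !\u05d1\u05d0"`: the mark pulls the exclamation mark into the Hebrew run, which is the sense in which the minutes describe these marks as forcing a layout order that the unmarked text would not get. A "visual" store would simply hold the output strings instead of the input ones.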