manfred@swi.psy.uva.nl (Manfred Aben) (10/02/90)
Hi there!

On behalf of a friend who does not have access to the NewsNet, I would like to know whether any of you (here, and possibly in other newsgroups) have ideas about the following:

Does SGML provide a way of specifying an 'Umlaut', as used in German, in contrast with a 'diaeresis', as used in many languages such as Dutch? In printed text, i.e. typographically, the two are no longer distinguishable. Note, however, that an umlaut was historically printed as two vertical strokes (") above a vowel, whereas a diaeresis should be printed as two dots (..) above a vowel. The two are not the same: the umlaut changes the sound of the vowel, whereas the diaeresis marks a separation in a word (the vowel is pronounced separately from the one before it).

All replies are greatly appreciated (references, ideas?). You can mail me directly (manfred@swi.psy.uva.nl) or reply in this group.

Znx in advance! =;-)
----------------------------------------------------
Manfred Aben
Dept. of Social Sciences Informatics
University of Amsterdam
Herengracht 196
1016 BS AMSTERDAM
(The Netherlands)
-----------------------------------------------------
killian@galvia.enet.dec.com (10/09/90)
To: manfred@swi.psy.uva.nl ()
Cc:
Subject: Re: Umlaut vs. Diaresis (?)

> Does SGML provide a way of specifying an 'Umlaut', as is used in German, in
> contrast with a 'diaeresis', used in many languages, such as the Dutch language.

In general, yes! There are two possible ways of doing this, and the choice between them depends on whether you want to enter the accented character in text (ie: content) or in markup.

In the more common case, where you want to enter the accented character in text, SGML would advise the use of an SDATA entity (system-specific data entity). For example, &ouml; could be used to enter a small 'o' with an umlaut, and &odiar; could be used to enter a small 'o' with a diaeresis. Of course these entities need to be declared in the scope of the Document Type Definition. This is normally done for a complete collection of such entities, and transportability is enhanced when user communities can agree on, and standardise, such entity sets. 'ouml' is present in the ISO Latin 1 entity set published as an informative annex to the SGML standard, but its description covers the diaeresis as well as the umlaut, so this particular entity set will not solve your problem. In addition, an SGML system must be able to correctly interpret the 'o' diaeresis entity reference when the parser finds it in the text. For example, your SGML typesetting system must be able to translate the value declared for the 'odiar' entity into the correct glyph shape.

The other solution, which would also allow the use of the accented character in markup (eg: tag names), is to use a character set that has a code position for the accented character (a code position different from that of the similar 'o' umlaut). SGML is character-set independent, in that the SGML declaration (which precedes the Document Type Definition, but is absent from most SGML documents) allows the identification or definition of the document character set.
Of course, the SGML parser must be able to accept the SGML declaration (not every one does), and the SGML system must be able to accept (eg: typeset) text in that character set. Defining your own character set is not always a smart thing to do.

I have also seen some unconventional solutions to your problem. One such solution involved defining a special element (tag) that was used to enter accented characters, for example: <special>odiar</special>. Again, this element would have to be defined in the scope of the Document Type Definition, and the SGML system would have to be capable of translating the 'odiar' text into the required accented character.

My recommendation is to use the SDATA entity solution if the accented character is not required in markup.

Regards,
Aidan
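To make Aidan's two text-level suggestions concrete, here is a minimal sketch of a document with an internal DTD subset that declares both an SDATA entity pair and a 'special' element. It makes several assumptions. The element name `note`, the entity name `odiar`, and the bracketed replacement strings are hypothetical. The "[name ]" form only imitates the style of the ISO public entity sets. Redeclaring `ouml` as "umlaut only" is a local convention, not the ISO meaning. The replacement text of an SDATA entity is system-specific, so a typesetting back end would still have to map each string to the right glyph.

```sgml
<!DOCTYPE note [
<!-- Hypothetical DTD fragment. Names other than ouml are illustrative,
     and the bracketed SDATA strings are system-specific placeholders
     that a formatter must map to glyphs. -->
<!ELEMENT note    - - (#PCDATA | special)*>
<!ELEMENT special - - (#PCDATA)>
<!ENTITY ouml  SDATA "[ouml  ]" -- small o, umlaut (German)   -->
<!ENTITY odiar SDATA "[odiar ]" -- small o, diaeresis (Dutch) -->
]>
<note>
German: sch&ouml;n.  Dutch: co&odiar;peratie.
Element variant: co<special>odiar</special>peratie.
</note>
```

In the SDATA case, the parser hands the formatter the string flagged as system data. In the element case, the formatter receives ordinary character data inside `special`. Either way, the umlaut/diaeresis distinction survives only if the downstream system is taught to render the two differently.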
inc@tc.fluke.COM (Gary Benson) (10/11/90)
In article <4410@swi.swi.psy.uva.nl> manfred@swi.psy.uva.nl () writes:
>Does SGML provide a way of specifying an 'Umlaut', as is used in German, in
>contrast with a 'diaeresis', used in many languages, such as the Dutch language.
>In printed text, i.e. typographically, these two are not distinguishable
>anymore. Note that an umlaut historically was printed as two vertical stripes
>(") on top of a vowel, whereas a diaeresis should be printed as two dots (..) on
>top of a vowel.
>These two things are not the same, the umlaut changes the sound of the vowel,
>whereas the diaeresis is used to mark a separation in a word.
>All replies are greatly appreciated (references, ideas?)

Those two dots are not necessarily either an umlaut **OR** a diaeresis. For example, in Finnish a letter "A" with two dots above is a separate letter altogether: A (two dots) comes after Z in the alphabet! The same holds true in other languages. French, for example, uses "grave" and "acute" "accents" to make different letters. (Oh! And look at the German "ss" that looks like an English upper-case letter "B".)

Instead of trying to get the world to see things the way some local language sees them, don't you think we'd gain more by treating symbols attached to letters as part of the letter, rather than trying to define what they mean in the language of origin?

Really. In Finnish, I cannot write "maki" correctly on this vt100 terminal. The word just means "hill", but in Finnish it needs two dots above the "a" to be a real word; read without the two dots, it means nothing. So, in some languages at least, the two dots are not a mere modifier: they make the letter into a different letter.
--
Gary Benson -=[ S M I L E R ]=- -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-
He who shits on the road will meet flies on his return. -South African Proverb