[comp.lang.c] Request for help w/porting strings

frotz@drivax.UUCP (Frotz) (09/23/89)

	Does anyone have any experience porting strings from 7- & 8-bit
languages to 16-bit languages like Kanji?  Intuitively, I should be
using mblen(), mbtowc(), mbstowcs(), wctomb() and wcstombs().

	All of the manuals that I have for this are very clear on the
function of these calls.  However, they do not give me any clue as to
WHY I should use them...  Any help would be greatly appreciated. 

--Frotz @Digital Research, Incorporated		amdahl!drivax!frotz
	 70 Garden Court, B15			(408) 649-3896
	 Monterey, California  93940		Ask for John Fa'atuai

"Something to talk about other than free(), NULL, etc..."
	-- Me;-)

aic@mentor.cc.purdue.edu (George A. Basar) (09/25/89)

In article <251AB160.3031@drivax.UUCP>, frotz@drivax.UUCP (Frotz) writes:
> 
> 	All of the manuals that I have for this are very clear on the
> function of these calls.  However, they do not give me any clue as to
> WHY I should use them...  Any help would be greatly appreciated. 
> 
       The 'why' of these routines is to ensure compatibility on
machines running DBCS (double-byte character set) display devices.  This makes
your application (relatively) easily portable to DBCS machines.
       The 'where' of these routines is any data that may be displayed
by your application.  This includes strings, but is not limited to them.  For
national language compatibility you should also consider date and monetary
formats.


* George A. Basar                              (317)742-8799 (home)
* aic@mentor.cc.purdue.edu                     basar@PURCCVM.BITNET    
| General Consultant                   	       (317)494-1787 (work)
| Purdue University Computing Center

frotz@drivax.UUCP (Frotz) (09/26/89)

aic@mentor.cc.purdue.edu (George A. Basar) writes:

>In article <251AB160.3031@drivax.UUCP>, frotz@drivax.UUCP (Frotz) writes:
>> 
>> 	All of the manuals that I have for this are very clear on the
>> function of these calls.  However, they do not give me any clue as to
>> WHY I should use them...  Any help would be greatly appreciated. 
>> 
>       The 'why' of these routines is to ensure compatibility on
>machines running DBCS (double-byte character set) display devices.  This makes
>your application (relatively) easily portable to DBCS machines.
>       The 'where' of these routines is any data that may be displayed
>by your application.  This includes strings, but is not limited to them.  For
>national language compatibility you should also consider date and monetary
>formats.

	Sorry.  Perhaps I misused the word 'why'.  I understand that
these services are for internationalization.  My questions, restated,
are "When do you use wctomb() and mbtowc()?" and "In what order do you
use these routines?".  Perhaps a pseudo-code fragment would help.


	For a framework, take this situation.  I have a small set of
embedded strings (everything else has been externalized).
Why/how/when do I need to convert to a 'wide-char' or
'multi-byte-string', and when/how/why do I convert back?

	I understand that Kanji characters are a series of
<character><modifier> sequences.  My question is: are these characters
wide-chars or are they multi-byte strings?

	Finally, does anyone have any warnings about porting to 16-bit
languages?  I have heard mention about checking to see if your
character is really the character you want and not a modifier.  Any
other problems? 

--
Frotz

aic@mentor.cc.purdue.edu (George A. Basar) (09/27/89)

> aic@mentor.cc.purdue.edu (George A. Basar) writes:
> >In article <251AB160.3031@drivax.UUCP>, frotz@drivax.UUCP (Frotz) writes:
> >> 	All of the manuals that I have for this are very clear on the
...
> >> WHY I should use them...  Any help would be greatly appreciated. 
> >> 
> >       The 'why' use of these routines is to ensure compatibility on
...
> 
> 	Sorry.  Perhaps I misused the word 'why'.  I understand that
> 
  My mistake, too. Sorry.
> 
> 	For a framework, take this situation.  I have a small set of
...

  The Kanji character set (all this info is from doing some NLS work on
a Kanji PS/2) is a double-byte character set.  This means two bytes to
represent a single Kanji character.  I imagine (conjecture here; I'm unsure
what wide-char is supposed to represent, never heard the term) that the
wctomb() routines are for reading input, since the input device (keyboard)
will generate some wide-char representation for display, and that the
mbtowc() routines are for output conversion.
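
  For what it's worth, the ANSI C definitions run the other way from the
guess above: mbtowc() and mbstowcs() turn multibyte sequences (the form
strings take in files and on the terminal) into fixed-width wchar_t values,
and wctomb() and wcstombs() turn them back.  The usual order is: convert to
wide on the way in, do the per-character work on the wchar_t array, convert
back to multibyte on the way out.  Here is a rough sketch of that pattern.
The helper names mb_count_chars() and mb_reverse() and the 64-character
limit are made up for illustration.  A real program would also call
setlocale(LC_CTYPE, "") first; without it everything runs in the "C" locale,
where each character is one byte.

```c
#include <stddef.h>
#include <stdlib.h>

#define WBUF_LEN 64   /* arbitrary limit chosen for this sketch */

/* Count characters (not bytes) in a multibyte string by stepping
   through it with mblen().  Returns (size_t)-1 on an invalid sequence. */
size_t mb_count_chars(const char *s)
{
    size_t count = 0;

    mblen(NULL, 0);                     /* reset any shift state */
    while (*s != '\0') {
        int n = mblen(s, MB_CUR_MAX);   /* bytes in the next character */
        if (n <= 0)
            return (size_t)-1;
        s += n;
        count++;
    }
    return count;
}

/* Reverse a string character by character: multibyte -> wide,
   reverse the wide array, wide -> multibyte.
   Returns 0 on success, -1 on bad input or if a buffer is too small. */
int mb_reverse(const char *src, char *dst, size_t dstsize)
{
    wchar_t wbuf[WBUF_LEN];
    size_t n, i, j, out;

    n = mbstowcs(wbuf, src, WBUF_LEN);
    if (n == (size_t)-1 || n >= WBUF_LEN)
        return -1;                      /* invalid, or no room for L'\0' */

    /* Each wbuf[] element is one whole character, so plain indexing
       is safe here in a way it is not on the multibyte form. */
    i = 0;
    j = n;
    while (i + 1 < j) {
        wchar_t tmp;
        j--;
        tmp = wbuf[i];
        wbuf[i] = wbuf[j];
        wbuf[j] = tmp;
        i++;
    }

    out = wcstombs(dst, wbuf, dstsize);
    if (out == (size_t)-1 || out >= dstsize)
        return -1;                      /* unconvertible, or truncated */
    return 0;
}
```

  Reversing the bytes of a DBCS string directly would scramble the byte
pairs; going through wchar_t is what keeps each character in one piece.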
> 
> 	Finally, does anyone have any warnings about porting to 16-bit
> languages?  I have heard mention about checking to see if your
> character is really the character you want and not a modifier.  Any
> other problems? 
  Along these lines, the standard C string library is not DBCS-enabled.
For things like strncpy, you have to examine the characters to make sure
you won't split a DBCS character; for strcmp, to make sure the characters
being compared are of the same type, etc.
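
  The strncpy point can be sketched roughly as below: walk the source with
mblen() and stop copying at the last whole character that fits.  The name
mb_copy_trunc() is hypothetical, not a library routine, and the sketch
assumes the program has already selected its locale with setlocale().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into dst (dstsize bytes, including the terminator) without
   ever cutting a multibyte character in half.  Returns the number of
   bytes copied, or (size_t)-1 if src is invalid or dstsize is 0. */
size_t mb_copy_trunc(char *dst, size_t dstsize, const char *src)
{
    size_t used = 0;

    if (dstsize == 0)
        return (size_t)-1;

    mblen(NULL, 0);                       /* reset any shift state */
    while (src[used] != '\0') {
        int n = mblen(src + used, MB_CUR_MAX);
        if (n <= 0) {                     /* invalid multibyte sequence */
            dst[0] = '\0';
            return (size_t)-1;
        }
        if (used + (size_t)n >= dstsize)  /* whole char + '\0' won't fit */
            break;
        used += (size_t)n;
    }

    memcpy(dst, src, used);
    dst[used] = '\0';
    return used;
}
```

  Unlike strncpy, this always terminates the destination, and on a DBCS
machine it may copy one byte fewer than the buffer allows rather than leave
half a character at the end.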

> Frotz
