[comp.lang.perl] problems with umlaute

ruba@molbio.ethz.ch (Rudolf Baumann) (01/02/91)

I have a small problem converting 'Umlaute' in a database file from 
a Vax. To preserve the special characters I transferred the file in
binary mode. The characters in question have the MSB set, e.g. an
a umlaut becomes octal 344. Is there any possibility in perl to
substitute such a character by a more useful sequence like 'ae'
(for a umlaut)? I didn't succeed with my attempts to solve the problem.
My intention is to print out a table from this file, but the foreign
characters disturb the appearance of this table.
Thank you for any hint.
	ruedi
--
Rudolf E. Baumann                                    ruba@molbio.ethz.ch
Institut fuer Molekularbiologie & Biophysik          ruba@biophys.uucp
ETH Hoenggerberg (HPM G6)                            MOLEKULA@CZHETH5A.bitnet
CH-8093 Zuerich/Switzerland                          Tel. ++41 1 377 33 97

inc@tc.fluke.COM (Gary Benson) (01/16/91)

In article <6527@biophys.zir.ethz.ch> ruba@molbio.ethz.ch (Rudolf Baumann) writes:
>
>I have a small problem converting 'Umlaute' in a database file from 
>a Vax. To preserve the special characters I transferred the file in
>binary mode. The characters in question have the MSB set, e.g. an
>a umlaut becomes octal 344. Is there any possibility in perl to
>substitute such a character by a more useful sequence like 'ae'
>(for a umlaut)? I didn't succeed with my attempts to solve the problem.
>My intention is to print out a table from this file, but the foreign
>characters disturb the appearance of this table.
>Thank you for any hint.
>	ruedi

I needed to do exactly that with WordPerfect 8-bit codes; here's
what I was able to come up with:

#! /usr/local/perl

# German substitutions - 8-bit WordPerfect ascii to common sequences

while (<>) {
    s/\201/ue/g;                  # u-umlaut
    s/\204/ae/g;                  # a-umlaut
    s/\224/oe/g;                  # o-umlaut (was \204, already used for a-umlaut)
    s/\232/Ue/g;                  # U-umlaut
    s/\341/ss/g;                  # eszet
    print;
}

Change the numbers to whatever you've got coming in, and you're in business.
Or, as a subroutine:

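sub subs {
    # expects the text to convert in the global $tmp
    $tmp =~ s/\201/ue/g;          # u-umlaut
    $tmp =~ s/\204/ae/g;          # a-umlaut
    $tmp =~ s/\224/oe/g;          # o-umlaut (was \204, already used for a-umlaut)
    $tmp =~ s/\232/Ue/g;          # U-umlaut
    $tmp =~ s/\341/ss/g;          # eszet
    print $tmp;
}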
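
For the file in the original question the numbers will differ, since the
a-umlaut there arrives as octal 344. That value matches ISO 8859-1 (and the
DEC Multinational set), so the sketch below assumes the other characters
follow the same set; only \344 is confirmed by the question, the rest should
be checked against the real data. It is written in current Perl 5 syntax, and
the names %map and transliterate are made up for illustration. Keeping the
codes in one table means only the table changes for a different character set.

```perl
use strict;
use warnings;

# Assumed octal byte values (ISO 8859-1 / DEC Multinational).
# Only \344 is confirmed by the original question.
my %map = (
    "\344" => 'ae',   # a-umlaut
    "\366" => 'oe',   # o-umlaut
    "\374" => 'ue',   # u-umlaut
    "\304" => 'Ae',   # A-umlaut
    "\326" => 'Oe',   # O-umlaut
    "\334" => 'Ue',   # U-umlaut
    "\337" => 'ss',   # eszet
);

# Build one character class from the table keys so each string
# is scanned once instead of once per substitution.
my $class = join '', keys %map;

# Return a copy of the text with every mapped byte replaced.
sub transliterate {
    my ($text) = @_;
    $text =~ s/([$class])/$map{$1}/g;
    return $text;
}

# Small demonstration with 8-bit bytes written as octal escapes.
print transliterate("M\344dchen, Stra\337e, \334bung\n");
# prints: Maedchen, Strasse, Uebung
```

To filter a file the way the loop above does, replace the demonstration line
with: print transliterate($_) while <>;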


I hope this is a help -- if so, it marks my *first* useful submission to
this newsgroup! I am thankful for the opportunity, having received so
much help here myself.



-- 
Gary Benson    -=[ S M I L E R ]=-   -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

Those who mourn for "USENET like it was" should remember the original design
estimates of maximum traffic volume: 2 articles/day.   -Steven Bellovin