ruba@molbio.ethz.ch (Rudolf Baumann) (01/02/91)
I have a small problem converting 'Umlaute' in a database file from a Vax. To preserve the special characters I transferred the file in binary mode. The characters in question have the MSB set, e.g. an a-umlaut becomes octal 344. Is there any possibility in perl to substitute such a character with a more useful sequence like 'ae' (for a-umlaut)? My own attempts to solve the problem did not succeed. I want to print a table from this file, but the foreign characters spoil the layout of the table.
Thank you for any hint.
ruedi
--
Rudolf E. Baumann ruba@molbio.ethz.ch
Institut fuer Molekularbiologie & Biophysik ruba@biophys.uucp
ETH Hoenggerberg (HPM G6) MOLEKULA@CZHETH5A.bitnet
CH-8093 Zuerich/Switzerland Tel. ++41 1 377 33 97
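Before writing any substitutions, it helps to know which 8-bit codes the file really contains, since a-umlaut arriving as octal 344 says nothing yet about the other characters. The following is a minimal sketch, assuming the file is read as raw bytes (Perl applies no decoding layer by default); `count_high_bytes` and the script name in the comment are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count each byte with the MSB set (octal 200-377) across the given strings.
# Returns a hash reference mapping the byte's numeric code to its count.
sub count_high_bytes {
    my %count;
    # In list context, m//g without parentheses returns every matched byte.
    for my $byte (map { /[\200-\377]/g } @_) {
        $count{ ord($byte) }++;
    }
    return \%count;
}

# With file names on the command line, report the codes in octal, e.g.
#   perl highbytes.pl vax.dat
if (@ARGV) {
    my $count = count_high_bytes(<>);
    for my $code (sort { $a <=> $b } keys %$count) {
        printf "%03o\t%d\n", $code, $count->{$code};
    }
}
```

Each output line shows an octal code and how often it occurs, which gives exactly the numbers to plug into the substitutions.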
inc@tc.fluke.COM (Gary Benson) (01/16/91)
In article <6527@biophys.zir.ethz.ch> ruba@molbio.ethz.ch (Rudolf Baumann) writes:
>
>I have a small problem converting 'Umlaute' in a database file from
>a Vax. To preserve the special characters I transferred the file in
>binary mode. The characters in question have the MSB set, e.g. an
>a umlaut becomes octal 344. Is there any possibility in perl to
>substitute such a character by a more useful sequence like 'ae'
>(for a umlaut)? I didn't succeed with my attempts to solve the problem.
>My intention is to print out a table from this file, but the foreign
>characters disturb the appearance of this table.
>Thank you for any hint.
> ruedi

I needed to do exactly that with WordPerfect 8-bit codes; here's what I was able to come up with:

    #!/usr/local/bin/perl
    # German substitutions: 8-bit codes (these match IBM PC code page 437)
    # to common ASCII sequences
    while (<>) {
        s/\201/ue/g;    # u-umlaut
        s/\204/ae/g;    # a-umlaut
        s/\224/oe/g;    # o-umlaut
        s/\232/Ue/g;    # U-umlaut
        s/\341/ss/g;    # eszett
        print;
    }

Change the octal codes to whatever your file contains, and you're in business. Or, as a subroutine that takes the string as its argument (call it as subs($_) inside the same while (<>) loop):

    sub subs {
        my ($tmp) = @_;         # work on a copy of the argument
        $tmp =~ s/\201/ue/g;    # u-umlaut
        $tmp =~ s/\204/ae/g;    # a-umlaut
        $tmp =~ s/\224/oe/g;    # o-umlaut
        $tmp =~ s/\232/Ue/g;    # U-umlaut
        $tmp =~ s/\341/ss/g;    # eszett
        print $tmp;
    }

I hope this helps. If so, it marks my *first* useful submission to this newsgroup! I am thankful for the opportunity, having received so much help here myself.
--
Gary Benson -=[ S M I L E R ]=- -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-
Those who mourn for "USENET like it was" should remember the original design estimates of maximum traffic volume: 2 articles/day. -Steven Bellovin
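For the Vax file in the original question the codes differ from the WordPerfect ones. Assuming the file uses the DEC Multinational / ISO 8859-1 codes (consistent with a-umlaut arriving as octal 344), a table-driven variant keeps every mapping in one hash and makes a single substitution pass. This is a sketch only; `%umlaut`, `deumlaut` and the script name in the comment are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed DEC Multinational / ISO 8859-1 codes (octal) and their ASCII spellings.
my %umlaut = (
    "\344" => 'ae', "\366" => 'oe', "\374" => 'ue',   # a, o, u umlaut
    "\304" => 'Ae', "\326" => 'Oe', "\334" => 'Ue',   # A, O, U umlaut
    "\337" => 'ss',                                   # eszett
);

# Replace every mapped 8-bit character in one pass; unmapped bytes are kept.
sub deumlaut {
    my ($s) = @_;
    $s =~ s/([\200-\377])/exists $umlaut{$1} ? $umlaut{$1} : $1/ge;
    return $s;
}

# Filter mode when file names are given, e.g.  perl deumlaut.pl table.dat
if (@ARGV) {
    print deumlaut($_) while <>;
}
```

Because each 8-bit character is looked up exactly once, the replacement text is never re-scanned, and supporting another code means adding one hash entry rather than another s/// line.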