[comp.lang.perl] split problem with non-ascii chars

jv@mh.nl (Johan Vromans) (01/13/91)

[I was preparing this when patches 42-44 came in, so I quickly built
3.044 to check whether the problem still existed. It did.]

System: DECsystem5000 / Ultrix4.0 / Gcc 1.37 [OSF version]

'split' treats the non-ASCII character \351 as whitespace:

	@a = split (/\s+/,"abc\351def");
	print join(":",@a), "\n";
	=> abc:def

But with \s instead of \s+:

	@a = split (/\s/,"abc\351def");
	print join(":",@a), "\n";
	=> abc\351def
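For comparison, here is a hypothetical C sketch (not Perl's code) of a byte-by-byte whitespace splitter. It assumes, without having checked the Perl source, that split might handle /\s+/ with a direct isspace() scan instead of the regexp engine; the point is only that each byte has to go through unsigned char before it reaches isspace(). The name count_fields() is invented for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from Perl: count the runs of
   non-whitespace bytes in s. Each byte is read as unsigned char, so
   a byte like \351 reaches isspace() as 233, never as a negative
   plain-char value. */
static size_t count_fields(const char *s)
{
    size_t fields = 0;
    int in_field = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (isspace(*p)) {
            in_field = 0;           /* whitespace ends the current field */
        } else if (!in_field) {
            in_field = 1;           /* first byte of a new field */
            fields++;
        }
    }
    return fields;
}
```

In the default "C" locale, count_fields("abc\351def") is 1, the single field that split(/\s+/, ...) ought to return for that string.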

However,

	"abc\351def" =~ /\s/
	"abc\351def" =~ /\s+/

fail, as they should. Apparently split and ordinary regexp matching
decide what counts as whitespace in different ways.
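If the two code paths do classify characters differently, one way to make the answer independent of the C library's ctype tables and of char signedness is a fixed 256-entry table indexed by unsigned char. This is a hypothetical sketch, not a description of Perl's regexp code; the whitespace set [ \t\n\r\f] is an assumption about what \s should cover, and the function names are invented.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical membership table for \s, one entry per byte value. */
static bool space_class[256];

static void init_space_class(void)
{
    /* Assumed whitespace set for \s: space, tab, newline, CR, form feed. */
    const char *ws = " \t\n\r\f";

    memset(space_class, 0, sizeof space_class);
    for (const char *p = ws; *p != '\0'; p++)
        space_class[(unsigned char)*p] = true;
}

static bool in_space_class(char c)
{
    /* The cast keeps the index in 0..255 even where char is signed. */
    return space_class[(unsigned char)c];
}
```

After init_space_class(), in_space_class('\351') is false on any machine, because the table only marks the listed ASCII bytes.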

The problem does not occur on VAX3100 / Ultrix3.1 / Gcc 1.37.

If it helps, this is what isspace() returns in C:

	isspace('\351') => 8
	char c = '\351'; isspace(c) => 8
	unsigned char c = '\351'; isspace(c) => 0
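ANSI C defines isspace() only for EOF and for values representable as unsigned char. Where plain char is signed (as the second and third results suggest it is here), '\351' is a negative int, so the first two calls are undefined and may return anything; only the third is well-defined. A minimal sketch of the usual fix, with invented helper names:

```c
#include <ctype.h>

/* Convert a plain char to the range isspace() accepts.
   On a signed-char machine '\351' is negative; going through
   unsigned char turns it into 233. */
static int char_to_ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical wrapper: well-defined for every char value. */
static int safe_isspace(char c)
{
    return isspace(char_to_ctype_arg(c)) != 0;
}
```

With this, safe_isspace('\351') is 0 in the default "C" locale regardless of whether plain char is signed.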

Johan
-- 
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62911/62500
------------------------ "Arms are made for hugging" -------------------------