[comp.lang.perl] Split question

bjaspan@athena.mit.edu (Barr3y Jaspan) (02/28/91)

I set out to split a string on whitespace without breaking quoted
strings, and came across the following split behavior that I do not
understand.  This script

print "$]\n";
@_ = split(/ *(0[^0]+0) *| /, '0foo bar0 baz frob 0la la la0 quux');
print join(', ', @_),"\n";

produces the following output:

$Header: perly.c,v 3.0.1.9 90/11/10 01:53:26 lwall Locked $
Patch level: 41
, 0foo bar0, baz, , frob, 0la la la0, quux

(Think of 0 as a quotation mark and it will make more sense.  I didn't
want to deal with escaping ".)  Why is there a null entry between
baz and frob in the returned array?

Thanks.

Barr3y

piet@cs.ruu.nl (Piet van Oostrum) (03/01/91)

>>>>> bjaspan@athena.mit.edu (Barr3y Jaspan) (BJ) writes:

BJ> print "$]\n";
BJ> @_ = split(/ *(0[^0]+0) *| /, '0foo bar0 baz frob 0la la la0 quux');
BJ> print join(', ', @_),"\n";

BJ> produces the following output:

BJ> $Header: perly.c,v 3.0.1.9 90/11/10 01:53:26 lwall Locked $
BJ> Patch level: 41
BJ> , 0foo bar0, baz, , frob, 0la la la0, quux

BJ>   Why is there a null entry between baz and frob in the returned array?

Perl apparently always returns an entry for the parenthesized part of the
pattern with each separator it finds, even if that particular separator
matched without using the part in parentheses (this is the same as with
normal regexp matching). So if the separator is a plain space, the part
between parentheses comes back as the null string. I think it would be
nicer if perl left the entry out in this case, but that might be too
difficult. It would also make split behave differently from $1, $2,...
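
A smaller case may make this easier to see. The sketch below is not from
the original script: the pattern /(,)|;/ and the string 'a,b;c' are made
up for illustration, and it assumes a current Perl 5, where a group that
took no part in the match comes back as undef (which prints as a null
entry).

```perl
use strict;
use warnings;

# Hypothetical example: ',' is matched inside the parentheses, ';' is not.
my @parts = split(/(,)|;/, 'a,b;c');

# Every separator match adds one entry for the group, set or not.
my @shown = map { defined($_) ? "[$_]" : '[unset]' } @parts;
print join(' ', @shown), "\n";    # prints: [a] [,] [b] [unset] [c]
```

The ';' separator still contributes an entry for the group, which is the
same thing that happens with the plain space between baz and frob.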

The solution is simply to weed out the null entries, provided there can't
be any legal null entries in the data. (That is not quite true of the
example above: with a single space as separator, two successive spaces
give a null entry, but that could be a mistake.) Anyway, the following
might be what you mean:

@_ = grep(!/^$/, 
	split(/ *(0[^0]+0) *| +/, '0foo bar0 baz frob 0la la la0 quux'));
print join('|', @_),"\n";
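
On a current Perl 5 run with warnings enabled, the !/^$/ test is likely to
complain about uninitialized values, since the unused group comes back as
undef there. A sketch of the same filter that checks for that first is
below; the variable names $input and @fields are only for illustration.

```perl
use strict;
use warnings;

my $input = '0foo bar0 baz frob 0la la la0 quux';

# Same pattern as above; keep only entries that are defined and non-empty.
my @fields = grep { defined($_) && length($_) }
             split(/ *(0[^0]+0) *| +/, $input);

print join('|', @fields), "\n";   # prints: 0foo bar0|baz|frob|0la la la0|quux
```

Note that either filter also drops the empty leading entry that split
returns when the string starts with a separator.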
-- 
Piet* van Oostrum, Dept of Computer Science, Utrecht University,
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31 30 531806   Uucp:   uunet!mcsun!ruuinf!piet
Telefax:   +31 30 513791   Internet:  piet@cs.ruu.nl   (*`Pete')