bjaspan@athena.mit.edu (Barr3y Jaspan) (02/28/91)
I started off with the goal of splitting a string on whitespace without
breaking quoted strings and ended up coming across the following split
behavior that I do not understand. This script

print "$]\n";
@_ = split(/ *(0[^0]+0) *| /, '0foo bar0 baz frob 0la la la0 quux');
print join(', ', @_),"\n";

produces the following output:

$Header: perly.c,v 3.0.1.9 90/11/10 01:53:26 lwall Locked $
Patch level: 41
, 0foo bar0, baz, , frob, 0la la la0, quux

(Think of 0 as a quotation mark and it will make more sense. I didn't want
to deal with escaping ".)

Why is there a null entry between baz and frob in the returned array?

Thanks.

Barr3y
piet@cs.ruu.nl (Piet van Oostrum) (03/01/91)
>>>>> bjaspan@athena.mit.edu (Barr3y Jaspan) (BJ) writes:
BJ> print "$]\n";
BJ> @_ = split(/ *(0[^0]+0) *| /, '0foo bar0 baz frob 0la la la0 quux');
BJ> print join(', ', @_),"\n";
BJ> produces the following output:
BJ> $Header: perly.c,v 3.0.1.9 90/11/10 01:53:26 lwall Locked $
BJ> Patch level: 41
BJ> , 0foo bar0, baz, , frob, 0la la la0, quux
BJ> Why is there a null entry between baz and frob in the returned array?
Perl apparently always puts a 'separator' entry between the split entities,
even if the separator matched through an alternative that lies outside the
parentheses (normal regexp matching behaves the same way). So if the
separator is a plain space, the part between parentheses contributes a null
string. I think it would be nicer if perl didn't include the separator
entry in this case, but that might be too difficult. It would also make
split behave differently from $1, $2, ...
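
The effect is easier to see on a smaller case. The following is only a
sketch and assumes a Perl 5 interpreter, where a group that took no part in
the match is reported as undef rather than as an empty string; the pattern
and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Plain match: the second alternative (a bare space) matches, so the
# parenthesized group takes no part and $1 is left undefined.
my $plain = ('a b' =~ /(x)| /) ? (defined $1 ? "'$1'" : 'undef') : 'no match';

# split with the same pattern still returns one field per group for every
# separator, so an undefined field appears between 'a' and 'b'.
my @fields = split(/(x)| /, 'a b');
my $shown  = join(', ', map { defined $_ ? "'$_'" : 'undef' } @fields);

print "plain match: \$1 is $plain\n";
print "split: $shown\n";
```

Joining such a list with ', ' prints the undefined field as an empty
string, which is where the ", ," in the output above comes from.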
The solution is simply to weed out the null entries, provided the data
cannot contain any legitimate null entries. (The above example does not
meet that condition: two successive spaces produce a null entry, but that
could be a mistake.) Anyway, the following might be what you mean:
@_ = grep(!/^$/,
split(/ *(0[^0]+0) *| +/, '0foo bar0 baz frob 0la la la0 quux'));
print join('|', @_),"\n";
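
For a Perl 5 interpreter run with warnings enabled, the same idea might be
sketched as below. The helper name split_quoted is hypothetical, and the
filter tests for undef explicitly, on the assumption that fields for
unused groups come back as undef there.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of spaces, keeping 0...0 groups whole.
sub split_quoted {
    my ($text) = @_;
    # Fields for groups that took no part in the match are undef, and a
    # separator at the start of the string yields a leading empty field;
    # both kinds of entry are dropped here.
    return grep { defined $_ && length $_ }
           split(/ *(0[^0]+0) *| +/, $text);
}

# Note the two successive spaces between baz and frob.
my @words = split_quoted('0foo bar0 baz  frob 0la la la0 quux');
print join('|', @words), "\n";
```

This should print 0foo bar0|baz|frob|0la la la0|quux, the double space
being absorbed by the " +" alternative.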
--
Piet* van Oostrum, Dept of Computer Science, Utrecht University,
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31 30 531806 Uucp: uunet!mcsun!ruuinf!piet
Telefax: +31 30 513791 Internet: piet@cs.ruu.nl (*`Pete')