markb@agora.uucp (Mark Biggar) (07/11/90)
As Larry said, the simplest way to write a tokenizer in perl is to use
s/^...// to chop a token off the front of your string. With that in mind,
the following are a set of useful regular expressions for this purpose:

    m|/\*[^*]*\*+([^/][^*]*\*+)*/|
        matches just the first C-style non-nested comment on the line

    /"[^"\\]*(\\.[^"\\]*)*"/
        matches just the first " string on the line, with \ escapes

    /("[^"]*")+/
        matches just the first string on the line, Ada style

You can use the following to translate C-style \ escapes in a string
matched by the second RE above. NOTE: the order of the alternatives in
the RE below is significant.

    s/\\(([0-7]{1,3})|x([\da-fA-F]+)|(.))/&trans($2,$3,$4)/eg;

    sub trans {
        local($oct,$hex,$single) = @_;
        if ($oct ne '') {
            pack("c",oct($oct));
        } elsif ($hex ne '') {
            pack("c",hex($hex));
        } else {
            # singleton case must have matched if the others didn't
            substr($trans,ord($single),1);
            # def of $trans left as exercise for reader :-)
        }
    }

--
Perl's Maternal Uncle
Mark Biggar
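
For anyone who wants to see the pieces fit together, here is a minimal
sketch of the s/^...// approach, using the comment and string REs from
the post anchored at the front of the input. It is only an illustration
under several assumptions: it is written in present-day Perl 5 (my, hashes
of references), the names tokenize, unescape and %escape are hypothetical,
the token kinds are made up for the example, and the %escape hash is just
one possible stand-in for the $trans table the post leaves undefined.

```perl
use strict;
use warnings;

# Hypothetical escape table: character after the backslash => replacement.
# Anything not listed stands for itself (so \" gives " and \\ gives \).
my %escape = (
    n => "\n", t => "\t", r => "\r", b => "\b",
    f => "\f", a => "\a", v => "\x0b",
);

# Translate C-style backslash escapes in the body of a string literal.
# Alternative order matters: octal first, then hex, then any single char.
sub unescape {
    my ($s) = @_;
    $s =~ s/\\(?:([0-7]{1,3})|x([\da-fA-F]+)|(.))/
          defined($1)          ? chr(oct($1) & 0xFF)   # assumption: keep low byte
        : defined($2)          ? chr(hex($2) & 0xFF)
        : exists($escape{$3})  ? $escape{$3}
        :                        $3/egs;
    return $s;
}

# Chop one token off the front of $src per iteration using s/^...//.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    while ($src ne '') {
        if ($src =~ s/^\s+//) {
            next;                                   # skip whitespace
        } elsif ($src =~ s|^/\*[^*]*\*+(?:[^/][^*]*\*+)*/||) {
            next;                                   # skip C-style comment
        } elsif ($src =~ s/^"([^"\\]*(?:\\.[^"\\]*)*)"//s) {
            push @tokens, [STRING => unescape($1)];
        } elsif ($src =~ s/^([A-Za-z_]\w*)//) {
            push @tokens, [IDENT => $1];
        } elsif ($src =~ s/^(\d+)//) {
            push @tokens, [NUMBER => $1];
        } elsif ($src =~ s/^(.)//s) {
            push @tokens, [PUNCT => $1];            # always matches: loop ends
        }
    }
    return @tokens;
}

# Single quotes keep the backslashes literal, so the tokenizer sees them.
my @demo = tokenize('x = "a\tb\x41\"q" /* note **/ 42;');
print join(' ', map { "$_->[0]($_->[1])" } @demo), "\n";
```

The final catch-all branch consumes one character no matter what, so the
loop cannot spin on input that none of the other patterns recognize.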