jmc@eagle.inesc.pt (Miguel Casteleiro) (01/29/91)
Hi there!

I have some problems that I need to solve so I can finish some
perl scripts.

1) I need to split the following line:

     "This is a line ( test,with ugly typing."

into the array:

     ('"This','is','a','line','(','test,','with','ugly','typing."')

Please note the punctuation characters.
Is there a split pattern to do this?

2) I need to replace the word: 'teste'
   by the word: 'aebtecd'

In this replace operation the character 't' gets replaced by 'a'
and 'c' is appended to the word, and the character 's' is replaced
by 'b' and 'd' is appended to the word.
I need this strange replace to use a 7-bit sort to sort 8-bit
(ISO-8859-1) text.
What I need is to translate some characters by some others, and
for each character translated I need to append a character to the
string (something like: tr/ts/ab/cd/ :-).
Is there an easy way to do this?

3) Finally, is there an easy way to print numbers in the form:

     12345678.12 -> 12,345,678$12

Thanks for any help!
--
Miguel Casteleiro at INESC, Lisboa, Portugal.   "News: so many articles,
Email: jmc@eagle.inesc.pt                         so little time..."
Only Amiga
tchrist@convex.COM (Tom Christiansen) (01/30/91)
From the keyboard of jmc@eagle.inesc.pt (Miguel Casteleiro):
:Hi there!

Good morning! Do you folks know Perl in Portugal????

:I have some problems that I need to solve so I can finish some
:perl scripts.
:
:1) I need to split the following line:
:
:     "This is a line ( test,with ugly typing."
:
:into the array:
:
:     ('"This','is','a','line','(','test,','with','ugly','typing."')
:
:Please note the punctuation characters.
:Is there a split pattern to do this?

Well....... I can think of several ways off the top of my head:

0) You can split on /([\s,]+)/ to retain the delimiters, then run
   back through the array, merging the ones that are commas and
   tossing those that aren't.  Ug.

1) You could first simply split on white space, and then run back
   through the array looking for \S,\S and splitting those, but
   retaining the comma.  Kinda ug.

2) You can munge the data first to fix the ugly typing:

       s/,(\S)/, $1/g;

   and now split on white space as usual.  This seems best to me
   of these three approaches.

:2) I need to replace the word: 'teste'
:   by the word: 'aebtecd'
:
:In this replace operation the character 't' gets replaced by 'a'
:and 'c' is appended to the word, and the character 's' is replaced
:by 'b' and 'd' is appended to the word.
:I need this strange replace to use a 7-bit sort to sort 8-bit
:(ISO-8859-1) text.
:What I need is to translate some characters by some others, and
:for each character translated I need to append a character to the
:string (something like: tr/ts/ab/cd/ :-).
:Is there an easy way to do this?

I wonder which 7-bit ASCII sort you're using, and why the existing
sorts don't work for 8 bits.  It's the collating sequence, right?

For your example, I did this:

    $_ = 'teste';
    $_ .= 'c' x s/t/a/g;
    $_ .= 'd' x s/s/b/g;
    print "result is $_\n";

or with a different variable:

    $foo = 'teste';
    $foo .= 'c' x ($foo =~ s/t/a/g);
    $foo .= 'd' x ($foo =~ s/s/b/g);
    print "result is $foo\n";

But that yields 'aebaeccd', not what you said you wanted.
Did you not want all the t's translated?  If you only want the first
one, remove the /g.

:3) Finally, is there an easy way to print numbers in the form:
:
:     12345678.12 -> 12,345,678$12

    $_ = '12345678.12';     # note quotes!!!

    s/\./\$/;
    1 while s/(.*\d)(\d{3})/$1,$2/;

    result -> "12,345,678$12"

or in euronotation:

    $_ = '12345678.12';
    s/\./\,/;
    1 while s/(.*\d)(\d{3})/$1.$2/;

    # result -> "12.345.678,12"

The quotes keep the number a string, so we don't have problems with it
being converted to floating point (and possibly printed in exponent
notation).  It would also help to normalize the value first:

    $_ = sprintf("%10.2f", $_);     # discard boring bits

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el?  Now
 he can finally have the whole O/S built-in to his editor like he
 always wanted!" --me (Tom Christiansen <tchrist@convex.com>)
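A quick way to check Tom's three answers is to wrap them in subroutines and run them on the examples from the original question. This is a sketch in modern Perl (strict, lexical variables), not Tom's code verbatim: the names split_ugly, tom_teste and commify are made up for this illustration, and commify normalizes with "%.2f" instead of Tom's "%10.2f" so the result carries no leading padding.

```perl
use strict;
use warnings;

# Approach (2): put a space after any comma glued to the next
# character, then split on whitespace as usual.
sub split_ugly {
    my ($line) = @_;
    $line =~ s/,(\S)/, $1/g;
    return split ' ', $line;    # ' ' splits on runs of whitespace
}

# The 'teste' code, spelled out: s///g returns how many characters it
# replaced, and 'c' x N appends that many markers.  All the c's are
# appended before the d's, which is why the result is 'aebaeccd'.
sub tom_teste {
    my ($word) = @_;
    my $t_count = ($word =~ s/t/a/g);
    $word .= 'c' x $t_count;
    my $s_count = ($word =~ s/s/b/g);
    $word .= 'd' x $s_count;
    return $word;
}

# Commify: replace the decimal point with $decimal, then keep inserting
# $group before the last three digits that still have a digit in front.
sub commify {
    my ($num, $group, $decimal) = @_;
    my $s = sprintf("%.2f", $num);    # two decimals, no padding
    $s =~ s/\./$decimal/;
    1 while $s =~ s/(.*\d)(\d{3})/$1$group$2/;
    return $s;
}

print join('|', split_ugly('"This is a line ( test,with ugly typing."')), "\n";
print tom_teste('teste'), "\n";
print commify('12345678.12', ',', '$'), "\n";
print commify('12345678.12', '.', ','), "\n";
```

The split output matches the array Miguel asked for, and the two commify calls give the US-style and euronotation strings shown above.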
jmc@eagle.inesc.pt (Miguel Casteleiro) (01/31/91)
In article <1991Jan29.171228.17738@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>From the keyboard of jmc@eagle.inesc.pt (Miguel Casteleiro):
>:Hi there!
>
>Good morning! Do you folks know Perl in Portugal????

(Small correction: in Portuguese that's "em Portugal", not "no Portugal".)
Yes, and a few other little things besides!

>:1) I need to split the following line:
>:
>:     "This is a line ( test,with ugly typing."
>:
>:into the array:
>:
>:     ('"This','is','a','line','(','test,','with','ugly','typing."')
>:
> [some ways of doing it deleted]
>
>2) You can munge the data first to fix the ugly typing:
>       s/,(\S)/, $1/g;
>   and now split on white space as usual.  This seems best to me
>   of these three approaches.

I'll use this approach.  It seems to work fine.  Thanks!

>:2) I need to replace the word: 'teste'
>:   by the word: 'aebtecd'
>:
>:In this replace operation the character 't' gets replaced by 'a'
>:and 'c' is appended to the word, and the character 's' is replaced
>:by 'b' and 'd' is appended to the word.
>:I need this strange replace to use a 7-bit sort to sort 8-bit
>:(ISO-8859-1) text.
>:What I need is to translate some characters by some others, and
>:for each character translated I need to append a character to the
>:string (something like: tr/ts/ab/cd/ :-).
>:Is there an easy way to do this?
>
>I ask myself which 7-bit ascii sort you're using, and why the existing
>sorts don't work for 8-bits.  It's the collating sequence, right?

I've tried both 7-bit and 8-bit sorts, and neither does what I want!
The example I gave was incorrect, sorry :-(
What I want is to replace the word 'teste' by the word 'aebaecdc'.

Let me explain better what I want.  Say that "A" is an "a" with a grave
accent and "B" is an "a" with an acute accent.  Then the sorting order
will be (at least for the Portuguese):

    a A B b c d e f ...

So the following words are in sorted order:

    Aac
    abc
    Abc
    acbd

Please note that "a" = "A" = "B" when the words differ in some other
character; only if they are otherwise identical does "a" < "A" < "B"
decide the order.
To accomplish this, the best way I can think of is to replace:

    "a" by "a" and append "a"
    "A" by "a" and append "b"
    "B" by "a" and append "c"

So a 7-bit sort will see the previous words as:

    aacba
    abca
    abcb
    acbda

and will sort them properly.  The code I'm using to do this is:

    $word = "Aac";
    $_ = $word;
    tr/aAB/aaa/;        # primary key: fold a, A and B to "a"
    $sword = $_;
    $_ = $word;
    tr/aAB//cd;         # delete everything except a, A and B
    tr/aAB/abc/;        # rank them: a -> a, A -> b, B -> c
    $sword .= $_;

and $sword will be 'aacba'.

If there is an easy way to do this, please let me know.  Also, if someone
can think of a better way to sort 8-bit text, please let me know.

> [ A solution for the 'teste' example deleted ]
>
>:3) Finally, is there an easy way to print numbers in the form:
>:
>:     12345678.12 -> 12,345,678$12
>
>    $_ = '12345678.12';     # note quotes!!!
>
>    s/\./\$/;
>    1 while s/(.*\d)(\d{3})/$1,$2/;
>
>    result -> "12,345,678$12"

I'll use this code.  Thanks!

> [ A solution for the euronotation deleted ]
>
>--tom
> always wanted!" --me (Tom Christiansen <tchrist@convex.com>)
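Miguel's key construction can be tried on his four example words with a small Perl sketch. As in the post, "A" and "B" stand in for the accented letters (a real script would list the ISO-8859-1 characters instead), and sort_key is a hypothetical name. The key is the folded word followed by the rank letters of a, A and B in the order they occur in the word.

```perl
use strict;
use warnings;

# "A" and "B" are stand-ins for a-grave and a-acute, as in the post.
sub sort_key {
    my ($word) = @_;
    (my $primary   = $word) =~ tr/aAB/aaa/;   # fold all three to plain "a"
    (my $secondary = $word) =~ tr/aAB//cd;    # keep only a, A and B
    $secondary =~ tr/aAB/abc/;                # rank them: a < A < B
    return $primary . $secondary;
}

my @words  = qw(acbd Abc abc Aac);
my @sorted = sort { sort_key($a) cmp sort_key($b) } @words;
print "@sorted\n";
```

Sorting on these keys should print "Aac abc Abc acbd", the order Miguel asked for.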
raymond@math.berkeley.edu (Raymond Chen) (01/31/91)
In article <1991Jan30.181924.47@eagle.inesc.pt>, jmc@eagle (Miguel Casteleiro) writes:
>[T]he sorting order will be (at least for the Portuguese):
>
>a A B b c d e f ...
>
>If there is an easy way to do this, please let me know.

# This is a standard trick.
# You only need to do this part once.
$portuguese_order = "aABbcdef";
$ascii_order = pack("c" . length($portuguese_order),
                    1 .. length($portuguese_order));
# tr/// does not interpolate variables, so the subs are built with eval.
eval 'sub port2sort { foreach (@_) { tr/'.$portuguese_order.'/'.$ascii_order.'/; } }';
eval 'sub sort2port { foreach (@_) { tr/'.$ascii_order.'/'.$portuguese_order.'/; } }';

# and here's how you use it:
@words = ("a", "A", "B", "c", "b");
&port2sort(@words);                # convert to intermediate format
@sorted_words = sort @words;       # sort the intermediate format
&sort2port(@sorted_words);         # convert back
print join(":", @sorted_words);

# Observe that a similar trick can be used to perform other types of sorting;
# for example, if you want digits to sort *after* letters, or if you want
# the letter "p" to alphabetize before the letter "h", like this:
for(sort("herl ","Just ","packer,","anotper ")){y/Jahp/Japh/;print;}
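Raymond's trick can also be written as a key function, which avoids both the eval and converting the words in place. The sketch below is an adaptation, not Raymond's code: it swaps the eval'd tr/// for a hash lookup, and port_key and port_sort are hypothetical names. Running it on Miguel's four words shows one property worth knowing: a single transliterated key ranks "A" above "a" at every position, so the words come out abc, acbd, Aac, Abc rather than in Miguel's tie-break order.

```perl
use strict;
use warnings;

# Give each letter of the custom alphabet a small code (chr 1, chr 2, ...)
# in the desired order, so ordinary string comparison follows that order.
# Characters not listed keep their own codes and so sort after these.
my $portuguese_order = "aABbcdef";
my %code;
@code{ split //, $portuguese_order } = map { chr($_) } 1 .. length($portuguese_order);

sub port_key {
    my ($word) = @_;
    return join '', map { exists $code{$_} ? $code{$_} : $_ } split //, $word;
}

sub port_sort {
    return sort { port_key($a) cmp port_key($b) } @_;
}

my @letters = port_sort("a", "A", "B", "c", "b");
my @words   = port_sort(qw(acbd Abc abc Aac));
print join(":", @letters), "\n";
print join(":", @words), "\n";
```

For the letters this reproduces Raymond's a:A:B:b:c; getting Miguel's "accents only break ties" rule still needs a secondary key like the one in his follow-up.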
ccplumb@rose.uwaterloo.ca (Colin Plumb) (02/02/91)
So you start out with a key string, with some magic letters.  You want
to build a primary key with the magic letters replaced by other letters
and the non-magic ones retained, and a secondary key with the magic
letters replaced by others and the non-magic letters deleted.

This is easy, with tr/.../.../:

    $primary = $secondary = $_;
    $primary =~ tr/<magic>/<primary-replace>/;
    $secondary =~ tr/<non-magic>//d;
    $secondary =~ tr/<magic>/<secondary-replace>/;
    $key = $primary . $secondary;

Will that do what you want?

Note: it might be a good idea to ensure that the first character of the
secondary key sorts lower than any possible letter in the primary, with
an explicit delimiter (space works well, as does null...) if necessary.
This matters because (if uppercase letters are magic and their
non-magic replacements are lowercase)

    aBcda -> abcdab
    aBcd  -> abcdb

These sort in the order abcdab, abcdb, which is equivalent to aBcda,
aBcd, which isn't what you eventually want if I understand the problem
correctly.
--
-Colin
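Colin's delimiter point can be checked directly with his aBcda/aBcd example. A minimal Perl sketch, assuming (as in his example) that uppercase letters are the magic ones and that both the primary and the secondary mapping simply lowercase them; colin_key and sort_with are hypothetical names.

```perl
use strict;
use warnings;

# Uppercase letters are "magic": the primary key folds them to lowercase,
# the secondary key keeps only them (lowercased), and $delim goes between.
sub colin_key {
    my ($word, $delim) = @_;
    (my $primary   = $word) =~ tr/A-Z/a-z/;
    (my $secondary = $word) =~ tr/A-Z//cd;    # delete all non-magic letters
    $secondary =~ tr/A-Z/a-z/;
    return $primary . $delim . $secondary;
}

sub sort_with {
    my ($delim, @words) = @_;
    return sort { colin_key($a, $delim) cmp colin_key($b, $delim) } @words;
}

my @no_delim   = sort_with('',  qw(aBcd aBcda));
my @with_space = sort_with(' ', qw(aBcd aBcda));
print "@no_delim\n";
print "@with_space\n";
```

Without a delimiter the keys abcdab and abcdb put aBcda first; with a space between the two parts the shorter word sorts first. A space only works if no word contains a character that sorts below it; a null byte ("\0") avoids that restriction.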
ccplumb@rose.uwaterloo.ca (Colin Plumb) (02/16/91)
jmc@eagle.inesc.pt (Miguel Casteleiro) wrote:
>Hi there!
>
>I have some problems that I need to solve so I can finish some
>perl scripts.
>
>1) I need to split the following line:
>
>     "This is a line ( test,with ugly typing."
>
>into the array:
>
>     ('"This','is','a','line','(','test,','with','ugly','typing."')
>
>Please note the punctuation characters.
>Is there a split pattern to do this?

Well, it depends on details, but define a word as \w+, whitespace as
\s+, and punctuation as [^\w\s]+.  Assuming you want to split after
punctuation and before words, even if there is no explicit space, just
add one and split as usual:

    s/([^\w\s]+)(\w)/$1 $2/g;
    @words = split;

>2) I need to replace the word: 'teste'
>   by the word: 'aebtecd'
>
>In this replace operation the character 't' gets replaced by 'a'
>and 'c' is appended to the word, and the character 's' is replaced
>by 'b' and 'd' is appended to the word.
>I need this strange replace to use a 7-bit sort to sort 8-bit
>(ISO-8859-1) text.
>What I need is to translate some characters by some others, and
>for each character translated I need to append a character to the
>string (something like: tr/ts/ab/cd/ :-).
>Is there an easy way to do this?

Yes.

    $suffix = $_;
    $suffix =~ tr/ts//cd;   # delete anything other than t and s
    # ^^ This 'cd' has *nothing* to do with the one below!
    $suffix =~ tr/ts/cd/;   # map t and s to c and d
    tr/ts/ab/;
    $_ .= $suffix;
--
-Colin
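Both of Colin's answers can be exercised on the original examples. The sketch below uses lexical variables and $1/$2 in the replacement; colin_split and colin_teste are hypothetical names. Note that under Colin's rule the leading double quote is split off as its own field, so the result differs from the array Miguel asked for in that one place, while the tr/// suffix trick yields 'aebaecdc', the corrected target from Miguel's follow-up.

```perl
use strict;
use warnings;

# Insert a space between a run of punctuation and a following word
# character, then split on whitespace.
sub colin_split {
    my ($line) = @_;
    $line =~ s/([^\w\s]+)(\w)/$1 $2/g;
    return split ' ', $line;
}

# The suffix gets one marker per t or s, in the order they occur.
sub colin_teste {
    my ($word) = @_;
    (my $suffix = $word) =~ tr/ts//cd;   # keep only t and s
    $suffix =~ tr/ts/cd/;                # t -> c, s -> d
    $word =~ tr/ts/ab/;                  # t -> a, s -> b
    return $word . $suffix;
}

print join('|', colin_split('"This is a line ( test,with ugly typing."')), "\n";
print colin_teste('teste'), "\n";
```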