[comp.editors] repeated character editing

cadman@cbnewsm.att.com (jerome.schwartz) (03/29/91)

How would I do a simple edit command, vi map or sed script, to
replace all occurrences of repeated characters in a file with one
of each of the characters?

ie:   This:   AAAAPPPPLLLLEEEE   ppppiiiieeee
   to this:   APPLE pie

Thanks in advance,
Jerry
**********************************************************************
Jerome Schwartz               | Standard Disclaimer:
AT&T Bell Laboratories        |
Crawfords Corner Rd.          | Views expressed are my own and
Holmdel, N.J.  07733          | do not necessarily reflect those
Phone  : (201)-949-8560       | of my employer.
**********************************************************************

FFAAC09@cc1.kuleuven.ac.be (Nicole Delbecque & Paul Bijnens) (03/29/91)

In article <1991Mar28.181335.7813@cbnewsm.att.com>, cadman@cbnewsm.att.com
(jerome.schwartz) says:
>How would I do a simple edit command, vi map or sed script, to
>replace all occurrences of repeated characters in a file with one
>of each of the characters.
>
>ie:   This:   AAAAPPPPLLLLEEEE   ppppiiiieeee
>   to this:   APPLE pie

try   (on SYSV)
     tr -s '[\001-\377]' '[\001-\377]' < input  > output

giving:
     APLE pie
      ^
      |
  one letter P!   This program does not know how to spell the words!
--
Polleke   (Paul Bijnens)
Linguistics dept., K. University Leuven, Belgium
FFAAC09@cc1.kuleuven.ac.be
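
For anyone wanting to reproduce the squeeze behaviour described above,
here is a minimal sketch.  It assumes a POSIX tr, where -s given a
single set squeezes runs of any character in that set.  The [:print:]
class is swapped in for the bracketed SYSV range in the post, and the
function name squeeze_runs is hypothetical.

```shell
# Hypothetical helper: squeeze every run of a repeated printable
# character (spaces included) down to one copy.
# LC_ALL=C keeps the character class byte-oriented.
squeeze_runs() {
    LC_ALL=C tr -s '[:print:]'
}

printf '%s\n' 'AAAAPPPPLLLLEEEE   ppppiiiieeee' | squeeze_runs
# prints: APLE pie
```

Because tr works on single characters with no notion of run length, it
cannot tell a doubled letter from an overstruck one, which is why the
second P is lost.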

cadman@cbnewsm.att.com (jerome.schwartz) (03/29/91)

In article <1991Mar28.181335.7813@cbnewsm.att.com>, cadman@cbnewsm.att.com (jerome.schwartz) writes:
> 
> How would I do a simple edit command, vi map or sed script, to
> replace all occurrences of repeated characters in a file with one
> of each of the characters.
> 
> ie:   This:   AAAAPPPPLLLLEEEE   ppppiiiieeee
>    to this:   APPLE pie
> 
> Thanks in advance,
> Jerry

The example should have been:

AAAAPPPPPPPPLLLLEEEE ppppiiiieeee

With 2 sets of 4 P's.

Still with no success,

Jerry

ruhtra@turing.toronto.edu (Arthur Tateishi) (03/30/91)

In article <1991Mar28.181335.7813@cbnewsm.att.com> cadman@cbnewsm.att.com (jerome.schwartz) writes:
>
>How would I do a simple edit command, vi map or sed script, to
>replace all occurrences of repeated characters in a file with one
>of each of the characters.
>
>ie:   This:   AAAAPPPPLLLLEEEE   ppppiiiieeee
>   to this:   APPLE pie

If your example was a typo and should have been: APLE pie
then the following reg-exp will work.
:s/\(.\)\1*/\1/g

-- 
Red Alert.
    -- Q, "Deja Q", stardate 43539.1
Arthur Tateishi                 g9ruhtra@zero.cdf.utoronto.edu
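
The same substitution can be tried outside the editor.  The sketch
below assumes a sed whose basic regular expressions accept a starred
back-reference (\1*); the function name collapse_runs is hypothetical.
Note that in vi the command as posted acts on the current line only;
giving it the range % (:%s/.../.../g) applies it to every line.

```shell
# Hypothetical helper: collapse each run of one repeated character
# to a single copy, line by line, using a BRE back-reference.
collapse_runs() {
    sed 's/\(.\)\1*/\1/g'
}

printf '%s\n' 'AAAAPPPPPPPPLLLLEEEE ppppiiiieeee' | collapse_runs
# prints: APLE pie
```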

FFAAC09@cc1.kuleuven.ac.be (Nicole Delbecque & Paul Bijnens) (03/30/91)

In article <1991Mar29.145724.5569@cbnewsm.att.com>, cadman@cbnewsm.att.com
(jerome.schwartz) says:
>
>In article <1991Mar28.181335.7813@cbnewsm.att.com>, cadman@cbnewsm.att.com
>(jerome.schwartz) writes:
>>
>> How would I do a simple edit command, vi map or sed script, to
>> replace all occurrences of repeated characters in a file with one
>> of each of the characters.
>>
>> ie:   This:   AAAAPPPPLLLLEEEE   ppppiiiieeee
>>    to this:   APPLE pie
>>
>> Thanks in advance,
>> Jerry
>
>The example should have been:
>
>AAAAPPPPPPPPLLLLEEEE ppppiiiieeee
>
>With 2 sets of 4 P's.
>
>Still with no success,

The slightly changed problem has the following solution:

   sed -e 's/\(.\)\1\1\1/\1/g' inputfile > outfile

This changes 4 repeats of the same character to 1 character.
It leaves all other characters (the space in your example) as is.
--
Polleke   (Paul Bijnens)
Linguistics dept., K. University Leuven, Belgium
FFAAC09@cc1.kuleuven.ac.be
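
If the repeat count is something other than 4, the fixed-count idea
can be parameterised.  This is only a sketch: it assumes a sed that
accepts an interval expression after a back-reference (\1\{k\}), and
the function name collapse_groups and its argument are hypothetical.

```shell
# Hypothetical helper: replace each group of N identical adjacent
# characters with one copy; N defaults to 4.
# \1\{k\} asks for k further copies of the captured character.
collapse_groups() {
    n=${1:-4}
    sed "s/\(.\)\1\{$((n - 1))\}/\1/g"
}

printf '%s\n' 'AAAAPPPPPPPPLLLLEEEE ppppiiiieeee' | collapse_groups 4
# prints: APPLE pie
```

A run whose length is not a multiple of N leaves its remainder in
place, so with N=4 a run of five A's comes out as AA.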

noraa@cbnewsk.att.com (aaron.l.hoffmeyer) (04/01/91)

In article <1991Mar30.024851.24414@jarvis.csri.toronto.edu> ruhtra@turing.toronto.edu (Arthur Tateishi) writes:
>In article <1991Mar28.181335.7813@cbnewsm.att.com> cadman@cbnewsm.att.com (jerome.schwartz) writes:
>>
>>How would I do a simple edit command, vi map or sed script, to
>>replace all occurrences of repeated characters in a file with one
>>of each of the characters.
>>
>>ie:   This:   AAAAPPPPLLLLEEEE   ppppiiiieeee
>>   to this:   APPLE pie
>
>If your example was a typo and should have been: APLE pie
>then the following reg-exp will work.
>:s/\(.\)\1*/\1/g
>


I've seen the original question asked several times in this newsgroup
in just the last six months.  Yes, there are many solutions to this
problem, using tr, sed, regular expressions, awk, perl, and so on.  But
some of the responses, and even the people asking the question, ignore
the situation that creates the problem.  The ONLY time I have ever seen
4 characters in place of one is when someone directs nroff output to a
file, then searches for the literal backspace characters (and
underscores, if present) and replaces them with nothing.  nroff uses
the trick of overstriking to embolden words (or underline them).  So
the simplest solution to this problem is not to create it in the first
place.  If you filter the nroff output through col with the -b option
(col is a standard UNIX command, I think; maybe it is just System V),
then you get plain ASCII output with no backspaces (or underscores and
backspaces) and no multiple occurrences of characters.

I can't recall if there is an "Often Asked Questions" posting in this
newsgroup, but if there is, I am sure this solution is in there.  If
there isn't such a posting for this group, maybe there should be and
this question should be included.

Aaron L. Hoffmeyer
TR@CBNEA.ATT.COM
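
The overstriking described above can be imitated without nroff.  The
sketch below is illustrative only: the sample strings are made up, the
helper names are hypothetical, and the sed filter is a rough stand-in
for col -b (it keeps the last character struck at each position), not
a replacement for it.

```shell
# Hypothetical helpers, for illustration only.
BS=$(printf '\b')   # a literal backspace character

# What the broken workflow does: delete the backspaces and keep
# every overstruck copy, so a bold "A" becomes "AAAA".
strip_backspaces_only() {
    tr -d '\b'
}

# Rough stand-in for "col -b": drop each character that is
# immediately followed by a backspace, keeping the last one struck.
strip_overstrikes() {
    LC_ALL=C sed "s/.$BS//g"
}

printf 'A\bA\bA\bAP\bP\bP\bP\n' | strip_backspaces_only   # AAAAPPPP
printf 'A\bA\bA\bAP\bP\bP\bP\n' | strip_overstrikes       # AP
printf '_\bp_\bi_\be\n' | strip_overstrikes               # pie
```

Where col is installed, piping the nroff output through col -b, as
the post recommends, remains the simpler route.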