[comp.emacs] Removal of non-printable characters

aks%anywhere@HUB.UCSB.EDU (Alan Stebbens) (05/02/91)

> I have got a question for you emacs experts: I have some files which
> contain bunch of non-printable characters, such as Control-L (served
> as page breaker), Control-M (line breaker). How do I remove all of
> them from the current buffer if I edit such a file in Emacs? I could,
> in an awkward way, use "search" or "query-replace" function to find
> and kill those characters one by one, but it's very slow (even though
> I can define a little macro to improve the efficiency).  Can you offer
> me any smarter way?

Try:

M-x query-replace-regexp [ C-q C-a - C-q C-h C-q C-k C-q C-l C-q C-n - C-q C-] ] RET RET !

In the above sequence, no spaces are typed, "C-q" quotes the control
character that follows so that it is inserted literally, and "RET"
means pressing the "Enter" key on most keyboards.

The string between the brackets is a regular expression which matches
the character class from Ctrl-A to Ctrl-H, Ctrl-K, Ctrl-L, and Ctrl-N
thru Ctrl-].  The null replacement string means to replace them with
nothing.  This character class regexp specifically excludes Ctrl-I
(TAB), Ctrl-J (NEWLINE), and Ctrl-M (RETURN), all of which are valid
control characters in Unix ASCII text files.  Emacs uses Ctrl-M as a
"hidden line" terminator, in place of the usual NEWLINE.

You may also want to leave Ctrl-L out of the regexp if you like to
keep "form feeds" in your files; these, too, are useful characters in
the file.
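
For anyone who wants to check exactly which characters that class
covers, here is a rough sketch of the same character class outside
Emacs, written in Python.  It is only an illustration, not part of the
Emacs recipe; the names strip_control_chars and keep_form_feed are
made up for this example.

```python
import re

# Same class as in the recipe above:
#   Ctrl-A..Ctrl-H (0x01-0x08), Ctrl-K (0x0B), Ctrl-L (0x0C),
#   Ctrl-N..Ctrl-] (0x0E-0x1D).
# TAB (0x09), NEWLINE (0x0A) and RETURN (0x0D) are deliberately left out.
CONTROL_CHARS = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1d]")

# Variant that also leaves form feeds (Ctrl-L, 0x0C) alone.
CONTROL_CHARS_KEEP_FF = re.compile(r"[\x01-\x08\x0b\x0e-\x1d]")


def strip_control_chars(text, keep_form_feed=False):
    """Delete the matched control characters (hypothetical helper)."""
    pattern = CONTROL_CHARS_KEEP_FF if keep_form_feed else CONTROL_CHARS
    return pattern.sub("", text)


if __name__ == "__main__":
    sample = "page one\x0cpage\x01 two\tstill here\n"
    print(repr(strip_control_chars(sample)))
    print(repr(strip_control_chars(sample, keep_form_feed=True)))
```

Note that, like the recipe, this class stops at Ctrl-], so anything
above it is left untouched.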

Typing "!" at the first match found tells Emacs to replace all the
remaining occurrences in the file without asking.  Of course, if you
want to watch each replacement, just hit the spacebar for each change.

Do "C-h f query-replace-regexp" to read the documentation.

Good luck.

Alan Stebbens        <aks@hub.ucsb.edu>             (805) 893-3221
     Center for Computational Sciences and Engineering (CCSE)
          University of California, Santa Barbara (UCSB)
           3111 Engineering I, Santa Barbara, CA 93106