[comp.emacs] ALL CAPS-->Mixed case converter?

science@nems.dt.navy.mil (Mark Zimmermann) (09/01/90)

does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
LETTER TEXT into nice mixed-case text?  There are obviously many
degrees of sophistication in doing this; I'd like to be able to give
the system a file of words to be capitalized and/or words to be
lowercased and/or words to be all caps (NASA, USA, etc.), to
customize, and it should try to find and capitalize the first word of
each sentence.  Any suggestions??  Is this better done in C or awk
than in Emacs??  Tnx for help! - ^z - science@nems.dt.navy.mil

mdb@ESD.3Com.COM (Mark D. Baushke) (09/02/90)

On 1 Sep 90 13:06:34 GMT, in article <3245@nems.dt.navy.mil> posted
to comp.emacs, science@nems.dt.navy.mil (Mark Zimmermann) writes:

Mark> does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
Mark> LETTER TEXT into nice mixed-case text?  There are obviously many
Mark> degrees of sophistication in doing this; I'd like to be able to give
Mark> the system a file of words to be capitalized and/or words to be
Mark> lowercased and/or words to be all caps (NASA, USA, etc.), to
Mark> customize, and it should try to find and capitalize the first word of
Mark> each sentence.  Any suggestions??  Is this better done in C or awk
Mark> than in Emacs??  Tnx for help! - ^z - science@nems.dt.navy.mil

The perl script after my .signature was posted to comp.lang.perl some
time ago. With a little work it should be possible to add an exception
list for some words like NASA, USA, etc. to be output in all caps.

Enjoy!
-- 
Mark D. Baushke
mdb@ESD.3Com.COM

#!/usr/bin/perl

# This program copies its input to STDOUT, converting uppercase characters
# to lowercase, except the first letter of each sentence and the word 'I'.

# DEFINITION: a 'sentence' begins with an alphanumeric and ends at the end
# of the input file or at the first terminator (period, question mark, or 
# exclamation point) which is followed by white space.

    $/ = "\177";        # do not split the input into lines
    $_ = <>;            # read the entire input

    s/(\w)(([^.?!I]+|[.?!]+\S|\BI|I\B)+)/($a=$2)=~tr|A-Z|a-z|,$1.$a/eg;

    print;              # print the results

smithln@honeydew.cs.rochester.edu (Neil Smithline) (09/05/90)

In article <3245@nems.dt.navy.mil>, science@nems (Mark Zimmermann) writes:
>does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
>LETTER TEXT into nice mixed-case text?  There are obviously many
>degrees of sophistication in doing this; I'd like to be able to give
>the system a file of words to be capitalized and/or words to be
>lowercased and/or words to be all caps (NASA, USA, etc.), to
>customize, and it should try to find and capitalize the first word of
>each sentence.  Any suggestions??  Is this better done in C or awk
>than in Emacs??  Tnx for help! - ^z - science@nems.dt.navy.mil

Good idea - just wrote a function to do it.  The function FIX-CASE
takes two arguments or the region and fixes the case.  It also will
make the words of FIX-CASE-SPECIALS appear as they do in the list.  I
have included only the word "I" in the list - other good ideas would
be your name, organization, etc...

Feel free to mail mail me any comments/suggestions/bugs - Neil
================================================================
;;;
;;; Fix the case of a region of text.
;;; Capitalizes first word of sentences leaving all other words lowercase.
;;; fix-case-specials is a list of words (strings with no whitespace)
;;; that are to be special cased.
;;; Kinda disgusting lisp code but...
;;; Neil Smithline - Tue Sep  4 15:26:35 1990
;;; 

(defvar fix-case-specials '("I")
  "Special words to deal with.")

(defun fix-case (start end)
  "Fix the case of text between point and mark or START and END if
called from a program."
  (interactive "r")
  (save-excursion
    (downcase-region start end)		; start with lowercase region
    (goto-char start)			; skip initial whitespace
    (skip-chars-forward " \C-i\C-j" end)
    (setq start (point))
    ;; Handle first sentence specially to detect sentence beginning at
    ;; word at buffer position START
    (let ((cur (point)))
      (forward-char 1)
      (backward-sentence 1)
      (when (= cur (point))		; first char should be capitalized
	(capitalize-word 1)))
    (goto-char start)			; goto start of next sentence
    (forward-sentence 1)
    (let (cur)
      (while (< (point) end)		; loop through region
	(setq cur (point))		; capitalizing each sentence start
	(capitalize-word 1)
	(goto-char cur)
	(forward-sentence 1)
	(skip-chars-forward " \C-i\C-j" end)))
    (save-restriction
      (narrow-to-region start end)    ; now fix FIX-CASE-SPECIALS
      (let ((case-fold-search t)
	    (case-replace t)
	    (specials fix-case-specials))
	(while specials			; loop through list of specials
	  (goto-char (point-min))
	  (replace-string (car specials) (car specials) t)
	  (setq specials (cdr specials)))))))
-- 
========================================================================
Neil Smithline
ARPA:    smithln@cs.rochester.edu
UUCP:    ..!rutgers!rochester!smithln 
Mail:    CS Dept., University of Rochester, Rochester NY 14627
-- 
========================================================================
Neil Smithline
ARPA:    smithln@cs.rochester.edu
UUCP:    ..!rutgers!rochester!smithln 

inc@tc.fluke.COM (Gary Benson) (09/12/90)

In article <MDB.90Sep1111521@kosciusko.ESD.3Com.COM> mdb@ESD.3Com.COM (Mark D. Baushke) writes:

MarkB: On 1 Sep 90 13:06:34 GMT, in article <3245@nems.dt.navy.mil> posted
MarkB: to comp.emacs, science@nems.dt.navy.mil (Mark Zimmermann) writes:

MarkZ:  does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
MarkZ:  LETTER TEXT into nice mixed-case text?  There are obviously many
MarkZ:  degrees of sophistication in doing this; I'd like to be able to give
MarkZ:  the system a file of words to be capitalized and/or words to be
MarkZ:  lowercased and/or words to be all caps (NASA, USA, etc.), to
MarkZ:  customize, and it should try to find and capitalize the first word of
MarkZ:  each sentence.  Any suggestions??  Is this better done in C or awk
MarkZ:  than in Emacs??  Tnx for help! - ^z - science@nems.dt.navy.mil

MarkB: The perl script after my .signature was posted to comp.lang.perl some
MarkB: time ago. With a little work it should be possible to add an exception
MarkB: list for some words like NASA, USA, etc. to be output in all caps.

#!/usr/bin/perl

# This program copies its input to STDOUT, converting uppercase characters
# to lowercase, except the first letter of each sentence and the word 'I'.

# DEFINITION: a 'sentence' begins with an alphanumeric and ends at the end
# of the input file or at the first terminator (period, question mark, or 
# exclamation point) which is followed by white space.

    $/ = "\177";        # do not split the input into lines
    $_ = <;            # read the entire input

    s/(\w)(([^.?!I]+|[.?!]+\S|\BI|I\B)+)/($a=$2)=~tr|A-Z|a-z|,$1.$a/eg;

    print;              # print the results

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

We had a similar need, but with a slightlight different twist. We needed to convert from:

    A LINE THAT LOOKED LIKE THIS

to:

    A Line That Looked Like This

Here's how we did the exception list, followed by the subroutine that
actually performed the substitution:

# SPECIALS
# words to translate by lookup - the left string must be lower-case
# the right string is the replacement, ie. any case (or word even).
$specials{"a"} = "a";
$specials{"an"} = "an";
$specials{"and"} = "and";
$specials{"as"} = "as";
$specials{"ascii"} = "ASCII";
$specials{"at"} = "at";
$specials{"but"} = "but";
$specials{"by"} = "by";
$specials{"for"} = "for";
$specials{"from"} = "from";
$specials{"in"} = "in";
$specials{"is"} = "is";
$specials{"it"} = "it";
$specials{"nor"} = "nor";
$specials{"of"} = "of";
$specials{"on"} = "on";
$specials{"onto"} = "onto";
$specials{"or"} = "or";
$specials{"out"} = "out";
$specials{"so"} = "so";
$specials{"to"} = "to";
$specials{"the"} = "the";
$specials{"with"} = "with";
$specials{"yet"} = "yet";

# INITIAL CAPITALS SUBROUTINE
# inputs 1 argument: a string, returns the string with initial-caps except
# words which appear in $specials are translated by look up.
sub initcaps {
    local($out,$in,$word,$lcw,$found);
    $in = $_[0];
    $out='';
    while (length($in) > 0) {
	$in =~ s/^( *)([^ ]*)//;
	$out .= $1;			# transfer the white space before words
	$word = $2;			# setup the next word
	($lcw=$word) =~ tr/A-Z/a-z/;	# make a lower-case version
 	$found = $specials{$lcw};	# is the word on the special list?
	if ($found && $out !~ /^ *$/) {	# exception word but not first in line
	    $word = $found;		# exception words as specified.
	} elsif ($word =~ /^[a-z]*$/i){ # word is only alphabetics?
	    $word =~ tr/a-z/A-Z/;	# make an upper-case version
	    $word = substr($word,0,1) . substr($lcw,1,999); # combine lc version
	    }
	$out .= $word;
	}
    $out;
    }
-- 
Gary Benson    -=[ S M I L E R ]=-   -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

Many a bum show has been saved by the flag.   -George M. Cohan

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/12/90)

Here's what Peter Yee uses to translate the NASA articles he posts.
No doubt he has a more up-to-date exception list by now.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

#!/bin/sh
: make a subdirectory, cd to it, and run this through sh.
echo 'If this kit is complete, "End of kit" will echo at the end'
echo Extracting unuc
sed >unuc <<'!STUFFY!FUNK!' -e 's/X//'
X#!/usr/bin/perl
X
Xprint STDERR "Loading proper nouns...\n";
Xopen(DICT,"/usr/dict/words") || die "Can't find /usr/dict/words: $!\n";
Xwhile (<DICT>) {
X    if (/^[A-Z]/) {
X	chop;
X	($lower = $_) =~ y/A-Z/a-z/;
X	$proper{$lower} = $_;
X    }
X}
Xclose DICT;
Xprint STDERR "Loading exceptions...\n";
X
Xopen(PATS,"unuc.pats") || die "Can't find unuc.pats: $!\n";
X
X$prog = <<'EOT';
Xwhile (<>) {
X    next if /[a-z]/;
X    y/A-Z/a-z/;
X    s/(\w+)/$proper{$1} ? $proper{$1} : $1/eg;
X    s/^(\s*)([a-z])/$1 . (($tmp = $2) =~ y:a-z:A-Z:,$tmp)/e;
X    s/([-.?!]["']?(\n\s*|  \s*)["']?)([a-z])/$1 . (($tmp = $3) =~ y:a-z:A-Z:,$tmp)/eg;
X    s/\b([b-df-hj-np-tv-xz]+)\b/(($tmp = $1) =~ y:a-z:A-Z:,$tmp)/eg;
X    s/([a-z])'([SDT])\b/$1 . "'" . (($tmp = $2) =~ y:A-Z:a-z:,$tmp)/eg;
XEOT
Xwhile (<PATS>) {
X    chop;
X    next if /^$/;
X    next if /^#/;
X    if (! /;$/) {
X	$foo = $_;
X	$foo =~ y/A-Z/a-z/;
X	print STDERR "Dup $_\n" if $proper{$foo};
X	$foo =~ s/([^\w ])/\\$1/g;
X	$foo =~ s/ /(\\s+)/g;
X	$foo = "\\b" . $foo if $foo =~ /^\w/;
X	$foo .= "\\b" if $foo =~ /\w$/;
X	$i = 0;
X	($bar = $_) =~ s/ /'$' . ++$i/eg;
X	$_ = "s/$foo/$bar/gi;";
X    }
X    $prog .= '    ' . $_ . "\n";
X}
Xclose PATS;
X$prog .= "}\ncontinue {\n    print;\n}\n";
X
X$/ = '';
X#print $prog;
Xeval $prog; die $@ if $@;
!STUFFY!FUNK!
echo Extracting unuc.pats
sed >unuc.pats <<'!STUFFY!FUNK!' -e 's/X//'
XA.M.
XAir Force
XAir Force Base
XAir Force Station
XAmerican
XApr.
XAriane
XAug.
XAugust
XBureau of Labor Statistics
XCIT
XCaltech
XCape Canaveral
XChallenger
XChina
XCorporation
XCrippen
XDaily News in Brief
XDaniel Quayle
XDec.
XDiscovery
XEdwards
XEndeavour
XFeb.
XFord Aerospace
XFri.
XGeneral Dynamics
XGeorge Bush
XHeadline News
XHOTOL
XI
XII
XIII
XIV
XIX
XInstitute of Technology
XJPL
XJan.
XJul.
XJun.
XKennedy Space Center
XLDEF
XLong Duration Exposure Facility
XLong March
XMar.
XMarch
XMartin
XMartin Marietta
XMercury
XMon.
Xin May
Xs/\bmay (\d)/May $1/g;
Xs/\boffice of (\w)/'Office of ' . (($tmp = $1) =~ y:a-z:A-Z:,$tmp)/eg;
XNational Science Foundation
XNASA Select
XNew Mexico
XNov.
XOMB
XOct.
XOffice of Management and Budget
XPresident
XPresident Bush
XRichard Truly
XRocketdyne
XRussian
XRussians
XSat.
XSep.
XSoviet
XSoviet Union
XSoviets
XSpace Shuttle
XSun.
XThu.
XTue.
XU.S.
XUnion of Soviet Socialist Republics
XUnited States
XVI
XVII
XVIII
XVice President
XVice President Quayle
XWed.
XWhite Sands
XKaman Aerospace
XAerospace Daily
XAviation Week
XSpace Technology
XWashington Post
XLos Angeles Times
XNew York Times
XAerospace Industries Association
Xpresident of
XJohnson Space Center
XSpace Services
XInc.
XCo.
XHughes Aircraft
XCompany
XOrbital Sciences
XSwedish Space
XArnauld
XNicogosian
XMagellan
XGalileo
XMir
XJet Propulsion Laboratory
XUniversity
XDepartment of Defense
XOrbital Science
XOMS
XUnited Press International
XUnited Press
XUPI
XAssociated Press
XAP
XCable News Network
XCape York
XZenit
XSYNCOM
XEastern
XWestern
XTest Range
XJcsat
XJapanese Satellite Communications
XDefence Ministry
XDefense Ministry
XSkynet
XFixed Service Structure
XLaunch Processing System
XAsiasat
XLaunch Control Center
XEarth
XCNES
XGlavkosmos
XPacific
XAtlantic
!STUFFY!FUNK!
echo ""
echo "End of kit"
: I do not append .signature, but someone might mail this.
exit