science@nems.dt.navy.mil (Mark Zimmermann) (09/01/90)
does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL LETTER TEXT into nice mixed-case text? There are obviously many degrees of sophistication in doing this; I'd like to be able to give the system a file of words to be capitalized and/or words to be lowercased and/or words to be all caps (NASA, USA, etc.), to customize, and it should try to find and capitalize the first word of each sentence. Any suggestions?? Is this better done in C or awk than in Emacs?? Tnx for help! - ^z - science@nems.dt.navy.mil
mdb@ESD.3Com.COM (Mark D. Baushke) (09/02/90)
On 1 Sep 90 13:06:34 GMT, in article <3245@nems.dt.navy.mil> posted
to comp.emacs, science@nems.dt.navy.mil (Mark Zimmermann) writes:
Mark> does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
Mark> LETTER TEXT into nice mixed-case text? There are obviously many
Mark> degrees of sophistication in doing this; I'd like to be able to give
Mark> the system a file of words to be capitalized and/or words to be
Mark> lowercased and/or words to be all caps (NASA, USA, etc.), to
Mark> customize, and it should try to find and capitalize the first word of
Mark> each sentence. Any suggestions?? Is this better done in C or awk
Mark> than in Emacs?? Tnx for help! - ^z - science@nems.dt.navy.mil
The perl script after my .signature was posted to comp.lang.perl some
time ago. With a little work it should be possible to add an exception
list for some words like NASA, USA, etc. to be output in all caps.
Enjoy!
--
Mark D. Baushke
mdb@ESD.3Com.COM
#!/usr/bin/perl
# This program copies its input to STDOUT, converting uppercase characters
# to lowercase, except the first letter of each sentence and the word 'I'.
# DEFINITION: a 'sentence' begins with an alphanumeric and ends at the end
# of the input file or at the first terminator (period, question mark, or
# exclamation point) which is followed by white space.
$/ = "\177"; # do not split the input into lines
$_ = <>; # read the entire input
s/(\w)(([^.?!I]+|[.?!]+\S|\BI|I\B)+)/($a=$2)=~tr|A-Z|a-z|,$1.$a/eg;
print; # print the results
smithln@honeydew.cs.rochester.edu (Neil Smithline) (09/05/90)
In article <3245@nems.dt.navy.mil>, science@nems (Mark Zimmermann) writes: >does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL >LETTER TEXT into nice mixed-case text? There are obviously many >degrees of sophistication in doing this; I'd like to be able to give >the system a file of words to be capitalized and/or words to be >lowercased and/or words to be all caps (NASA, USA, etc.), to >customize, and it should try to find and capitalize the first word of >each sentence. Any suggestions?? Is this better done in C or awk >than in Emacs?? Tnx for help! - ^z - science@nems.dt.navy.mil Good idea - just wrote a function to do it. The function FIX-CASE takes two arguments or the region and fixes the case. It also will make the words of FIX-CASE-SPECIALS appear as they do in the list. I have included only the word "I" in the list - other good ideas would be your name, organization, etc... Feel free to mail mail me any comments/suggestions/bugs - Neil ================================================================ ;;; ;;; Fix the case of a region of text. ;;; Capitalizes first word of sentences leaving all other words lowercase. ;;; fix-case-specials is a list of words (strings with no whitespace) ;;; that are to be special cased. ;;; Kinda disgusting lisp code but... ;;; Neil Smithline - Tue Sep 4 15:26:35 1990 ;;; (defvar fix-case-specials '("I") "Special words to deal with.") (defun fix-case (start end) "Fix the case of text between point and mark or START and END if called from a program." (interactive "r") (save-excursion (downcase-region start end) ; start with lowercase region (goto-char start) ; skip initial whitespace (skip-chars-forward " \C-i\C-j" end) (setq start (point)) ;; Handle first sentence specially to detect sentence beginning at ;; word at buffer position START (let ((cur (point))) (forward-char 1) (backward-sentence 1) (when (= cur (point)) ; first char should be capitalized (capitalize-word 1))) (goto-char start) ; goto start of next sentence (forward-sentence 1) (let (cur) (while (< (point) end) ; loop through region (setq cur (point)) ; capitalizing each sentence start (capitalize-word 1) (goto-char cur) (forward-sentence 1) (skip-chars-forward " \C-i\C-j" end))) (save-restriction (narrow-to-region start end) ; now fix FIX-CASE-SPECIALS (let ((case-fold-search t) (case-replace t) (specials fix-case-specials)) (while specials ; loop through list of specials (goto-char (point-min)) (replace-string (car specials) (car specials) t) (setq specials (cdr specials))))))) -- ======================================================================== Neil Smithline ARPA: smithln@cs.rochester.edu UUCP: ..!rutgers!rochester!smithln Mail: CS Dept., University of Rochester, Rochester NY 14627 -- ======================================================================== Neil Smithline ARPA: smithln@cs.rochester.edu UUCP: ..!rutgers!rochester!smithln
inc@tc.fluke.COM (Gary Benson) (09/12/90)
In article <MDB.90Sep1111521@kosciusko.ESD.3Com.COM> mdb@ESD.3Com.COM (Mark D. Baushke) writes: MarkB: On 1 Sep 90 13:06:34 GMT, in article <3245@nems.dt.navy.mil> posted MarkB: to comp.emacs, science@nems.dt.navy.mil (Mark Zimmermann) writes: MarkZ: does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL MarkZ: LETTER TEXT into nice mixed-case text? There are obviously many MarkZ: degrees of sophistication in doing this; I'd like to be able to give MarkZ: the system a file of words to be capitalized and/or words to be MarkZ: lowercased and/or words to be all caps (NASA, USA, etc.), to MarkZ: customize, and it should try to find and capitalize the first word of MarkZ: each sentence. Any suggestions?? Is this better done in C or awk MarkZ: than in Emacs?? Tnx for help! - ^z - science@nems.dt.navy.mil MarkB: The perl script after my .signature was posted to comp.lang.perl some MarkB: time ago. With a little work it should be possible to add an exception MarkB: list for some words like NASA, USA, etc. to be output in all caps. #!/usr/bin/perl # This program copies its input to STDOUT, converting uppercase characters # to lowercase, except the first letter of each sentence and the word 'I'. # DEFINITION: a 'sentence' begins with an alphanumeric and ends at the end # of the input file or at the first terminator (period, question mark, or # exclamation point) which is followed by white space. $/ = "\177"; # do not split the input into lines $_ = <; # read the entire input s/(\w)(([^.?!I]+|[.?!]+\S|\BI|I\B)+)/($a=$2)=~tr|A-Z|a-z|,$1.$a/eg; print; # print the results -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ We had a similar need, but with a slightlight different twist. We needed to convert from: A LINE THAT LOOKED LIKE THIS to: A Line That Looked Like This Here's how we did the exception list, followed by the subroutine that actually performed the substitution: # SPECIALS # words to translate by lookup - the left string must be lower-case # the right string is the replacement, ie. any case (or word even). $specials{"a"} = "a"; $specials{"an"} = "an"; $specials{"and"} = "and"; $specials{"as"} = "as"; $specials{"ascii"} = "ASCII"; $specials{"at"} = "at"; $specials{"but"} = "but"; $specials{"by"} = "by"; $specials{"for"} = "for"; $specials{"from"} = "from"; $specials{"in"} = "in"; $specials{"is"} = "is"; $specials{"it"} = "it"; $specials{"nor"} = "nor"; $specials{"of"} = "of"; $specials{"on"} = "on"; $specials{"onto"} = "onto"; $specials{"or"} = "or"; $specials{"out"} = "out"; $specials{"so"} = "so"; $specials{"to"} = "to"; $specials{"the"} = "the"; $specials{"with"} = "with"; $specials{"yet"} = "yet"; # INITIAL CAPITALS SUBROUTINE # inputs 1 argument: a string, returns the string with initial-caps except # words which appear in $specials are translated by look up. sub initcaps { local($out,$in,$word,$lcw,$found); $in = $_[0]; $out=''; while (length($in) > 0) { $in =~ s/^( *)([^ ]*)//; $out .= $1; # transfer the white space before words $word = $2; # setup the next word ($lcw=$word) =~ tr/A-Z/a-z/; # make a lower-case version $found = $specials{$lcw}; # is the word on the special list? if ($found && $out !~ /^ *$/) { # exception word but not first in line $word = $found; # exception words as specified. } elsif ($word =~ /^[a-z]*$/i){ # word is only alphabetics? $word =~ tr/a-z/A-Z/; # make an upper-case version $word = substr($word,0,1) . substr($lcw,1,999); # combine lc version } $out .= $word; } $out; } -- Gary Benson -=[ S M I L E R ]=- -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_- Many a bum show has been saved by the flag. -George M. Cohan
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/12/90)
Here's what Peter Yee uses to translate the NASA articles he posts. No doubt he has a more up-to-date exception list by now. Larry Wall lwall@jpl-devvax.jpl.nasa.gov #!/bin/sh : make a subdirectory, cd to it, and run this through sh. echo 'If this kit is complete, "End of kit" will echo at the end' echo Extracting unuc sed >unuc <<'!STUFFY!FUNK!' -e 's/X//' X#!/usr/bin/perl X Xprint STDERR "Loading proper nouns...\n"; Xopen(DICT,"/usr/dict/words") || die "Can't find /usr/dict/words: $!\n"; Xwhile (<DICT>) { X if (/^[A-Z]/) { X chop; X ($lower = $_) =~ y/A-Z/a-z/; X $proper{$lower} = $_; X } X} Xclose DICT; Xprint STDERR "Loading exceptions...\n"; X Xopen(PATS,"unuc.pats") || die "Can't find unuc.pats: $!\n"; X X$prog = <<'EOT'; Xwhile (<>) { X next if /[a-z]/; X y/A-Z/a-z/; X s/(\w+)/$proper{$1} ? $proper{$1} : $1/eg; X s/^(\s*)([a-z])/$1 . (($tmp = $2) =~ y:a-z:A-Z:,$tmp)/e; X s/([-.?!]["']?(\n\s*| \s*)["']?)([a-z])/$1 . (($tmp = $3) =~ y:a-z:A-Z:,$tmp)/eg; X s/\b([b-df-hj-np-tv-xz]+)\b/(($tmp = $1) =~ y:a-z:A-Z:,$tmp)/eg; X s/([a-z])'([SDT])\b/$1 . "'" . (($tmp = $2) =~ y:A-Z:a-z:,$tmp)/eg; XEOT Xwhile (<PATS>) { X chop; X next if /^$/; X next if /^#/; X if (! /;$/) { X $foo = $_; X $foo =~ y/A-Z/a-z/; X print STDERR "Dup $_\n" if $proper{$foo}; X $foo =~ s/([^\w ])/\\$1/g; X $foo =~ s/ /(\\s+)/g; X $foo = "\\b" . $foo if $foo =~ /^\w/; X $foo .= "\\b" if $foo =~ /\w$/; X $i = 0; X ($bar = $_) =~ s/ /'$' . ++$i/eg; X $_ = "s/$foo/$bar/gi;"; X } X $prog .= ' ' . $_ . "\n"; X} Xclose PATS; X$prog .= "}\ncontinue {\n print;\n}\n"; X X$/ = ''; X#print $prog; Xeval $prog; die $@ if $@; !STUFFY!FUNK! echo Extracting unuc.pats sed >unuc.pats <<'!STUFFY!FUNK!' -e 's/X//' XA.M. XAir Force XAir Force Base XAir Force Station XAmerican XApr. XAriane XAug. XAugust XBureau of Labor Statistics XCIT XCaltech XCape Canaveral XChallenger XChina XCorporation XCrippen XDaily News in Brief XDaniel Quayle XDec. XDiscovery XEdwards XEndeavour XFeb. XFord Aerospace XFri. XGeneral Dynamics XGeorge Bush XHeadline News XHOTOL XI XII XIII XIV XIX XInstitute of Technology XJPL XJan. XJul. XJun. XKennedy Space Center XLDEF XLong Duration Exposure Facility XLong March XMar. XMarch XMartin XMartin Marietta XMercury XMon. Xin May Xs/\bmay (\d)/May $1/g; Xs/\boffice of (\w)/'Office of ' . (($tmp = $1) =~ y:a-z:A-Z:,$tmp)/eg; XNational Science Foundation XNASA Select XNew Mexico XNov. XOMB XOct. XOffice of Management and Budget XPresident XPresident Bush XRichard Truly XRocketdyne XRussian XRussians XSat. XSep. XSoviet XSoviet Union XSoviets XSpace Shuttle XSun. XThu. XTue. XU.S. XUnion of Soviet Socialist Republics XUnited States XVI XVII XVIII XVice President XVice President Quayle XWed. XWhite Sands XKaman Aerospace XAerospace Daily XAviation Week XSpace Technology XWashington Post XLos Angeles Times XNew York Times XAerospace Industries Association Xpresident of XJohnson Space Center XSpace Services XInc. XCo. XHughes Aircraft XCompany XOrbital Sciences XSwedish Space XArnauld XNicogosian XMagellan XGalileo XMir XJet Propulsion Laboratory XUniversity XDepartment of Defense XOrbital Science XOMS XUnited Press International XUnited Press XUPI XAssociated Press XAP XCable News Network XCape York XZenit XSYNCOM XEastern XWestern XTest Range XJcsat XJapanese Satellite Communications XDefence Ministry XDefense Ministry XSkynet XFixed Service Structure XLaunch Processing System XAsiasat XLaunch Control Center XEarth XCNES XGlavkosmos XPacific XAtlantic !STUFFY!FUNK! echo "" echo "End of kit" : I do not append .signature, but someone might mail this. exit