moe@starnet.uucp (Moe S.) (05/10/91)
I appreciate any help on these questions:

1. If I have a large number (500+) of messages in a mailbox-format file,
   what is the best way to mail a file to every one of the 500+?
   (using elm or any other way).

2. If I have a file containing some names and email addresses such as
   this:

      xyz@jkjk.jkyu.reyui (John J. Doe)
      Mark L. Lost <apple!mark@eee.dfsjk.jkj>
      Joe!!! jjj@jhdf.434r.er

   (Such a file can be obtained by doing
      grep "^From:" mailbox_file )

   How can I re-organize the file (using awk, sed, etc...) so that
   the email addresses are the first field in every line in the file?
   Note that all addresses will contain at least one of the two
   characters: "@" or "!".
   Let's say the file is too big to make manual editing practical.

Thanks again.

Moe
rickert@mp.cs.niu.edu (Neil Rickert) (05/12/91)
In article <1991May10.064610.25802@starnet.uucp> moe@starnet.uucp (Moe S.) writes:
>1. If I have a large number (500+) of messages in a mailbox-format file,
>   what is the best way to mail a file to every one of the 500+?
>   (using elm or any other way).

500 is probably stretching the capabilities of much software.  Most mailers
pass the recipients to the transport (MTA) as command-line arguments, and
500 may exceed the maximum allowed.

If you are running sendmail as an MTA, the easiest way may be to
extract the 'From:' lines and make each into a 'Bcc:' line for the
new message, which you then feed into 'sendmail' with the '-t'
option (which tells it to take the recipient addresses from the 'To:',
'Cc:' and 'Bcc:' headers).  Your message can also include a 'To:'
with a group name - 'To: multiple_recipients:;' - to make sure
an 'Apparently-To:' is not generated.

>2. If I have a file containing some names and email addresses such as
>   this:
>      xyz@jkjk.jkyu.reyui (John J. Doe)
>      Mark L. Lost <apple!mark@eee.dfsjk.jkj>
>      Joe!!! jjj@jhdf.434r.er
>   How can I re-organize the file (using awk, sed, etc...) so that
>   the email addresses are the first field in every line in the file?

Very difficult.  Probably beyond the abilities of awk, sed, etc.  If
Larry Wall happens to be reading this he may suggest perl.  The trouble is
that the syntax of RFC822 addresses is quite complex, and as X.400 gateways
become more common the extreme cases of RFC822 addresses are increasingly
likely to show up.

--
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
  Neil W. Rickert, Computer Science               <rickert@cs.niu.edu>
  Northern Illinois Univ.
  DeKalb, IL 60115                                   +1-815-753-6940
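A minimal sketch of the Bcc approach described in the post above, assuming sendmail is the MTA. The function name build_bcc_message and the file names in the usage line are made up for illustration, and the address file is assumed to hold one bare address per line (names already stripped).

```shell
# build_bcc_message ADDRFILE BODYFILE SUBJECT
# Hypothetical helper, not from the thread: prints a complete message on
# stdout with one Bcc: header per address, ready to pipe into "sendmail -t".
build_bcc_message() {
    addrs=$1
    body=$2
    subject=$3

    # Group-syntax To: header, as suggested in the post, so that no
    # Apparently-To: header is generated.
    printf 'To: multiple_recipients:;\n'
    printf 'Subject: %s\n' "$subject"

    # Drop empty lines, then turn every address into its own Bcc: header.
    sed -e '/^ *$/d' -e 's/^/Bcc: /' "$addrs"

    # A blank line separates the headers from the body.
    printf '\n'
    cat "$body"
}
```

Usage would look something like `build_bcc_message addrs.txt note.txt 'Hello' | sendmail -t`, assuming sendmail is on the PATH (on many systems of the period it lived in /usr/lib). Whether the MTA accepts 500+ Bcc: headers in one message is not something this sketch checks.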
russell@ccu1.aukuni.ac.nz (Russell J Fulton;ccc032u) (05/13/91)
rickert@mp.cs.niu.edu (Neil Rickert) writes:
>In article <1991May10.064610.25802@starnet.uucp> moe@starnet.uucp (Moe S.) writes:
>>2. If I have a file containing some names and email addresses such as
>>   this:
>>      xyz@jkjk.jkyu.reyui (John J. Doe)
>>      Mark L. Lost <apple!mark@eee.dfsjk.jkj>
>>      Joe!!! jjj@jhdf.434r.er
>>   How can I re-organize the file (using awk, sed, etc...) so that
>>   the email addresses are the first field in every line in the file?
> Very difficult.  Probably beyond the abilities of awk, sed, etc.  If
>Larry Wall happens to be reading this he may suggest perl.  The trouble is
>that the syntax of RFC822 addresses is quite complex, and as X.400 gateways
>become more common the extreme cases of RFC822 addresses are increasingly
>likely to show up.

Yes, I would be inclined to use perl or icon. Icon is probably the more
powerful for this sort of thing. You need something with very flexible
pattern matching.

Your approach depends on whether it is a 'one off' or is going to turn
into a routine job. If the former, it *may* be quicker to grit your teeth
and do it by hand. If not, then you may find the time taken to write a
perl or icon script worthwhile.

As for choosing between perl and icon, I would say that if you are already
well versed in regular expressions and the like, go for perl; otherwise icon.

Cheers, Russell
--
Russell Fulton, Computer Center, University of Auckland, New Zealand.
<rj_fulton@aukuni.ac.nz>
felps@convex.com (Robert Felps) (05/13/91)
In <1991May10.064610.25802@starnet.uucp> moe@starnet.uucp (Moe S.) writes:
>I appreciate any help on these questions:
>1. If I have a large number (500+) of messages in a mailbox-format file,
>   what is the best way to mail a file to every one of the 500+?
>   (using elm or any other way).

Sounds like the answer to your question #2 will answer this. Strip the
addresses out, then send the message to them. The problem is that most
mailers limit the number of addresses, and 500+ is likely to be over that
limit. I'd try writing a script to strip the addresses, put them in a file,
split the file, then feed the files as arguments to ucb/mail with a
-s "subject" argument too.

>2. If I have a file containing some names and email addresses such as
>   this:
>
>      xyz@jkjk.jkyu.reyui (John J. Doe)
>      Mark L. Lost <apple!mark@eee.dfsjk.jkj>
>      Joe!!! jjj@jhdf.434r.er
>   (Such a file can be obtained by doing
>      grep "^From:" mailbox_file )
>   How can I re-organize the file (using awk, sed, etc...) so that
>   the email addresses are the first field in every line in the file?
>   Note that all addresses will contain at least one of the two
>   characters: "@" or "!".
>   Let's say the file is too big to make manual editing practical.

Hmmm. I noticed others posted suggestions of PERL or icon. I don't see
this as a language question so much as a question of a standard format and
unexpected non-conformance to that standard. Someone touched on this with
the reference to the RFC.

Here's a quick shot that gets a large percentage of the addresses, but it
doesn't handle all of the off-the-wall cases. For example, I ran it through
my $MAIL file and had a blank line in the output. When I looked at what
caused it, I had a message with the line,

   From: Roger Rabbit is expecting to see you later today.....

from a wonderful secretary that could care less if the mailer uses the
From: header. So those are going to be difficult to massage out or catch.

Here's the code; unfortunately it uses nawk because of the Field Separator:

--------------------------------- cut here ----------------------------------
nawk 'BEGIN {
    FS = "[ \t()<>:]"   # space, tab, left/right paren, left/right angle, colon
}
/^F[rR][oO][mM]:/ {
    # Print the first field that looks like an address.
    for ( i = 1; i <= NF; i++ ) {
        if ( index($i,"@") ) {
            print $i
            break
        } else if ( index($i,"!") ) {
            print $i
            break
        }
    }
    # print "i="i " NF="NF
    # print
    # No field contained "@" or "!": fall back to the first word after From:
    if ( i > NF )
        if ( length($2) )
            print $2
        else
            print $3
}' "$MAIL"
--------------------------------- cut here ----------------------------------

If you don't have nawk and you don't know awk, send me mail and I'll convert
it to awk (old awk).

>Thanks again.
>Moe

Hope it helps,

Robert Felps                I do not speak for      felps@convex.com
Convex Computer Corp        Convex and I seldom     Product Specialist
3000 Waterview Parkway      speak for myself.       Tech. Assistant Ctr
Richardson, Tx. 75080       VMS? What's that?       1(800) 952-0379
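The split-and-feed idea from the start of the post above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name mail_in_batches, the MAILER variable and the default batch size of 50 are invented here, the mailer is assumed to accept `-s subject` followed by recipient addresses (as ucb/mail does), and the address file is assumed to hold one plain address per line with no quotes or embedded blanks. Instead of split(1) it lets xargs do the grouping.

```shell
# mail_in_batches ADDRFILE BODYFILE SUBJECT [BATCH]
# Hypothetical helper: sends BODYFILE to the addresses in ADDRFILE,
# at most BATCH (default 50) recipients per mailer invocation.
mail_in_batches() {
    addrs=$1
    body=$2
    subject=$3
    batch=${4:-50}
    mailer=${MAILER:-mail}     # assumption: a ucb/mail-compatible command

    # xargs -n hands at most $batch addresses to each run of the inner
    # script; the three fixed arguments in front do not count toward -n.
    xargs -n "$batch" sh -c '
        body=$1; subject=$2; mailer=$3
        shift 3
        "$mailer" -s "$subject" "$@" < "$body"
    ' sh "$body" "$subject" "$mailer" < "$addrs"
}
```

For example, `mail_in_batches addrs.txt note.txt 'Hello' 25` would run the mailer once per group of 25 addresses. Each recipient then sees the other addresses in the same batch, which is one reason the alias-based approach later in the thread may be preferable.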
clewis@ferret.ocunix.on.ca (Chris Lewis) (05/14/91)
In article <1991May12.022641.18961@mp.cs.niu.edu> rickert@mp.cs.niu.edu (Neil Rickert) writes:
>In article <1991May10.064610.25802@starnet.uucp> moe@starnet.uucp (Moe S.) writes:
>>1. If I have a large number (500+) of messages in a mailbox-format file,
>>   what is the best way to mail a file to every one of the 500+?
>>   (using elm or any other way).

> 500 is probably stretching the capabilities of much software.  Most mailers
>pass the recipients to the transport (MTA) as command-line arguments, and
>500 may exceed the maximum allowed.

> If you are running sendmail as an MTA, the easiest way may be to
>extract the 'From:' lines and make each into a 'Bcc:' line for the
>new message, which you then feed into 'sendmail' with the '-t'
>option (which tells it to take the recipient addresses from the 'To:',
>'Cc:' and 'Bcc:' headers).  Your message can also include a 'To:'
>with a group name - 'To: multiple_recipients:;' - to make sure
>an 'Apparently-To:' is not generated.

You don't have to resort to all this weirdness.

If you're using sendmail, smail 2.5 or smail 3.1, plus probably many
other MTAs, what you really want to do is prepare a file containing
all of the recipients, and use the "include" mechanism in the MTA's
"aliases" file (in sendmail and smail 2.5 it's /usr/lib/aliases; I don't
know about smail 3.1).  I.e., for the ferret mailing list I have:

    ferret-list-out :include:/u/clewis/ferrets/mail-list
                    :include:/u/clewis/ferrets/anon-list

The files are in the following format:

    e-mail-address (full name)

The full name is optional.

Then, if you send mail to "ferret-list-out", the MUA (mush/elm/mailx/Mail
etc.) doesn't even know about the alias - none of the mail headers have
any of the names in the subscription list.  The MTA, *not* the MUA, does the
expansion in memory, and parcels out groups of the addresses in the
command lines to multiple invocations of uux or tcpip etc.

Smail 2.5 doesn't appear to have any limit on the number of recipients
other than available memory (it mallocs the entries into a linked list).

In fact, it's often better to invoke the MTA directly rather than using
the MUA to send it, because you have a bit better control of what the
headers will look like.  For example, you want the "From:" line to refer
to the logical address for sending in individual items.  This is a copy
of the shell script I use to send out mailing list items:

    # Takes one argument - the item to be sent.
    if [ ! -r "$1" ]
    then
        echo "No such article"
        exit 1
    fi
    # Check that I've not buggered up the numbering scheme
    if [ -r "articles/$1" -o -r "articles/$1.Z" ]
    then
        echo "Article Clash $1"
        exit 1
    fi
    # Construct the headers
    echo "Subject: Issue $1" > /tmp/$$
    echo "From: ferret-list@ferret.ocunix.on.ca (Ferret Mailing List)" >> /tmp/$$
    echo "To: ferret-list@ferret.ocunix.on.ca" >> /tmp/$$
    echo "" >> /tmp/$$
    # Send it
    cat /tmp/$$ "$1" | smail -R ferret-list-out
    rm -f /tmp/$$
    # Archive what I just sent
    mv "$1" articles
    compress "articles/$1"

Notice that I construct the Subject:, From: and To: lines myself, tack
on a blank line, and then concatenate the article itself and shove it thru
smail directly.  The destination is the command line argument to smail,
not the To: line.  (With sendmail you may have to use an option to inhibit
To: line expansion.)

(The -R option to smail 2.5 tells it to reroute all of the addresses
instead of trying to send directly, then discovering most of the addresses
are not full paths, and then rerouting.  This is an efficiency concern,
plus the fact that without the -R, smail 2.5 won't multicast unless the
addresses are full bang paths.  Multicast is more than one recipient per
uux invocation.  REAL important with a list of 500 entries!  The ferret
list is about 75, and it multicasts down to 17 individual uux invocations.)

>>2. If I have a file containing some names and email addresses such as
>>   this:
>>      xyz@jkjk.jkyu.reyui (John J. Doe)
>>      Mark L. Lost <apple!mark@eee.dfsjk.jkj>
>>      Joe!!! jjj@jhdf.434r.er
>>   How can I re-organize the file (using awk, sed, etc...) so that
>>   the email addresses are the first field in every line in the file?

> Very difficult.  Probably beyond the abilities of awk, sed, etc.  If
>Larry Wall happens to be reading this he may suggest perl.  The trouble is
>that the syntax of RFC822 addresses is quite complex, and as X.400 gateways
>become more common the extreme cases of RFC822 addresses are increasingly
>likely to show up.

Very difficult only if the input is entirely arbitrary and you actually
have to parse the addresses.  However, the first and second addresses are
already in a "standard" form, the first being what you can use directly
(at least in a smail 2.5 alias file).  The second is simple to convert.
Then, it's a matter of converting all of the other formats into the first
one.

The third example is a difficult one to handle, simply because a simple
sed script can't tell which of the two tokens is the actual address: they
both have mailing metacharacters.  So, you make a simplifying assumption:
a token with a "@" is a real email address, and otherwise a token of the
form something!something is the real email address, as long as there
isn't more than one adjacent "!".

This sed script works for the above sample plus some other forms:

    sed -e '/^\(.*\)<\(.*\)>\(.*\)$/s//\2 \1 \3/' \
        -e '/^\(.*\)(\(.*\))\(.*\)$/s//\1 \2 \3/' \
        -e '/^\(.*\) \([^ ][^ ]*@[^ ][^ ]*\)$/s//\2 \1/' \
        -e '/^\(.*\) \([^! ][^ ]*![^! ][^ ]*\)$/s//\2 \1/' \
        -e 's/^ *//' \
        -e 's/ *$//' \
        -e 's/  */ /g' \
        -e 's/ / (/' \
        -e 's/$/)/'

    1) Convert <> forms to address-first.
    2) Remove the () from "addr (name)" forms.
    3) Move a trailing token of the form something@something to the
       beginning.
    4) Move a trailing token of the form something!something to the
       beginning (this is not done for tokens like "!!!").
    5, 6, 7) Strip extraneous blanks.
    8, 9) Put the () back in.

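Lines that come straight out of `grep "^From:"` still carry the header name, which the script above would treat as part of the full name. A sketch of one way to wrap it, with two changes that are assumptions of this example and not part of the original post: a leading "From:" is stripped first, and the last two expressions are merged into one so that a line holding only an address is left without a stray ")". The function name addr_first is made up.

```shell
# addr_first: read "From:" lines (or bare name/address lines) on stdin and
# write "address (full name)" lines on stdout.
addr_first() {
    # 1st expression: strip the header name (added for this sketch).
    # Next seven: the expressions from the post, unchanged.
    # Last: parenthesize everything after the address, but only when
    #       there is something after it (replaces the final two steps).
    sed -e 's/^[Ff][Rr][Oo][Mm]: *//' \
        -e '/^\(.*\)<\(.*\)>\(.*\)$/s//\2 \1 \3/' \
        -e '/^\(.*\)(\(.*\))\(.*\)$/s//\1 \2 \3/' \
        -e '/^\(.*\) \([^ ][^ ]*@[^ ][^ ]*\)$/s//\2 \1/' \
        -e '/^\(.*\) \([^! ][^ ]*![^! ][^ ]*\)$/s//\2 \1/' \
        -e 's/^ *//' \
        -e 's/ *$//' \
        -e 's/  */ /g' \
        -e 's/ \(.*\)$/ (\1)/'
}
```

Something like `grep "^From:" mailbox_file | addr_first > mail-list` would then produce a file in the "e-mail-address (full name)" format. For the three sample lines it gives `xyz@jkjk.jkyu.reyui (John J. Doe)`, `apple!mark@eee.dfsjk.jkj (Mark L. Lost)` and `jjj@jhdf.434r.er (Joe!!!)`.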
Yes, it would be a bit easier to program in perl, and easier to get fancier.
But not particularly necessary.  You'll probably end up with a few it didn't
parse correctly, but you can fix them manually.
--
Chris Lewis, Phone: (613) 832-0541, Domain: clewis@ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request@eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386 or Canada 416-832-0541.  Psroff 3.0 in c.s.u soon!