[comp.binaries.ibm.pc.d] Seeking "smarter" UUDECODE.C type utility

tomr@ashtate (Tom Rombouts) (01/22/91)

First of all, thanks to those on the net who have provided the
current UUDECODE.C.  It works like a marvel, and I have used it
on some massive files to date with no problems.  

However, like all end users, I have a "wish list" item.  Thus:

Has anyone made a modified UUDECODE that will skip through
"junk" between sections of multi-part posts?  Essentially, I
can easily save several files to a large text file on disk.
However, I then have to go in and edit out the various
"cut here" or "snip - snip" lines (and any other extra "junk"
between pieces) before I can then UUDECODE it.

Since every line (except the final one?) in the coded section
seems to start with "M", has anyone modified UUDECODE so 
that it would:
   1.  Skip all text until it finds "begin" in the left
       column.  (I think it does this already.)
   2.  Only translate lines that begin with an "M" in the
       left column.
          If a line begins with "M", test to see if the
          next character is lower case, and back out if so.
          (To prevent "translating" English lines that
          happen to start with "M" between pieces.
          This would probably catch the majority of them.)
   3.  Ignore the "starting with M" rule on the final
       line; I believe the number after "begin" indicates how
       many encoded lines there are.
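
The three rules above could be sketched roughly as follows, as a POSIX
shell function wrapping awk.  The name uufilter is hypothetical.  Rule 3
is handled by an assumption that is not part of the list: a line that
does not start with "M" is kept only if its length matches the byte
count encoded in its first character.  The lower-case test in rule 2
relies on standard uuencode output using only the characters from space
through backquote.  This is a sketch, not a tested replacement for
UUDECODE.C.

```shell
# uufilter (hypothetical name): read saved articles on stdin, write only
# the lines that look like uuencoded data on stdout.
uufilter() {
  awk '
    BEGIN {
      # Map each character from space (32) to backquote (96) to its code.
      for (i = 32; i <= 96; i++) ord[sprintf("%c", i)] = i
    }
    # Rule 1: skip all text until "begin <mode> <name>" in the left column.
    !inside {
      if ($0 ~ /^begin [0-7]+ ./) { inside = 1; print }
      next
    }
    # The "end" line closes the encoded section.
    $0 == "end" { print; inside = 0; next }
    # Rule 2: keep "M" lines unless the next character is lower case.
    /^M/ {
      if (substr($0, 2, 1) !~ /[a-z]/) print
      next
    }
    # Rule 3 (assumed length check): the first character encodes the byte
    # count n, and a genuine line has 1 + 4 * ceil(n / 3) characters.
    {
      c = substr($0, 1, 1)
      if (!(c in ord)) next
      n = (ord[c] - 32) % 64
      if (length($0) == 1 + 4 * int((n + 2) / 3)) print
    }
  '
}
```

It would be used as "uufilter < saved.txt | uudecode", where saved.txt
stands for the concatenated articles.  Junk lines that start with "M"
followed by an upper-case letter or digit still slip through rule 2, so
this remains a use-at-your-own-risk filter.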

With such a utility, I could then save all the pieces from a
multi-part post into a single "mailbox format" text file, and
then uudecode it directly.  (Thereby skipping the editing step
entirely.)  Of course, this may not be foolproof, but even if
it worked 80% of the time it would be a big improvement from 
the current method.  (And one could always use the "old"
method if it choked from time to time.)

By the way, I am in no way suggesting this to replace the
current UUDECODE.  It would be an alternate utility that one
would use at one's own risk.

Does something like this exist out there?  Is there some simple
reason I have overlooked why this is impossible?  I could try to
hack it out myself, but suspect many others have looked at this
problem already.

(Alternate solution:  an AWK or CSH script that automatically
deletes non-uuencoded lines.)


Tom Rombouts  Torrance 'Tater  tomr@ashtate.A-T.com  V:(213)538-7108

Jono_Moore@mindlink.UUCP (Jono Moore) (01/29/91)

There is a uuencode/decode written for MSDOS by Richard Marks that is quite
intelligent.  You don't have to strip message headers or anything.  The only
thing that you have to do is to make sure that multi-part files are numbered
consecutively.

I think that the latest version is 4.13, and it should be available on many
FTP sites as UUEXE413.(ZIP|ARC|ZOO|...)
--

+------------------------------------------------+--------------------------+
| cthulhu@{arkham.UUCP|arkham.wimsey.bc.ca}      | He who has had has been  |
| {uunet|ubc-cs}!van-bc!cynic!arkham!cthulhu     | but he who has not been  |
| jono_moore@{mindlink.UUCP|cc.sfu.ca|sfu.bitnet}|      has been had.       |
+------------------------------------------------+--------------------------+

jstone@world.std.com (Jeffrey R Stone) (01/30/91)

tomr@ashtate (Tom Rombouts) writes:

>Has anyone made a modified UUDECODE that will skip through
>"junk" between sections of multi-part posts?

  [ stuff deleted ]

>(Alternate solution:  an AWK or CSH script that automatically
>deletes non-uuencoded lines.)

Try this simple csh script:

  awk '/^BEGIN/,/^END/' $1 | grep -v "cut here" | uudecode

-jeff-

felton@eng3.UUCP (Ed Felton) (01/31/91)

In article <1991Jan22.020001.15847@ashtate> tomr@ashtate (Tom Rombouts) writes:
}
}First of all, thanks to those on the net who have provided the
}current UUDECODE.C.  It works like a marvel, and I have used it
}on some massive files to date with no problems.  
}
}However, like all end users, I have a "wish list" item.  Thus:
}
   [ Wish List deleted for brevity... ]
}
}(Alternate solution:  an AWK or CSH script that automatically
}deletes non-uuencoded lines.)
}
}Tom Rombouts  Torrance 'Tater  tomr@ashtate.A-T.com  V:(213)538-7108

Tom, ( and everyone else :)

I don't have any answers for your wish list, but here is an alternate 
solution.

Below is an AWK script I wrote specifically for the CBIP postings.
I call it strip.awk
It will pass through only the lines between
"BEGIN--cut here--cut here"
 and
"END--cut here--cut here"
To use it, concatenate all the pieces together in the correct order,
and do:

      awk -f strip.awk < [infile] > [outfile]

Good luck.
-----------8<--------8<--------8<--------8<--------8<--------8<-----------
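# strip.awk: pass through only the lines between the CBIP cut markers.
BEGIN { FS = " "; do_it = 0 }
# Closing marker: stop printing (tested first, so the marker is dropped).
$1 == "END--cut"   && $2 == "here--cut" && $3 == "here" { do_it = 0 }
do_it == 1 { print $0 }
# Opening marker: start printing from the next line.
$1 == "BEGIN--cut" && $2 == "here--cut" && $3 == "here" { do_it = 1 }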
-----------8<--------8<--------8<--------8<--------8<--------8<-----------

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (01/31/91)

In article <1991Jan22.020001.15847@ashtate> tomr@ashtate (Tom Rombouts) writes:

| Has anyone made a modified UUDECODE that will skip through
| "junk" between sections of multi-part posts?  Essentially, I
| can easily save several files to a large text file on disk.

  This problem has been solved many times in many ways, some of which are
in the informational postings which go out every ten days. Other
solutions are uucut.awk, uucat.c, and eek.c. Of course the simple
solution is in the informational postings, so I won't bother to post it here.
-- 
bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
    sysop *IX BBS and Public Access UNIX
    moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (01/31/91)

In article <4617@mindlink.UUCP> Jono_Moore@mindlink.UUCP (Jono Moore) writes:

| I think that the latest version is 4.13 and should be able to be found on many
| FTP sites as UUEXE413.(ZIP|ARC|ZOO|...)

  Was v09i045 when it was posted here, if people have it by number.

pinard@IRO.UMontreal.CA (Francois Pinard) (01/31/91)

In article <1991Jan22.020001.15847@ashtate> tomr@ashtate (Tom Rombouts) writes:

   However, like all end users, I have a "wish list" item.  Thus: [...]

Alternatively, you might consider replacing uu{en,de}code altogether,
if you can afford this for your personal use.  Take a look at Brad
Templeton's abe/dabe package.  For me, it is such a better replacement
that I'm surprised it has not received wider acceptance yet.  Like
everybody, I know how widespread uu{en,de}code is.  If everybody knew
the virtues of abe/dabe, the switch would have been made already :-).
--
Franc,ois Pinard          ``Vivement GNU!''         pinard@iro.umontreal.ca
(514) 588-4656    cp 886 L'Epiphanie (Qc) J0K 1J0    ...!uunet!iros1!pinard

dave@tygra.UUCP (David Conrad) (02/01/91)

In article <1991Jan22.020001.15847@ashtate> tomr@ashtate (Tom Rombouts) writes:
>
>Has anyone made a modified UUDECODE that will skip through
>"junk" between sections of multi-part posts?  Essentially, I
>can easily save several files to a large text file on disk.
>However, I then have to go in and edit out the various
>"cut here" or "snip - snip" lines (and any other extra "junk"
>between pieces) before I can then UUDECODE it.
>

I have a program written in Turbo Pascal 5.5 which filters out all lines
in a file not between lines beginning with:

"BEGIN--cut" and "END--cut" (cbip)
"-------------- Part" and "---------- End" (SIMTEL20)
"--- start" and "--- end" (??? Did I get this from garbo.uwasa.fi, Timo?)

Note that another problem with storing all parts in one file is that you
can't verify the brik checksums.  I store each part in a separate file,
brik -cv them, then concatenate and cut them, then uudecode and looz.
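
Filtering on the three marker pairs listed above could be sketched
roughly as follows, as a POSIX shell function wrapping awk.  This is not
the Turbo Pascal program, only an illustration of the same idea; the
name uucut3 is hypothetical, and the marker strings are taken as given
in the list, matched at the start of a line.

```shell
# uucut3 (hypothetical name): print only the lines that lie between an
# opening and a closing marker; the marker lines themselves are dropped.
uucut3() {
  awk '
    # True when string s begins with prefix.
    function starts(s, prefix) { return index(s, prefix) == 1 }
    function opens(s) {
      return starts(s, "BEGIN--cut") ||
             starts(s, "-------------- Part") ||
             starts(s, "--- start")
    }
    function closes(s) {
      return starts(s, "END--cut") ||
             starts(s, "---------- End") ||
             starts(s, "--- end")
    }
    # A closing marker ends the current section.
    keep && closes($0) { keep = 0; next }
    # Inside a section: pass the line through unchanged.
    keep { print; next }
    # An opening marker starts a section on the following line.
    opens($0) { keep = 1 }
  '
}
```

With the parts already concatenated in order, it would be used as
"uucut3 < parts.txt | uudecode", where parts.txt is a placeholder name.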

I would be more than happy to release cut to the public domain.  If you want
to post cut.exe, Bill, email me.  If you aren't Bill Davidsen and you
want to see the program here, email him and tell him (that way *his*
mailbox gets flooded and not *mine*...heh heh heh).  If there's enough
interest, I'll also post the source to comp.lang.pascal.

>   3.  Ignore the "starting with M" rule on the final
>       line; I believe the number after "begin" indicates how
>       many encoded lines there are.
>

The number after "begin" does NOT indicate how many encoded lines there
are, but there will probably be eighteen other posts telling what it
*does* mean.  It is an invariant "644" in the three versions of uuencode
I have, and I've only run into one uuencoded file with a different number
ever (it was "600").  I think it's a version number.

>Tom Rombouts  Torrance 'Tater  tomr@ashtate.A-T.com  V:(213)538-7108

David Conrad
tygra!dave@uunet.uu.net
tygra!dave@sharkey.cc.umich.edu
(If at first you don't succeed, be persistent and creative,
as we are having some problems lately with email to this site.)
-- 
=  CAT-TALK Conferencing Network, Computer Conferencing and File Archive  =
-  1-313-343-0800, 300/1200/2400/9600 baud, 8/N/1. New users use 'new'    - 
=  as a login id.  AVAILABLE VIA PC-PURSUIT!!! (City code "MIDET")        =
   E-MAIL Address: dave@DDMI.COM

dave@tygra.UUCP (David Conrad) (02/01/91)

I didn't realize there were already so many solutions out there,
so...never mind.  If anyone really wants the source cut.pas, mail me.
--
David Conrad
tygra!dave@uunet.uu.net
tygra!dave@sharkey.cc.umich.edu

doron@chanter.cs.cornell.edu (Leor Doron) (02/02/91)

In article <1991Feb1.070051.26557@tygra.UUCP> dave@tygra.UUCP (David Conrad) writes:
>In article <1991Jan22.020001.15847@ashtate> tomr@ashtate (Tom Rombouts) writes:
>>   3.  Ignore the "starting with M" rule on the final
>>       line; I believe the number after "begin" indicates how
>>       many encoded lines there are.
>>
>
>The number after "begin" does NOT indicate how many encoded lines there
>are, but there will probably be eighteen other posts telling what it
>*does* mean.  It is an invariant "644" in the three versions of uuencode
>I have, and I've only run into one uuencoded file with a different number
>ever (it was "600").  I think it's a version number.
>
>>Tom Rombouts  Torrance 'Tater  tomr@ashtate.A-T.com  V:(213)538-7108
>
>David Conrad

This is the first (to my knowledge) of the eighteen posts correcting you ;-).

The number after 'begin' indicates the Unix permissions code of the original
file.  It is included so that the recreated file will retain the permissions
of the original.  Such permissions are necessary for (half-)secure multiuser
systems.

The permissions code is in octal (yes, octal!) notation.  The binary
translation of each digit is interpreted as 'rwx': read, write, and execute
access.  The three digits refer to 'ugo': user who owns the file, group and
other.  Thus, 640 is 110 100 000, which means the user (owner) can read and
write to the file; members of the group assigned to the file can only read it;
and everyone else can't do a thing with it.  A listing of the file in long
format on unix would look like this:

-rw-r-----  1 doron    dgroup          4 Feb  1 19:34 foo
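
The digit-by-digit translation above can be sketched as a small shell
function wrapping awk.  The name mode_to_rwx is hypothetical; it prints
only the nine permission characters, without the leading file-type
character that a long listing shows.

```shell
# mode_to_rwx (hypothetical name): turn an octal mode such as 640 into
# its rwx string, one rwx triple per octal digit (user, group, other).
mode_to_rwx() {
  awk -v mode="$1" 'BEGIN {
    out = ""
    for (i = 1; i <= length(mode); i++) {
      d = substr(mode, i, 1) + 0
      # Bit values within each digit: 4 = read, 2 = write, 1 = execute.
      out = out (d >= 4 ? "r" : "-") (int(d / 2) % 2 ? "w" : "-") (d % 2 ? "x" : "-")
    }
    print out
  }'
}
```

For example, "mode_to_rwx 640" prints rw-r-----, matching the listing
above, and "mode_to_rwx 644" prints rw-r--r--.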

Non-Unix UUENCODEs have permissions hard-coded because none are available
from the operating system, so a reasonable choice is made for compatibility.
644 keeps your file from being tampered with, while 600 keeps it private.
Naturally, a user can manually change the permissions on the file, if he/she
desires/remembers to do so -- these are just relatively safe initial values.

Hope this helped,

--Lee

==doron@cs.cornell.edu=========================================================
= ".Sig!  .Sig a .sog!  .Sig it loud; .sig it .strog!"                        =
========= -- Karen Carpenter with a head cold =================================