[comp.lang.icon] UUXXCODE

tenaglia@mis.mcw.edu ("Chris Tenaglia - 257-8765") (09/15/90)

I have some files that I'd like to UUENCODE or UUDECODE. Being on a VMS
system without a C compiler, having the sources still doesn't help. I've
attempted a port to Icon of UUENCODE and UUDECODE. They are close, but not
quite right. Would someone care to look them over and offer suggestions or
corrections? Thanx.

Chris Tenaglia (System Manager)         about 160 lines follow
Medical College of Wisconsin
8701 W. Watertown Plank Rd.
Milwaukee, WI 53226
(414)257-8765
tenaglia@mis.mcw.edu, mcwmis!tenaglia

------------------- uuencode.icn ---------------------------
##################################################################
#                                                                #
# UUENCODE.ICN           09/14/90          BY TENAGLIA           #
#                                                                #
# UUENCODE BINARY FILES FOR EMAIL TRANSFER                       #
#                                                                #
##################################################################
procedure main(param) 

  source := param[1]        | input("_Source:")
  target := param[2]        | input("_Target:")
  (in  := open(source))     | stop("Can't open ",source)
  (out := open(target,"w")) | stop("Can't open ",target)

  write("\fUUENCODE FROM ",source," TO ",target)
  write(out,"begin 0600 ",source)
  while line := reads(in,45) do
    {
    writes(*line,", ")
    writes(out,char(*line+32))
    every i := 1 to *line by 3 do output(out,line[i+:3])
    write(out,"")
    }
  write(out,"end")
  close(in) ; close(out)
  end

#
# THIS PROCEDURE TAKES AN OUTPUT FILE PARAMETER AND A 3 BYTE STRING
# OF BINARY DATA. IT WRITES OUT 4 BYTES OF PRINTABLE/MAILABLE ASCII
#
procedure output(fo,str)
  c1 := ord(str[1])*4
  c2 := ior( iand(ord(str[1])/16,8r060) , iand(ord(str[2])*16,8r017) )
  c3 := ior( iand(ord(str[2])/4, 8r074) , iand(ord(str[3])*64,8r003) )
  c4 := iand(ord(str[3]),8r077)
  writes(fo,char(enc(c1)),
            char(enc(c2)),
            char(enc(c3)),
            char(enc(c4)))
  end

#
# ENCRYPTION ASCIIZER ROUTINE
#
procedure enc(n)
  return iand(n,8r0077)+32
  end

#
# PROMPT AND TAKE AN INPUT
#
procedure input(prompt)       
  writes(prompt)
  return read()
  end
----------------------- uudecode.icn ------------------------------
##################################################################
#                                                                #
# UUDECODE.ICN           09/14/90          BY TENAGLIA           #
#                                                                #
# UUDECODE BINARY FILES FOR EMAIL TRANSFER                       #
#                                                                #
##################################################################
procedure main(param)

  source := param[1]          | input("_Source:")
  target := param[2]          | input("_Target:")
  (in    := open(source))     | stop("Can't open ",source)
  (out   := open(target,"w")) | stop("Can't open ",target)

  write("\fUUDECODE FROM ",source," TO ",target)
  until match("begin",(line := read(in)))
  write("Found ",parse(line,' ')[3])
  while line := read(in) do
    {
    writes(*line,", ")
    p     := 0
    bytes := ord(line[1]) - 32
    if bytes = 64 then bytes := 0
    if (bytes=0) | match("end",line) then break
    count := integer(real(bytes)/3.0 + 0.9) * 4
    buf   := ""
    every i := 2 to count by 4 do
      {
      x1 := ord(line[i])   - 32
      if x1 = 64 then x1 := 0
      x2 := ord(line[i+1]) - 32
      if x2 = 64 then x2 := 0
      x3 := ord(line[i+2]) - 32
      if x3 = 64 then x3 := 0
      x4 := ord(line[i+3]) - 32
      if x4 = 64 then x4 := 0
      if p < bytes then
        {
        p +:= 1
        buf ||:= char(x2 / 16 + x1 * 4)
        }
      if p < bytes then
        {
        p +:= 1
        buf ||:= char(x3 / 4 + (x2 % 16) * 16)
        }
      if p < bytes then
        {
        p +:= 1
        buf ||:= char(x4 + (x3 % 4) * 64)
        }
      }
    writes(out,buf)
    writes("(",*buf,"), ")
    }
  close(in) ; close(out)
  end

#
# PARSE A STRING WITH RESPECT TO A DELIMITER CSET
#
procedure parse(line,delims)
  static chars
  chars  := &cset -- delims
  tokens := []
  line ? while tab(upto(chars)) do put(tokens,tab(many(chars)))
  return tokens
  end

#
# THIS PROCEDURE TAKES AN OUTPUT FILE PARAMETER AND A 3 BYTE STRING
# OF BINARY DATA. IT WRITES OUT 4 BYTES OF PRINTABLE/MAILABLE ASCII
#
procedure output(fo,str)
  c1 := ord(str[1])*4
  c2 := ior( iand(ord(str[1])/16,8r060) , iand(ord(str[2])*16,8r017) )
  c3 := ior( iand(ord(str[2])/4, 8r074) , iand(ord(str[3])*64,8r003) )
  c4 := iand(ord(str[3]),8r077)
  writes(fo,char(enc(c1)),
            char(enc(c2)),
            char(enc(c3)),
            char(enc(c4)))
  end

#
# ENCRYPTION ASCIIZER ROUTINE
#
procedure enc(n)
  return iand(n,8r0077)+32
  end

#
# PROMPT AND TAKE AN INPUT
#
procedure input(prompt)
  writes(prompt)
  return read()
  end
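
The `output` procedure in uuencode.icn above looks like the source of the
"not quite right" behaviour: each `*` and `/` shifts in the opposite
direction from what uuencode needs, and `line[i+:3]` fails on a final group
of one or two bytes, so those bytes are never written. The fix Ken Walker
mentions later in the thread is not shown, so what follows is only a sketch
of one possible correction, assuming the usual uuencode layout (6 bits per
output character, offset by 32). Padding a short group with `char(0)` is an
assumption, and `main` here is a small stand-in harness, not Chris's.

```icon
# Sketch of a corrected 3-byte to 4-character split (not Ken's actual fix).
procedure output(fo, str)
  local b1, b2, b3
  str := left(str, 3, char(0))       # assumption: pad a short group with NULs
  b1 := ord(str[1]); b2 := ord(str[2]); b3 := ord(str[3])
  writes(fo, char(enc(b1 / 4)),                   # top 6 bits of byte 1
             char(enc((b1 % 4) * 16 + b2 / 16)),  # low 2 of byte 1, top 4 of byte 2
             char(enc((b2 % 16) * 4 + b3 / 64)),  # low 4 of byte 2, top 2 of byte 3
             char(enc(b3 % 64)))                  # low 6 bits of byte 3
  return
end

procedure enc(n)
  return iand(n, 8r077) + 32
end

procedure main()
  local line, i
  line := "Cat"
  writes(char(*line + 32))           # length character: 3 + 32 is "#"
  every i := 1 to *line by 3 do
    output(&output, line[i+:3] | line[i:0])   # fall back to the short tail
  write()                            # the line written is #0V%T
end
```

The `return` in `output` matters with this caller: if the procedure fell off
its end it would fail, the alternation `line[i+:3] | line[i:0]` would be
resumed, and the group would be written a second time.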

yost@DPW.COM (David A. Yost) (09/18/90)

In article <0093CBF0039A4D60.20400E83@mis.mcw.edu> tenaglia@mis.mcw.edu ("Chris Tenaglia - 257-8765") writes:
>
>I've attempted a port to Icon of UUENCODE and UUDECODE.

Now there's a couple of programs I bet are really slow in Icon.
I'd love to see how fast they run with the Icon compiler,
compared to the interpreter and compared to the original
C version.

If the compiler can really do a good job on this sort of repetitive
low-level stuff that kills Icon performance, it would be the dawning
of a new day!  Imagine: we could use Icon instead of C for everything!

Ken, can we see a benchmark on this?

 --dave yost
   yost@dpw.com or uunet!esquire!yost
   Please don't use other mangled forms you may see
   in the From or Reply-To fields above.

kwalker@CS.ARIZONA.EDU ("Kenneth Walker") (09/20/90)

	Date: 18 Sep 90 16:13:58 GMT
	From: esquire!yost@nyu.edu

	In article <0093CBF0039A4D60.20400E83@mis.mcw.edu> tenaglia@mis.mcw.edu
        ("Chris Tenaglia - 257-8765") writes:
	>
	>I've attempted a port to Icon of UUENCODE and UUDECODE.

	Now there's a couple of programs I bet are really slow in Icon.
	I'd love to see how fast they run with the Icon compiler,
	compared to the interpreter and compared to the original
	C version.

	If the compiler can really do a good job on this sort of repetitive
	low-level stuff that kills Icon performance, it would be the dawning
	of a new day!  Imagine: we could use Icon instead of C for everything!

	Ken, can we see a benchmark on this?

I tried 3 runs of each program using the Unix time command and got a range
of results. The uuencode.icn I used has a fix that is not in the version
Chris posted. I also removed the write() expressions that print the number
of bytes in each group processed.

 uuencode:
     compiled is 4.0 - 5.8 times faster than interpreted
     system version is 21 - 40 times faster than compiled

 uudecode:
     compiled is 2.4 - 2.9 times faster than interpreted
     system version is 22 - 36 times faster than compiled

While the time command is clearly not a very accurate measure of program
speed (I used the same data on all 3 runs), it does give a feeling
for how much the compiler improves speed and how much work is left to
do to get programs like these to run as fast as those coded in C.

  Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
  +1 602 621-4324  kwalker@cs.arizona.edu {uunet|allegra|noao}!arizona!kwalker
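
For timing from inside an Icon program rather than with the Unix time
command, the `&time` keyword (CPU milliseconds since the program started)
can bracket the work. This is only a sketch: the buffer, the loop count and
the single shift per group are arbitrary choices for illustration, not the
data or the programs Ken measured.

```icon
# Time repeated passes over one uuencode line's worth of data.
procedure main()
  local buf, start, n, i, sum
  buf := repl("abc", 15)             # 45 bytes
  n := 20000                         # arbitrary repetition count
  sum := 0
  start := &time                     # CPU time used so far, in milliseconds
  every 1 to n do
    every i := 1 to *buf by 3 do
      sum +:= ishift(ord(buf[i]), -2)   # stand-in for the per-group bit work
  write("elapsed ms: ", &time - start, " (checksum ", sum, ")")
end
```

Because `&time` counts only the program's own CPU time, the figure will not
match what the time command reports for the whole process.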

goer@quads.uchicago.edu (Richard L. Goerwitz) (09/20/90)

kwalker@CS.ARIZONA.EDU ("Kenneth Walker") writes:
>
> uuencode:
>     compiled is 4.0 - 5.8 times faster than interpreted
>     system version is 21 - 40 times faster than compiled
>
> uudecode:
>     compiled is 2.4 - 2.9 times faster than interpreted
>     system version is 22 - 36 times faster than compiled
>
>While the time command is clearly not a very accurate measure of program
>speed (I used the same data on all 3 runs), it does give a feeling
>for how much the compiler improves speed and how much work is left to
>do to get programs like these to run as fast as those coded in C.

For short programs like this, it would probably be better to use C.
The real advantage of using Icon is in how it speeds up development
time, and aids program maintenance by simply cutting down on the amount
and complexity of the code.

I guess what I'm trying to say is that getting a two to five-fold
speed increase looks pretty good to people like me who can only think
with horror about what some of our programs would look like - or how
hard it would be to write/maintain them - if they had to be done in
C.

Don't be too deferential about the gap between compiled Icon and C.
What you and Janalee have done is terrific.

-Richard

goer@quads.uchicago.edu (Richard L. Goerwitz) (09/20/90)

tenaglia@mis.mcw.edu ("Chris Tenaglia") writes:
>
>I have some files that I'd like to UUENCODE or UUDECODE. Being on a VMS
>system without a C compiler, having the sources still doesn't help. I've
>attempted a port to Icon of UUENCODE and UUDECODE. They are close, but not
>quite right. Would someone care to look them over and offer suggestions or
>corrections?

No C compiler?  How do you exist?  :-)

Here are a couple of Icon uuXXcode functions.  They should be pretty much
compatible with the latest BSD version.  Notes are offered on how to make
them work the same as the "old" version, though the two versions are
compatible.

I guess I'll post uuencode first (I call it iiencode).  Iidecode will come
in a subsequent posting.

-Richard

############################################################################
#
#	Name:	 iiencode.icn
#
#	Title:	 iiencode (port of the Unix/C uuencode program to Icon)
#
#	Author:	 Richard L. Goerwitz
#
#	Version: 1.2
#
############################################################################
#
#     This is an Icon port of the Unix/C uuencode utility.  Since
#  uuencode is publicly distributable BSD code, I simply grabbed a
#  copy, and rewrote it in Icon.  The only basic functional change I
#  made to the program was to simplify the notion of file mode.
#  Everything is encoded with 0644 permissions.  Operating systems
#  differ so widely in how they handle this sort of thing that I
#  decided just not to worry about it.
#
#      Usage is the same as the Unix uuencode command, i.e. a first
#  (optional) argument gives the name of the file to be encoded.  If this
#  is omitted, iiencode just uses the standard input.  The second and
#  final argument gives the name the encoded file should be given when
#  it is ultimately decoded:
#
#         iiencode [infile] remotefilename
#
#  BUGS:  Slow.  I decided to go for clarity and symmetry, rather than
#  speed, and so opted to do things like use ishift(i,j) instead of
#  straight multiplication (which under Icon v8 is much faster).  Note
#  that I followed the format of the newest BSD release, which refuses
#  to output spaces.  If you want to change things back around so that
#  spaces are output, look for the string "BSD" in my comments, and
#  then (un)comment the appropriate sections of code.
#
############################################################################
#
#  See also: iidecode.icn
#
############################################################################

procedure main(a)

    local in, filename

    # optional 1st argument
    if *a = 2 then {
	filename := pop(a)
	if not (in := open(filename, "r")) then {
	    write(&errout,"Can't open ",filename,".")
	    exit(1)
	}
    }
    else in := &input

    if *a ~= 1 then {
	write(&errout,"Usage:  iiencode [infile] remotefile")
	exit (2)
    }

    # This generic version of uuencode treats file modes in a primitive
    # manner so as to be usable in a number of environments.  Please
    # don't get fancy and change this unless you plan on keeping your
    # modified version on-site (or else modifying the code in such a
    # way as to avoid dependence on a specific operating system).
    writes("begin 644 ",a[1],"\n")

    encode(in)

    writes("end\n")
    exit(0)

end

procedure encode(in)

    # Copy from in to standard output, encoding as you go along.

    local line

    # 1 (up to) 45 character segment
    while line := reads(in, 45) do {
	writes(ENC(*line))
	line ? {
	    while outdec(move(3))
	    pos(0) | outdec(left(tab(0), 3, " "))
	}
	writes("\n")
    }
    # Uuencode adds a space and newline here, which is decoded later
    # as a zero-length line (signals the end of the decoded text).
    # writes(" \n")
    # The new BSD code (compatible with the old) avoids outputting
    # spaces by writing a ` (see also how it handles ENC() below).
    writes("`\n")

end

procedure outdec(s)

    # Output one group of 3 bytes (s) to standard output.  This is one
    # case where C is actually more elegant than Icon.  Note well!

    local c1, c2, c3, c4

    c1 := ishift(ord(s[1]),-2)
    c2 := ior(iand(ishift(ord(s[1]),+4), 8r060),
	      iand(ishift(ord(s[2]),-4), 8r017))
    c3 := ior(iand(ishift(ord(s[2]),+2), 8r074),
	      iand(ishift(ord(s[3]),-6), 8r003))
    c4 := iand(ord(s[3]),8r077)
    every writes(ENC(c1 | c2 | c3 | c4))

    return

end

procedure ENC(c)

    # ENC is the basic 1 character encoding procedure to make a char
    # printing.

    # New BSD code doesn't output spaces...
    return " " ~== char(iand(c, 8r077) + 32) | "`"
    # ...the way the old code does:
    # return char(iand(c, 8r077) + 32)

end
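
The backquote handling in `ENC` is easy to check in isolation. The sketch
below copies the procedure unchanged and prints a few sample values; the
sample inputs and the `main` wrapper are illustrative additions. A 6-bit
value of 0 would otherwise encode as a space, so it comes out as a
backquote, while every other value is simply offset by 32.

```icon
procedure ENC(c)
    # The comparison yields the character unless it is a space,
    # in which case the alternative "`" is returned instead.
    return " " ~== char(iand(c, 8r077) + 32) | "`"
end

procedure main()
    local c
    every c := 0 | 1 | 33 | 63 do
        write(c, " -> ", image(ENC(c)))
end
```

Only the first line of output shows the substitution; 1, 33 and 63 map to
`!`, `A` and `_` under both the old and the new scheme.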

goer@quads.uchicago.edu (Richard L. Goerwitz) (09/20/90)

############################################################################
#
#	Name:	 iidecode.icn
#
#	Title:	 iidecode (port of the Unix/C uudecode program to Icon)
#
#	Author:	 Richard L. Goerwitz
#
#	Version: 1.2
#
############################################################################
#
#     This is an Icon port of the Unix/C uudecode utility.  Since
#  uudecode is publicly distributable BSD code, I simply grabbed a
#  copy, and rewrote it in Icon.  The only basic functional change I
#  made to the program was to simplify the notion of file mode.
#  Everything is encoded with 0644 permissions.  Operating systems
#  differ so widely in how they handle this sort of thing that I
#  decided just not to worry about it.
#
#      Usage is the same as the Unix uudecode command, i.e. a first
#  (optional) argument gives the name of the file to be decoded.  If this
#  is omitted, iidecode just uses the standard input:
#
#         iidecode [infile]
#
#      Even people who do not customarily use Unix should be aware of
#  the uuen/decode program and file format.  It is widely used, and has
#  been implemented on a wide variety of machines for sending 8-bit
#  "binaries" through networks designed for ASCII transfers only.
#
#  BUGS:  Slow.  I decided to go for clarity and symmetry, rather than
#  speed, and so opted to do things like use ishift(i,j) instead of
#  straight multiplication (which under Icon v8 is much faster).
#
############################################################################
#
#  See also: iiencode.icn
#
############################################################################

procedure main(a)

    local in, filename, dest

    # optional 1st (and only) argument
    if *a = 1 then {
	filename := pop(a)
	if not (in := open(filename, "r")) then {
	    write(&errout,"Can't open ",filename,".")
	    exit(1)
	}
    }
    else in := &input

    if *a ~= 0 then {
	write(&errout,"Usage:  iidecode [infile]")
	exit (2)
    }

    # Find the "begin" line, and determine the destination file name.
    !in ? {
	tab(match("begin ")) &
	tab(many(&digits))   &	# mode ignored
	tab(many(' '))       &
	dest := tab(0)
    }

    # If dest is null, the begin line either isn't present, or is
    # corrupt (which necessitates our aborting with an error msg.).
    if /dest then {
	write(&errout,"No begin line.")
	exit(3)
    }

    # Tilde expansion is heavily Unix dependent, and we can't always
    # safely write the file to the current directory.  Our only choice
    # is to abort.
    if match("~",dest) then {
	write(&errout,"Please remove ~ from input file begin line.")
	exit(4)
    }
       
    out := open(dest, "w")
    decode(in, out)		# stops at the zero-length line; "end" is checked below
    if not match("end", !in) then {
	write(&errout,"No end line.\n")
	exit(5)
    }
    exit(0)

end



procedure decode(in, out)
    
    # Copy from in to out, decoding as you go along.

    local line, chunk

    while line := read(in) do {

	if *line = 0 then {
	    write(&errout,"Short file.\n")
	    exit(10)
	}

	line ? {
	    n := DEC(ord(move(1)))

	    # Uuencode signals the end of the coded text with a line whose
	    # count character is a space (or, in newer BSD output, a
	    # backquote), i.e. a zero-length line.
	    if n <= 0 then break
	    
	    while (n > 0) do {
		chunk := move(4) | tab(0)
		outdec(chunk, out, n)
		n -:= 3
	    }
	}
    }
    
    return

end



procedure outdec(s, f, n)

    # Output a group of 3 bytes (4 input characters).  N is used to
    # tell us not to output all of the chars at the end of the file.

    local c1, c2, c3

    c1 := iand(
	       ior(
		   ishift(DEC(ord(s[1])),+2),
		   ishift(DEC(ord(s[2])),-4)
		   ),
	       8r0377)
    c2 := iand(
	       ior(
		   ishift(DEC(ord(s[2])),+4),
		   ishift(DEC(ord(s[3])),-2)
		   ),
	       8r0377)
    c3 := iand(
	       ior(
		   ishift(DEC(ord(s[3])),+6),
		   DEC(ord(s[4]))
		   ),
	       8r0377)

    if (n >= 1) then
	writes(f,char(c1))
    if (n >= 2) then
	writes(f,char(c2))
    if (n >= 3) then
	writes(f,char(c3))

end	



procedure DEC(c)

    # single character decode
    return iand(c - 32, 8r077)

end
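
Because `DEC` masks with 8r077 after subtracting 32, a space (32) and a
backquote (96) both decode to zero, which is why the old and new encodings
can be read by the same decoder. The following sketch checks that and then
decodes one 4-character group with the same shifts `outdec` uses; the
sample group "0V%T" and the `main` wrapper are illustrative assumptions.

```icon
procedure DEC(c)
    return iand(c - 32, 8r077)
end

procedure main()
    local s, d1, d2, d3, d4
    # Both spellings of zero decode identically.
    write(DEC(ord(" ")), " ", DEC(ord("`")))          # writes 0 0
    # Decode one 4-character group back into 3 bytes.
    s := "0V%T"
    d1 := DEC(ord(s[1])); d2 := DEC(ord(s[2]))
    d3 := DEC(ord(s[3])); d4 := DEC(ord(s[4]))
    write(char(iand(ior(ishift(d1, 2), ishift(d2, -4)), 8r377)),
          char(iand(ior(ishift(d2, 4), ishift(d3, -2)), 8r377)),
          char(iand(ior(ishift(d3, 6), d4), 8r377)))  # writes Cat
end
```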

tenaglia@mis.mcw.edu ("Chris Tenaglia - 257-8765") (09/20/90)

I'd like to thank Ken Walker for his corrective advice in getting the
uuencode fixed. Apparently the uudecode was ok. Thanks also for the
alternative iiencode/iidecode. It sure generated a stir. I was looking
for a little unix-like functionality, but it seems to be of great
interest to benchmarkers too. The uudecode is probably a pretty shabby
example, since I converted it almost literally from a PC BASIC program.

For VMS folks, uuencode and uudecode are nifty ways of converting binary
data to printable/mailable ASCII and back again. Encoding expands 3 binary
bytes into 4 printable bytes. This 33% increase is more compact than
VMS dump data. Unfortunately VMS has oodles of file types, and not all
will work. VMS executables (.EXE) will encode and decode just fine.
Icon executables work too (.ICX). Object files (.OBJ) won't work at all.
Text files almost work. Sometimes there is a little trailing garbage on
the end of the file (or is that a bug in my program?)

Is there any interest in a reposting of the finished programs?

Chris Tenaglia (System Manager)
Medical College of Wisconsin
8701 W. Watertown Plank Rd.
Milwaukee, WI 53226
(414)257-8765
tenaglia@mis.mcw.edu, mcwmis!tenaglia
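
The 3-to-4 expansion mentioned above is the figure before per-line overhead.
As a rough sketch, assuming 45-byte input lines with one length character
and one newline per output line (and not counting the begin, terminator and
end lines), the encoded size can be estimated as below. The procedure name
`encsize` and the sample sizes are hypothetical, chosen for illustration.

```icon
# Estimated size of the encoded body for n input bytes.
procedure encsize(n)
  local full, rest, size
  full := n / 45                     # number of complete 45-byte lines
  rest := n % 45                     # bytes left for a final short line
  size := full * (1 + 60 + 1)        # length char + 60 data chars + newline
  if rest > 0 then
    size +:= 1 + ((rest + 2) / 3) * 4 + 1   # short line, rounded up to whole groups
  return size
end

procedure main()
  local n
  every n := 3 | 45 | 100 | 1000 do
    write(n, " bytes -> ", encsize(n), " characters")
end
```

A full 45-byte line comes out as 62 characters, so the real growth is a
little over 33%. The rounding up in the last line is also where pad bytes
enter, which a decoder has to discard by trusting the length character.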