[comp.archives.admin] ftpmail data

vixie@decwrl.dec.com (Paul A Vixie) (06/04/91)

you can really see from these numbers when the bitftp server was shut off.

the length of the log files, in bytes, for each month of ftpmail's operation:

-rw-rw-r--  1 root       147850 Apr  7 01:01 .log.90-12
-rw-rw-r--  1 root       457698 Apr  7 01:01 .log.91-01
-rw-rw-r--  1 root       542694 Apr  7 01:01 .log.91-02
-rw-rw-r--  1 root       959640 Apr  7 01:01 .log.91-03
-rw-rw-r--  1 root       984924 May 25 19:23 .log.91-04
-rw-rw-r--  1 root      2137269 Jun  4 01:41 .log.91-05

summary of those log files is as follows:

Log File     From                  To                   ftps    parts    kbytes
-------------------------------------------------------------------------------
.log.90-12   12/05/90 08:17:16     12/31/90 19:00:53     740     2012    100758
.log.91-01   01/01/91 03:30:45     01/31/91 23:30:40    2214     7080    365393
.log.91-02   02/01/91 00:00:16     02/28/91 23:52:02    2510     7734    399244
.log.91-03   03/01/91 00:14:29     03/31/91 21:32:31    4414    13148    681933
.log.91-04   04/01/91 05:30:25     04/30/91 23:45:54    4842    10236    494517
.log.91-05   05/01/91 00:35:08     05/31/91 23:38:33   10356    20293    933187

that is not a typographical error.  it explains the multiday delays in may;
ftpmail has a limit of how much network traffic it will generate per unit
time, and the requests have been coming in at (well) over that rate.
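
One way to read the table above is to compute month-over-month growth in requests and the average kbytes shipped per request. The sketch below is only an illustration: the figures are copied from the summary, while the names MONTHS and monthly_report are made up here and are not part of ftpmail. Note that 90-12 is a partial month (the log starts on 12/05).

```python
# Monthly totals copied from the summary table: (ftps, parts, kbytes).
# MONTHS and monthly_report are hypothetical names used for illustration.
MONTHS = {
    "90-12": (740, 2012, 100758),    # partial month, log starts 12/05/90
    "91-01": (2214, 7080, 365393),
    "91-02": (2510, 7734, 399244),
    "91-03": (4414, 13148, 681933),
    "91-04": (4842, 10236, 494517),
    "91-05": (10356, 20293, 933187),
}

def monthly_report(months=MONTHS):
    """Return (month, ftps, kbytes per ftp, growth in ftps vs. previous month)."""
    rows = []
    previous_ftps = None
    for month, (ftps, _parts, kbytes) in months.items():
        growth = None if previous_ftps is None else ftps / previous_ftps
        rows.append((month, ftps, kbytes / ftps, growth))
        previous_ftps = ftps
    return rows

if __name__ == "__main__":
    for month, ftps, kb_per_ftp, growth in monthly_report():
        growth_text = "n/a" if growth is None else f"{growth:.2f}x"
        print(f"{month}  ftps={ftps:6d}  kbytes/ftp={kb_per_ftp:7.1f}  growth={growth_text}")
```

By this arithmetic, requests more than doubled from April to May, which is consistent with the backlog described above.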

cheers,
--
Paul Vixie
DEC Western Research Lab	<vixie@pa.dec.com>	<paul@vixie.sf.ca.us>
Palo Alto, California, USA	...!decwrl!vixie	...!vixie!paul

lance@motcsd.csd.mot.com (lance.norskog) (06/05/91)

vixie@decwrl.dec.com (Paul A Vixie) writes:

>you can really see from these numbers when the bitftp server was shut off.

> many large numbers omitted

Perhaps you could add a GIF-rejecter?

Lance Norskog

vixie@decwrl.dec.com (Paul A Vixie) (06/05/91)

>> Perhaps you could add a GIF-rejecter [to ftpmail]?
>> 
>> Lance Norskog

are all files whose names end in .gif to be considered junque?
--
Paul Vixie
DEC Western Research Lab	<vixie@pa.dec.com>	<paul@vixie.sf.ca.us>
Palo Alto, California, USA	...!decwrl!vixie	...!vixie!paul

lmb@sat.com (Larry Blair) (06/06/91)

In article <VIXIE.91Jun5034053@fork-city.pa.dec.com> vixie@decwrl.dec.com (Paul A Vixie) writes:
=>> Perhaps you could add a GIF-rejecter [to ftpmail]?
=>> 
=>> Lance Norskog
=
=are all files whose names end in .gif to be considered junque?

Gifs are definitely not junk.  They are just large.  From the Usenet point of
view, having a distribution method that allows selective distribution is
preferable to the broadcast method employed by alt.*.pictures.

The problem for ftpmail is volume.  It is not the size of any one file,
but the fact that while I might fetch one bash-1.08.tar.Z (768K), I might
fetch 20 or 30 256K gifs.  If you don't mind the bandwidth and system load,
have at it.  But don't be surprised.

One other note: many gifs available on the Internet are actually copyrighted
works illegally being distributed.  I don't know if you would be responsible
(legally) or not.
-- 
Larry Blair   lmb@sat.com   {apple,decwrl}!sat!lmb

lance@motcsd.csd.mot.com (lance.norskog) (06/06/91)

>>> Perhaps you could add a GIF-rejecter [to ftpmail]?

>are all files whose names end in .gif to be considered junque?

All files of the form <female_name>.gif :-)

Seriously, I recently recommended the same thing for moderating
comp.graphics.  Just have a list of known suffixes and canned responses.
Force them to send you a mail message claiming that 'dontstop.au' 
isn't really an obscene sound sample...


Lance

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (06/06/91)

 vixie@decwrl.dec.com (Paul A Vixie) writes:
>> Perhaps you could add a GIF-rejecter [to ftpmail]?

>> Lance Norskog

> are all files whose names end in .gif to be considered junque?

Unfortunately not, since it is quite as possible to
archive in a *.gif an interesting cell cross-section
or scanning electron micrograph for ftp as a scanned
girlie mag photo. Contrariwise, not all the latter
travel as *.gif files; there are over a hundred
various archive formats in common use on the net, of
which GIFs are only one.

You might want to count the *.gif files by number and
volume being fetched, if it is easy to do, just to
check whether they make up a share of the volume
large enough that eliminating them would bring your
server back within cost goals.
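
A minimal sketch of that count, assuming the transfers can be reduced to (filename, size in bytes) pairs. The real ftpmail log format is not shown in this thread, so the input shape and the name gif_share are assumptions made for illustration.

```python
def gif_share(transfers):
    """Tally *.gif transfers by number and by volume.

    transfers: iterable of (filename, size_in_bytes) pairs. This input
    shape is an assumption; it is not the actual ftpmail log format.
    """
    total_count = total_bytes = gif_count = gif_bytes = 0
    for name, size in transfers:
        total_count += 1
        total_bytes += size
        if name.lower().endswith(".gif"):
            gif_count += 1
            gif_bytes += size
    return {
        "gif_count": gif_count,
        "gif_bytes": gif_bytes,
        "count_fraction": gif_count / total_count if total_count else 0.0,
        "byte_fraction": gif_bytes / total_bytes if total_bytes else 0.0,
    }

if __name__ == "__main__":
    sample = [
        ("bash-1.08.tar.Z", 768 * 1024),
        ("cell-section.gif", 256 * 1024),
        ("MICROGRAPH.GIF", 256 * 1024),
    ]
    print(gif_share(sample))
```

Matching on the suffix alone says nothing about what the picture shows, which is the point made above; it only measures how much of the traffic is *.gif.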

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

garym@cognos.uucp@uunet.uu.net (Gary Murphy) (06/06/91)

Visual information is no more or less categorically rejectable than
anything else on-line but it is a good point that while I might take
only one of a 700k archive and then take a week to digest it, .GIF's
are likely to be downloaded in large numbers and consumed in an
instant.  The only hope I've seen so far is to store on-line images
as very low-quality JPEG files and make those interested in obtaining
the real image arrange the transfer themselves.

While JPEG might save upwards of 80% on the image bandwidth (and
provide images suitable for 90% of the requests), the downside is most
prohibitive: the only vendor to date only supplies shareware decoders
($US20 and a whopping $US65 for the deluxe encoder) and it is
unreasonable to expect EVERYONE to buy one.  As with .ZIP files, which
previously saved us the bandwidth of pure shar/tar files, JPEG will
probably never take off until the decoders become public domain.

--
 Gary Murphy - Cognos Incorporated  | "You think you're human.
       P.O.Box 9707 Ottawa K1G 3N3  |  But what if you've made a mistake,
              (613) 738-1338 x5537  |  As humans sometimes do,
 garym%cognos.uucp@ccs.carleton.ca  |  And you is an Angel, instead?"
                uucp: cognos!garym  |                     -- Sun Ra

de5@ornl.gov (Dave Sill) (06/07/91)

In article <GARYM.91Jun6093906@cognos.uucp@uunet.uu.net>, garym@cognos.uucp@uunet.uu.net (Gary Murphy) writes:
>
>The only hope I've seen so far is to store on-line images
>as very low-quality JPEG files and make those interested in obtaining
>the real image arrange the transfer themselves.

That's an interesting idea.  I don't know anything about JPEG, but I
think Jef Poskanzer's PBM toolkit could be used to provide scaled-down
or reduced-number-of-colors versions of large GIF's.  Most people
gobbling up the girlie pictures would be happy to preview them first,
I think.  (I have no first-hand knowledge, of course :-)

>While JPEG might save upwards of 80% on the image bandwidth (and
>provide images suitable for 90% of the requests), the downside is most
>prohibitive: the only vendor to date only supplies shareware decoders
>($US20 and a whopping $US65 for the deluxe encoder) and it is
>unreasonable to expect EVERYONE to buy one.

Of course, PBM is free, and the preview versions would still be GIF or
whatever the original was.

-- 
Dave Sill (de5@ornl.gov)	  Tug on anything in nature and you will find
Martin Marietta Energy Systems    it connected to everything else.
Workstation Support                                             --John Muir

rrichter@rphroy.ph.gmr.com (Roy Richter) (06/07/91)

In article <GARYM.91Jun6093906@cognos.uucp@uunet.uu.net>, garym@cognos.uucp@uunet.uu.net (Gary Murphy) writes:
|>
|>The only hope I've seen so far is to store on-line images
|>as very low-quality JPEG files and make those interested in obtaining
|>the real image arrange the transfer themselves.

Unworkable, though.  Where are you going to put them?  FTP site? Anon UUCP?
Mail them out?  All of these will overload a particular site.
There are a large number of people who want these.
USENET may very well be the most efficient method of propagating them.

--
Roy Richter                  Internet: rrichter@ph.gmr.com
Physics Dept, GM Research    UUCP:     {umich,cfctech}!rphroy!rrichter

suitti@ima.isc.com (Stephen Uitti) (06/08/91)

In article <1991Jun7.144706.11588@cs.utk.edu> Dave Sill <de5@ornl.gov> writes:
>In article <GARYM.91Jun6093906@cognos.uucp@uunet.uu.net>, garym@cognos.uucp@uunet.uu.net (Gary Murphy) writes:
>>
>>The only hope I've seen so far is to store on-line images
>>as very low-quality JPEG files and make those interested in obtaining
>>the real image arrange the transfer themselves.
>
>That's an interesting idea.  I don't know anything about JPEG, but I
>think Jef Poskanzer's PBM toolkit could be used to provide scaled-down
>or reduced-number-of-colors versions of large GIF's.  Most people
>gobbling up the girlie pictures would be happy to preview them first,
>I think.  (I have no first-hand knowledge, of course :-)
>
>>While JPEG might save upwards of 80% on the image bandwidth (and
>>provide images suitable for 90% of the requests), the downside is most
>>prohibitive: the only vendor to date only supplies shareware decoders
>>($US20 and a whopping $US65 for the deluxe encoder) and it is
>>unreasonable to expect EVERYONE to buy one.
>
>Of course, PBM is free, and the preview versions would still be GIF or
>whatever the original was.

First, some facts.

GIF is an image format that allows storage of images that have
at most 8 bits of colors or grays.  It has a color lookup table that
allows any of the 256 colors to be any 24 bit color.  It uses LZW
for compression.  It is a fairly simple format.  GIF reading is quick.

JPEG is a format that does frequency analysis on small chunks of
an image in an effort to convert the image to a small number of
different numbers. Then compression is applied.  The computations
can take a long time.  There are many compression options.
As I understand it, neither reading nor writing of JPEG is fast.

When GIF was developed, 8 bit (256 color) screens were getting common,
24 bit (16 million color) screens were not.  GIF is a real win
in speed for an 8 bit screen.  It preserves all you can see anyway,
and you don't have to perform color quantization.

8 bits per pixel is generally good enough for grayscale work.
GIF is a good format for this type of image.  GIF also handles
4 bit per pixel and 1 bit per pixel images well.

Color images are typically scanned with 24 bit per pixel scanners.
To convert such an image to GIF means that it must be quantized.
Quantization removes information.  For this type of image, GIF
is not lossless.  For each pixel 24 bits is compressed to 8 bits.
Thus the image is 1/3rd the original size.  Then it is compressed
via LZW, yielding a file that is roughly 70% of that. So, GIF files
can be expected to be about 20% to 25% of the original size.
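
The arithmetic above can be worked through for a concrete case. This is a rough sketch only: the 70% LZW figure is the estimate quoted above (real files vary a great deal), the palette and header overhead are ignored, and the function name is made up for illustration.

```python
def gif_size_estimate(width, height, lzw_ratio=0.70):
    """Estimate GIF size for a 24 bit image quantized to 8 bits per pixel.

    lzw_ratio is the rough "70% of that" figure from the text, not a
    measured value. Palette and header bytes are ignored.
    """
    raw_24bit = width * height * 3        # 3 bytes per pixel as scanned
    indexed_8bit = width * height         # 1 byte per pixel after quantizing
    gif_bytes = indexed_8bit * lzw_ratio  # after LZW, by the rough estimate
    return {
        "raw_24bit": raw_24bit,
        "indexed_8bit": indexed_8bit,
        "gif_bytes": gif_bytes,
        "fraction_of_original": gif_bytes / raw_24bit,
    }

if __name__ == "__main__":
    estimate = gif_size_estimate(640, 480)
    print(estimate)
    print(f"{estimate['fraction_of_original']:.1%} of the 24 bit original")
```

One third times 0.70 is a little over 23%, which is where the 20% to 25% range comes from.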

A 24 bit video card for the 640x480 Mac II color monitor can be
had for $450 these days.  I might get one for home.  We have 24
bit color on Macs here at work.  GIF now gets in the way of actually
using the new capability.

JPEG images retain the number of colors, but lose some part of
the spatial resolution.  For most pictures (you started the image
in a camera), JPEG compression yields a better looking image than
GIF on a 24 bit screen.  JPEG files are often less than 10% of
the original size.

Quantizing a 24 bit image to 8 bits introduces high frequency
noise into the image.  Thus, JPEG may reduce a 24 bit image to
something smaller and better looking than it would the same file
that had been converted to GIF.

I don't know if JPEG handles 8 bit gray, 4 bit and 1 bit per pixel
images well.

One feature of JPEG is that software for it has not yet settled
into an easily available and stable form.  There is a group
on the net collaborating on a public version of source for conversions.

Observations:

1) Images are binaries.

2) Binaries are large.

3) To prevent huge bandwidth consumption, binaries should only
   be posted via moderated groups.  This increases quality,
   reduces duplication, etc.

4) GIF files are already reduced-number-of-colors.  Reducing
   colors further won't buy much.  A 100x100 image is 1/4 the
   size of a 200x200 image.  It often produces an image better
   than reducing an 8 bit image to 4 bits - which only saves half.

5) One need not resort to lowest quality JPEG to achieve
   substantially better compression than GIF.

6) JPEG was designed to handle images that are large.

7) Netnews is good for wide distribution.  Mail and FTP are
   not as good.  It does not matter how big or small the data
   is.
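
The comparison in observation 4 can be checked with uncompressed sizes. This is a sketch under the assumption that compression affects both variants about equally; raw_bytes is a name made up for illustration.

```python
def raw_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8

full = raw_bytes(200, 200, 8)           # 40000 bytes
half_scale = raw_bytes(100, 100, 8)     # 10000 bytes: one quarter
half_depth = raw_bytes(200, 200, 4)     # 20000 bytes: one half

if __name__ == "__main__":
    print("halving each dimension:", half_scale / full)
    print("halving the bit depth: ", half_depth / full)
```

Halving both dimensions divides the pixel count by four, while going from 8 bits to 4 bits only halves the bits per pixel.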

Conclusions:

1) If you want to reduce network bandwidth, use a moderator.

2) If you want to archive images, it is possible to treat it
   as a separate problem.  For example:

   The moderator posts original images in whatever form.

   The moderator keeps images available at an FTP site, or via
   automatic mail access, for the month following an image's posting.

   An archive site retains 128x128 4 bit gray thumbnails (8,192
   bytes, uncompressed) in a well known format, available via ftp
   & automatic mail.  Also, a text archive index is available.

   Some or all full images are available.  Since one archive site
   does not have the resources to burn, a distributed archive
   is set up.  The central archive has pointers to other archives.

   An archive site devotes a fixed maximum amount of online resources
   to archiving.

   Rules should allow an archiver to change their mind.  The data
   is transferred to a new archiver, or archivers.  The pointers
   are updated.
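
The bookkeeping in the scheme above might look something like the following. It is a sketch only: the class name, the site names, and the 10 MB budget are all hypothetical, and only the pointer update is modeled, not the data transfer itself.

```python
THUMB_BYTES = 128 * 128 * 4 // 8   # 128x128 at 4 bits per pixel = 8,192 bytes

def thumbnails_that_fit(budget_bytes):
    """How many uncompressed thumbnails fit in a fixed storage budget."""
    return budget_bytes // THUMB_BYTES

class CentralIndex:
    """Hypothetical central archive: maps each image to the site holding it."""

    def __init__(self):
        self.location = {}   # image name -> archive site

    def add(self, image, site):
        self.location[image] = site

    def reassign(self, old_site, new_site):
        """An archiver changes their mind: repoint their images elsewhere."""
        moved = [img for img, site in self.location.items() if site == old_site]
        for img in moved:
            self.location[img] = new_site
        return moved

if __name__ == "__main__":
    index = CentralIndex()
    index.add("micrograph-01", "archive-a.example")
    index.add("cell-section-07", "archive-b.example")
    print(index.reassign("archive-a.example", "archive-c.example"))
    print(thumbnails_that_fit(10 * 1024 * 1024), "thumbnails in 10 MB")
```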

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (06/11/91)

How about this:

Make the ftpmail program classify what it is transferring by some means,
such as "gif" or other strings appearing in the file or any directory
name means it is an image.  Subclassifications might also be possible.

Set the limits on daily transfer based on the classes.  The root limit
can be less than the total class limit if desired.

Thus you can make all the requests for images and gifs compete against
each other, and not against those people simply trying to get a copy
of the latest neat free public domain software.
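
A rough sketch of such per-class limits follows. Everything here is an assumption made for illustration: the class names, the kbyte figures, and the substring test are not taken from ftpmail. The substring test is also crude, since it would classify a path containing "gifts" as an image.

```python
def classify(path):
    """Crude classifier: any 'gif' in the file or directory name means image."""
    return "image" if "gif" in path.lower() else "other"

class DailyBudget:
    """Per-class daily transfer limits plus an overall (root) limit, in kbytes."""

    def __init__(self, class_limits, root_limit):
        self.class_limits = dict(class_limits)
        self.root_limit = root_limit   # may be less than the sum of class limits
        self.used = {name: 0 for name in self.class_limits}

    def try_send(self, path, kbytes):
        """Charge the transfer to its class if both limits allow it."""
        cls = classify(path)
        if self.used[cls] + kbytes > self.class_limits[cls]:
            return False   # this class has used up its share for today
        if sum(self.used.values()) + kbytes > self.root_limit:
            return False   # the server as a whole is over budget
        self.used[cls] += kbytes
        return True

if __name__ == "__main__":
    budget = DailyBudget({"image": 500, "other": 1000}, root_limit=1200)
    print(budget.try_send("/pub/gif/a.gif", 256))            # fits
    print(budget.try_send("/pub/gif/b.gif", 256))            # image class is full
    print(budget.try_send("/pub/gnu/bash-1.08.tar.Z", 768))  # still fits
```

With limits like these, a flood of image requests delays other image requests but leaves room for the software fetches.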
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

blowfish@triton.unm.edu (rON.) (06/11/91)

In article <1991Jun10.223857.17858@ux1.cso.uiuc.edu> phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) writes:
>How about this:
>Make the ftpmail program classify what it is transferring by some means,
>such as "gif" or other strings appearing in the file or any directory
>name means it is an image.  Subclassifications might also be possible.

Standard GIF files start with the characters GIF right at the front. They
can also be 'seen' uuencoded, as they all start with:
M1TE
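
Both checks are easy to sketch. The first tests the three signature bytes at the start of the file. The second tests the first data line of a uuencoded copy, which only starts with "M1TE" when that line is a full 45 bytes, so it assumes the file is at least that long and that the "begin" line has already been skipped. The function names are made up for illustration.

```python
import binascii

def is_gif(data: bytes) -> bool:
    """True if the data starts with the GIF signature bytes."""
    return data[:3] == b"GIF"

def uu_line_looks_like_gif(line: str) -> bool:
    """True if a uuencoded data line starts the way a full GIF line does.

    'M' is the length character for a full 45 byte line, and '1TE' is
    what the bytes 'GIF' contribute to the first encoded characters.
    """
    return line.startswith("M1TE")

if __name__ == "__main__":
    # A stand-in for the first 45 bytes of a file: signature plus padding.
    first_chunk = b"GIF87a" + bytes(39)
    first_line = binascii.b2a_uu(first_chunk).decode("ascii")
    print(is_gif(first_chunk), uu_line_looks_like_gif(first_line))
```

Unlike a name-based test, this catches GIFs that have been renamed, though it costs a look at the data itself.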


rON. (blowfish@triton.unm.edu!ariel.unm.edu)
"It is only with the heart that one can see rightly;
 what is essential is invisible to the eye."