cmf851@anu.oz.au (Albert Langer) (06/16/91)
(Relevance to comp.unix.sysv386 and comp.os.coherent explained at end.
Followups only to comp.archives.admin please.)

The CHANGE file in ftpd.5.60.tar.Z mentions that with the new version of
ftpd for BSD systems, ftp can no longer directly run ls.

I have been working on some software (originally by Tony Moraes,
supplied by Ed Vielmetti) which relies on using ls -alR to obtain a
recursive directory listing. (I believe it could be used to greatly
reduce internet traffic, especially at peak times and on international
links. Cheap unix 386 boxes or very cheap Coherent unix lookalikes
could provide a "transparent" cache/relay service.)

Should I assume that ftp use of ls -alR will soon be broken on most BSD
systems? I have the impression that Ed's comp.archives listings may also
rely on use of ls for the verification, and perhaps archie does too.
Use of Tim Cook's (timcc@admin.viccol.edu.au) "dls" package with ftp,
which provides one-line descriptions of files instead of an ordinary ls
listing, also relies on ftpd calling a local bin/ls within the chroot
(though I think that is unchanged). To avoid possible disruption of
important services I hope those concerned will check whether any changes
will be needed.

While I am at it, here's an update on the software I have been working
on, in case anybody wants to do something with it, as I won't be able to
do much more. Sysv386 and Coherent users may be especially interested in
the possibilities for conveniently overcoming the problem of long
filenames when transferring files from BSD systems, and for providing
adequate ftpmail gateways to UUCP sites that are not on the internet.
(Thanks for sending Tony's stuff, Ed; sorry about the delay getting back
to you on it.)

The unreleased collection of shell scripts from Tony Moraes included:

1. ftpfilter. This takes the output of an ftp (or other) ls -AlR command
   and converts it into a format like that used by "find", so that
   grepping the output can extract a full path as well as the required
   file names. (File size and a numeric date are also included on each
   line, with conversion of the different formats ls uses for older and
   recent dates.) A rough sketch of this conversion follows the list.

2. ftpgrep. This greps through compressed output files from ftpfilter,
   stored in a file named after each host, and produces output in which
   each full path is prefixed by a hostname and colon in the style of
   rcp.

3. grabfiles. This uses an rcp/ftpgrep style input to actually get the
   files with ftp.
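For anybody who hasn't seen it, here is a minimal sketch of the kind of
conversion ftpfilter performs (not the real script): it turns ls -lR
output into one line per file carrying the full path, size and date, so
that a plain grep recovers everything needed to fetch the file. It
assumes the usual nine-field ls -l lines and directory headers ending in
":"; the real ftpfilter also normalises the two ls date formats to a
numeric date and copes with embedded spaces in names, which this sketch
does not.

    ls -lR | awk '
    BEGIN { dir = "." }
    /:$/ && NF == 1 {                  # "some/dir:" header line
        dir = substr($0, 1, length($0) - 1); next
    }
    /^total / { next }                 # skip "total nnn" lines
    NF >= 9 && $1 ~ /^[-d]/ {          # ordinary file and directory lines
        # $5 = size; $6 $7 $8 = month, day, time-or-year
        printf "%s/%s %s %s %s %s\n", dir, $NF, $5, $6, $7, $8
    }'

Grepping the result for, say, "tar.Z" then yields lines of the form
"pub/gnu/gcc-1.40.tar.Z 2215051 Jun 12 1991" (values illustrative),
ready to be prefixed with a hostname and colon in ftpgrep style.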
The main changes I have made are:

1. Horrendous butchery of the code, for which nobody else is
   responsible.

2. Ability to handle paths with multiple leading, trailing and embedded
   spaces (mainly to get at collections of Mac software mounted for NFS
   access).

3. Porting to work on Sys V 3.2 (ISC 386ix 2.0.2) as well as on SunOs
   4.1 (and to work for ftp in either direction between the two).

4. Item 3 required an optional conversion of BSD directory and file
   names longer than 14 characters into Sys V equivalents, to avoid
   collisions between truncated names. Rather than use the "shrink"
   package, which produces unintelligible names, I decided to preserve
   the full original names, converted into filepaths with each segment
   less than 14 characters (a sketch of this splitting appears after
   this section). This wastes some inodes and nearly empty blocks, but
   the files can always be linked to another name later anyway. Host
   names were also converted to a sub-directory for each domain.

5. Items 3 and 4 were so "hairy" that I had to build in comprehensive
   testing which compares the files originally requested with those
   actually obtained, keeps detailed logs, and mails up to 4 lists to
   the user:

   a) Files ok and of the length specified.
   b) Files obtained for which no length was specified.
   c) Files obtained but with a length different from the request.
   d) Files requested but not obtained.

   The conversion "works" pretty reliably now, but I find the mail very
   reassuring when I just run a long request in the background. If the
   ftp connection gets lost in the middle of a file I will be told (by
   the wrong file size), and there is no need to manually review any
   transcripts to see if there were ftp failures. This greatly
   simplifies life. It could easily be extended to reprocess list d)
   when a request has been diverted to a cache or shadow/mirror site and
   try again elsewhere (and also to recover automatically from dropped
   ftp sessions etc.).

6. A script was added to process the output of archie prog commands into
   the same rcp/ftpgrep style format, using the cliplines.c included in
   the log_archie package recently distributed in alt.sources. This
   could easily be developed into a similar facility for comp.archives
   messages.

7. A single directory tree of domains and hosts (with each site's own
   directory trees underneath) was established for use by all
   components, so that once a file has been obtained its local
   availability can easily be confirmed and future requests can be
   directed there. (A simple cron job to find and delete files that have
   not been accessed for a certain period makes this an effective cache;
   see the cache-lookup sketch below.)

8. Where more access to a site is available with a specific username and
   password than with "anonymous", this is recorded in the database tree
   and used automatically. The same mechanism can be used for other
   special processing (e.g. variant ls commands).

Anyway, it seems to be doing more or less what I want it to in heavy
duty file transfers between two machines. It is horribly slow (in
offline processing), undocumented and clumsily written, but I believe a
more competent programmer could easily develop it into a "production"
package that could be released and would also:

a) Batch ftp requests to handle them during offpeak hours while still
   providing immediate and reliable feedback about availability.
b) Automatically divert requests that have previously been filled so
   that they are filled from a cache on the same machine.
c) Extend the above to easy maintenance of shadow or mirror archives.
d) Automatically divert requests to other local cache or shadow/mirror
   sites and follow up with further requests if unsuccessful.
e) Do all this on a cheap sysv386 system or very cheap Coherent system
   (as well as on any other Sys V or BSD unix).

By sticking to simple shell scripts and doing filename conversion the
hard way, I think I have made sure a port to Coherent would not be too
difficult. A Coherent box could provide cache services to UUCP users,
even if it had to obtain its files by UUCP from a cooperating BSD system
that had ftp access to the internet (but only maintained a small cache
of its own with a short timeout and was not interested in providing
modem access to others). With a port of TCP/IP etc. to Coherent, the
Coherent box could do the whole job (with filename conversions so that
the 14 character converted names appeared as full BSD names from
outside). Likewise it should be feasible to port to MSDOS, with some
further filename manipulations. This could be handy for ftp requests to
BSD systems from MSDOS users.
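To show the idea behind the 14-character conversion of item 4 (this is
not the actual code, and "sysv_name" is a name invented for the
example), a long BSD name can be cut into path segments that each fit
under the Sys V limit, so nothing is lost:

    # Sketch only: preserve a long BSD name by splitting it into
    # path segments of less than 14 characters each. The real
    # scripts may choose smarter split points.
    sysv_name () {
        echo "$1" | awk '{
            name = $0; out = ""
            while (length(name) > 13) {
                out = out substr(name, 1, 13) "/"
                name = substr(name, 14)
            }
            print out name
        }'
    }

For example, a BSD name like emacs-18.57.tar.Z (17 characters) would
come out as emacs-18.57.t/ar.Z, one directory level deep; once the file
reaches a filesystem without the limit it can simply be linked back to
its single original name.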
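The diversion in items b) and d) above amounts to a lookup in the
directory tree of item 7 before any ftp is attempted. A minimal sketch,
assuming requests in the host:/path form; the cache root and queue file
names here are invented for the example, and the real tree also groups
hosts under their domains:

    # Sketch only: fill "host:/path" requests from the local cache
    # tree where possible, queueing the rest for an offpeak ftp run.
    CACHE=/usr/spool/ftpcache
    QUEUE=$CACHE/offpeak.requests
    while read req
    do
        host=`expr "$req" : '\([^:]*\):'`
        path=`expr "$req" : '[^:]*:\(.*\)'`
        if test -f "$CACHE/$host$path"
        then
            echo "cached: $CACHE/$host$path"    # fill locally, no ftp
        else
            echo "$req" >> $QUEUE               # fetch by ftp after hours
        fi
    done

Feeding the output of ftpgrep or the archie converter through something
like this gives the "transparent" diversion discussed below: the user
only sees that the file arrived.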
Unnecessary internet traffic could be greatly reduced simply by
providing a (fairly trivial) utility to process comp.archives messages
into the required form, but with automatic diversion to nearby cache or
shadow/mirror sites. Users would only notice the convenience, without
noticing the diversion (whereas asking users to "check local cache and
mirror archives first", as is done in Australia, seems somewhat
optimistic).

Peak traffic could be reduced further with an option to delay the
request until offpeak times. While that WOULD be noticed by the user, it
could be made much more acceptable by immediate mailed confirmation of
the request and subsequent mail notification of success. Gateways like
ftpmail and netfetch could easily enforce both diversion to cache and
mirror sites and offpeak handling of mail requests (while again
providing immediate feedback).

Further substantial reductions in traffic could be achieved by providing
larger capacity cache and mirror sites and locating them at national and
regional gateways that have expensive links to the rest of the internet.
Ultimately this should be designed into a new internet protocol for a
cached and delayed ftp service, one that uses network store and forward
resources for file transfers with substantial storage times as well as
for millisecond delay packet switching. In the meantime an application
layer kludge seems well worthwhile. Disk space is now only USD $2 per MB
and the Coherent operating system is only USD $100. It seems absurd to
pay tens of thousands of dollars per month for higher speed
international links instead of providing adequate caching.

There was a good deal of discussion in news.admin recently about the
problems caused to mail relays by the BITFTP gateway, and about the
problems caused to UUCP sites that are not part of the internet by the
closing down of that gateway. Installation of cheap caches should
substantially relieve the problems of mail relays. If necessary it
should not be too difficult to develop an accounting system to pay for
the disk space (and modem traffic) by charges to the users.

Well, if anybody wants to take this up I'll be happy to pass on code
that does what I need and could easily be developed into a "production"
system with the extra features mentioned above. I'm just too rusty at
awk and shell programming to finish the job in a reasonable amount of
time, and I have to get on with other priorities, but I'm sure anybody
reasonably competent would have little trouble producing a worthwhile
releasable package quite quickly. (And whoever takes it on can also do
the worrying about how to get recursive directory listings if ftpd is
changing the access to ls :-)
--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au