cmf851@anu.oz.au (Albert Langer) (06/16/91)
(Relevance to comp.unix.sysv386 and comp.os.coherent explained at end.
Followups only to comp.archives.admin please.)

The CHANGE file in ftpd.5.60.tar.Z mentions that with the new version of
ftpd for BSD systems, ftp can no longer directly run ls.

I have been working on some software (originally by Tony Moraes,
supplied by Ed Vielmetti) which relies on using ls -alR to obtain a
recursive directory listing. (I believe it could be used to greatly
reduce internet traffic, especially at peak times and on international
links. Cheap unix 386 boxes or very cheap Coherent unix lookalikes
could provide a "transparent" cache/relay service.)

Should I assume that ftp use of ls -alR will soon be broken on most BSD
systems? I have the impression that Ed's comp.archives listings may also
rely on use of ls for the verification, and perhaps archie does too.
Use of Tim Cook's (timcc@admin.viccol.edu.au) "dls" package with ftp,
which provides one-line descriptions of files instead of an ordinary ls
listing, also relies on ftpd calling a local bin/ls within the chroot
(though I think that is unchanged). To avoid possible disruption of
important services I hope those concerned will check whether any changes
will be needed.

While I am at it, here's an update on the software I have been working
on, in case anybody wants to do something with it, as I won't be able to
do much more. Sysv386 and Coherent users may be especially interested in
the possibilities for conveniently overcoming the problem of long
filenames when transferring files from BSD systems, and for providing
adequate ftpmail gateways to UUCP sites that are not on the internet.
(Thanks for sending Tony's stuff, Ed; sorry about the delay getting back
to you on it.)

The unreleased collection of shell scripts from Tony Moraes included:

1. ftpfilter. This takes the output of an ftp (or other) ls -AlR command
   and converts it into a format like that used by "find", so that
   grepping the output can extract a full path as well as the required
   file names. (File size and a numeric date are also included on each
   line, with conversion of the different formats ls uses for older and
   recent dates.) A rough sketch of this conversion follows the list.

2. ftpgrep. This greps through compressed output files from ftpfilter,
   stored in a file named after each host, and produces output in which
   each full path is prefixed by a hostname and colon in the style of
   rcp.

3. grabfiles. This uses an rcp/ftpgrep style input to actually get the
   files with ftp.
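For anybody who hasn't seen it, here is a minimal sketch of the kind of
conversion ftpfilter performs (not the real script): it turns ls -lR
output into one line per file carrying the full path, size and date, so
that a plain grep recovers everything needed to fetch the file. It
assumes the usual nine-field ls -l lines and directory headers ending in
":"; the real ftpfilter also normalises the two ls date formats to a
numeric date and copes with embedded spaces in names, which this sketch
does not.

    ls -lR | awk '
    BEGIN { dir = "." }
    /:$/ && NF == 1 {                  # "some/dir:" header line
        dir = substr($0, 1, length($0) - 1); next
    }
    /^total / { next }                 # skip "total nnn" lines
    NF >= 9 && $1 ~ /^[-d]/ {          # ordinary file and directory lines
        # $5 = size; $6 $7 $8 = month, day, time-or-year
        printf "%s/%s %s %s %s %s\n", dir, $NF, $5, $6, $7, $8
    }'

Grepping the result for, say, "tar.Z" then yields lines of the form
"pub/gnu/gcc-1.40.tar.Z 2215051 Jun 12 1991" (values illustrative),
ready to be prefixed with a hostname and colon in ftpgrep style.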
The main changes I have made are:

1. Horrendous butchery of the code, for which nobody else is
   responsible.

2. Ability to handle paths with multiple leading, trailing and embedded
   spaces (mainly to get at collections of Mac software mounted for NFS
   access).

3. Porting to work on Sys V 3.2 (ISC 386ix 2.0.2) as well as on SunOs
   4.1 (and to work for ftp in either direction between the two).

4. Item 3 required an optional conversion of BSD directory and file
   names longer than 14 characters into Sys V equivalents, to avoid
   collisions between truncated names. Rather than use the "shrink"
   package, which produces unintelligible names, I decided to preserve
   the full original names, converted into filepaths with each segment
   less than 14 characters (a sketch of this splitting appears after
   this section). This wastes some inodes and nearly empty blocks, but
   the files can always be linked to another name later anyway. Host
   names were also converted to a sub-directory for each domain.

5. Items 3 and 4 were so "hairy" that I had to build in comprehensive
   testing which compares the files originally requested with those
   actually obtained, keeps detailed logs, and mails up to 4 lists to
   the user:

   a) Files ok and of the length specified.
   b) Files obtained for which no length was specified.
   c) Files obtained but with a length different from the request.
   d) Files requested but not obtained.

   The conversion "works" pretty reliably now, but I find the mail very
   reassuring when I just run a long request in the background. If the
   ftp connection gets lost in the middle of a file I will be told (by
   the wrong file size), and there is no need to manually review any
   transcripts to see if there were ftp failures. This greatly
   simplifies life. It could easily be extended to reprocess list d)
   when a request has been diverted to a cache or shadow/mirror site and
   try again elsewhere (and also to recover automatically from dropped
   ftp sessions etc.).

6. A script was added to process the output of archie prog commands into
   the same rcp/ftpgrep style format, using the cliplines.c included in
   the log_archie package recently distributed in alt.sources. This
   could easily be developed into a similar facility for comp.archives
   messages.

7. A single directory tree of domains and hosts (with each site's own
   directory trees underneath) was established for use by all
   components, so that once a file has been obtained its local
   availability can easily be confirmed and future requests can be
   directed there. (A simple cron job to find and delete files that have
   not been accessed for a certain period makes this an effective cache;
   see the cache-lookup sketch below.)

8. Where more access to a site is available with a specific username and
   password than with "anonymous", this is recorded in the database tree
   and used automatically. The same mechanism can be used for other
   special processing (e.g. variant ls commands).

Anyway, it seems to be doing more or less what I want it to in heavy
duty file transfers between two machines. It is horribly slow (in
offline processing), undocumented and clumsily written, but I believe a
more competent programmer could easily develop it into a "production"
package that could be released and would also:

a) Batch ftp requests to handle them during offpeak hours while still
   providing immediate and reliable feedback about availability.
b) Automatically divert requests that have previously been filled so
   that they are filled from a cache on the same machine.
c) Extend the above to easy maintenance of shadow or mirror archives.
d) Automatically divert requests to other local cache or shadow/mirror
   sites and follow up with further requests if unsuccessful.
e) Do all this on a cheap sysv386 system or very cheap Coherent system
   (as well as on any other Sys V or BSD unix).

By sticking to simple shell scripts and doing filename conversion the
hard way, I think I have made sure a port to Coherent would not be too
difficult. A Coherent box could provide cache services to UUCP users,
even if it had to obtain its files by UUCP from a cooperating BSD system
that had ftp access to the internet (but only maintained a small cache
of its own with a short timeout and was not interested in providing
modem access to others). With a port of TCP/IP etc. to Coherent, the
Coherent box could do the whole job (with filename conversions so that
the 14 character converted names appeared as full BSD names from
outside). Likewise it should be feasible to port to MSDOS, with some
further filename manipulations. This could be handy for ftp requests to
BSD systems from MSDOS users.
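To show the idea behind the 14-character conversion of item 4 (this is
not the actual code, and "sysv_name" is a name invented for the
example), a long BSD name can be cut into path segments that each fit
under the Sys V limit, so nothing is lost:

    # Sketch only: preserve a long BSD name by splitting it into
    # path segments of less than 14 characters each. The real
    # scripts may choose smarter split points.
    sysv_name () {
        echo "$1" | awk '{
            name = $0; out = ""
            while (length(name) > 13) {
                out = out substr(name, 1, 13) "/"
                name = substr(name, 14)
            }
            print out name
        }'
    }

For example, a BSD name like emacs-18.57.tar.Z (17 characters) would
come out as emacs-18.57.t/ar.Z, one directory level deep; once the file
reaches a filesystem without the limit it can simply be linked back to
its single original name.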
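The diversion in items b) and d) above amounts to a lookup in the
directory tree of item 7 before any ftp is attempted. A minimal sketch,
assuming requests in the host:/path form; the cache root and queue file
names here are invented for the example, and the real tree also groups
hosts under their domains:

    # Sketch only: fill "host:/path" requests from the local cache
    # tree where possible, queueing the rest for an offpeak ftp run.
    CACHE=/usr/spool/ftpcache
    QUEUE=$CACHE/offpeak.requests
    while read req
    do
        host=`expr "$req" : '\([^:]*\):'`
        path=`expr "$req" : '[^:]*:\(.*\)'`
        if test -f "$CACHE/$host$path"
        then
            echo "cached: $CACHE/$host$path"    # fill locally, no ftp
        else
            echo "$req" >> $QUEUE               # fetch by ftp after hours
        fi
    done

Feeding the output of ftpgrep or the archie converter through something
like this gives the "transparent" diversion discussed below: the user
only sees that the file arrived.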
Unnecessary internet traffic could be greatly reduced simply by
providing a (fairly trivial) utility to process comp.archives messages
into the required form, but with automatic diversion to nearby cache or
shadow/mirror sites. Users would only notice the convenience, without
noticing the diversion (whereas asking users to "check local cache and
mirror archives first", as is done in Australia, seems somewhat
optimistic).

Peak traffic could be reduced further with an option to delay the
request until offpeak times. While that WOULD be noticed by the user, it
could be made much more acceptable by immediate mailed confirmation of
the request and subsequent mail notification of success. Gateways like
ftpmail and netfetch could easily enforce both diversion to cache and
mirror sites and offpeak handling of mail requests (while again
providing immediate feedback).

Further substantial reductions in traffic could be achieved by providing
larger capacity cache and mirror sites and locating them at national and
regional gateways that have expensive links to the rest of the internet.
Ultimately this should be designed into a new internet protocol for a
cached and delayed ftp service, one that uses network store and forward
resources for file transfers with substantial storage times as well as
for millisecond delay packet switching. In the meantime an application
layer kludge seems well worthwhile. Disk space is now only USD $2 per MB
and the Coherent operating system is only USD $100. It seems absurd to
pay tens of thousands of dollars per month for higher speed
international links instead of providing adequate caching.

There was a good deal of discussion in news.admin recently about the
problems caused to mail relays by the BITFTP gateway, and about the
problems caused to UUCP sites that are not part of the internet by the
closing down of that gateway. Installation of cheap caches should
substantially relieve the problems of mail relays. If necessary it
should not be too difficult to develop an accounting system to pay for
the disk space (and modem traffic) by charges to the users.

Well, if anybody wants to take this up I'll be happy to pass on code
that does what I need and could easily be developed into a "production"
system with the extra features mentioned above. I'm just too rusty at
awk and shell programming to finish the job in a reasonable amount of
time, and I have to get on with other priorities, but I'm sure anybody
reasonably competent would have little trouble producing a worthwhile
releasable package quite quickly. (And whoever takes it on can also do
the worrying about how to get recursive directory listings if ftpd is
changing the access to ls :-)
--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au