[gnu.emacs.gnus] Compatibility problem with GNUS and 1.5.10 NNTP

stealth@caen.engin.umich.edu (arrakis) (09/07/90)

When we cut over to NNTP 1.5.10 from 1.5.8, GNUS stopped operating
properly.  It gets through the group selection process fine, but when
you start to read a newsgroup, it gets stuck at the "0% of headers
received" point.  Setting nntp-maximum-request to a low number, the fix
usually suggested for this kind of hang, doesn't solve the problem.
We're running GNUS 3.13.  We're currently operating by having a 1.5.8
server on a separate port to which the GNUS reader can connect.

Any ideas?  I'll post a summary.

--
Michael V. Pelletier            | "We live our lives with our hands on the
 CAEN UseNet News Administrator |  rear-view mirror, striving to get a better
 Systems Group Programmer       |  view of the road behind us.  Imagine what's
                                |  possible if we look ahead and steer..."

aglew@crhc.uiuc.edu (Andy Glew) (09/12/90)

GNUS users at UIUC were plagued by the same problem - GNUS 3.13 or
3.12 hanging when our news server updated to nntp 1.5.10.  A bit of
experimentation showed that any value of nntp-maximum-request > 2
produced this hang.  I set nntp-maximum-request back to 1, and am
using GNUS fine now.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

stealth@caen.engin.umich.edu (Mike Pelletier) (09/14/90)

Add a line "setbuf(stdin, NULL);" at line 255 of server/serve.c
   (just before the "for(;;) {" line)
and the problem will go away.  This may have some performance
costs, but it's by far the simplest solution that's been sent to
me thus far.

sdk@shadow.twinsun.com (Scott D Kalter) (09/15/90)

So far two solutions have been mentioned to the GNUS vs. NNTP 1.5.10
problem.  

1. (setq nntp-maximum-request 1) which makes emacs and the NNTP server
work in a synchronous lock-step -- emacs makes one request, NNTP
answers, emacs makes one request etc.  It seems intuitive to allow
emacs to make several requests without making it wait for NNTP to
respond.

2. setbuf(stdin,NULL) which solves the problem by forcing NNTP to not
ignore buffered requests (by eliminating the buffer).  This isn't so
hot a solution since it forces NNTP to do a read() for each and every
input character!

I have spent some time looking at both ends of this problem.  My
current opinion is that NNTP has some problems in trying to use both
select() and buffered I/O through fgets().  After staring at this code
for an hour I realized that there are a couple of things to note:

1.  The select is used as nothing more than a timeout mechanism to
decide when the server has been idle too long and should be shut down.

2.  There is an evil bug lurking in this code: if a client sends a
string with no carriage return and then stops sending anything, the
NNTP server will hang in the fgets(), and select() isn't going to help
one bit.

Granted, (2) is much less likely than a well-behaved client that only
makes complete requests and then sits idle for two hours (the standard
timeout for select here), at which point the server should just give
up.

However, I see three options to solve the problem:

1. Remove the use of select and simply assume that clients will not
sit around idle for two or more hours.

2. Put in an explicit test (before the select call) to see if there is
something still in the buffer and don't make the select call if there
is.  This could be done (non-portably) by peeking at the _cnt field of
the _iobuf structure declared in stdio.h, or (portably) by building
one's own buffering scheme.

3. Give up on using select and use an alarm instead.

I believe the 3rd option makes the most sense given that we basically
are trying to implement something like a watchdog timer.  If the timer
goes off, just shut down this server process (which is what select
does if it times out).

Apparently the NNTP author is working on a fix but any of the above
could be implemented without too much difficulty and without costing
all newsreaders with a setbuf(stdin, NULL);.

-sdk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/18/90)

On 14 Sep 90 17:33:08 GMT, sdk@shadow.twinsun.com (Scott D Kalter) said:

sdk> So far two solutions have been mentioned to the GNUS vs. NNTP 1.5.10
sdk> problem.  

	[ ... both are bad, because setting nntp-maximum-request to 1
	prevents batching of requests, and thus leads to many small one
	line IPC transactions, and unbuffering stdio means that fgets
	must read from the socket one char at a time ... ]

sdk> I have spent some time looking at both ends of this problem.  My
sdk> current opinion is that NNTP has some problems in trying to use both
sdk> select() and buffered I/O through fgets(). [ ... ]

Correct -- select(2) can only know that there are bytes waiting at the
socket; it does not know about bytes waiting in the stdio buffer.

sdk> However, I see three options to solve the problem:

sdk> 1. Remove the use of select and simply assume that clients will not
sdk> sit around idle for two or more hours.

Best quick solution! Just do not define 'TIMEOUT' in "common/conf.h" --
this is by far the easiest and most efficient solution. It is also
catered for in the configuration options, requires no modifications, and
removes only a facility of very limited use, analogous to the c-shell
'autologout'.

sdk> 2. Put in an explicit test (before the select call) to see if there is
sdk> something still in the buffer and don't make the select call if there
sdk> is. [ ... ]

As you say, this is very unpalatable. It requires perforating
abstraction layers, or rewriting parts of stdio functionality, and also
it is very unportable.

sdk> 3. Give up on using select and use an alarm instead.

sdk> I believe the 3rd option makes the most sense given that we basically
sdk> are trying to implement something like a watchdog timer.

Yes, I was going to implement it like that, but there is a comment
that SIGPIPE is explicitly ignored because in any case we get to know
about a severed connection when trying to read. I have also noticed
that EINTR is ignored when reading from the socket.

This leads me to think: we may want to hang up either because the
connection has been dropped, or because the client has been inactive,
while keeping the connection alive. I think it is not appropriate to
implement the latter option ('TIMEOUT'); if the client has been inactive,
without dropping the connection, that means that its machine is still up
and it is still "logged in". If the client has to be auto-logged out,
let's leave the task to the client's machine.

sdk> Apparently the NNTP author is working on a fix but any of the above
sdk> could be implemented without too much difficulty and without costing
sdk> all newsreaders with a setbuf(stdin, NULL);.

Well, I have just disabled TIMEOUT. I think that is the best all around
solution. Granted, if you want to implement auto-logout on a live but
inactive socket in the server, alarm(3) is probably the best way.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

leres@ace.ee.lbl.gov (Craig Leres) (09/22/90)

The TIMEOUT code in nntpd used to use ALRM. But the ALONE code also
uses ALRM so you couldn't use TIMEOUT and ALONE. Since I wanted to do
this, I rewrote serve() to use select() and submitted the new code to
Stan.

Another reason for the rewrite is that there are other things I want to
run off a timer. The version of nntpd I'm currently running has the
following timers:

    - The standard idle timer; close the connection and exit after
    TIMEOUT seconds of idle time. The reason I do this is to release
    resources that are not in use.

    - A cnews batch check timer; launch a partial batch file after
    BATCHCHECK seconds of idle time. This works really well when
    you're being fed by a nntplink site.

    - A /etc/nologin check; look for /etc/nologin every LOGINCHECK
    seconds and shut down the connection if it's found. This at least
    gives news readers the option of gracefully handling a shutdown of
    the news server.

These timers are run from a generic interface. It's easy to add more
and there aren't any restrictions on which combinations you can use.
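
The post does not show the interface itself. As a purely hypothetical
sketch of how several timers might share one select() timeout, one
could keep a table of timers and wait only until the earliest one is
due. Every name below (struct timer, timers_arm, timers_next_wait,
timers_run_due, and the two sample actions) is an assumption made for
illustration, not Craig's code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct timer {
    const char *name;       /* label, e.g. for logging */
    time_t interval;        /* seconds between firings */
    time_t due;             /* absolute time of the next firing */
    void (*action)(void);   /* called when the timer fires */
};

/* Sample actions that only count how often they ran. */
static int idle_expiries = 0;
static int nologin_checks = 0;
static void note_idle(void)    { idle_expiries++; }
static void note_nologin(void) { nologin_checks++; }

/* Schedule every timer relative to now (also used to restart them). */
static void timers_arm(struct timer *t, size_t n, time_t now)
{
    size_t i;

    for (i = 0; i < n; i++)
        t[i].due = now + t[i].interval;
}

/* Seconds until the earliest timer is due: the value for select(). */
static time_t timers_next_wait(const struct timer *t, size_t n, time_t now)
{
    time_t earliest;
    size_t i;

    assert(t != NULL && n > 0);
    earliest = t[0].due;
    for (i = 1; i < n; i++)
        if (t[i].due < earliest)
            earliest = t[i].due;
    return earliest > now ? earliest - now : 0;
}

/* Fire and reschedule every timer that is due; returns how many fired. */
static int timers_run_due(struct timer *t, size_t n, time_t now)
{
    int fired = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (t[i].due <= now) {
            t[i].action();
            t[i].due = now + t[i].interval;
            fired++;
        }
    }
    return fired;
}
```

The main loop would pass timers_next_wait() as the select() timeout,
call timers_run_due() whenever select() times out, and re-arm the idle
timers after each client request. Adding a timer is then just one more
table entry.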

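Anyway, back to the problem: if fgets() reads two lines from the remote
side, it returns the first line and leaves the second one in the stdio
buffer. We then deadlock at the next call to select(), which sees
nothing pending on the socket while the client waits for a reply to the
buffered request.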

The solution I like the best (and the one I'm currently testing) is to
look and see if there are characters in the stdio buffer:

    #define BUFFERED_DATA(f) ((f)->_cnt > 0)

This probably isn't 100% portable but I believe that it'll work on most
systems that have select(). And I'm sure we can come up with something
equivalent for those other systems.
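
One portable equivalent, along the lines of the "own buffering scheme"
mentioned earlier in the thread, is to bypass stdio for input and keep
the buffer where the server can inspect it. A minimal sketch, assuming
a POSIX system; struct linebuf and the linebuf_* functions are
hypothetical names, and LINEBUF_SIZE is an arbitrary choice.

```c
#include <assert.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define LINEBUF_SIZE 1024

struct linebuf {
    int fd;                     /* descriptor to read from */
    size_t used;                /* bytes currently held in data[] */
    char data[LINEBUF_SIZE];
};

static void linebuf_init(struct linebuf *lb, int fd)
{
    lb->fd = fd;
    lb->used = 0;
}

/* Portable stand-in for BUFFERED_DATA(): is a whole line buffered? */
static int linebuf_has_line(const struct linebuf *lb)
{
    return memchr(lb->data, '\n', lb->used) != NULL;
}

/*
 * Copy the next line (including its newline) into out.
 * Returns 1 on success, 0 if timeout_sec passed with no input,
 * and -1 on end of file, error, or a line that does not fit.
 */
static int linebuf_getline(struct linebuf *lb, char *out, size_t outlen,
                           long timeout_sec)
{
    assert(lb != NULL && out != NULL && outlen > 0);
    for (;;) {
        char *nl = memchr(lb->data, '\n', lb->used);
        fd_set readfds;
        struct timeval tv;
        ssize_t got;
        int ready;

        if (nl != NULL) {       /* serve from the buffer, skip select() */
            size_t n = (size_t)(nl - lb->data) + 1;
            if (n >= outlen)
                return -1;
            memcpy(out, lb->data, n);
            out[n] = '\0';
            memmove(lb->data, lb->data + n, lb->used - n);
            lb->used -= n;
            return 1;
        }
        if (lb->used == sizeof lb->data)
            return -1;          /* buffer full and still no newline */

        FD_ZERO(&readfds);
        FD_SET(lb->fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        ready = select(lb->fd + 1, &readfds, NULL, NULL, &tv);
        if (ready == 0)
            return 0;           /* idle timeout */
        if (ready < 0)
            return -1;
        got = read(lb->fd, lb->data + lb->used, sizeof lb->data - lb->used);
        if (got <= 0)
            return -1;
        lb->used += (size_t)got;
    }
}
```

Since select() is only reached when no complete line is buffered, a
second request that arrived together with the first is answered without
waiting. In this sketch the timeout restarts on every select() call,
and an interrupted select() is treated as an error; a production
version would want to handle both more carefully.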

		Craig