stevo@elroy.jpl.nasa.gov (Steve Groom) (08/03/90)
A little over a year ago there was some talk here about ways to improve nntpxmit, and in particular, reducing the disk I/O spent in searching articles for message-id's, which is all for nothing if the article is subsequently rejected by the remote. We worked on it then, but just recently got things cleaned up and suitable for distribution. Better late than never? Attached below is an unoffical patch to nntpxmit (1.5.9) to improve the efficiency of nntpxmit, especially for feeds which have high article rejection rates. This patch has been submitted for possible inclusion in NNTP 1.5.10, but in the meantime, Stan Barber has OK'd an unofficial distribution of it. This patch allows nntpxmit to use message-id's found in the article batch file (such as generated by C news' "n" sys file flag) instead of opening each article to extract the message-id. This approach greatly reduces the disk I/O required to run through a batch of mostly-rejected articles, as the article is never opened unless it is to be sent (or if there is no message-id found in the batch file). This can eliminate almost all of the disk I/O for rejected articles, which can sometimes account for as much as 90% or more of a large feed. There is also a small change to allow systems which support memory mapping of files via mmap() (e.g. SunOS >=4.0) to help reduce the system overhead of actually reading and transmitting an article. By mapping the entire article into memory, multiple small disk reads are unnecessary and avoided. Also, when using normal read()'s (even buffered ones), the kernel must copy data from its internal buffers out to the user's address space. When using mmap(), this extra memory movement is avoided. These mods were originally written over a year ago to help salvage what was left of an aging Sun 2/170's performance. They did a pretty good job of that by decreasing by quite a bit the amount of disk I/O done. Since then, the machine has been replaced by a 4/370, and the patched nntpxmit continues to function well. I finally got around to cleaning things up and organized in the form of a patch after I upgraded to 1.5.9. I'd appreciate hearing about any comments or bugs. -steve stevo@elroy.jpl.nasa.gov p.s. yes, I know about nntplink, but we haven't gotten into that - yet. ========================================================================= # This is a shell archive. Remove anything before this line, # then unpack it by saving it in a file and typing "sh file". # # Wrapped by elroy!stevo on Thu Aug 2 13:18:47 PDT 1990 # Contents: README.xmit nntpxmit.patch echo x - README.xmit sed 's/^@//' > "README.xmit" <<'@//E*O*F README.xmit//' These patches allow nntpxmit to use batch files containing both the filename and message-id for each article, and avoid opening the article at all unless it's actually going to be sent. The only difference in functionality is that if the batch file line for an article contains a message-id after the filename, nntpxmit will use that message-id instead of opening the article to find it. Additionally, we have added the ability to take advantage of the SunOS 4.x mmap() system call to map article files into memory, which increases the efficiency of article reads by avoiding doing multiple small reads. The sendfile() routine, which does (buffered) reads and writes of one character at a time, now maps the whole article into memory, and writes the article by walking through memory instead of getting individual characters. I'm not sure what other systems might support mmap(), but the changes are small and #ifdef'd. These were originally written to salvage what was left of a Sun 2/170's performance, and did a pretty good job of that by decreasing by quite a bit the amount of disk I/O done. Since then, the machine has been upgraded to a 4/370, and the patched nntpxmit continues to function well. The patches below are current as of NNTP 1.5.9. They have been submitted for possible inclusion in 1.5.10. Apply the patch to the NNTP top level directory. If your system supports mmap()'ing and you want to enable that, uncomment the #define MMAP in common/conf.h. Then cd to xmit and make. Steve Groom stevo@elroy.jpl.nasa.gov 7/16/90 @//E*O*F README.xmit// chmod u=rw,g=r,o=r README.xmit echo x - nntpxmit.patch sed 's/^@//' > "nntpxmit.patch" <<'@//E*O*F nntpxmit.patch//' *** common/conf.h.dist Thu Jul 5 00:11:16 1990 --- common/conf.h Mon Jul 16 17:14:59 1990 *************** *** 172,177 **** --- 172,183 ---- # define PASSFILE "/etc/nntp.sys" #endif AUTH + /* SunOS 4.x supports mmap()'ing of files directly into a process' address + ** space. I have no idea what other systems might support this (SVR4?). + ** If you can use it, it will speed up article reading considerably. + */ + /* #define MMAP /* system supports mmap'ing of files */ + /* * System V compatability */ *** xmit/nntpxmit.h.1.5.9 Sun Jan 14 23:37:52 1990 --- xmit/nntpxmit.h Mon Jul 16 17:19:57 1990 *************** *** 6,11 **** --- 6,16 ---- ** to your liking, look them over carefully. */ + #ifdef MMAP + #include <sys/mman.h> + #include <sys/stat.h> + #endif MMAP + #ifdef SIGRET #undef SIGRET #endif *** xmit/nntpxmit.c.1.5.9 Tue Jul 10 15:14:07 1990 --- xmit/nntpxmit.c Mon Jul 16 13:56:41 1990 *************** *** 3,8 **** --- 3,21 ---- #endif /* nntpxmit - transmit netnews articles across the internet with nntp ** + ******* + ** This version nntpxmit has been modified to handle message-id's in the + ** article batch file after the article file name. If they are present, + ** they are used instead of opening each article file to extract it's + ** Batch files of this format can be generated, for instance, by C news' + ** "n" sys file flag. + ** + ** David Robinson (david@elroy.jpl.nasa.gov) + ** Steve Groom (stevo@elroy.jpl.nasa.gov), June 30 1989. + ** + ** updated to include patches for nntp through 1.5.9, 7/10/90 SLG + ******* + ** ** This program is for transmitting netnews articles between sites ** that offer the NNTP service, internet style. There are two forms ** of article transmission that can be used in this environment, since *************** *** 93,102 **** #include "../common/nntp.h" #include "llist.h" ! #define MAXFNAME BUFSIZ /* maximum filename size - big enough? */ #define FCLOSE(fp) (void) fclose(fp); (fp) = (FILE *)NULL ! FILE *getfp(); char *errmsg(); void requeue(); SIGRET catchsig(); --- 106,115 ---- #include "../common/nntp.h" #include "llist.h" ! #define MAXALINE BUFSIZ /* maximum article line size - big enough? */ #define FCLOSE(fp) (void) fclose(fp); (fp) = (FILE *)NULL ! char *extract_id(); char *errmsg(); void requeue(); SIGRET catchsig(); *************** *** 119,125 **** char *Host; /* current remote host */ char *Qfile; /* current queue file we're operating on */ FILE *Qfp; /* the (FILE *) for above */ ! char Article[MAXFNAME]; /* current article filename */ /* ** Some flags, toggled by arguments --- 132,140 ---- char *Host; /* current remote host */ char *Qfile; /* current queue file we're operating on */ FILE *Qfp; /* the (FILE *) for above */ ! char *Aline; /* current article line */ ! char savedir[1024]; /* directory at startup */ ! int dirlen; /* length of spool directory name */ /* ** Some flags, toggled by arguments *************** *** 253,259 **** } } else { /* one-shot */ - (void) strcpy(Article, Qfile); exit(sendnews(Host, transport, Qfile, isQfile) ? EX_OK : EX_TEMPFAIL); } } --- 268,273 ---- *************** *** 331,336 **** --- 345,351 ---- int transport, isQfile; { register FILE *fp; + char *getart(); #ifdef FTRUNCATE char *mode = "r+"; /* so we can use ftruncate() */ #else *************** *** 360,366 **** ** Open a connection to the remote server */ if (hello(host, transport) == FAIL) { ! FCLOSE(Qfp); return(FALSE); } --- 375,382 ---- ** Open a connection to the remote server */ if (hello(host, transport) == FAIL) { ! if (isQfile) ! FCLOSE(Qfp); return(FALSE); } *************** *** 367,374 **** if (isQfile) { /* ** We're sending a batch queue: ! ** open article ! ** get message-ID ** send "IHAVE <message-ID>" to remote ** read their reply ** send article if appropriate --- 383,390 ---- if (isQfile) { /* ** We're sending a batch queue: ! ** get message-ID from batch line if present, ! ** open article to get it if not ** send "IHAVE <message-ID>" to remote ** read their reply ** send article if appropriate *************** *** 376,392 **** */ catchsig(interrupted); ! while((fp = getfp(Qfp, Article, sizeof(Article))) != (FILE *)NULL) { ! if (!sendarticle(host, fp)) { ! (void) fclose(fp); ! requeue(Article); ! Article[0] = '\0'; cleanup(); goodbye(DONT_WAIT); restsig(); return(TRUE); } - (void) fclose(fp); } cleanup(); --- 392,421 ---- */ catchsig(interrupted); ! /* change to spool directory where news articles are normally ! * kept. This allows file names in the batch queue to be ! * relative to the spool directory and shortens the directory ! * search path. ! */ ! if(getwd(savedir) == -1){ ! fprintf(stderr,"%s: getwd: %s\n", Pname, errmsg(errno)); ! exit(EX_OSFILE); ! } ! if(chdir(SPOOLDIR) == -1){ ! fprintf(stderr,"%s: chdir: %s\n", Pname, errmsg(errno)); ! exit(EX_OSFILE); ! } ! dirlen = strlen(SPOOLDIR); ! ! while((Aline = getart(Qfp)) != (char *) NULL) { ! if (!sendarticle(host, Aline)) { ! requeue(Aline); ! Aline[0] = '\0'; cleanup(); goodbye(DONT_WAIT); restsig(); return(TRUE); } } cleanup(); *************** *** 395,413 **** return(TRUE); } else { /* ! ** Qfp is a netnews article - this is a one-shot ** operation, exit code dependent upon remote's ** acceptance of the article */ register int retcode; ! retcode = sendarticle(host, Qfp); ! FCLOSE(Qfp); goodbye(retcode ? WAIT : DONT_WAIT); return(retcode && Stats.accepted == 1 && Stats.failed == 0); } } /* ** Perform one transfer operation: ** Give IHAVE command --- 424,459 ---- return(TRUE); } else { /* ! ** "file" is a netnews article - this is a one-shot ** operation, exit code dependent upon remote's ** acceptance of the article */ register int retcode; ! retcode = sendarticle(host, file); goodbye(retcode ? WAIT : DONT_WAIT); return(retcode && Stats.accepted == 1 && Stats.failed == 0); } } + /* extract filename from an article line */ + getfilename(aline,filename) + register char *aline,*filename; + { + register char *ap,*fp; + + ap = aline; + fp = filename; + while((*ap != '\0') && (isspace(*ap))) /* skip whitespace */ + ap++; + if((strncmp(ap,SPOOLDIR,dirlen) == 0) && (ap[dirlen] == '/')){ + ap = ap+dirlen+1; + } + while((*ap != '\0') && (!isspace(*ap))) /* copy filename */ + *fp++ = *ap++; + *fp = '\0'; /* terminate it */ + } + /* ** Perform one transfer operation: ** Give IHAVE command *************** *** 417,441 **** ** Watch all network I/O for errors, return FALSE if ** the connection fails and we have to cleanup. */ ! sendarticle(host, fp) char *host; ! FILE *fp; { register int code; char buf[BUFSIZ]; char *e_xfer = "%s xfer: %s"; ! switch(code = ihave(fp)) { case CONT_XFER: /* ** They want it. Give it to 'em. */ if (!sendfile(fp)) { sprintf(buf, e_xfer, host, errmsg(errno)); log(L_NOTICE, buf); Stats.failed++; return(FALSE); } /* ** Did the article transfer OK? ** Stay tuned to this same socket to find out! --- 463,567 ---- ** Watch all network I/O for errors, return FALSE if ** the connection fails and we have to cleanup. */ ! sendarticle(host, aline) char *host; ! char *aline; { + register FILE *fp = (FILE *)NULL; register int code; char buf[BUFSIZ]; char *e_xfer = "%s xfer: %s"; + register char *id; + char *getmsgid(); + char filename[BUFSIZ]; + char *mode = "r"; ! /* Figure out if "aline" contains a message id after the filename. ! ** If so, give it to ihave(). If not, call getmsgid() to open the ! ** article file to extract it, then give it to ihave(). ! */ ! id = extract_id(aline); ! if ((id != NULL) && (!msgid_ok(id))) { ! /* bogus msgid on article line, try the file. */ ! sprintf(buf, "%s: bad message-id in batch file!", aline); ! log(L_DEBUG, buf); ! id = NULL; ! } ! if (id == NULL) { ! /* extract filename from article line */ ! getfilename(aline,filename); ! ! /* Try getting the id from the article. Once the file is ! ** open, leave it open for sendfile() to use. ! */ ! if ((fp = fopen(filename, mode)) == (FILE *)NULL) { ! /* can't open article file, skip it and keep going */ ! sprintf(buf, E_fopen, filename, mode, errmsg(errno)); ! log(L_WARNING, buf); ! return(TRUE); ! } ! ! if ((id = getmsgid(fp)) == (char *)NULL || *id == '\0') { ! /* ! ** something botched locally with the article ! ** so we don't send it, but we don't break off ! ** communications with the remote either. ! */ ! sprintf(buf, "%s: message-id missing from article!", ! aline); ! log(L_DEBUG, buf); ! FCLOSE(fp); ! return(TRUE); ! } ! if (!msgid_ok(id)) { ! sprintf(buf, "%s: message-id syntax error: %s", ! aline, id); ! log(L_DEBUG, buf); ! FCLOSE(fp); ! return(TRUE); ! } ! } ! /* fp is open if we had to go to the file for the id, NULL if not */ ! ! switch(code = ihave(id)) { case CONT_XFER: /* ** They want it. Give it to 'em. */ + if(fp == (FILE *) NULL) { + /* extract filename from article line */ + getfilename(aline,filename); + + if ((fp = fopen(filename, mode)) == (FILE *)NULL) { + /* + ** can't open article file, skip it and + ** keep going + */ + sprintf(buf, E_fopen, filename, mode, + errmsg(errno)); + log(L_WARNING, buf); + /* + ** Tell the other end that there really is + ** no article, we were just kidding. Send a + ** null article, the other side will just + ** ignore it. + */ + sprintf(buf,"."); + (void) converse(buf,sizeof(buf)); + Stats.failed++; + return(TRUE); + } + } else { + rewind(fp); /* make sure we start from beginning */ + } if (!sendfile(fp)) { sprintf(buf, e_xfer, host, errmsg(errno)); log(L_NOTICE, buf); Stats.failed++; + FCLOSE(fp); return(FALSE); } + FCLOSE(fp); /* ** Did the article transfer OK? ** Stay tuned to this same socket to find out! *************** *** 455,462 **** return(FALSE); } if (ReQueue_Fails && code != ERR_XFERRJCT) { ! requeue(Article); ! Article[0] = '\0'; } } break; --- 581,588 ---- return(FALSE); } if (ReQueue_Fails && code != ERR_XFERRJCT) { ! requeue(aline); ! aline[0] = '\0'; } } break; *************** *** 477,487 **** log(L_NOTICE, buf); } } else { ! sprintf(buf, "%s improper response to IHAVE: %d while offering %s", host, code, Article); log(L_WARNING, buf); } return(FALSE); } return(TRUE); } --- 603,617 ---- log(L_NOTICE, buf); } } else { ! sprintf(buf, "%s improper response to IHAVE: %d while offering %s", host, code, aline); log(L_WARNING, buf); } + if (fp != (FILE *) NULL) + FCLOSE(fp); return(FALSE); } + if (fp != (FILE *) NULL) + FCLOSE(fp); return(TRUE); } *************** *** 578,583 **** --- 708,738 ---- return((char *)NULL); /* EOF, we failed */ } + char * + extract_id(aline) + char *aline; + { + static char buf[MAXALINE]; + char *id; + + /* Figure out if "aline" contains a message id after the filename. + ** We do this by checking for a string enclosed by < and > + */ + id = index(aline,'<'); /* guess at where msgid starts */ + + if (id == NULL) + return ((char *)NULL); + + (void)strcpy (buf, id); + id = index(buf,'>'); /* guess at where msgid ends */ + + if (id == NULL) + return ((char *)NULL); + + *(++id) = '\0'; + return (buf); + } + #ifdef notdef /* nobody obeys the triply damned protocol anyway! */ /* ** Special characters, see RFC822, appendix D. *************** *** 718,750 **** #endif notdef /* ! ** Read the header of a netnews article, snatch the message-id therefrom, ! ** and ask the remote if they have that one already. */ ! ihave(fp) ! FILE *fp; { register int code; - register char *id; char buf[BUFSIZ]; - if ((id = getmsgid(fp)) == (char *)NULL || *id == '\0') { - /* - ** something botched locally with the article - ** so we don't send it, but we don't break off - ** communications with the remote either. - */ - sprintf(buf, "%s: message-id missing!", Article); - log(L_DEBUG, buf); - return(ERR_GOTIT); - } - - if (!msgid_ok(id)) { - sprintf(buf, "%s: message-id syntax error: %s", Article, id); - log(L_DEBUG, buf); - return(ERR_GOTIT); - } - again: sprintf(buf, "IHAVE %s", id); Stats.offered++; --- 873,887 ---- #endif notdef /* ! ** Using the message id handed to us, ask the remote if they have ! ** that one already. */ ! ihave(id) ! register char *id; { register int code; char buf[BUFSIZ]; again: sprintf(buf, "IHAVE %s", id); Stats.offered++; *************** *** 752,758 **** switch(code = converse(buf, sizeof(buf))) { case CONT_XFER: Stats.accepted++; - rewind(fp); return(code); case ERR_GOTIT: Stats.rejected++; --- 889,894 ---- *************** *** 768,820 **** } /* ! ** Given that fp points to an open file containing filenames, ! ** open and return a file pointer to the next filename in the file. ! ** Don't you love indirection? ! ** ! ** Returns a valid FILE pointer or NULL if end of file. */ ! FILE * ! getfp(fp, filename, fnlen) register FILE *fp; - char *filename; - register int fnlen; { - register FILE *newfp = (FILE *)NULL; register char *cp; ! char *mode = "r"; ! while(newfp == (FILE *)NULL) { ! if (fgets(filename, fnlen, fp) == (char *)NULL) ! return((FILE *)NULL); /* EOF, tell caller */ - filename[fnlen - 1] = '\0'; /* make sure */ - /* if fgets() ever forgets the '\n', we're fucked */ ! if (*(cp = &filename[strlen(filename) - 1]) == '\n') *cp = '\0'; - - if (filename[0] == '\0') - continue; - - if ((newfp = fopen(filename, mode)) == (FILE *)NULL) { - /* - ** The only permissible error is `file non-existant' - ** anything else indicates something is seriously - ** wrong, and we should go away to let the shell - ** script clean up. - */ - if (errno != ENOENT) { - char buf[BUFSIZ]; - - sprintf(buf, E_fopen, filename, mode, errmsg(errno)); - log(L_WARNING, buf); - goodbye(DONT_WAIT); - exit(EX_OSERR); - } - } } ! return(newfp); } /* --- 904,933 ---- } /* ! ** Get the next line out of a file containing article filenames (and ! ** possibly message-id's). ! ** Returns a ptr to the next line, or NULL on EOF. */ ! char * ! getart(fp) register FILE *fp; { register char *cp; ! static char ALINE[BUFSIZ]; ! register int len = sizeof(ALINE); ! ALINE[0] = '\0'; ! while(ALINE[0] == '\0'){ ! if (fgets(ALINE, len, fp) == (char *)NULL){ ! return((char *) NULL); ! } ! ALINE[len - 1] = '\0'; /* make sure */ /* if fgets() ever forgets the '\n', we're fucked */ ! if (*(cp = &ALINE[strlen(ALINE) - 1]) == '\n') *cp = '\0'; } ! return(ALINE); } /* *************** *** 826,831 **** --- 939,957 ---- if (Qfp == (FILE *)NULL || Qfile == (char *)NULL) return; + /* change back to batch directory to re-open the batch file. + ** it won't be necessary to change anywhere after this, as this + ** routine is only called on the way out. + */ + if(chdir(savedir) == -1){ + char buf[BUFSIZ]; + /* well, we tried */ + sprintf(buf, "chdir(\"%s\"): %s", savedir, errmsg(errno)); + log(L_WARNING, buf); + requeue((char *)NULL); /* reset */ + return; + } + if ((ReQueue_Fails && Stats.failed > 0) || !feof(Qfp)) { rewrite(); } else { *************** *** 1018,1024 **** #endif RELSIG sprintf(buf, "%s signal %d", Host, sig); log(L_NOTICE, buf); ! requeue(Article); cleanup(); if (Report_Stats) logstats(); --- 1144,1150 ---- #endif RELSIG sprintf(buf, "%s signal %d", Host, sig); log(L_NOTICE, buf); ! requeue(Aline); cleanup(); if (Report_Stats) logstats(); *** xmit/remote.c.1.5.9 Sun Jan 14 23:53:48 1990 --- xmit/remote.c Mon Jul 16 14:09:31 1990 *************** *** 4,9 **** --- 4,19 ---- /* ** remote communication routines for NNTP/SMTP style communication. ** + ************ + ** This version has been modified to support mmap()'ing of article files + ** on systems that support it. + ** + ** David Robinson (david@elroy.jpl.nasa.gov) and + ** Steve Groom (stevo@elroy.jpl.nasa.gov), June 30, 1989. + ** + ** updated to include patches to nntp through 1.5.9, 7/10/90 SLG + ************ + ** ** sendcmd - return TRUE on error. ** ** readreply - return reply code or FAIL for error; *************** *** 391,411 **** --- 401,451 ---- register FILE *remote = rmt_wr; register int nl = TRUE; /* assume we start on a new line */ + #ifdef MMAP + register char *mbufr,*mptr; + long msize; + long offset; + struct stat sbuf; + char buf[BUFSIZ]; + #endif MMAP + /* ** I'm using putc() instead of fputc(); ** why do a subroutine call when you don't have to? ** Besides, this ought to give the C preprocessor a work-out. */ + #ifdef MMAP + #define PUTC(c) if (putc(c, remote) == EOF) {\ + (void) munmap (mbufr, sbuf.st_size);\ + return(FALSE); } + #else !MMAP #define PUTC(c) if (putc(c, remote) == EOF) return(FALSE) + #endif !MMAP if (fp == (FILE *)NULL) return(FALSE); + #ifdef MMAP + /* map the article into memory */ + (void) fstat(fileno(fp), &sbuf); + mbufr = mmap (0, sbuf.st_size, PROT_READ, MAP_PRIVATE, fileno(fp), 0); + if(mbufr == (char *) -1){ + sprintf(buf, "sendfile: mmap failed: %s", errmsg(errno)); + log(L_NOTICE, buf); + return(FALSE); + } + + mptr = mbufr; /* start of article in memory */ + msize = sbuf.st_size; /* size of article (bytes) */ + while(msize-- > 0) { + c = *mptr++; + #else !MMAP /* ** the second test makes no sense to me, ** but System V apparently needed it... */ while((c = fgetc(fp)) != EOF && !feof(fp)) { + #endif !MMAP switch(c) { case '\n': PUTC('\r'); /* \n -> \r\n */ *************** *** 429,433 **** --- 469,476 ---- PUTC('\r'); PUTC('\n'); } + #ifdef MMAP + (void) munmap (mbufr, sbuf.st_size); + #endif MMAP return( !(sendcmd(".") || ferror(fp)) ); } @//E*O*F nntpxmit.patch// chmod u=rw,g=r,o=r nntpxmit.patch exit 0 Steve Groom, Jet Propulsion Laboratory, Pasadena, CA stevo@elroy.jpl.nasa.gov {ames,usc}!elroy!stevo