[news.software.nn] looking for fulltext indexing software for archived news

martelli@cadlab.sublink.ORG (Alex Martelli) (04/24/91)

We have recently started archiving news articles, as they expire,
onto cheap, large, removable media (rewritable magneto-optical disks).

1) To enhance usefulness of such an archive, I would like to build
some sort of keyword index for it - I was thinking of a fulltext
index, since I have space to spare on the media, maybe with a
second similar index based on just subject/keywords fields...
Surely somebody has already met a similar problem?  I would
appreciate any hints/suggestions/indications on where to find
software to handle this, also any design suggestions if I have
to build my own.
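
As one possible starting point for the "build my own" case, here is a
minimal sketch of the two indexes described above: an inverted index
over article bodies and a second one over just the Subject and Keywords
headers. It is an illustration under stated assumptions, not an existing
package: the function names (split_article, build_indexes, search) are
hypothetical, articles are assumed to be plain text with headers
separated from the body by a blank line, folded (continued) header lines
are ignored, and everything is held in memory rather than written to the
archive media.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters or digits, case-folded.
WORD = re.compile(r"[a-z0-9]+")


def split_article(text):
    """Split raw article text into (headers dict, body string)."""
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon (e.g. folded continuations)
            headers[name.strip().lower()] = value.strip()
    return headers, body


def build_indexes(articles):
    """articles maps an article id (e.g. its path) to its raw text.

    Returns (fulltext, subject): two dicts mapping word -> set of ids.
    """
    fulltext = defaultdict(set)
    subject = defaultdict(set)
    for art_id, text in articles.items():
        headers, body = split_article(text)
        for word in WORD.findall(body.lower()):
            fulltext[word].add(art_id)
        fields = headers.get("subject", "") + " " + headers.get("keywords", "")
        for word in WORD.findall(fields.lower()):
            subject[word].add(art_id)
    return fulltext, subject


def search(index, query):
    """Return the ids of articles containing every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

A search such as search(fulltext, "magneto optical") returns the ids of
the articles whose bodies contain both words; the ids could then be
handed to whatever step presents the articles to the newsreader. A real
index of this size would need to live on disk, which this sketch does
not attempt.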

2) We normally use nn 6.4.12 for newsreading; what would be the most
clever way to use nn to read archived news, particularly to read
a set of apparently 'unrelated' articles found as a result of
some search as above?  Since the news archive is going to get
VERY, VERY large (many hundreds of megabytes per removable disk),
and the disks in question are not very fast for random seeking
and writing, would it make sense to just cat all articles found
into a temporary folder, and run nn on that?  Should a custom nnmaster
be occasionally run on the expired-news tree, or would such a
large database exceed its design parameters and cause intolerably
slow performance?  The nnmaster itself could of course run at
night, it's more the reader I'm wondering about.
Again, any suggestions gratefully accepted.
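
The "cat all articles found into a temporary folder" idea could be
sketched roughly as follows. This is only an illustration under
assumptions: the function name collect_to_folder and the SEPARATOR line
are hypothetical, the folder is written in a generic mbox-like layout
(a "From " separator line before each article), and whether a given nn
version accepts exactly this layout would need to be checked against its
documentation. The paths are sorted so that articles from the same
directory are read together, which may reduce seeking on slow media.

```python
# Assumption: a fixed placeholder separator; a real mbox "From " line
# would normally carry the sender and the delivery date.
SEPARATOR = b"From archive Thu Jan  1 00:00:00 1970\n"


def collect_to_folder(paths, folder_path):
    """Copy each article file into one mbox-like folder file.

    Returns the number of articles written. Unreadable files are skipped.
    """
    count = 0
    with open(folder_path, "wb") as out:
        for path in sorted(paths):  # group reads by directory
            try:
                with open(path, "rb") as src:
                    data = src.read()
            except OSError:
                continue  # article missing or unreadable: skip it
            out.write(SEPARATOR)
            for line in data.splitlines():
                # Quote lines that would look like a new separator.
                if line.startswith(b"From "):
                    line = b">" + line
                out.write(line + b"\n")
            out.write(b"\n")  # blank line between articles
            count += 1
    return count
```

The folder file would be created on a fast local disk rather than on the
archive media, so that the reader only touches the slow disk once per
article found.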

Please feel free to reply either in these groups, each for the part
of the discussion appropriate to it, or by e-mail to me; I will
of course summarize e-mail replies to the appropriate groups if
requested.  Thanks in advance.
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).