[comp.graphics] SUMMARY: Electronic document archiving

saja@ujocs.joensuu.fi (Jorma Sajaniemi) (10/09/90)

I asked in comp.graphics about electronic document archiving systems
that enable one to store documents as scanned images and
to retrieve and display them on the screen. The main assumptions
were that the documents will be, e.g., pictures and hand-written texts,
and that the number of documents will be   v e r y   large.

Here is a summary of the answers I got. Thanks to everybody who replied.

Jorma Sajaniemi
University of Joensuu, Finland
Department of Computer Science
saja@ujocs.joensuu.fi

======================================================================

I know Kodak Australia sells such a system.

If you are interested I could get you some contact names at Kodak.

Hiren Patel                       Phone ISD: +61 3 587 1444
Design Engineer                         Fax: +61 3 580 5581
Labtam Information Systems P/L         Telex: LABTAM AA33550
43 Malcolm Road                     Internet: hiren@labtam.oz.au
Braeside                        ACSNET/CSNET: hiren@labtam.oz
Victoria 3195                           ARPA: hiren%labtam.oz@uunet.uu.net
Australia                              JANET: labtam.oz!hiren@ukc
                                        UUCP: ...!uunet!munnari!labtam.oz!hiren

======================================================================

        Sorry if this sounds like a product plug, but this is an area in which
Intergraph has spent a lot of time developing products.  We currently
sell hardware configurations based around servers, workstations, scanners, and
large optical disk jukeboxes.  The software to manage large amounts of data
is provided by NFM (Network File Manager) and DMANDS (Document Management
And Distribution System).  Obviously the UK office cannot provide you with
sales information, but here is the phone number of the Finnish office:

        804-554744
  Nik Simpson  UUCP :  uunet!ingr!swindon!st_nik!nik                        
  Senior Systems Engineer.     Intergraph UK Ltd.                           

======================================================================

I spent a long time "pre-mastering" CD-ROMs.  Typical issues were crammed
to the full capacity of 650 megabytes.  Text was all shipped to the Orient
for formatted input.  Images were scanned with a monster scanner (something
like 15 pages per minute of 2 bit graphics), and then stored on an
"optical juke box".  This beast stored 2 terabytes on 12-inch optical
platters, and was controlled by its own node of a VAX cluster.
This seems like a significant volume.

During a consultation with Diners Club, I got to know their optical system,
which is a somewhat smaller version (used to store the thousands of charge
tickets that flow in).

Both these systems are available commercially, but the reality is that a
fast system can be assembled off the shelf.  Before beginning your project
make certain that you have very realistic ideas about growth and acceptable
speed of response.  Both systems suffered mightily, and were upgraded in
million dollar increments regularly (always a bit behind the actual need,
however.)
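
To make the point about realistic growth estimates concrete, here is a rough
back-of-the-envelope sketch in Python.  Only the 15 pages per minute scan rate
and the 2 terabyte jukebox size come from the description above; the working
hours and the average compressed page size are assumptions chosen purely for
illustration, and the function names are hypothetical.

```python
# Rough capacity-planning sketch.  Every constant below is an assumption
# for illustration except the 15 pages/minute scan rate and the 2 TB
# jukebox size, which come from the description above.

def pages_per_day(pages_per_minute: float, hours_per_day: float) -> float:
    """Pages one scanner produces in a working day."""
    return pages_per_minute * 60 * hours_per_day


def days_until_full(capacity_bytes: float, bytes_per_page: float,
                    pages_each_day: float) -> float:
    """Days of scanning before the store fills, ignoring index overhead."""
    return capacity_bytes / (bytes_per_page * pages_each_day)


SCAN_RATE_PPM = 15            # from the text
JUKEBOX_BYTES = 2 * 10**12    # "2 terabytes", taken as decimal
HOURS_PER_DAY = 8             # assumed: a single shift
BYTES_PER_PAGE = 75_000       # assumed: average compressed page size

daily = pages_per_day(SCAN_RATE_PPM, HOURS_PER_DAY)
remaining = days_until_full(JUKEBOX_BYTES, BYTES_PER_PAGE, daily)
print(f"pages per day:   {daily:.0f}")
print(f"days until full: {remaining:.0f}")
```

Rerunning this with several scanners, larger pages, or longer shifts shows
how quickly the estimate moves, which is the reason to settle those numbers
before choosing hardware.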

Please feel free to contact me about such systems.

Mark Richard-Fogg, principal designer
Fogg Design & Manufacturing Groups
Woodside House
1644 Emerson Street
Denver, CO  80218
(303) 839-9296 fax

======================================================================

I don't know about unix-based systems but for Macintosh you could try
 Micro Dynamics (301) 589-6300 in Maryland (they are working on Sun versions)
and for PC
 ViewStar (415) 841-8565

Both of these systems use the Sony WORM JukeBox to hold 50*6.4 GigaBytes,
and provide for keyword searching, OCR conversion, as well as storage of the
scanned images.

Gary White
gwhite@inetg1.arco.com

======================================================================

The company I work for builds and sells a product that does something
very similar to what you describe.  We can store the images in an
optical disk jukebox, with caching to ordinary magnetic hard disks.
The image and indexing server system is Unix based.
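
The caching idea can be illustrated with a minimal Python sketch.  This is
not the vendor's design: it assumes a least-recently-used eviction policy,
keeps the "magnetic disk" cache in memory, and uses a hypothetical callback
(`slow_read`) to stand in for the slow jukebox read.

```python
from collections import OrderedDict
from typing import Callable


class ImageCache:
    """Keep the most recently used images on fast storage (here: in memory)."""

    def __init__(self, capacity: int, fetch_from_jukebox: Callable[[str], bytes]):
        self.capacity = capacity            # maximum number of cached images
        self.fetch = fetch_from_jukebox     # slow path; hypothetical callback
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, image_id: str) -> bytes:
        if image_id in self.store:
            self.store.move_to_end(image_id)    # mark as most recently used
            self.hits += 1
            return self.store[image_id]
        self.misses += 1
        data = self.fetch(image_id)             # slow optical read
        self.store[image_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used
        return data


def slow_read(image_id: str) -> bytes:
    """Stand-in for a jukebox read; a real one would take seconds."""
    return b"scan:" + image_id.encode()


cache = ImageCache(capacity=2, fetch_from_jukebox=slow_read)
for doc in ["a", "b", "a", "c", "b"]:
    cache.get(doc)
print(cache.hits, cache.misses)
```

The hit and miss counters are there because the cache size only pays off if
recently viewed documents tend to be requested again.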

What other information could I provide you?

   ...Chris Johnson          chris@c2s.mn.org   ..uunet!bungia!com50!chris
 Com Squared Systems, Inc.   St. Paul, MN USA   +1 612 452 9522

======================================================================

Please contact Bill Turner at wrt@cornellc.cit.cornell.edu
The library system at Cornell is experimenting with digital
preservation of deteriorating books.  Xerox is providing
equipment to digitize 1000 books.  They are going to keep
the pages as images, rather than turning them into text via
optical character recognition.

Mike Oltz
MYK@cornella.bitnet

======================================================================

I just talked to somebody here at Stanford about this very subject.
Talk to Andy Cargile at the Imaging Project in the Data Center.  His
phone number is (415) 725-0613, and email is gq.ajc@forsythe.stanford.edu.
The Imaging Project is doing an evaluation of a system that is just
going commercial, produced by Image Business Systems.  Currently, the
IBS system is IBM-PC-based, using 3 main system elements: a server to
hold images in compressed form (an IBM RT), a scan station (a PC
hooked up to a scanner), and a print/FAX station (also a PC, hooked up
to an HP ink jet).
        From what I understand, the scan station has a software
implementation of three compression algorithms: CCITT Group 3 and
Group 4, and an IBM algorithm, MMR.  The CCITT algorithms are what
FAX machines use, hence the capability of the print station to send
FAXes.  According to Andy, the CCITT algorithms can reach best-case
compression of a 1Meg image (one-bit) into 50-100K.  (That must
explain why FAXes can work so fast, even with a 9600 baud modem.)  The
server stores the images in compressed form, and a print station
equipped with the proper decompression algorithm can view and print
the image.
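
The figures above can be checked with a small Python sketch.  The first
function shows only the run-length step that one-dimensional Group 3 coding
starts from (each scan line becomes alternating white and black run lengths,
beginning with white); the Huffman code tables that follow in the real
standard are omitted.  The second function does the transmission arithmetic,
assuming "1Meg" means 10**6 bytes, that 9600 baud delivers 9600 bits per
second, and that protocol overhead is ignored.  Function names are
hypothetical.

```python
def run_lengths(row):
    """Alternating white/black run lengths for one scan line (0 = white).

    The list always starts with a white run, which has length 0 when the
    line begins with a black pixel.
    """
    runs = []
    current = 0     # colour of the run being counted; lines start white
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def transmit_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Time to send size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


# A mostly white line collapses to a handful of numbers.
line = [0] * 10 + [1] * 3 + [0] * 7
print(run_lengths(line))

LINE_RATE = 9600    # assumed bits per second
for label, size in [("uncompressed 1Meg", 1_000_000),
                    ("compressed 100K", 100_000),
                    ("compressed 50K", 50_000)]:
    print(f"{label}: {transmit_seconds(size, LINE_RATE):.0f} s")
```

Under these assumptions the uncompressed page needs roughly fourteen minutes
on the line, while the compressed versions need well under a minute and a
half.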
        The system currently runs under Microsoft Windows 2.86.  In
addition, an image information database is under construction on
Forsythe, using the Spires/PRISM database.  Keywords and information
about the images will be stored, with links to the images themselves
on the server.  Also according to Andy, a Mac front-end is on the
horizon.  I hope this helped!
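
The keyword-to-image linking can be sketched in Python as a small inverted
index.  This is not how Spires/PRISM works internally; the class, the method
names, and the example image locations are all hypothetical and only show
the general idea of keeping keywords in a database and the images themselves
elsewhere.

```python
from collections import defaultdict


class ImageIndex:
    """Map keywords to image ids, and image ids to server locations."""

    def __init__(self):
        self._by_keyword = defaultdict(set)   # keyword -> set of image ids
        self._location = {}                   # image id -> location on server

    def add(self, image_id, location, keywords):
        """Register an image and the keywords describing it."""
        self._location[image_id] = location
        for word in keywords:
            self._by_keyword[word.lower()].add(image_id)

    def search(self, *keywords):
        """Return sorted locations of images tagged with all the keywords."""
        if not keywords:
            return []
        sets = [self._by_keyword.get(word.lower(), set()) for word in keywords]
        matches = set.intersection(*sets)
        return sorted(self._location[image_id] for image_id in matches)


index = ImageIndex()
# The locations below are made-up placeholders, not real server paths.
index.add("img001", "server:/scans/img001.g4", ["invoice", "1990"])
index.add("img002", "server:/scans/img002.g4", ["letter", "1990"])

print(index.search("1990"))
print(index.search("invoice", "1990"))
```

Only the small keyword records need to live in the database; the large
compressed images stay on the image server and are fetched by location.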

Jesse Ellenbogen
elbow@jessica.stanford.edu

======================================================================

My company has a product which sounds  e x a c t l y  like what you
want.  Our distributor in Scandinavia is

        Capture Technology AB
        Box 81017
        Hammarbyvagen 27 B
        104 81 Stockholm
        Sweden

        Telephone 08-702 96 50
        Fax       08-41 11 21

Contact name is Hans-Gosta Ricknell (sales director)

I have passed your name to him.

Graham Underwood
graham@advent.co.uk

======================================================================

Jorma, we don't have any experience but we are moving in this
direction.  We produce about 30 million pages of material a year and
this has become an incredible hassle when it comes to storage etc.
Plus our design and manipulation costs are outrageous.  Thus, we are
attempting to move our entire design/development/storage and production
systems into the electronic age.  We now use Sun platforms to do the
visual design and we have just been told that we'll be getting two
Xerox 5090 reprographic systems that allow us to take the Sun output
and get it all the way to the negative or plate stage and then these
machines will also do the printing.  From what I understand of your
problem, you want to get the input stage cleaned up better.

We don't have anything at the moment but we are trying to design and
implement a system that would permit multimedia input capture and
storage so that a variety of people can manipulate the information as
and when they want to.

Kevin "auric" Crocker Athabasca University
UUCP: ...!{alberta,ncc}!atha!kevinc
Inet: kevinc@cs.AthabascaU.CA

======================================================================