[comp.ai] Document Recognition

audit038@spacm1.spac.spc.com (02/04/91)
I recently posted a request for information on Document Recognition; following
is a summary of the responses I received:

Robert Goldman rex.cs.tulane.edu writes:
>I think what you need is a reference book on information retrieval.
>This is NOT my area, but I know that a recent textbook which covers
>the field is "Automatic Text Processing," by Gerard Salton,
>Addison-Wesley (1989 or 1990).

Marc Ringuette DAISY.LEARNING.CS.CMU.EDU writes:
>The current technology on your problem is a little messy and ill defined.
>Keyword-based systems are all that have been done very thoroughly.
>
>Text skimming is something you might consider.  Dejong's thesis was the
>original work, and Mauldin's is a follow-on.  The idea is to have
>skeletons ("sketchy scripts") of what the document might contain, and
>try to match those skeletons to the text by looking for words that fit
>in certain slots.
>
>G. Dejong, {\em Skimming Stories in Real Time:  An Experiment in Integrated
>Understanding,} Yale Ph.D. Thesis, TR 158, 1979.
>
>M. Mauldin, {\em Information Retrieval by Text Skimming}, CMU Ph.D. Thesis,
>CMU-CS-89-193, 1989.
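
To make the slot-matching idea above concrete, here is a minimal Python sketch. It is not DeJong's or Mauldin's actual system: the script name, its slots, and the vocabulary are all invented for illustration, and a real skimmer would use far richer constraints than word lists.

```python
import re

# Hypothetical "sketchy script": each slot lists words that can fill it.
# The script and its vocabulary are invented for illustration only.
EARTHQUAKE_SCRIPT = {
    "event": {"earthquake", "quake", "tremor"},
    "damage": {"destroyed", "damaged", "collapsed"},
    "casualties": {"killed", "injured", "dead"},
}


def match_script(script, text):
    """Return a dict mapping each slot to the first word in text that fills it."""
    words = re.findall(r"[a-z']+", text.lower())
    filled = {}
    for slot, vocabulary in script.items():
        for word in words:
            if word in vocabulary:
                filled[slot] = word
                break
    return filled


def script_score(script, text):
    """Fraction of the script's slots that the text fills."""
    return len(match_script(script, text)) / len(script)
```

A document would then be assigned to whichever script it fills most completely, perhaps subject to a minimum score.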


Sanjiv K. Bhatia  fergvax.unl.edu!sanjiv writes:
>During IJCAI '89 in Detroit, I saw the demonstration of a package called TCS
>(Text Categorization Shell) by Carnegie Group that achieved exactly what you
>are intending.  Another source will be the package TOPIC by Verity, Inc. (an
>offshoot/subsidiary of ADS).
>
>I am currently working on similar stuff but my approach is to create a
>customized environment at a user level using a knowledge acquisition approach.  I
>have a couple of papers on the subject and am very close to having an
>implementation in C.  My earlier implementations for testing were in Prolog and
>they were excruciatingly slow.

Amit Mukerjee  amit@cs.tamu.edu writes:
>Some results of a bibliography search with the keyword
>document-recognition.  This bibliography is maintained at Texas A&M
>University and has about 2000 papers in AI, robotics, and geometric
>modeling.  You can anonymous ftp it from csseq.tamu.edu (directory
>bib).


===========================================================================

Antonacci, F., M. Russo, M.T. Pazienza, P.Velardi; 1989
AI::NATURAL-LANGUAGE DOCUMENT-RECOGNITION	2IBM Italy/RomeUniv./Ancona U.
    A system for text analysis and lexical knowledge acquisition,
    Data and Knowledge Engineering, July 1989, v.4(1):1-20,

Dengel, Andreas; 1989
VISION::AI::DOCUMENT-RECOGNITION RECTANGLE		U.Stuttgart-CS
    Automatic visual classification of documents,
    Proceedings of Intl Workshop on Industrial Applications of Machine
	Intelligence and Vision (MIV-89), Tokyo, Japan, April 1988, p.276-281.
{
{ First, align the document by determining the "dominant skew angle".
{ Next, divide up the document into block segments (rectangles).  These
{ are then analyzed using a rule-based system.	Results show the system
{ to be extremely robust for the class of business letters.  -AM 7/89
{ ****	 Possible project for implementation with the spatial relations
{	 algebra.

Ejiri, Masakazu; 1988
IMAGE-PROC::DOCUMENT-RECOGNITION MAP INSPECTION SPATIAL-REASONING RECTANGLE
	Hitachi CRL,Tokyo
    Knowledge-based approaches to practical image processing,
    Proceedings of Intl Workshop on Industrial Applications of Machine
	Intelligence and Vision (MIV-89), Tokyo, Japan, April 1988, p.1-8.
{
{ Divide the document surface into different rectangular regions (title
{ area, author-name area etc.) using own language FDL (Form Definition
{ Language).  Now use this model as input to the vision system - was
{ used to set up system for Japanese birth document.  Also some
{ examples of tying maps to views from map locations etc.

Govindaraju, Venu, Stephen W. Lam, Debashish Niyogi, David B. Sher,
 Rohini Srihari, Sargur N. Srihari, and Dacheng Wang; 1989
KBS::VISION DOCUMENT-RECOGNITION NATURAL-LANGUAGE SPATIAL	SUNY-Buffalo
    Newspaper image understanding,
    Knowledge Based Computer Systems, Narosa Publishing House, Bombay, India,
	Proceedings of the KBCS '89 conference, Bombay, December 1989,
	p.375-384.
{
{	Very powerful paper.  First, a block segmentation of the newspaper
{	to determine what part of the paper corresponds to what - news,
{	photo, title, dateline, etc.  All are _rectangular blocks_, and
{	this analysis is done without reading any of the contents in the
{	block - based on the characteristics of the document itself.  Next,
{	within the appropriate blocks, the characters are recognized using
{	a set of features, such as the strokes, a concavity, a hole, etc.
{
{	The most interesting part is the caption-based picture
{	understanding.	Based on a machine parsing of the figure caption
{	and a block segmentation of the image itself, the program labels
{	the portions of the image corresponding to interesting objects.
{	For example, faces are recognized by characteristics of the frontal
{	shape - downwardly converging lines, etc.  Sample outputs display
{	the face portions of two persons in an image with a caption-
{	"Wearing their new Celtics sunglasses are Joseph Crowley, standing
{	with the pennant, and seated from the left, Paul Cotter, John Webb
{	and David Buck."  This work reported in "Extracting visual
{	information from text: using caption to label human faces in
{	newspaper photographs", in CVPR '89.  The reference list points to
{	a bunch of earlier stuff from Srihari's group.	 - AM 2/90

Kasturi, Rangachar; Sing T. Bow; Wassim El-Masri; Jayesh Shah;
 James R. Gattiker; and Umesh B. Mokate; 1990
VISION::RECTANGLE DOCUMENT-RECOGNITION OCR SHAPE 2D SPATIAL-RELATIONS CURVED
 		PennStateU/++
    A system for interpretation of line drawings,
    IEEE PAMI, v.12(10):978-992
{ 
{ "An automatic graphics recognition system which can generate a
{ succinct description of various graphical objects and their spatial
{ relationships has many applications."  The premise is that artificial
{ images, made up of blocks, text, and geometrical shapes, can be
{ analyzed automatically and symbolic descriptors generated. The first
{ step is to create the smallest enclosing rectangles covering intensity
{ changes.  Aspect ratios of rectangles are used to identify text vs
{ graphics areas, but this is a blurred area, so histograms do not work
{ very well (**** FUZZY).
{ 
{ "Collinear component grouping" is performed next (**** tangency and
{ alignment) in the Hough transform domain with multi-scale resolution.
{ A significant part of the effort is in determining which parts of the
{ image are text, and which parts not, with the eventual objective of
{ removing all text portions from the image, leaving only the line
{ drawings. Gradually various parts of the image are removed using
{ "known shape" models such as trapezoid (model based on vertex P, L1,
{ L2, H, theta1, theta2), quasi-hexagon etc.
{ 
{ Also does flowchart analysis.  - AM 12/90
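
The first step in the annotation above (enclosing rectangles, then an aspect-ratio test) might be sketched as follows. This is only an illustration under simplifying assumptions: the image is a small binary grid, components are 4-connected, and the aspect-ratio threshold is an arbitrary guess rather than a value from the paper. As the annotation notes, a hard threshold like this is exactly where the text/graphics boundary gets blurry.

```python
from collections import deque


def bounding_boxes(image):
    """Return (top, left, bottom, right) for each 4-connected component of 1-pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def looks_like_text(box, max_aspect=3.0):
    """Guess 'text' when the box is not too elongated; the threshold is an assumption."""
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return max(height, width) / min(height, width) <= max_aspect
```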

Koons, David B.; 1988
VISION::AI::HYPERMEDIA::DOCUMENT-RECOGNITION SPATIAL-REASONING	    TAMU-CS
    A model for the representation and extraction of visual knowledge from
	illustrated texts,
    Master's thesis, also Technical report TAMU-88-010, Computer Science
	Dept, TAMU, August 1988, 99 pages.
{
{ Relating illustrative diagrams to text portions referring to the
{ diagram; based on a neuroanatomy text with diagrams and text on
{ facing pages.	 Constructs a dictionary for natural language phrases
{ such as "emerges from", "above", "attaches to"; uses these together
{ with partial models of the objects to construct predicate logic
{ representations; at this stage the figure-analysis was mostly
{ manual.  A powerful concept, but one whose time is surely coming.
{ Can apply some of the ideas from [Mukerjee & Joe 89].	 -AM 7/89

Srihari, Sargur N.; 1986
VISION::DOCUMENT-RECOGNITION			SUNY Buffalo-CS
    Document image understanding,
    FJCC 1986, p.87-96.

Srihari, Sargur N.; Ching-Huei Wang; Paul W. Palumbo; and Jonathan J. Hull; 1987
AI::VISION::DOCUMENT-RECOGNITION SHAPE RECTANGLE 		SUNY-Buff
    Recognizing address blocks on mail piece: Specialized tools and
    	problem-solving architecture,
    AI Magazine, v.8(4):25-40, Winter 1987.
{ 
{ Divides up the initial image into 3x3 grid, and identifies the address
{ block area based on a set of five heuristics, which are attenuated
{ through segmentation and thresholding.  Some of the rules relate to
{ interpreting block types.  e.g.
{ 
{ Rule MSEGR1:
{     If block A's aspect ratio, length, and height, and the number of
{     lines in the block, are within the acceptable range for
{     machine-generated address labels, then increase evidence fraction
{     that this is a machine generated destination address label (by .4 for
{     destination address, .3 for return address, and .2 for advertising
{     text).
{ 
{ Precursor to the much more thorough [Wang and Srihari 89].  - AM 12/90
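
A rule of the MSEGR1 kind could be coded roughly as below. Only the three increments (.4, .3, .2) come from the annotation; the acceptable ranges, the field names, and the choice to add evidence and cap it at 1.0 are assumptions made for the sketch, not details from the paper.

```python
# Acceptable ranges are invented placeholders, not values from the paper.
MACHINE_LABEL_RANGES = {
    "aspect_ratio": (1.5, 4.0),
    "length": (200, 600),
    "height": (60, 250),
    "lines": (3, 6),
}

# Increments quoted in the annotation for rule MSEGR1.
MSEGR1_INCREMENTS = {"destination": 0.4, "return": 0.3, "advertising": 0.2}


def apply_msegr1(block, evidence):
    """Return a copy of evidence, raised for each hypothesis if the block fits all ranges."""
    in_range = all(
        low <= block[name] <= high
        for name, (low, high) in MACHINE_LABEL_RANGES.items()
    )
    updated = dict(evidence)
    if in_range:
        for label, step in MSEGR1_INCREMENTS.items():
            # Simple additive update capped at 1.0 (an assumption).
            updated[label] = min(1.0, updated.get(label, 0.0) + step)
    return updated
```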

Wang, Dacheng; and Sargur N. Srihari; 1989
AI::VISION::IMAGE-PROC DOCUMENT-RECOGNITION TEXTURE FILTER RECTANGLE 
	SUNY-Buf
    Classification of newspaper image blocks using texture analysis,
    Computer Vision Graphics, and Image Processing, v.47:327-352, 1989.

Yashiro, Hiroshi, Tatsuya Murakami, Yoshihiro Shima, Yashiki Nakano,
 and Hiromichi Fujisawa; 1989
VISION::AI::DOCUMENT-RECOGNITION RECTANGLE		Hitachi-CRL,Tokyo
    A new method for document structure extraction using generic layout
	knowledge,
    Proceedings of Intl Workshop on Industrial Applications of Machine
	Intelligence and Vision (MIV-89), Tokyo, Japan, April 1988, p.282-287.
{
{ Uses the Form Definition language as in [Ejiri 89] to define document
{ structures.
==========================================================================

Peter Bell computer-science.manchester.ac.uk!cpb writes:

>J. Kreich, A Luhn, and G. Maderlechner
>Knowledge-based Interpretation of Scanned Business Letters
>IAPR Workshop on CV - Special Hardware and Industrial Applications,
>Oct 12-14, 1988, Tokyo
>pages 417-420.


Many of you expressed an interest in this subject, so I would like to explain
further what I'm attempting to do.  Document Recognition may not be the 
appropriate name for this process.  The system will classify Trust Agreements
using key phrases.

As an example, by analyzing 10 randomly selected Testamentary Trust Agreements,
with an expert trust auditor, we discovered the key phrases that make that
document type unique.  If we find "decedent's will" and either "as trustee" or
"in trust," the document is a testamentary trust agreement.  If "decedent's
will" is phrase type "will" and "as trustee" and "in trust" are phrase type
"trust", I can create the following rule:

	If will and trust are found then document is testamentary

Other rules will determine if "will" and "trust" exist by searching for their
synonyms (e.g. "as trustee" or "in trust") in the document.

This works because boilerplate text is used to create the trust agreements.
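
As a minimal sketch of the scheme described above, the phrase types and the testamentary rule can be written in a few lines of Python. The phrases and the rule are the ones given in this post; the function names, the data layout, and the plain substring matching are assumptions made for the illustration.

```python
# Phrase types and their key phrases, taken from the example above.
PHRASE_TYPES = {
    "will": ["decedent's will"],
    "trust": ["as trustee", "in trust"],
}

# Each rule: (document type, phrase types that must all be present).
RULES = [
    ("testamentary", {"will", "trust"}),
]


def normalize(text):
    """Lower-case and collapse whitespace so phrases match across line breaks."""
    return " ".join(text.lower().split())


def phrase_types_found(text):
    """Return the set of phrase types with at least one key phrase in the text."""
    clean = normalize(text)
    return {
        ptype
        for ptype, phrases in PHRASE_TYPES.items()
        if any(phrase in clean for phrase in phrases)
    }


def classify(text):
    """Return the first document type whose required phrase types are all found."""
    found = phrase_types_found(text)
    for doc_type, required in RULES:
        if required <= found:
            return doc_type
    return None
```

Substring matching is crude (for instance, "in trust" would also match inside "in trusted hands"), so a fuller version would probably match on word boundaries.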

Thanks for your help!  
-- 
John Coffman