[comp.ai] Document Recognition - Information needed

audit038@spacm1.spac.spc.com (01/22/91)

I'm working on an expert system that classifies financial trust 
account documents based on the clauses found in the documents.

I'm using Neuron Data's Nexpert and Think C on a Macintosh.

Character recognition is no problem; there are several 
commercial packages available for that.

Does anyone know if document recognition has been written about
in any publications or books?  Or has anyone had personal
experience in this area? E-mail replies would be fine.

Thanks in advance.



-- 
John Coffman

ulf@gbg.infolog.se (Ulf Sundin) (01/25/91)

In article <2242.279c2400@spacm1.spac.spc.com> audit038@spacm1.spac.spc.com writes:
   >I'm working on an expert system that classifies financial trust 
   >account documents based on the clauses found in the documents.
   > [...]
   >Does anyone know if document recognition has been written about
   >in any publications or books?  Or has anyone had personal
   >experience in this area? E-mail replies would be fine.
   >
I am also interested in this area. Especially in software suited for
developing such systems.
Therefore, I would appreciate it if you would post a summary of the
responses to this newsgroup.

   >John Coffman

Ulf Sundin

amit@cs.tamu.edu (Amitabha Mukerjee) (01/28/91)

Some results of a bibliography search with the keyword
document-recognition.  This bibliography is maintained at Texas A&M
University and has about 2000 papers in AI, robotics, and geometric
modeling.  You can anonymous ftp it from csseq.tamu.edu (directory
bib).


amit mukerjee
(amit@cs.tamu.edu)
===========================================================================

Antonacci, F., M. Russo, M.T. Pazienza, P.Velardi; 1989
AI::NATURAL-LANGUAGE DOCUMENT-RECOGNITION	2IBM Italy/RomeUniv./Ancona U.
    A system for text analysis and lexical knowledge acquisition,
    Data and Knowledge Engineering, July 1989, v.4(1):1-20,

Dengel, Andreas; 1989
VISION::AI::DOCUMENT-RECOGNITION RECTANGLE		U.Stuttgart-CS
    Automatic visual classification of documents,
    Proceedings of Intl Workshop on Industrial Applications of Machine
	Intelligence and Vision (MIV-89), Tokyo, Japan, April 1989, p.276-281.
{
{ First, align the document by determining the "dominant skew angle".
{ Next, divide up the document into block segments (rectangles).  These
{ are then analyzed using a rule-based system.	Results show the system
{ to be extremely robust for the class of business letters.  -AM 7/89
{ ****	 Possible project for implementation with the spatial relations
{	 algebra.
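
The note above does not say how the paper finds the dominant skew angle.
As a rough illustration only, the sketch below uses the common
projection-profile approach, which is an assumption and not necessarily
Dengel's method: for each candidate angle, de-rotate the foreground pixels
and keep the angle that concentrates them into the fewest rows.  The names
estimate_skew and synthetic_page, and the search range and step, are
hypothetical.

```python
import math
from collections import Counter


def estimate_skew(points, max_deg=10.0, step_deg=0.5):
    """Estimate text-line skew in degrees from (x, y) foreground pixels.

    Assumption: the best angle is the one whose de-rotation gives the
    sharpest row histogram (largest sum of squared row counts).
    """
    best_angle, best_score = 0.0, -1.0
    steps = int(round(2 * max_deg / step_deg))
    for i in range(steps + 1):
        angle = -max_deg + i * step_deg
        t = math.radians(angle)
        # Rotate every pixel by -angle and bin it by its new row.
        rows = Counter(round(y * math.cos(t) - x * math.sin(t))
                       for x, y in points)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_page(angle_deg, lines=5, width=200, spacing=20):
    """Hypothetical test data: straight 'text lines' tilted by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(k * spacing + x * slope))
            for k in range(lines) for x in range(width)]


if __name__ == "__main__":
    print(estimate_skew(synthetic_page(3.0)))
```

Once the angle is known, the page can be rotated back before it is cut
into rectangular blocks for the rule-based stage.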

Ejiri, Masakazu; 1988
IMAGE-PROC::DOCUMENT-RECOGNITION MAP INSPECTION SPATIAL-REASONING RECTANGLE	Hitachi CRL,Tokyo
    Knowledge-based approaches to practical image processing,
    Proceedings of Intl Workshop on Industrial Applications of Machine
	Intelligence and Vision (MIV-89), Tokyo, Japan, April 1989, p.1-8.
{
{ Divide the document surface into different rectangular regions (title
{ area, author-name area etc.) using own language FDL (Form Definition
{ Language).  Now use this model as input to the vision system - was
{ used to set up system for Japanese birth document.  Also some
{ examples of tying maps to views from map locations etc.

Govindaraju, Venu, Stephen W. Lam, Debashish Niyogi, David B. Sher, Rohini Srihari, Sargur N. Srihari, and Dacheng Wang; 1989
KBS::VISION DOCUMENT-RECOGNITION NATURAL-LANGUAGE SPATIAL	SUNY-Buffalo
    Newspaper image understanding,
    Knowledge Based Computer Systems, Narosa Publishing House, Bombay, India,
	Proceedings of the KBCS '89 conference, Bombay, December 1989,
	p.375-384.
{
{	Very powerful paper.  First, a block segmentation of the newspaper
{	to determine what part of the paper corresponds to what - news,
{	photo, title, dateline, etc.  All are _rectangular blocks_, and
{	this analysis is done without reading any of the contents in the
{	block - based on the characteristics of the document itself.  Next,
{	within the appropriate blocks, the characters are recognized using
{	a set of features, such as the strokes, a concavity, a hole, etc.
{
{	The most interesting part is the caption-based picture
{	understanding.	Based on a machine parsing of the figure caption
{	and a block segmentation of the image itself, the program labels
{	the portions of the image corresponding to interesting objects.
{	For example, faces are recognized by characteristics of the frontal
{	shape - downwardly converging lines, etc.  Sample outputs display
{	the face portions of two persons in an image with a caption-
{	"Wearing their new Celtics sunglasses are Joseph Crowley, standing
{	with the pennant, and seated from the left, Paul Cotter, John Webb
{	and David Buck."  This work is reported in "Extracting visual
{	information from text: using captions to label human faces in
{	newspaper photographs", in CVPR '89.  The reference list points to
{	a bunch of earlier stuff from Srihari's group.	 - AM 2/90

Kasturi, Rangachar; Sing T. Bow; Wassim El-Masri; Jayesh Shah; James R. Gattiker; and Umesh B. Mokate; 1990
VISION::RECTANGLE DOCUMENT-RECOGNITION OCR SHAPE 2D SPATIAL-RELATIONS CURVED 		PennStateU/++
    A system for interpretation of line drawings,
    IEEE PAMI, v.12(10):978-992
{ 
{ "An automatic graphics recognition system which can generate a
{ succinct description of various graphical objects and their spatial
{ relationships has many applications."  The premise is that artificial
{ images, made up of blocks, text, and geometrical shapes, can be
{ analyzed automatically and symbolic descriptors generated. The first
{ step is to create the smallest enclosing rectangles covering intensity
{ changes.  Aspect ratios of rectangles are used to identify text vs
{ graphics areas, but this is a blurred area, so histograms do not work
{ very well (**** FUZZY).
{ 
{ "Collinear component grouping" is performed next (**** tangency and
{ alignment) in the Hough transform domain with multi-scale resolution.
{ A significant part of the effort is in determining which parts of the
{ image are text, and which parts not, with the eventual objective of
{ removing all text portions from the image, leaving only the line
{ drawings. Gradually various parts of the image are removed using
{ "known shape" models such as trapezoid (model based on vertex P, L1,
{ L2, H, theta1, theta2), quasi-hexagon etc.
{ 
{ Also does flowchart analysis.  - AM 12/90
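
The aspect-ratio test described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the Box type, the
classify function, and both thresholds are hypothetical and are not taken
from the paper.  Since the note says the boundary between text and
graphics is blurred, the sketch keeps an explicit "uncertain" band rather
than forcing a hard cut.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Enclosing rectangle of one connected component (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def aspect_ratio(box):
    """Long side divided by short side, so the value is always >= 1."""
    short = max(1, min(box.w, box.h))
    return max(box.w, box.h) / short


def classify(box, text_max=3.0, graphics_min=6.0):
    """Label a box using aspect ratio alone.

    Both thresholds are illustrative assumptions.  Ratios between them
    are reported as "uncertain" and left for a later stage to resolve.
    """
    ratio = aspect_ratio(box)
    if ratio <= text_max:
        return "text"
    if ratio >= graphics_min:
        return "graphics"
    return "uncertain"


if __name__ == "__main__":
    for box in (Box(0, 0, 10, 14), Box(0, 0, 200, 2), Box(0, 0, 40, 10)):
        print(box, classify(box))
```

Size is ignored here, so a large drawn square would still be labelled
"text"; a real system would need further cues such as the collinear
grouping mentioned in the note.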

Koons, David B.; 1988
VISION::AI::HYPERMEDIA::DOCUMENT-RECOGNITION SPATIAL-REASONING	    TAMU-CS
    A model for the representation and extraction of visual knowledge from
	illustrated texts,
    Master's thesis, also Technical report TAMU-88-010, Computer Science
	Dept, TAMU, August 1988, 99 pages.
{
{ Relating illustrative diagrams to text portions referring to the
{ diagram; based on a neuroanatomy text with diagrams and text on
{ facing pages.	 Constructs a dictionary for natural language phrases
{ such as "emerges from", "above", "attaches to"; uses these together
{ with partial models of the objects to construct predicate logic
{ representations; at this stage the figure-analysis was mostly
{ manual.  A powerful concept, but one whose time is surely coming.
{ Can apply some of the ideas from [Mukerjee & Joe 89].	 -AM 7/89
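
To make the phrase-dictionary idea above concrete, here is a minimal
sketch that turns a simple sentence into a predicate triple.  Everything
in it is an assumption made for illustration: the PHRASES table, the
extract function, and the regular expression are hypothetical, and the
thesis itself also uses partial object models, which are not attempted
here.

```python
import re

# Hypothetical dictionary: spatial phrase -> predicate name.
PHRASES = {
    "emerges from": "emerges_from",
    "attaches to": "attaches_to",
    "above": "above",
}


def extract(sentence):
    """Return (predicate, subject, object) for 'X <phrase> Y', else None.

    Handles only one phrase per sentence, with an optional leading
    'the' on either argument and an optional 'is' or 'lies' before
    the phrase.
    """
    text = sentence.strip().rstrip(".").lower()
    for phrase, predicate in PHRASES.items():
        pattern = (r"^(?:the\s+)?(.+?)\s+(?:is\s+|lies\s+)?"
                   + re.escape(phrase)
                   + r"\s+(?:the\s+)?(.+)$")
        match = re.match(pattern, text)
        if match:
            return (predicate, match.group(1), match.group(2))
    return None


if __name__ == "__main__":
    print(extract("The nerve emerges from the spinal cord."))
    print(extract("The tendon attaches to the bone."))
```

Triples of this kind could then be matched against labelled regions of
the facing-page diagram, which is the step the note describes as mostly
manual.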

Srihari, Sargur N.; 1986
VISION::DOCUMENT-RECOGNITION			SUNY Buffalo-CS
    Document image understanding,
    FJCC 1986, p.87-96.

Srihari, Sargur N.; Ching-Huei Wang; Paul W. Palumbo; and Jonathan J. Hull; 1987
AI::VISION::DOCUMENT-RECOGNITION SHAPE RECTANGLE 		SUNY-Buff
    Recognizing address blocks on mail pieces: Specialized tools and
    	problem-solving architecture,
    AI Magazine, v.8(4):25-40, Winter 1987.
{ 
{ Divides up the initial image into a 3x3 grid, and identifies the address
{ block area based on a set of five heuristics, which are attenuated
{ through segmentation and thresholding.  Some of the rules relate to
{ interpreting block types.  e.g.
{ 
{ Rule MSEGR1:
{     If block A's aspect ratio, length, and height, and the number of
{     lines in the block, are within the acceptable range for
{     machine-generated address labels, then increase evidence fraction
{     that this is a machine generated destination address label (by .4 for
{     destination address, .3 for return address, and .2 for advertising
{     text).
{ 
{ Precursor to the much more thorough [Wang and Srihari 89].  - AM 12/90
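
Rule MSEGR1 above can be read as an evidence-accumulation rule.  The
sketch below is one possible reading, not the paper's implementation: the
acceptable ranges in RANGES are invented for illustration, the label
names are hypothetical, and combining evidence by simple addition capped
at 1.0 is an assumption.  Only the three increments (.4, .3, .2) come
from the rule as quoted.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Measured features of one candidate block (hypothetical type)."""
    aspect_ratio: float
    length: float
    height: float
    n_lines: int


# Illustrative ranges only; the article's actual ranges are not given here.
RANGES = {
    "aspect_ratio": (1.0, 4.0),
    "length": (100.0, 600.0),
    "height": (50.0, 300.0),
    "n_lines": (2, 6),
}

# Increments quoted in rule MSEGR1.
INCREMENTS = {
    "destination_address": 0.4,
    "return_address": 0.3,
    "advertising_text": 0.2,
}


def apply_msegr1(block, evidence):
    """Add the rule's increments if every feature is within range."""
    features = {
        "aspect_ratio": block.aspect_ratio,
        "length": block.length,
        "height": block.height,
        "n_lines": block.n_lines,
    }
    if all(RANGES[k][0] <= features[k] <= RANGES[k][1] for k in RANGES):
        for label, inc in INCREMENTS.items():
            # Assumption: additive evidence, capped at 1.0.
            evidence[label] = min(1.0, evidence.get(label, 0.0) + inc)
    return evidence


if __name__ == "__main__":
    print(apply_msegr1(Block(2.5, 300.0, 120.0, 4), {}))
```

A full system would run many such rules over each block and pick the
hypothesis with the most accumulated evidence.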

Wang, Dacheng; and Sargur N. Srihari; 1989
AI::VISION::IMAGE-PROC DOCUMENT-RECOGNITION TEXTURE FILTER RECTANGLE 	SUNY-Buf
    Classification of newspaper image blocks using texture analysis,
    Computer Vision, Graphics, and Image Processing, v.47:327-352, 1989.

Yashiro, Hiroshi, Tatsuya Murakami, Yoshihiro Shima, Yashiki Nakano, and Hiromichi Fujisawa; 1989
VISION::AI::DOCUMENT-RECOGNITION RECTANGLE		Hitachi-CRL,Tokyo
    A new method for document structure extraction using generic layout
	knowledge,
    Proceedings of Intl Workshop on Industrial Applications of Machine
	Intelligence and Vision (MIV-89), Tokyo, Japan, April 1989, p.282-287.
{
{ Uses the Form Definition Language (FDL), as in [Ejiri 89], to define
{ document structures.