marina@ai.toronto.edu (Marina Haloulos) (09/16/89)
FLASH ANNOUNCEMENT
(GB = Galbraith Building, 35 St. George Street)
-------------------------------------------------------------
SYSTEMS SEMINAR
GB244, at 2:00 p.m., Tuesday 19 September 1989
Dr. Yasushi Nishimura
Artificial Intelligence Department, ATR Communication Systems Research Lab., Japan
Document Image Analysis
We propose a document image analysis method that can extract the
logical structure of scanned paper documents to obtain indices such as
titles, author names, etc. The aim of this research is to develop a
building block for next-generation page readers, which will be able to
capture not only the character codes, but also the layout and
logical structure of documents.
A major problem in document image analysis is segmenting document
images to extract the components (indices). We implement the
segmentation process as a top-down, model-driven process that matches
the model against the input image. For this purpose, we introduce a tree
structure model to represent the layout of each document type (such as
title pages of IEEE Trans. papers). The model also describes elements
of the page such as the body and running heads. The body is further
divided into the text and other components.
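
As a rough illustration of the tree structure model described above, the
following Python sketch represents one document type as a tree of named
regions. The class name LayoutNode, the walk method, and the placement of
"title" and "author names" under "other components" are assumptions made
for illustration; the announcement does not specify the actual data
structure.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class LayoutNode:
    """One node of a hypothetical layout tree: a named region of the page."""
    name: str
    children: List["LayoutNode"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, name) pairs in top-down (pre-order) order."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


# Illustrative model for a single document type. The index names under
# "other components" are assumed, not taken from the talk.
title_page = LayoutNode("page", [
    LayoutNode("running head"),
    LayoutNode("body", [
        LayoutNode("other components", [
            LayoutNode("title"),
            LayoutNode("author names"),
        ]),
        LayoutNode("text"),
    ]),
    LayoutNode("running foot"),
])

# Print the tree with indentation that reflects nesting depth.
for depth, name in title_page.walk():
    print("  " * depth + name)
```

Walking the tree in pre-order visits the page before its parts, which is
the order a top-down matching process would need.
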
The model is applied in a top-down fashion. First, the running heads and
running feet are extracted. Next, the components other than the text are
extracted. The model describes these components in the way they are
segmented, so the segmentation result of the input image matches that of
the model. The text, the part where uniformity is frequently degraded,
is extracted as the remaining part of the page.
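
The extraction order above can be sketched in Python under strong
simplifying assumptions: the page is reduced to a range of pixel rows, and
the model supplies fixed row ranges directly instead of matching them
against an image. The function segment_top_down and all of its parameters
are hypothetical; this shows only the ordering of the steps, not the
speaker's matching method.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open row range [top, bottom)


def segment_top_down(page: Span, head_rows: int, foot_rows: int,
                     components: Dict[str, Span]) -> Dict[str, List[Span]]:
    """Split a page into regions in the order described in the abstract."""
    top, bottom = page
    regions: Dict[str, List[Span]] = {}

    # Step 1: extract the running head and running foot first.
    regions["running head"] = [(top, top + head_rows)]
    regions["running foot"] = [(bottom - foot_rows, bottom)]
    body_top, body_bottom = top + head_rows, bottom - foot_rows

    # Step 2: extract the non-text components that the model describes.
    taken: List[Span] = []
    for name, (start, end) in components.items():
        if not (body_top <= start < end <= body_bottom):
            raise ValueError(f"component {name!r} lies outside the body")
        regions[name] = [(start, end)]
        taken.append((start, end))

    # Step 3: the text is whatever remains of the body.
    text: List[Span] = []
    cursor = body_top
    for start, end in sorted(taken):
        if start > cursor:
            text.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < body_bottom:
        text.append((cursor, body_bottom))
    regions["text"] = text
    return regions


# Made-up row numbers for a 1000-row page image.
regions = segment_top_down(
    page=(0, 1000), head_rows=40, foot_rows=30,
    components={"title": (40, 140), "author names": (140, 190)},
)
print(regions["text"])
```

Because the text is computed as a remainder, it never has to be located
directly, which fits the abstract's remark that the text is the part where
uniformity is most often degraded.
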
We also introduce a model building process which can be operated by a
novice user. In an experiment using 115 input documents from 38 types of
scientific paper title pages, every index in 85.2% of the input
documents was correctly extracted. In a comparison experiment using a
bottom-up segmentation process, the correct extraction rate was only 9.1%.
Applications of the method include document entry without re-keying
for electronic publishing systems, and document retrieval and automatic
indexing of scanned documents for document database systems.