[comp.text] OCR as a way to measure printer quality

npn@cbnewsl.att.com (nils-peter.nelson) (03/23/91)

At the Seybold Seminar, Adobe had an exhibit that included
several new printers with their RIP. One was especially
impressive-- the HP Series IIISi. Apparently, HP has a
proprietary board that does spot size and placement adjustment.
The result is a 300dpi printer that yields quality more
like 400 dpi.  I picked up a sample "Stock Report" page.
I also picked up a sample of the new Kodak 7016PS output,
which looked more like conventional 300 dpi. How could one
quantify the differences?
I ran both samples through an OCR board (this one from
Calera) and diff'ed the files.  Embarrassingly, the text was
*not* identical; apparently, Adobe had two different people
key in identical text, and they made errors. In addition,
one used FrameMaker and one PageMaker, so the hyphenation
was different.
I went through by hand and found:
	6 errors in 1728 characters on HP
	10 errors in 1716 characters on Kodak
(the errors are all in the OCR, *not* in the printer).
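The hand counts above can be turned into per-printer error rates. The short Python sketch below does only that arithmetic. The function names are illustrative. The rough two-proportion z-test is an added check, not part of the original experiment, and it uses a normal approximation that is itself shaky with only 16 errors in total.

```python
import math

# Hand counts from the post: (OCR errors, characters on the page).
counts = {
    "HP IIISi": (6, 1728),
    "Kodak 7016PS": (10, 1716),
}


def error_rate(errors: int, chars: int) -> float:
    """Fraction of characters the OCR got wrong."""
    return errors / chars


def two_proportion_z(e1: int, n1: int, e2: int, n2: int) -> float:
    """Rough z statistic for the difference between two error rates
    (normal approximation with a pooled rate; illustrative only)."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


for printer, (errors, chars) in counts.items():
    rate = error_rate(errors, chars)
    print(f"{printer}: {rate:.2%} ({chars / errors:.0f} chars per error)")

z = two_proportion_z(*counts["HP IIISi"], *counts["Kodak 7016PS"])
print(f"z = {z:.2f}")
```

On these counts the sketch reports about 0.35% for the HP and 0.58% for the Kodak, and z comes out near 1.0. The gap alone is therefore not strong evidence of a real difference, and the samples were not identical text in any case.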

As a free plug to HP, whom we have never forgiven for
foisting PCL on the world, the IIISi, at 15ppm for
under $6,000 and outstanding quality, is absolutely
worth considering. If Adobe could supply a *standard*
page, we could automate the quality check.
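With a standard reference page, the hand count could be replaced by an automatic comparison. The post itself used diff plus a manual count. One common substitute is the character error rate: the Levenshtein edit distance between the reference text and the OCR output, divided by the length of the reference. The following is a minimal sketch that assumes both texts are already available as plain strings. The function names `levenshtein`, `normalize` and `character_error_rate` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (the distance is symmetric)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row length at the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences are not counted."""
    return " ".join(text.split())


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance from the reference page, per reference character."""
    ref, hyp = normalize(reference), normalize(ocr_output)
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "The quick brown fox"
    ocr_output = "The qu1ck  brown f0x"  # two misread characters, one stray space
    print(f"CER = {character_error_rate(reference, ocr_output):.3f}")
```

Edit distance will not always agree with a hand count. If the OCR misreads "m" as "rn", that costs two edits (one substitution and one insertion), whereas a person might count it as one error. A standard page also removes the hyphenation problem, because every printer would render the same text.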

jaap@mtxinu.COM (Jaap Akkerhuis) (03/26/91)

In article <1991Mar22.224813.7086@cbnewsl.att.com> npn@cbnewsl.att.com (nils-peter.nelson) writes:
 > How he used an OCR-reader and did an experiment which worked better on
 > a new HP printer than a new Kodak printer.

Hi Peter, I think your test was amusing, although hardly conclusive,
unless the only way you read text is via an OCR-reader.

One of the problems with comparing printers this way is that the toners
from various printers may have different reflectance spectra, which can
affect how machines read the output far more than it affects humans.
I've seen ordinary copiers render output from one printer useless after
copying, because of a mismatch between that printer's toner and the
copier's exposure.

An interesting experiment is to take output from various brands of
copiers (at least if they really use different engines, something
which is not always the case) and feed it through the competitors'
machines. A good time is guaranteed for all!


	jaap

lee@sq.sq.com (Liam R. E. Quin) (03/27/91)

npn@cbnewsl.att.com (nils-peter.nelson) wrote:
> I went through by hand and found:
>	6 errors in 1728 characters on HP
>	10 errors in 1716 characters on Kodak
> (the errors are all in the OCR, *not* in the printer).
>
>As a free plug to HP, [...] the IIISi, at 15ppm for
>under $6,000 and outstanding quality, is absolutely
>worth considering.


Although I don't (yet) have an opinion on the HPIIISi (apart from disliking
the name!), I do wonder whether it is entirely reasonable to equate quality
with OCR recognition.

To give a simple example, typographical features such as ligatures and
kerning often make life harder for an OCR program but easier for the
human eye.

Since the documents in question were not produced by the same software,
it seems a little unfair to Kodak.  If this metric were universally
established, the best printers would be those that used the OCR fonts that
one sees on cheques, followed closely by monospaced fonts like Courier.

The error rates quoted seem slightly on the high side to me, but I am not
up to date with OCR software, and perhaps the text was in small sizes.

The people for whom this metric _would_ be useful are those who accomplish
file transfer by printing out a document and then scanning it in on another
system... and yes, I have seen this done!    :-(

Lee

-- 
Liam R. E. Quin,  lee@sq.com, SoftQuad Inc., Toronto, +1 (416) 963-8337
``Agree, for Law is costly. -- Very good advice to litigious Persons, founded
  upon Reason and Experience; for many Times the Charges of a Suit exceed the
  Value of the Thing in Dispute.''	Bailey's dictionary, 1636