sarge@metapsy.UUCP (Sarge Gerbode) (10/31/88)
In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>Let's see, if it takes about two minutes to scan and convert a page,
>and the average book has 250 pages, then that's 500 minutes or over 8
>hours per book -- let's say ten hours to be conservative. So it would
>take 6710 hours or about three and a third work years to scan in 671
>books. And I think my two minutes a page estimate may be optimistic,
>not to mention extra costs for indexing and mastering. Not a basement
>project, I'm afraid.

Well, this must be somewhat ameliorated by the fact that many publishers surely have most or all of their books in electronic form; and there are fairly decent full-text retrieval and indexing programs that would make a normal index obsolete. One in particular is a product called "Elexir", from ThirdEye Software in Palo Alto, that is currently in alpha testing, which allows one to do context searches and a variety of other actions that an index alone cannot accomplish.
--
--------------------
Sarge Gerbode -- UUCP: pyramid!thirdi!metapsy!sarge
Institute for Research in Metapsychology
950 Guinda St.  Palo Alto, CA 94301
bill@bilver.UUCP (Bill Vermillion) (10/31/88)
In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>In article <3447@pt.cs.cmu.edu> ns@cat.cmu.edu (Nicholas Spies) writes:
>>In article <5772@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>>>...
>>>And according to this estimate, a Next disk will hold 671 books at 256M.
>>
>>At $40/book that's $26,840.00 + $50.00 for the disc itself. Just the
>>author's royalties, figured at 15%, would make the disc cost $4,026 (after
>>all, why should the authors take a loss?). Therein lies the problem of very
>>dense media.
> .... stuff deleted ....
>entirely. Even with public domain books, the costs of scanning and
>character-recognizing are pretty large.
> ....... ........
>Let's see, if it takes about two minutes to scan and convert a page,
>and the average book has 250 pages, then that's 500 minutes or over 8
>hours per book -- let's say ten hours to be conservative. So it would
>take 6710 hours or about three and a third work years to scan in 671
>books. And I think my two minutes a page estimate may be optimistic,
>not to mention extra costs for indexing and mastering. Not a basement
>project, I'm afraid.

Let's take a step back and look at this again. If a book is on disk we don't necessarily need to be able to read it on a character basis. The idea is to be able to READ Shakespeare, not to re-edit, re-create, re-print, etc. I would suspect that it would be a bit difficult to get publishers to agree to that form of distribution.

However - if we go to image storage we can still see the book on the screen, we could have images from the book, and we would be able to search through the book (providing it was indexed - more in a later paragraph). We would be able to do almost anything except re-edit, re-(etc.)....

So from 8 hours per book at 2 minutes per page, we can go to 12.5 minutes per book at 3 seconds per page. Now before you say that can't be done - let me tell you I saw it. I forget the company that makes it, but the system was a document storage and retrieval system using high speed scanners, fast photo-copy type printers, and 12" laser disk media. One of the options was a 12" video-disc jukebox. I don't recall the exact capacity, but it was large.

Let's just map this onto existing video technology. In CAV mode a 12" disk can store approximately 55,000 frames per side. When these disks are used for data they hold about 1.2 gigabytes. That is about 4.7 times more than the 256 meg disk, which means we should be able to get about 11,750 (rounded) pages per 256 meg disk, or 47 books per disk. Media cost then is approximately $1.00 per book, which puts it just above paperback printing costs but below hard-bound. And I would estimate it would cost you under $2.00 to ship the disk first class, as opposed to $$$$ to ship 47 books that way.

The document storage/retrieval system also had software so that you would index the document as you stored it. Then anytime you needed the document you would go to the index and get it. On a large jukebox that could take 20 to 30 seconds to find the disk, place it, search, and then display. But on a large jukebox that was finding 1 document out of FIVE MILLION. Then at a touch of a button you had a full hard copy of the original, and the company had information on the legal acceptability of such documents. Quite impressive.

So instead of 671 books taking 3 years, we get 50 books taking 10 hours. This seems a more reasonable route.

An aside - that relates to the above.
Before Sony and Philips cross-licensed their CD technology, Sony had developed a "digital audio disk". They could see no market for the disk. Why? Well, they had this disk, 12" in diameter, and they could not conceive of being able to market a record that played for 20 HOURS per side. Philips had a 4" (approx.) disk. Playing time was under 1 hour. One of the favorite works of a Sony exec was 73 minutes long, so the disk was designed for that. That is where the 12 cm disk came from.

It is probably better to waste space and have a marketable item than to achieve maximum capability and have no market at all. Who - except a library - would want 671 books on one disk? And what about accessibility to the 670 other books when someone has the disk out to read 1 volume?
--
Bill Vermillion - UUCP: {uiucuxc,hoptoad,petsd}!peora!rtmvax!bilver!bill
                      : bill@bilver.UUCP
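A quick back-of-the-envelope check of the page-image arithmetic above, as a minimal Python sketch. Every constant is taken from the figures quoted in the post (55,000 CAV frames per side, roughly 1.2 GB per 12" data disc, a 256 MB target disc, 250 pages per book, $50 per disc, 3 seconds per scanned page); none of it comes from a datasheet, so treat the results as rough estimates only.

# Back-of-the-envelope check of the page-image numbers quoted above.
# All constants come from the post itself, so the results are rough estimates.

FRAMES_PER_SIDE = 55_000          # one page image per CAV frame
VIDEODISC_BYTES = 1.2e9           # ~1.2 GB when the 12" disc is used for data
TARGET_DISC_BYTES = 256e6         # the 256 MB disc under discussion
PAGES_PER_BOOK = 250
DISC_PRICE = 50.0                 # dollars
SECONDS_PER_PAGE_SCAN = 3

ratio = VIDEODISC_BYTES / TARGET_DISC_BYTES          # ~4.7x
pages_per_disc = FRAMES_PER_SIDE / ratio             # ~11,700 page images
books_per_disc = pages_per_disc / PAGES_PER_BOOK     # ~47 books
cost_per_book = DISC_PRICE / books_per_disc          # ~$1.07 media cost
minutes_per_book = PAGES_PER_BOOK * SECONDS_PER_PAGE_SCAN / 60   # 12.5 min

print(f"capacity ratio:  {ratio:.1f}x")
print(f"pages per disc:  {pages_per_disc:,.0f}")
print(f"books per disc:  {books_per_disc:.0f}")
print(f"media cost/book: ${cost_per_book:.2f}")
print(f"scan time/book:  {minutes_per_book:.1f} minutes")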
postmaster@mailcom.FIDONET.ORG (Bernard Aboba) (10/31/88)
Not to mention the copyright problems, which many publishing firms have already concluded to be insurmountable.

It is important to keep in mind that "higher technology" does not necessarily imply "higher profit." In fact, it can be argued that the single largest force pushing the adoption of high technology is the desire to remain competitive -- i.e. if I don't develop the technology, someone else will. This force is NOT operative in publishing -- if I own the copyright to information X, I'm the only one who can publish it, in whatever medium. At that point the question becomes "which medium will generate the most profit?" The answer, most assuredly, is NOT optical or CD-ROM, and may NEVER be.

Right now, the cost of time and materials for copying a $40 textbook of, say, 500 pages makes the project barely worthwhile at $0.05 per page. However, the economics of ripping off an entire Encyclopaedia Britannica or two are much better if the encyclopedia is on an optical disk. Plus, the deed could be done in a fraction of the time. Is it so strange for publishers to conclude that the major beneficiary of optical publishing would be pirates? The record industry has already concluded the same thing, which is why they have vehemently opposed DAT.

My own guess is that floptical drives may well sound the death knell not only for CD-ROM, but for much of the optical publishing industry, which right now exists almost exclusively to serve vertical markets. In these markets, where you sell a few copies at high prices, piracy has devastating effects. Imagine the damage that could be done if, say, a volume of legal references were copied by virtually every student at a law school, who then took the pirated copies with them into their practices. You'd not only kill immediate sales, but sales of the product down the line.

The advent of erasable optical media therefore shifts development away from REFERENCE materials such as encyclopedias, to information with a TIME VALUE, such as stock price data.
--
------------------------------------------------------------------------------
FidoNet: 1:204/444  UUCP: ...!sun!sunncal!mailcom!bernard
INTERNET: f444.n204.z1.Fidonet.org
US MAIL: Bernard Aboba, 101 First St. #224, Los Altos, CA 94022
tim@hoptoad.uucp (Tim Maroney) (11/01/88)
In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>And I think my two minutes a page estimate may be optimistic,
>not to mention extra costs for indexing and mastering.

In article <557@metapsy.UUCP> sarge@metapsy.UUCP (Sarge Gerbode) writes:
>There are fairly decent full-text retrieval and indexing programs
>that would make a normal index obsolete.

I was referring to an automatically generated inverted index, not an ordinary book index, which would be silly on a high-density optical medium. It would still require human checking in any case, just as optical character recognition does, so the time would be noticeable.

Because of the slow seeks and large amounts of data, it is necessary to set up an index on an optical read-only medium at publication time; run-time search algorithms are way too slow.
--
Tim Maroney, Consultant, Eclectic Software, sun!hoptoad!tim
"What's bad? What's the use of turning? In Hell I'll be there a-burning!
Meanwhile, think of what I'm earning! All on account of my name."
    - Bill Sykes, "Oliver"
tim@hoptoad.uucp (Tim Maroney) (11/01/88)
In article <282@bilver.UUCP> bill@bilver.UUCP (Bill Vermillion) writes:
>If a book is on disk we don't
>necessarily need to be able to read it on a character basis. The idea is to
>be able to READ Shakespeare, not to re-edit, re-create, re-print, etc.

Wrong. The idea is to be able to read Shakespeare, to copy and paste relevant sections for critical essays, to print sections for reading at leisure when away from the computer, to do word-frequency analyses, to follow cross-reference chains among related keywords and topics, and so on. Computers are a terrible medium for leisure reading -- less text shows on a screen than on a printed page, and the screen luminescence leads to eye fatigue, not to mention the lack of physical portability. If all you can do is read, what you have is far worse than a printed book. And I have yet to see a stage show where the director didn't do some editing of the script!

>However - if we go to image storage we can still see the book on the screen,
>we could have images from the book, we would be able to search through the
>book (providing it was indexed - more in a later paragraph), we would be able
>to do almost anything except re-edit, re-(etc.)....

Almost anything; except everything you would expect to be able to do with computer text, such as copy and paste it, do keyword searches, etc. You'd be able to read it and print it out. What an awesome improvement over the printed page.

>So from 8 hours per book at 2 minutes per page, we can go to 12.5 minutes per
>book at 3 seconds per page.

3 seconds a page? Is that using clairvoyance or what? Visualize the process of positioning a book on a flat-bed scanner for a moment. It takes anywhere from five to twenty seconds. Now add the scanning time, which is at the minimum 3 seconds a page.

>Now before you say that can't be done - let me tell you I saw it. I forget
>the company that makes it, but the system was a document storage and retrieval
>system using high speed scanners, fast photo-copy type printers, and 12" laser
>disk media. One of the options was a 12" video-disc jukebox. I don't recall the
>exact capacity, but it was large.

Perhaps you're referring to the Wang system that has gotten so much publicity. I don't see how it is well suited to mass distribution of books; it is meant for keeping copies of receipts and so forth.

>The document storage/retrieval system also had software so that you would
>index the document as you stored it. Then anytime you needed the document you
>would go to the index and get it. On a large juke-box that could take 20 to
>30 seconds to find the disk, place it, search and then display. But on a
>large juke-box that was finding 1 document out of FIVE MILLION.

That's a great approach for receipts. For books, you're talking at least two extra minutes per page, with a high error rate and an extremely inconvenient interface requiring that you "lasso" the words being indexed. You also have to type them out.

>Then at a touch of a button you had a full hard copy of the original, and the
>company had information on the legal acceptability of such documents. Quite
>impressive.

And quite irrelevant.

>So instead of 671 books taking 3 years, we get 50 books taking 10 hours.
>This seems a more reasonable route.

How about a trillion books for no money at all? That's much more attractive. Coming soon to your Isuzu dealer.
--
Tim Maroney, Consultant, Eclectic Software, sun!hoptoad!tim
"Because there is something in you that I respect, and that makes me desire to have you for my enemy."
"Thats well said. On those terms, sir, I will accept your enmity or any man's." - Shaw, "The Devil's Disciple"
sarge@metapsy.UUCP (Sarge Gerbode) (11/01/88)
In article <5799@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>>There are fairly decent full-text retrieval and indexing programs
>>that would make a normal index obsolete.
>
>I was referring to an automatically generated inverted index, not an
>ordinary book index, which would be silly on a high-density optical
>medium. It would still require human checking in any case, just as
>optical character recognition does, so the time would be noticeable.
>
>Because of the slow seeks and large amounts of data, it is necessary
>to set up an index on an optical read-only medium at publication time;
>run-time search algorithms are way too slow.

I'm really out of my depth on this topic, but I believe one can improve considerably on a mere inverted index. Furthermore, all the indexing could be resident on the disk (estimated at about 1/3 to 1/2 the space of the text itself), and one would not have to *create* the index at run time, merely *use* it, a process which would take very little time (less than a second, probably, for a fairly hefty search).
--
--------------------
Sarge Gerbode -- UUCP: pyramid!thirdi!metapsy!sarge
Institute for Research in Metapsychology
950 Guinda St.  Palo Alto, CA 94301
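To make the inverted-index idea concrete, here is a minimal Python sketch of the general technique (it is not a description of Elexir or of any other product): the index is built once, at publication or mastering time, and consulting it afterward is a dictionary probe rather than a scan of the full text.

from collections import defaultdict

def build_inverted_index(documents):
    """Build a word -> list of (doc_id, position) postings map.

    In the scenario discussed above, this step would be done once at
    publication time and the resulting index written onto the disc
    alongside the text."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return dict(index)

def lookup(index, word):
    """Run-time use of the prebuilt index: a table probe, not a text scan."""
    return index.get(word.lower(), [])

if __name__ == "__main__":
    books = [
        "to be or not to be that is the question",
        "now is the winter of our discontent",
    ]
    idx = build_inverted_index(books)
    print(lookup(idx, "is"))   # [(0, 7), (1, 1)]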
nujohnso@ndsuvax.UUCP (Ceej) (11/01/88)
In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes: > >Let's see, [...] > So it would >take 6710 hours or about three and a third work years to scan in 671 >books. And I think my two minutes a page estimate may be optimistic, >not to mention extra costs for indexing and mastering. I would say that if you automated the process, it would cut that time down to around 2500 hours. By automating, I mean setting the process up so that the pages are fed into the process continually and ~24 hours a day. Note that this estimate is certainly not conservative, and the time required to set up this system is not included. Actual requirements may vary. Please consult your CD-ROM handbook for details. -- nujohnso@ndsuvax.bitnet nujohnso@plains.NoDak.edu ...!uunet!ndsuvax!nujohnso i want a shoehorn with teeth
olsen@XN.LL.MIT.EDU (Jim Olsen) (11/02/88)
In article <300.236DAA95@mailcom.FIDONET.ORG>, Bernard Aboba writes: >Not to mention the copyright problems, which many publishing firms have >already concluded to be insurmountable. But there is a wealth of important works in the public domain, such as government documents and works of pre-20th-century authors. While one would still have to recover scanning costs, these are small compared to the costs of producing an original work, and will continue to decrease. Much of the more recent material is already in digital form. >Imagine the damage that could be done, say if a volume of legal >references were copied by virtually every student at a law school, who >then took the pirated copies with them into their practices? Imagine the value to those law students of having, for modest cost, the entire United States Code, Code of Federal Regulations, or United States Reports (Supreme Court decisions) in their shirt pockets!
dmocsny@uceng.UC.EDU (daniel mocsny) (11/02/88)
In article <300.236DAA95@mailcom.FIDONET.ORG>, postmaster@mailcom.FIDONET.ORG (Bernard Aboba) writes:
> Not to mention the copyright problems, which many publishing firms have
> already concluded to be insurmountable.

As electronic publishing methods mature and provide convenience and capability far beyond printed media, we find our concepts of intellectual property preventing us from taking advantage of these benefits. My main quarrel is not with the publishers of the Britannica, but with the firms that profit from the sale and distribution of scholarly journals. The authors of these works do not usually derive any royalty from them. Furthermore, most of the work is publicly funded, and the authors want to obtain the widest possible exposure.

The system we have now, that of relying on private companies to typeset, print, and disseminate the journals, has worked well enough in the past. However, these companies exist to serve the technical community, and not vice versa. If electronic publishing can help the members of the technical community share results with each other more effectively, then we must remove legal barriers that interfere with it.

That does not have to mean bankruptcy for the technical publishers. If they took the lead in organizing the infrastructure for electronic dissemination of the research literature, they could provide better service for the same price that the average institution pays now for its journal subscriptions. Their costs would be lower and their profits higher. Instead, they will probably sit on the fence and continue to render our information less available via paper, until we take matters into our own hands, adopt markup-language standards, and distribute our own literature free of charge over our own networks.

Mass-market publishers have a different sort of problem, because they do not serve a community of peers. I.e., a real distinction exists between producers (writers) and consumers. Since the writers are profit-motivated, they need paper to defend their intellectual property rights. At some point, however, the utility of printed information must become so much lower than the utility of electronic information that paper will lose its advantage.

Dan Mocsny
cramer@optilink.UUCP (Clayton Cramer) (11/02/88)
In article <5800@hoptoad.uucp>, tim@hoptoad.uucp (Tim Maroney) writes:
> In article <282@bilver.UUCP> bill@bilver.UUCP (Bill Vermillion) writes:
> Wrong. The idea is to be able to read Shakespeare, to copy and paste
> relevant sections for critical essays, to print sections for reading at
> leisure when away from the computer, to do word-frequency analyses, to
> follow cross-reference chains among related keywords and topics, and so
> on. Computers are a terrible medium for leisure reading -- less text
> shows on a screen than on a printed page, and the screen luminescence
> leads to eye fatigue, not to mention the lack of physical portability.
> If all you can do is read, what you have is far worse than a printed book.

Isaac Asimov wrote a marvelous parody of _The_Double_Helix_ about these wild, womanizing scientists at Oxford, a century or two from now, reinventing the book for exactly these reasons.

If you doubt it, consider how many people curl up with a good machine-readable book and a computer at the end of a long, busy day. Or the number of people who bring along a laptop to sit in an open field and read for the pleasure of it. Anyone who wants to spend more time reading in front of a computer, instead of a printed page, isn't working hard enough!
--
Clayton E. Cramer ..!ames!pyramid!kontron!optilin!cramer
bzs@encore.com (Barry Shein) (11/02/88)
On-line publishing will require new economics and new ways of doing business; the old ways may have to wither. What will drive it is watching competitors make profits. If they don't, then it was a failure; if they do, one will change one's way of looking at one's own business and adapt (or die).

Current "problems" in the economics of paper publishing cannot be viewed as insurmountable obstacles to on-line publishing, only as just what they are: the old order. I have little doubt someone suggested automobiles would never catch on due to the large investments buggy builders had in horse farms.

-Barry Shein, ||Encore||
geb@cadre.dsl.PITTSBURGH.EDU (Gordon E. Banks) (11/02/88)
In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>
>Yep. All I was talking about was how many would fit. Whether it could
>ever be economically feasible to publish such a disk is another matter
>entirely. Even with public domain books, the costs of scanning and
>character-recognizing are pretty large.

Not really. You can estimate it by the cost of getting books from University Microfilms. They have to photocopy each page. A normal-sized book is around $50. This covers retrieving the book from whatever library has it, and the labor of copying it. Of course, they expect to sell more copies of the microfilm later, but this would apply in spades to optical disk versions. OCR programs will soon be sophisticated enough that it won't add much to the cost of simply photocopying the book. Compared to conventional publication (typesetting) this cost is trivial.

If all books worth reading in the public domain were done, it would be a wonderful thing. I suspect people will start doing this as soon as the market is large enough. The real hang-up is going to be with current books where royalties will have to be paid.
vnend@ms.uky.edu (D. W. James -- Staff Account) (11/03/88)
In article <282@bilver.UUCP> bill@bilver.UUCP (Bill Vermillion) writes:
)Now before you say that can't be done - let me tell you I saw it. I forgot
)the company that makes it, but the system was a document storage and retreival
)system using high speed scanners, fast photo-copy type printers, and 12" laser
)disk media. One of the options was a 12 video juke box. I don't recall the
)exact capacity, but it was large.
)Bill Vermillion - UUCP: {uiucuxc,hoptoad,petsd}!peora!rtmvax!bilver!bill
If it was the same system that I saw written up in (I think) PC_WEEK
last year, its capacity at the limit was 1.2 TERABYTES. Not a trivial
amount of storage...
--
Vnend, posting from his other account, on a machine about 100 yards
horizontally, and 40 yards vertically, from the other one.
vnend@ms.uky.edu or vnend@ukma.bitnet or vnend@engr.uky.edu
"A few days later, I got a letter... advising me to forsake my sordid lifestyle and give all my hickies to the living Terim." The Countess, CEREBUS #54
tim@hoptoad.uucp (Tim Maroney) (11/03/88)
In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) wrote:
>Even with public domain books, the costs of scanning and
>character-recognizing are pretty large.

In article <1676@cadre.dsl.PITTSBURGH.EDU> geb@cadre.dsl.pittsburgh.edu (Gordon E. Banks) has been writing:
>Not really. You can estimate it by the cost of getting books from
>University Microfilms. They have to photocopy each page. A normal
>sized book is around $50. This covers retrieving the book from
>whatever library has it, and the labor of copying it.

So $50*671 = $33,550. Not a trivial investment. This is the cost to the publisher of making the disk, though it would be spread out among the individual copies. And that's still not factoring in the OCR running and proofreading, not to mention pre-mastering and mastering and duplication. And promotion and....

>OCR programs will
>soon be sophisticated enough that it won't add much to the cost
>of simply photocopying the book.

Disagree. It'll always take proofreading, and for 671 books that's quite a lot of skilled labor to pay for.

>Compared to conventional publication (typesetting) this cost is trivial.

Agree provisionally; per book it's relatively trivial; for hundreds of books it far exceeds the production cost of a single typeset book.

>If all books worth reading in
>the public domain were done, it would be a wonderful thing. I suspect
>people will start doing this as soon as the market is large enough.
>The real hang-up is going to be with current books where royalties
>will have to be paid.

Completely agree! I hope it happens, but as someone who did a minor feasibility study on doing it himself, I have to say it seems a long way off. The barriers are formidable.
--
Tim Maroney, Consultant, Eclectic Software, sun!hoptoad!tim
"The time is gone, the song is over. Thought I'd something more to say."
    - Roger Waters, Time
prem@andante.UUCP (Swami Devanbu) (11/03/88)
Bah humbug. When I read a book, I want to curl up in a comfy chair, with a blanket around me, a bowl of curried popcorn, and a pot of tea. A computer is a computer and a book is a book. Prem Devanbu Artificial Intelligence Principles Research Dept., (W) 201 582 2062 (H) 201 757 3748 MH 3C-438 AT&T Bell Laboratories 600 Mountain Ave, Murray Hill NJ 07974, USA prem%allegra@research.att.com {ihnp4,ucbvax,vax135,decvax,....}!allegra!prem
bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) (11/04/88)
In article <13203@andante.UUCP> prem@andante.UUCP (Swami Devanbu) writes: >A computer is a computer and a book is a book. One of my pet peeves is the blatant misapplication of technology, in the rush to make everything high-tech to sell to the American MTV culture. I could buy a refrigerator that has a microprocessor- controlled front panel to dispense ice to me, and a microwave that's preprogrammed for all kinds of things I never cook, and a Datsun that talks to me. Technology should be applied to advantage where appropriate, and engineers should have enough restraint not to put a microprocessor in every toaster oven they sell. Similarly, there are many current products of the publishing industry that are completely inappropriate for mass electronic distribution. This will be decided in the market place, where it should be. That said, I am in favor of making the computer a useful reference and mind-amplification tool, again only where appropriate. For example, I use a semi-automated concordance for bible studies, and I use a spelling checker for almost everything of consequence I write. I look forward to the information management capabilities that the NeXT machine may someday put on my desk, but my wife would object if I curled up with the cube for a little bedtime reading :-) -=- Zippy sez, --Bob Here I am in the POSTERIOR OLFACTORY LOBULE but I don't see CARL SAGAN anywhere!!
henry@utzoo.uucp (Henry Spencer) (11/04/88)
In article <300.236DAA95@mailcom.FIDONET.ORG> postmaster@mailcom.FIDONET.ORG (Bernard Aboba) writes: >...The advent of eraseable optic media therefore shifts the development away >from REFERENCE materials such as encyclopedias, to information with a >TIME VALUE, such as stock price data. Or to material that has a mass market. A CD-ROM encyclopedia that cost $100, rather than thousands, would probably sell quite briskly and not be too troubled by piracy. Bulk copying of digital media becomes a problem in the same situation where photocopying of books becomes a problem: when the price far exceeds copying costs, i.e. when the publisher has decided to gouge a small market rather than try for modest profits from a large one. For some types of material, the publisher doesn't have a choice, since the market simply *is* small. For things like encyclopedias, though, simply dropping the price will expand the market. -- The Earth is our mother. | Henry Spencer at U of Toronto Zoology Our nine months are up. |uunet!attcan!utzoo!henry henry@zoo.toronto.edu
dmocsny@uceng.UC.EDU (daniel mocsny) (11/04/88)
...then let's build a world our machines can work in. 100 years ago people were trying to replace the horse with internal combustion engine-driven vehicles. Now, the obvious approach would have been to build some sort of mechanical analog of the horse, strap an engine on it, and keep everything transparent to the users. Since that was not possible, the next easiest thing was to change the world to accommodate the strength/weakness mix of the best way to run engines: on wheeled chassis. So we put $ billions into paving over some of the best real estate in the country. Now we have a world that accommodates motor vehicles, to some extent.

In article <5821@hoptoad.uucp>, tim@hoptoad.uucp (Tim Maroney) writes:
> So $50*671 = $33,550. Not a trivial investment. This is the cost to the
> publisher of making the disk, though it would be spread out among the
> individual copies. And that's still not factoring in the OCR running and
> proofreading, not to mention pre-mastering and mastering and duplication.
> And promotion and....
>
> It'll always take proofreading, and for 671 books that's quite
> a lot of skilled labor to pay for.

Let's not forget that virtually every book that makes it into print these days passes through a computer at some stage in its production. Most authors use word processors (either directly or through secretaries), most publishers use electronic typesetting, and some of us authors dabble in both. So most of the work the CD-ROM publishers have to do has already been done somewhere.

Printing books degrades the utility that was present when that information was originally in electronic form. From the standpoint of the CD-ROM vendors and potential users, publishers and authors who release information in printed form exclusively are destroying wealth. By refusing to establish and adhere to electronic document standards, we are reducing the amount of information we can exploit and pass on to our progeny. In other words, we are shooting ourselves in the foot.

A world optimized for horses was no good for automobiles. The latter were useless until a new world was built. Similarly, a world optimized for paper is no good for computers. To get the most benefit out of our new technology, we need to change the way we do things. Obviously the existing stock of printed information will not benefit from re-designing our world to match the strengths and weaknesses of computers. But I would hesitate to say that OCR will _always_ require proofreading. OCR is a hard problem, but certainly not an impossible problem. It is only a mapping from the (very large) vector space of possible letter bitmaps to the smaller space of letter codes and font descriptions. The structure of that mapping is complex, but not infinitely so, else we could not read. Connectionist approaches to OCR are already showing great promise. In ten years it might be essentially a solved problem.

A harder problem will be to have a computer make sense of arbitrary figures and diagrams. But that won't be necessary; the OCR machine can simply vectorize or bitmap anything it can't otherwise interpret. Given a smart OCR device, we could ``mine'' libraries for their information content. Just load the hopper with books, press the button, and take the information out of those mouldering tomes and put it in the hands of people who can go out and create wealth with it.

Dan Mocsny
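As a toy illustration of that bitmap-to-letter-code mapping, the Python sketch below trains a single-layer perceptron to tell two hand-made 5x5 glyphs apart from examples rather than hand-coded rules. The glyphs, the learning rate, and the noise test are all invented for this sketch; it is nowhere near a real OCR engine, only a hint of the connectionist idea.

# Toy illustration of OCR as a learned mapping from bitmaps to letter codes:
# a single-layer perceptron learns to tell two hand-made 5x5 glyphs apart.
# This is a sketch of the connectionist idea only, not a usable OCR engine.

GLYPHS = {
    "I": "..#.."
         "..#.."
         "..#.."
         "..#.."
         "..#..",
    "O": ".###."
         "#...#"
         "#...#"
         "#...#"
         ".###.",
}

def pixels(glyph):
    return [1.0 if c == "#" else 0.0 for c in glyph]

def train(samples, epochs=20, lr=0.1):
    """Perceptron rule: nudge the weights whenever the prediction is wrong."""
    w = [0.0] * 25
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:          # target: +1 for "I", -1 for "O"
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if out != target:
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
    return w, b

def classify(w, b, x):
    return "I" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "O"

samples = [(pixels(GLYPHS["I"]), +1), (pixels(GLYPHS["O"]), -1)]
w, b = train(samples)

noisy_i = pixels(GLYPHS["I"])
noisy_i[0] = 1.0                 # flip one corner pixel and classify anyway
print(classify(w, b, noisy_i))   # -> "I"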
danscott@atari.UUCP (Dan Scott) (11/04/88)
in article <1147@xn.LL.MIT.EDU>, olsen@XN.LL.MIT.EDU (Jim Olsen) says:
> Imagine the value to those law students of having, for modest cost,
> the entire United States Code, Code of Federal Regulations, or United
> States Reports (Supreme Court decisions) in their shirt pockets!

I would have to agree. I usually take what a lawyer tells me with a grain of salt, but perhaps I would have more faith if I knew he had access to all cases that set precedent for what I am dealing with.

Dan
lange@lanai.cs.ucla.edu (Trent Lange) (11/04/88)
In article <13203@andante.UUCP> prem@andante.UUCP (Swami Devanbu) writes:
>
>Bah humbug.
>
>When I read a book, I want to curl up in a comfy chair, with a blanket
>around me, a bowl of curried popcorn, and a pot of tea.
>
>A computer is a computer and a book is a book.
>
>Prem Devanbu

Well, then, what you *really* want is a flat screen about the size of a piece of notebook paper, and maybe about half an inch deep. With perhaps a touch screen, for point-and-click operations. Maybe an infrared or radio connection, a *really* fast one, to the main cube on your desk.

*Then* you could curl up in your comfy chair and blanket, and have access to the complete works of Shakespeare. And, of course, your newspaper would be electronic, so you wouldn't have to worry about getting grimy newsprint on your curried popcorn!

Someday...

Trent Lange
**********************************************************************
*       "UCLA: The fifth best country in the Olympics"              *
**********************************************************************
sac@well.UUCP (Steve Cisler) (11/04/88)
Bernard Aboba is quite right; the print publishers are quite reluctant to see their libraries moved to electronic format. The Library of Congress has a number of projects, including a new one called AMERICAN MEMORY, where they hope to make available some of their 85,000,000 items to the American people through some optical medium. While they house the stuff, they don't have the rights and permissions.

The Faxon Company recently did a survey of periodical publishers, and most had little or no interest in seeing their print pubs in any electronic format. A few did, but most of those were unsure if it would be profitable.

UMI has an experiment at Northwestern Univ. in Evanston where the images of periodical pages are on CD-ROM. Evidently the publishers would not allow just the ASCII to be put on. The only pointers are to articles, not to fragments of text (keyword in context). But at least you get to see the ads too!

Steve Cisler
Apple Library
408 974 3258
geb@cadre.dsl.PITTSBURGH.EDU (Gordon E. Banks) (11/05/88)
In article <5821@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes: >Completely agree! I hope it happens, but as someone who did a minor >feasibility study on doing it himself, I have to say it seems a long >way off. The barriers are formidable. >-- I think you will find that libraries, including the Library of Congress will be doing this for us. Book preservation is very expensive and putting them all on CD while the actual copies get stored in CO2 or such is one answer to this problem. It may be a lot cheaper for a library to give you electronic access to its collection than actual access. The only thing is, I like to read in bed, and even a laptop gets heavy on my chest.
desnoyer@Apple.COM (Peter Desnoyers) (11/05/88)
>In article <5790@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>> So it would
>>take 6710 hours or about three and a third work years to scan in 671
>>books. And I think my two minutes a page estimate may be optimistic,
>>not to mention extra costs for indexing and mastering.

Unbind the book first, then put it through a sheet feeder. I'm sure there's a high-tech way to unbind a book, but zipping the binding off on a good circular saw works fine. (I've seen it done to Inside Mac, to loose-leaf bind it.) Should be ~5 min per book, plus <5 sec. per page for per-sheet paper handling. (Use the guts of a good copy machine.)

Peter Desnoyers
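The two throughput estimates in this thread differ by more than an order of magnitude, so it may help to put them side by side. The sketch below uses only figures quoted in the posts (671 books of 250 pages, 2 minutes per page for scan-and-OCR, versus about 5 minutes to unbind a book plus 5 seconds per page of sheet-fed image scanning) and assumes a 2000-hour work year.

# Compare the per-book throughput estimates quoted in this thread.
# Book count, page count, and per-page times all come from the posts above.

BOOKS = 671
PAGES = 250

ocr_hours_per_book = PAGES * 2 / 60                      # 2 min/page with OCR
sheetfed_hours_per_book = (5 * 60 + PAGES * 5) / 3600    # unbind + 5 s/page

for label, per_book in [("OCR at 2 min/page", ocr_hours_per_book),
                        ("sheet-fed images", sheetfed_hours_per_book)]:
    total = per_book * BOOKS
    print(f"{label:18s}: {per_book:5.2f} h/book, "
          f"{total:7.0f} h for {BOOKS} books "
          f"(~{total / 2000:.1f} work-years)")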
bzs@encore.com (Barry Shein) (11/06/88)
Once again a thought virus...people's minds are running amuck. The point of putting books on-line is not to ensure you never read in bed again, it's to make them accessible to new generations of tools, for the researchers, writers and curious of the world.

Here's a good one. A long time ago on a list far far away someone claimed that men could breast-feed under certain conditions, citing various second-hand accounts (e.g. talks from La Leche League). The claim was that if the infant suckled a male breast long enough (I suppose supplementary feeding is necessary) it would begin to produce milk. Further, they claimed this was indeed not uncommon in those apocryphal primitive tribes who are always doing these amazing things.

I had no idea, so I went to the University library to spend a few hours seeing if I could track down something authoritative on the subject. Try it, it's nearly impossible (although, perhaps too literal-mindedly, I finally found some references in the Nursing library). I didn't quite get satisfaction, but it appears to be untrue, bordering on urban legend (I'd be glad to hear about anything *authoritative* about it still, something more than you once heard it was true or read it in a pamphlet somewhere).

That's the kind of thing on-line libraries should get you, and it's invaluable. Let's not extrapolate ad nauseam just to appear to have poked a hole in an idea.

-Barry Shein, ||Encore||

P.S. This is not meant to criticize La Leche; in fact, I doubt they ever claimed the above. It may have just been one of those things that "goes around", using their name for seeming authority.
ns@cat.cmu.edu (Nicholas Spies) (11/06/88)
In article <1218@atari.UUCP> danscott@atari.UUCP (Dan Scott) writes:
>in article <1147@xn.LL.MIT.EDU>, olsen@XN.LL.MIT.EDU (Jim Olsen) says:
>
>> Imagine the value to those law students of having, for modest cost,
>> the entire United States Code, Code of Federal Regulations, or United
>> States Reports (Supreme Court decisions) in their shirt pockets!
>
>I would have to agree. I usually take what a lawyer tells me with a
>grain of salt, but perhaps I would have more faith if I knew he had access to
>all cases that set precedent for what I am dealing with.
>
>Dan

More fun yet would be an expert system that would scarf through the legal database looking for precedents relating to your current case, tie them together into an easily-understood argument for your review, and not cost an arm and a leg for each user. If laws are in any sense rational and analogies between earlier and current cases hold any water, this should be possible in principle.

The extremely important thing to note is that this technology, once developed, should not be privately owned but freely available to prosecutors and defendants alike, as each citizen should be entitled to equal access to the body of information that constitutes "the law", in practice as well as theory. As this is most definitely _not_ the case now, and a great many lawyers profit mightily under the present situation, chances are better than even that legal AI would be made illegal--except for use by "qualified professionals"...
--
Nicholas Spies			ns@cat.cmu.edu.arpa
Center for Design of Educational Computing
Carnegie Mellon University
barry@confusion.ads.com (Barry Lustig) (11/06/88)
In article <26543@tut.cis.ohio-state.edu> bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) writes:
...
Technology should be applied to advantage where appropriate, and
engineers should have enough restraint not to put a microprocessor in
every toaster oven they sell.
...
I was in Macy's the other day. In the kitchen gadget department I saw
the following label on one of the toasters:
Microchip toasting technology. Microprocessor temperature
controlled.
Barry Lustig
Advanced Decision Systems barry@ADS.COM
wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) (11/08/88)
> I think you will find that libraries, including the Library of
> Congress will be doing this for us. Book preservation is very
> expensive and putting them all on CD while the actual copies get
> stored in CO2 or such is one answer to this problem. It may be
> a lot cheaper for a library to give you electronic access to
> its collection than actual access. The only thing is, I like
> to read in bed, and even a laptop gets heavy on my chest.

The last time I was in the Library of Congress, they were scanning the books at 300 dpi and displaying them on special terminals. Clearly not the most efficient way of doing this. What really needs to be done is to make a standard for electronic books. Here's my quick draft of a storage method:

Every book is composed of a series of records. Each record consists of a header followed by some data. There are three major types of records: formatting, text and pictures. A format record contains formatting information for a following record of text or pictures. (Formatting codes could be either TeX or RichTextFormat or Postscript or something special.) Pictures are stored in Postscript, GIF or TIFF format depending on their origin (line art or pictures).

Pierce
____________________________________________________________________________
You can flame or laud me at: wetter@tybalt.caltech.edu or
wetter@csvax.caltech.edu or pwetter@caltech.bitnet
Caution: All my postings are 100% accurate from my point of view. However, my
point of view rarely translates into English. Therefore any errors in my
posting are your fault for not interpreting it correctly.
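One way to flesh out the draft above is sketched below in Python. The record types, tag values, and header layout are invented for illustration (no such standard existed); the sketch only shows the essence of the proposal: a self-describing header giving record type, data-format tag, and payload length, followed by the data itself.

import struct

# Illustrative record layout for the "electronic book" draft above.
# Record types, tag values and header layout are made up for this sketch;
# nothing here is an existing standard.
FORMAT_RECORD, TEXT_RECORD, PICTURE_RECORD = 1, 2, 3

# Header: record type (1 byte), data-format tag (1 byte), payload length (4 bytes)
HEADER = struct.Struct(">BBI")

def pack_record(rec_type, fmt_tag, payload: bytes) -> bytes:
    return HEADER.pack(rec_type, fmt_tag, len(payload)) + payload

def read_records(blob: bytes):
    """Walk a byte string and yield (type, format_tag, payload) tuples."""
    offset = 0
    while offset < len(blob):
        rec_type, fmt_tag, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        yield rec_type, fmt_tag, blob[offset:offset + length]
        offset += length

book = (
    pack_record(FORMAT_RECORD, 0, b"\\chapter{Hamlet}")       # e.g. TeX markup
    + pack_record(TEXT_RECORD, 0, b"To be, or not to be...")
    + pack_record(PICTURE_RECORD, 1, b"GIF87a...")             # raster image stub
)

for rec_type, fmt_tag, payload in read_records(book):
    print(rec_type, fmt_tag, payload[:20])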
dmocsny@uceng.UC.EDU (daniel mocsny) (11/09/88)
In article <13203@andante.UUCP>, prem@andante.UUCP (Swami Devanbu) writes: > When I read a book, I want to curl up in a comfy chair, with a blanket > around me, a bowl of curried popcorn, and a pot of tea. > > A computer is a computer and a book is a book. An amateur historian is completing a study on the origins of networked civilizations. She is working diligently in her favorite location -- in a small rowboat floating in a farmpond. Being something of a traditionalist, she has so far resisted the temptation to get the brain stem implants so many of her friends are raving about. She's still sticking with the fast-obsoleting virtual workstation hardware. She wears a translucent pair of wrap-around goggles. These display a pair of binocular images to her, each with pixel and color resolution matching her visual acuity. The goggles provide a field of view as wide as her visual arc. Head-motion sensors in the goggles send information to her pocket-sized 100 GIPS computer. The computer uses this information to pan the displays to cancel her head motion, so she has the convincing impression of being inside a virtual environment. She can interact with objects in the environment by moving her hands in a natural way. She is wearing a thin pair of gloves that report her hand motion to the computer. The computer projects an image of her hands in the virtual environment and adjusts virtual objects as she manipulates them. The goggles also track her eye motion, so she can point to objects simply by looking at them and speaking commands (the computer recognizes her speech). To perform her study, she whispers to her computer, ``historical archives.'' The computer creates an animated representation of sailing over a city and landing before a large building. She floats inside and settles at a wide mahogany table. She starts naming off topics of interest, the corresponding virtual books float out of their virtual shelves, glide to her, open themselves to the pages of interest, and float before her. With a practiced flurry of glances and gestures, she arranges them to her liking, scans a few documents, and begins to dictate her thoughts. As her essay ranges to other topics, her computer suggests additional reference material. At one point she is reviewing an archived discussion from the historically significant Usenet. She stumbles upon a thread relating to the early efforts to place printed materials on optical disks. She reads a few quotes and marvels at how quaint they sound in retrospect. Imagine, real paper books! She recalls seeing a few at a museum, carefully stored under nitrogen beneath thick glass. How her predecessors must have struggled with them...they looked so heavy, so bulky, so clumsy, and above all, so inflexible! Having data in a static form, how could one search it, extract portions for comment, analysis, or elaboration? What if a book contained errors? How was one to locate all the copies and notify their owners? How could one simultaneously view a hundred of them? How could one possibly have enough on hand to do any serious work? How to write anything at all, never having assurance that one's readers would have immediate access to all the necessary background material? She speculates that the hapless writers of the past either had to speak hopelessly above most reader's heads or else painstakingly repeat information already available elsewhere. No wonder progress had been so slow! With hordes of people duplicating each other's efforts, that progress had occurred at all was amazing. 
And how was anyone to read comfortably? Fumbling with turning pages, struggling to get the correct lighting...could those people have read anything while lying in bed? She struggles with the idea momentarily, then gives up.

Wearying of her thoughts and labors, she tells her computer to save her work environment. She will return to it later. She pulls off her goggles and gloves, and slides them into a case on her belt. She seizes the oars, and slowly makes for shore.

Dan Mocsny
fenwick@garth.UUCP (Stephen Fenwick) (11/09/88)
In article <17555@shemp.CS.UCLA.EDU> lange@cs.ucla.edu (Trent Lange) writes:
>In article <13203@andante.UUCP> prem@andante.UUCP (Swami Devanbu) writes:
>>Bah humbug.
>>When I read a book, I want to curl up in a comfy chair, with a blanket
>>around me, a bowl of curried popcorn, and a pot of tea.
>>A computer is a computer and a book is a book.
>>Prem Devanbu
>Well, then, what you *really* want is a flat screen about the size of
>a piece of notebook paper, and maybe about half an inch deep. With
>perhaps a touch screen, for point-and-click operations. Maybe
>an infrared or radio connection, a *really* fast one, to the main
>cube on your desk.
>
>*Then* you could curl up in your comfy chair and blanket, and have
>access to the complete works of Shakespeare. And, of course, your
>newspaper would be electronic, so you wouldn't have to worry about
>getting grimy newsprint on your curried popcorn!
>
>Someday...
>
>Trent Lange

Bah Humbug, Trent. Prem was right. When I want to turn a page, I want to turn a real page, made of paper (preferably acid-free).

What's the rated lifetime of a CD-ROM (before oxygen starts to corrode the aluminum surface)? What are the environmental limits (temperature, humidity, etc.)? I have some books that are well over 100 years old, and still in excellent condition (no data dropouts). Some have been exposed to excessive humidity (dropped in water, 'way back when); some have been heat-stressed (airplane cargo bay to hot car). None have lost any information.

One of my primary tools is a 14th Ed., 1929 Encyclopaedia Britannica. If it is true that the life of a CD-ROM is less than 50 years, I would now be seeing data loss. This is unacceptable.

Steve Fenwick
--
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////////
My company is not responsible for what I say.  I might be...
E-Mail route: ...!{ sun | sri-unix }!pyramid!garth!fenwick
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California
AT&Tnet: (415) 852-2325
//////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
fenwick@garth.UUCP (Stephen Fenwick) (11/09/88)
In article <4105@encore.UUCP> bzs@encore.com (Barry Shein) writes:
>The point of putting books on-line is not to ensure you never read in
>bed again, it's to make them accessible to new generations of tools,
>for the researchers, writers and curious of the world.
>
>[ very sound example of the need for automated library search capabilities ]

The only problem with this is keeping everything on file in a manner that allows users to find what they need. This is non-trivial, as the information content of a work may not be limited by the author's conception of its content. Watch the PBS series "Connections" to see what I mean. Machines are currently very good at fast data retrieval, but decidedly bad at making inferences about the data that they store.

Steve Fenwick
--
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////////
My company is not responsible for what I say.  I might be...
E-Mail route: ...!{ sun | sri-unix }!pyramid!garth!fenwick
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California
AT&Tnet: (415) 852-2325
//////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
dmocsny@uceng.UC.EDU (daniel mocsny) (11/10/88)
In article <1803@garth.UUCP>, fenwick@garth.UUCP (Stephen Fenwick) writes: > One of my primary tools is a 14th Ed., 1929 Encyclopaedia Britannica. > If it is true that the life of a CD-ROM is less than 50 years, I would > now be seeing data loss. This is unacceptable. This is also incomprehensible. Do we imagine that in fifty years we will be unable to create arbitrarily many backups of important information? I suppose some proponents of copyright law would like to jeopardize your data security, but this will not be a technological problem. Indeed, it would not have to be a problem right now if the WORM people could get their standards together. Unfortunately, many recent books were not printed on acid-free paper. And few books are of sufficient quality to stand up to serious use. Many libraries' collections are crumbling away. We must archive this knowledge to electronic form soon or lose it forever. Books certainly work well when you don't need many of them, or when you refer to passages so frequently that you might as well leave them open on your desk. Nobody is trying to do away with books altogether (yet). When you need occasional access to information stuck in a huge collection (system docs, parts catalogs, technical literature) CD-ROM makes sense. As display and storage technologies mature, electronic publishing will spread. Dan Mocsny
dmocsny@uceng.UC.EDU (daniel mocsny) (11/10/88)
In article <1804@garth.UUCP>, fenwick@garth.UUCP (Stephen Fenwick) writes:
> The only problem with this is keeping everything on file in a manner that
> allows users to find what they need. This is non-trivial, as the information
> content of a work may not be limited by the author's conception of its
> content. Watch the PBS series "Connections" to see what I mean.

This is exactly why we need to store information in a form that retains the maximum flexibility, because the author cannot predict all the uses it might find. Suppose we just store all the books and articles as fully-indexed files, and follow the present card catalog system. Is this going to make information less accessible than it now is in printed form? How much human effort goes into re-typing printed information?

Look at almost any scholarly paper out there. Up to half of it is literature survey. Most of the survey is there because the author can't count on readers having ready access to all the previous papers. Sometimes the survey adds value, by putting previous work in perspective, but a lot of it simply gives researchers useless work to do.

> Machines
> are currently very good at fast data retrieval, but decidedly bad at making
> inferences about the data that they store.

True enough, but I'll be happy to make the inferences about what I need. First I've got to get at the information. A machine that did no more than automatically retrieve all the citations in a given paper would be an enormous help. (You know how frustrating being stumped by a missing citation is -- the author skips some important steps because they're in paper X, your library doesn't have the journal, so off you go, wasting valuable time and money trying to track it down.) I could also make real progress with a few boolean expressions and short phrases, provided that I could search the abstracts and/or text of papers and books.

Perhaps someday we will have machines that ``look over your shoulder'' and spot analogies between problem X that's stumping you and problem Z that appeared in some obscure East-bloc journal. If we could do that today, our 50-year technological diffusion patterns would speed up to weeks and days.

Dan Mocsny
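As a small illustration of the kind of boolean search over abstracts described above, assuming the text is already in machine-readable form: the abstracts and query terms below are invented, and the only point is that AND/OR/NOT queries reduce to set operations once the text has been indexed.

# Sketch of a boolean keyword query over a small set of abstracts.
# The abstracts are invented; the point is only that AND/OR/NOT queries
# reduce to set operations once the text is indexed.

abstracts = {
    "paper-1": "adsorption equilibria of binary gas mixtures on zeolites",
    "paper-2": "connectionist methods for optical character recognition",
    "paper-3": "optical storage media for archival document retrieval",
}

# word -> set of document ids
index = {}
for doc_id, text in abstracts.items():
    for word in set(text.split()):
        index.setdefault(word, set()).add(doc_id)

def term(word):
    return index.get(word, set())

print(term("optical") & term("retrieval"))       # AND:     {'paper-3'}
print(term("optical") - term("recognition"))     # AND NOT: {'paper-3'}
print(term("optical") | term("adsorption"))      # OR:      all three papers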
vitale@hpcupt1.HP.COM (Phil Vitale) (11/11/88)
> > One of my primary tools is a 14th Ed., 1929 Encyclopaedia Britannica.
> > If it is true that the life of a CD-ROM is less than 50 years, I would
> > now be seeing data loss. This is unacceptable.
> ... Do we imagine that in fifty years we will
> be unable to create arbitrarily many backups of important information?

If there is ultimately no hardcopy -- relying on arbitrary numbers of backups gets scary. "Who" decides what books are *important* enough to backup? The number of works will not decrease, making the backup process an ever growing problem.

In addition, CD-ROM is probably the first major print distribution medium where the act of copying and the act of modifying are of equal ease. "Who" will insure that the copy of the book in front of you is really a copy of the original, or one that was modified along the way by a "concerned" individual/party/government when it was "backed-up"? (Orwell and 1984.) (Not that these concerns are new to CD-ROMs, just that the potential for abuse seems greater.)

Electronic form is not the only way to preserve knowledge. Books have been remarkably successful at preserving information across the ages. (Are we really going to have a CD-ROM reader capable of reading the disks we make today say 300 years from now?) There are serious efforts underway to come up with methods to de-acidify large (room-sized) numbers of books at a time. Also from what I gather, there are no longer major price or technology obstacles in producing acid or non-acid paper. Rather it is an issue of investments in existing processing equipment. (Can someone with closer experience of the industry comment?)

> Nobody is trying to do away with books altogether (yet).

I would hope not ever, not completely. The methods used for long-term preservation and rapid information access need not preclude each other. I really enjoyed pouring over some of the original papers of da Vinci, Darwin, and Bach. (Handling drafts, pencil-written by Tolkien, was quite a thrill for a young undergrad.) Somehow the same information loses its impact when it is displayed on a CRT screen.

Then again, it would have been nice to have some backup disks of the library at Alexandria before the fire ...

> Dan Mocsny

Phil Vitale
ekwok@cadev4.intel.com (Edward C. Kwok) (11/11/88)
In article <1218@atari.UUCP> danscott@atari.UUCP (Dan Scott) writes:
>in article <1147@xn.LL.MIT.EDU>, olsen@XN.LL.MIT.EDU (Jim Olsen) says:
>
>> Imagine the value to those law students of having, for modest cost,
>> the entire United States Code, Code of Federal Regulations, or United
>> States Reports (Supreme Court decisions) in their shirt pockets!
>
>I would have to agree. I usually take what a lawyer tells me with a
>grain of salt, but perhaps I would have more faith if I knew he had access to
>all cases that set precedent for what I am dealing with.

Not very likely: each volume of the United States Reports contains, on the average, 1500 pages, and each page contains roughly 8000 characters. That makes about 12 Mbytes per volume. There are more than 470 volumes, the last time I looked.
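Worked out with the figures quoted in the post, the collection comes to a few gigabytes of raw text -- a couple dozen 256 MB discs, or a handful of CD-ROMs, which is indeed not shirt-pocket territory. A quick check:

# Rough size of the United States Reports, using only the figures above.
pages_per_volume = 1500
chars_per_page = 8000
volumes = 470

bytes_per_volume = pages_per_volume * chars_per_page      # ~12 MB
total_bytes = bytes_per_volume * volumes                   # ~5.6 GB

print(f"per volume: {bytes_per_volume / 1e6:.0f} MB")
print(f"total:      {total_bytes / 1e9:.1f} GB "
      f"(~{total_bytes / 256e6:.0f} discs of 256 MB, "
      f"or ~{total_bytes / 650e6:.0f} CD-ROMs of 650 MB)")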
dmocsny@uceng.UC.EDU (daniel mocsny) (11/12/88)
In article <-290109999@hpcupt1.HP.COM>, vitale@hpcupt1.HP.COM (Phil Vitale) writes:
> If there is ultimately no hardcopy -- relying on arbitrary numbers
> of backups gets scary.

I observe a fair number of people using computers. I have no statistics to back me up, but I think I see a general trend: computer users with less experience and sophistication are often quicker to hit the printers for rough drafts, etc. That is, they tend to generate more hardcopy per unit of work done. I say this not to disparage them. Present-day computers are still brittle, expensive, and lacking adequate displays. Learning to work with less paper takes practice and motivation. We are still a long way from eliminating paper.

I do not expect this to always be true. Eventually, computers will be cheap and reliable to the point of transparency. We will not fear relying on them over paper any more than we currently fear relying on telephones over couriers. (What?!? Your office has X telephones that you use constantly, and you don't retain a comparable number of messenger boys on your staff? What if something went...wrong????)

> "Who" decides what books are *important* enough to backup? The number of
> works will not decrease, making the backup process an ever growing problem.

The only reason we have to ration our information recording is because we have not yet mastered the art of doing it cheaply. This will change. Have you seen the projected pricing on digital paper? A one GB diskette for $5. 660 GB tape reels, at $0.005/MB. The number of works will increase, but storage technologies are advancing faster. I expect to live to see the day when the average person can afford storage capacity sufficient for today's Library of Congress.

> "Who" will insure that the copy of the book in front of you is really a
> copy of the original, or one that was modified along the way by a "concerned"
> individual/party/government when it was "backed-up." (Orwell and 1984.)
>
> (Not that these concerns are new to CD-ROMs, just that the potential
> for abuse seems greater.)

How do you know that a paper book is legitimate? I think this has more to do with the number of copies than anything else, not to mention the ease with which two copies may be compared. Which would you rather do, verify that two paper books were identical, or type diff? You are right, though, we need a central repository to maintain archived masters, since electronic information invites editing.

> Electronic form is not the only way to preserve knowledge. Books have
> been remarkably successful at preserving information across the ages.
> (Are we really going to have a CD-ROM reader capable of reading the
> disks we make today say 300 years from now?)

Books have been amazingly good, with maddening exceptions, of course. If we were really into leaving a heritage, we would go back to clay tablets. They resist burning better. In 300 years our information technologies could be so ridiculously advanced that we should have machines that could start with almost any chunk of matter and tell whether or not it was an artifact. They should be able to tell whether said artifact contains recorded information, and then extract as much of it as remains. A CD-ROM would be easy pickings compared to digging up clay tablets written in totally forgotten languages. However, the beauty of electronic media is that they make information readily available.
Unless we suffer a breakdown in civilization (at which point reading
CD-ROMs will be the least of our worries), we can easily copy our
accumulated information to each succeeding storage technology that appears.
With paper you're stuck with what you've got. Time hates a static medium.

> I really enjoyed pouring over some of the original papers of da Vinci,
> Darwin, and Bach. (Handling drafts, pencil-written by Tolkien, was quite
> a thrill for a young undergrad.)

I hope the manuscripts were not badly damaged by whatever you poured over
them. :-)

> Somehow the same information loses its impact
> when it is displayed on a
> CRT screen.

You'll never hear me claiming today's display technology to be anywhere
near adequate. It is improving, albeit much too slowly. I see no reason to
doubt that computer displays will eventually match our visual acuity. I do
doubt I will see that happen soon.

> Then again, it would have been nice to have some backup disks of the
> library at Alexandria before the fire ...

My sentiments exactly. Putting all your information on hard-to-copy media
and stacking them in one place is thumbing your nose at reality. Then
again, pity the poor despots who will be robbed of their ability to make a
shocking public spectacle. Somehow, typing rm * just doesn't have the same
impact as a big, roaring bonfire.

> Phil Vitale

Dan Mocsny
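As a minimal sketch of that machine comparison -- more or less what cmp(1)
already does -- here is a short C program that checks two files byte for
byte. The file names on the command line are only placeholders.

#include <stdio.h>

/* Compare two files byte for byte and report the first difference.
   Example usage: bookcmp master-copy.txt suspect-copy.txt           */
int main(int argc, char *argv[])
{
    FILE *a, *b;
    int ca, cb;
    long pos = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", argv[0]);
        return 2;
    }
    a = fopen(argv[1], "rb");
    b = fopen(argv[2], "rb");
    if (a == NULL || b == NULL) {
        fprintf(stderr, "cannot open input files\n");
        return 2;
    }
    for (;;) {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb) {
            printf("copies differ at byte %ld\n", pos);
            return 1;
        }
        if (ca == EOF) {          /* both files ended together */
            printf("copies are identical\n");
            return 0;
        }
        pos++;
    }
}

Running something like this over a whole shelf of electronic texts is an
overnight batch job; doing the same for paper books is a scholarly career.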
tiedeman@acf3.NYU.EDU (Eric S. Tiedemann) (11/12/88)
In article <-290109999@hpcupt1.HP.COM> vitale@hpcupt1.HP.COM (Phil Vitale) writes:
>"Who" decides what books are *important* enough to backup? The number of
>works will not decrease, making the backup process an ever growing problem.

To a greater extent than in the past, individuals will be able to afford to
make backups. This is the good we see.

>In addition, CD-ROM is probably the first major print distribution medium
>where the act of copying and the act of modifying are of equal ease.
>
>"Who" will insure that the copy of the book in front of you is really a
>copy of the original, or one that was modified along the way by a "concerned"
>individual/party/government when it was "backed-up." (Orwell and 1984.)
>
>(Not that these concerns are new to CD-ROMs, just that the potential
>for abuse seems greater.)

Ultimately, the reader will have to ensure this for himself. One way is to
use public-key authentication. It's a lot easier to be sure you have the
right key than that you have the right text. (A rough sketch of the easy
half of that follows at the end of this post.)

>Electronic form is not the only way to preserve knowledge. Books have
>been remarkably successful at preserving information across the ages.

OK. The prudent among us may use codex form. Again, this will be cheaper to
do if you have the text in machine-readable form to begin with.

>There are serious efforts underway to come up with methods to de-acidify
>large (room-sized) numbers of books at a time. Also from what I gather,

References?

>Somehow the same information loses its impact
>when it is displayed on a
>CRT screen.

Somehow information loses its impact when it's gone, as you go on to note.

>Then again, it would have been nice to have some backup disks of the
>library at Alexandria before the fire ...

Eric
tiedeman@acf3.nyu.edu
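Here is a rough C sketch of the easy half of that scheme: recompute a
text's digest and compare it with the digest the publisher distributed.
The part that actually ties the digest to the publisher -- a public-key
signature over it -- is omitted. The file name and the expected digest are
placeholders, and the program assumes OpenSSL's SHA-256 routines (link
with -lcrypto).

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(void)
{
    const char *path = "moby-dick.txt";              /* hypothetical text */
    const char *expected =                           /* publisher's digest */
        "0000000000000000000000000000000000000000000000000000000000000000";
    unsigned char buf[8192], md[SHA256_DIGEST_LENGTH];
    char hex[2 * SHA256_DIGEST_LENGTH + 1];
    SHA256_CTX ctx;
    size_t n;
    int i;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) { perror(path); return 2; }

    /* Recompute the digest of the local copy. */
    SHA256_Init(&ctx);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        SHA256_Update(&ctx, buf, n);
    SHA256_Final(md, &ctx);
    fclose(fp);

    for (i = 0; i < SHA256_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", md[i]);

    /* Compare against the value published (and, ideally, signed). */
    if (strcmp(hex, expected) == 0)
        printf("digest matches the published value\n");
    else
        printf("digest mismatch: this copy has been altered\n");
    return 0;
}

The reader still has to trust that the expected digest really came from the
publisher, which is exactly where the public-key signature comes in.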
news@littlei.UUCP (11/12/88)
In article <1804@garth.UUCP> fenwick@garth.UUCP (Stephen Fenwick) writes:
|In article <4105@encore.UUCP> bzs@encore.com (Barry Shein) writes:
|>The point of putting books on-line is [...] to make them accessible to new
|>generations of tools, [...]
|
|The only problem with this is keeping everything on file in a manner that
|allows users to find what they need. This is non-trivial, as the information
|content of a work may not be limited by the author's conception of its
|content. Watch the PBS series "Connections" to see what I mean. Machines
|are currently very good at fast data retrieval, but decidedly bad at making
|inferences about the data that they store.

With a general-purpose hypertext system, humans would make the machine
record the connections as they were discovered (a tiny sketch of such a
link record follows at the end of this post). See alt.hypertext.

From what I've read here in comp.sys.next, the "Digital Librarian" is not a
general-purpose hypertext system. I'm not even sure it's hypertext.

Scott Peterson -- OMSO Software Engineering -- Intel, Hillsboro OR
uunet!littlei\ tektronix!reed!foobar >!sdp!sdp -- or -- sdp@sdp.hf.intel.com psu-cs!foobar/
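For what it's worth, the record such a system keeps can be tiny. A
hypothetical sketch in C; the field names, sizes, and the example link are
all made up.

#include <stdio.h>

/* One reader-created connection between two passages. */
struct link {
    char from_doc[64];     /* identifier of the source document  */
    long from_offset;      /* character offset of the source anchor */
    char to_doc[64];       /* identifier of the target document  */
    long to_offset;        /* character offset of the target anchor */
    char note[128];        /* why the reader thought these were connected */
};

int main(void)
{
    struct link l = {
        "shakespeare/hamlet", 10234L,
        "freud/interpretation-of-dreams", 88210L,
        "Hamlet's delay read as repression"
    };
    printf("%s@%ld -> %s@%ld: %s\n",
           l.from_doc, l.from_offset, l.to_doc, l.to_offset, l.note);
    return 0;
}

The hard part is not storing such records; it is getting readers to create
them and building retrieval tools that exploit them.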
ebh@argon.UUCP (Ed Horch) (11/12/88)
In article <1803@garth.UUCP> fenwick@garth.UUCP (Stephen Fenwick) writes:
>One of my primary tools is a 14th Ed., 1929 Encyclopaedia Britannica.
>If it is true that the life of a CD-ROM is less than 50 years, I would
>now be seeing data loss. This is unacceptable.

Well, you've got two alternatives. Either the technology will have
advanced, and you can copy the data onto whatever the latest nifty storage
medium is (terabit EPROMs? :-), or technology will have stagnated, and
you'll have to settle for copying it onto another CD-ROM. Suppose the life
of a CD-ROM is only ten years -- recopying your data every *nine* years
doesn't exactly sound like a full-time job. Haven't you ever copied
archival data from an old floppy or tape to a fresh one?

On the other hand, what are you going to do with your Britannica when the
pages get too brittle to turn, and you're forced to put the books in
climate-controlled storage to keep them from decaying further?

BTW, what's the life expectancy of microfilm and microfiche?

-Ed

This has strayed from anything NeXT-specific, so I've redirected followups
to comp.periphs.
sac@well.UUCP (Steve Cisler) (11/12/88)
I'm impressed with the longevity of the discussion about optical
publishing. As a librarian, I tend to think that the trails blazed by NeXT
(and other companies) to move information from one medium to another will
have more effect on society than the speed or choice of CPU or DSP. Of
course, the total package will make users more or less apt to use these
electronic libraries.

As an example, William Arms of Carnegie-Mellon U. spoke at the October 88
Educom Conference about Project Mercury--an electronic library on campus,
serving the computer science and engineering departments (initially). What
struck me was his choice of a Sun workstation or equivalent as the minimum
quality interface for the user (i.e. no AT's or Mac SE's, etc). Evidently,
the large screen is extremely important to Arms; he wants people to read on
screen.

Even the current displays for the Mac, NeXT and Sun just don't have the
same amount of information for the eye as does a paperback. Consequently,
displays have to really improve before you will get an English professor to
read online. I think, though, the types of text retrieval software that
NeXT is bundling will help get people to use the digital library instead of
the print version of the same works.

Does anyone have thoughts about traditional publishers' willingness to go
into a new medium (and distribution method)? I think most are afraid of
cannibalizing their print market. What is going to woo them away? CD-ROM
has done it to a very limited extent.

Steve Cisler
Connect: Libraries and Telecommunications
Box 992
Cupertino, CA 95015
wald-david@CS.YALE.EDU (david wald) (11/14/88)
In article <398@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>Wearying with her thoughts and labors, she tells her computer to
>save her work environment. She will return to it later. She pulls
>off her goggles and gloves, and slides them into a case on her belt.
>She seizes the oars, and slowly makes for shore.

Explicitly saving? What's wrong with autosave?

============================================================================
David Wald        wald-david@yale.UUCP        waldave@yalevm.bitnet
============================================================================
dmocsny@uceng.UC.EDU (daniel mocsny) (11/15/88)
In article <42955@yale-celray.yale.UUCP>, wald-david@CS.YALE.EDU (david wald) writes:
> In article <398@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
> >Wearying with her thoughts and labors, she tells her computer to
> >save her work environment.
> Explicitly saving? What's wrong with autosave?

The advantage of fiction is that I make the rules up as I go along. So I
can easily say:

1. Of course her computer maintains a triply-redundant audit trail of
everything she has ever done. That way it can perform statistical studies
of her usage patterns, and automatically optimize its command and file
structures to suit her. She tells the computer to save simply out of habit,
the same way one's boss tells one to do the things one was hired to do and
is consequently doing already.

2. Autosave wasn't in Virtual Workstation Release 2.1a. Release 2.1b is
out, but she hasn't bothered to upgrade yet.

Dan Mocsny

P.S. To the poster who wanted references to this sort of technology, you
can start with ``NASA's Virtual Workstation: Using Computers to Alter
Reality,'' NASA Tech Briefs, July/August 1988. Also see Scientific
American's article on advanced user interfaces, published in their special
issue on computing sometime in the Fall of 1987.
tim@hoptoad.uucp (Tim Maroney) (11/16/88)
In article <1147@xn.LL.MIT.EDU>, olsen@XN.LL.MIT.EDU (Jim Olsen) says:
> Imagine the value to those law students of having, for modest cost,
> the entire United States Code, Code of Federal Regulations, or United
> States Reports (Supreme Court decisions) in their shirt pockets!

In article <3177@mipos3.intel.com> ekwok@cadev4.UUCP (Edward C. Kwok) writes:
>Not very likely. Each volume of the United States Reports contains, on
>average, about 1,500 pages, and each page contains roughly 8,000
>characters. That makes about 12 Mbytes per volume. There are more than
>470 volumes, the last time I looked.

That's less than 6 Gigs (rough arithmetic below). I don't think that's an
unrealistic expectation for optical disks in 1998. Of course, by then,
there will be more volumes.

A set of 10 or so current 660Mb disks is still going to be a lot easier to
deal with than a wall full of large books, especially with indexing.
There'd be a main index on one disk that also contained the latest
volume(s) in progress; that one disk would be periodically updated.
Unfortunately, the Next computer will turn out to require two floptical
drives to be useful for this kind of heavy-duty archiving.

Can anyone give us realistic compression and indexing estimates? The
assumption that the two balance out is beginning to bother me.
--
Tim Maroney, Consultant, Eclectic Software, sun!hoptoad!tim
"Next prefers its X and T capitalized.  We'd prefer our name in lights in
Vegas." -- Louis Trager, San Francisco Examiner
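Some rough arithmetic on that, in C. The compression ratios are only
guesses, and indexing overhead is ignored, since that is exactly the
unknown in question.

#include <stdio.h>
#include <math.h>

/* Back-of-the-envelope figures from the posts above: 470 volumes of
 * roughly 1,500 pages x 8,000 characters each, spread over 660-Mbyte
 * discs. Compression ratios are guesses; indexing overhead is ignored.
 * Build with -lm.                                                      */
int main(void)
{
    const double volumes = 470.0;
    const double mbytes_per_volume = 1500.0 * 8000.0 / 1.0e6;   /* ~12  */
    const double disc_mbytes = 660.0;
    const double ratios[] = { 1.0, 2.0, 3.0 };
    const double total = volumes * mbytes_per_volume;           /* ~5640 */
    int i;

    printf("raw text: about %.0f Mbytes (%.2f Gbytes)\n",
           total, total / 1000.0);
    for (i = 0; i < 3; i++)
        printf("at %.0f:1 compression: about %.0f discs\n",
               ratios[i], ceil(total / ratios[i] / disc_mbytes));
    return 0;
}

Even uncompressed, the text alone fits on nine current discs; whether a
full-text index doubles that or merely fills the slack is the number
somebody with a real indexer would have to supply.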