[comp.databases] Relational Database, with a Graphical type field

bg0l+@andrew.cmu.edu (Bruce E. Golightly) (06/27/90)

The Ingres extensions I talked about are still consistent with the requirements
being specified. What Ingres Corp is providing is a user-defined data type, which
might very well be graphical in nature. Along with that comes the ability and
responsibility to define appropriate functions for manipulating the new type.
These functions might well include overloading of something basic like the
plus sign operator.
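
To make the overloading idea concrete, here is a minimal Python sketch. It is
not Ingres syntax: the Polyline type, its field, and the meaning given to "+"
(joining two cable runs end to end) are assumptions chosen purely for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polyline:
    """Hypothetical graphical type: an ordered run of (x, y) vertices."""
    points: tuple

    def __add__(self, other):
        # Overloaded "+": join two polylines end to end.
        if not isinstance(other, Polyline):
            return NotImplemented
        return Polyline(self.points + other.points)

    def length(self):
        # Sum of straight-line distances between consecutive vertices.
        return sum(
            ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(self.points, self.points[1:])
        )


cable_a = Polyline(((0, 0), (3, 4)))
cable_b = Polyline(((3, 4), (3, 10)))
run = cable_a + cable_b   # uses Polyline.__add__
print(run.length())
```

Whoever defines the type also decides what "+" means for it, which is the
responsibility mentioned above.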

It sounds like the correct conclusions have already been drawn in the
discussion. OO extensions for a DBMS will provide an enormous potential
for sophisticated developers. That power, however, carries with it weighty
responsibilities.

segel@Tellabs.COM (Mike Segel) (06/28/90)

In article <1189@abcom.ATT.COM> brr@abcom.ATT.COM (Rao) writes:

First, let me stress that I am not working for Informix, and that
my posting is an educated guess not necessarily true. (Maybe Dr.
Scump aka Col. Panic can verify :-)

	The functionality you are looking for can be found in the multi
media DB Online by Informix. (Multimedia, cute little buzzword no?)
I believe Silicon Graphics introduced a product based on Informix's
standard engine as well.

>
>	I would like to put forth the following necessary
>	functionality:
>
>	1) To be able to update the graphical entry of a tuple.
>
	This gets to be tricky. It is not efficient to actually store
the image as part of the tuple; instead, store a pointer to
the image. In Online I believe it is a pointer to the DB space which
holds all of the images specified by the table. I think what SG did was
to have a field store the FD of the image and actually put locks on the
file when that tuple was accessed, regardless of whether the image was shown.

>	2) To be able to change the attributes (i.e. charecteristics)
>	   of the graphical field.
>
	This is more of a function of the front end. The database
need only consider the image to be a large Binary Blob. Sort of
abstracts the data. #2 seems to be application dependent.

>	3) To be able to compare two fields of type "graphical".
>		This might be the hardest to define. A Griphics Guru
>		can be of help here.
>
	This is definitely application dependent. Who knows what you
are storing in the BLOB. It could be a voice mail message, or some
non-visual binary field.

>	4) To implement all operations that are possible
>	   on generic fields using SQL, 
>	   i.e. Insert, Update, Make Null, etc.
>
>	5) Aditionally, to be able to zoom, reduce size, rotate, etc.
>	   graphical operations on the graphical field.
>
	#4 is a given, 'cept NULL would be difficult to explain.
#5 is really a process of the front end.

>Guess an object -oriented database is better for such a
>requirement, where function (operator) overloading can be
>used by polymorphism.
>
	I tend to disagree. Most applications which people say require
an OODBMS can be done on a relational DBMS. Granted I don't know much about
OODBMS's (I am still reading up on them when I get the time) but there is
a lot which can be done in relational tuples, if one takes an abstract
look at the data for a long enough period. [Guys don't flame me for this
one. OK?]
>
>Would like to invite suggestions/additions from netters
>and even other approaches.
>
 Why reinvent the wheel? I think Oracle might have a BLOB field, and so
 might SYBASE. What they all lack is a good GUI which takes advantage
 of the back end ability. I think a possible reason for this is that
such a front end would have to be too generic to be useful, or too
application specific or platform specific. Maybe someone from their companies
could verify?


>-bindu rama rao

-Gargoyle

cameron@kirk.nmg.bu.oz (Cameron Stevenson) (06/28/90)

From article <YaVp8zK00Uh7E0l15z@andrew.cmu.edu>, by bg0l+@andrew.cmu.edu (Bruce E. Golightly):
> We're looking at some similar areas for our next round of development.
> As the providers of voice and data services for the university, we must
> manage a cable plant, which implies that we need to handle maps and plans
> showing the locations of cables, wiring closets and outlets. Given those 
> goals, I am starting to look at the kinds of things mentioned.
> 
> Carnegie Mellon uses Ingres for administrative data base applications. A
> extension to Ingres has recently been announced that supports user-defined
> data types. I believe that this may be the key to what we wish to do.
> 
> More news as it develops.

We do exactly the same thing here at Bond University. However, we are not
trying to hold all the information within a relational database. Instead,
we run a CAD package (MicroStation - from Intergraph) to hold the graphical
information. This package allows for links between the graphical elements
and relational elements (rows in a table). The full range of link types is
supported, i.e. one to one, one to many, many to one, etc. The links are
maintained by the CAD package, and run through Intergraph's relational
server. This effectively allows the CAD package to talk to 'any' database
which conforms to ANSI SQL. Currently there are links to ORACLE, Ingres, and
Informix.

Having established the links, it is possible to execute SQL queries through
the graphics system, to the relational database, and have the results 
displayed graphically, e.g. highlight in red all data outlets with PCs attached.
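
A rough sketch of the idea in Python with SQLite: a link table ties drawing
element IDs to outlet rows, and an ordinary SQL query returns the elements the
graphics side should highlight. The table names, columns and sample data are
assumptions for illustration only and are not MicroStation's actual linkage
format.

```python
import sqlite3


def build_plant():
    """Create a tiny cable-plant database with element-to-row links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE outlet (outlet_id INTEGER PRIMARY KEY, room TEXT, device TEXT);
        -- One row per link between a drawing element and a database row.
        CREATE TABLE element_link (element_id INTEGER, outlet_id INTEGER);
        INSERT INTO outlet VALUES (1, 'A101', 'PC'), (2, 'A102', 'phone'), (3, 'B201', 'PC');
        INSERT INTO element_link VALUES (5001, 1), (5002, 2), (5003, 3), (5004, 3);
        """
    )
    return conn


def elements_to_highlight(conn, device):
    """Return drawing element IDs linked to outlets with the given device."""
    query = """
        SELECT l.element_id
        FROM element_link AS l
        JOIN outlet AS o ON o.outlet_id = l.outlet_id
        WHERE o.device = ?
        ORDER BY l.element_id
    """
    return [element_id for (element_id,) in conn.execute(query, (device,))]


print(elements_to_highlight(build_plant(), "PC"))
```

Outlet 3 is linked to two drawing elements here, which is the one-to-many case.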

Intergraph also sell a development package which can handle these capabilities
through a forms based application, complete with screen gadgets (a sort of
SuperHyperCard ??).

ALSO ... without sounding too much like an advertisement ... MicroStation
runs on PC's (currently links to dBase, with hints of links to ORACLE),
Mac's (links to ORACLE), Intergraph's workstations (Sun/Apollo/Silicon
competitors) with all the goodies I mentioned earlier (relational server,
development package, links to multiple database systems). If that wasn't
enough, MicroStation will support both raster and vector graphics. So getting
floor plan information into the system can be extremely quick.

Send me some mail if you want more information, but I'd suggest you give
it a look if you haven't already considered it.

Cameron Stephenson                            Telephone  +61 75 951220
Bond University
Gold Coast     Australia

snorri@strengur.is (snorri) (06/28/90)

The new Informix OLTP engine (Informix OnLine) has the BYTE and TEXT datatypes.

The BYTE and TEXT datatypes are called BLOBs (Binary Large OBjects)
and have a theoretical limit of 2 gigabytes. 
One can store scanned images, voice, video, spreadsheet files, WP files, 
ordinary text files, etc. in those fields.

These BLOBs are selected from the database through standard sql and
all the Informix application tools (ISQL, 4GL, ESQL/C etc.) can access them.
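
For readers without OnLine at hand, the general pattern can be sketched with
Python's built-in sqlite3 module standing in for the engine. This is SQLite's
BLOB column, not Informix's BYTE or TEXT types, and the table and column names
are made up for the example.

```python
import sqlite3


def store_and_fetch(payload: bytes) -> bytes:
    """Insert a binary object with plain SQL and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, mug_shot BLOB)"
    )
    conn.execute(
        "INSERT INTO employee (name, mug_shot) VALUES (?, ?)", ("Rao", payload)
    )
    (fetched,) = conn.execute(
        "SELECT mug_shot FROM employee WHERE name = ?", ("Rao",)
    ).fetchone()
    conn.close()
    return fetched


scanned = bytes(range(256))          # stand-in for a scanned image
print(store_and_fetch(scanned) == scanned)
```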

-- 

Snorri Bergmann, Strengur Consulting Engineers, Reykjavik Iceland
INTERNET: snorri@strengur.is

bapat@rm1.UUCP (Bapat) (06/28/90)

In <897@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:

>brr@abcom.ATT.COM (Rao) writes:

>>	I am trying to write a relational database
>>	that can be used to store graphical images
>>	...file name as string, reference to graphical file

>Then no atomic, serializable updates.  Is this OK?

Why is that? Why can I not update the graphical field atomically?
If I'm only doing single-transaction updates, why would serializability
be a concern?

>>	...a binary field to hold a large graphical image

>Lacks operations on columns of this type.
>It's unstructured and essentially display-only.

Agreed. One cannot do pattern matches on large binary data fields. For example,
one couldn't say 
   "select mug_shot from employee where <the employee wears glasses>."

Could a graphical type field be implemented as a blob?
(How do vendors implement blobs anyway?)
-- 
Subodh Bapat              bapat@rm1.uu.net     OR           ...uunet!rm1!bapat
MS E-204, P.O.Box 407044, Racal-Milgo, Ft Lauderdale, FL 33340  (305) 846-6068

"In the great journey of life, I seem to have lost my boarding pass."

kevinc@sequent.UUCP (Kevin Closson) (06/28/90)

In article <2873@tellab5.tellabs.com> segel@Tellabs.COM (Mike Segel) writes:
>In article <1189@abcom.ATT.COM> brr@abcom.ATT.COM (Rao) writes:
>	This is more of a function of the front end. The database
>need only consider the image to be a large Binary Blob. Sort of
                                      ^^^^^^^^^^^^^^^^^^
                          a large binary (B)inary (l)arge (ob)ject ??????

moiram@tekcae.CAX.TEK.COM (Moira Mallison) (06/29/90)

In article <2873@tellab5.tellabs.com> segel@Tellabs.COM (Mike Segel) writes:
>>	1) To be able to update the graphical entry of a tuple.
>>
>	This gets to be tricky. It is not efficient to actually store
>the image as part of the tupple, instead actually store a pointer to
>the image. 

The problem with this is that you lose one of the important aspects
of the relational model, ie all the data resides in relations.
Now, some of the attributes hold data, and some hold pointers, and
it's up to the application to know what to do with the pointers.
I don't see this as a step forward.

>>	3) To be able to compare two fields of type "graphical".
>>		This might be the hardest to define. A Griphics Guru
>>		can be of help here.
>>
>	This is definately application dependant. Who knows what you
>are storing in the BLOB. It could be a voice mail message, or some
>non-visual binary field.

Advances in database technology will ideally make the DBMS smarter.
A BLOB does not.  "All I've got is a whole bunch of bytes.  I don't
know what to do with them.  You better figure that out."  The more
information that can be stored in the DBMS, the less effort will
be expended to build applications around it.   What is stored in
the DBMS can be more easily shared and re-used.

>>Guess an object -oriented database is better for such a
>>requirement, where function (operator) overloading can be
>>used by polymorphism.
>>
>	I tend to disagree. Most applications which people say require
>an OODBMS can be done on a relational DBMS. Granted I don't know much about
>OODBMS's (I am still reading up on them when I get the time) but there is
>a lot which can be done in relational tupples, if one takes an abstract
>look at the data for a long enough period. 

There is a lot that can be done with the relational model, but that
doesn't mean that the relational model can be all things to all 
people.  And some things that can be done with the relational model
cannot necessarily be done EASILY with the relational model.   One
of the attractive aspects of the relational model is its simplicity
and its rigor.  But you start to lose that when you start storing
pointers to other objects (get it?) in your attributes.  If that's
what you need to do, get a DBMS that more fully supports the operations
on the data.

Moira Mallison
CAx Data Management
Tektronix, Inc

jkrueger@dgis.dtic.dla.mil (Jon) (06/29/90)

bapat@rm1.UUCP (Bapat) writes:

>>>	...file name as string, reference to graphical file

>>Then no atomic, serializable updates.  Is this OK?

>Why is that? Why can I not update the graphical field atomically?

How do you roll back an update to a file?
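
A small Python sketch of the problem, using SQLite as a stand-in engine and a
temporary file as the external image. The table layout and file name are
assumptions; the point is only that the rollback covers the row and not the
file.

```python
import os
import sqlite3
import tempfile


def failed_update():
    """Update an in-row image and a file-based image, then roll back.

    Returns (image stored in the row, image stored in the file) as seen
    after the rollback.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "drawing.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"old image")

        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE drawing (id INTEGER PRIMARY KEY, in_row BLOB, file_name TEXT)"
        )
        conn.execute("INSERT INTO drawing VALUES (1, ?, ?)", (b"old image", path))
        conn.commit()

        # One logical update touching both copies of the image.
        conn.execute("UPDATE drawing SET in_row = ? WHERE id = 1", (b"new image",))
        with open(path, "wb") as f:
            f.write(b"new image")

        # Something goes wrong, so the transaction is abandoned.
        conn.rollback()

        (in_row,) = conn.execute("SELECT in_row FROM drawing WHERE id = 1").fetchone()
        with open(path, "rb") as f:
            on_disk = f.read()
        conn.close()
        return in_row, on_disk


print(failed_update())
```

The column comes back with the old image while the file keeps the new one, so
the row and the file now disagree.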

>If I'm only doing single-transaction updates, why would serializability
>be a concern?

1) other users doing concurrent updates
2) why give up transactions?  you lose atomic updates to
structured data

>one couldn't say 
>   "select mug_shot from employee where <the employee wears glasses>."

In the sense that such operations aren't defined on unstructured BLOBs,
indeed one couldn't.  But there are ways of defining them on data types;
in fact, the data type is defined by the operation.  A useful graphical
data type is one on which one can select by graphic features.
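
As a toy illustration, SQLite lets an application register its own function
and then use it in a WHERE clause. In this Python sketch the image format (two
16-bit sizes followed by one grayscale byte per pixel) and the
mean_brightness operation are invented for the example; detecting glasses
would need a far more capable operation, but it would plug in the same way.

```python
import sqlite3
import struct


def make_image(width, height, pixels):
    """Pack a toy grayscale image: big-endian width, height, then pixels."""
    return struct.pack(">HH", width, height) + bytes(pixels)


def mean_brightness(blob):
    """Operation defined on the toy image type: average pixel value."""
    width, height = struct.unpack(">HH", blob[:4])
    pixels = blob[4:4 + width * height]
    return sum(pixels) / len(pixels)


def bright_employees(rows, threshold):
    """Select names whose mug shot is brighter than the threshold."""
    conn = sqlite3.connect(":memory:")
    # Make the operation callable from SQL under the same name.
    conn.create_function("mean_brightness", 1, mean_brightness)
    conn.execute("CREATE TABLE employee (name TEXT, mug_shot BLOB)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    found = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM employee"
            " WHERE mean_brightness(mug_shot) > ? ORDER BY name",
            (threshold,),
        )
    ]
    conn.close()
    return found


staff = [
    ("dark", make_image(2, 1, [10, 20])),
    ("light", make_image(2, 1, [200, 250])),
]
print(bright_employees(staff, 128))
```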

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!

miket@blia.BLI.COM (Mike Tossy) (06/29/90)

>
> I am trying to write a relational database
> that can be used to store graphical images
> ...file name as string, reference to graphical file
> 

The ShareBase II product line includes the ability to store UNIX-like
(byte array) files on the relational database server.  File operations
are governed by the same transaction management system as the relational
database, therefore code like this:

  set autocommit off
  select blah, blah ....
  file operation ...
  set autocommit on

does indeed result in the dbms and file operations being atomic.  (The name
of the file holding the drawing becomes a column in the table.)

This technique has been used for storing both digitized photographs and
CAD/CAM drawings in files and data about those drawings in tables. 


  Mike Tossy                                      ShareBase Corporation
  miket@blia.bli.com                              14600 Winchester Blvd
  (408) 378-7575 ext2200                          Los Gatos, CA 95030
					  (Formerly: Britton Lee, Inc.)

jkrueger@dgis.dtic.dla.mil (Jon) (06/29/90)

moiram@tekcae.CAX.TEK.COM (Moira Mallison) writes:

>Advances in database technology will ideally make the DBMS smarter.
>A BLOB does not.  "All I've got is a whole bunch of bytes.  I don't
>know what to do with them.  You better figure that out."  The more
>information that can be stored in the DBMS, the less effort will
>be expended to build applications around it.   What is stored in
>the DBMS can be more easily shared and re-used.

Strongly agree.

I like to ask people if they would tolerate a DBMS which had no date or
date/time data type.  What's the problem?  You can just use seconds
since 1970, right?  You don't mind coding conversion routines into each
application, do you?  Writing parsing and query generation into
otherwise trivial programs?  OK, how about storing dates in a group of
text fields, then you don't have to do that, you just lose integrity
(e.g. dates like "9/9/99" or "2/29/81"), programmer productivity (try
selecting on a date range) and performance (try executing the above
select).
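
A short Python sketch of two of those losses. The M/D/YY layout and the rule
that a two-digit year means 19YY are assumptions made for the example.

```python
from datetime import date


def parse_mdy(text):
    """Parse 'M/D/YY' into a date; raises ValueError for impossible dates."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(1900 + year, month, day)  # assumes YY means 19YY


def is_valid(text):
    """True if the text names a real calendar date."""
    try:
        parse_mdy(text)
        return True
    except ValueError:
        return False


# Integrity: a text field happily stores a day that never existed.
print(is_valid("2/29/81"))

# Range selection: text ordering disagrees with calendar ordering.
print("10/1/81" < "9/1/81")
print(parse_mdy("10/1/81") < parse_mdy("9/1/81"))
```

A text column accepts "2/29/81" without complaint, and a range query over text
compares character by character, so October sorts before September. A real
date type rejects the first and orders the second correctly.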

Most people begin to agree that ADT's are a Good Thing for DBMS.

For hard core cases I ask if they would tolerate a DBMS without an
integer type.  What's the problem?  Just store them as bitstrings, you
can write your own math routines, right?  Integers -- who needs 'em.
Also floats -- I can write my own IEEE compliant math into every
application.

I haven't yet had anyone say that's OK.

Then I like to opine that someday we'll feel the same way about a DBMS
without support for user-defined data types.  It's all a matter of
where we draw the line.  Today we accept DBMS that support basic data
types important to business.  Some day we'll want more.  We won't need
them for every application, but then every application doesn't need
database management either.  Finding the right tool for the job will
be more straightforward someday.  Right now it can be a frustrating
exercise in weighing tradeoffs.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!

nigelc@cognos.UUCP (Nigel Campbell) (06/29/90)

In response to D.L. asking about anything else 

	try Starbase from Cognos or Interbase from Interbase

It has a blob datatype which can also be supported by filters
which are described to the database kernel. This allows you
to filter one blob to another type, e.g. a compressed blob
to an uncompressed blob. This is a DSRI compliant database
and is supported by PowerHouse 4GL from us.
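
The filter idea can be sketched in Python as a lookup from a (source type,
target type) pair to a conversion routine. This only illustrates the concept:
the type names and the FILTERS table are hypothetical and do not reflect how
filters are actually declared to the kernel.

```python
import zlib

# Hypothetical registry: (from_type, to_type) -> conversion routine.
FILTERS = {
    ("compressed", "raw"): zlib.decompress,
    ("raw", "compressed"): zlib.compress,
}


def apply_filter(blob, from_type, to_type):
    """Convert a blob from one declared type to another."""
    if from_type == to_type:
        return blob
    return FILTERS[(from_type, to_type)](blob)


drawing = b"line 0 0 10 10\n" * 200
stored = apply_filter(drawing, "raw", "compressed")
print(len(drawing), len(stored))
print(apply_filter(stored, "compressed", "raw") == drawing)
```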


-- 
Nigel Campbell          Voice: (613) 783-6828                P.O. Box 9707
Cognos Incorporated       FAX: (613) 738-0002                3755 Riverside Dr.
uucp: nigelc@cognos.uucp || uunet!mitel!sce!cognos!nigelc    Ottawa, Ontario
                                                             CANADA  K1G 3Z4

llojd@rivm.UUCP (Jan Diesel) (06/29/90)

In the discussion concerning the storage of graphical data in rdbms's
the (new) Ingres user-defined datatype has been mentioned a few times.
As far as I know the maximum size of *any* Ingres datatype/column is 2Kb.
In my opinion this is a serious drawback for storing graphical data.

-------------------------------------------------------------------
Jan Diesel            llojd@rivm.UUCP
National Institute for Public Health and Environmental Protection
Laboratory for Air Research
P.O.Box 1, 3720 BA  BILTHOVEN, The Netherlands.
-------------------------------------------------------------------

segel@tellabs.com (Mike Segel) (06/30/90)

In article <6207@tekgen.BV.TEK.COM> moiram@tekcae.CAX.TEK.COM (Moira Mallison) writes:
>In article <2873@tellab5.tellabs.com> segel@Tellabs.COM (Mike Segel) writes:
>>>	1) To be able to update the graphical entry of a tuple.
>>>
>>	This gets to be tricky. It is not efficient to actually store
>>the image as part of the tupple, instead actually store a pointer to
>>the image. 
>
>The problem with this is that you lose one of the important aspects
>of the relational model, ie all the data resides in relations.
>Now, some of the attributes hold data, and some hold pointers, and
>it's up to the application to know what to do with the pointers.
>I don't see this as a step forward.
>

	Well, the problem arises when you now have each tuple being 1 - 2 Meg
	in size. So now how do you efficiently sort on non graphic data?
	Or keep a multituple view in memory?
	What I am saying, is that the database stores a pointer to where the
	graphical information is stored. For example, I BELIEVE Informix's Online
	stores a pointer to the Blob area inside the database. When the end user
	/ applications programmer goes to fetch the blob, he does not see this,
	he/she will fetch it like it is part of the tuple. (Again I am assuming
	this, perhaps someone from Informix will elaborate.) My point is that
	you are decreasing the performance of your engine if you actually try
	to store the image as part of the tuple. Whether you let the application
	see the pointer, or keep it internal to the engine, is up to you.

>>>	3) To be able to compare two fields of type "graphical".
>>>		This might be the hardest to define. A Griphics Guru
>>>		can be of help here.
>>>
>>	This is definately application dependant. Who knows what you
>>are storing in the BLOB. It could be a voice mail message, or some
>>non-visual binary field.
>
>Advances in database technology will ideally make the DBMS smarter.
>A BLOB does not.  "All I've got is a whole bunch of bytes.  I don't
>know what to do with them.  You better figure that out."  The more
>information that can be stored in the DBMS, the less effort will
>be expended to build applications around it.   What is stored in
>the DBMS can be more easily shared and re-used.
>
	What sort of information is contained in the blob is application
	dependent. I prefer to take the minimalist approach in designing
	back-ends. The less intelligent the backend, the greater ability 
	to treat data in an abstract fashion. Yes, it puts more emphasis
	on the 4GL or front-end, but it allows for a wider range of potential
	applications to be developed.
		Now, not being an object oriented guru,  I thought that if I were
	designing an OODBMS, I would keep the backend as simple as possible
	and let the front end do all the work.

>and it's rigor.  But you start to lose that when you start storing
>pointers to other objects (get it?) in your attributes. 
	Not really. How the back-end stores the information should be a black
	box to the front-end. As long as the front end can get the information
	in a timely fashion.
>
>Moira Mallison

-Mike Segel


--
Mike Segel         | uunet!balr.com                    | Std.disclaimer 
BALR Corporation   | segel@quanta.eng.ohio-state.edu   | implied and 
Oakbrook, Illinois | uunet!tellabs.com!segel           | understood
-------------------^-----------------------------------^----------------

ghm@ccadfa.adfa.oz.au (Geoff Miller) (07/02/90)

This discussion started, as I recall, around the problem of a database to
store ID images, so maybe someone can answer this question  -  can acceptable
ID images be made up of standard components, as in the "Identikit" system
used by the police?  If so, then that would provide both simplified storage
and a method for comparison of images, but it only works if the resulting
images are accurate enough (and looking at the Identikit pictures the police
release I would have to wonder about that).
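
If images could be composed that way, a face becomes a short tuple of
component numbers, which stores compactly and compares trivially. A Python
sketch follows; the five component slots and the "fraction of matching
components" measure are assumptions for illustration, not how Identikit
itself works.

```python
from collections import namedtuple

# Hypothetical component slots; each value is an index into a parts catalogue.
Face = namedtuple("Face", "hairline eyes nose mouth chin")


def similarity(a, b):
    """Fraction of component slots in which two faces use the same part."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


suspect = Face(hairline=12, eyes=7, nose=3, mouth=21, chin=5)
on_file = Face(hairline=12, eyes=7, nose=3, mouth=18, chin=9)
print(similarity(suspect, on_file))
```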

Geoff Miller
ghm@cc.adfa.oz.au

brianc@labmed.ucsf.edu (Brian Colfer) (07/03/90)

In article <6207@tekgen.BV.TEK.COM> moiram@tekcae.CAX.TEK.COM (Moira Mallison) writes:
>In article <2873@tellab5.tellabs.com> segel@Tellabs.COM (Mike Segel) writes:
>>>	1) To be able to update the graphical entry of a tuple.
>>>
>>	This gets to be tricky. It is not efficient to actually store
>>the image as part of the tupple, instead actually store a pointer to
>>the image. 
>
>The problem with this is that you lose one of the important aspects
>of the relational model, ie all the data resides in relations.
>Now, some of the attributes hold data, and some hold pointers, and
>it's up to the application to know what to do with the pointers.
>I don't see this as a step forward.

This is not true with all BLOB type systems.  For example,
with Informix-Online the engine treats it as if it were stored as part
of the tuple so to the application it doesn't matter...

I don't know the actual algorithms in OnLine ... I only know that it
doesn't really matter since I can almost treat the engine as a black data
box.

>
>>>	3) To be able to compare two fields of type "graphical".
>>>		This might be the hardest to define. A Griphics Guru
>>>		can be of help here.
>>>
>>	This is definately application dependant. Who knows what you
>>are storing in the BLOB. It could be a voice mail message, or some
>>non-visual binary field.
>
>Advances in database technology will ideally make the DBMS smarter.
>A BLOB does not.  "All I've got is a whole bunch of bytes.  I don't
>know what to do with them.  You better figure that out."  The more
>information that can be stored in the DBMS, the less effort will
>be expended to build applications around it.   What is stored in
>the DBMS can be more easily shared and re-used.

Non-numeric, Non-date/time, and non-ASCII data is not very well defined.
It would be interesting to have a system which knew about all the common
graphic types, GIF, TIFF, XBM, XWD, CGM, PostScript etc.... and 
all the important spreadsheet types ... and all the sound storage types ...
and ...

I think this is better handled in the front end ... there are some
free utilities to deal with most of these types which can be
integrated into the frontend.

>
>>	               .... Most applications which people say require
>>an OODBMS can be done on a relational DBMS. 

>There is a lot that can be done with the relational model, but that
>doesn't mean that the relational model can be all things to all 
>people.  And some things that can be done with the relational model
>cannot necessarily be done EASILY with the relational model.   One
>of the attractive aspects of the relational model is it's simplicity
>and it's rigor.  

OODBMS may be the solution to some of the limitations of current
relational database products.   But I am not yet convinced.  I don't
see what advantage OODBMS has over relational databases.  For example,
one advantage proposed by some OODBMS advocates is in the ability to
specify methods, data transform procedures, as a part of an object's
definition.  While not as abstract, it is certainly clearer to
define such transforms through a system table.
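
One way to picture this is a catalogue table that maps a type and method name
to a procedure, with the caller looking the procedure up before running it.
The Python/SQLite sketch below is only an illustration: the sys_methods
table, its columns and the two sample procedures are all hypothetical.

```python
import sqlite3

# Procedures available to the caller (hypothetical examples).
PROCEDURES = {
    "invert": lambda pixels: bytes(255 - p for p in pixels),
    "byte_count": len,
}


def make_catalog():
    """Create a system table mapping (type, method) to a procedure name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sys_methods (
            type_name TEXT, method TEXT, proc_name TEXT,
            PRIMARY KEY (type_name, method)
        );
        INSERT INTO sys_methods VALUES
            ('raster', 'negative', 'invert'),
            ('raster', 'size', 'byte_count');
        """
    )
    return conn


def call_method(conn, type_name, method, value):
    """Look the method up in the system table, then run its procedure."""
    row = conn.execute(
        "SELECT proc_name FROM sys_methods WHERE type_name = ? AND method = ?",
        (type_name, method),
    ).fetchone()
    if row is None:
        raise LookupError(f"{type_name} has no method {method!r}")
    return PROCEDURES[row[0]](value)


catalog = make_catalog()
print(call_method(catalog, "raster", "size", b"abcd"))
```

Adding a method to a type is then an INSERT into the catalogue, and the
catalogue itself can be queried like any other relation.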

The ability for the relational model to accommodate self reference, e.g.
system tables, coupled with its inherent algebraic character makes it
the most powerful strategy to deal with data.   

--
Brian  Colfer          | UC San Francisco        |------------------------|
                       | Dept. of Lab. Medicine  | System Administrator,  |
brianc@labmed.ucsf.edu | S.F. CA, 94143-0134 USA | Programer/Analyst      | 
BRIANC@UCSFCCA.BITNET  | PH. (415) 476-2325      |------------------------|

davek@infmx.UUCP (David Kosenko) (07/03/90)

In article <6207@tekgen.BV.TEK.COM> moiram@tekcae.CAX.TEK.COM (Moira Mallison) writes:
>In article <2873@tellab5.tellabs.com> segel@Tellabs.COM (Mike Segel) writes:
>>>	1) To be able to update the graphical entry of a tuple.
>>>
>>	This gets to be tricky. It is not efficient to actually store
>>the image as part of the tupple, instead actually store a pointer to
>>the image. 
>
>The problem with this is that you lose one of the important aspects
>of the relational model, ie all the data resides in relations.
>Now, some of the attributes hold data, and some hold pointers, and
>it's up to the application to know what to do with the pointers.
>I don't see this as a step forward.
>

	Mr. Segel was somewhat incorrect in his description of 
INFORMIX-OnLine's treatment of BLOB data types in this respect.  There
is the ability, internal to the database engine, to store actual BLOB
data in a physical space separate from the rest of the data.  This is
done for various reasons, most relating to efficiency of access.  What
is stored, again internal to the database, in the actual tuple data is
a pointer to the location of this BLOB data; however, to the end user,
this data is still held in the relation itself.  You do not need to know
about this internal structure in your applications, and the relational
model is maintained.  I hope you can see this implementation as a step
forward.
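
The arrangement described above can be imitated in a few lines of Python with
SQLite, purely as an analogy. This is not OnLine's actual storage layout: the
two base tables, the blob_id "pointer" and the view are all assumptions made
to show how an application can see one relation while the bytes live
elsewhere.

```python
import sqlite3


def build():
    """Create separate row and blob storage plus the relation users query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- What the engine keeps internally (hypothetical layout).
        CREATE TABLE house_row (id INTEGER PRIMARY KEY, address TEXT, blob_id INTEGER);
        CREATE TABLE blob_space (blob_id INTEGER PRIMARY KEY, data BLOB);
        -- What the application queries: one relation with a photo column.
        CREATE VIEW house AS
            SELECT h.id, h.address, b.data AS photo
            FROM house_row AS h
            LEFT JOIN blob_space AS b ON b.blob_id = h.blob_id;
        """
    )
    return conn


def add_house(conn, address, photo=None):
    """Store the photo (if any) out of row and keep only its ID in the row."""
    blob_id = None
    if photo is not None:
        blob_id = conn.execute(
            "INSERT INTO blob_space (data) VALUES (?)", (photo,)
        ).lastrowid
    conn.execute(
        "INSERT INTO house_row (address, blob_id) VALUES (?, ?)", (address, blob_id)
    )


db = build()
add_house(db, "12 Elm St", b"\x89raster bytes")
add_house(db, "9 Oak Ave")            # no photo: no blob space used
print(db.execute("SELECT address, photo FROM house ORDER BY id").fetchall())
```

Queries against house never mention blob_id, and a row without a photo takes
up no blob storage at all.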

>>>	3) To be able to compare two fields of type "graphical".
>>>		This might be the hardest to define. A Griphics Guru
>>>		can be of help here.
>>>
>>	This is definately application dependant. Who knows what you
>>are storing in the BLOB. It could be a voice mail message, or some
>>non-visual binary field.
>
>Advances in database technology will ideally make the DBMS smarter.
>A BLOB does not.  "All I've got is a whole bunch of bytes.  I don't
>know what to do with them.  You better figure that out."  The more
>information that can be stored in the DBMS, the less effort will
>be expended to build applications around it.   What is stored in
>the DBMS can be more easily shared and re-used.
>

	Yes, and BLOBs do fit these criteria.  The ability to store
arbitrarily large data items in a relation, and have it accessible to
all who can query that relation does, in many ways, "make the DBMS smarter."



Dave Kosenko

-- 
Disclaimer:  The opinions expressed herein | There's more than one answer
are by no means those of Informix Software | to these questions pointing me
(though they make you wonder about the     | in a crooked line...
 strange people they hire).                |

jkrueger@dgis.dtic.dla.mil (Jon) (07/05/90)

segel@tellabs.com (Mike Segel) writes:

>Well, the problem arises when you now have each tuple being 1 - 2 Meg
>in size. So now how do you efficiently sort on non graphic data?

By using appropriate storage structures, access methods, sort
algorithms, query decomposition (might reduce rows to be sorted based
on other query terms, even eliminate the need for sort if 0 or 1
returned row).

>Or keep a multituple view in memory?

Not a problem.  Perhaps a solution to some other, unstated, problem.
Besides "which memory", etc., e.g. how to distribute over net.

>You are decreasing the performance of your engine if you actually try
>to store the image as part of the tuple.

Neither true nor false.  Which usually indicates you aren't honing
in on the right issues.  "ADT's don't kill performance, people
who don't know how to use them kill performance"  :-)

>I prefer to take the minimalist approach in designing
>back-ends. The less intelligent the backend, the greater ability 
>to treat data in an abstract fashion.

What would be an example of this?  To weigh against all the
counter-examples.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!

segel@tellabs.com (Mike Segel) (07/09/90)

	[As a point of clarification, I am talking about DB internal mechanisms
	 for handling Blobs. Anyone confused by this discussion should sit back
	 and think for a while. Please do not use my discussion as an opinion on
	 the performance of any database or database company. HAPPY DAVE? ;-]

In article <913@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>segel@tellabs.com (Mike Segel) writes:
>
>>Well, the problem arises when you now have each tuple being 1 - 2 Meg
>>in size. So now how do you efficiently sort on non graphic data?
>
>By using appropriate storage structures, access methods, sort
>algorithms, query decomposition (might reduce rows to be sorted based
>on other query terms, even eliminate the need for sort if 0 or 1
>returned row).
>
	Jon, you are missing the point. By keeping the Blob as part of the tuple,
	you have now a tuple of 2Meg (+- rest of tuple) in width. So each of
	your records is now 2Meg in width. How many tuples of this size can you
	sort/query etc., and how efficiently? How much memory
	is now required? (How much swap?)  Let's say you build up a cursor
	and it contains 3 rows. That could be around 6 Meg of memory/swap.
	Not too efficient, since 2 Meg of that data is not required for any
	join or relational function. (Yet...)

	As well as the fact that not all tuples will have a blob attached, but will
	have to have space allocated for a blob. 

	All of these problems are reduced when you just have a pointer as part of
	the tuple. The pointer can point to the Blob storage area of a raw disk
	(Informix Online), or to a file or directory in Unix. 

	At the end of last year, Silicon Graphics announced a database product
	which allowed for the storage of Blobs based on Informix's Standard Engine.
	How could they do this? Having not seen the package, or source code, 
	I can only conjecture on the following....

		The front end I believe was in X, or some other windowing environment.
		So it could have been written in ESQL/C. The storage of the blobs
		could be that they store all the blobs as a single file, or a directory
		of multiple files. (I would think that multiple files is better.)
		Then the only question would be, How do they perform locking? This is 
		fairly straightforward and has been published in several books.
		Then the tuple need only contain an FD to the graphic blob. Of course 
		there are some other potential problems which are irrelevant to this
		discussion.

	The point is, they (SG) and Informix are providing the ability of ADT's
	by allowing for Blobs. I think back that the discussion evolved from 
	trying to allow for ADT's like graphical images, sounds, text, or various
	other fields. This can all be accomplished within the relational model.

>
>Neither true nor false.  Which usually indicates you aren't honing
>in on the right issues.  "ADT's don't kill performance, people
>who don't know how to use them kill performance"  :-)
>
	Yeah. It's like tuning the back-end to gain performance when a series
	of code reviews and a rethink of the specs would do more good ;-)

>>I prefer to take the minimalist approach in designing
>>back-ends. The less intelligent the backend, the greater ability 
>>to treat data in an abstract fashion.
>
>What would be an example of this?  To weigh against all the
>counter-examples.
>
	Simple. Take the idea of a blob. In Informix, it is a byte stream
	of up to 2 Meg. Now, with this simple type, you can now allow for 
	a database to contain Images, voice/sound, or any other data which
	is in its simplest form, digital information.
		
	Now Informix also allows for a varchar and, I believe, another type
	of stream. (I need to go back and check so don't flame me if I am wrong.)
	What they have done, is to have the back end define the basic building
	blocks which will allow for other ADT's.

		This is great for certain applications. One example is the real estate
	demo Informix uses for Online. They have this demo done twice: once using
	Sunview on a Sun workstation and a CD Rom device, and the other on a Mac
	running Wingz. (Another fine product from Informix ;-) Now, both show you
	a raster of the house, and different views. How is it stored? How can you
	take a raster/gif/ picture which is required for two or three different
	machines, and store it in the DB? You could create an ADT for each
	raster/image format, but that means storing the photograph in the DB
	several times. Or you could separate the header information from the blob,
	then have the front end application, based on the machine, reassemble the
	image in the correct format from the blob and the header information.
	
	So now your front-end application needs to be a little smarter, yet your
	back-end is capable of supporting various front ends without having to 
	be modified. My point is that the DB backend should be storing the data 
	in its simplest components rather than trying to handle data in its more
	complex forms.
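A rough sketch of the header/blob split described above, assuming SQLite as a stand-in back end and 8-bit grayscale pixels. The `photo` table, the two "front end" functions, and the packed format are invented for illustration; only the PGM (P5) layout is a real format.

```python
import sqlite3
import struct

db = sqlite3.connect(":memory:")
# Header fields live in ordinary columns; the blob holds only raw pixels.
db.execute(
    "CREATE TABLE photo ("
    "id INTEGER PRIMARY KEY, width INTEGER, height INTEGER, pixels BLOB)"
)
db.execute("INSERT INTO photo VALUES (1, 2, 2, ?)", (bytes([0, 85, 170, 255]),))


def fetch(photo_id):
    """Return (width, height, pixels) for one stored photograph."""
    return db.execute(
        "SELECT width, height, pixels FROM photo WHERE id = ?", (photo_id,)
    ).fetchone()


def as_pgm(photo_id):
    """Front end A: wrap the raw pixels in a binary PGM (P5) header."""
    width, height, pixels = fetch(photo_id)
    return f"P5\n{width} {height}\n255\n".encode("ascii") + pixels


def as_packed(photo_id):
    """Front end B: a made-up format with a big-endian width/height prefix."""
    width, height, pixels = fetch(photo_id)
    return struct.pack(">HH", width, height) + pixels
```

The photograph is stored once; each front end rebuilds the format its machine wants from the same columns.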

>-- Jon

- Mike (" I am no expert. No one pays me for my opinions" ;-)

--
Mike Segel         | uunet!balr.com                    | Std.disclaimer 
BALR Corporation   | segel@quanta.eng.ohio-state.edu   | implied and 
Oakbrook, Illinois | uunet!tellabs.com!segel           | understood
-------------------^-----------------------------------^----------------

jkrueger@dgis.dtic.dla.mil (Jon) (07/10/90)

segel@tellabs.com (Mike Segel) writes:

>Jon, you are missing the point. By keeping the Blob as part of the tuple,
>you have now a tuple of 2Meg (+- rest of tuple) in width.

No.  It appears that way to queries that don't project fewer columns,
though.  The old virtual/transparent distinction.  How the engine
manages resources like disk storage isn't visible to folks that
send queries to the engine.  Nor how the engine selects or sorts
on the large columns -- could be that it trims trailing whitedots,
uses G3 compression, uses sparse matrix algorithms, etc.  And of
course avoiding exhaustive scan of large columns isn't any different
in principle from avoiding exhaustive scans of many rows.  It is
harder to implement, however; witness that no commercial product
of which I'm aware provides lazy fetching of columns.

>As well as the fact that not all tuples will have a blob attached, but will
>have to have space allocated for a blob. 

No.  Look at how VM algorithms work.  Empty cols can cost small fixed
allocations.  Instances of BLOBs can cost in proportion to their
contents.  This even presumes that tables and cols still appear to be
fixed length, which isn't a hard requirement; they could expand
arbitrarily to fit their contents, too.  But even without that they
can appear as fixed length while being implemented in cheaper ways.
Trailing whitespace compression for text cols works this way now.
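As an illustration only, here is a toy column that appears fixed length to its callers while storing just the trimmed text. The class and method names are hypothetical, and no real engine is claimed to work exactly this way.

```python
class FixedCharColumn:
    """Looks like CHAR(width) to callers; stores only the trimmed text."""

    def __init__(self, width):
        self.width = width
        self._stored = {}          # row id -> text without trailing blanks

    def put(self, row_id, value):
        if len(value) > self.width:
            raise ValueError("value too long for column")
        trimmed = value.rstrip(" ")
        if trimmed:
            self._stored[row_id] = trimmed
        else:
            # An empty value takes no text storage in this sketch.
            self._stored.pop(row_id, None)

    def get(self, row_id):
        # Pad back to the declared width, so the column still appears fixed length.
        return self._stored.get(row_id, "").ljust(self.width)

    def bytes_used(self):
        """Characters actually held, as a rough stand-in for disk cost."""
        return sum(len(v) for v in self._stored.values())


col = FixedCharColumn(10)
col.put(1, "abc       ")
col.put(2, "")
```

Both rows read back as ten characters, but only the three characters of row 1 are held.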

>All of these problems are reduced when you just have a pointer as part of
>the tuple. The pointer can point to the Blob storage area of a raw disk
>(Informix Online), or to a file or directory in Unix. 

If the pointer appears different from ordinary objects to the user, you
lose the simplicity and safety of the data model.  If not, why call it
a pointer?  Also it's unlikely the overhead of the UNIX filesystem is
going to be your bottleneck.  Work smarter!  Don't expect that your image
database will get its best performance from the highest bandwidth; some
engines will use processor cycles to avoid some of those bit copies.

>The point is, they (SG) and Informix are providing the ability of ADT's
>by allowing for Blobs.

But ADT's have nothing to do with BLOBs.  Nothing.  Consider bignums,
arbitrary precision floats, etc.  ADT's have everything to do with
defining operations on objects of that type, and preventing access to
and manipulation of objects of that type via other means than the
defined operations.
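To make the bignum case concrete, a small sketch of an abstract type whose value is reachable only through its defined operations. `BigNat` and its limb layout are invented for illustration. Python has arbitrary-precision integers built in, so the explicit limbs are here only to show a hidden representation, and Python protects the `_limbs` attribute by convention rather than by enforcement.

```python
class BigNat:
    """Arbitrary-precision natural number; the limb list is private."""

    _BASE = 10**9    # nine decimal digits per limb

    def __init__(self, text):
        if not (text.isascii() and text.isdigit()):
            raise ValueError("digits only")
        value = text.lstrip("0") or "0"
        # Least-significant limb first.
        self._limbs = [
            int(value[max(i - 9, 0):i]) for i in range(len(value), 0, -9)
        ]

    def __add__(self, other):
        if not isinstance(other, BigNat):
            return NotImplemented
        out, carry = [], 0
        for i in range(max(len(self._limbs), len(other._limbs))):
            a = self._limbs[i] if i < len(self._limbs) else 0
            b = other._limbs[i] if i < len(other._limbs) else 0
            carry, limb = divmod(a + b + carry, self._BASE)
            out.append(limb)
        if carry:
            out.append(carry)
        result = BigNat("0")
        result._limbs = out
        return result

    def __eq__(self, other):
        return isinstance(other, BigNat) and self._limbs == other._limbs

    def __str__(self):
        head, *rest = reversed(self._limbs)
        return str(head) + "".join(f"{limb:09d}" for limb in rest)


total = BigNat("18446744073709551615") + BigNat("1")
```

Nothing about the type involves a large byte stream; what makes it an ADT is that addition, comparison, and printing are the only ways in.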

>As I recall, the discussion evolved from
>trying to allow for ADT's like graphical images, sounds, text, or various
>other fields. This can all be accomplished within the relational model.

Correct.  But these are just the sexy data (lit. and fig., in some
cases :-)  Consider what it would mean to scientific and engineering
folk to have a database with a numeric type that doesn't overflow.

>...Sunview on a Sun workstation and a CD Rom device, and the other on a Mac.
>running Wingz. Now, both show you
>a raster of the house, and different views. How is it stored? How can you
>take a raster/gif/ picture which is required for two or three different
>machines, and store it in the DB? You could create an ADT for each 
>raster/image format, but that means storing the photograph in the DB
>several times. Or you could separate the header information from the blob,
>then have the front end application, based on the machine, reassemble the
>image in the correct format and the header information. 

You're confusing two, no three issues here.  One is remote data access,
one is defining families of image data types, and one is database
design.  In times of old this was known as the incompatible subroutine
library problem -- different floating point formats at different
precisions, and overlapping sets of routines for each format, but worse
yet specific to the language, operating system, and processor you call
them from.  Calling sequence, don't you know.  It took twenty years to
standardize on IEEE floats, write mostly portable libraries, and
arrange for common (or at least sane) linkage schemes.  It is now
possible to write programs that use floats and expect them to behave
the same way across a wide variety of machines.  It is *still* not
possible to share binary floats between programs running on different
machines in a net -- without conversion, which is the point.  That's a
separate problem.  And yet a third problem is designing good databases
to get shared.
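A short sketch of the conversion step, using Python's standard `struct` module. The choice of big-endian IEEE 754 doubles as the shared format and the function names are assumptions made for the example.

```python
import struct
import sys


def float_to_wire(x):
    """Encode as an IEEE 754 double, big-endian, whatever the host byte order."""
    return struct.pack(">d", x)


def float_from_wire(data):
    """Decode a big-endian IEEE 754 double back into a host float."""
    (x,) = struct.unpack(">d", data)
    return x


wire = float_to_wire(1.5)
native = struct.pack("=d", 1.5)    # the same value in host byte order
same_layout = (native == wire)     # true only on a big-endian host
```

Two programs that exchange `wire` agree on the value; two that exchange `native` agree only if their hosts happen to share a byte order.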

In the example you cite, the right solution is to design an image type
appropriate for the pictures of houses you have, define this type to
your database engine, populate the database, provide common access to
your network of (different) machines, and decide on a common model
for painting the pictures on different display hardware.  Notice how
little of this is a database problem.  Also notice that we still want
the engine to understand the image data type, not each front end.
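One way to picture "define this type to your database engine", sketched with SQLite's user-defined SQL functions from Python. The image encoding, the `img_width`/`img_height` names, and the `house` table are all hypothetical, and SQLite is an embedded library, so this only approximates what a server engine with real user-defined types would do.

```python
import sqlite3
import struct


# Assumed image encoding: big-endian width and height (16 bits each),
# followed by one byte per pixel.
def make_image(width, height, pixels):
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match dimensions")
    return struct.pack(">HH", width, height) + bytes(pixels)


def img_width(image):
    return struct.unpack(">HH", image[:4])[0]


def img_height(image):
    return struct.unpack(">HH", image[:4])[1]


db = sqlite3.connect(":memory:")
# Register the type's operations so queries can call them by name.
db.create_function("img_width", 1, img_width)
db.create_function("img_height", 1, img_height)

db.execute("CREATE TABLE house (id INTEGER PRIMARY KEY, picture BLOB)")
db.execute("INSERT INTO house VALUES (1, ?)", (make_image(2, 1, [0, 255]),))
db.execute("INSERT INTO house VALUES (2, ?)", (make_image(3, 2, [0] * 6),))

wide = db.execute(
    "SELECT id FROM house WHERE img_width(picture) > 2"
).fetchall()
```

The query selects on a property of the image without any front end decoding the bytes itself.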

>My point is that the DB backend should be storing the data 
>in its simplest components rather than trying to handle data in its more
>complex forms.

But *floats* are complex.  Very.  Interpretation of bitmapped images
should learn from this.  We have and can examine engines that share
floats among different hardware architectures.  They do *not* do this
by keeping the engine ignorant of what the bits in a float mean.  They
do not do this by letting each front end decide what floats mean to
it.  Access to had better not equal subversion of.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!