steve@comp.vuw.ac.nz (Steve Cassidy) (02/27/91)
I need some help in finding an appropriate representation for my data. I am storing a set of objects (they are, in fact, rules for translating text to speech) for which I have an ordering relation (generality -- the rule applies to more pieces of text) which defines only a partial ordering on the objects (eg. x>y -- yes; x>z -- dunno; x>w -- no). I need to do three operations: 'traverse' -- look at the least general rule first, and so on until a match is found (ie maybe not to the end) insert -- insert a new object into the partial ordering delete -- delete an object from the partial ordering 'traverse' is done more frequently than insert and delete but all are done quite often. There will be between five and fifty objects in each partial ordering. The test for ordering is relatively expensive in this case as it requires a match of the two rules. It seems that to do insert I need to compare the new object with all existing objects to find its place in the ordering, is there any way that I can avoid doing this? (for instance, if I know that x>y I know I don't have to compare x with anything smaller than y - but can I take advantage of this?) So, is there an efficient way of representing this as a prolog datastucture? I would prefer to do updates without copying (a la difference lists), but be able to 'garbage-collect' deleted items (probably by copying) every now and then. Thanks, Steve Cassidy Computer Science, steve@comp.vuw.ac.nz Victoria University of Wellington, ==================== Box 600, Voice: +64 4 715328 Wellington, New Zealand. Fax: +64 4 712070
steve@comp.vuw.ac.nz (Steve Cassidy) (03/01/91)
I wrote: >I need some help in finding an appropriate representation for >my data. I am storing a set of objects (they are, in fact, rules >for translating text to speech) for which I have an ordering >relation (generality -- the rule applies to more pieces of text) which >defines only a partial ordering on the objects (eg. x>y -- yes; >x>z -- dunno; x>w -- no). Let me attempt to answer my own question. What I'm after representing is a directed acyclic graph (dag). After a little more thought (and a good nights sleep) I came up with the idea of using adjacency lists, where each node has associated with it a list of nodes that follow it in the ordering. So the dag: a b c \ / / d / ( where the arrows go from top to bottom) / \ / (f) e would be written as: [a/[], b/[], c/[], d/[a,b], e/[a,b,c,d]] Adding an item to the dag (say f in brackets above) involves comparing it with the existing elements using our ordering relation. If f > X then X is added to the adjacency list of f, if X > f then f is added to the adj list of X. In the above case we must add f/[a,b,d] to the list. Note that this results in us keeping the full transitive closure of the dag at all times, this helps for deletion. Deletion is a problem, it would seem to require copying all or most of the existing structure, plus to remove, say, b we need to remove b's adjacency list and remove b from all other adjacency lists. My solution is to represent each node as a pair n( a, DelA ) where DelA is a deleted flag which will be unbound if a has not been deleted. The above dag would now look like: [n(a, DA)/[], n(b, DB)/[], n(c, DC)/[], n(d, DD)/[n(a, DA),n(b, DB)], n(e, DE)/[n(a, DA),n(b, DB),n(c, DC),n(d, DD)]] To delete a now I just unify DA = del. My access routines must ignore items marked as deleted and, if there are a lot of deleted items all operations will be slowed but deletion itself is very quick, which is what I wanted. Since a lot of items may be deleted it is a good idea to get rid of the deadwood every now and then. We can do this by copying the entire dag and leaving out the deleted items. I've implemented this using binary trees to store both the set of nodes/adjacency lists and the individual adjacency lists. My initial trials seem to work, I have not yet tried it in anger. So, any opinions on this solution? Is it good style to use unbound variables as deletion flags as I have done? If there are any flaws in my solution I'd appreciate some clues. Thanks, Steve Cassidy ========================================================== Computer Science, steve@comp.vuw.ac.nz Victoria University of Wellington, ==================== Box 600, Voice: +64 4 715328 Wellington, New Zealand. Fax: +64 4 712070 ----------------------------------------------------------
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (03/01/91)
In article <1991Feb27.032403.23920@comp.vuw.ac.nz>, steve@comp.vuw.ac.nz (Steve Cassidy) writes: [He has a set of terms with a partial order "generality" defined on them, the comparison is expensive.] > I need to do three operations [on collections of 5 to 50 such terms]: > 'traverse' -- look at the least general rule first, and so on > until a match is found (ie maybe not to the end) [most frequent] > insert -- insert a new object into the partial ordering > delete -- delete an object from the partial ordering [these two less frequent but common] The bad news is that when you insert the next term into a collection of 49 terms, you may find that *none* of them is comparable to the new term. That means that the worst case of anything you come up with is going to be as bad as linear search. This is not a Prolog problem, and I would like to know of a good data structure for it myself. There was one time when I had to maintain a collection of sets, but in that case it dawned on me that ordering sets by cardinality also ordered them by inclusion. Can a similar trick be used? -- The purpose of advertising is to destroy the freedom of the market.
hassan@prl.dec.com (Hassan Ait-Kaci) (03/01/91)
> From: steve@comp.vuw.ac.nz (Steve Cassidy) > Newsgroups: comp.lang.prolog > Subject: Re: Partial Orders > Date: 1 Mar 91 00:57:26 GMT > References: <1991Feb27.032403.23920@comp.vuw.ac.nz> > Organization: Dept. of Comp. Sci., Victoria Uni. of Wellington, New Zealand. > Nntp-Posting-Host: climie.comp.vuw.ac.nz > > What I'm after representing is a directed acyclic graph (dag). > After a little more thought (and a good nights sleep) I came > up with the idea of using adjacency lists, where each node has > associated with it a list of nodes that follow it in the > ordering. > > [...] > > Note that this results in us keeping the full transitive > closure of the dag at all times, I'm not sure that I understand exactly what you're after, but the above comments strongly suggest to me that perhaps you may find some relevant material in the following article. Regards, -hak @ARTICLE{latticeops , AUTHOR = {Hassan A\"{\i}t-Kaci and Robert Boyer and Patrick Lincoln and Roger Nasr} , TITLE = {Efficient Implementation of Lattice Operations} , JOURNAL = {ACM Transactions on Programming Languages and Systems} , YEAR = {1989} , VOLUME = {11} , NUMBER = {1} , PAGES = {115--146} , MONTH = {January} }
ged@terra.ucsc.edu (03/05/91)
> From: steve@comp.vuw.ac.nz (Steve Cassidy) > >Let me attempt to answer my own question. What I'm after representing >is a directed acyclic graph (dag). After a little more >thought (and a good nights sleep) I came up with the idea of >using adjacency lists, where each node has associated with >it a list of nodes that follow it in the ordering. This transitive closure representation uses O(n^2) space. While a dag representation of the non-transitive representation of a poset is still O(n^2), it uses a subset of the space of the transitive dag. Then the non-transitive representation of your poset a b c \ / / d / \ / e would be written as: [a/[], b/[], c/[], d/[a,b], e/[c,d]] This problem has been attacked by people from a number of areas: chemical substructure search, chess playing systems, semantic network retrieval. All of which have complex comparison operations (graph isomorphism). Before we could help you further we need to know the structure of the terms in the poset. As Richard O'Keefe pointed out: ordering sets by cardinality also ordered them by inclusion. Similarly, cardinality can be used for graphs. However, terms such as semantic networks which have abstraction over terms, size is not a necessary condition. Size is still only a very coarse first step. Searching the dag of a poset can in fact be worse than linear, since a traversal is O(m + n), where the number of edges, m, could be O(n^2) in the worse case. Ideally we would like a data structure similar in performance to a balanced tree for total orders. Unfortunately we cannot select an arbitrary pivot element that is used in balanced trees. The dag representation is useful for posets whose ordering is close to a balanced tree: wide and shallow hierarchies. Some references which you may find of interest include: @techreport{Levinson89, author = {Robert A. Levinson}, title = {A Self-Organizing Pattern Retrieval System and Its Applications}, institution = {University of California}, year = {1989}, number = {UCSC-CRL-89-21}, month = {September}} @inproceedings{Ellis90a, author = {Gerard Ellis}, title = {Compiling Conceptual Graphs}, booktitle = {CGW5}, year = {1990}, address = {Boston\&Stockholm}, publisher = {Link{\"o}ping University}, editor = {Peter Eklund and Laurie Gerholz}, series = {ISBN 91-7870-718-8}, pages = {23-32}} @inproceedings{Ellis90b, author = {Gerard Ellis}, title = {Sorting Conceptual Graphs}, booktitle = {Proceedings of the 5th Annual Workshop on Conceptual Structures}, year = {1990}, address = {Boston\&Stockholm}, publisher = {Link{\"o}ping University}, editor = {Peter Eklund and Laurie Gerholz}, series = {ISBN 91-7870-718-8}, pages = {265-280}} I have a more recent paper of my work if you are interested. Gerard Ellis Department of Computer and Information Sciences 227 Applied Sciences Building University of California Santa Cruz, CA 95064 U.S.A. ARPANET: ged@terra.ucsc.edu PH: (408) 459 4043 423 6734 (home) FAX: 429 0146
steve@comp.vuw.ac.nz (Steve Cassidy) (03/05/91)
Gerard Ellis writes: > >This transitive closure representation uses O(n^2) space. >While a dag representation of the non-transitive representation of a poset is >still O(n^2), it uses a subset of the space of the transitive dag. I chose to store the transitive dag to make the traverse operation faster, having just thought about it some more, it doesn't have that effect. However, it makes deletion a whole lot simpler and in my case the space cost isn't the problem. >Before we could help you further we need to know the structure of the >terms in the poset. As Richard O'Keefe pointed out: ordering sets by >cardinality also ordered them by inclusion. Similarly, cardinality can >be used for graphs. However, terms such as semantic networks which have >abstraction over terms, size is not a necessary condition. Size is >still only a very coarse first step. My terms look like: 447:[class24]/[a]/[class42, e, #]/[ey] RuleNo:LeftContext/Graph/RightContext/Phon as I said, they are rules for translating text to speech. The ordering is on generality, more general rules apply to a superset of the text strings that the less general rules apply to. Two rules can be orthogonal if the sets of strings that they cover don't overlap; they might also be 'equal', if the sets overlap -- although in my application this shouldn't occur. The classes above are just sets of letters, eg class24 = |cfhl|. Thanks for the pointers, I may chase them if I need a speedup. The stuff I've written already, as described before, gives me usable performance. My application is learning the above rules from examples and I can get through 500 examples in 20-30 min. This is sufficient for my purposes. (However, if there is an obvious speedup that occurs to somebody, please let me know!) Thanks again, Steve Cassidy ========================================================== Computer Science, steve@comp.vuw.ac.nz Victoria University of Wellington, ==================== Box 600, Voice: +64 4 715328 Wellington, New Zealand. Fax: +64 4 712070 ----------------------------------------------------------
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (03/06/91)
In article <1991Mar05.021953.11576@comp.vuw.ac.nz>, steve@comp.vuw.ac.nz (Steve Cassidy) writes: > My terms look like: 447:[class24]/[a]/[class42, e, #]/[ey] > RuleNo:LeftContext/Graph/RightContext/Phon Operators make for pretty input, but inefficient storage. On a typical Prolog system, storing RuleNo:LeftContext/Graph/RightContext/Phon will give you a tree looking something like :(RuleNo, /( ,Phon)) /( ,RightContext) /(LeftContext,Graph) which takes 4*3 = 12 words of overhead in addition to the elements stored. To access RuleNo costs 1 "compound" unification step LeftContext 4 Graph 4 RightContext 3 Phon 2 If instead you used rule(RuleNo,LeftContext,Graph,RightContext,Phon) then the space overhead would be 6 words, and accessing each element would take 1 "compound" unification step. As I said, operators are pretty, so if you are reading rules, it may pay you to read them in operator syntax, and then flatten them: flatten_rule(RuleNo:LeftContext/Graph/RightContext/Phon, rule(RuleNo,LeftContext,Graph,RightContext,Phon)). so that computations can use the more efficient form. You can use the same rule to re-introduce the operators for output. I'm interested in hearing about Prolog-based work on morphology. Any more information available about the application/project? -- The purpose of advertising is to destroy the freedom of the market.