CLIFF@IBM.COM ("Cliff Pickover") (01/10/91)
Over the years, I've done various studies that suggest that DNA and RNA can be modelled as a "Markov" sequence. Obviously, genetic sequences are not random, but very "DNA-like" sequences can be created if there is a Markov correlation. For DNA and RNA, a "Markov process" means that the bases are not independent: the previous base in a sequence affects the frequency of occurrence of the base that follows it. Sequences produced in this manner may "fool" you, since statistically they look so much like real genetic sequences. These correlations are not hard to model on a computer.

Since I don't have access to RNA folding programs, I'm wondering if anyone would like to fold an artificial Markov RNA sequence to see if it looks like "real" folded RNA sequences. The recipe below generates what I call a Markov GC sequence. Try P0=.3, P1=.7. You can also produce a 4-valued sequence (GCAU) using the same approach. If we find any interesting results, I'd be happy to coauthor a short paper on this with interested parties.

The method generates a correlated sequence of Gs and Cs in which each base has "knowledge" of the base that comes before it. For simplicity, let's use a binary sequence called B(i), where "i" indexes the bases, which are symbolized by 0 and 1 (0 is written out as C and 1 as G). P0 and 1 - P0 are the probabilities that B(i) is equal to zero or one, respectively, if B(i-1) is equal to zero. P1 and 1 - P1 are the probabilities that B(i) is equal to one or zero, respectively, if B(i-1) is equal to one. Since the value of B(i) depends on the value of B(i-1), even small deviations from randomness (i.e. P0 and P1 not equal to 0.5) affect the sequence. If P0 and P1 are both 0.5, then a random sequence of 0s and 1s is generated.
    /* For a 100-base RNA sequence, run the loop body 100 times. */
    olddata = 0;
    repeat 100 times
    begin
      Random(result);        /* return a random number on (0,1) */
      if olddata = 0 then
        if result < p0 then data = 0; else data = 1;
      else
        if result < p1 then data = 1; else data = 0;
      if data = 1 then Write("G");
      if data = 0 then Write("C");
      olddata = data;
    end;
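For anyone who would rather run the recipe than retype it, here is a minimal Python sketch of the same two-state procedure. The function name `markov_gc` and the `seed` argument are my own additions for illustration and are not part of the original recipe; the starting state of 0 follows the recipe above.

```python
import random


def markov_gc(length, p0, p1, seed=None):
    """Generate a correlated G/C string with the two-state recipe.

    p0: probability that the next state is 0 (C) given the previous is 0.
    p1: probability that the next state is 1 (G) given the previous is 1.
    seed: optional, only here to make runs repeatable (an assumption).
    """
    rng = random.Random(seed)
    old = 0  # the recipe starts from state 0
    bases = []
    for _ in range(length):
        r = rng.random()  # uniform on [0, 1)
        if old == 0:
            data = 0 if r < p0 else 1
        else:
            data = 1 if r < p1 else 0
        bases.append("G" if data == 1 else "C")
        old = data
    return "".join(bases)


if __name__ == "__main__":
    # The suggested parameters from the post: P0 = .3, P1 = .7
    print(markov_gc(100, 0.3, 0.7))
```

With P0=.3 and P1=.7 the chain leaves state 0 with probability 0.7 and leaves state 1 with probability 0.3, so over a long sequence the expected fraction of Gs is 0.7 / (0.7 + 0.3) = 0.7; a 100-base run will scatter around that value.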
Ellington@Frodo.MGH.Harvard.EDU (Deaddog) (01/10/91)
In article <9101091601.AA07596@genbank.bio.net> CLIFF@IBM.COM ("Cliff Pickover") writes:
> I'm wondering if anyone would like to fold an artificial Markov RNA
> sequence to see if it looks like "real" RNA folded sequences.

It is almost impossible to create an RNA sequence that doesn't fold up to look 'real.' I have sequenced a number of 'random' RNAs of length 100 (and subsequently folded them); they look as good as tRNA or portions of Group 1 introns or any other 'real' RNA that you care to mention. Given that there are only four bases, and that G can pair with either U or C and that U can pair with either G or A, it is not surprising that reasonable secondary structures abound.

Non-woof
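The pairing argument above can be put in rough numbers. The sketch below counts how many of the 16 ordered base combinations can pair under the rules stated in the reply (G-C, A-U, and G-U). It assumes the two bases are drawn independently and uniformly from G, C, A, U, which is my simplifying assumption and not a claim about real or Markov sequences; the names `BASES`, `PAIRS`, and `pairable_fraction` are hypothetical.

```python
from itertools import product

BASES = "GCAU"

# Allowed pairs as stated in the reply: G with C or U, U with G or A.
PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}


def pairable_fraction():
    """Fraction of ordered base combinations that can form a pair."""
    combos = list(product(BASES, repeat=2))
    allowed = [c for c in combos if c in PAIRS]
    return len(allowed) / len(combos)


if __name__ == "__main__":
    print(pairable_fraction())  # 6 of 16 combinations -> 0.375
```

Under that assumption, more than a third of arbitrary base combinations can pair, which is consistent with the point that plausible-looking helices are easy to find in almost any 100-base sequence.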