[sci.bio] Alu locations

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (12/06/90)

Hi everyone!

A friend of mine, Doug Halverson, has cloned several pieces of human DNA and
has found that they don't all have Alu sequences in them.  His question is, in
human genes that have been mapped or sequenced so far, have stretches of DNA as
long as 10 to 20 kb been found to be Alu free?

One way to answer this is to ask how many Alu sequences are in the genome.
According to Darnell (J. Darnell, H. Lodish, and D. Baltimore, "Molecular Cell
Biology", Scientific American Books, Inc., N. Y., 1986) there are 5x10^5 per
genome.  Since the genome is around 3x10^9 bp, one would expect about 1 every
6 kb if they were evenly spread, but this is guesswork.  Does anyone know more
about the distribution?
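
The arithmetic behind that estimate can be written out as a minimal sketch.
It assumes the copies are spread uniformly, which is exactly the part that is
not known; the function name is made up for illustration.  With the figures
quoted, the division gives 6 kb.

```python
# Back-of-the-envelope spacing, ASSUMING Alu copies are spread uniformly.
GENOME_BP = 3e9    # approximate genome size quoted in the post
ALU_COPIES = 5e5   # copy number quoted from Darnell et al. (1986)

def mean_spacing_bp(genome_bp, copies):
    """Average distance between copies if they were evenly spaced."""
    return genome_bp / copies

spacing = mean_spacing_bp(GENOME_BP, ALU_COPIES)
print(f"about one Alu every {spacing / 1000:.0f} kb")
```

If the copies are clustered rather than uniform, the mean spacing stays the
same but long Alu-free stretches become more likely.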

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov

colby@bu-bio.bu.edu (Chris Colby) (12/06/90)

In article <1965@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>Hi everyone!
>
>A friend of mine, Doug Halverson, has cloned several pieces of human DNA and
>has found that they don't all have Alu sequences in them.  His question is, in
>human genes that have been mapped or sequenced so far, have stretches of DNA as
>long as 10 to 20 kb been found to be Alu free?

	Is this a trick question? Why don't you just do a restriction
digestion with the enzyme Alu1 (for which the Alu site was named I'm
led to believe). Look and see if you get any DNA larger than 20 kB.

	Of course since an Alu sequence is 300 bp long (I just opened
my copy of Watson and looked it up) Alu1 will cut at some sites that
are not Alu sequences. This is because Alu1 has a 4bp recognition
sequence. Watson sez: Alu sequences are present in more than a million
copies and represent 3 - 6 percent of the genome. Any 5000 bp segment
will probably contain one because they are widely distributed. 

>  Tom Schneider
>  National Cancer Institute
>  Laboratory of Mathematical Biology
>  Frederick, Maryland  21702-1201
>  toms@ncifcrf.gov

Chris Colby
email: colby@bu-bio.bu.edu

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (12/07/90)

In article <70131@bu.edu.bu.edu> colby@bu-bio.UUCP (Chris Colby) writes:
>In article <1965@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>>Hi everyone!
>>
>>A friend of mine, Doug Halverson, has cloned several pieces of human DNA and
>>has found that they don't all have Alu sequences in them.  His question is, in
>>human genes that have been mapped or sequenced so far, have stretches of DNA
>>as long as 10 to 20 kb been found to be Alu free?

>	Is this a trick question? Why don't you just do a restriction
>digestion with the enzyme Alu1 (for which the Alu site was named I'm
>led to believe). Look and see if you get any DNA larger than 20 kB.

No, it isn't a trick question.  I presume that you mean AluI, not Alu1. AluI
cuts at AGCT, and so appears very frequently in the genome.  I find 3542 sites
in 683804 bases of human sequence, which is one every 193.06 bases.  In
equi-probable random sequence one would expect to find it every 256 bases
(4^4), so perhaps the higher observed frequency reflects some bias in the
sequences I looked at.  If we performed the experiment you suggest, we would
get a lovely smear, mostly at the bottom of the gel in small fragments.  The
restriction enzyme AluI comes
from the bacterium Arthrobacter luteus, hence the name.  According to Darnell,
Alu sequences were so named because SOME of them have AluI sites; conversely,
other sequences also have AluI sites, so the name is not a good one.
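
For anyone who wants to repeat the site count on their own sequences, here is
a minimal sketch.  The function names are hypothetical and this is not the
program used for the numbers above; it only counts AGCT on the strand given
(AGCT is its own reverse complement, so one strand is enough) and compares the
reported spacing with the equi-probable expectation.

```python
def count_sites(sequence, site="AGCT"):
    """Count occurrences of a recognition site in a DNA string."""
    sequence = sequence.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        # Step one base forward so overlapping matches would also be found.
        start = sequence.find(site, start + 1)
    return count

def expected_spacing(site_length, alphabet_size=4):
    """Mean spacing of a fixed site in equi-probable random sequence."""
    return alphabet_size ** site_length

# Figures reported in the post: 3542 sites in 683804 bases.
observed_spacing = 683804 / 3542
print(f"observed: one site every {observed_spacing:.2f} bases")
print(f"expected: one site every {expected_spacing(4)} bases")

# Toy sequence, made up for illustration only.
print(count_sites("ttAGCTagctAGCT"))
```

The expectation of 256 holds only if all four bases are equally frequent and
independent, which real human sequence is not.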

>	Of course since an Alu sequence is 300 bp long (I just opened
>my copy of Watson and looked it up) Alu1 will cut at some sites that
>are not Alu sequences. This is because Alu1 has a 4bp recognition
>sequence. Watson sez: Alu sequences are present in more than a million
>copies and represent 3 - 6 percent of the genome. Any 5000 bp segment
>will probably contain one because they are widely distributed. 

If you look back at the original posting, you will see that I made a similar
calculation.  The point of the posting was that this is a GUESS, not knowledge
of the actual distribution in the genome.  If Alu sequences happen to come in
bunches, then it will not be true that they appear as you predict.  The
question is: what is the actual frequency in known sequences?  Silly me, I see
here in Darnell (p. 432) that "If human DNA is used to make a set of genomic
clones with an average length of 20 kb, more than 90 percent of all randomly
chosen clones contain an intermediate repeat sequence."  Looks like that
ties it up!
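
As a rough cross-check of that figure, here is a sketch under a strong
assumption: repeats are scattered independently at random (a Poisson model)
with the 6 kb mean spacing implied by the numbers in the original posting
(3x10^9 bp / 5x10^5 copies).  The Darnell statement covers intermediate
repeats in general, not only Alu, and the function name is made up for
illustration.

```python
import math

def fraction_with_repeat(clone_bp, mean_spacing_bp):
    """Chance that a clone holds at least one repeat under a Poisson model."""
    return 1.0 - math.exp(-clone_bp / mean_spacing_bp)

MEAN_SPACING_BP = 6000  # 3x10^9 bp / 5x10^5 copies, from the original posting

for clone_kb in (10, 20):
    frac = fraction_with_repeat(clone_kb * 1000, MEAN_SPACING_BP)
    print(f"{clone_kb} kb clone: {frac:.0%} expected to contain a repeat")
```

For a 20 kb clone this comes out at about 96 percent, in line with "more than
90 percent".  Clustering of the repeats would pull the figure down.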

>Chris Colby
>email: colby@bu-bio.bu.edu

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov