[comp.archives] [sci.bio] Delila/Sequence Logos in C

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (01/30/91)

Archive-name: bionet/molbio/delila/1991-01-29
Archive-directory: ncifcrf.gov:/pub/delila/ [129.43.1.11]
Original-posting-by: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Original-subject: Delila/Sequence Logos in C
Reposted-by: emv@ox.com (Edward Vielmetti)

Hi folks!  I have just today succeeded in translating the makelogo program into
C and running it.  If you did not obtain the programs before because they are
written in Pascal, you should now be able to run them.  Information on how to
do this is available by anonymous ftp from ncifcrf.gov in the directory
pub/delila.  Get the README file first.

The p2c (Pascal-to-C) translator was written by David Gillespie
(daveg@csvax.cs.caltech.edu).  You can obtain it by anonymous ftp from
csvax.caltech.edu in the pub directory.

People on BITNET can also get the files from Dan Davison's (davison@uh.edu)
server at "gene-server%bchs.uh.edu@CUNYVM" (many thanks to Dan for this
service).  A reference for this is:
 
@article{Davison1990,
author = "D. B. Davison
 and J. E. Chappelear",
title = "The {Genbank}-Server at the {University of Houston}",
journal = "Nucl. Acids Res.",
volume = "18",
pages = "1571-1572",
year = "1990"}

p2c is not available on BITNET so far as I know, but perhaps Dan would be
willing to make it available if someone wants it.

To remind you, these programs generate the ``sequence logos'' described in:

@article{Schneider.Stephens.Logo,
author = "T. D. Schneider
 and R. M. Stephens",
title = "Sequence Logos: A New Way to Display Consensus Sequences",
journal = "Nucl. Acids Res.",
volume = "18",
pages = "6097-6100",
year = "1990"}

The abstract is:

"A graphical method is presented for displaying the patterns in a set of
aligned sequences.  The characters representing the sequence are stacked on top
of each other for each position in the aligned sequences.  The height of each
letter is made proportional to its frequency, and the letters are sorted so the
most common one is on top.  The height of the entire stack is then adjusted to
signify the information content of the sequences at that position.  From these
`sequence logos', one can determine not only the consensus sequence but also
the relative frequency of bases and the information content (measured in bits)
at every position in a site or sequence.  The logo displays both significant
residues and subtle sequence patterns."

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov