JP2@CU.NIH.GOV (02/22/90)
In response to the query about the status of sequencing the mouse genome: a meeting of representatives of the US mouse community was held last November at Princeton. The Human Genome Center was about to post the report for that meeting, as well as reports for several other meetings the NCHGR has sponsored over the past few months. I will post the report for the mouse meeting now, with the others to follow.

Participants in the meeting were:

Dr. Neal Copeland, NIH, Frederick, MD
Dr. Verne Chapman, Roswell Park Memorial Inst., Buffalo, NY
Dr. Peter D'Eustachio, NYU Medical Ctr, New York, NY
Dr. William Dove, Univ of Wisconsin, Madison, WI
Dr. Marshall Edgell, U. of N. Carolina, Chapel Hill, NC
Dr. Eva Eicher, Jackson Laboratories, Bar Harbor, ME
Dr. Rosemary Elliott, Roswell Park
Dr. Jeff Friedman, Rockefeller Univ., New York, NY
Dr. Lee Hood, Cal Tech, Pasadena, CA
Dr. Nancy Jenkins, NIH, Frederick, MD
Dr. Jan Klein, Max Planck Institute Biology, Tübingen, FRG
Dr. Eric Lander, MIT, Cambridge, MA
Dr. Terry Magnuson, Case Western Reserve Univ., Cleveland
Dr. Joe Nadeau, Jackson Labs, Bar Harbor, ME
Dr. Ken Paigen, Jackson Labs
Dr. Eugene Rinchik, Oak Ridge National Laboratory, Oak Ridge, TN
Dr. Liane Russell, Oak Ridge
Dr. Shirley Tilghman, Princeton Univ, Princeton, NJ
Dr. Lee Silver, Princeton Univ

Any questions about the report can be directed to Shirley Tilghman, who organized the meeting. Questions about funding for mouse genome work through the Human Genome Center can be directed to the NCHGR staff: Jane Peterson (jp2@nihcu.bitnet), Mark Guyer (gy4@nihcu.bitnet) or Bettie Graham (b2g@nihcu.bitnet).

------------------------------------------------------------------

INTRODUCTION

The commitment of the National Institutes of Health to map and sequence the human genome includes an equal commitment toward understanding the information which will accumulate. This will require the acquisition of equivalent amounts of information for a variety of model organisms.
Far from being redundant data gathering, the comparative approach is essential to the success of the Genome Project. Many studies have shown that important genes are conserved among organisms, providing a crucial criterion for discriminating among genes. However, sequence alone provides minimal information about the specific role of a gene in biology. That requires experimental manipulation, which is impossible in humans.

The mouse became the preeminent mammalian model organism over the last fifty years because of its convenient size, fertility, short gestation period, ease of maintenance and well-defined genetics. The recent advances in transgenic DNA technology and the ability to mutate genes in the mouse germline via homologous recombination in embryonic stem (ES) cells have only served to strengthen this important role for the mouse. In fact, the mouse is the only mammal in which one can experimentally test the biological role of sequences of unknown function conserved in mammals.

The power of the interplay between scientific discovery in the human and the mouse has been amply demonstrated in the last several years. First, the function of cloned human genes (e.g., the human ZFY gene, whose role in male sex determination is currently being questioned) can be specifically tested only in mice. Second, mouse models for autoimmunity, such as the MRL/lpr and moth-eaten (me) mice, accurately mimic the symptoms of arthritis, vasculitis and nephritis associated with rheumatoid arthritis and systemic lupus erythematosus in humans. Third, mouse mutants offer unique access to genes that are complementary to those identified in human populations as "disease genes". Human recessive lethals, if there is no visible heterozygous phenotype, will never be observed. Yet these represent a group of important essential genes in mammalian biology.
Finally, the discovery of significant synteny between the mouse and human genomes has led to an exchange of DNA markers between mouse and human geneticists that has contributed to the physical mapping of the loci for Duchenne muscular dystrophy, neurofibromatosis and Wilms' tumour.

Within the last few years, new advances in mapping of the mouse genome, which take advantage of interspecies crosses, have made it possible to develop high resolution multilocus genetic and physical linkage maps of the entire genome. The development of these maps will revolutionize the study of the mammalian genome.

This report grew out of a meeting held at Princeton University on November 17, 1989. In attendance were 19 scientists (see Appendix) who have extensive expertise in mouse genetics and molecular biology. The goal of the meeting was to draw up a reasonable blueprint for generating a genetic and physical map, and ultimately the DNA sequence, of the mouse genome. Goals were set for these endeavours that we hope will be useful to the National Institutes of Health as it sets its priorities for the Genome Project. The report is divided into five sections, which cover genetic mapping, physical mapping, DNA sequencing, informatics and animal resources.

SECTION I. GENETIC MAPPING

In the past, chromosomal gene assignments in the mouse relied on meiotic mapping, using either classical two- and three-point crosses between genetically nonidentical inbred strains or recombinant inbred (RI) strains established from crosses of inbred laboratory mouse progenitors. Both of these approaches are, however, limited by the difficulty of identifying allelic differences among inbred strains. RI strain analysis also suffers from the additional disadvantage that only close linkages can be identified. These two disadvantages have been overcome by using interspecific crosses, which exploit the differences between genetically diverse Mus species and standard inbred strains.
The resulting interspecific hybrids are multiply heterozygous across every chromosome and, because the inbred strains have been extensively characterized, the phase of linkage is known for every combination of known genes. Through the power of interspecific backcross mapping it is now possible to develop high resolution multilocus linkage maps for the mouse genome. Because this approach eliminates the time-consuming search for extensive polymorphism, a chronic problem in human gene mapping, it is likely that construction of the mouse genetic map will proceed far more rapidly than that of the human map. Once generated, the mouse genetic map will allow the complete identification of regions of synteny and linkage conservation between mouse and man, thereby facilitating the human mapping efforts that will be ongoing concurrently. The other major advantage of multilocus mapping is that, for each cross, the map is cumulative; that is, each new gene is mapped relative to every other gene, irrespective of the laboratory in which the mapping is performed.

The long-term goal is to develop a genetic map of the mouse with molecularly identified genes mapped to a resolution of 0.5 to 1.0 cM. A map of this density will serve as an important tool for the rapid and efficient molecular analysis of mutant loci, and it can serve as the base for the physical mapping of the mouse genome and eventually the sequence analysis of specific regions.

The workshop identified a two-step process for building the mouse genetic map toward the long-term objective. First is the development of an efficient method for localizing additional markers on the linkage map with respect to a series of anchor loci. Second is the elaboration of mapping resources that define the order of genes on chromosomes at a resolution of 1.0 cM or less. These steps are discussed in more detail below.

1. A set of approximately 320 anchor loci which will serve to colinearize genetic maps generated at different sites and using different methods.
The immediate goal is to develop backcross gene-mapping resources that locate a new locus on the genetic map within 5 cM of an existing anchor gene. With 5 cM chosen as the optimal distance between anchor loci, approximately 320 anchor loci (1,600 cM mouse genome / 5 cM between anchors) will be needed to cover the entire mouse genome. This density of anchor loci should provide sufficient genetic resolution for future high resolution genetic mapping studies. The 320 loci become important anchor references for further orientation of the genetic map, and they will become sequence tagged sites (STSs) for the elaboration of physical maps.

For a locus to be chosen as an anchor it must hybridize well to mouse DNA and provide an STS that is polymorphic in interspecific mouse backcrosses. Ideally, the anchors should also define the ends of conserved regions between mouse and human genomes, facilitating cross-species comparisons, and be polymorphic among inbred laboratory strains. The development of an STS-based anchor map of the mouse genome will enable any interested investigator to participate in mouse genome mapping. As long as each investigator includes the appropriate anchor loci in his or her map, all maps can readily be aligned in the future.

It is anticipated that the majority of laboratories will employ interspecific backcrosses for mapping purposes. However, many other types of crosses (intraspecific, RI, etc.) need to be continued as well. These will be important in assessing the generality of the interspecies map, as at least one chromosomal inversion between M. domesticus and M. spretus has been identified. Such rearrangements would have profound effects on recombinational mapping.

2. A genetic map of 1 cM resolution.

The optimal resolution of the mouse genetic linkage map was set at 1 cM (where a 1 cM genetic map is defined as a map where greater than 95% of the recombination breakpoints lie within 1 cM of each other).
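
The arithmetic behind the anchor spacing and the 1 cM target can be sketched in a few lines of Python. This is only an illustration, not a method taken from the report: it assumes that 1 cM corresponds to a 1% chance of recombination per backcross progeny (reasonable for short distances) and that progeny are independent. The function names are hypothetical.

```python
import math

GENOME_CM = 1600.0  # genetic length of the mouse genome quoted in the report


def anchors_needed(genome_cm: float, spacing_cm: float) -> int:
    """Number of evenly spaced anchor loci needed to cover the genome."""
    return math.ceil(genome_cm / spacing_cm)


def p_at_least_one_recombinant(distance_cm: float, n_progeny: int) -> float:
    """Chance that a panel contains at least one recombinant between two loci.

    Assumes recombination fraction = distance_cm / 100 (short distances only)
    and independent progeny.
    """
    r = distance_cm / 100.0
    return 1.0 - (1.0 - r) ** n_progeny


def progeny_needed(distance_cm: float, confidence: float) -> int:
    """Smallest panel size giving the requested chance of a recombinant."""
    r = distance_cm / 100.0
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - r))


if __name__ == "__main__":
    print(anchors_needed(GENOME_CM, 5.0))        # anchors at 5 cM spacing
    print(p_at_least_one_recombinant(1.0, 300))  # panel of 300 progeny
    print(p_at_least_one_recombinant(1.0, 460))  # panel of 460 progeny
    print(progeny_needed(1.0, 0.95), progeny_needed(1.0, 0.99))
```

Under these assumptions the panel sizes of 300 and 460 progeny quoted in this section sit just above the 95% and 99% thresholds for loci 1 cM apart.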
The workshop members arrived at that level of genetic resolution based on the conviction that this is sufficient to guarantee a meaningful integration of the genetic and physical maps. To achieve the 1 cM goal, an additional 3,000-4,000 probes (200-400 markers per chromosome) will need to be placed on the genetic map. A backcross panel of 300 progeny provides a 95% probability of recombining genes that are 1.0 cM apart, and a 460 progeny panel provides a 99% probability of recombination in that distance. Panels of this size are well within the means of many laboratories to produce. We anticipate that a 1 cM mouse linkage map can be in place within the next 5 years.

Since cloned genes will primarily be mapped in the future by interspecific backcross panels, DNA sequence information should be available to convert these probes into STSs for the mouse. The meeting endorsed the use of STSs for mouse genome mapping. To facilitate the overall conversion to an STS-based map in the future, we will strongly encourage the prior identification and description of the STS as a prerequisite for inclusion of any new marker into the composite map database.

SECTION II. THE PHYSICAL MAP

As an ultimate goal, the physical map of the mouse should consist of an ordered set of STSs spaced at intervals of 100 kilobase pairs of DNA (kb) throughout the mouse genome. Once the ordered STSs are in hand, it is hoped that it will not be necessary to maintain any particular reference library of clones. Rather, the information in STSs will provide the probes for isolating clones from any region of the mouse genome.

Without endorsing a particular strategy, which would be premature at this time, there was unanimous sentiment that a physical map of the mouse genome is a very high priority which could be achieved within the next five to eight years. The workshop strongly endorsed physical mapping at two levels of complexity: regional and genome-wide.
These will be considered separately, although it is clear that coordination, possibly through the use of a common yeast artificial chromosome (YAC) library, will enhance both kinds of efforts. It was the strong conviction of the attendees that both were essential and should begin immediately. Unless genome-wide mapping is tackled, closure of the physical map will be many years away.

1. Regional mapping

The workshop recommended that current efforts at regional physical mapping be intensified immediately. Techniques for making chromosome-specific or chromosome-enriched libraries, such as chromosome microdissection, somatic cell hybrids segregating mouse chromosomes, and flow-sorted (Robertsonian) chromosomes, are currently available. Others are likely to be developed in the next few years.

This endeavour will almost certainly take place in regions that are well characterized with respect to genetics and biological function. Although interpretation of this criterion will certainly vary from investigator to investigator, evaluation of what is currently available in the mouse-mutation resource does suggest several regions on which to concentrate. Clearly, the proximal region of chromosome 17 is destined for such model status. The years of investigation into the nature and function of both the t complex and the major histocompatibility complex, together with the vast storehouse of mouse "reagents" (MHC congenic strains, recombinant t haplotypes, ethylnitrosourea-induced lethal mutations, spontaneous chromosomal rearrangements involving t chromosomes, and region-specific chromosome-dissection microclones, which provide many nucleation points for the development of long-range physical maps of the region), all combine to suggest that intensive study of this region should continue.
Other regions to be considered are those covered by existing complexes of overlapping radiation- and chlorambucil-induced germline deletions, or those regions, such as the heavy and light chain immunoglobulin loci and T cell receptor loci, which are dense with genetic information. The length of genome associated just with the 6 to 11 cM deletion complex surrounding the albino (c) locus on chromosome 7, for example, could approach 0.3% to 0.7% of the total genome. Similar arguments can be made for the dilute-short ears complex on chromosome 9, the Agouti complex on chromosome 2, and the brown (b) complex on chromosome 4. These animal resources are unique in mammalian biology, and provide powerful tools for physical mapping. Molecular cloning of breakpoint-junction fragments from independent members of a deletion complex affords an efficacious way of chromosome "jumping" back and forth within a defined length of genome.

The existing mutation resource, as well as that which can be newly developed, thus provides the genetic tools for intensive physical and correlated functional mapping in selected regions of the mouse genome. Because of extensive mouse-human genome-segment homologies, exploitation of mouse mutations, and the maps to which they contribute, may make it possible to assign functions to human DNA sequences that might otherwise be characterized only at the DNA sequence level.

2. Genome-Wide Physical Mapping

Although there is strong support for regional mapping efforts, it is recognized that regional mapping will not lead to a complete physical map. For this, it will be necessary to generate an overlapping contig map of the entire mouse genome. The workshop endorsed immediate initiation of projects aimed at genome-wide physical mapping. The paradigms for such an endeavour are the successful projects undertaken for E. coli, yeast, and C. elegans.
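
As a rough illustration of what assembling an overlapping contig map from randomly chosen clones involves, the sketch below estimates how many separate islands of clones to expect at a given depth of coverage. It borrows the standard Lander-Waterman random-clone model, which the report itself does not name, and every parameter value in the usage lines (clone size, clone count, minimum detectable overlap) is a hypothetical placeholder rather than a workshop recommendation. The function name is likewise illustrative.

```python
import math


def expected_islands(genome_bp: float, clone_bp: float,
                     n_clones: int, min_overlap_frac: float) -> float:
    """Expected number of islands (contigs plus isolated clones).

    Lander-Waterman estimate: N * exp(-c * (1 - theta)), where
    c = N * L / G is the redundancy of coverage and theta is the
    fraction of a clone that must overlap a neighbour to be detected.
    Assumes clones of equal length placed uniformly at random.
    """
    coverage = n_clones * clone_bp / genome_bp
    sigma = 1.0 - min_overlap_frac
    return n_clones * math.exp(-coverage * sigma)


if __name__ == "__main__":
    GENOME_BP = 3e9   # mammalian genome size used in this illustration
    CLONE_BP = 3e5    # hypothetical 300 kb YAC insert
    for n in (30_000, 50_000, 100_000):   # hypothetical library sizes
        print(n, round(expected_islands(GENOME_BP, CLONE_BP, n, 0.2)))
```

In this model the island count falls steeply as clones are added but does not reach a single island at any of the library sizes tried, so the remaining islands have to be ordered and joined by other means.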
Although there are a variety of approaches which can be used to assemble an overlapping contig map, certain basic considerations apply across the board. As shown by theory and practice, it is impractical to achieve closure of overlapping contig maps, at least by random addition of clones. For the mouse genome, it seems practical to aim at assembling clones into perhaps 2000-4000 islands of physically overlapping clones, which must then be ordered relative to each other in the genome. Once the islands are ordered, the gaps can be systematically closed.

Projects to generate contigs in a genome of 3 x 10^9 bp should be strongly encouraged. Clone fingerprinting in a mammalian genome may require new approaches, for example, hybridization of restriction-enzyme-digested DNA from YAC clones with probes detecting frequent repeats in the mouse genome to produce distinctive patterns of fragment sizes. Robust methods for band detection will be essential to a project of this size.

The development of additional moderately repetitive probes is also highly encouraged. Moderately repetitive probes have many important uses aside from fingerprinting, including the rapid mapping of uncloned mouse mutations, the establishment of skeleton linkage maps for the mouse genome, and the mapping of polygenic traits.

The blueprint set for the genetic map in Section I is clearly important in considering strategies for genome-wide physical mapping. The assignment of contigs to the genetic map will be greatly facilitated by interspecific cross mapping, e.g., taking probes containing genetically mapped polymorphisms and identifying the contigs that contain them. By contrast, it is worth noting that the greater difficulty of genetic mapping in the human might tend to favor high resolution in situ hybridization as a mapping strategy.

Ultimately the physical contig map will be converted into STSs, in a variety of ways.
For example, one might (i) make STSs that detect genetic polymorphisms, placing them on the genetic map and a clone collection simultaneously; (ii) make STSs from the ends of clones, using the STS to detect small overlaps between clones; (iii) make STSs from genes of interest; or (iv) use a combination of these approaches.

SECTION III. DNA SEQUENCING

In the end, the physical and genetic maps are preludes to complete DNA sequence analysis. It is important that the mouse community take immediate (and selective) advantage of the maps to generate regional sequence. The workshop supported the idea of several regional DNA sequencing projects where a 0.5-2 Mb focus is chosen based on interesting genetics and biology. This is the size of a large gene or of many multigene loci (e.g., cystic fibrosis, MHC, T-cell receptor and antibody loci). Ideally, each sequencing project should be done in parallel with the same region in the human genome, and in intimate coordination with investigations on the biology and genetics of the locus. At this time a group of 12-15 people with appropriate instrumentation and computational support could sequence perhaps one Mb per year. The number and scope of regional sequencing projects could increase as the sequencing technologies become more mature.

It is important to stress again why sequence information for the mouse is not redundant with that for the human. First, the evolutionary distance between mouse and human is sufficient to allow the identification of important features of mammalian genomes, which would include coding, regulatory and structural elements. For example, the first enhancer was identified by virtue of its homology between these two species. Second, given that the mouse will be the primary experimental organism in which to explore the function of human genes, it will be necessary to have intimate knowledge of the homologous mouse genes.
The complete DNA sequence will provide powerful new tools to explore the developmental and molecular biology of these genes (e.g., knowledge of all restriction sites, PCR probes, antibodies to coding regions, etc.). These tools will revolutionize the manner in which the corresponding loci can be analyzed.

SECTION IV. INFORMATICS

Informatics is a key aspect of the mouse portion of the Genome Project. Not only is it essential for managing and exploiting the substantial and complex data that will be generated, but it will also represent a key connection to the other species in the Project and to all subsequent studies of organismal biology. We strongly endorse the notion that this effort be international in scope, especially as groups in Europe and Japan are already contributing significantly to the Mouse Genome Effort. The International Mouse Genetics Community has well-established organs of communication that serve as important background resources. The Mouse Newsletter and Green's Genetic Variants and Strains of the Laboratory Mouse are two such examples. It is in the best interests of the entire field that databases be assembled with the international community in mind.

There are two primary uses of genome-related databases: as repositories of data from which particular bits of information can be extracted, and as a source of information from which data can be retrieved for analyses of genome organization and biological function. These uses dictate the manner in which these data should be stored, displayed, analyzed and distributed.

The following is based on the premise that informatics needs should be addressed with small, focused, sophisticated databases that any user can access in a simple, uniform manner, preferably one based on a graphical, point-and-click environment. The important issues which need to be addressed are outlined below.

1. Database structure

There are two alternative philosophies of database composition.
One is to create a monolithic database of all possible mouse genome-related information. The advantages of this approach are several. Information is concentrated at a particular site, and users immediately know where and how information can be obtained. Moreover, users need to learn only one user environment and they can become intimately familiar with the subtleties of the system.

The alternative philosophy involves marginal databases consisting of a series of small specialized databases, each containing specific types of genome information. Again there are several advantages to this approach. Each database is maintained by a specialist in that particular area. Each specialist can be relied upon to keep the information in better order than the most well-intentioned person who is less intimately acquainted with the data. Databases can be built and maintained in ways that address the specific needs of the specialty area. Changing interests and needs can be tracked efficiently. The principal disadvantage of this approach is that users must learn the idiosyncrasies of each system. Projects are underway, however, to address this problem by providing a single user environment ("front-end") in which access to diverse databases can be obtained transparently.

2. Quality of the Databases

It goes without saying that the data must be of high quality. Every effort should be made to enter data in its most robust form, thereby enabling the widest variety of uses. An example is recording the observed numbers of progeny of each haplotype class in multilocus crosses, rather than summarized measures of recombination frequency. Summarized data by their nature limit the range of analyses that are possible. An important feature that database keepers should incorporate is a coding of the data according to their status, e.g. published versus personal communication, confirmed versus provisional.
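
A minimal sketch of such a record, assuming a simple two-locus backcross, might store the raw progeny counts together with status codes and derive summary statistics only on demand. The class names, field names and locus labels below are hypothetical and are not drawn from any existing mouse database.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    PUBLISHED = "published"
    PERSONAL_COMMUNICATION = "personal communication"


class Confidence(Enum):
    CONFIRMED = "confirmed"
    PROVISIONAL = "provisional"


@dataclass
class BackcrossRecord:
    locus_a: str
    locus_b: str
    parental: int        # progeny with a non-recombinant haplotype
    recombinant: int     # progeny with a recombinant haplotype
    source: Source
    confidence: Confidence
    comment: str = ""    # idiosyncrasies of design, methods or results

    def recombination_fraction(self) -> float:
        """Summary statistic computed from the stored raw counts."""
        total = self.parental + self.recombinant
        if total == 0:
            raise ValueError("record contains no progeny")
        return self.recombinant / total


def strong_records(records):
    """Keep only records that are both published and confirmed."""
    return [r for r in records
            if r.source is Source.PUBLISHED
            and r.confidence is Confidence.CONFIRMED]


if __name__ == "__main__":
    recs = [
        BackcrossRecord("LocusA", "LocusB", 97, 3,
                        Source.PUBLISHED, Confidence.CONFIRMED),
        BackcrossRecord("LocusA", "LocusC", 45, 5,
                        Source.PERSONAL_COMMUNICATION, Confidence.PROVISIONAL,
                        comment="small panel"),
    ]
    for r in strong_records(recs):
        print(r.locus_a, r.locus_b, r.recombination_fraction())
```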
This feature would enable users to identify strong and weak data and to make their own decisions about which data should be included in their analysis. Another important feature should be comments on special or idiosyncratic aspects of design, methods, results, or any other relevant part of each study.

3. Connections between databases

It is essential that means be developed for accessing and analyzing diverse databases simultaneously. These connections represent one of the most important consequences of the Genome Project: the utilization of genetic and physical mapping data to understand organismal biology. A simple example is comparison of the physical map with the genetic map of phenotypic deviants, with simultaneous access to a description of the phenotype, the genetic and physical maps of the homologous portions of the human genome, and a description of genetic diseases that map to that region.

Excellent communication exists not only between the various groups working on Mouse Informatics, but also with the Human Informatics community. (Lest this comment appear gratuitous: several meetings between the two Informatics groups at The Jackson Laboratory have led to a firm commitment of both groups to work together to build comprehensive databases for the mouse.) Efforts are being made to coordinate database development in order to minimize unintentional redundancy. Plans are being developed for a workshop, to be held as soon as possible, to discuss Mouse Informatics. This workshop will provide a valuable forum for discussing informatics needs, alternative solutions, and means of implementation.

4. Database needs and existing databases

A wide variety of reasonably sophisticated databases of mouse genetic information already exist. Each of these databases is being enhanced to improve performance, user compatibility, and flexibility. It is probable that other kinds of databases will be developed.
For example, important new databases are needed that deal with the screening of physical mapping libraries with defined clones and probes, and with the assembly of the physical map from overlapping segments of the genome.

Databases that deal with important areas of biological information about the mouse and other species of mammals also exist. A simple example of the utility of these databases involves MUTCAT, M.C. Green's description of heritable mutations.

Many of the databases, analytical methods, and multi-level connections between databases can only be described very generally; specific descriptions of future database needs are premature. Listed below are examples of the databases that already exist or that need to be developed.

Database needs                                 Exists / To be developed

Directly genome related
  Genetic maps
    multilocus                                 In part
    mutation (incl. the historical map)        Yes
    chromosome maps                            Yes
  Physical maps
    contig - ordered clones                    Yes
    hit-map (hyb. of clones to library)        Yes
  Cytogenetic map
    banding patterns                           Yes
    chr rearrangements and fragile sites       Yes
    in situ hybridization                      Yes
    deletions                                  In part
  RFLP map                                     Yes
  Comparative genetic maps                     Yes
  Comparative physical maps                    Yes

Relevant databases of biological information
  Protein sequence and structure               Yes
  Protein function                             ??
  Mutant phenotype                             Yes
  Disease loci                                 Yes
  MIM                                          Yes
  MUTCAT                                       Yes
  LONDON DYSMORPHOLOGY                         Yes

Resources
  Clones and probes                            Yes
  Lane list of strains and mutations           Yes
  JAX DNA resource                             Yes

It is premature to define hardware and software standards. Software needs are not sufficiently well-defined that specific programs can be developed and generally supported. Instead, effort should focus in the short term on research on the construction and use of database structures and on mechanisms to integrate the various efforts to develop useful databases. The hardware available for these problems is rapidly advancing. The software and databases should be as independent of particular hardware architectures as possible.
The goal should be portability, flexibility, and independence. Dialogue between research groups involved in studying these problems is essential.

SECTION V. MOUSE STRAIN AND MUTATION RESOURCES

One of the major reasons that the mouse has become the preeminent mammalian model organism is the rich genetic resources that exist worldwide. The roles that the various Mus species collected worldwide, the standard inbred strains, with their genetic homogeneity, and the large number of mutant strains will play in genetic and physical mapping have already been described. However, it is equally important to consider the role these mutations, and the ones that will be generated in the future, will play in correlating DNA sequence with biological function.

The generation and study of mutations in the mouse will contribute in a significant way to understanding the functional makeup of the mammalian genome by providing a direct, experimentally malleable system for associating biologically significant phenotypes first with regions of the genome, and then with specific DNA sequences themselves. That the availability of heritable genetic alterations, and their subsequent use as biological "reagents", are essential components in the molecular-genetic analysis of any organism is an undisputed fact.

Recent developments in both classical- and molecular-mutagenesis techniques have provided means to induce particular types of mouse mutations, each with its own applications, advantages, and disadvantages. Indeed, the exciting new technique of homologous recombination in embryonic stem-cell (ES) lines is a powerful method for introducing specific, defined mutations within a gene of choice. Assuming that current technical difficulties with this procedure will be ironed out in the near future, its only limitations are that one must first start with a cloned sequence (no problem at the end of the genome initiative, but difficult initially), and that genes must be mutated one at a time.
It was generally agreed at the Workshop that the technique of homologous recombination takes its place as an extremely important and useful tool in the future of biology, but will not (and should not) be the only source of new mutations to complement the genome initiative. Insertional mutagenesis, either by naturally occurring endogenous elements, or by exogenously added DNA (as in the formation of transgenic mice), likewise provides a means of inducing mutations that will also be immediately accessible at the molecular level. In contrast to homologous-recombination-induced mutations, insertional mutations may prove to be quite random, occurring at just about any locus throughout the genome. Spontaneous and, importantly, agent-induced chromosomal rearrangements have proved, in many systems, to be valuable genetic tools for gaining initial molecular access either to a specific (uncloned) gene or to an entire chromosomal region. These rearrangements --- specifically, deletions, translocations, and inversions --- have also proved essential in providing both cytogenetic and molecular landmarks or reference points in the development of detailed physical maps of entire chromosomal regions; in some cases, they provide the actual definition of when one has indeed arrived at a specific locus at the molecular level. Because of these attributes, chromosomal rearrangements are, at this point in time, highly desirable research tools for beginning the molecular analysis of a genomic region or locus. Several regions of the mouse genome (perhaps two to three percent, in aggregate) are currently covered by extensive "complexes" of overlapping radiation- and chlorambucil-induced deletions. Unfortunately, these types of rearrangements are currently not available for the major portion of the mouse genome. 
Several laboratories are now developing the use of chlorambucil mutagenesis of male germ cells as a high-efficiency method for creating chromosomal rearrangements (and particularly deletions) at numerous additional regions throughout the genome, and this should be encouraged.

Intragenic mutations (existing or newly induced), while often providing suitable mammalian models for the study of developmental and clinical genetics, identify loci that are difficult to localize precisely on genetic and physical maps with current technologies. The localization of such loci into intervals of the 1-cM cloned-marker interspecific-backcross map will present a major challenge, especially for those mutations associated with phenotypes such as lethality or reproductive failure. On the other hand, such intragenic mutations have considerable utility for refining maps of entire, specific, genomic regions. Recently developed procedures for high-efficiency germline mutagenesis with agents, such as N-ethyl-N-nitrosourea, that induce primarily intragenic mutations, along with appropriate breeding protocols, now provide the best means for determining the functional-locus makeup of specific stretches of chromosome, for estimating the numbers and kinds of genes that one might expect to find within the complete DNA sequence, and for determining the gene densities that might be encountered in different "structural" regions of chromosomes (e.g., G-light versus G-dark bands). Particularly cogent in this context would be the exploitation of high-efficiency mutagenesis protocols in mouse regions that share significant homologies with certain human regions. In this way, aspects of the functional composition of certain regions of the human genome might be ascertained early on in the genome initiative.

In summary, the role of the mouse-mutation resource in the genome initiative is currently substantial, and it should become even more important and relevant as the initiative evolves.
Indeed, the major strength of coordinating the molecular aspects of the human and mouse genome initiatives with the mouse-mutation resource is that the outcomes need not stand alone as a sterile catalog; they can readily be associated with functional information derived from genetic investigations, and with experimental studies that can delve into the earliest development of abnormal phenotypes.