Analysis of the Gene-Dense Major Histocompatibility Complex Class III Region and Its Comparison to Mouse

  1. Tao Xie1,4,7,
  2. Lee Rowen1,7,
  3. Begoña Aguado2,5,
  4. Mary Ellen Ahearn3,
  5. Anup Madan1,6,
  6. Shizhen Qin1,
  7. R. Duncan Campbell2, and
  8. Leroy Hood1,8
  1. 1 Institute for Systems Biology, Seattle, Washington 98103, USA
  2. 2 MRC Rosalind Franklin Center for Genomics Research (formerly HGMP Resource Center), Hinxton, Cambridge CB10 1SB, UK
  3. 3 Department of Pediatrics, University of Miami School of Medicine, Miami, Florida 33136, USA

Abstract

In mammals, the Major Histocompatibility Complex class I and II gene clusters are separated by an ∼700-kb stretch of sequence called the MHC class III region, which has been associated with susceptibility to numerous diseases. To facilitate understanding of this medically important and architecturally interesting portion of the genome, we have sequenced and analyzed both the human and mouse class III regions. The cross-species comparison has facilitated the identification of 60 genes in human and 61 in mouse, including a potential RNA gene for which the introns are more conserved across species than the exons. Delineation of global organization, gene structure, alternative splice forms, protein similarities, and potential cis-regulatory elements leads to several conclusions: (1) The human MHC class III region is the most gene-dense region of the human genome: >14% of the sequence is coding, ∼72% of the region is transcribed, and there is an average of 8.5 genes per 100 kb. (2) Gene sizes, number of exons, and intergenic distances are for the most part similar in both species, implying that interspersed repeats have had little impact in disrupting the tight organization of this densely packed set of genes. (3) The region contains a heterogeneous mixture of genes, only a few of which have a clearly defined and proven function. Although many of the genes are of ancient origin, some appear to exist only in mammals and fish, implying they might be specific to vertebrates. (4) Conserved noncoding sequences are found primarily in or near the 5′-UTR or the first intron of genes, and seldom in the intergenic regions. Many of these conserved blocks are likely to be cis-regulatory elements.
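As a rough check on the density figure in conclusion (1), the short Python sketch below divides the gene count by the approximate region length. It assumes the rounded ∼700-kb length quoted above rather than the exact sequenced length, and the function name `genes_per_100kb` is illustrative only.

```python
def genes_per_100kb(gene_count: int, region_length_kb: float) -> float:
    """Return the average number of genes per 100 kb of sequence."""
    if region_length_kb <= 0:
        raise ValueError("region_length_kb must be positive")
    return gene_count / region_length_kb * 100.0


# Values taken from the abstract: 60 human genes in a region of roughly 700 kb.
# The 700-kb length is a rounded figure, so this is only an approximation.
human_density = genes_per_100kb(60, 700.0)
print(f"{human_density:.1f} genes per 100 kb")
```

With these rounded inputs the estimate comes out slightly above the reported average of 8.5 genes per 100 kb, which is what one would expect if the sequenced region is somewhat longer than 700 kb.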

Footnotes

  • [Supplemental material is available online at www.genome.org and http://www.systemsbiology.org. The nucleotide sequences of human and mouse cosmid or BAC clones have been submitted to GenBank as a series of separate entries. The accession numbers are as follows: AC007080, AF109719, AF109905, AF109906, AF049850, and AF030001 (mouse); and AF129756, AF134726, AF019413, U89337, U89336, and U89335 (human). The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: T. Spies.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1736803.

  • 4 Present address: Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA

  • 7 These authors contributed equally to this work.

  • 5 Present address: Centro Nacional de Biotecnologia (CNB), CSIC Campus Universidad Autonoma 28049, Madrid, Spain

  • 6 Present address: Neurogenomics Research Laboratory, University of Iowa, Iowa City, IA 52246, USA.

  • 8 Corresponding author. E-MAIL lhood{at}systemsbiology.org; FAX (206) 732-1254.

    • Accepted September 18, 2003.
    • Received July 9, 2003.