The completion of the Mammalian Gene Collection (MGC)

  1. The MGC Project Team1,
  2. Gary Temple2,3,
  3. Daniela S. Gerhard4,
  4. Rebekah Rasooly5,
  5. Elise A. Feingold3,
  6. Peter J. Good3,
  7. Cristen Robinson3,
  8. Allison Mandich3,
  9. Jeffrey G. Derge6,
  10. Jeanne Lewis6,
  11. Debonny Shoaf6,
  12. Francis S. Collins3,
  13. Wonhee Jang7,
  14. Lukas Wagner7,
  15. Carolyn M. Shenmen7,
  16. Leonie Misquitta8,
  17. Carl F. Schaefer8,
  18. Kenneth H. Buetow8,
  19. Tom I. Bonner9,
  20. Linda Yankie7,
  21. Ming Ward7,
  22. Lon Phan7,
  23. Alex Astashyn7,
  24. Garth Brown7,
  25. Catherine Farrell7,
  26. Jennifer Hart7,
  27. Melissa Landrum7,
  28. Bonnie L. Maidak7,
  29. Michael Murphy7,
  30. Terence Murphy7,
  31. Bhanu Rajput7,
  32. Lillian Riddick7,
  33. David Webb7,
  34. Janet Weber7,
  35. Wendy Wu7,
  36. Kim D. Pruitt7,
  37. Donna Maglott7,
  38. Adam Siepel10,
  39. Brona Brejova10,11,
  40. Mark Diekhans12,
  41. Rachel Harte12,
  42. Robert Baertsch12,
  43. Jim Kent12,
  44. David Haussler12,
  45. Michael Brent13,14,
  46. Laura Langton13,
  47. Charles L.G. Comstock13,
  48. Michael Stevens14,
  49. Chaochun Wei13,15,
  50. Marijke J. van Baren13,
  51. Kourosh Salehi-Ashtiani16,
  52. Ryan R. Murray16,
  53. Lila Ghamsari16,
  54. Elizabeth Mello16,
  55. Chenwei Lin16,17,
  56. Christa Pennacchio18,19,
  57. Kirsten Schreiber18,20,
  58. Nicole Shapiro18,21,
  59. Amber Marsh18,22,
  60. Elizabeth Pardes18,23,
  61. Troy Moore24,
  62. Anita Lebeau25,
  63. Mike Muratet25,
  64. Blake Simmons24,
  65. David Kloske24,
  66. Stephanie Sieja24,
  67. James Hudson25,
  68. Praveen Sethupathy3,
  69. Michael Brownstein9,
  70. Narayan Bhat7,26,
  71. Joseph Lazar27,
  72. Howard Jacob27,
  73. Chris E. Gruber28,
  74. Mark R. Smith28,
  75. John McPherson29,
  76. Angela M. Garcia29,
  77. Preethi H. Gunaratne29,30,
  78. Jiaqian Wu29,31,
  79. Donna Muzny29,
  80. Richard A. Gibbs29,
  81. Alice C. Young32,
  82. Gerard G. Bouffard32,33,
  83. Robert W. Blakesley32,33,
  84. Jim Mullikin32,33,
  85. Eric D. Green32,33,
  86. Mark C. Dickson34,35,
  87. Alex C. Rodriguez34,36,
  88. Jane Grimwood34,37,
  89. Jeremy Schmutz34,37,
  90. Richard M. Myers34,37,
  91. Martin Hirst38,
  92. Thomas Zeng38,
  93. Kane Tse38,
  94. Michelle Moksa38,
  95. Merinda Deng38,
  96. Kevin Ma38,
  97. Diana Mah38,
  98. Johnson Pang38,
  99. Greg Taylor38,
  100. Eric Chuah38,
  101. Athena Deng38,
  102. Keith Fichter38,
  103. Anne Go38,
  104. Stephanie Lee38,
  105. Jing Wang38,
  106. Malachi Griffith38,
  107. Ryan Morin38,
  108. Richard A. Moore38,
  109. Michael Mayo38,
  110. Sarah Munro38,
  111. Susan Wagner38,
  112. Steven J.M. Jones38,
  113. Robert A. Holt38,
  114. Marco A. Marra38,
  115. Sun Lu39,
  116. Shuwei Yang39,
  117. James Hartigan40,
  118. Marcus Graf41,
  119. Ralf Wagner41,
  120. Stanley Letovsky42,43,
  121. Jacqueline C. Pulido42,
  122. Keith Robison42,
  123. Dominic Esposito44,
  124. James Hartley44,
  125. Vanessa E. Wall44,
  126. Ralph F. Hopkins44,
  127. Osamu Ohara45 and
  128. Stefan Wiemann46
  3 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  4 National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  5 National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA;
  6 SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland 21702, USA;
  7 National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA;
  8 National Cancer Institute, Center for Bioinformatics, Rockville, Maryland 20852, USA;
  9 National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892, USA;
  10 Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA;
  11 Present address: Department of Computer Science, Comenius University, 842 48 Bratislava, Slovakia;
  12 Center for Biomolecular Science & Engineering, University of California, Santa Cruz, California 95064, USA;
  13 Center for Genome Sciences, Washington University, St. Louis, Missouri 63130, USA;
  14 Department of Computer Science, Washington University, St. Louis, Missouri 63130, USA;
  15 Present address: School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240 and Shanghai Center for Bioinformation Technology, Shanghai 200235, China;
  16 Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA;
  17 Present address: Pediatrics Department, Stanford University School of Medicine, Stanford, California 94305, USA;
  18 The I.M.A.G.E. Consortium, Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94550, USA;
  19 Present address: Lawrence Berkeley National Laboratory, Walnut Creek, California 94598, USA;
  20 Present address: Norgren Systems, Ronceverte, West Virginia 24970, USA;
  21 Present address: Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA;
  22 Present address: National Ignition Facility, Lawrence Livermore National Laboratory, Livermore, California 94550, USA;
  23 Present address: Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California 94550, USA;
  24 Open Biosystems, now a part of ThermoFisher Scientific, Huntsville, Alabama 35806, USA;
  25 HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA;
  26 Present address: United States Patent and Trademark Office, Alexandria, Virginia 22314, USA;
  27 Department of Dermatology, Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA;
  28 Express Genomics, Inc., Frederick, Maryland 21701, USA;
  29 Baylor College of Medicine Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA;
  30 University of Houston, Houston, Texas 77004, USA;
  31 Present address: Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06620, USA;
  32 NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  33 Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  34 Stanford Human Genome Center, Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA;
  35 Present address: Cardiodx, Inc., Palo Alto, California 94303, USA;
  36 Present address: Baxter International, Inc., Deerfield, Illinois 60015, USA;
  37 Present address: HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA;
  38 Genome Sciences Centre, BC Cancer Agency, Vancouver BC, V5Z 4S6 Canada;
  39 GeneCopoeia Inc., Rockville, Maryland 20850, USA;
  40 Beckman Coulter Genomics, Beverly, Massachusetts 01915, USA;
  41 Geneart AG, Regensburg, Germany 93053;
  42 Codon Devices, Inc., Cambridge, Massachusetts 02139, USA;
  43 Present address: Helicos BioSciences Corporation, Cambridge, Massachusetts 02139, USA;
  44 Protein Expression Laboratory, NCI/SAIC-Frederick, Frederick, Maryland 21702, USA;
  45 Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan;
  46 German Cancer Research Center, D-69120 Heidelberg, Germany

    Abstract

    Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
