Romulo M. Brena, Joseph F. Costello, Genome–epigenome interactions in cancer, Human Molecular Genetics, Volume 16, Issue R1, 15 April 2007, Pages R96–R105, https://doi.org/10.1093/hmg/ddm073
Abstract
Genetic and epigenetic mechanisms contribute to the development of human tumors. However, the conventional analysis of neoplasias has preferentially focused on only one of these processes. This approach has led to a biased, primarily genetic view of human tumorigenesis. Epigenetic alterations, such as aberrant DNA methylation, are sufficient to induce tumor formation, and can modify the incidence and determine the type of tumor that arises in genetic models of cancer. These observations raise important questions about the degree to which genetic and epigenetic mechanisms cooperate in human tumorigenesis, the identity of the specific cooperating genes and how these genes interact functionally to determine the diverse biological and clinical paths to tumor initiation and progression. These gaps in our knowledge are, in part, due to the lack of methods for full-scale integrated genetic and epigenetic analyses. Filling these gaps would ultimately require sequencing relevant regions of the 3-billion nucleotide genome, and determining the methylation status of the 28-million CpG dinucleotide methylome at single nucleotide resolution in different types of neoplasias. Here, we review the emergence and advancement of technologies to map ever larger proportions of the cancer methylome, and the unique discovery potential of integrating these with cancer genomic data. We discuss the knowledge gained from these large-scale analyses in the context of gene discovery, therapeutic application and building a more widely applicable mechanism-based model of human tumorigenesis.
INTRODUCTION
Cancer is typically described in terms of genes that are mutated or deregulated. This gene-based model is derived, in large part, from whole genome but low-resolution analytical methods which certainly have biased the process of gene discovery. Higher resolution, high-throughput technical advances in DNA sequencing, genome scanning and epigenetic analysis have produced an impressive cadre of new cancer gene candidates to fit to the model. However, significant portions of the cancer genome and epigenome remain uncharted, suggesting that even more cancer genes and potential targets for diagnosis and therapy remain to be discovered. This realization has stimulated national and international collaborative efforts to fully map various cancer genomes and epigenomes ( 1–3 ), with noted successes in pilot phases. Here we discuss the technologies that have propelled these efforts, the resulting gene discoveries and the fundamental principles of the pathogenic mechanisms of cancer with special emphasis on epigenetic studies of DNA methylation. Because epigenetic mechanisms can cause genetic changes and vice versa, we also review known epigenetic-genetic interactions in the context of an integrated mechanism-based model of tumorigenesis.
DNA METHYLATION IN NORMAL CELLS
DNA methylation is essential for normal development, chromosome stability, maintaining gene expression states and proper telomere length ( 4–16 ). DNA methylation involves transfer of a methyl-group to cytosine in a CpG dinucleotide via DNA methyltransferases that create (DNMT3A, 3B) or maintain (DNMT1) methylation patterns. The haploid human methylome consists of approximately 28 233 094 CpGs, nearly 70% of which are methylated in normal cells. Just 7% of all CpGs are within CpG islands ( 17 ), and most of these are unmethylated in normal tissues. Normally methylated sequences include those few CpG islands associated with the inactive X chromosome, and some imprinted and tissue-specific genes, as well as pericentromeric DNA (e.g. Sat2 repeats on chr1 and chr16), intragenic regions and repetitive sequences. In fact, 45% of all CpGs in the genome are in repetitive elements, thus accounting for a large proportion of total 5-methylcytosine ( 17–22 ). Normal DNA methylation patterns may vary among individuals ( 23 , 24 ), potentially stemming from environmental exposure ( 25 ), or stochastic methylation events ( 26 ). The importance of inter-individual epigenomic variance has been postulated to influence the development of disease, and also the time of disease onset. An intriguing potential example of this phenomenon is illustrated by psychiatric diseases, such as bipolar disorder and schizophrenia in monozygotic twins. In some instances, only one member of the twin pair develops the pathology, while in others, the time of disease onset between the twins may differ by several years or even decades. Importantly, molecular studies have failed to identify a genetic component that may account for this phenotypic discordance ( 27 ). In light of this evidence, high resolution mapping of the methylome, ideally at single CpG dinucleotide resolution, may provide a new avenue for understanding the disease or susceptibility factors that could be used to detect at-risk individuals.
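To give a sense of scale, the proportions quoted above can be turned into approximate CpG counts. The sketch below uses only the figures stated in the text; the constant and function names are illustrative, and the percentages are rounded in the source, so the results are rough estimates rather than measured values.

```python
# Figures quoted in the text for the haploid human methylome.
TOTAL_CPGS = 28_233_094
FRACTION_METHYLATED = 0.70   # "nearly 70%" methylated in normal cells
FRACTION_IN_ISLANDS = 0.07   # "just 7%" lie within CpG islands
FRACTION_IN_REPEATS = 0.45   # 45% lie within repetitive elements


def approx_count(fraction: float, total: int = TOTAL_CPGS) -> int:
    """Return the approximate number of CpGs corresponding to a fraction."""
    return round(total * fraction)


methylated = approx_count(FRACTION_METHYLATED)
in_islands = approx_count(FRACTION_IN_ISLANDS)
in_repeats = approx_count(FRACTION_IN_REPEATS)

print(f"methylated CpGs:     about {methylated / 1e6:.1f} million")
print(f"CpGs in CpG islands: about {in_islands / 1e6:.1f} million")
print(f"CpGs in repeats:     about {in_repeats / 1e6:.1f} million")
```

On these figures, CpG islands hold roughly 2 million CpGs, whereas repetitive elements hold more than 12 million.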
DNA METHYLATION IN HUMAN PRIMARY TUMORS
In primary human tumors, methylation patterns are severely disrupted. This includes aberrant hypermethylation of CpG islands in promoter regions which is frequently associated with gene silencing ( 16 , 28–31 ), and genome-wide hypomethylation ( 19 , 32–36 ). Typically, aberrant CpG island methylation is assessed in genes already known to play a role in tumor development, especially in tumor samples that do not harbor genetic alterations of the gene. This candidate gene approach has identified aberrant methylation-mediated silencing of genes involved in most aspects of tumorigenesis, commonly altering the cell cycle ( 37–42 ), blocking apoptosis ( 43–46 ) or DNA repair ( 47–53 ). In general, aberrant CpG island methylation tends to be focal, affecting single genes, but not their neighbors ( 54 , 55 ). Two genomic loci, however, are subjected to epigenetic silencing over an entire chromosomal domain of 150 kb in one case and 4 Mb in the other ( 56–58 ).
These and other studies have established an important role for aberrant methylation in tumorigenesis and prognostication, but have focused on only a small number of the estimated 15 000 CpG island-associated promoters in the genome ( 59 ), and only on those genes first identified through genetic screens. Among those CpG islands analyzed, many have only been ‘sampled’ for methylation at fewer than five of potentially 100 or more CpGs in a single island. Even more revealing of the early stage of cancer methylome analyses is the fact that, of the roughly 50% of non-CpG island-associated promoters which could also be influenced by aberrant methylation at specific CpGs, few have been studied in cancer.
Concurrent with promoter hypermethylation, many human tumors exhibit a global decrease in 5-methylcytosine, or genomic hypomethylation, relative to matching normal tissues ( 19 , 32 , 34 , 60 , 61 ). In severe cases, hypomethylation can affect more than 10 million CpGs in a single tumor ( 62 ). Three mechanisms by which hypomethylation contributes to malignancy have been proposed, including transcriptional activation of oncogenes, loss of imprinting (LOI) and promoting genomic instability via unmasking of repetitive elements ( 32 , 60 ). Most surprisingly, despite the knowledge of hypomethylation for more than two decades, the vast majority of genomic loci affected by cancer hypomethylation are unknown ( 36 , 60 , 63–66 ), though presumably a significant proportion of DNA methylation loss occurs in repetitive sequences ( 67 ). A resurgence of interest in hypomethylation, along with newer technologies for assessing hyper- and hypomethylation discussed herein should address these sizable gaps in our knowledge of the cancer methylome.
HYPOTHESES ADDRESSED WITH LARGE SCALE SEQUENCING AND METHYLOME TECHNOLOGIES
Tumor suppressor genes are typically discovered through studying familial cancers and through mapping allelic loss of heterozygosity (LOH) in sporadic human tumors ( 68 ). Regions exhibiting recurrent, non-random deletion are selected for further identification of a candidate tumor suppressor gene by attempting to identify a second hit involving a point mutation or homozygous deletion ( 69 ). Thus, until recently, surveys for point mutations have been confined largely to regions of recurrent LOH or genomic amplification. Current proposals for sequencing entire cancer genomes aim to identify genes that have escaped detection by lower resolution approaches, to provide new targets for therapy and to further improve the experimental modeling of cancer. Pilot projects have proven the utility of this approach with great success ( 70–72 ). A recent zenith in sequencing, including 13 023 genes in 22 tumor cell lines, yielded a wealth of new candidate cancer genes and potential therapeutic targets ( 72 ). These analyses also distinguished mutations likely to contribute to the tumorigenic process from the many inconsequential mutations that riddle the cancer genome.
A related hypothesis is being addressed concurrently by taking an unbiased approach to mapping non-random and tumor type-specific epigenetic alterations that result in gene silencing ( 73–76 ). These studies address the hypothesis that there may be tumor suppressor genes that have escaped detection because they are seldom inactivated by genetic lesions, but often silenced by epigenetic mechanisms ( 55 ). Using Restriction Landmark Genome Scanning (RLGS) ( 77 ), the first of many large-scale methylation analysis methods, it was estimated that hundreds of CpG islands may be aberrantly methylated in any given tumor, though the range of methylation across individual tumors varies significantly ( 74 ). Similar to mutation spectra, only a subset of these methylation events are sufficiently recurrent to qualify as non-random events, potentially arising through selection of cells harboring a methylation-mediated silencing event that confers a growth advantage. Large scale integrated genomic and epigenomic tumor profiles showed that the majority of loci affected by aberrant methylation are in fact independent of recurrent deletions ( 55 , 78 , 79 ). Taken together, these data suggest genomic and epigenomic approaches are complementary for cancer gene discovery, and their integration could provide an ideal and more comprehensive platform for interrogating the cancer genome (Fig. 1 ).
GENOME-WIDE DNA METHYLATION ANALYSIS
Analyzing the human genome for changes in DNA methylation is a challenging endeavor. A majority of the 28 million CpG dinucleotides in the haploid genome are located in ubiquitous repetitive sequences common to all chromosomes, which hampers determination of the precise genomic location where many DNA methylation changes occur ( 80 , 81 ). In addition, gene-associated CpG islands encompass a minor fraction of all CpG sites, and their hypermethylation therefore has only a limited effect on global 5-methylcytosine levels in cancer cell DNA ( 82 ). However, since changes in CpG island methylation can abrogate gene expression ( 83 ), identifying aberrant CpG island methylation often, but not always, identifies genes whose expression is affected during, or because of, the tumorigenic process.
RLGS was the first method to emerge as a genome-wide screen for CpG island methylation ( 84 , 85 ). In RLGS, genomic DNA is digested with rare-cutting methylation-sensitive restriction enzymes, such as Not I or Asc I. The recognition sequences for these enzymes occur preferentially in CpG islands ( 74 , 86 ), effectively creating a bias toward the assessment of DNA methylation in gene promoters. Importantly, Not I and Asc I recognition sequences rarely occur within the same island, effectively doubling the number of CpG islands interrogated for DNA methylation in any given sample ( 87 ). Following digestion, the DNA is radiolabeled and subjected to two-dimensional gel electrophoresis. DNA methylation is detected as the absence of a radiolabeled fragment, which stems from the enzymes' failure to digest a methylated DNA substrate. The main strengths of RLGS are that PCR and hybridization are not part of the protocol, allowing for quantitative representation of methylation levels and a notably low false positive rate relative to most other global methods for detecting DNA methylation. Additionally, a priori knowledge of sequence is not required ( 88 ), making RLGS an excellent discovery tool ( 89–92 ). However, RLGS is limited by the number of Not I and Asc I sites in the human genome that fall within the well-resolved region of the profile. In practice, the combinatorial analysis of both enzymes can assess the methylation status of up to 4100 landmarks ( 93 , 94 ).
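The logic of RLGS detection, in which a methylated recognition site is not cut and its landmark therefore disappears from the profile, can be illustrated with a small in silico digest. This is a minimal sketch under stated assumptions: the toy sequence, the methylated position and the function names are hypothetical, and only the Not I (GCGGCCGC) and Asc I (GGCGCGCC) recognition sequences are taken from the enzymes themselves. Radiolabeling and two-dimensional electrophoresis are not modeled.

```python
import re

NOTI = "GCGGCCGC"  # Not I recognition sequence
ASCI = "GGCGCGCC"  # Asc I recognition sequence


def find_sites(seq, motif):
    """Return 0-based start positions of every occurrence of motif."""
    # A lookahead is used so that overlapping occurrences are also reported.
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def landmarks(seq, motif, methylated_sites=frozenset()):
    """Return the sites that would be cut; methylated sites resist digestion."""
    return [p for p in find_sites(seq, motif) if p not in methylated_sites]


# Hypothetical toy sequence with two Not I sites and one Asc I site.
seq = "ATGCGGCCGCTTAAGGCGCGCCATATGCGGCCGCAA"

normal_noti = landmarks(seq, NOTI)                        # both sites cut
tumor_noti = landmarks(seq, NOTI, methylated_sites={26})  # site at 26 methylated
lost = sorted(set(normal_noti) - set(tumor_noti))

print("Not I landmarks, normal:", normal_noti)
print("Not I landmarks, tumor: ", tumor_noti)
print("Landmarks lost in tumor:", lost)
print("Asc I landmarks:        ", landmarks(seq, ASCI))
```

The landmark missing from the tumor list corresponds to the absent radiolabeled fragment described above. Running the same comparison with a second enzyme adds landmarks from islands the first enzyme does not cut.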
The Human Genome Project ( 95 ) has played a major role in the development of newer methods for DNA methylation analysis ranging from single gene, intermediate range and high throughput (e.g. 100–1000 loci/genes in 200 samples) ( 96 , 97 ), to more complete methylome coverage (array-based methods) ( 22 , 98–104 ). To allow for more in-depth discussion of these methods, we unfortunately had to exclude discussion of a number of other very effective PCR and array-based methods. Arrays originally designed for genome-wide analysis of DNA alterations have been adapted for methylation analysis. A main advantage of array platforms is their potential to increase the number of CpGs analyzed, and the technically advanced state of array analysis in general. Critical parameters for methylation arrays for analysis of human cancer include effective resolution, methylome coverage (total number of CpGs analyzed), reproducibility, ability to distinguish copy number and methylation events and accurate validation through an independent method.
Differential methylation hybridization, the first array method developed to identify novel methylated targets in the cancer genome ( 103 ), has served as a basis for many newer generation array methods. In this assay, DNA is first digested with Mse I, an enzyme that cuts preferentially outside of CpG islands, and then ligated to linker primers. The ligated DNA is subsequently digested with up to two methylation sensitive restriction enzymes, such as Bst UI, Hha I or Hpa II. Since these enzymes are four-base pair restriction endonucleases, their recognition sequence is ubiquitous in GC rich genomic regions, such as CpG islands. After the second round of enzymatic digestion, the DNA is amplified by PCR using the ligated linkers as primer binding sites. Detection of DNA methylation is accomplished by fluorescently labeling the PCR product from a test sample, such as tumor DNA, and co-hybridizing it with the PCR products derived from a control sample, such as normal tissue DNA. Aberrantly methylated fragments are refractory to the methylation-sensitive restriction endonuclease digestion, resulting in the generation of PCR products. On the other hand, an unmethylated fragment would be digested, preventing PCR amplification. Therefore, the comparison of signal intensities derived from the test and control samples following hybridization to CpG island arrays provides a profile of sequences that are methylated in one sample and not the other. One potential drawback of most methylation array methods is the need to use potentially unfaithful linker ligation and linker PCR amplification which is prone to false positives. Nevertheless, massive improvements in oligonucleotide arrays, particularly for allelic methylation analysis, hold promise to bring even greater methylome coverage to methylation array-based methods in the future ( 22 , 100–102 , 105 , 106 ).
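The comparison of test and control signal intensities described above is commonly summarized as a log ratio per array spot. The following is a minimal sketch, not the published analysis pipeline: the spot names, intensity values, pseudocount and two-fold threshold are all hypothetical choices made for illustration.

```python
import math


def log2_ratio(test_signal, control_signal, pseudocount=1.0):
    """Log2 ratio of test (tumor) to control (normal) signal for one spot."""
    # The pseudocount (an assumption) avoids division by zero on empty spots.
    return math.log2((test_signal + pseudocount) / (control_signal + pseudocount))


def call_hypermethylated(spots, threshold=1.0):
    """Return spot names whose tumor/normal log2 ratio meets the threshold.

    spots maps a spot name to a (tumor_signal, normal_signal) pair.
    A threshold of 1.0 corresponds to a two-fold difference (an assumption).
    """
    return sorted(
        name
        for name, (tumor, normal) in spots.items()
        if log2_ratio(tumor, normal) >= threshold
    )


# Hypothetical intensities for three CpG island spots.
spots = {
    "island_A": (1500.0, 300.0),  # strong tumor signal: fragment survived digestion
    "island_B": (410.0, 400.0),   # similar signal in both samples
    "island_C": (90.0, 800.0),    # weaker in tumor than in normal
}

candidates = call_hypermethylated(spots)
print(candidates)
```

A higher tumor signal indicates a fragment that resisted methylation-sensitive digestion and was therefore amplified. As the text notes, linker ligation and PCR can introduce false positives, so calls of this kind need independent validation.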
Bacterial artificial chromosome (BAC) arrays have also been successfully introduced for high-throughput DNA methylation analysis ( 22 , 98 , 107 , 108 ), and complete tiling path arrays are now available ( 98 ). In one application with BAC arrays, genomic DNA is digested with a rare cutting methylation sensitive restriction enzyme. The digested sites are filled-in with biotin, and unmethylated fragments are selected on streptavidin beads, which are then co-hybridized to the BAC array with a second reference genome. In contrast to other array methods, ligation and PCR are not used in this protocol. The use of rare cutting restriction enzymes ensures that most BACs will contain only a single site or single cluster of sites, allowing single CpG effective resolution of the methylation analysis and accurate validation. Tiling path BAC arrays can be easily adapted for use with different restriction enzymes to significantly increase the number of analyzable CpGs. However, genome coverage using restriction enzymes is limited by the presence of their recognition sequence in the targets of interest.
The particular combination of array and methylation-sensitive detection reagents is also critical for tumor methylome analysis. These reagents include methylation-sensitive restriction enzymes, 5-methylcytosine antibody, methylated DNA-binding protein columns or bisulfite-based methylation detection. Bisulfite is a chemical that allows conversion of cytosine to uracil, but leaves 5-methylcytosine unconverted ( 109 ). This method is a staple of single gene analysis and high-throughput analysis of small sets of genes ( 110 , 111 ). However, due to the significantly reduced sequence complexity of DNA after bisulfite treatment, its use for array application is more limited ( 112 , 113 ). DNA selected through methyl-binding protein columns or by 5-methylcytosine antibody-immunoprecipitation has also been applied to micro-arrays ( 107 , 114–118 ). The effective resolution of methylation using either method is dependent in part on the average DNA fragment size after random shearing, generally 500 bp to 1 kb. It is not yet clear how many methylated CpG residues are needed for productive methylated DNA-antibody binding to occur, or whether the antibody has significant sequence bias. An advantage of this approach is that it is not as limited to specific sequences as restriction enzyme-based assays. However, the large amount of DNA required for this method currently may preclude its use for DNA extracted from archival cancer specimens. Whole genome amplification after immunoprecipitation could circumvent this limitation, albeit with greater potential for sequence representation bias. The 5-methylcytosine antibody approach has been used to successfully map the methylome of Arabidopsis thaliana ( 115 , 118 ), and has been applied to human cancer cell lines ( 114 , 116 ).
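The principle of bisulfite-based detection can be shown with an in silico conversion: unmethylated cytosines are read as thymine after conversion and amplification, while 5-methylcytosines remain cytosine. This is a minimal sketch with a hypothetical sequence and hypothetical methylated positions. It models a single strand and assumes complete conversion, which real experiments do not guarantee.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Convert unmethylated C to T; leave methylated C unchanged.

    Uracil produced by bisulfite treatment is read as thymine after PCR,
    so the converted sequence is written with T.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def infer_methylation(original, converted):
    """Return positions where a cytosine in the original is still C."""
    return [
        i
        for i, (before, after) in enumerate(zip(original.upper(), converted))
        if before == "C" and after == "C"
    ]


# Hypothetical sequence; the cytosines at positions 1 and 9 are methylated.
seq = "ACGTCCGATCG"
converted = bisulfite_convert(seq, methylated_positions={1, 9})

print(converted)
print(infer_methylation(seq, converted))
```

Because most cytosines become thymine, the converted sequence is dominated by three bases. This is the reduced sequence complexity that limits the use of bisulfite-treated DNA on arrays.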
Methylation-sensitive restriction enzymes, either rare or common cutters, can theoretically provide single CpG precision/effective resolution. In practice, however, common cutters, even when applied to oligonucleotide arrays, will not yield single CpG resolution because up to 10 oligonucleotides spanning multiple common cutter sites are averaged into one value. Additionally, because protocols using common cutters require ligation and PCR ( 100 , 103 , 106 ), the distance and sequence between sites preclude a large proportion of these sites from analysis, reducing genome coverage. The restriction enzyme McrBC has also been tested for methylation detection ( 114 , 119 ), although the resolution of methylation events is undefined due to the unusual recognition site of McrBC (two methylated CpGs separated by 40–3000 bp of non-specific sequence).
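A short sketch shows why the McrBC recognition rule leaves the resolution undefined. It follows the description given in the text (two methylated CpGs separated by 40–3000 bp). The positions are hypothetical, and the gap is simplified to the difference between the two coordinates.

```python
from itertools import combinations

MIN_GAP = 40    # minimum separation in bp, as stated in the text
MAX_GAP = 3000  # maximum separation in bp, as stated in the text


def mcrbc_pairs(methylated_positions):
    """Return all pairs of methylated sites whose spacing permits cleavage."""
    positions = sorted(methylated_positions)
    return [
        (a, b)
        for a, b in combinations(positions, 2)
        if MIN_GAP <= b - a <= MAX_GAP
    ]


# Hypothetical coordinates of methylated CpGs on one fragment.
positions = [100, 120, 180, 5000]
pairs = mcrbc_pairs(positions)
print(pairs)
```

In this toy case the site at 180 can pair with either 100 or 120, so a cleavage event cannot be attributed to a single CpG. The isolated site at 5000 has no partner within range and would go undetected.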
An innovative large-scale SAGE-like sequencing method has also been employed for methylation analysis of breast cancer and the surrounding stromal cells ( 120 ). Gene expression arrays can also be used to identify methylation-related silencing of genes by focusing on silent genes that are reactivated in tumor cell lines exposed to a DNA demethylating agent ( 121–124 ).
Reduced Representation Bisulfite Sequencing, a large-scale genome-wide shotgun sequencing approach ( 125 ), has been successfully employed to investigate loss of DNA methylation in DNMT[1kd, 3a−/−, 3b−/−] ES cells. An advantage of this method is that it is amenable to gene discovery without pre-selecting targets, though sites exhibiting heterogeneous methylation might be confounding when represented by only a single sequence read. Substantially increasing the depth of sequencing may mitigate this limitation somewhat. Also, since clone libraries can be constructed, the system can be automated to maximize efficiency.
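The effect of read depth on heterogeneously methylated sites can be made concrete by estimating the methylation level of one CpG from its read counts. The sketch below uses a Wilson score interval, which is a standard statistical choice introduced here for illustration and not a method described in the source. The read counts are hypothetical.

```python
import math


def methylation_estimate(methylated_reads, total_reads, z=1.96):
    """Return (fraction methylated, lower bound, upper bound).

    The bounds form an approximate 95% Wilson score interval for z = 1.96.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = methylated_reads / total_reads
    denom = 1 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / total_reads + z * z / (4 * total_reads**2))
        / denom
    )
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# One read reports the site as fully methylated, with a very wide interval.
single = methylation_estimate(1, 1)
# Fifty reads at the same site reveal a heterogeneous site near 50%.
deep = methylation_estimate(25, 50)

for label, (p, low, high) in (("1 read", single), ("50 reads", deep)):
    print(f"{label}: estimate {p:.2f}, interval {low:.2f} to {high:.2f}")
```

With a single read the estimate can only be 0 or 1, so a half-methylated site is necessarily misreported. Deeper sequencing narrows the interval around the true level.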
Epigenome projects of normal human cells have taken a sequencing-based bisulfite strategy which gives single CpG resolution of methylation status ( 126–128 ). While these projects are not initially designed to determine the methylation status of 28 million CpGs, the efforts to date have been immense and impressive, including different cell type, and inter-individual and inter-species comparisons. These and other studies are adding to whole new disciplines within epigenetic research, including population epigenetics and comparative epigenetics. In addition to the main goals of these projects, the data will also be of substantial value for comparison with cancer methylome data, whether from arrays or from sequencing bisulfite converted DNA.
METHYLOME ANALYSIS IN THE DISCOVERY OF CANCER GENES
As discussed, no single current genome-wide DNA methylation approach can assay the entire cancer methylome. Thus, more focused and integrative approaches exploiting the cooperation between genetic and epigenetic mechanisms have been undertaken to identify new cancer genes, many with promising results. Recently, for example, transcription factor 21 ( TCF21 ) was identified as a putative tumor suppressor in head and neck and lung cancers by specifically screening a known region of LOH for aberrant DNA methylation ( 129 ). Interestingly, this gene is located in a 9.6 Mb chromosomal domain known to suppress metastasis in melanoma cell lines ( 130 ). However, no candidate gene had been proposed for this region since mutations in TCF21 are infrequent ( 130 ). A similar strategy was utilized to identify oligodendrocyte transcription factor 1 ( OLIG1 ), a frequently methylated gene and prognostic factor in human lung cancer located in a region of chromosomal loss ( 131 , 132 ), as well as for HIC1 and others ( 133 ). As with TCF21 , OLIG1 also was methylated at a much higher frequency than the existing LOH data would have suggested ( 129 , 132 ), indicating that aberrant DNA methylation is likely the main mode of inactivation for these genes in the tumor types analyzed. Other putative tumor suppressor genes also located in regions of frequent LOH, such as DLEC1 , PAX7 , PAX9 , HOXB13 and HOXB1 , have been identified via the use of affinity columns to enrich methylated DNA sequences ( 134 ). Given their specific technical limitations, these studies indicate that the integration of several experimental strategies will be required in order to maximize the discovery of new cancer-related genes. These studies illustrate the discovery potential of combined approaches, though the current cast of candidate cancer genes derived from methylation screens alone is far larger than can be discussed here.
COMPUTATIONAL ANALYSIS OF THE METHYLOME
Aberrant DNA methylation exhibits tumor-type specific patterns ( 74 ). However, it is unclear how these patterns are established and why a large number of CpG islands seem to be refractory to DNA methylation, while others are aberrantly methylated at high frequency ( 76 , 89 , 135–137 ). A functional explanation for this observation could be that all CpG islands may be equally susceptible to DNA methylation, but only a fraction are detected in tumors because of selection pressures. This hypothesis, though likely true for some genes, is unlikely to explain the mechanism responsible for aberrant methylation of all CpG island-associated genes.
Using sequence-based rules derived from cancer cell methylation data has also been explored as a way to predict the pattern of aberrant methylation in cancer genome-wide ( 138–141 ). These studies have identified consensus sequences, proximity to repetitive elements and chromosomal location as potential factors influencing or perhaps determining the likelihood of a CpG island becoming aberrantly methylated. If the sequence context in which a CpG island is located influences its likelihood of becoming aberrantly methylated, the convergence of different computational analyses is likely to find commonalities that could help explain this phenomenon. An important goal in these studies will be to distinguish sequence rules that predict pan-cancer methylation versus those that predict tumor type-specific methylation, as these rules could be mutually exclusive. An intriguing and particularly striking association between a subset of genes susceptible to aberrant promoter methylation in adult human cancers and a subset of genes occupied or marked by polycomb group proteins in human embryonic stem cells has been reported independently by three groups ( 142–144 ). These and earlier studies ( 145 , 146 ) offer important new insight into possible mechanisms by which certain genes might be susceptible to methylation in cancer, and epigenetic support for the theory that human tumors arise from tissue stem cells. Comparison of the sequences associated with polycomb group protein occupancy and those derived from the computational analysis of methylation-prone and methylation-resistant loci described above might be particularly revealing.
EPIGENOME–GENOME INTERACTIONS IN HUMAN CANCER AND MOUSE MODELS
Genetic and epigenetic mechanisms both contribute to, and likely interact during, tumorigenesis. In genetic mouse models of tumors, disruption of DNA methylation dramatically modifies the incidence of tumor formation and the spectrum of tumor types ( 147–149 ). Methylation imbalance alone is also sufficient to induce tumors in mice ( 35 , 150 ). These studies illustrate a functional role of epigenetic imbalance in tumorigenesis, and also emphasize the interaction of genetic and epigenetic mechanisms in determining tumor incidence and tumor type.
In human tumors, genetic and epigenetic mechanisms can cooperate directly or indirectly. For example, direct cooperation includes complete inactivation of tumor suppressors by methylation of one allele and either deletion or mutation of the other ( 151 , 152 ). Epigenetic mechanisms can also cause genetic alterations, and vice versa. For example, aberrant methylation-associated silencing of MLH1 leads to microsatellite instability in colon cancer ( 52 , 53 ). Similarly, methylation and silencing of the MGMT gene, which encodes a DNA repair enzyme, is significantly associated with G:C to A:T transition mutations in the tumor suppressor gene p53 in colorectal tumors ( 153 ). Indirectly, aberrant loss of methylation in the pericentromeric regions of chromosomes 1 and 16, followed by cell division, is associated with abnormalities of these chromosomes, including loss and gain of whole chromosome arms, in cancer and in ICF syndrome patients. Alternatively, translocations of PML and Retinoic Acid Receptor can create a fusion protein that abnormally recruits DNA methyltransferases and causes aberrant methylation at specific promoters in leukemia ( 154 ). More global epigenetic defects described as a CpG island methylator phenotype ( 155 , 156 ) are tightly associated with genetic mutations of the oncogene BRAF, potentially suggesting a common genetic–epigenetic course for these tumors. This candidate gene approach suggests there are important interactions between these two major mechanisms of tumorigenesis, but the extent to which these individual observations can be extrapolated to the whole cancer genome is unknown. Efforts that integrate different technologies, as described above, promise a more complete understanding of the genomic and epigenomic contribution to tumorigenesis.
Conflict of Interest statement. None declared.