Introduction

Phenotypic differences exist between endothelial cells located within different extraocular vascular beds [1]. This phenomenon of “endothelial cell heterogeneity” [2] may also be observed in the human eye. Using oligonucleotide microarrays with probe sets for 8,746 transcripts, we demonstrated that approximately 10% of genes were significantly differentially expressed by donor-matched cultured human retinal and choroidal endothelial cells [3]. Consistent with these findings, we subsequently showed that protein expression differed between human retinal and choroidal endothelial cells; when retinal and choroidal endothelial protein extracts were studied by two-dimensional difference gel electrophoresis, 25% of the 123 protein spots that were analyzed contained significantly different amounts of material derived from the retinal versus choroidal endothelial cells [4].

Differential gene expression by ocular endothelial subpopulations is likely controlled by multiple mechanisms including the interaction of transcription factors with their cis-regulatory motifs. Furthermore, it is expected that genes with similar expression patterns would have common motifs. This relationship is not simple, as gene expression is usually controlled by combinations of transcription factors acting via groups of cis-regulatory motifs, which are referred to as cis-regulatory modules [5]. Nonetheless, this provides a mechanism whereby activity of one or more transcription factors could result in cell type-specific expression of a subset of genes. A corollary to this statement is that existence of motifs or modules that are significantly more abundant in genes preferentially expressed in one cell type implicates activity of the respective transcription factor(s) in the regulation of gene expression of that cell specifically.

New in silico methods provide one avenue for comparing the regulation of gene expression between different cell types (or different states of the same cell). Biostatistics software programs that link to bioinformatics databases allow the investigator to identify and compute the abundance of cis-regulatory motifs or modules within the promoter region of any given gene. TRANSFAC [6], a database first conceived by Wingender [7] in 1988, is arguably the most comprehensive and widely used assembly of transcription factors and their respective cis-regulatory elements and regulated genes, with built-in search tools [8, 9]. CisModule is a recently published algorithm, best described as a “Bayesian module sampler,” that was developed for “discovery” of novel—or known—cis-regulatory modules and constituent motifs [10]. Although the literature does not contain published reports of the application of this computational approach to the investigation of human ocular endothelial cells, Waleev et al. [11] used a similar approach to study changes in gene regulation in human umbilical vein endothelial cells in response to stimulation with tumor necrosis factor alpha [12]. In this study, we performed analyses using TRANSFAC and CisModule to identify cis-regulatory elements and modules in genes that were differentially expressed by retinal versus choroidal endothelial cells.

Materials and methods

Gene expression microarray data

We analyzed a microarray data set that had been generated by comparing gene expression of resting retinal and choroidal endothelial cells from eyes of three human cadaver donors using Affymetrix GeneChip Human Genome Focus Arrays (two technical replicates/cell type/donor; technical replicates were generated by preparing duplicate cultures of endothelial cells, from which RNA was extracted separately for hybridization to independent arrays). These data may be found in the Gene Expression Omnibus repository [13] (GSE7850: see retinal/choroidal endothelial cell_human 4/5/6_unstimulated_replicate1/2). Information about the experimental procedure and data normalization and processing was previously published [3]. Significant differences in gene expression between retinal and choroidal endothelial cells were determined using the significance analysis of microarrays [14] with the false discovery rate [15] set at 5%; transcripts identified by this method were further filtered to select for those that demonstrated a unidirectional and twofold or greater change across all three donors. This filtering of the data was performed to enrich the data set for genes that were clearly differentially expressed by retinal and choroidal endothelial cells, because genes that were not differentially expressed were not expected to influence the analysis. Alternative approaches would have been to study the entire group of expressed genes, including genes that were not differentially expressed, or a subset of these genes that had been chosen at random. However, manual determination of promoter sequences would have been impractical had we taken either of these approaches. We preferred to determine the promoter sequences manually (see “Collection of promoter sequences”), as this allowed us to define the length and position of the promoter sequence and, in addition, ensured that the transcription start site (TSS) was correctly identified.
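The secondary filter described above (a unidirectional, twofold or greater change in every donor) can be illustrated with a minimal sketch. It assumes that the significance analysis of microarrays has already been applied and that each transcript is summarized by one retinal/choroidal expression ratio per donor; the function name, the ratio convention, and the example values are hypothetical and are not taken from the original analysis.

```python
def consistent_twofold(ratios, threshold=2.0):
    """Classify one transcript from its per-donor retinal/choroidal ratios.

    Returns "retinal" if every donor shows at least `threshold`-fold higher
    retinal expression, "choroidal" if every donor shows at least
    `threshold`-fold higher choroidal expression, and None otherwise.
    The ratio convention (retinal divided by choroidal) is an assumption.
    """
    if all(r >= threshold for r in ratios):
        return "retinal"
    if all(r <= 1.0 / threshold for r in ratios):
        return "choroidal"
    return None


# Hypothetical ratios for three donors (illustrative values only).
examples = {
    "transcript_1": [2.5, 3.0, 2.1],   # higher in retinal cells in all donors
    "transcript_2": [0.4, 0.3, 0.5],   # higher in choroidal cells in all donors
    "transcript_3": [2.5, 0.4, 3.0],   # direction differs between donors
    "transcript_4": [2.5, 1.8, 3.0],   # one donor below twofold
}
for name, ratios in examples.items():
    print(name, consistent_twofold(ratios))
```

A transcript is kept only when all donors agree in direction and magnitude, which is why the third and fourth hypothetical transcripts are rejected.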

Collection of promoter sequences

Human gene sequences were identified on the nucleotide website of the National Center for Biotechnology Information [16]. For each gene, nucleotide sequences, including 2,000 bp of the 5′ flanking promoter region and 1,000 bp downstream of the TSS, were collected. If a particular gene encoded transcript variants with different TSSs, the defined 3,000 bp sequences for all variants were recorded. Sequences were stored as text files, with the positions of TSS, ATG start codon, exon(s), and intron(s) annotated.
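Although the sequences in this study were collected manually, the coordinate arithmetic for a window of 2,000 bp upstream and 1,000 bp downstream of the TSS can be sketched as follows. The function names are hypothetical, the TSS is assumed to be given as a 0-based position on the forward strand, and counting the TSS base itself as the first downstream base is an assumption of this sketch rather than a stated convention of the study.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_window(chrom_seq, tss, strand, upstream=2000, downstream=1000):
    """Extract the sequence around a TSS, oriented 5' to 3' for the gene.

    `tss` is a 0-based index into `chrom_seq` (forward-strand coordinates).
    The returned window has the TSS base at index `upstream`.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, "upstream" lies at higher forward coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends beyond the available sequence")
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


# Toy example with a short sequence and a 3 bp / 2 bp window.
toy = "AAACCCGGGTTT"
print(promoter_window(toy, tss=6, strand="+", upstream=3, downstream=2))
print(promoter_window(toy, tss=5, strand="-", upstream=3, downstream=2))
```

For a gene with several transcript variants, the same function would simply be called once per TSS, giving one window per variant.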

Identification of cis-regulatory motifs

Before analysis of the promoter sequences, regions that contained interspersed repeats or were of low complexity were identified and masked using RepeatMasker [17]. The modified sequences were evaluated for the presence and abundance of known cis-regulatory motifs, using TRANSFAC Professional v11.4 (Biobase GmbH, Germany), with Match, to perform a matrix-based search. For any motif identified in this way, relative abundance was compared for genes that were highly expressed in retinal versus choroidal endothelial cells by the Fisher’s exact test, with a p value below 0.05 defining a significant difference in abundance.
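For readers unfamiliar with Fisher's exact test, the following is a minimal pure-Python sketch of the two-sided test on a 2 x 2 table. The layout of the table is an assumption: rows are taken as the two gene sets (retinal versus choroidal) and columns as genes with or without a given motif, which is one plausible arrangement and not necessarily the one used in the original analysis. The gene counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]].

    The p value is the sum of hypergeometric probabilities of all tables
    with the same margins that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Hypothetical counts: genes with / without a motif in each gene set.
retinal_with, retinal_without = 20, 47
choroidal_with, choroidal_without = 8, 55
p = fisher_exact_two_sided(retinal_with, retinal_without,
                           choroidal_with, choroidal_without)
print("significant at 0.05:", p < 0.05)
```

With the threshold used in this study, a motif would be reported as differentially abundant when the returned p value is below 0.05.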

Identification of cis-regulatory modules

We used CisModule [18] to predict potential cis-regulatory motifs and modules in the same modified promoter sequences that were studied in the previous analysis. CisModule is based on a hierarchical mixture model that recognizes the existence of the cis-regulatory module as a series of transcription factor binding sites within short genomic sequences and acting in concert to regulate gene expression; the algorithm, which uses Bayesian inference, involves simultaneous sampling of a promoter sequence for both putative modules and motifs [10]. Consequently, CisModule may be employed to identify both known and novel cis-regulatory elements and to predict cis-regulatory modules, by analysis of a group of co-regulated genes. We ran each set of promoter sequences separately through CisModule a total of three times with parameters set as follows: up to five motifs per module, motif length of 8 to 14 bp, and module length of up to 200 bp. The run with the highest posterior model odds for each cell subtype was selected for further analysis by TRANSFAC Professional, to identify cis-regulatory motifs.

Results

Promoter sequences

In the gene expression data set, 73 and 66 transcripts of the total of 8,746 transcripts represented on the array were present at significantly higher levels in retinal and choroidal endothelial cells, respectively. We retrieved genomic sequence encompassing 2,000 bp of the 5′ flanking promoter region and 1,000 bp downstream of the TSS for 67 of the 73 genes that were highly expressed in retinal cells and 63 of the 66 genes that were highly expressed in choroidal cells. As we also retrieved the sequences for any genes encoding transcript isoforms with alternative TSSs, we collected, in total, 91 and 76 promoter sequences corresponding to gene transcripts that were present in relative abundance in retinal and choroidal endothelial cells, respectively. The differentially expressed genes are identified in Supplementary Tables 1 and 2.

Comparison of cis-regulatory motifs in genes differentially expressed by retinal and choroidal endothelial cells

The Match program of TRANSFAC Professional returned a total of 5,756 cis-regulatory motifs, corresponding to 159 different transcription factors, from the 91 retinal endothelial sequences, and a total of 4,973 cis-regulatory motifs, corresponding to 158 different transcription factors, from the 76 choroidal endothelial sequences. To minimize spurious identification of motifs, we filtered our search to identify only motifs with a “core match” of one; for any given potential binding site of 7 to 15 bp, an invariable “core” 5-bp sequence had to be present. In this fashion, we reduced the number to 2,879 motifs in retinal endothelial sequences and 2,597 motifs in choroidal endothelial sequences, corresponding to 146 and 147 transcription factors, respectively. A total of 135 transcription factors were identified in both retinal and choroidal endothelial sequences, while 23 transcription factors were identified in either retinal or choroidal endothelial sequences, but not both. Transcription factors and corresponding motif consensus core sequences (defined by International Union of Pure and Applied Chemistry [IUPAC] nucleotide code) [19] that were most abundant in retinal endothelial promoter sequences were (1) zinc finger 5 (ZF5), CGCGC (n = 140); (2) avian myeloblastosis viral oncogene homolog (v-Myb), AACGG (n = 112); (3) Hand1:E47, TCTGG (n = 91); (4) epidermal growth factor receptor-specific transcription factor (ETF), GGCGG (n = 87); and (5) transcription factor AP-2 (TFAP2), CMSGC (n = 72). Transcription factors and motif consensus core sequences that were most abundant in choroidal endothelial promoter sequences were (1) ZF5, CGCGC (n = 179); (2) ETF, GGCGG (n = 106); (3) v-Myb, AACGG (n = 89); (4) MYC-associated zinc finger protein (MAZ), GGGAG (n = 75); and (5) Hand1:E47, TCTGG (n = 71). Table 1 shows the distribution of binding sites by strand and by location. The transcription factor binding sites were distributed approximately 1:1 between the forward and reverse strands and approximately 3:2 between the 2,000 bp upstream and the 1,000 bp downstream of the TSS.
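To make the notion of an invariable core concrete, the sketch below scans a sequence on both strands for exact matches to a 5-bp core written in IUPAC nucleotide code, using the ZF5 core (CGCGC) and the TFAP2 core (CMSGC) quoted above. This is a deliberate simplification: Match scores candidate sites against position weight matrices, whereas this sketch only tests the core by pattern matching. All function names and the toy sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def core_sites(seq, core):
    """Return (forward, reverse) lists of 0-based core match positions.

    Positions are given in forward-strand coordinates. Overlapping matches
    are reported because the pattern is wrapped in a lookahead. Masked
    bases (N) in the sequence never match.
    """
    seq = seq.upper()
    body = "".join(IUPAC[base] for base in core.upper())
    pattern = re.compile("(?=(" + body + "))")
    forward = [m.start() for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    reverse = sorted(
        len(seq) - m.start() - len(core) for m in pattern.finditer(rc)
    )
    return forward, reverse


print(core_sites("TTCGCGCAA", "CGCGC"))   # ZF5 core on the forward strand
print(core_sites("AAGCGCGTT", "CGCGC"))   # ZF5 core on the reverse strand
print(core_sites("ACCGGCT", "CMSGC"))     # degenerate TFAP2 core
```

Tallying the two returned lists separately gives the kind of forward/reverse strand breakdown summarized in Table 1.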

Table 1 Distribution of cis-regulatory motifs in promoter sequences of genes that are differentially expressed by human retinal and choroidal endothelial cells

We used Fisher’s exact test to compare the abundance of cis-regulatory motifs, corresponding to each of the transcription factors identified in the previous analysis, in the promoter regions of differentially expressed retinal and choroidal endothelial genes. Before testing, for genes having transcript variants with different TSSs, we collapsed all isoforms into one entry by taking the maximum frequency of each motif identified for any isoform. For five transcription factors, motifs were significantly more abundant in retinal endothelial promoter sequences. Transcription factors and corresponding motif consensus core sequences (defined by IUPAC nucleotide code) [19] were (1) glucocorticoid receptor (GCCR; p = 0.015), GTTCT; (2) high mobility group AT-hook 1 (HMGIY; p = 0.022), GGAAA; (3) heat shock transcription factor 1 (HSF1; p = 0.025), AGAAY; (4) p53 (p = 0.025), CATGY; and (5) vitamin D receptor (VDR; p = 0.024), GGGTS and TGAMC. Motifs recognized by three transcription factors were significantly more abundant in choroidal endothelial promoter sequences. These transcription factors and motif consensus core sequences were (1) transcription factor E2F (E2F; p < 0.001), TGGCG and CGGCA; (2) Yin Yang 1 (YY1; p = 0.001), GCCAT; and (3) ZF5 (p = 0.006), CGCGC. In addition, the analysis identified motifs recognized by TFAP2 as being more abundant in retinal endothelial promoter sequences, and, simultaneously, it identified motifs recognized by TFAP2-alpha as being more abundant in choroidal endothelial promoter sequences; as TFAP2-alpha is a member of the TFAP2 family, we attributed this result to a database anomaly and excluded it from further consideration. Supplementary Table 3 presents the total number of motifs in retinal and choroidal endothelial promoter sequences corresponding to the eight transcription factors identified as described above. Figure 1 illustrates graphically the relative frequencies of motifs recognized by these transcription factors.
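The isoform-collapsing step described above can be sketched as follows: for each gene, the count retained for a motif is the maximum observed across that gene's transcript variants. The data structure, the function name, and the gene and isoform labels are hypothetical and serve only to illustrate the rule.

```python
from collections import defaultdict


def collapse_isoforms(counts_by_isoform):
    """Collapse per-isoform motif counts into one entry per gene.

    `counts_by_isoform` maps (gene, isoform) to a dict of motif counts.
    For each gene and motif, the maximum count over isoforms is kept.
    """
    collapsed = defaultdict(dict)
    for (gene, _isoform), motif_counts in counts_by_isoform.items():
        for motif, count in motif_counts.items():
            previous = collapsed[gene].get(motif, 0)
            collapsed[gene][motif] = max(previous, count)
    return dict(collapsed)


# Hypothetical counts for one gene with two TSSs and one gene with one TSS.
counts = {
    ("GENE_A", "variant_1"): {"ZF5": 3, "E2F": 1},
    ("GENE_A", "variant_2"): {"ZF5": 1, "E2F": 2, "YY1": 1},
    ("GENE_B", "variant_1"): {"VDR": 2},
}
print(collapse_isoforms(counts))
```

Taking the maximum, rather than the sum, prevents genes with several annotated TSSs from being counted more than once when the two gene sets are compared.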

Fig. 1

Relative frequency of cis-regulatory motifs that were significantly over-represented within the promoters of genes differentially expressed by retinal versus choroidal endothelial cells. Frequency was determined with respect to total number of cis-regulatory motifs within all genes studied for each cell subtype. Motifs are identified according to the corresponding transcription factors (GCCR glucocorticoid receptor, HMGIY high mobility group AT-hook 1, HSF1 heat shock transcription factor 1, VDR vitamin D receptor, E2F transcription factor E2F, YY1 Yin Yang 1, ZF5 zinc finger 5)

Identification of putative cis-regulatory modules in genes differentially expressed by retinal and choroidal endothelial cells

We next employed CisModule to identify potential cis-regulatory modules, composed of up to five cis-regulatory motifs, in the promoter regions of differentially expressed retinal and choroidal genes. The algorithm detected 220 representations of a putative module in retinal endothelial sequences and 441 representations of a putative module in choroidal endothelial sequences. The predicted cis-regulatory modules for promoter sequences in genes highly expressed by retinal or choroidal endothelial cells were different. Table 2 shows the predicted modules for retinal and choroidal sequences and possible identities for respective constituent cis-regulatory motifs, obtained by submitting the motif sequences to TRANSFAC Professional. Nine of the ten predicted motifs were identified, and there was one novel motif. Among the 67 genes that were more highly expressed in retinal endothelial cells, 54 genes (81%) were identified as containing one or more of the predicted modules, while 63 of 63 genes (100%) that were more highly expressed in choroidal endothelial cells were similarly identified. Table 3 presents these data and indicates the number of motifs that composed the predicted retinal and choroidal endothelial modules. Supplementary Tables 4 and 5 give the position weight matrices of the predicted motifs with consensus sequences, and Fig. 2 shows the sequence logos. A sequence logo is a graphical representation of a sequence; at any given position, the height of each letter within a stack indicates the relative frequency of the corresponding nucleotide, and the total height of the stack at any given position indicates how informative that position is within the binding site [20].
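The relationship between a position weight matrix column and the stack drawn in a sequence logo can be illustrated with a short sketch. It assumes the standard information-content definition for DNA (2 bits minus the Shannon entropy of the column) and omits the small-sample correction that some logo software applies; the function name and the example columns are hypothetical and are not taken from the Supplementary Tables.

```python
from math import log2


def logo_heights(column):
    """Return (stack height in bits, per-letter heights) for one position.

    `column` maps each nucleotide to its relative frequency (summing to 1).
    Stack height is 2 - H, where H is the Shannon entropy of the column;
    each letter's height is its frequency times the stack height.
    """
    entropy = -sum(f * log2(f) for f in column.values() if f > 0)
    stack = 2.0 - entropy
    return stack, {base: f * stack for base, f in column.items()}


# Hypothetical columns: fully conserved, two-way split, and uninformative.
conserved = {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0}
split = {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
for name, col in [("conserved", conserved), ("split", split), ("uniform", uniform)]:
    print(name, logo_heights(col))
```

A fully conserved position therefore reaches the maximum of 2 bits, whereas a position at which all four nucleotides are equally frequent contributes no height to the logo.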

Fig. 2

Sequence logos for predicted cis-regulatory modules in (a) genes that were expressed at significantly higher levels in retinal endothelial cells and (b) genes that were expressed at significantly higher levels in choroidal endothelial cells

Table 2 Predicted cis-regulatory modules and constituent motifs, with possible identities of corresponding transcription factors as determined by TRANSFAC Professional, for genes that were highly expressed by human retinal or choroidal endothelial cells
Table 3 Number of predicted cis-regulatory modules and motifs in promoter sequences of genes that were differentially expressed by human retinal and choroidal endothelial cells

Discussion

Previously, we compared the relative abundance of gene transcripts between endothelial cells isolated from retina and choroid by microarray, to characterize vascular endothelial diversity within the eye [3]. In this study, we utilized a computational approach to identify cis-regulatory motifs and transcription factors that might account for the differences in gene expression we observed. Particular cis-regulatory motifs may cluster together within a promoter region as modules that control cell-specific gene expression on the binding of transcription factors [5]. Using the TRANSFAC database and the CisModule software, we have identified a number of cis-regulatory motifs and modules that appear to regulate gene expression in either retinal or choroidal endothelial cells. Identification of such promoter elements may have implications for the pathogenesis of diseases that specifically involve these endothelial cells, including proliferative diabetic retinopathy and neovascular age-related macular degeneration (AMD).

Mitochondrial dysfunction, overproduction of vascular endothelial growth factor (VEGF), and endothelial cell proliferation are key events in the pathogenesis of choroidal neovascularization in AMD [21, 22]. Interestingly, cis-regulatory motifs for E2F, YY1, and ZF5 are prevalent in promoters of genes that are preferentially expressed in choroidal endothelial cells. All three transcription factors could potentially influence, directly or via a signaling pathway, one or more of the aforementioned events. When neovascularization is induced in the E2F1 gene-deficient mouse, either by hind limb ischemia or tumor challenge, vascular proliferation is exacerbated, as a result of the production of excess VEGF; E2F1 interacts with p53 under hypoxic conditions to repress VEGF transcription [23]. In the mouse, ZF5 represses transcription of the myc gene [24], whose product regulates the activity of specific members of the E2F family of proteins to induce either cell proliferation or cell death [25]. Loss of cellular YY1 leads to a diminution of mitochondrial oxidative function via a pathway involving the mammalian target of rapamycin (mTOR) [26]. Overexpression of mTOR augments the activity of hypoxia-inducible factor (HIF)-1 alpha and the secretion of VEGF under conditions of low oxygen tension, thereby increasing the growth of new vessels [27]. Dysregulation of the transcription factors that interact with E2F, YY1, and ZF5, or polymorphisms within the cis-regulatory motifs themselves, might influence the onset or progression of neovascular AMD.

cis-Regulatory motifs that are more prevalent in the promoters of genes preferentially expressed in retinal endothelial cells (i.e., GCCR, HMGIY, HSF1, p53, and VDR) interact with transcription factors that regulate cell proliferation, apoptosis, aging, and the stress response; all these processes are pertinent to neovascularization in diabetic retinopathy. The transcription factor HSF1 is regulated in response to cellular stress and promotes the removal of accrued unfolded peptides from the endoplasmic reticulum [28]. This process may be perturbed in diabetes mellitus [29]. As proliferative diabetic retinopathy involves abnormal growth of retinal blood vessels, it is intriguing that a low level of HMGA1 leads to a decrease of cell-surface insulin receptors in patients with insulin resistance and type II diabetes [30]. Moreover, an overproduction of HMGA1 protein can lead to cell proliferation and inhibition of apoptosis mediated by p53 [31]. p53 has many activities, including induction of apoptosis in cells with damaged DNA, anti-oxidative effects with implications for the reduction of oxidative damage associated with aging, and anti-angiogenic effects via inhibition of HIF-1 alpha [32, 33]. The carboxyl region of VDR binds the active form of vitamin D, while the amino region binds to the cis-regulatory motif [34]. Activation of VDR by vitamin D prevents the proliferation of endothelial cells isolated from tumors [35], and, consistent with this observation, vitamin D treatment inhibits angiogenesis in a mouse model of retinoblastoma [36]. Interestingly, a polymorphism within the VDR gene, believed to alter the activity of the transcription factor, is associated with type I diabetes [37]. Like VDR, GCCR is a hormone-activated transcription factor [38]. Activation of GCCR also suppresses neoplasia-associated angiogenesis in vivo, an effect that results from decreased levels of VEGF and interleukin-8 [39]. Interestingly, both VDR and GCCR are differentially expressed by different extraocular endothelial cell subpopulations [40, 41].

As with any in silico research, it will be important to confirm findings in in vitro and/or in vivo biological systems. For example, one might construct reporter vectors containing multiple copies of a cis-regulatory motif or a combination of these motifs for transfection of retinal and choroidal endothelial cells, to evaluate cell-specific interactions of transcription factor(s) and promoter element(s). It is also important to be aware that the output from a computational study of this nature depends very much on the selection of sequences for analysis. For example, we initially performed analysis of the same genes that were used for the present study, with Promoter Analysis and Interaction Network Tool (PAINT) v3.5 [35], an alternative program for identifying cis-regulatory motifs within a genomic sequence. PAINT predicts the TSS position along the gene sequence and identifies cis-regulatory motifs upstream of this forecasted site [42]. In the pilot study with PAINT, we examined the 2,000-bp region upstream of the predicted TSS and obtained a different set of cis-regulatory motifs from those identified in this project (data not shown). In the current study, we analyzed both 2,000 bp upstream and 1,000 bp downstream of the TSS, as sequences downstream of the TSS, including intronic sequences, may contain cis-regulatory elements [43]. In addition, we more accurately determined the position of the TSS by manually collecting promoter sequences from the literature. The same general principles apply to the prediction of cis-regulatory modules by CisModule. An apparent contradiction is that cis-regulatory motifs recognized by ZF5 were more abundant in genes highly expressed by choroidal endothelial cells and yet were also present within a predicted retinal endothelial cis-regulatory module (Table 2). However, the result of the CisModule analysis implies that ZF5 may preferentially regulate gene expression in retinal endothelial cells only when acting in concert with other transcription factors. One novel putative motif was identified within the retinal cis-regulatory module (Table 2). If nuclear protein extract isolated from retinal endothelial cells specifically binds this novel motif, it would be of immediate interest to identify the corresponding transcription factor(s).

Identifying and characterizing retinal and choroidal endothelial cell-specific gene expression may have implications for the treatment of vascular diseases of the eye. Regarding the proliferating choroidal endothelial cells responsible for neovascular AMD, the logical goal of treatment is to inhibit division and/or induce apoptosis of these cells alone. The same reasoning applies to retinal endothelial cells and diabetic retinopathy. Delivery of a therapeutic transgene under the control of a promoter with cis-regulatory motifs that are preferentially activated in one endothelial population could potentially enhance gene expression exclusively in those cells, avoiding detrimental effects to neighboring non-diseased cells, including other endothelial populations. Retinal cell-specific promoters and enhancers have been characterized and used to achieve cell-specific expression within the retina [44, 45]. Endothelial cell-specific promoter elements isolated from the human VEGF-R1 gene effectively target transgene expression to the ocular vasculature when injected intravitreally [46], and enhancer elements that specifically limit expression to proliferating endothelial cells have been identified [47]. A promoter construct that combines specific retinal or choroidal endothelial cis-regulatory motifs and element(s) that initiate expression in proliferating cells could provide a safe and effective new treatment for proliferative diabetic retinopathy or neovascular AMD, respectively.