Introduction

A 53/58-kDa substrate of the insulin receptor tyrosine kinase (IRSp53/58) was originally identified in hamster cells through biochemical studies after insulin and/or IGF-I treatment (Yeh et al. 1996). It is phosphorylated upon stimulation with insulin and/or IGF-I, but differs from other members of the well-known insulin receptor substrate group (namely, human IRS1, IRS2 and IRS4) in conserved amino-acid sequence motifs and other features (Hubbard and Till 2000). The human homologue was identified as a binding partner of the DRPLA protein (Okamura-Oho et al. 1999); CAG triplet repeat expansion in the coding region of the DRPLA gene causes a neurodegenerative disorder, dentatorubral-pallidoluysian atrophy (Naito and Oyanagi 1982; Nagafuchi et al. 1994a, 1994b; Koide et al. 1994). The human homologue was also identified as a binding partner of a serpentine receptor, brain-specific angiogenesis inhibitor 1 (BAI1), and named BAI1-associated protein 2 (BAIAP2) (Oda et al. 1999). The human homologue not only shows sequence similarity to hamster IRSp53/58, but has also been demonstrated to be phosphorylated upon stimulation with insulin and/or IGF-I (Okamura-Oho et al. 1999). IRSp53 is now recognized as a key factor in cytoskeletal reorganization: it functions as an adaptor that binds Rho family GTPases (Rho, Rac and cdc42) and their effectors (mDia, WAVE2 and Mena), and mediates the activation of these molecules (Miki et al. 2000; Krugmann et al. 2001; Miki and Takenawa 2002). The cdc42 protein controls the formation of actin bundles in membrane ruffling and filopodia formation at the cellular periphery. IRSp53 is also known to localize to the postsynaptic density in the central nervous system, which suggests a role in neurite outgrowth (Abbott et al. 1999; Soltau et al. 2002).

To date, at least four isoforms of IRSp53 have been identified in humans. We identified IRSp53-L and IRSp53-S, consisting of 552 and 521 amino acid residues, respectively, as binding partners of the DRPLA protein (Okamura-Oho et al. 1999). Oda and co-workers (1999) identified two isoforms, named BAIAP2-α and -β, which were composed of 521 and 520 amino acids, respectively. BAIAP2-α is identical to IRSp53-S, while BAIAP2-β is unique. Accordingly, we use IRSp53-T in this report rather than BAIAP2-β. The fourth isoform (IRS-58), with 534 amino acid residues, was identified during the cloning of binding partners of cdc42 (Govind et al. 2001). As the relationship between the protein isoforms of 53 or 58 kDa and the mRNA isoforms is still uncertain, IRS-M is used in this report for this isoform. The four mRNA isoforms have been repeatedly confirmed by RT-PCR, both by others and by ourselves, and are represented in many expressed sequence tags (ESTs). The four IRSp53 transcripts generate respective protein isoforms that share an identical 511 amino acid residues from the N-terminus and differ only in short peptide sequences at the C-terminus. Each isoform has distinct functions; for example, IRSp53-L and -S were phosphorylated upon insulin but not IGF-I stimulation in transfected cultured cells, while only the T-form was phosphorylated upon IGF-I stimulation (Okamura-Oho et al. 2001). Thus, the unique short peptide sequences at the C-terminus have a vital role in function, probably by regulating accessibility to functional sites through intra-molecular binding. This is important because functional analyses with IRSp53 expression vectors have produced several discordant results. These isoforms are presumed to be generated by alternative splicing, but this has not yet been proved. Here, we report experimental and computational studies showing that the four isoforms are indeed generated by alternative splicing. This study of the human and rodent genomes also resolves the question of whether rodents lack one of the isoforms (the L-form).
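The lengths quoted above imply the size of each isoform-specific C-terminal tail. As a minimal sketch (the dictionary and function names are illustrative and not part of the original study), the tail lengths follow from subtracting the shared 511 N-terminal residues from each total:

```python
# Total isoform lengths (amino acid residues) as quoted in the text.
ISOFORM_LENGTH = {"L": 552, "S": 521, "T": 520, "M": 534}

# Residues shared by all four isoforms, counted from the N-terminus.
SHARED_N_TERMINAL = 511


def unique_tail_length(isoform: str) -> int:
    """Return the number of isoform-specific C-terminal residues."""
    return ISOFORM_LENGTH[isoform] - SHARED_N_TERMINAL


if __name__ == "__main__":
    for name in sorted(ISOFORM_LENGTH):
        print(name, unique_tail_length(name))
```

The calculation assumes that the quoted totals and the 511-residue common region refer to the same numbering, which the text implies but does not state explicitly.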

Materials and methods

DNA analyses

Human genomic DNA was isolated from peripheral leukocytes by the standard phenol/chloroform extraction method. Genomic DNA of mouse and Sprague-Dawley (SD) rat was prepared from tail tissues with the DNeasy Tissue Kit (QIAGEN, Hilden, Germany). Polymerase chain reaction (PCR) was conducted as previously described (Tadokoro et al. 1992). Briefly, the reaction mixture consisted of 10 ng genomic DNA, 0.5 μM primers, 200 μM of each dNTP and 0.5 U Taq DNA polymerase (Takara, Shiga, Japan) in standard reaction buffer. The PCR conditions consisted of one cycle at 94 °C for 4 min; 30 cycles of denaturation at 94 °C for 1 min, annealing at 56 °C for 1 min and extension at 72 °C for 4 min; and a final step at 72 °C for 6 min to ensure complete extension. The following primer sets were used to amplify the exon 16-AATK region: mouse 5′-TGCAGTCCTGTGCCTTGCGA-3′ (forward) and 5′-AGAGATGCCCTCTGCAGGGTAGT-3′ (reverse); rat 5′-CAGGAATCCCTTCGCCAACGTC-3′ (forward) and 5′-AGATGCCCTCTGCAGGGTAGT-3′ (reverse). PCR products were purified with QIAEX II (QIAGEN, Hilden, Germany) and subjected to direct sequence analysis with the CEQ2000 Dye Terminator Cycle Sequencing Quick Start Kit (Beckman Coulter, Fullerton, Calif., USA) and a Beckman CEQ2000 automatic sequencer (Beckman Coulter, Fullerton, Calif., USA). Sequence analyses were performed with Genetyx-Win software (Genetyx, Tokyo, Japan).
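For readers who wish to check the primers and the thermal profile, the following is a minimal sketch and not part of the original protocol. The primer sequences and step durations are copied from the text; the helper names are hypothetical, and the time total counts programmed hold times only, ignoring ramping between temperatures.

```python
# Primer sequences (5'->3') copied from the text.
PRIMERS = {
    ("mouse", "forward"): "TGCAGTCCTGTGCCTTGCGA",
    ("mouse", "reverse"): "AGAGATGCCCTCTGCAGGGTAGT",
    ("rat", "forward"): "CAGGAATCCCTTCGCCAACGTC",
    ("rat", "reverse"): "AGATGCCCTCTGCAGGGTAGT",
}

# Thermal profile from the text, in minutes.
INITIAL_DENATURATION_MIN = 4
CYCLES = 30
CYCLE_STEPS_MIN = (1, 1, 4)  # denaturation, annealing, extension
FINAL_EXTENSION_MIN = 6


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def total_hold_minutes() -> int:
    """Sum of programmed hold times (ramping is not included)."""
    return (INITIAL_DENATURATION_MIN
            + CYCLES * sum(CYCLE_STEPS_MIN)
            + FINAL_EXTENSION_MIN)


if __name__ == "__main__":
    for (species, direction), seq in PRIMERS.items():
        print(species, direction, len(seq), round(gc_fraction(seq), 2))
    print("hold time (min):", total_hold_minutes())
```

Note that the rat reverse primer is the mouse reverse primer shortened by two bases at its 5′ end, so both anneal to the same conserved site.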

Results and discussion

Genomic organization of human IRSp53

When we started this study, few genomic sequences for IRSp53 were available in public databases, which made it impossible to determine the genomic organization of the IRSp53 gene by computational analyses alone. We therefore attempted to isolate genomic clones, especially those covering the region encoding the C-terminus, with primer pairs designed from the cDNA sequences. As the order of the exons and their boundaries were unknown at that time, multiple combinations of primers were used in an attempt to clone intronic sequences. Several primer sets successfully generated DNA fragments by PCR, and their nucleotide sequences were determined; some of these were deposited in a public database (see Comments). As more genomic sequences were deposited in public databases with the progress of the Human Genome Project, it became easier to identify genomic sequences covering the IRSp53 gene by BLAST searches with the cDNA sequences as well as with the genomic sequences we had determined. The representative clones and sequences covering the IRSp53 gene turned out to be RP11-149I9 (AC115099) and RP13-1277B16 (AC129919). To date, these sequences carry no annotation for IRSp53.

Comparison of the cDNA (accession numbers NM_017450, NM_017451 and NM_006340 for the S-, L- and T-forms) and genomic sequences showed that the human IRSp53 gene spanned about 82 kb and consisted of 17 exons (Fig. 1). Except for the transcriptional termini (exons 14, 16 and 17), all the exon-intron boundaries conformed to the consensus GT-AG rule (Fig. 2). The part common to the four isoforms was encoded by exons 1 through 13 (Fig. 3). The S-form continued into exon 14 and ended with a polyadenylation (polyA) signal. The nucleotide sequence we previously determined (AB017120) had 2,033 bp followed by the polyA tail, while the NM_017450 sequence had an additional 135 bp. There was a typical polyA signal in the genomic sequence near the end of AB017120, but there was also a continuous run of A in the genome (at 96747 in AC115099). Thus, the exact termini of the transcripts are somewhat uncertain, and we use the longer transcript for the comparison in Fig. 2. The T-form skipped exon 14 and used exons 15 and 16. Both the L- and M-forms skipped exons 14 and 15 and continued into exon 16. The M-form ended with the polyA signal near the downstream boundary of exon 16. In contrast, the L-form left exon 16 halfway, just ahead of the in-frame stop codon, and resumed at exon 17, resulting in a further extension of the amino acid coding sequence. The splice donor site within exon 16 used by the L-form also conformed to the consensus GT-AG rule, but was slightly irregular in that intronic positions +3 and +4 were CT (see below). It should be noted that there were only 5 bp between exons 16 and 17, but the downstream boundary of exon 16 did not provide a splice donor site.
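The splicing pattern described above can be summarized as a small lookup table. The following is a minimal sketch, assuming only the exon usage stated in the text; the data structure and function names are illustrative and were not used in the original analysis.

```python
# Exons 1-13 are common to all four isoforms.
COMMON_EXONS = list(range(1, 14))

# Isoform-specific 3' exons as described in the text.
# For the L-form, exon 16 is used only up to the internal splice donor site.
TERMINAL_EXONS = {
    "S": [14],
    "T": [15, 16],
    "M": [16],
    "L": [16, 17],
}


def exon_path(isoform: str) -> list:
    """Ordered list of exons used by the given isoform."""
    return COMMON_EXONS + TERMINAL_EXONS[isoform]


def skipped_exons(isoform: str) -> list:
    """Exons upstream of the last exon used that the isoform does not include."""
    path = exon_path(isoform)
    return [exon for exon in range(1, path[-1]) if exon not in path]


if __name__ == "__main__":
    for name in ("S", "T", "M", "L"):
        print(name, exon_path(name)[13:], "skips", skipped_exons(name))
```

The table records exon membership only; it does not capture that the L-form leaves exon 16 at an internal donor site, which is the feature that distinguishes it from the M-form.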

Fig. 1.

Genomic organization of human and mouse IRSp53 genes. Schematic illustrations showing that the genomic organization is generally well conserved between human and mouse, except for exon 17. Note that although we have tried to illustrate it as faithfully as possible, the exons and several introns are too small to draw to scale

Fig. 2.

Exon-intron boundaries of the human and mouse IRSp53 genes. The boundaries were defined by alignment of the cDNA and genomic sequences. It should be noted that the downstream boundaries for exons 14, 16 and 17 are the ends of the cDNA sequences indicated, and do not necessarily mark the position of the polyA tail. Upper and lower cases indicate exon and intron sequences, respectively, and the positions of the boundary nucleotides in the given sequences are indicated. The accession numbers of the referenced sequences are AC115099 for human genome, NM_017450 for human S-form cDNA, NM_017451 for human L-form cDNA, NM_006340 for human T-form cDNA, NT_039521 for mouse genome and AF390178 for mouse cDNA

Fig. 3.

Pattern of alternative splicing of the human IRSp53 gene to generate four isoforms. The black shaded regions are used by respective isoforms; aaa polyA tails

Genomic organization in rodents

Mouse and human IRSp53 were well conserved despite their long evolutionary divergence. In a comparison based on the S-form cDNA, they were 96% identical over the 522 amino acids and 87% identical at the nucleotide level over the entire coding region. Although three isoforms (S, T and M) have been identified in rodents (including mouse, rat and hamster), it has been debated whether the L-form exists in rodents (Alvarez et al. 2002). We determined mouse and rat genomic sequences covering the region downstream of exon 16 (AB105194 and AB105195, respectively). Rat has been used frequently in studies on IRSp53 because its brain, one of the main expression sites, is larger than that of mouse. Recently, a mouse genomic sequence covering the entire coding region of IRSp53 was deposited in the public database (NT_039521). Comparison of the cDNA (AF390178) and genomic sequences showed that the mouse IRSp53 gene spanned about 64 kb and consisted of 16 exons (Fig. 1). The genomic organization was also well conserved between human and mouse, although the sizes of several introns varied. The exon-intron boundaries were similar (Fig. 2), and the mechanism generating the M-, S- and T-forms was identical to that in human. However, there was a notable difference between the human and rodent genomes that affected the generation of the L-form. Mouse and rat sequences were shorter by about 400 bp in the region corresponding to human exon 17. In addition, there were many discordant nucleotides in the distal half of exon 16 and in most parts of exon 17 when these were aligned (Fig. 4). The G nucleotide at the position corresponding to the splice donor site within human exon 16 was replaced with A in rodents (Fig. 4 arrow). Several computer programs for predicting splice sites (SpliceView, http://l25.itba.mi.cnr.it/~webgene/wwwspliceview.html, and Splice Site Prediction by Neural Network of the Berkeley Drosophila Genome Project, http://www.fruitfly.org/seq_tools/splice.html) recognized the splice donor site in human only poorly; the nucleotide substitution in rodents made its use even less likely, as it abolished the GT consensus sequence at the boundary. Together with the absence of the coding sequence specific to the L-form, we conclude that rodents do not generate the L-form.
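The effect of the G-to-A substitution on the donor site can be illustrated with a simple consensus check. This is a minimal sketch and not the method used by the prediction programs cited above. The human intron start (GT followed by CT at positions +3 and +4) is taken from the text; the rodent string assumes, for illustration only, that positions +2 to +4 are unchanged, and the six-base intronic consensus GTRAGT is the commonly cited 5′ splice site pattern.

```python
# Commonly cited intronic 5' splice site consensus GTRAGT (R = A or G),
# written as the set of bases allowed at positions +1 to +6.
DONOR_CONSENSUS = ["G", "T", "AG", "A", "G", "T"]


def has_gt_donor(intron_start: str) -> bool:
    """True if the intron begins with the invariant GT dinucleotide."""
    return intron_start.upper().startswith("GT")


def consensus_matches(intron_start: str) -> int:
    """Count positions (from +1) that agree with the donor consensus."""
    bases = intron_start.upper()
    return sum(1 for base, allowed in zip(bases, DONOR_CONSENSUS)
               if base in allowed)


# Human: GT at +1/+2 and CT at +3/+4, as stated in the text.
HUMAN_INTRON_START = "gtct"
# Rodent: A replaces G at +1; positions +2 to +4 are an assumption here.
RODENT_INTRON_START = "atct"

if __name__ == "__main__":
    for label, seq in (("human", HUMAN_INTRON_START),
                       ("rodent", RODENT_INTRON_START)):
        print(label, has_gt_donor(seq), consensus_matches(seq))
```

Under these assumptions the human site retains the invariant GT but matches the consensus at only two of the four positions examined, consistent with its weak recognition by the prediction programs, whereas the rodent sequence loses the GT altogether.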

Fig. 4.

Alignment of human, mouse and rat nucleotide sequences covering exons 16 and 17 of IRSp53. Upper and lower cases indicate exon and intron sequences, respectively. Identical nucleotides and gaps are indicated with an asterisk and a dash, respectively. The nucleotide change affecting the generation of the L-form is indicated by an arrow, and is also shown in Fig. 5. The accession numbers of the sequences are AC115099 for human, NT_039521 for mouse and AC105195 for rat. The regions for exons 16 and 17 of IRSp53 are boxed. As the 3′ terminus of the AATK gene is unknown, it is not boxed but just indicated

Fig. 5.

Nucleotide change affecting the generation of the L-form of IRSp53. Upper and lower cases indicate exon and intron sequences, respectively. The identical nucleotides between human and mouse are indicated with an asterisk. The amino acid sequences in the M-form, which reads through the indicated point, and the L-form, which is generated by splicing, are indicated below the nucleotide sequences

We previously detected each protein isoform with specific antibodies recognizing the unique amino acid sequences at the C-terminus, and the L-form-specific antibody recognized some protein species in rat brain tissues (Okamura-Oho et al. 2001). Based on the study described here, the detected protein species were not derived from the IRSp53 L-form, although the other results remain valid. The L-form-specific amino acid sequence may be encoded by another gene in rodents, as has been suggested (Alvarez et al. 2002). A fifth isoform, which lacks 40 amino acids encoded by exon 9, was recently reported in rodents (Alvarez et al. 2002). It is generated by the use of an additional splice acceptor site within exon 9 instead of the start position of exon 9 described in this report. A similar mechanism may be possible in human; however, no such transcripts have been reported in human by RT-PCR. The isoforms with and without the 40 amino acids may account for the size difference between the 53- and 58-kDa proteins. However, there is still confusion regarding the identity of the protein species, some of which may be due to different conditions for SDS-PAGE and as yet unknown posttranscriptional modifications. Therefore, further studies will be required to identify the protein species.

As regards evolution, it is of interest whether humans (or ancestral species) gained the additional L-form or rodents lost one of the isoforms. The former is plausible, as each isoform of IRSp53 is involved in fine-tuning of its function, which may have contributed to the advancement of the central nervous system. This should be clarified by examination of other mammalian genomes, as well as by functional analyses of each isoform.

Finally, the downstream sequence (about 130 bp) of IRSp53 overlapped with the AATK gene, which encodes apoptosis-associated tyrosine kinase (Gaozza et al. 1997) (Fig. 4). The two transcripts were in opposite orientations (and thus encoded by complementary strands), and the overlap occurred in their 3′-non-coding regions. This is highlighted by the homology between human and rodent sequences in the vicinity of the end of exon 17, although the region was not used for IRSp53 in rodents. The overlap is an example of dense gene distribution in a particular region of the genome.