Introduction

Cannabis sativa L. is an erect, diploid, mostly dioecious (Van Bakel et al. 2011), outcrossing (Forapani et al. 2001) annual herb within the Cannabaceae family (Small and Cronquist 1976). The species is characterised by the production of a large range of biologically active secondary plant metabolites (ElSohly and Slade 2005; Gertsch et al. 2010; Werz et al. 2014), including a subset of over 70 terpenophenolic phytocannabinoid (cannabinoid) compounds (ElSohly and Slade 2005), some of which appear unique to C. sativa (Appendino et al. 2011; Gertsch et al. 2010). Cannabinoids are synthesised in plants in their carboxylic acid forms (Swift et al. 2013) and accumulate principally within glandular trichomes occurring on female inflorescences (Happyana et al. 2013). The acidic forms are converted to neutral cannabinoids in a non-enzymatic thermal reaction (Dussy et al. 2005), the most notable conversion being that of delta-9-tetrahydrocannabinolic acid (THCA) to the psychoactive cannabinoid delta-9-tetrahydrocannabinol (THC).

Cannabinoids have modulating effects on the human endocannabinoid system and are believed to be beneficial in a number of physiopathological processes (Izzo et al. 2009). The pentyl-alkyl-cannabinoids THCA and cannabidiolic acid (CBDA) are the major constituents found in plants (De Backer et al. 2009; Swift et al. 2013), although a series of propyl-alkyl-cannabinoid homologues [delta-9-tetrahydrocannabivarinic acid (THCVA) and cannabidivarinic acid (CBDVA)] also occur in plants from specific geographical regions (Baker et al. 1980; Hillig and Mahlberg 2004). Pentyl-cannabinoids are formed from cannabigerolic acid (CBGA) (Taura et al. 2007b), while propyl-cannabinoids are formed from cannabigerovarinic acid (CBGVA) (Shoyama et al. 1984). Oxidocyclization reactions catalysed by THCA and CBDA synthase (Shoyama et al. 1984), form THCA + THCVA (THC(V)A) and CBDA + CBDVA (CBD(V)A) respectively (Sirikantaramas et al. 2004; Taura et al. 2007b) (Fig. 1).

Fig. 1

A single A and B locus genetic model governing cannabinoid composition in Cannabis sativa L. Adapted from (De Meijer et al. 2009). Black lines indicate alternative cannabinoid biosynthetic pathways. Arrows correspond to associated cannabinoid accumulation and chemotype. The A pentyl allele is associated with the accumulation of pentyl-cannabinoid intermediate CBGA, while the A propyl allele is associated with the accumulation of propyl-cannabinoid intermediate CBGVA (Shoyama et al. 1984). The B T and B D alleles encode functional THCA synthase and CBDA synthase homologs respectively (De Meijer et al. 2003). The B T0 allele encodes a non-functional THCA synthase homolog, while B Dw, B D01, and B D02 encode non-functional CBDA synthase homologs (Onofri et al. 2015). The B T allele is associated with THCA + THCVA (THC(V)A) and chemotype I (B T) and chemotype II (B T B D). The B D allele is associated with CBDA + CBDVA (CBD(V)A) accumulation and chemotype III (B D) and chemotype II (B T B D). The B T0, B Dw, B D01, and B D02 alleles are associated with CBGA + CBGVA (CBG(V)A) accumulation and chemotype IV; cannabidiolic acid (CBDA); cannabidivarinic acid (CBDVA); cannabigerolic acid (CBGA); cannabigerovarinic acid (CBGVA); delta-9-tetrahydrocannabinolic acid (THCA); delta-9-tetrahydrocannabivarinic acid (THCVA)

THC(V)A, CBD(V)A and their derivatized forms exert differing actions on the human endocannabinoid system and are considered separate pharmacological entities (Izzo et al. 2009). Three main cannabinoid chemical phenotypes (chemotypes) are recognised based on CBD(V)A and THC(V)A composition (De Meijer et al. 2003): chemotype I (CBD(V)A: THC(V)A ratio <1), chemotype II (CBD(V)A: THC(V)A ratio close to 1:1), and chemotype III (CBD(V)A: THC(V)A ratio >1) (De Meijer et al. 1992). Additional chemotypes IV (CBG(V)A: CBD(V)A/THC(V)A >1) and V (total cannabinoid content <0.02 %) also occur at low levels within a subgroup of the C. sativa gene pool (Pacifico et al. 2006).

The ability to characterise cannabinoid composition is important for the comprehensive screening of germplasm collections, and is especially relevant in planning targeted breeding within obligate outcrossing dioecious species such as C. sativa (Forapani et al. 2001). Accurately characterising chemotypic variation offers the potential to identify allelic variation that can contribute to novel cultivars capable of meeting the demands of quality, safety, and efficacy necessary for the manufacture of cannabinoid-based pharmaceutical-grade botanical extracts (Potter 2014), or indeed for eliminating psychoactive constituents from industrial hemp (Mandolino and Carboni 2004). Genetic metabolic engineering through exploitation of naturally occurring allelic and chemotypic diversity within the gene pool (Barker et al. 2007), and subsequent generation of novel recombined breeding lines using marker-assisted selection (Mandolino and Carboni 2004; Potter 2014), may lead to the development of standardised multi-targeting botanical drug products from a single plant line. Rigorous characterisation of chemotype may also aid in the development of fibre hemp and seed cultivars which can maintain levels of THC within legal regulatory thresholds of 0.2 % dry weight (w/w) (DW) (Kojoma et al. 2006; Mandolino and Carboni 2004; Pacifico et al. 2006).

A number of chemotyping and genotyping methods have been developed to characterise and predict cannabinoid composition. Gas chromatography is commonly used to characterise chemotype and relies on the thermal conversion of acidic to neutral cannabinoids, although this reaction can vary between laboratories and is only partial (Dussy et al. 2005). Liquid chromatography methods developed more recently can detect both acidic and neutral cannabinoids, therefore providing a more precise characterisation of chemotype (De Backer et al. 2009). Several DNA markers associated with the genes encoding THCA and/or CBDA synthase have been found beneficial in predicting chemotype during early stages of plant development (Kojoma et al. 2006; Pacifico et al. 2006; Rotherham and Harbison 2011; Staginnus et al. 2014), with the most comprehensively studied in terms of genetic linkage and sample population screening being the dominant D589 (Staginnus et al. 2014) and co-dominant B1080/B1192 (Pacifico et al. 2006) DNA sequence characterised amplified region (SCAR) markers respectively.

To date, the available chemotyping and genotyping methods have only been applied to a subset of the C. sativa gene pool (Pacifico et al. 2006; Rotherham and Harbison 2011; Staginnus et al. 2014). Given the extensive genetic (Faeti et al. 1996; Gao et al. 2014; Gilmore et al. 2007; Hillig 2005) and chemotypic variability (Baker et al. 1980; Hillig and Mahlberg 2004) which appears to exist, use of each approach in isolation may not be sufficient to account for the full extent of variation in cannabinoid composition within the species. In the present study, we carried out a representative survey from a wide range of geographically and genetically diverse accessions with differing domestication histories. Using a combination of liquid chromatography–mass spectrometry (LC–MS) cannabinoid profiling and DNA SCAR markers, we explored the relationship between chemotype and genotype in order to develop a comprehensive strategy for characterising chemotypic diversity in C. sativa.

Materials and methods

Genetic resources

All experimental work, acquisition and storage of the prohibited plant was performed in accordance with Sections 23(4)(b) and 41(b) of the NSW Drug Misuse and Trafficking Act 1985, and under the appropriate licences issued by the New South Wales Ministry of Health, Australia. Accessions were sourced from a global germplasm collection owned by Ecofibre Industries Operations (EIO) and managed by Southern Cross University (Table 1). Three genetically diverse C. sativa groupings were selected (Hillig 2005), including seven European fibre-type accessions, seven East Asian fibre-type accessions and eight mixed origin drug-type accessions.

Table 1 Description of 22 Cannabis sativa L. accessions used to examine the relationship between chemotype and genotype

Growth parameters

Seeds were planted in seedling trays at a depth of 1 cm in a soil mix comprising one part vermiculite, one part peat moss, one part perlite and dolomite (110 g/100 L), and supplemented with CANNA® Aqua Vega nutrient solution. Seeds were grown at temperatures between 26 and 30 °C. Water (500 mL) was applied to 13 seedling trays (40 cells of 5.5 × 4 cm) three times daily for 14 days, and trays were subject to 11 h of 600 W HPS/metal halide light (luminous flux equal to 72,000 lumens) per day. Three female progeny from each dioecious accession were selected, while three hermaphrodite plants with mainly female reproductive tissues were selected from monoecious accessions A (TS1300610), B (TS1300026), D (TS1300041) and G (TS1300287).

Seedlings were transferred into 8 L pots with 1 part vermiculite, 1 part peat moss, 1 part perlite soil mix and dolomite (110 g/100 L). Each 8 L pot contained 100 g Osmocote® Exact nutrient mix and 8 g of Micromax® micronutrient formula. Plants were grown in bespoke pollen-secure, environmentally controlled growth chambers with a closed-loop filtered air system, fitted with ‘smart valves’ for optimal water regimes. Plants were subject to 11 h of 600 W HPS/metal halide light (luminous flux equal to 72,000 lumens) per day and kept at a temperature between 28 and 32 °C to promote flowering. Plants were harvested at maturation, which was determined when approximately 95 % of the stigmas present on the apical inflorescence were browned and shrivelled (Staginnus et al. 2014) and before the onset of seed formation (Pacifico et al. 2006).

DNA marker assays

DNA extraction

Plant leaf material was removed from the apical node of 14-day-old plants. DNA was extracted using a DNeasy® plant mini kit (Qiagen #69104). Tissue was disrupted manually by grinding under liquid nitrogen with a pestle and mortar. DNA purity was assessed using the Nanodrop 2000 UV–Vis spectrophotometer (Thermo Scientific). The ratio of absorbance at 260/280 nm was determined for all samples, with ratios of 1.7–1.9 and symmetric peaks at 260 nm used to infer optimal DNA purity.

DNA amplification

Amplification of the D589 marker fragment was achieved using the primers F: 5′ CCTGAATTCGACAATACAAAATCTTAGATTCAT 3′; R: 5′ ACTGAATATAGTAGACTTTGATGGGACAGCAACC 3′ (Staginnus et al. 2014). These primers included four SNPs associated with the functional variant of THCAS (Kojoma et al. 2006) (B T allele). Each reaction contained 1.5 mM MgCl2, 0.2 mM dNTPs, 0.2 µM of each of the forward and reverse primers, and 2 U Platinum® Taq DNA Polymerase (Life Technologies #10966-034). Thermocycling parameters used during PCR were 96 °C for 2 min, then 35 cycles of 94 °C for 20 s, 64 °C for 30 s, 72 °C for 1 min 30 s, followed by a final extension of 72 °C for 5 min.

Amplification of B1080/B1192 marker fragments was achieved using a primer common to both synthases F: 5′ AAGAAAGTTGGCTTGCAG 3′, and a THCAS-specific primer R: 5′ TTAGGACTCGCATGATTAGTTTTTC 3′, and a CBDAS-specific primer R: 5′ ATCCAGTTTAGATGCTTTTCGT 3′ (Pacifico et al. 2006). PCR parameters have not been previously reported for the co-dominant DNA marker (Onofri et al. 2015; Pacifico et al. 2006) and these were optimised as follows: each reaction contained 1.5 mM MgCl2, 0.2 mM dNTPs, 0.4 µM for the forward primer and 0.2 µM for THCAS-specific and CBDAS-specific reverse primers, and 2 U Platinum® Taq DNA Polymerase (Life Technologies #10966-034). Thermocycling parameters were 94 °C for 2 min, then 25 cycles of 94 °C for 30 s, 58 °C for 30 s, 72 °C for 1 min 15 s.

PCR reactions were performed in 0.2 mL 96 well PCR plates (Thermo Scientific #AB-0600) sealed with flat cap strips (Thermo Scientific #AB-0786) using a Gradient Palm–Cycler™ (Corbett Life Science) and occurred in a total volume of 50 µL. D589 and B1080/B1192 amplification products were separated by electrophoresis on a 1.5 and 1 % SeaKem® LE agarose gel (Cambrex #50004) stained with GelRed™ (Biotium #41003) respectively. Amplification products were then visualized under UV illumination using the Bio-Rad Molecular Imager® Gel Doc™ XR+ system using Image Lab™ software.

LC–MS cannabinoid profiling

Sample extraction

Bracts, bracteoles and surrounding leaf tissues were collected from the upper 30 cm of female inflorescences [International Union for the Protection of New Varieties of Plants (UPOV) (www.upov.int)]. Sample preparation and extraction followed those of De Backer et al. (2009). Leaf material was dried at 35 °C in a forced ventilation oven for 48 h. Seeds and stalks were separated and discarded manually and 1 g of dried leaf material was ground using a Mixer Mill MM 301 (Retsch GmbH) at 30 rotations/second for 30 s.

Extractions were performed in duplicate for each plant. 250 mg of dried ground leaf tissue was weighed into a 25 mL volumetric flask and extracted by agitation in 25 mL of a methanol/chloroform mixture (v/v: 9/1) for 30 min. 1 mL of the extract was evaporated under a gentle stream of nitrogen (N2) and resuspended in 1 mL of water/methanol (v/v: 5/5). Samples were sonicated for 30 s and centrifuged using a Compact centrifuge 2–5 (Sigma) at 3000 rpm for 7 min to remove particulate material. 500 µL of the supernatant was transferred into a 2 mL screw cap glass amber vial (Agilent Technologies #5182-0716). All solvents used for extractions and HPLC analysis were HPLC grade.
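As a rough guide to how a concentration measured in the injected solution relates to dry-weight content under this extraction scheme, the sketch below converts µg/mL to % DW. It assumes that the extract volume is exactly 25 mL, that recovery is complete, and that resuspension restores the original aliquot volume. The function name and the `dilution_factor` parameter are illustrative and are not taken from the published method.

```python
def percent_dry_weight(conc_ug_per_ml, tissue_mg=250.0, extract_ml=25.0,
                       aliquot_ml=1.0, resuspend_ml=1.0, dilution_factor=1.0):
    """Convert a measured concentration (ug/mL) to % of dry weight (w/w).

    Assumes complete extraction and no volume loss (illustrative only).
    """
    # mg of dried tissue represented by each mL of the injected solution
    tissue_mg_per_ml = (tissue_mg / extract_ml) * (aliquot_ml / resuspend_ml)
    tissue_mg_per_ml /= dilution_factor
    # mg of analyte in each mL of the injected solution
    analyte_mg_per_ml = conc_ug_per_ml / 1000.0
    return 100.0 * analyte_mg_per_ml / tissue_mg_per_ml


# 250 mg in 25 mL gives 10 mg tissue per mL, so 50 ug/mL is about 0.5 % DW.
undiluted = percent_dry_weight(50.0)
# The same reading after a 20-fold dilution corresponds to about 10 % DW.
diluted = percent_dry_weight(50.0, dilution_factor=20.0)
print(undiluted, diluted)
```

Under these assumptions, 1 µg/mL in an undiluted extract corresponds to 0.01 % DW.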

LC–MS parameters

LC–MS chromatographic runs were performed using an Agilent 1290 Infinity analytical HPLC instrument (Agilent Technologies, Palo Alto, CA, USA) equipped with a vacuum degasser, binary pump, autoinjector, and diode array detector (DAD, 1260), coupled with an Agilent 6120 Quadrupole mass detector (MSD). The LC–MS system was controlled using Agilent ChemStation software (Rev. B.04.03). Absorbance was monitored at 210, 272, 280, 330, and 360 nm. The column used was an Agilent Eclipse plus rapid resolution high definition (RRHD) C18 column (1.8 μm; 50 × 2.1 mm internal diameter). Column temperature was set at 30 °C.

A linear gradient elution program was applied with the mobile phase containing acetonitrile with 0.005 % trifluoroacetic acid (TFA) and Milli-Q water with 0.005 % TFA. The solvent gradient was programmed from 10 to 99 % acetonitrile with 0.005 % TFA in 11.5 min. 99 % acetonitrile with 0.005 % TFA was maintained for 1.5 min, recalibrated to 10 % in 2 min and then held at 10 % for a further 1 min. The total run time was 15 min. Flow rate and injection volume were set at 0.3 mL/minute and 3 µL respectively.

MSD parameters and settings followed those of a previously validated method (Liu et al. 2014). The MSD was operated in atmospheric pressure ionization–electrospray mode using the following parameters: scan mass range, 100–1200; fragmentor, 150; capillary voltage, 3000 V (positive); drying gas flow, 12.0 L/min (N2); vaporizer temperature, 350 °C; nebuliser pressure, 35 psi; drying gas temperature, 350 °C. THCA, CBDA, CBGA, cannabigerol (CBG), delta-9-tetrahydrocannabivarin (THCV) (Weesp, The Netherlands), THC, CBD, and cannabinol (CBN) (Capalaba, Australia) cannabinoid reference standards were scanned to determine the most abundant and representative signal. Cannabinoids were quantified using selected-ion monitoring. Cannabinoids were allocated to one of four available MSD signal channels (Online Resource 1).

Linearity and reproducibility

Concentrations of cannabinoids were determined from the linear regression equation of calibration curves of individual reference standards. Calibration curves were obtained from five standard solutions comprising all eight cannabinoid standards, with standard solution concentrations at 0.08, 0.4, 2, 10, and 50 µg/mL. Calibration curves were calculated using unweighted linear regression analysis and expressed using \(R^{2}\). Calibration curves for all eight cannabinoids were linear within the concentration range (\(R^{2}\) > 0.99) (Online Resource 2). Calibration curves were conducted every 24 h to allow for changes in response factor. Sample cannabinoid concentrations >50 µg/mL were diluted in water/methanol (v/v: 5/5) in a 1:20 dilution to ensure signals were within calibration range.
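The calibration step can be illustrated with a minimal sketch of unweighted least-squares regression over the five standard concentrations, followed by back-calculation of a sample concentration. The peak areas are hypothetical values invented for illustration, not data from this study, and the function names are assumptions.

```python
def fit_line(x, y):
    """Unweighted least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Standard concentrations from the text (ug/mL).
standard_conc = [0.08, 0.4, 2.0, 10.0, 50.0]
# Hypothetical peak areas for one cannabinoid standard (illustrative only).
peak_area = [4.1, 19.8, 101.5, 498.0, 2510.0]

slope, intercept, r_squared = fit_line(standard_conc, peak_area)


def concentration_from_area(area):
    """Back-calculate concentration (ug/mL) from a peak area."""
    return (area - intercept) / slope


print(f"R^2 = {r_squared:.4f}")
print(f"area 1000 -> {concentration_from_area(1000.0):.2f} ug/mL")
```

A back-calculated value above 50 µg/mL would fall outside the calibrated range and would call for the dilution described above.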

To determine the precision of the LC–MS instrument, standard solutions were injected six times. Relative standard deviation (RSD) for each cannabinoid peak area was found to be <2 %. THCtotal (= THC + THCA), CBDtotal (= CBD + CBDA), and CBGtotal (= CBG + CBGA) (Swift et al. 2013; Taura et al. 2007a) between sample replicates were typically <5 % RSD. One-way analysis of variance (ANOVA) calculated using GenStat 64-bit Release 16.1 (VSN International 2014) indicated that there was no significant difference between replicates in THCtotal (\(F_{1,130} = 0.01\), P = 0.936), CBDtotal (\(F_{1,130} = 0.00\), P = 0.985), or CBGtotal (\(F_{1,122} = 0.02\), P = 0.898).
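For readers unfamiliar with the precision metric, the following sketch computes the RSD of repeated injections and the summed totals as defined above. The six peak areas are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """RSD (%) using the sample standard deviation (n - 1 denominator)."""
    return 100.0 * stdev(values) / mean(values)


def total(neutral, acid):
    """Total content as the simple sum used in the text, e.g. THC + THCA."""
    return neutral + acid


# Hypothetical peak areas from six injections of one standard solution.
injections = [1002.0, 998.5, 1005.1, 995.8, 1001.3, 999.9]
rsd = relative_standard_deviation(injections)
print(f"RSD = {rsd:.2f} %")
```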

Results

We were able to classify individual plants of each accession according to their chemotype and DNA marker genotypes. A combination of the D589 and B1080/B1192 DNA SCAR marker assays generated a haplotype for each plant which enabled us unequivocally to assign haplotype to chemotype. Associations between chemotype and haplotype were determined specifically at the plant level, as variation in chemotype and haplotype occurred between accessions and segregated within accessions.

Chemotype

Individual plants from each accession were allocated a chemotype on the basis of their THCtotal, CBDtotal, and CBGtotal % DW concentrations (Online Resource 3). A histogram of log10 CBDtotal: THCtotal ratios of individual plants (N = 66) shows three discrete distributions (Fig. 2). Based on these distributions, plants with a log10 CBDtotal: THCtotal ratio of −2.64 to −1.88 were assigned to chemotype I, plants with a log10 CBDtotal: THCtotal ratio of −1.15 to 0.46 were assigned to chemotype II, while plants with a log10 CBDtotal: THCtotal ratio of 0.95 to 1.51 were assigned to chemotype III.
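The assignment rule can be sketched as a lookup against the three observed log10 ranges. Because the text reports observed ranges rather than explicit cut points, this illustration returns `None` for ratios that fall in the gaps between distributions. The function and constant names are assumptions.

```python
import math

# Observed log10(CBDtotal:THCtotal) ranges reported for the 66 plants.
CHEMOTYPE_RANGES = {
    "I": (-2.64, -1.88),
    "II": (-1.15, 0.46),
    "III": (0.95, 1.51),
}


def assign_chemotype(cbd_total, thc_total):
    """Return 'I', 'II' or 'III', or None if the ratio lies outside all ranges.

    Both inputs are % DW and must be positive (i.e. above the LOQ).
    """
    if cbd_total <= 0 or thc_total <= 0:
        raise ValueError("CBDtotal and THCtotal must both be positive")
    log_ratio = math.log10(cbd_total / thc_total)
    for chemotype, (low, high) in CHEMOTYPE_RANGES.items():
        if low <= log_ratio <= high:
            return chemotype
    return None


print(assign_chemotype(0.05, 10.0))  # THC-predominant plant
print(assign_chemotype(5.0, 5.0))    # balanced plant
print(assign_chemotype(10.0, 0.5))   # CBD-predominant plant
```

Returning `None` instead of forcing a class keeps ambiguous plants visible for follow-up with marker data.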

Fig. 2

Frequency distribution of CBDtotal: THCtotal DW log10 ratios of individual Cannabis sativa L. plants derived from 22 accessions. Three discrete distributions are associated with chemotypes I–III (Hillig and Mahlberg 2004). Dotted lines indicate log10 ratio divisions between chemotype I, II, and III; cannabidiol (CBD); cannabidiolic acid (CBDA); CBD + CBDA (CBDtotal); dry weight (w/w) (DW); delta-9-tetrahydrocannabinol (THC); delta-9-tetrahydrocannabinolic acid (THCA); THC + THCA (THCtotal)

No plants were found to have levels of CBGtotal that exceeded 10 % of the cannabinoid fraction or that were greater than either THCtotal or CBDtotal, although variability in the accumulation of CBGtotal was observed across chemotypes (Fig. 3). Therefore, no plants were classified as chemotype IV. Individual plants within eight out of 22 accessions had more than one chemotype. Chemotypes I, II, and III were evenly distributed within the sample population. CBN contributed negligibly to chemotype and was either <LOQ or at levels no greater than 0.004 % DW (data not shown).

Fig. 3

Pentyl-cannabinoid composition within the cannabinoid fraction of individual Cannabis sativa L. plants derived from 22 accessions indicating variability both within and between accessions and chemotypes. Letters indicate accession ID (Table 1). Numbers correspond to each of three plant individuals per accession. Chemotypes determined from the frequency distributions of CBD + CBDA (CBDtotal): THC + THCA (THCtotal) log10 ratios of individual C. sativa plants; cannabidiol (CBD); cannabidiolic acid (CBDA); cannabigerol (CBG); cannabigerolic acid (CBGA); CBG + CBGA (CBGtotal); delta-9-tetrahydrocannabinol (THC); delta-9-tetrahydrocannabinolic acid (THCA)

Five individual plants from the East Asian accessions J (TS1300283) and H (TS1300317) had a THCV:THC ratio >1. HPLC chromatograms at 272 nm showed two unknown peaks at 7.5 and 9.1 min for all five plants (Table 2). Using UV maxima data and peak fragment ions determined from a positive MSD scan, these compounds were tentatively identified as THCVA and CBDVA.

Table 2 HPLC–DAD spectrum and MSD fragment ion of acidic cannabinoids

Marker genotypes

Individual plants from each accession were classified according to their marker genotypes, generating a haplotype from a combination of the D589 and B1080/B1192 DNA SCAR marker assays. Plants were scored separately for each marker (Online Resource 3). Conformity between the D589 and B1080/B1192 marker genotypes was observed (Table 3). The D589 marker assay scored plants as either B Tpresent or B Tabsent, while the B1080/B1192 marker assay scored plants as either B THCAS, B THCAS B CBDAS, or B CBDAS. All plants with a B Tpresent D589 marker genotype had either a B THCAS or B THCAS B CBDAS B1080/B1192 marker genotype, while all plants with a B Tabsent D589 marker genotype were found to have a B CBDAS B1080/B1192 marker genotype.
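The scoring logic can be summarised as a small lookup that combines the two marker calls into a haplotype and returns the chemotype that haplotype is associated with in this study. This is a minimal sketch: the string labels and function name are illustrative, and marker combinations not described in the text return `None` rather than a guessed class.

```python
# Marker combinations described in the text and their associated chemotype.
# Keys: (D589 call, B1080/B1192 call). Labels are illustrative.
HAPLOTYPE_TO_CHEMOTYPE = {
    ("B_T present", "THCAS"): "I",
    ("B_T present", "THCAS+CBDAS"): "II",
    ("B_T absent", "CBDAS"): "III",
}


def predicted_chemotype(d589_call, b1080_b1192_call):
    """Return the chemotype associated with a haplotype, or None when the
    two markers do not conform to a combination described in the text."""
    return HAPLOTYPE_TO_CHEMOTYPE.get((d589_call, b1080_b1192_call))


print(predicted_chemotype("B_T present", "THCAS+CBDAS"))  # heterozygous call
print(predicted_chemotype("B_T absent", "THCAS"))         # non-conforming
```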

Table 3 D589 and B1080/B1192 DNA SCAR marker genotypes and associated chemotype of individual Cannabis sativa L. plants derived from 22 accessions

Chemotype and marker genotype

The chemotype of 65 out of 66 plants was correctly predicted by both D589 and B1080/B1192 DNA SCAR marker assays (Table 3). However, the dominant D589 marker assay was less specific in predicting chemotype than the co-dominant B1080/B1192 marker assay, and was unable to differentiate between chemotype I and II. 45 out of 46 chemotype I and II plants were correctly predicted by the D589 B Tpresent marker genotype (sensitivity of 97.8 %), while 20 out of 20 chemotype III plants were correctly predicted by the B Tabsent marker genotype (specificity of 100.0 %). Similar results were observed with the B1080/B1192 marker assay, with 27 out of 27 chemotype I and 18 out of 19 chemotype II plants correctly predicted by B THCAS and B THCAS B CBDAS marker genotypes respectively, while 20 out of 20 chemotype III plants were correctly predicted by the B CBDAS marker genotype. As expected, there were highly significant associations between chemotype I–III and marker genotype, Pearson’s \(\chi^{2}_{2,66} = 61.63\), P < 0.001 (GenStat 64-bit Release 16.1).
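These summary statistics can be checked with a short sketch. The 3 × 2 table below is reconstructed from the counts given in the text for the D589 marker (27 chemotype I and 18 chemotype II plants scored B Tpresent; one chemotype II plant and all 20 chemotype III plants scored B Tabsent). That the reported statistic refers to this particular table is an assumption, although the table has the stated 2 degrees of freedom. The original analysis was run in GenStat; the pure-Python function here is only an illustration.

```python
# Plants per chemotype scored (B Tpresent, B Tabsent) with D589,
# reconstructed from the counts reported in the text.
observed = {"I": (27, 0), "II": (18, 1), "III": (0, 20)}


def pearson_chi_square(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, dof


chi2, dof = pearson_chi_square(observed)

# Sensitivity: chemotype I and II plants carrying B Tpresent.
sensitivity = (27 + 18) / (27 + 19)
# Specificity: chemotype III plants scored B Tabsent.
specificity = 20 / 20

print(f"chi-square = {chi2:.2f} on {dof} df")  # chi-square = 61.63 on 2 df
print(f"sensitivity = {100 * sensitivity:.1f} %, specificity = {100 * specificity:.1f} %")
```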

A higher level of variability in cannabinoid composition occurred in chemotype II B THCAS B CBDAS genotypes compared with chemotype I B THCAS genotypes and chemotype III B CBDAS (B Tabsent) genotypes (Fig. 4). Chemotype II B THCAS B CBDAS genotypes ranged from 25.1 % THCtotal (72.4 % CBDtotal) to 86.3 % THCtotal (13.8 % CBDtotal) of the total cannabinoid fraction. In contrast, chemotype I B THCAS genotypes ranged from 91.3 % THCtotal (0.3 % CBDtotal) to 99.5 % THCtotal (0.5 % CBDtotal), and chemotype III B CBDAS (B Tabsent) genotypes ranged from 6.3 % THCtotal (93.7 % CBDtotal) to 10.1 % THCtotal (89.9 % CBDtotal) of the total cannabinoid fraction.
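To relate these compositional percentages to the log10 scale used for chemotype assignment, the sketch below converts a pair of percentages into a log10 CBDtotal: THCtotal ratio. The ratio does not depend on whether the percentages are expressed relative to the cannabinoid fraction or to dry weight, since the denominator cancels. The function name is illustrative.

```python
import math


def log10_cbd_thc_ratio(cbd_percent, thc_percent):
    """log10 of the CBDtotal:THCtotal ratio for one plant."""
    return math.log10(cbd_percent / thc_percent)


# Extremes reported in the text as (CBDtotal %, THCtotal %).
chemotype_ii_high_cbd = log10_cbd_thc_ratio(72.4, 25.1)  # about 0.46
chemotype_iii_low = log10_cbd_thc_ratio(89.9, 10.1)      # about 0.95
chemotype_iii_high = log10_cbd_thc_ratio(93.7, 6.3)      # about 1.17

print(round(chemotype_ii_high_cbd, 2),
      round(chemotype_iii_low, 2),
      round(chemotype_iii_high, 2))
```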

Fig. 4

CBDtotal: THCtotal compositional variability within and between Cannabis sativa L. chemotypes and its relationship with the B1080/B1192 DNA SCAR marker genotype. Black arrow indicates previous range in CBDtotal: THCtotal compositional variability within chemotype II B THCAS B CBDAS genotype (Pacifico et al. 2006). Data points correspond to CBDtotal: THCtotal composition within the total cannabinoid fraction including CBGtotal. Square, circular, and triangular data points correspond to chemotype III, II, and I respectively; sequence characterised amplified region (SCAR); cannabidiol (CBD); cannabidiolic acid (CBDA); CBD + CBDA (CBDtotal); delta-9-tetrahydrocannabinol (THC); delta-9-tetrahydrocannabinolic acid (THCA); THC + THCA (THCtotal)

Cannabinoid homologues

A lack of correspondence between chemotype and genotype was observed in a single individual, H-3, from the East Asian accession H (TS1300317). This individual had B Tabsent and B CBDAS genotypes associated with chemotype III (Table 3), despite exhibiting a CBDtotal: THCtotal ratio associated with chemotype II (Fig. 3). CBDVA:THCVA log10 ratios, determined from THCA and CBDA standards, corresponded to chemotype in four of the five plants which demonstrated a THCV:THC ratio >1. H-3 was found to have CBDVA at levels exceeding other cannabinoids, and produced a CBDVA:THCVA log10 ratio of 1.17 and so was associated with chemotype III. Pentyl- and propyl-chemotypes and the B Tabsent + B CBDAS haplotype for H-3 were confirmed through duplicated LC–MS sample replicates, and through repeat DNA extraction and amplification using D589 and B1080/B1192 DNA SCAR markers.

Discussion

Chemotype II variability

Previously it was shown that chemotype II B THCAS B CBDAS genotypes exhibited a CBDtotal: THCtotal log10 ratio ranging from 0.36 to 0.01 (Pacifico et al. 2006). In the present study, chemotype II B THCAS B CBDAS genotypes exhibited a CBDtotal: THCtotal log10 ratio ranging from 0.46 to −1.15, a greater than three-fold increase in variability (Fig. 4). GC analysis of chemotype II has typically demonstrated a narrower range skewed towards a high CBDtotal to THCtotal ratio (De Meijer et al. 1992; Hillig and Mahlberg 2004; Weiblen et al. 2015), although low CBDtotal to THCtotal ratios similar to those reported here have been observed in Southeast Asian landraces (Tipparat et al. 2012). Nevertheless, this is the first time that such a wide range in chemotype II variability has been unequivocally assigned to genotype and heterozygosity of THCAS and CBDAS.

In a single (B) locus model, chemotype II individuals are assumed to be heterozygous for the co-dominant B T (THCAS) and B D (CBDAS) alleles (De Meijer et al. 2003). These alleles encode functional sequence variants of THCA and CBDA synthase, and therefore determine the efficiency (Pacifico et al. 2006) with which CBG(V)A is converted to THC(V)A (B T) and CBD(V)A (B D) (Sirikantaramas et al. 2004; Taura et al. 2007b) (Fig. 1). All things being equal, chemotype II individuals would be expected to have a CBDtotal: THCtotal ratio close to 1:1. Deviation from this ratio within chemotype II has previously been proposed to be due either to increased catalytic efficiency of CBDA synthase over THCA synthase (Weiblen et al. 2015), or to genotype-by-environment interactions (Potter 2009). Considering that ratios of chemotype II plants deviated both towards CBDtotal and THCtotal, and that all plants were grown in environmentally controlled conditions, it could be suggested that chemotype II variability is largely under genetic control.

Chemotype II B THCAS B CBDAS genotype variability may be generated by the presence of alternative B locus alleles. Four additional alleles encoding reduced or non-functional variants of THCAS (B T0) and CBDAS (B Dw; B D01; B D02) have been observed within chemotype IV (Onofri et al. 2015) (Fig. 1). Individuals homozygous for these alleles have a reduced capacity to form THCA/CBDA, resulting in the accumulation of CBGA. These alleles are associated with nonsynonymous substitutions (Onofri et al. 2015), some of which are non-specific to the D589 and B1080/B1192 marker genotypes. It is possible that these and potentially other alleles contribute to chemotype II variability. If these alleles were present in sufficient frequency to account for chemotype II variability, one would expect a high frequency of chemotype IV throughout the gene pool. However, this is unlikely to be the case, as the distributions of chemotypes I, II, and III appear to be sufficient to account for the variation observed within the gene pool (Fig. 3) (De Meijer et al. 2003; Hillig and Mahlberg 2004; Pacifico et al. 2006).

Variability within chemotype II B THCAS B CBDAS genotypes may be more comprehensively explained by a multi-locus model. The identification of multiple sequence variants of THCAS or CBDAS within individuals (Kojoma et al. 2006; Onofri et al. 2015; Van Bakel et al. 2011; Weiblen et al. 2015) and segregation in mapping populations, suggests that at least two closely linked loci govern THC(V)A and CBD(V)A composition. Recent evidence indicates that THCAS and CBDAS are positioned 1.1 cM apart (Weiblen et al. 2015). The presence of contrasting functional allelic variants of THCAS and CBDAS occurring at multiple loci may be a possible explanation as to the range and variability in CBDtotal: THCtotal ratios occurring within chemotype II, and is also consistent with the reduced representation of chemotype IV within the gene pool. Analysis of THCAS and CBDAS sequence variants and comparative genomic approaches using next generation sequencing technologies may help to determine the genomic organisation of chemotype, and whether alternative alleles contribute to chemotype II variability.

Characterisation of chemotype

The large range of CBDtotal: THCtotal ratios observed within chemotype II B THCAS B CBDAS genotypes would appear to obscure the distinction between chemotypes. Chemotypes have previously been classified on the basis of CBDtotal: THCtotal log10 histogram frequency distributions (Hillig and Mahlberg 2004; Pacifico et al. 2008; Staginnus et al. 2014; Tipparat et al. 2012), with CBDtotal: THCtotal log10 ratio of 0.0 (Pacifico et al. 2008) and −1.0 being used as an arbitrary division between chemotype I and II (Hillig and Mahlberg 2004; Tipparat et al. 2012). In the present study chemotype II B THCAS B CBDAS genotypes were found to exceed these log10 ratios, albeit the latter log10 ratio of −1.0 was only exceeded by a single individual I-3 (−1.15). Nevertheless, this highlights a potential limitation of characterising chemotype exclusively from cannabinoid profiling, and questions the utility of this approach in determining genotype.

Characterisation and selection for chemotype I B THCAS and chemotype III B CBDAS genotypes may aid in the compositional selection of uniform plant lines, with B THCAS and B CBDAS genotypes found to vary by only 8.2 % THCtotal (0.2 % CBDtotal) and 3.8 % THCtotal (3.8 % CBDtotal) within the cannabinoid fraction respectively (Fig. 4). Selecting for chemotype III B CBDAS genotypes may prove especially beneficial for development of uniform plant lines for hemp fibre, seed, and pharmacological production, given the strong association between chemotype III and THC content <0.2 % DW (Pacifico et al. 2006, 2008), and growing interest in CBD(V)A and CBD(V) derivatives as pharmacological entities (De Petrocellis et al. 2011; Gallily et al. 2015; Hill et al. 2013; Iseger and Bossong 2015). However, additional cannabinoid profiling is required in order to differentiate cannabinoid homologue compositions and to characterise CBDtotal: THCtotal chemotype II variability accurately.

Due to variability within chemotype II, uniformity of botanical drug product CBDtotal: THCtotal ratios has been achieved by combining cannabinoid extracts from chemotype I and III plants (Potter 2014). However, this approach requires the growth of chemotype I drug-type plants (Swift et al. 2013) and limits compositional control over other potentially pharmacologically relevant cannabinoids (Izzo et al. 2009). Identification and recombination of novel chemotype-determining alleles (Barker et al. 2007) may facilitate the selection of chemotype variants with CBD(V)A:THC(V)A-specific ratios. This approach to genetic metabolic engineering may result in the development of intermediate chemotypes tailored for specific pharmaceutical applications.

Propyl- and pentyl-cannabinoid chemotypes

A single individual from the East Asian accession H (H-3) was characterised as having a propyl-chemotype III and pentyl-chemotype II profile, while exhibiting a chemotype III B CBDAS B CBDAS + B Tabsent haplotype. THCA and CBDA synthase are considered isoforms of the same enzyme (De Meijer et al. 2003; Taura et al. 2007a), sharing 83.9 % amino acid identity (Taura et al. 2007b). Residues of THCA in chemotype III are thought to be due to an imperfect capacity of CBDA synthase to form CBDA (De Meijer et al. 2003). Chemotype III individuals have been observed to deviate transitionally from III to II during early stages of plant development (Pacifico et al. 2008). THCA residue formation may have occurred in H-3 during this period, with the remaining CBGVA substrate converted to CBDVA during later developmental stages. Plants which exhibit propyl-chemotype III profiles are not frequently distributed among the C. sativa gene pool (Baker et al. 1980; Hillig and Mahlberg 2004). It is possible that two alleles encoding CBGA-specific or CBGVA-specific CBDA synthases exist, although this would fail to explain why THCtotal and CBDtotal were found at an almost equal ratio in an individual presumably carrying a functional variant of CBDAS. Further allelism tests on progenies segregating for propyl- and pentyl-cannabinoids may provide insight into the genetic regulation of homologue variability.

Conclusions

We carried out a representative survey from a wide range of genetically diverse accessions to explore the relationship between chemotype and genotype, and to identify a suitable strategy for characterising cannabinoid compositional variation. A high level of variability in CBDtotal: THCtotal ratio was observed within chemotype II B THCAS B CBDAS genotypes. However, the genetic regulation underlying this variability remains uncertain. More detailed exploration of the relationship between chemotype and genotype using next generation sequencing technologies offers the potential of characterising chemotype with greater accuracy, and may lead to advances in the genetic metabolic engineering of C. sativa for pharmacological development. In the interim, a strategy which adopts the use of both the B1080/B1192 DNA SCAR marker genotype and careful cannabinoid profiling may provide an effective approach for classifying chemotypic variability within diverse germplasm collections, and may also contribute to the development of cultivars with cannabinoid profiles specific for end-use applications.