As scientists involved in risk assessment of transgenic insecticidal plants, we are greatly concerned about the publication by Lövei et al. (2009) implying that insect-protected crops based on the Cry proteins of Bacillus thuringiensis may have substantial negative impacts on non-target organisms. We believe that Lövei et al. (2009) use inappropriate and unsound methods for risk assessment that have led them to reach conclusions that are in conflict with those of several recent comprehensive reviews and meta-analyses (e.g., O’Callaghan et al. 2005; Romeis et al. 2006; Marvier et al. 2007; Wolfenbarger et al. 2008; Naranjo 2009). Lövei et al. (2009) base their findings on an analysis of 55 laboratory studies of Cry proteins and 27 studies of proteinase inhibitors (PIs; including lectins) that were published through mid-2007 and conclude that these proteins “often have non-neutral effects on natural enemies”. They further conclude that “parasitoids were more susceptible than predators to the effects of both (toxins)” and that “conclusions that Bt…gene products have no harm to natural enemies are currently overgeneralized and premature”. We are deeply concerned about the inappropriate methods used in their paper, the lack of ecological context, and the authors’ advocacy of how laboratory studies on non-target arthropods should be conducted and interpreted. Essentially, the authors have conducted a data-mining exercise without prior elaboration of a risk hypothesis framework (Romeis et al. 2008) that can provide context to their findings and interpretations. Therefore, we believe it is very important that readers consider the following points as they read Lövei et al. (2009).

Data selection and analyses

We have a major concern with the authors’ selection and use of multiple non-independent measures of various life history and behavioral traits in the analysis. As an example, they justify the use of development times and survival rates on individual instars as independent measures of effect by testing whether there is evidence of “matching” among the total development time or survivorship and individual times and rates for each of multiple stadia. Based on various criteria, the percentage matching was >50–84% but the test statistic for independence was significant, so all stadia measurements were used in the analyses. Their justification for using this method is that there might be “…complex instar-specific mortality schedules and patterns of development time”. The fundamental effect of such an approach is that it inflates purported effects in the data, and the authors acknowledge this. However, their dismissal of the potential effect that the instar analysis may have on their conclusions is confusing and unjustified. Although they state “data-driven reading of the quantitative data…provided a more accurate picture of the literature…than (the reviews) by O’Callaghan et al. (2005) and Romeis et al. (2006)”, they provide no evidence to support this statement. They cite Bai et al. (2005) as an example of the need to use their analytical methods to tease out negative results, but neglect to note that in the Bai et al. (2005) study total larval development and survivorship of Propylea japonica (Thunberg) (Coleoptera: Coccinellidae) were unaffected by exposure to Cry proteins. These are the quantities that ultimately affect population growth and are of primary importance as measurement endpoints in risk assessment studies (including those conducted for pesticide assessment; Romeis et al. 2008). For this reason, Bai et al. (2005) correctly concluded “Bt rice pollen had no negative impacts on P. japonica fitness…”. Lövei et al. (2009) also fail to justify the many other instances of non-independence in their data set where multiple, correlated life history and behavioral traits were measured on the same cohort of subject organisms. For example, many studies measured oviposition per day and total adult oviposition which are clearly correlated. The quality of independence in a meta-analysis is essential to obtain accurate and unbiased results, and authors need to go to great lengths to ensure independence, even if it means omitting hard-earned data (e.g., Marvier et al. 2007; Wolfenbarger et al. 2008; Naranjo 2009). The use of non-independent data is analogous to pseudo-replication, a well-understood problem in the scientific literature.
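The cost of treating correlated measures as independent can be illustrated with a standard design-effect calculation. The sketch below is not taken from Lövei et al. (2009) or from any of the meta-analyses cited here. It assumes, purely for illustration, that k measures taken on the same cohort share one common pairwise correlation rho; the function name is hypothetical.

```python
def effective_sample_size(k: int, rho: float) -> float:
    """Approximate number of independent observations among k measures.

    Assumption (illustrative only): all k measures share the same pairwise
    correlation rho, so the variance of their mean is inflated by the
    design effect 1 + (k - 1) * rho.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only handles 0 <= rho <= 1")
    design_effect = 1.0 + (k - 1) * rho
    return k / design_effect


# Five uncorrelated measures count as five observations.
print(effective_sample_size(5, 0.0))
# Five perfectly correlated measures carry the information of one.
print(effective_sample_size(5, 1.0))
# Moderately correlated measures fall in between.
print(effective_sample_size(5, 0.5))
```

Under this simplification, counting each instar-level or per-day measure as a separate observation overstates the amount of evidence whenever rho is greater than zero.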

Secondly, Lövei et al. (2009) claim to use a weighted effect size estimator “similar to (but not the same as)… Hedge’s g” to quantify experimental effects but do not provide sufficient methodological detail to permit others to repeat their analyses. They cite their prior work (Lövei and Arpaia 2005) for methods on effect size calculation but no details are provided there either. Various effect size estimators have been developed (Hedges and Olkin 1985) and it is important to understand their strengths and limitations when interpreting results derived from these.
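For comparison, a fully specified estimator can be written in a few lines. The sketch below follows the commonly used textbook form of Hedges' d (standardized mean difference with a small-sample correction, after Hedges and Olkin 1985). It is not the estimator used by Lövei et al. (2009), whose details are not available, and the function name and the example numbers are hypothetical.

```python
import math


def hedges_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (d, variance) for a treatment versus control comparison.

    Textbook form: the standardized mean difference is scaled by the
    small-sample correction factor J = 1 - 3 / (4 * df - 1).
    """
    df = n_t + n_c - 2
    if df < 1:
        raise ValueError("need at least three observations in total")
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    d = (mean_t - mean_c) / pooled_sd * correction
    # Approximate sampling variance of d; its inverse is the usual weight.
    variance = (n_t + n_c) / (n_t * n_c) + d**2 / (2.0 * (n_t + n_c))
    return d, variance


# Hypothetical data: Bt treatment mean 10, control mean 12, both SD 2, n = 10.
d, var = hedges_d(10.0, 2.0, 10, 12.0, 2.0, 10)
print(round(d, 3), round(var, 3))
```

A negative d here means lower performance in the treatment group than in the control. Reporting the estimator at this level of detail is what allows others to repeat an analysis.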

Finally, Lövei et al. (2009) describe an arbitrary and inappropriate classification that labels responses as positive or negative even when they are not statistically significant, and then go on to ascribe importance to these statistically unsupported classifications. By their own admission, the P values of these comparisons would be roughly 0.30, which would be considered non-significant and devoid of further meaning and interpretation, even if the goal were to increase the statistical power of the test. We believe it is incorrect to draw conclusions or implications based on results that are not statistically valid.

Prey/host-quality mediated effects

Experimental studies must have properly formulated hypotheses, experimental designs and testing methods; otherwise the interpretation of the outcomes of such tests is unreliable. This basic factor of ‘study quality’ is not addressed in the analysis by Lövei et al. (2009); rather, their methodology implies that all studies are equally valid and should be given equal weight, provided each study has adequate statistical properties. We believe their approach is fundamentally flawed and does a disservice to environmental risk assessment.

One example of this problem can be seen in reports on Chrysoperla carnea Stephens (Neuroptera: Chrysopidae), a lacewing species that has been the subject of several studies. Hilbeck et al. (1998a) observed reduced fitness of C. carnea larvae when fed on Bt maize-reared lepidopteran larvae and claimed it was associated with the Cry1Ab protein and that Cry1Ab is toxic to C. carnea (Hilbeck et al. 1998b). However, subsequent studies clearly demonstrated that Cry1A proteins are not toxic to C. carnea larvae (Romeis et al. 2004; Rodrigo-Simón et al. 2006; Lawo and Romeis 2008) and that these proteins do not bind to the midgut of C. carnea, a prerequisite for toxicity (Rodrigo-Simón et al. 2006). These results strongly indicate that the effects observed by Hilbeck et al. (1998a) were due to C. carnea feeding on poor quality (sick or dying) lepidopteran prey. Additional studies with aphids (which do not ingest Cry1Ab) and spider mites [which contain high concentrations of biologically active Cry1Ab (Obrist et al. 2006)], neither of which is affected when feeding on Bt maize, demonstrated that, when these herbivores fed on Bt maize and were in turn consumed by C. carnea, the predator was not harmed (Dutton et al. 2002). These results emphasize that care must be taken when designing laboratory studies to assess the potential effects of Cry proteins and other insecticidal factors on predators; otherwise the results can easily be misinterpreted. We believe that this is certainly the situation that caused Lövei et al. (2009) to conclude incorrectly that “significant negative effects of Cry1A/Cry2A on C. carnea were 6.2 times more likely to occur than positive ones”.

Teasing out the effects of insecticidal factors on parasitoids is potentially even more difficult due to their close relationship with the host; if the host dies, the parasitoid dies. If Bt-susceptible hosts are fed on a Cry protein source and then parasitized, impacts of host quality on parasitoid fitness are expected and could be confused with toxic effects of the Cry protein (Romeis et al. 2006; Chen et al. 2008a). Using the diamondback moth (DBM) and its major parasitoid, Diadegma insulare (Cresson) (Hymenoptera: Ichneumonidae), it was clearly shown that when DBM resistant to Cry1C were parasitized, there were no effects on parasitoids that fed internally on DBM (Chen et al. 2008b). This study overcame any host-mediated effects to show the complete lack of toxicity of Cry1C to the parasitoid. This result is consistent with previous reports about the lack of toxicity of Cry1 to hymenopteran parasitoids (Schuler et al. 2003, 2004).

Overall, it is critical to account for prey- or host-mediated effects in such toxicological studies. A recent meta-analysis using Hedges’ d, a weighted effect size estimator with a sample size bias-corrector, and based on comparative laboratory studies of Bt Cry toxicity published through November 2008 (Naranjo 2009), clearly shows a negative effect of low quality hosts (susceptible hosts compromised by feeding on Bt plant tissues or purified Cry proteins) on survival, development and reproduction of parasitoids (Fig. 1). In contrast, the overall effects are neutral or even positive when high quality, uncompromised hosts are provided (Bt-resistant hosts or hosts not susceptible to Cry proteins). The effect of prey quality on predators is less pronounced than on parasitoids, but even a small negative effect of low prey quality on survival is neutralized when they are provided high quality prey containing Cry proteins.

Fig. 1

Meta-analyses of laboratory studies (using Hedges’ d effect size estimator) examining non-target effects of transgenic Bt crops on arthropod predators and insect parasitoids that were exposed to Bt Cry proteins via prey or hosts that had fed on either transgenic plant materials (tritrophic exposure) or pure Cry proteins in artificial diets (direct exposure). Prey or hosts that were partially susceptible to Cry proteins and thus displayed reduced vigor were considered low quality prey. Numbers above or below the bars indicate the total number of observations for each measured biological parameter and error bars denote 95% confidence intervals; error bars that do not include zero indicate significant effect sizes (* P < 0.05). Negative effect sizes are associated with compromised performance on Bt compared with non-Bt controls. Reproduced from Naranjo (2009) with permission from Centre for Agricultural Bioscience International (CABI).

These examples demonstrate that just using a “quantitative” summary of previous laboratory studies can lead to spurious results; studies must be properly designed to tease out the effects of the insecticidal factor versus the quality of the prey or host. However, Lövei et al. (2009) did not assess the quality of the studies they used in their analyses nor did they properly partition the data so that issues of prey/host quality could be separately examined. As a consequence, their conclusion that “parasitoids were more susceptible than predators to the effects of … Cry toxins…” is due to the fact that a large majority of tritrophic studies on parasitoids have been conducted with susceptible, sublethally affected lepidopteran larvae as hosts (Romeis et al. 2006; Naranjo 2009). These indirect and potentially adverse effects are common for any method of pest control and are of minor concern within an environmental risk assessment context (OECD 1993), and they should be differentiated from direct effects of a toxin (EFSA 2006).

Ecological relevance and risk assessment

Laboratory studies, if done properly, can provide relevant information about potential ecological hazards to natural enemies and have an important role in tiered testing, an approach we advocate for conducting risk assessments on transgenic insecticidal plants (Romeis et al. 2008). Generally, the only ecotoxicologically relevant difference between a Bt crop and its non-transformed comparators is the expression of the insecticidal protein. Consequently, this is the factor of concern that needs to be assessed (Macdonald and Yarrow 2003; Raybould 2007; Romeis et al. 2008). Laboratory studies designed to be highly conservative and even unrealistic representations of what might occur in the field provide a powerful tool to assess direct toxic effects of the insecticidal protein. The data derived allow conclusions about whether the abundance and/or ecological function of natural enemies may be altered when such plants are grown in the field. It is unfortunate that Lövei et al. (2009) did not address this fundamental issue, despite the abundance of published information. A total of at least 63 field studies assessing arthropod non-target effects of Bt crops have been published as of late 2008 and several field-level meta-analyses have been completed (Marvier et al. 2007; Wolfenbarger et al. 2008; Naranjo 2009). Overall, these data have demonstrated no effect of Bt crops on biological control function in the field even though several studies have identified minor changes in abundance of some species (e.g., Naranjo 2005a, b). The methodology used by Lövei et al. (2009) is problematic not only because they ignore prey/host quality effects and give equal weight to each response parameter, as noted above, but because they fail to place any putative effect in an ecological context. Thus, the implications of overall effects claimed by Lövei et al. (2009) are not established in ecological terms. They also fail to note the comparative detrimental effect of using broader-spectrum insecticides in the field for pests targeted by transgenic crops, which the cultivation of current Bt crops has significantly reduced (Marvier et al. 2007; Brookes and Barfoot 2008; Wolfenbarger et al. 2008).

Finally, while Lövei et al. (2009) parse out the effects of different families of Cry proteins, they lump proteinase inhibitors, which bind with and deactivate proteinases, with lectins, which bind with sugars, into a single generic group incorrectly labeled as PIs. Both of these groups of compounds are composed of highly variable insecticidal proteins with generally broader (but different) spectra of activity (Carlini and Grossi-de-Sa 2002; Malone et al. 2008). While Lövei et al. (2009) break out these non-Bt proteins as a group in their analysis, they go on to make sweeping generalizations (most evident in the abstract) that ignore fundamental differences in the spectra of activity of Cry proteins, PIs and lectins. Adverse effects of particular PIs and lectins on some non-target arthropods are not surprising, given their modes of action. However, there is little justification for combining PIs and lectins, and none for combining them with Bt Cry proteins, in any assessment of non-target impacts.

In conclusion, while we think that environmental risk assessments of transgenic insect-resistant crops are important, we believe the paper by Lövei et al. (2009) advocates inappropriate summarization and statistical methods, offers a negatively biased and incorrect interpretation of the published data on non-target effects, and fails to place any putative effect into a meaningful ecological context. Such erroneous analyses do not serve the scientific or regulatory communities.