Main

Throughout human history, plants have played the role of silent partners in the growth of virtually every civilization1. Humans have exploited wild plants and crops as sources of food2, used trees as combustible material and to craft manufactured goods1,3 and taken inspiration from the beauty of flowers for poetic and artistic endeavours4,5. Since the birth of modern science, plants have also become the subjects of intense investigation. As scientists systematically studied the natural history of plants6, they soon realized that many of these species could function as model organisms to address fundamental scientific questions7. Edward O. Wilson famously stated that '[…] for every scientific question, there is the ideal study system to test it'. Accordingly, a researcher's choice of one species over another is often driven by functional criteria (for example, ploidy level for genetic studies or ease of growth under controlled conditions). Still, outside the laboratory or the greenhouse, field scientists' choice of focal organisms may be influenced by concerns that go beyond strictly scientific interests. As a result, when plant scientists choose to study a specific wild plant from the pool of species available in a given study region, factors unrelated to the biological question may end up influencing species choice and introducing biases into research outcomes. Although this is not a problem per se, a disparity in scientific attention towards certain species may become a concern in conservation biology, where it is paramount to ensure a 'level playing field' when selecting conservation priorities8,9.

Given their global diversity10 and ecological importance11,12, plants should be prominent in conservation biology's effort to curb species loss under mounting anthropogenic pressures13,14,15. Yet it is well documented that plants receive less attention, and consequently less funding, in conservation than animals do16,17. This taxonomic bias has been linked to 'plant blindness'18 or 'plant awareness disparity'19, two terms proposed to describe a lack of awareness of plants. This disparity is rooted both in the evolutionary history of human cognition and in cultural, scientific and educational tendencies. It has serious real-life consequences, as it shapes the knowledge base of conservation and its policies. Addressing this broad bias is urgent but often lies outside the scope of plant sciences. We therefore sought to identify more specific biases that can be addressed from within the plant science community, thereby informing better research practices.

With this goal in mind, we chose a well-defined case study in which to examine specific traits and factors that could influence the choice of species studied. By combining the strengths of bibliometric and trait-based approaches, we asked what kinds of biases might operate in plant sciences, resulting in some species being studied more than others. To address this question, we chose a model system of 113 species typical of the Southwestern Alps, one of the largest biodiversity hotspots in the Mediterranean region20 (Fig. 1a). By focusing on a well-known flora in a delimited area, this study design allowed us to control for several confounding factors related to sampling biases, trait heterogeneity and research interest.

Fig. 1: Study workflow and most important factors in explaining research interest.

a, Schematic representation of the data collection and the subdivision of plant traits into three categories: ecology, morphology and rarity. b, Outcomes of the variance partitioning analysis, which partitions the relative contributions of traits related to ecology, morphology and rarity, as well as the random effect of species' taxonomic relatedness at family level. T°, temperature.

We tested whether there is a relationship between research focus on a plant species (measured using bibliometric indicators) and species-specific traits related to ecology, morphology and rarity (Fig. 1a). In the Web of Science, we sourced 280 papers focusing on the selected plant species (mean ± s.d. of 2.15 ± 2.96 scientific papers per plant), published between 1975 and 2020. The number of publications, their average annual number of citations and their average h-index were all mutually correlated (all Pearson's r > 0.7; Supplementary Fig. 1). We therefore expressed research attention simply as the total number of publications. Using variance partitioning analysis21, we quantified the relative contributions of ecology, morphology and rarity to the observed pattern of research attention. This analysis indicated that the choice of species investigated across the literature over the past 45 years has been strongly influenced by plant traits related to aesthetics. Using marginal R², we found that morphological traits, including colour, explained the largest proportion of variance (15.0%), whereas the contribution of ecology and rarity was negligible (Fig. 1b). However, 75.6% of model variance remained unexplained. We then reassessed variance partitioning using conditional R², which describes the proportion of variance explained by both fixed effects and random factors (species' taxonomic relatedness). This showed that 54% of the unexplained variance was due to the random effect. This indicates that certain clusters of closely related plants are more studied than others and share more similar traits than expected from a random pool (examples in Fig. 2a).
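The marginal and conditional R² discussed above can be made concrete with one common decomposition for mixed models. The Python sketch below is a minimal illustration, not the study's computation. It assumes the lognormal approximation that Nakagawa and colleagues proposed for the observation-level variance of log-link Poisson models, and the R tools used in the study may implement a different variant. The function name and all input values are hypothetical. The marginal R² keeps only the fixed-effect variance in the numerator. The conditional R² also adds the family-level random-effect variance, so the difference between the two is the share attributable to taxonomic relatedness.

```python
import math

import numpy as np


def glmm_r2_poisson_lognormal(X, beta, beta0, sigma2_random):
    """Marginal and conditional R^2 for a Poisson GLMM with a log link.

    Hypothetical helper, assuming the lognormal approximation of the
    observation-level variance.
    X             : (n, p) fixed-effect predictors, no intercept column
    beta          : length-p fixed-effect coefficients (log scale)
    beta0         : fixed intercept (log scale)
    sigma2_random : variance of the random intercept (here, plant family)
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Variance attributable to fixed effects: variance of the linear predictor.
    sigma2_fixed = float(np.var(X @ beta, ddof=1))
    # Expected count on the response scale, integrating over the random effect.
    lam = math.exp(beta0 + 0.5 * sigma2_random)
    # Distribution-specific (observation-level) variance: ln(1 + 1/lambda).
    sigma2_obs = math.log1p(1.0 / lam)
    total = sigma2_fixed + sigma2_random + sigma2_obs
    r2_marginal = sigma2_fixed / total                       # fixed effects only
    r2_conditional = (sigma2_fixed + sigma2_random) / total  # fixed + random
    return r2_marginal, r2_conditional


# Illustrative call with made-up values (not the study's estimates):
X_demo = np.array([[0.1, -1.0], [1.2, 0.3], [-0.8, 0.9], [0.4, -0.2]])
r2m, r2c = glmm_r2_poisson_lognormal(X_demo, beta=[0.5, 0.3], beta0=0.7, sigma2_random=0.4)
print(f"marginal R2 = {r2m:.3f}, conditional R2 = {r2c:.3f}")
```

Setting `sigma2_random` to zero makes the two values coincide, which is a quick sanity check on the decomposition.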

Fig. 2: Regression analysis of plant traits.

a, Example of representative plant species with different traits and research attention: Gentiana ligustica R. Vilm. & Chopinet (Gentianaceae, blue inflorescence and many published papers), Berardia lanuginosa (Lam.) Fiori (Asteraceae, small and single yellow flower head, short stem and no published papers) and Fritillaria involucrata All. (Liliaceae, large brownish flower, short stem and no published papers). b, Predicted number of papers in relation to stem and flower size by flower colour according to the best-performing GLMM. c, Incidence rate ratios and significance levels (*P < 0.05, **P < 0.01, ***P < 0.001) for all the explanatory variables included in the final model (exact P values: colour (blue), 0.00025; colour (white), 0.04562; colour (red/pink), 0.08603; colour (yellow), 0.53695; range size, 0.03308; stem size, 0.00858; flower size, 0.04987). Error bars indicate standard errors. P values for parametric terms were based on two-sided z-tests.

This first result was surprising, as neither species rarity nor scientific interest in narrow-range endemics or International Union for Conservation of Nature (IUCN)-listed taxa emerged as a significant driver. We had also expected a preference for species with particular ecological features. Some of the endemic species of the Southwestern Alps are adapted to stressful habitats characterized by a narrow range of environmental conditions, such as rocky lands and xerophilous grasslands22,23. These adaptations might be attractive for studies on evolution, ecological niche theory, ecophysiology and conservation. Yet research attention showed no association with variables related to 'rarity' and 'ecology'. This highlights the absence of cross-study guidelines to help plant scientists prioritize such research areas when choosing which species to study.

To obtain a more nuanced understanding of which specific traits drive research attention, we explored the relationships between traits and number of published papers with a Poisson generalized linear mixed model (GLMM) that accounted for taxonomic non-independence among species24 (Supplementary Table 1). Using backward model selection, we identified a best-performing model that included colour, range size, flower size and stem size as fixed terms (Fig. 2b,c). All other variables introduced in the model had no significant effects and were therefore removed during model selection (Supplementary Table 3). We observed a significant relationship between the number of published papers and flower colour. Blue-flowered species were the most studied. White-flowered species were also significantly more studied than the baseline (brown/green flowers, which stand out the least from the environmental background). The difference for red/pink flowers was not significant (P = 0.086; Fig. 2c). Moreover, stem height had a significant positive effect on research interest and flower size a (rather weak) negative one. Taller stems make species more conspicuous and make their inflorescences more accessible, without investigators having to stoop to the ground. The negative effect of flower size may arise because several small-flowered plants in the Maritime Alps have intrinsic human appeal. In such plants the small flowers form conspicuous inflorescences (such as Gymnadenia corneliana and Saxifraga florulenta) that are more striking than single large flowers, introducing an 'inflorescence effect'. Finally, range size had a positive effect on research interest. A tentative explanation is that a broader distribution makes a species accessible to more researchers and thus more likely to be studied. Incidentally, broad geographical ranges generally make species less prone to extinction. This is in line with our finding that species at greater IUCN extinction risk do not receive more research interest.
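Because the GLMM uses a log link, each coefficient β translates into an incidence rate ratio, IRR = exp(β), the quantity reported in Fig. 2c. The IRR is the multiplicative change in the expected number of papers for a one-unit change in a predictor. Here that unit is one standard deviation, since predictors were scaled. For a colour level, the IRR is relative to the green/brown baseline. For example, an IRR of 2 for a colour level would mean twice as many papers expected as for green/brown-flowered species, all else being equal. The Python sketch below uses a hypothetical helper name and made-up inputs. It also computes a Wald 95% interval and the two-sided z-test P value of the kind given in the Fig. 2 caption.

```python
import math


def incidence_rate_ratio(beta, se, z_crit=1.96):
    """Convert a log-link (Poisson) coefficient into an incidence rate ratio.

    Returns the IRR, an approximate 95% Wald confidence interval and the
    two-sided z-test P value for H0: beta = 0.
    """
    irr = math.exp(beta)
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    z = beta / se
    # Two-sided P value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return irr, ci, p


# Hypothetical coefficient and standard error, for illustration only:
irr, ci, p = incidence_rate_ratio(beta=0.69, se=0.25)
print(f"IRR = {irr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), P = {p:.4f}")
```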

Across our dataset, morphological traits such as bright colours, accessible inflorescences and conspicuousness consistently drive research attention. The statistical support for these trends highlights what we call an aesthetic bias in plant research. Today 'aesthetics' refers to art and beauty, often in direct opposition to scientific values like objectivity. However, the Greek root of the word refers to sensory perception, as evident in its cognates 'anaesthetic' and 'synaesthetic'. As such, the term highlights sensory perception, both in its physiological, evolved cognitive structures and in its learned sociocultural articulations. Notably, humans have evolved trichromacy, that is, the separate perception of wavelength ranges corresponding to blue, red and green regions through specialized structures25. It has been speculated that the evolutionary acquisition of colour vision in humans and other primates led to an increased ability to locate ripe fruits against a green background26,27. The human eye is thus optimized to perceive green, red and blue. According to colour psychology theory28, these colours also strongly influence people's affect, cognition and behaviour. The evolved, physiological aspect of human perception is also demonstrably affected by sociocultural factors, since education, class, gender, age and cultural background all shape how we perceive the world29. Although these speculations about the origin of the aesthetic bias are interesting, they are beyond the scope of this communication. What matters is that this bias affects the representativeness of the data used to ground research priorities and conservation policies. As such, it risks compromising efforts to focus plant conservation activities effectively and to preserve plant biodiversity.

In conclusion, our analysis identified traits that make a plant attractive to scientists, emphasizing the trade-off between aesthetic characteristics, research attention and conservation need. Many factors can determine the choice of plant species studied. Nevertheless, we showed that research interests and conservation concerns are less important than aesthetic characteristics in driving research attention. This apparently superficial preference has implicit and undesired effects, as it translates into an aesthetic bias in the data that form the basis for scientific research and practice. This bias may be grounded in an evolutionary adaptation of human cognition, in cultural and learned preferences or simply in practical constraints in the field. In any case, it would be desirable to develop measures to counteract it, given its potentially negative impact on our understanding of plant ecology and evolution. It could also harm the conservation of vital plant biodiversity, such as species of high phylogenetic value or with unique ecological traits and ecosystem functions. Statistical modelling has been widely used in conservation ecology to predict ecological niches in space and time and to develop practical conservation agendas30. Many potential issues are routinely considered in modelling exercises, including geographical relatedness and sampling biases31 or metric selection32. By contrast, the well-known problem of observer-related biases33 is largely overlooked34. Against this background, our study demonstrates the need to consider aesthetic biases more explicitly in experimental design and in the choice of species studied. As Kéry and Gregg35 stated: 'although plants stand still and wait to be counted, they sometimes hide'. Often in plain view, we would add.

Methods

Species selection

We focused the analysis on the flora of the Italian and French Maritime Alps, a plant biodiversity hotspot in the Southwestern Alps20. By restricting the analysis to the flora of an intensively studied and confined area, we were able to control for three confounding factors:

  1. Since the Maritime Alps flora has been extensively studied for over two centuries36, the number of described plant species in this area has already reached an asymptote37, unlike under-studied floras outside Europe38.

  2. Narrow-range plants on similar substrates and localities are characterized by a restricted range of physicochemical features and would be expected to show similar adaptations. This excludes confounding factors that would arise if the study included species from different biomes and ecological regions.

  3. Narrow-range species are primarily studied by local researchers (mostly from France, Italy and Switzerland), who are expected to share a similar cultural background and thus similar cultural biases. This would not be the case for cosmopolitan plants studied by researchers from mixed cultural backgrounds around the world.

We selected a representative list of 113 plant species from checklists39,40. For the purpose of this analysis, we excluded subspecies and species of uncertain taxonomic status.

Bibliometric data

We obtained bibliometric data from the Web of Science41. We searched for all published works focusing on each of the 113 species, using the accepted Latin names and synonyms reported in The Plant List42. For each species, we derived three values: the number of published papers, their average number of citations per year and their average h-index. We acknowledge that our search was not exhaustive: we included only articles in English43, used a single bibliographic database and restricted the search to the title, abstract and keywords. This implies, for example, that species with no studies in the Web of Science (n = 43; 38%) may actually have been the focus of grey literature. They may also have been covered by studies that did not mention the Latin name in the title, abstract or keywords, as is common, for example, in multispecies studies. However, we assumed that this bias was homogeneously distributed across species and thus unlikely to affect the observed patterns.
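A search that combines an accepted name with its synonyms and restricts matches to title, abstract and keywords can be expressed as a single query string. The Python sketch below assumes the Web of Science advanced-search Topic tag (TS=), which, to our understanding, covers title, abstract and keyword fields. The helper name is hypothetical, and the synonym in the example is a placeholder rather than an actual entry from The Plant List.

```python
def wos_topic_query(accepted_name, synonyms=()):
    """Build a Web of Science advanced-search string on the Topic (TS) field.

    Hypothetical helper: quotes each binomial so it is searched as a phrase
    and joins the accepted name and its synonyms with OR.
    """
    names = []
    for raw in [accepted_name, *synonyms]:
        name = " ".join(raw.split())  # normalise internal whitespace
        if name and name not in names:  # drop duplicates, keep order
            names.append(name)
    return "TS=(" + " OR ".join(f'"{n}"' for n in names) + ")"


# Placeholder synonym, for illustration only:
print(wos_topic_query("Gentiana ligustica", ["<synonym from The Plant List>"]))
```

Quoting each binomial matters: without quotes, a query for a genus and epithet could also match papers that mention the two words separately.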

Species traits

We derived flower colour, stem size, flowering duration and altitude data from Tela Botanica44, Actaplantarum (www.actaplantarum.org) and InfoFlora45. We obtained species' ecological preferences using Landolt indicator values available in Flora Indicativa46. We extracted flower size from FlorAlpes (www.florealpes.com/index.php) and conservation status from the IUCN Red List47. We expressed the taxonomic uniqueness of each species as its number of congeneric species, on the basis of ref. 42. Finally, we approximated species range size using occurrences available in the Global Biodiversity Information Facility (GBIF)48. We calculated two measures: (1) the area of the minimum convex polygon (MCP) encompassing all localities (range area) and (2) the dispersion of points around the distribution centroid (range dispersion). The latter is more robust if distribution data are biased, which is often the case with GBIF datasets, where sampling effort is uneven (for example, refs. 49,50). We grouped species traits into three categories (Supplementary Table 2 and Fig. 1a):
- ecology: minimum altitude, altitude range, maximum altitude and Landolt indicator values;
- morphology: flower colour, flower diameter, stem size and flowering duration;
- rarity: range area, range dispersion, IUCN category and number of congeneric species.
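The two range measures can be sketched in a few lines of Python. This sketch assumes occurrences already projected to planar coordinates, for example kilometres in an equal-area projection, since raw latitude/longitude would distort areas. It also takes 'dispersion' to mean the mean distance of occurrences from their centroid. That is one plausible reading, not the study's stated formula. All function names are hypothetical.

```python
import math


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mcp_area(points):
    """Area of the minimum convex polygon (shoelace formula on the hull)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def range_dispersion(points):
    """Mean Euclidean distance of occurrences from their centroid (assumed definition)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / n


# Synthetic projected occurrences (km), for illustration only:
occ = [(0.0, 0.0), (12.0, 3.0), (8.0, 15.0), (2.0, 9.0), (6.0, 6.0)]
print(mcp_area(occ), range_dispersion(occ))
```

The convex polygon is defined by its outermost points, so a single misplaced or outlying GBIF record can inflate range area considerably. The same record shifts the mean distance to the centroid by only a fraction of its offset. This is why dispersion is the more robust of the two measures when occurrence data are biased.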

Data analysis

We performed all analyses in R51. We conducted data exploration following ref. 52. We checked the homogeneity of continuous variables and log10-transformed non-homogeneous variables where appropriate (Supplementary Table 2). We checked for multicollinearity among predictors with pairwise Pearson's r correlations (Supplementary Figs. 1 and 3). We visualized potential associations between continuous and categorical variables with boxplots. We summarized variation in the eight main Landolt indicator values as the first two principal component (PC) axes of a principal component analysis (PCA). These axes describe the environment in which each species lives. PC1 explained 30.5% of the variance and PC2 explained 17.1%. We excluded salinity tolerance from the PCA because it is not applicable in the analysed geographical and ecological context.
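Summarizing the Landolt indicator values as PC1 and PC2 amounts to a PCA on the species-by-indicator matrix. The Python sketch below is a minimal version under stated assumptions. The indicator values are standardized (z-scored) before the decomposition, which the text does not specify. The demo matrix is random synthetic data, so its explained-variance figures will not match the 30.5% and 17.1% reported above. The helper name is hypothetical.

```python
import numpy as np


def pca_first_axes(X, n_components=2):
    """PCA on z-scored columns via SVD.

    Returns the species scores on the first `n_components` axes and the
    proportion of variance explained by every axis (in decreasing order).
    Assumes no column is constant (its standard deviation would be zero).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; singular values come sorted descending.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained


# Synthetic stand-in for a 113 x 8 matrix of Landolt indicator values:
rng = np.random.default_rng(42)
landolt_demo = rng.integers(1, 6, size=(113, 8)).astype(float)
scores, explained = pca_first_axes(landolt_demo)
print(scores.shape, explained[:2])
```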

We compared IUCN categories with the other 'rarity' variables. IUCN categories were strongly associated with range area and dispersion (extinction risk is often inferred on the basis of range size53) and consistently associated with the number of congeneric species. Collinearity analysis also revealed high correlations (|r| > 0.7) between minimum and maximum elevation and between range area and range dispersion. We therefore excluded IUCN category, minimum elevation and range area from the analysis. Moreover, to balance the levels of the flower colour variable, we grouped red with pink and brown with green flowers. The 'green/brown' category was used as the baseline in all analyses, as these colours stand out least from the background25,26.
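The collinearity screen flags every pair of predictors whose absolute Pearson correlation exceeds 0.7. One variable of each flagged pair is then dropped on substantive grounds, as was done here for IUCN category, minimum elevation and range area. Below is a minimal Python sketch with toy data and a hypothetical function name.

```python
import numpy as np


def flag_collinear_pairs(variables, threshold=0.7):
    """Return (name_i, name_j, r) for every predictor pair with |Pearson r| > threshold.

    `variables` maps a predictor name to its values (same species order in each).
    """
    names = list(variables)
    data = np.array([variables[n] for n in names], dtype=float)
    r = np.corrcoef(data)  # rows are variables
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged


# Toy predictors, for illustration only:
demo = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],
    "x3": [5, 1, 4, 2, 3],
}
print(flag_collinear_pairs(demo))
```

With these toy values only x1 and x2 are flagged (r ≈ 0.996). Deciding which member of a flagged pair to keep remains a judgement call that the code does not make.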

Variance partitioning analysis

We used variance partitioning analysis21 to resolve the relative contributions of ecology, morphology and rarity to the observed pattern of research attention. We fitted seven GLMMs (modelling details in the next section). Three models contained one individual set of variables each (ecology, morphology and rarity), and four contained their combinations (ecology + morphology; ecology + rarity; morphology + rarity; ecology + morphology + rarity). We then used the pseudo-R² of each model (both conditional and marginal)54 to evaluate how much each set of variables, and each combination, contributes to the research attention each species receives. We partitioned their explanatory power with the R package modEvA55 and visualized the results as a Venn diagram.
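With three predictor sets, the seven model fits provide exactly enough R² values to solve for the seven regions of a three-set Venn diagram. These regions are the unique contribution of each set, each pairwise overlap and the overlap shared by all three. The Python sketch below uses the standard subtraction scheme, with E, M and R standing for ecology, morphology and rarity. It is a hypothetical helper, not the modEvA implementation the authors used. With pseudo-R² values, some fractions can come out slightly negative, which is conventionally interpreted as negligible rather than as a real negative contribution.

```python
def partition_three_sets(r2):
    """Split the R^2 of seven models into the seven regions of a Venn diagram.

    `r2` maps model labels to R^2: "E", "M", "R" (single sets), "EM", "ER",
    "MR" (pairs) and "EMR" (all three sets together).
    """
    a = r2["EMR"] - r2["MR"]      # ecology only
    b = r2["EMR"] - r2["ER"]      # morphology only
    c = r2["EMR"] - r2["EM"]      # rarity only
    d = r2["ER"] - r2["R"] - a    # ecology and morphology, not rarity
    e = r2["EM"] - r2["M"] - a    # ecology and rarity, not morphology
    f = r2["EM"] - r2["E"] - b    # morphology and rarity, not ecology
    g = r2["E"] - a - d - e       # shared by all three sets
    return {
        "ecology_only": a, "morphology_only": b, "rarity_only": c,
        "ecology_morphology": d, "ecology_rarity": e, "morphology_rarity": f,
        "all_three": g, "unexplained": 1.0 - r2["EMR"],
    }


# Made-up R^2 values, for illustration only:
demo_r2 = {"E": 0.08, "M": 0.16, "R": 0.07, "EM": 0.21,
           "ER": 0.13, "MR": 0.18, "EMR": 0.23}
for part, value in partition_three_sets(demo_r2).items():
    print(f"{part:>20s}: {value:.3f}")
```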

Regression model

We used regression to explore relationships between the research attention each species receives and plant traits24. The number of published papers, average number of citations and average h-index were all mutually correlated (Pearson's r > 0.7). We therefore selected only the number of publications as the response variable (Supplementary Fig. 1). We fitted GLMMs to these data with lme4 (ref. 56), using a Poisson distribution and a log link function. The Poisson distribution is commonly used for count data (in our case, the number of papers in the Web of Science), and the log link function ensures positive fitted values24. We scaled all variables and optimized the GLMMs with bound optimization by quadratic approximation (BOBYQA) to facilitate model convergence. We included each species' family as a random factor to account for data dependence, under the assumption that species within the same family are more likely to share similar traits. Zeros made up 38% of values in the response variable, corresponding to species never studied in scientific papers in the Web of Science. We nevertheless considered this zero-inflation acceptable because these are 'true zeros'57.
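One way to check whether 38% zeros is compatible with a Poisson model is to compare the observed number of zeros with the number the fitted model predicts. Under a Poisson mean μᵢ, the probability of a zero is exp(−μᵢ), so the expected count of zeros is the sum of these probabilities over species. The Python sketch below uses a hypothetical helper and toy inputs to illustrate that comparison, similar in spirit to checks in R packages such as performance. It is a diagnostic only, not the criterion the authors used to judge the zeros acceptable.

```python
import math


def zero_inflation_ratio(y, mu):
    """Compare observed zeros with the zeros expected under a Poisson model.

    y  : observed counts (papers per species)
    mu : fitted Poisson means for the same species
    Under Poisson(mu_i), P(Y_i = 0) = exp(-mu_i), so the expected number of
    zeros is the sum of exp(-mu_i). A ratio well above 1 hints at excess zeros.
    """
    observed = sum(1 for v in y if v == 0)
    predicted = sum(math.exp(-m) for m in mu)
    return observed, predicted, observed / predicted


# Toy counts and fitted means, for illustration only:
obs, pred, ratio = zero_inflation_ratio([0, 0, 1, 3], [0.5, 1.0, 1.0, 2.0])
print(f"observed zeros = {obs}, predicted = {pred:.2f}, ratio = {ratio:.2f}")
```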

We built an initial GLMM using all the non-collinear variables and the non-associated factors (Supplementary Table 1) selected after data exploration (the equation is in R notation):

$$\begin{array}{l}{\mathrm{Number}}\,{\mathrm{of}}\,{\mathrm{Papers}}\sim {\mathrm{Flower}}\,{\mathrm{colour}} + {\mathrm{Flowering}}\,{\mathrm{duration}} + {\mathrm{Flower}}\,{\mathrm{diameter}}\\ + {\mathrm{Stem}}\,{\mathrm{size}} + {\mathrm{Landolt}}\,{\mathrm{values}}\,{\mathrm{PC1}} + {\mathrm{Landolt}}\,{\mathrm{values}}\,{\mathrm{PC2}} + {\mathrm{Maximum}}\,{\mathrm{altitude}}\\ + {\mathrm{Altitude}}\,{\mathrm{range}} + {\mathrm{Range}}\,{\mathrm{size}} + {\mathrm{Congeneric}}\,{\mathrm{species}} + {\mathrm{random}}\,\left( {{\mathrm{Family}}} \right)\end{array}$$

Once the initial model had been fitted, we performed model selection by backward elimination. We based model reduction on Akaike information criterion (AIC) values (Supplementary Table 3) to simplify the model and avoid overfitting58. We validated the models with the R package performance59 by checking overdispersion and standardized residual plots24 (Supplementary Fig. 2).
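Backward elimination by AIC repeatedly drops the term whose removal lowers AIC the most and stops when no single removal helps. Overdispersion is commonly screened with the ratio of the Pearson χ² statistic to the residual degrees of freedom, where values well above 1 signal overdispersion. The Python sketch below shows both ideas with hypothetical helper names. Here `aic_fn` stands in for refitting the GLMM with a given set of terms, and the toy AIC function is invented purely to exercise the loop.

```python
def backward_eliminate(terms, aic_fn):
    """Greedy backward elimination on AIC.

    At each step, try dropping every remaining term and keep the reduction
    with the lowest AIC, but only if it beats the current model's AIC.
    """
    current = list(terms)
    current_aic = aic_fn(current)
    while len(current) > 1:
        candidates = []
        for t in current:
            reduced = [u for u in current if u != t]
            candidates.append((aic_fn(reduced), reduced))
        best_aic, best_terms = min(candidates, key=lambda c: c[0])
        if best_aic >= current_aic:
            break  # no single deletion improves AIC
        current, current_aic = best_terms, best_aic
    return current, current_aic


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square over residual df; values well above 1 suggest overdispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Invented AIC: each "useful" term lowers it by 10, every term costs 2.
useful = {"colour", "stem_size"}


def toy_aic(terms):
    return 100 - 10 * len(useful & set(terms)) + 2 * len(terms)


kept, best = backward_eliminate(["colour", "stem_size", "x1", "x2"], toy_aic)
print(kept, best)
```

This greedy search examines only single-term deletions at each step, so it can stop at a locally best model. The family random intercept is not one of the candidate terms, so it is kept in every model, as in the equation above.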

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.