Reconstruction of plant–pollinator networks from observational data

Young, Jean-Gabriel; Valdovinos, Fernanda S.; Newman, M. E. J.

doi:10.1038/s41467-021-24149-x

Download PDF

Article
Open access
Published: 23 June 2021

Reconstruction of plant–pollinator networks from observational data

Nature Communications volume 12, Article number: 3911 (2021) Cite this article

7557 Accesses
13 Citations
24 Altmetric
Metrics details

Subjects

Abstract

Empirical measurements of ecological networks such as food webs and mutualistic networks are often rich in structure but also noisy and error-prone, particularly for rare species for which observations are sparse. Focusing on the case of plant–pollinator networks, we here describe a Bayesian statistical technique that allows us to make accurate estimates of network structure and ecological metrics from such noisy observational data. Our method yields not only estimates of these quantities, but also estimates of their statistical errors, paving the way for principled statistical analyses of ecological variables and outcomes. We demonstrate the use of the method with an application to previously published data on plant–pollinator networks in the Seychelles archipelago and Kosciusko National Park, calculating estimates of network structure, network nestedness, and other characteristics.

Interaction network structure explains species’ temporal persistence in empirical plant–pollinator communities

Article 01 February 2024

Untangling the seasonal dynamics of plant-pollinator communities

Article Open access 14 August 2020

Integrating multiple sources of ecological data to unveil macroscale species abundance

Article Open access 03 April 2020

Introduction

Network-based methods of analysis have contributed substantially to our understanding of ecological systems by helping us identify structure in the patterns of interaction between species^1,2,3,4. Theoretical studies have shown that such patterns affect the dynamics and stability of ecosystems^5,6,7. This is particularly the case for mutualistic networks such as plant-pollinator interactions-our focus in this paper-whose functions are critical to terrestrial biodiversity^8,9,10 and crop production^10,11,12.

A central prerequisite for quantitative analysis of network structure and function is accurate network data, and significant effort has been invested in recent years in data gathering for ecological networks of many kinds, including mutualistic networks. There is, however, some debate over whether the observed structure of mutualistic networks represents the true interaction patterns produced by evolutionary and ecological mechanisms, at least to a good approximation^4,6,13, or whether, conversely, it is biased by incomplete sampling¹⁴, for instance failing to detect the interactions of rare species^15,16,17,18. In this paper we describe a new technique that aims to give quantitative answers to these questions by applying methods of Bayesian inference to ecological network data. Treating the case of plant–pollinator networks, we show that it is possible to accurately infer interaction network structure from observational data while taking into account confounding variables such as varying species abundances. The output of our calculations is an estimate of the true structure of the network and also a quantification of our uncertainty about this structure. Standard techniques from statistical network science^19,20 and network ecology¹⁸ can then help us make precise statements about the accuracy of any further conclusions we draw from the network structure. Estimates of interaction certainty can also help us identify interactions that would benefit from greater sampling effort.

The structure of mutualistic networks is typified by several characteristic features²¹: moderate connectance, meaning that a modest fraction of all potential interactions are realized; long-tailed degree distributions, meaning that there are many specialist species with a small number of interactions and a few generalist species with many interactions; and nestedness, meaning that the interactions of the least-connected species are often subsets of the interactions of better-connected species. (These features are not necessarily independent. For instance, it has been suggested that nestedness is itself a consequence of the long-tailed degree-distribution²².) A significant volume of research has been devoted to explaining these features in terms of ecological and evolutionary mechanisms—see Bascompte and Jordano⁶ and Vázquez et al.¹³ for reviews. Other work, however, has suggested that they can also be generated merely as artifacts of skewed abundance distributions and incomplete sampling, both very common in ecological systems^15,16. In particular, Blüthgen et al.¹⁵ have shown that nestedness and broad degree distributions can be a result of failure to observe interactions between rare species because of low sampling effort and/or the infrequency of the interactions in question. Findings like this have stimulated further investigations of the effects of sampling bias on network structure⁴, both empirically by varying sampling effort in the field^{23,24,25,26,27} and theoretically using models of network structure^15,28,29,30. These studies suggest that incomplete sampling strongly underestimates the number of interactions in networks and overestimates the degree of specialization. The approach described in this paper offers one way to address these shortcomings and obtain reliable estimates of the structure of mutualistic networks, free of measurement bias.

The paper is organized as follows. We first outline a first-principles statistical model of plant–pollinator interactions and show how it can be used to estimate network structure from error-prone observational data. Then, we demonstrate these methods with an application to two typical plant–pollinator data sets, showing how they give us not only the network structure itself but also statistically principled estimates of quantities such as nestedness. Finally, we give some conclusions and directions for future work.

Results

Network reconstruction from observational data

The typical field study of plant–pollinator interactions involves recording instances of potential pollinators (such as insects) visiting plants within a prescribed observation area and over a prescribed period of time. We will refer to these records as visitation data. Network ecologists analyze visitation data by constructing networks of plant and pollinator species, where a connection between two species indicates that a plant-pollinator interaction exists between them.

However, the meaning of edges in ecological networks is not always clear³¹. One popular way to transform visitation data into networks is to connect two species when they interact “enough”—say when a pollinator species is seen on the reproductive organ of a plant species a specified number of times—but in this case the precise meaning of an edge will depend on the details of the data collection and the choices made in the analysis. How many visits do we take as evidence of a plant–pollinator interaction? A single visit is probably not enough—it might well be an error or misobservation. Is two enough, or ten, or a hundred? And what about observations that were missed entirely? Other methods of analysis transform the data in different ways, for instance encoding them as weighted networks, possibly with some statistical processing along the way³². Even in this case, however, the edges still just count numbers of visits (perhaps transformed in some way), so the resulting networks are effectively histograms in disguise, recording only potential interactions rather than true biological connections.

A more principled approach to network construction begins with a clear definition of what relationship (or relationships) a network’s edges encode³³. We argue that network ecology often calls for a network of preferred interactions. In the context of plant-pollinator networks the edges of such a network indicate that pollinators preferentially visit certain plant species and they encode a variety of mechanisms that constrain species interactions, such as temporal or spatial uncoupling (i.e., species that do not co-occur in either time or space), constraints due to trait mismatches (e.g., proboscis size very different from corolla size), and physiological-biochemical constraints that prevent the interactions (e.g., chemical barriers). (One can regard preferred interactions as being the opposite of the “forbidden links” described in refs. ^34,35,36). Preferred interactions are arguably the relevant ones for instance when analyzing the reaction of a network to abrupt changes: when one removes a plant species from a system, for example, the pollinators that prefer it will have to modify their behavior^7,37,38. The interactions we consider are binary—either a species prefers another species or it doesn’t—so the network does not encode varying strengths of interaction.

While the data gathered in a typical field study are certainly reflective of preferred interactions, they are, for many reasons, not perfect measurements of networks of preferred interactions^13,17. First, there may be observational errors. While the observers performing the work are usually highly trained individuals, they may nonetheless make mistakes. They may confuse one species for another, which is particularly easy to do for small-bodied insects, or smaller species may be overlooked altogether. Observers may make correct observations but record them wrongly. And there will be statistical fluctuations in the number of visits of an insect species to a plant species over any finite time. For rare interactions there may even be no visits at all if we are unlucky. The insects themselves may also appear to make “mistakes” by visiting plants that they typically do not pollinate. These and other factors mean that the record of observed visits is an inherently untrustworthy guide to the true structure of the network of preferred interactions. Here we develop a statistical method for making estimates of network structure despite these limitations of the data.

Model of plant–pollinator data

Consider a typical plant–pollinator study in which some number n_p of plant species, labeled by i = 1…n_p, and some number n_a of animal pollinator species, labeled by j = 1…n_a, are under observation for a set amount of time, producing a record of observed visits such that M_ij is the number of times plant species i is visited by pollinator species j. Collectively the M_ij can be regarded as a data matrix M with n_p rows and n_a columns. This is the input to our calculation.

The unknown quantity, the thing we would like to understand, is the network of plant–pollinator interactions. We can think of this network as composed of two sets of nodes, one representing plant species and the other pollinator species, with connections or edges joining each pollinator to the plants it pollinates. In the language of network science this is a bipartite network, meaning that edges run only between nodes of unlike kinds—plants and pollinators—and never between two plants or two pollinators. Such a network can be represented by a second matrix B, called the incidence matrix, with the same size as the data matrix and elements B_ij = 1 if plant i is preferentially visited by pollinator j and 0 otherwise.

The question we would like to answer is this: What is the structure of the network, represented by B, given the data M? It is not straightforward to answer this question directly, but it is relatively easy to answer the reverse question. If we imagine that we know B, then we can say what the probability is that we make a specific set of observations M. And if we can do this then the methods of Bayesian inference allow us to invert the calculation and compute B from a knowledge of M and hence achieve our goal. The procedure is as follows.

Consider a specific plant-pollinator species pair i, j. How many times do we expect to see j visit i if there is, or is not, a preferred interaction between i and j? The answer will depend on several factors. First, and most obviously, we expect the number of visits to be higher if j is in fact a pollinator of i. That is, we expect M_ij to be larger if B_ij = 1 than if B_ij = 0. Second, we expect there to be more visits if there is greater sampling effort—for instance if the period of observation is longer or if the land area over which observations take place is larger^15,16,26,27. Third, we expect to see more visits for more abundant plant and pollinator species than for less abundant ones, as demonstrated by several studies^28,30. And fourth, as discussed above, we expect there to be some random variation in the number of visits, driven by fluctuations in individual behavior and the environment. These are the primary features that we incorporate into our model. It is possible to add others to handle specific situations (see ref. ³⁹ and the Methods), but we focus on these four here.

We translate these factors into a mathematical model of plant–pollinator interaction as follows. The random variations in the numbers of visits will follow a Poisson distribution for each plant–pollinator pair i, j, parameterized by a single number, the distribution mean μ_ij, provided only that measurements are made sufficiently far apart to be independent (which under normal conditions they will be). We expect μ_ij to depend on the factors discussed above and we introduce additional parameters to represent this dependence. First we introduce a parameter r to represent the change in the average number of visits when two species are connected (B_ij = 1), versus when they are not (B_ij = 0). We write the factor by which the number of visits is increased as 1 + r with r ≥ 0, so that r = 0 implies no increase and successively larger values of r give us larger increases. Second, we represent the effect of sampling effort by an overall constant C that multiplies the mean μ_ij. The same constant is used for all i and j, since the same sampling effort is devoted to all plant–pollinator pairs. Third, we assume that the number of visits is proportional to the abundance of the relevant plant and pollinator species: twice as many pollinators of species j, for instance, will mean twice as many visits by that species, and similarly for the abundance of the plant species¹³. Thus the number of visits will be proportional to σ_iτ_j, for some parameters σ_i and τ_j representing the abundances of plant i and pollinator j, respectively, in suitable units (which we will determine shortly).

Putting everything together, the mean number of observed visits to plant i by pollinator j is

$${\mu }_{ij}=C{\sigma }_{i}{\tau }_{j}(1+r{B}_{ij}),$$

(1)

and the probability of observing exactly M_ij visits is drawn from a Poisson distribution with this mean:

$$P({M}_{ij}| {\mu }_{ij})=\frac{{\mu }_{ij}^{{M}_{ij}}}{{M}_{ij}!}\ {e}^{-{\mu }_{ij}}.$$

(2)

This equation gives us the probability distribution of a single element M_ij of the data matrix. Then, combining Eqs. (1) and (2), the data likelihood—the probability of the complete data matrix M—is given by the product over all species thus:

$$P({\boldsymbol{M}}| {\boldsymbol{B}},\theta )=\mathop{\prod}\limits_{i,j}\frac{{\left[C{\sigma }_{i}{\tau }_{j}(1+r{B}_{ij})\right]}^{{M}_{ij}}}{{M}_{ij}!}\ {e}^{-C{\sigma }_{i}{\tau }_{j}(1+r{B}_{ij})},$$

(3)

where θ is a shorthand collectively denoting all the parameters of the model: C, r, σ and τ. Our model is thus effectively a model of an entire network, rather than single interactions, in contrast with other recent approaches to the modeling of network data reliability^17,18,32.

There are two important details to note about this model. First, the definition in Eq. (1) does not completely determine C, σ, and τ because we can increase (or decrease) any of these parameters by a constant factor without changing the resulting value of μ_ij if we simultaneously decrease (or increase) one or both of the others. In the language of statistics we say that the parameters are not “identifiable.” We can rectify this problem by fixing the normalization of the parameters in any convenient fashion. Here we do this by stipulating that σ_i and τ_j sum to one, thus:

$$\mathop{\sum }\limits_{i=1}^{{n}_{p}}{\sigma }_{i}=\mathop{\sum }\limits_{j=1}^{{n}_{a}}{\tau }_{j}=1.$$

(4)

In effect, this makes σ_i and τ_j measures of relative abundance, quantifying the fraction of individual organisms that belong to each species, rather than the total number. (This definition differs from traditional estimates of pollinator abundance that define the abundance of a pollinator species in terms of its number of observed visits.) Second, there may be other species-level effects on the observed number of visits in addition to abundance, such as the propensity for observers to overlook small-bodied pollinators. There is, at least within the data used in this paper, no way to tell these effects from true variation in abundance—no way to tell for example if there are truly fewer individuals of a species or if they are just hard to see and hence less often observed. As a result, the abundance parameters in our model actually capture a combination of effects on observation frequency. This does not affect the accuracy of the model, which works just as well either way, but it does mean that we have to be cautious about interpreting the values of the parameters in terms of actual abundance. This point is discussed further in the applications below.

Bayesian reconstruction

The likelihood of Eq. (3) tells us the probability of the data M given the network B and parameters θ. What we actually want to know is the probability of the network and parameters given the data, which we can calculate by applying Bayes’ rule in the form

$$P({\boldsymbol{B}},\theta | {\boldsymbol{M}})=\frac{P({\boldsymbol{M}}| {\boldsymbol{B}},\theta )P({\boldsymbol{B}}| \theta )P(\theta )}{P({\boldsymbol{M}})}.$$

(5)

This is the posterior probability that the network has structure B and parameter values θ given the observations that were made. There are three important parts to the expression: the likelihood P(M∣B, θ), the prior probability of the network P(B∣θ), and the prior probability of the parameters P(θ). The denominator P(M) we can ignore because it depends on the data alone and will be constant (and hence irrelevant for our calculations) once M is determined by the observations.

Of the three non-constant parts, the first, the likelihood, we have already discussed—it is given by Eq. (3). For the prior on the network P(B∣θ) we make the conservative assumption—in the absence of any knowledge to the contrary—that all edges in the network are a priori equally likely. If we denote the probability of an edge by ρ, then the prior probability on the entire network is

$$P({\boldsymbol{B}}| \theta )=\mathop{\prod}\limits_{i,j}{(1-\rho )}^{1-{B}_{ij}}{\rho }^{{B}_{ij}}.$$

(6)

We consider ρ an additional parameter which is to be inferred from the data and which we will henceforth include, along with our other parameters, in the set θ.

To complete Eq. (5), we also need to choose a prior P(θ) over the parameters. We expect there to be some limit on the value of r, which we impose using a minimally informative prior with finite mean (this distribution turns out to be the exponential distribution). For the remaining parameters we use uniform priors. With these choices, we then have everything we need to compute the posterior probability, Eq. (5).

Once we have the posterior probability there are a number of things we can do with it. The simplest is just to maximize it with respect to the unknown quantities B and θ to find the most likely structure for the network and the most likely parameter values, given the data. This, however, misses an opportunity for more detailed inference and can moreover give misleading results. In most cases there will be more than one value of B and θ with high probability under Eq. (5): there may be a unique maximum of the probability, a most likely value, but there are often many other values that have nearly as high probability and offer plausible network structures competitive with the most likely one. To get the most complete picture of the structure of the network we should consider all these plausible structures.

For example, if all plausible structures are similar to one another in their overall shape then we can be quite confident that this shape is reflective of the true preferred interactions between plant and pollinator species. If plausible structures are widely varying, however, then we have many different candidates for the true structure and our certainty about that structure is correspondingly lower. In other words, by considering the complete set of plausible structures we can not only make an estimate of the network structure but also say how confident we are in that estimate, in effect putting “error bars” on the network.

How do we specify these errors bars in practice? One way is to place posterior probabilities on individual edges in the network. For example, when considering the edge connecting plant i and pollinator j, we would not ask “Is there an edge?” but rather “What is the probability that there is an edge?” Within the formulation outlined above, this probability is given by the average

$$P({B}_{ij}=1| {\boldsymbol{M}})=\mathop{\sum}\limits_{{\boldsymbol{B}}}\int {B}_{ij}P({\boldsymbol{B}},\theta | {\boldsymbol{M}})d\theta ,$$

(7)

where the sum runs over all possible incidence matrices and the integral over all parameter values. More generally we can compute the average of any function f(B, θ) of the matrix B and/or the parameters θ thus:

$$\left\langle f({\boldsymbol{B}},\theta )\right\rangle =\mathop{\sum}\limits_{{\boldsymbol{B}}}\int f({\boldsymbol{B}},\theta )\ P({\boldsymbol{B}},\theta | {\boldsymbol{M}})d\theta .$$

(8)

Functions of the matrix and functions of the parameters can both be interesting—the matrix tells us about the structure of the network but the parameters, as we will see, can also reveal important information.

Computing averages of the form (8) is unfortunately not an easy task. A closed-form expression appears out of reach and the brute-force approach of performing the sums and integrals numerically over all possible networks and parameters is computationally intractable in all but the most trivial of cases. The sum over B alone involves ${2}^{{n}_{p}{n}_{a}}$ terms, which is normally a very large number.

Instead therefore we use an efficient Monte Carlo sampling technique to approximate the answers. We generate a sample of network/parameter pairs (B₁, θ₁), …, (B_n, θ_n), where each pair appears with probability proportional to the posterior distribution of Eq. (5). Then we approximate the average of f(B, θ) as

$$\left\langle f({\boldsymbol{B}},\theta )\right\rangle \simeq \frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}f({{\boldsymbol{B}}}_{i},{\theta }_{i}).$$

(9)

Under very general conditions, this estimate will converge to the true value of the average asymptotically as the number of Monte Carlo samples n becomes large. Full details of the computations are given in Materials and Methods, and an extensive simulation study of the model is presented in Supplementary Note 1.

Checking the model

Inherent in the discussion so far is the assumption that the data can be well represented by our model. In other words, we are assuming there is at least one choice of the network B and parameters θ such that the model will generate data similar to what we see in the field. This assumption could be violated if our model is a poor one, but there is nothing in the method described above that would tell us so. To be fully confident in our results we need to be able not only to infer the network structure, but also to check whether that structure is a good match to the data. The Bayesian toolbox comes with a natural procedure for doing this. Given a set of high-probability values of B and θ generated by the method, we can use them in Eq. (3) to compute the likelihood P(M∣B, θ) of a data set M and then sample possible data sets from this probability distribution, in effect recreating data as they would appear if the model were in fact correct. We can then compare these data to the original field data to see if they are similar: if they are then our model has done a good job of capturing the structure in the data.

In the parlance of Bayesian statistics this approach is known as a posterior–predictive assessment⁴⁰. It amounts to calculating the probability

$$P({\widetilde{M}}_{ij}| {\boldsymbol{M}})=\mathop{\sum}\limits_{{\boldsymbol{B}}}\int P({\widetilde{M}}_{ij}| {\boldsymbol{B}},\theta )P({\boldsymbol{B}},\theta | {\boldsymbol{M}})d\theta$$

(10)

that pollinator species j makes ${\widetilde{M}}_{ij}$ visits to plant species i in artificial data sets generated by the model, averaged over many sets of values of B and θ. We can then use this probability to calculate the average value of ${\widetilde{M}}_{ij}$ thus:

$$\langle {\widetilde{M}}_{ij}\rangle =\mathop{\sum}\limits_{{\widetilde{M}}_{ij}}{\widetilde{M}}_{ij}\ P({\widetilde{M}}_{ij}| {\boldsymbol{M}}).$$

(11)

The averages for all plant–pollinator pairs can be thought of as the elements of a matrix $\langle \widetilde{{\boldsymbol{M}}}\rangle$, which we can then compare to the actual data matrix M, or alternatively we can calculate a residue ${\boldsymbol{M}}-\langle \widetilde{{\boldsymbol{M}}}\rangle$. If $\langle \widetilde{{\boldsymbol{M}}}\rangle$ and M are approximately equal, or equivalently if the residue is small, then we consider the model a good one.

To quantify the level of agreement between the fit and the data we can also compute the discrepancy⁴⁰ between the artificial data and M as

$${X}^{2}=\mathop{\sum}\limits_{ij}\frac{{({M}_{ij}-{\langle \widetilde{M}_{ij}\rangle })}^{2}}{{\langle \widetilde{M}_{ij}\rangle }}.$$

(12)

Under the hypothesis that the model is correct, X² follows a chi-squared distribution with n_p × n_a degrees of freedom⁴⁰. A good fit between model and data is signified by a value of X² that is much smaller than its expectation value of n_p × n_a. Note that the calculation of $P({\widetilde{M}}_{ij}| {\boldsymbol{M}})$ in Eq. (10) is of the same form as the one in Eq. (8), with $f({\boldsymbol{B}},\theta )=P({\widetilde{M}}_{ij}| {\boldsymbol{B}},\theta )$, which means we can calculate $P({\widetilde{M}}_{ij}| {\boldsymbol{M}})$ in the same way we calculate other average quantities, using Monte Carlo sampling and Eq. (9).

Application to visitation data sets

Well-sampled data

To demonstrate how the method works in practice, we first consider a large data set of plant–pollinator interactions gathered by Kaiser-Bunbury and collaborators⁴¹ at a set of study sites on the island of Mahé in the Seychelles. The data describe the interactions of plant and pollinator species observed over a period of eight months across eight different sites on the island. The data also include measurements of floral abundances for all observation periods and all sites. Our method for inferring network structure does not make use of the abundance measurements, but we discuss them briefly at the end of this section.

The study by Kaiser-Bunbury et al. focused particularly on the role of exotic plant species in the ecosystem and on whether restoring a site by removing exotic species would significantly impact the resilience and function of the plant–pollinator network. To help address these questions, half of the sites in the study were restored in this way while the rest were left unrestored as a control group.

As an illustration of our method we apply it to data from one of the restored sites, as observed over the course of a single month in December 2012 (the smallest time interval for which data were available). We pick the site named “Trois-Frères” because it is relatively small but also well sampled. Our calculation then proceeds as shown in Fig. 1. There were 8 plant and 21 pollinator species observed at the site during the month, giving us an 8 × 21 data matrix M as shown in Fig. 1a. (Following common convention, the plots of matrices in this paper are drawn with rows and columns ordered by decreasing numbers of observed interactions, so that the largest elements of the data matrix—the darkest squares—are in the top and left of the plot.)

**Fig. 1: Illustration of the method of this paper applied to data from the study of Kaiser-Bunbury et al.⁴¹.**

Now we use our Monte Carlo procedure to draw 2000 sets of incidence matrices B and parameters θ from the posterior distribution of Eq. (5) (Fig. 1b). These samples vary in their structure: some edges, like the one connecting the plant N. vanhoutteanum and the pollinator A. mellifera, are present in nearly all samples, while others, like the one between M. sechellarum and A. mellifera, appear only a small fraction of the time. Some others never occur at all. Averaging over these sampled networks we can estimate the probability, Eq. (7), that each connection exists in the network of preferred interactions between plant and animal species—see Fig. 1c. Some connections have high probability, close to 1, meaning that we have a high degree of confidence that they exist. Others have probability close to 0, meaning we have a high degree of confidence that they do not exist. And some have intermediate probabilities, meaning we are uncertain about them (such as the M. sechellarum–A. mellifera connection, which has probability around 0.45). In the latter case the method is telling us that the data are not sufficient to reach a firm conclusion about these connections. Indeed, if we compare with the original data matrix M in Fig. 1a, we find that most of the uncertain connections are ones for which we have very few observations, relative to the total number of observations for these species—say M_ij = 1 or 2 for species with dozens of total observations overall.

As we have mentioned, we also need to check whether the model is a good fit to the data by performing a posterior–predictive test. Figure 2 shows the results of this test. The main plot in the figure compares the values of the 40 largest elements of the original data matrix M with the corresponding elements of the generated matrix $\widetilde{{\boldsymbol{M}}}$. In each case, the original value is well within one standard deviation of the average value generated by the test, confirming the accuracy of the model. The inset of the figure shows the residue matrix ${\boldsymbol{M}}-\widetilde{{\boldsymbol{M}}}$, which reveals no systematic bias unaccounted for by the model. The discrepancy X² of Eq. (12) takes the value 26.94 in this case, well below the expected value of n_pn_a = 168, which indicates that the good fit is not a statistical fluke.

**Fig. 2: Results of a posterior–predictive test on the data matrix M for the example data set analyzed in Fig. 1.**

In addition to inferring the structure of the network itself, our method allows us to estimate many other quantities from the data. There are two primary methods by which we can do this. One is to look at the values of the fitted model parameters, which represent quantities such as the preference r and species abundances σ, τ. The other is to compute averages of quantities that depend on the network structure or the parameters (or both) from Eq. (9).

As an example of the former approach, consider the parameter ρ, which represents the average probability of an edge, also known as the connectance of the network. Figure 3a shows the distribution of values of this quantity over our set of Monte Carlo samples, and neatly summarizes our overall certainty about the presence or absence of edges. If we were certain about all edges in the network, then ρ would take only a single value and the distribution would be narrowly peaked. The distribution we observe, however, is somewhat broadened, indicating significant uncertainty. The most likely value of ρ, the peak of the distribution, turns out to be quite close to the value one would arrive at if one were simply to assume that every pair of species that interacts even once is connected in the network. This does not mean, however, that one could make this assumption and get good results. As we show below, the network one would derive by doing so would be badly in error in other ways.

**Fig. 3: Analyses that can be performed using samples from the posterior distribution of Eq. (5).**

Figure 3b shows the distribution of another of the model parameters, the parameter r, which measures the extent to which pollinators prefer the plants they normally pollinate over the ones they do not. For this particular data set the most likely value of r is around 40, meaning that pollinators visit their preferred plant species about 40 times more often than non-preferred ones, indicating all other things being equal, an impressive level of selectivity on the part of the pollinators.

For the calculation of more complicated network properties we can perform an average over the value of any function f(B, θ), as long as there is an algorithm to compute it. As an example, Fig. 3c shows a calculation of the quantity known as “Nestedness based on Overlap and Decreasing Fill” (NODF), a measure of the nestedness property discussed in the introduction. This quantity measures the extent to which specialist species—those with relatively few interactions—tend to interact with a subset of the partners of generalist species⁴². While it is complicated to compute NODF analytically, due to the fact that one must order the species by degrees²², it is straightforward to calculate it within our framework: we simply calculate the value for each sampled network B and plot the resulting distribution. Interestingly, the most likely value of NODF is significantly different from the one we would calculate had we assumed, as discussed above, that a single interaction is sufficient to consider two species connected. On the contrary, we find that the system is almost certainly more nested than this simple analysis would conclude.

In Fig. 3d, we compare the values of our estimated floral abundance parameters σ to the measured abundances reported by Kaiser-Bunbury et al.⁴¹. These parameters are not measures of abundance in the usual sense, because they combine actual abundance (quantity or density) with other characteristics such as ease of observation. We do find a correlation between the estimated and observed abundances, but it is relatively weak (R² = 0.54), signaling significant disagreement, on which we elaborate in the discussion section.

Undersampled data

As we have pointed out, the connections in the network about which we are most uncertain tend to be ones that are undersampled, i.e., those for which we have only a small amount of data. In an ideal world we could address this problem by taking more data, but it is rare that we have the opportunity to do this. More commonly the data have already been gathered and our task is to produce the best results we can with those data. There are nonetheless some remedies open to us, such as aggregating data over different geographical areas or time windows. In Fig. 4 we compare the edge probabilities estimated from data recorded individually at the four “restored” sites in the Mahé study during October 2012 to the edge probabilities we obtain when we aggregate these observations into a single data matrix and only then estimate the network. (We use restored sites observed during the same month because they are likely to be ecologically similar, meaning the data are measuring approximately the same system.) Comparison of the two distributions shows—as we would hope—that there are fewer uncertain edges in the aggregated network than in its disaggregated parts, i.e., there are fewer edges with probabilities in the middle of the distribution and more with probabilities close to zero or one.

**Fig. 4: Illustration of the effect of data aggregation on edge uncertainty.**

In other cases neither aggregation nor gathering more data is possible, for instance when reanalyzing a data set already collected by others or already maximally aggregated. Such data sets record the results of observational studies that are already over, and may contain too few observations, but our approach still allows us to perform rigorous inference in these circumstances.

For instance, Jordano et al.⁴³ used dozens of existing plant-pollinator and plant-frugivore data sets to argue that the degree distributions of mutualistic networks have a long tail, but this conclusion is undermined by issues with undersampling. As an example, one of the data sets they studied, originally gathered by Inouye and Pyke⁴⁴, records 1314 individual interactions over a period of 3 months in Kosciusko National Park, Australia, between 40 plants and 85 pollinator species, which works out to an average of 0.386 unique observations per species pair. Is this sampling effort sufficient to establish edges with certainty? As a point of reference, the data analyzed in Fig. 1 comprises 201 observations between 8 plants and 21 pollinators species for an average of 1.196 observations per pair of species, and the aggregated data of Fig. 4 contain 1.420 observations for every pair. Nonetheless, there is uncertainty about some of the connections in these reconstructed networks; this suggests that the network of Inouye and Pyke, with less than a third as much data per species pair, will contain significant uncertainty.

Even so, our method allows us to make inferences about this network. In Fig. 5, we show estimates of the degree distributions of both plant and pollinator nodes in the network obtained from the posterior distribution P(B∣M), along with naive estimates calculated by thresholding the (undersampled) data as in the study by Jordano et al.⁴³. As the figure shows, the results derived from the two approaches are very different. The thresholded degree distributions were classified as scale-free by Jordano et al., but this classification no longer holds once we account for the issues with the data; the inferred degree distributions are in this case well-modeled as Poisson distributions of means 5.53 and 2.60 for plants and pollinators respectively and the power-law form is a poor fit. On the other hand, the abundance parameters of the model, shown in Fig. 5, do appear to have a broad distribution, an interesting finding that calls for a rethinking of the relationship between abundances and degree distributions. It is generally thought that interactions will tend to be evenly distributed under an even distribution of abundance¹³ but here the opposite seems to be true.

**Fig. 5: Distributions of species-level parameters for a network of plants and pollinators in Kosciusko National Park, Australia, from the study by Inouye and Pyke⁴⁴.**

Discussion

In this paper, we have proposed a statistical model of plant–pollinator interactions and shown how it can be used to infer the structure and properties of empirical plant–pollinator networks from noisy, error-prone measurements. The model employs elementary ecological insights to create an expressive and versatile structure that can capture the pattern of interactions in a wide range of ecosystems. We use the toolbox of Bayesian statistics to develop both an inference algorithm and a model checking procedure for the model. Our methods explicitly allow for the possibility that there are multiple plausible networks that could fit a given set of observations, a hallmark of Bayesian analysis. Doing this allows us to make accurate deductions even in cases where data sets are small and the number of model parameters is large.

Applying our method to previously published plant-pollinator visitation data, we arrive at a number of conclusions. First, our analysis confirms previous findings that there is some uncertainty in measured values of the connectance^{23,24,25,26,27} and that moderate connectance⁶ seems to hold in plant-pollinator networks even once we account for uncertainty (Fig. 3a). Second, we find that pollinators strongly prefer the plants they normally visit over ones they do not, with pollinators visiting their preferred plant species about 40 times more often than non-preferred ones in our results (Fig. 3b). This highlights the strong selectivity of pollinators for the plant species they usually visit. Third, we find that networks reconstructed using our method are more nested than networks built using thresholds of one or a few visits to determine plant-pollinator interactions, which supports the longstanding claim that plant-pollinator networks are nested⁶. Finally, our analysis suggests that the distribution of number of interactions of a species (the degree distribution) is less skewed than previously thought⁴³. This result supports recent findings showing that incomplete sampling strongly underestimates the number of interactions and overestimates the degree of specialization.

Our model and inference algorithm also give an estimate of species abundances. As we have argued, these estimates actually capture a combination of effects on observation frequency beyond just plain abundance, which helps to explain why, as we have seen, measured and estimated floral abundances are correlated but not strongly so. Disagreements between measured and estimated abundances were observed previously by Vazquez et al.²⁸, who used null models to show that measured abundances cannot in general explain the form of visitation matrices. Taken together, these results indicate that the frequency of observed interaction between plants and pollinators is not in fact proportional to their plain abundances (defined as quantity or density of individuals), but instead incorporates a range of factors potentially including abundance, ease of observation, network effects, and others⁴⁵. One candidate for a possible additional factor that could play a role is adaptive foraging by pollinators, which has been shown to influence the structure of ecological networks^4,46. Adaptive foraging occurs, for example, when pollinators deliberately visit less abundant plants more often if those plants contain more food (such as nectar or pollen) relative to more abundant plants with less food⁷. Our estimated abundance parameters automatically include such factors where traditional field estimates of pollinator abundance—such as the number of visits of a pollinator species—do not. Analyses that use traditional estimates of abundance, as in refs. ^15,16, may as a result fail to control for significant species-level effects on observed visitation rates¹³. We would therefore argue that best practice calls for the use of estimated abundances like those proposed here rather than traditional ones when estimating networks of preferred interactions.

There are a number of ways in which the approach presented here could be extended. The method as described assumes an ecosystem that is more or less static, but ecosystems can change rapidly with the seasons. One could imagine a dynamic variant of the model that allows parameters to evolve over time, or networks with several levels of preference, allowing for more nuanced description of plant–pollinator systems. On the applications side, we have limited our analysis to the important case of plant–pollinator networks, but similar methods could be applied to other types of ecological networks, allowing us to better separate signal from noise in those domains too.

Methods

As outlined in the main text, our method relies on a generative network model in which observed visits to plants by pollinators are considered noisy measurements of an unobserved underlying plant–pollinator network. This formulation allows us to frame the task of determining the network structure as a Bayesian inference problem^31,47,48,49 in which the probability of the network having incidence matrix B given a data matrix M is

$$P({\boldsymbol{B}},\theta | {\boldsymbol{M}})=\frac{P({\boldsymbol{M}}| {\boldsymbol{B}},\theta )P({\boldsymbol{B}}| \theta )P(\theta )}{P({\boldsymbol{M}})},$$

(13)

where θ are model parameters and P(M) is an unimportant normalizing constant. The element M_ij of matrix M records the number of times insects of species j are seen to pollinate plant species i, while B_ij = 0, 1 encodes the presence or absence of an edge between the two species in the plant–pollinator network. Both matrices are of dimension n_p × n_a where n_p is the number of plants and n_a is the number of pollinators.

We model the number of visits M_ij as a Poisson random variable with mean

$${\mu }_{ij}=C{\sigma }_{i}{\tau }_{j}(1+r{B}_{ij}),$$

(14)

and use independent priors on all parameters with

$$P(r)=\lambda {e}^{-\lambda r},\qquad \lambda =0.01,$$

(15)

and uniform priors on C, σ, and τ. We further assume that edges are a priori equally likely with probability ρ and use a uniform prior distribution on ρ itself. This leads to

$$P({\boldsymbol{B}},\theta | {\boldsymbol{M}})\propto P(\theta )\mathop{\prod}\limits_{ij}{(1-\rho )}^{1-{B}_{ij}}{\rho }^{{B}_{ij}}\frac{{\mu }_{ij}^{{M}_{ij}}}{{M}_{ij}!}{e}^{-{\mu }_{ij}},$$

(16)

with P(θ) ∝ P(r). We note that in this Bayesian formulation, one can easily model interaction specific traits³⁹ or account for known biology like trait-matching^13,50 by altering the priors on B_ij for a particular pair of species i, j⁴⁸.

Bayesian reconstruction of networks

Given the probability distribution in Eq. (16) there are a number of approaches we could take. Following recent work^47,48, we could employ an expectation–maximization (EM) algorithm to calculate the distribution over potential network structures and a point estimate of θ, or we could integrate out the parameters θ and then sample from the resulting marginal distribution on B⁴⁹. Neither of these approaches is completely satisfactory here however, the first because point estimates of the parameters can be unreliable for large models such as ours, and the second because the values of the model parameters are actually of interest to us, so we would prefer not to eliminate them.

Instead therefore we make use of a technique from the literature on finite mixture models⁵¹ to sample efficiently from the joint distribution of both B and θ and hence estimate both. First, we sample values of the parameters θ from their marginal distribution

$$P(\theta | {\boldsymbol{M}})=\mathop{\sum}\limits_{{\boldsymbol{B}}}P({\boldsymbol{B}},\theta | {\boldsymbol{M}}).$$

(17)

The sum over B can be carried out analytically because the particular P(B, θ∣M) defined in Eq. (17) can be written in the form

$$\mathop{\sum}\limits_{{\boldsymbol{B}}}P({\boldsymbol{B}},\theta | {\boldsymbol{M}})=\mathop{\sum}\limits_{{\boldsymbol{B}}}\mathop{\prod}\limits_{ij}{x}_{ij}^{{B}_{ij}}{y}_{ij}^{1-{B}_{ij}}=\mathop{\prod}\limits_{ij}({x}_{ij}+{y}_{ij}),$$

where x_ij and y_ij combine all the terms associated with the situation where there is/is not an edge. We then find that

$$P(\theta | {\boldsymbol{M}})\propto {e}^{-C}P(\theta )\mathop{\prod}\limits_{ij}{(C{\sigma }_{i}{\tau }_{j})}^{{M}_{ij}} \left[1-\rho +\rho {(1+r)}^{{M}_{ij}}{e}^{-C{\sigma }_{i}{\tau }_{j}r}\right].$$

(18)

We can now sample from this distribution using standard methods such as Hamiltonian Monte Carlo—see below. This gives us our estimates of the parameter values.

For given values of the parameters we then estimate the network B itself by sampling from the distribution

$$P({\boldsymbol{B}}| {\boldsymbol{M}},\theta )=\frac{P({\boldsymbol{M}}| {\boldsymbol{B}},\theta )P({\boldsymbol{B}}| \theta )}{P({\boldsymbol{M}}| \theta )}.$$

(19)

Using the previous expressions for the likelihood P(M∣B, θ) and P(B∣θ)—Eqs. (3) and (6) of the Results—and noting that the denominator P(M∣θ) is proportional to Eq. (18), we find

$$P({\boldsymbol{B}}| {\boldsymbol{M}},\theta )=\frac{{\prod }_{ij}{(1-\rho )}^{1-{B}_{ij}}{\left[\rho {(1+r)}^{{M}_{ij}}{e}^{-C{\sigma }_{i}{\tau }_{j}r}\right]}^{{B}_{ij}}}{{\prod }_{ij}\left[1-\rho +\rho {(1+r)}^{{M}_{ij}}{e}^{-C{\sigma }_{i}{\tau }_{j}r}\right]}=\mathop{\prod}\limits_{ij}{Q}_{ij}^{{B}_{ij}}{(1-{Q}_{ij})}^{1-{B}_{ij}},$$

(20)

where

$${Q}_{ij}=P({B}_{ij}=1| {\boldsymbol{M}},\theta )=\frac{\rho {(1+r)}^{{M}_{ij}}{e}^{-C{\sigma }_{i}{\tau }_{j}r}}{1-\rho +\rho {(1+r)}^{{M}_{ij}}{e}^{-C{\sigma }_{i}{\tau }_{j}r}}$$

(21)

is the posterior probability of an edge between species i and j, given the parameters θ.

We now simply average Q_ij over our sampled values of the parameters θ to get the expected probability of an edge between any pair of nodes. More generally, we can calculate an estimate of any function f(B, θ) by drawing m samples θ_k of the parameter set and n random incidence matrices B_l(θ_k) for each set, with edges appearing independently with probabilities Q_ij given by (21), then averaging:

$$\left\langle f({\boldsymbol{B}},\theta )\right\rangle \simeq \frac{1}{mn}\mathop{\sum }\limits_{k=1}^{m}\mathop{\sum }\limits_{l=1}^{n}f\left({{\boldsymbol{B}}}_{l}({\theta }_{k}),{\theta }_{k}\right).$$

(22)

Implementation

In our implementation of this approach we sample parameters θ from the distribution of Eq. (18) using the technique known as Hamiltonian Monte Carlo (HMC). In HMC one defines an inertial mechanics in a position space equivalent to the space of the parameters, with auxiliary momenta chosen so that the dynamics under the corresponding Hamilton’s equations samples from the desired distribution⁵². We implement the calculation in Stan, a probabilistic programming language that automatically performs HMC sampling for arbitrary target distributions⁵³. In practice, the program operates on the log of the posterior probability, which for our distribution (18) has the form ${\mathrm{log}}\,P(\theta | {\boldsymbol{M}})=-C+{\sum }_{ij}({X}_{ij}+{Y}_{ij})$ where

$${X}_{ij}={M}_{ij}{\mathrm{log}}\,C{\sigma }_{i}{\tau }_{j},$$

(23)

$${Y}_{ij}={\mathrm{log}}\,\left(1-\rho +\rho {(1+r)}^{{M}_{ij}}{e}^{-C{\sigma }_{i}{\tau }_{j}r}\right).$$

(24)

To avoid potential over- or underflow and ensure numerical stability we rewrite the latter expression slightly by defining

$${\mu }_{ij}={\mathrm{log}}\,(1-\rho ),\quad {\nu }_{ij}={\mathrm{log}}\,\rho +{M}_{ij}{\mathrm{log}}\,(1+r)-Cr{\sigma }_{i}{\tau }_{j},$$

(25)

and then writing

$${Y}_{ij}=\left\{\begin{array}{ll}{\mu }_{ij}+{\mathrm{log}}\,\left(1+{e}^{{\nu }_{ij}-{\mu }_{ij}}\right)&\,\text{if}\,\ {\mu }_{ij}\,> \,{\nu }_{ij},\\ {\nu }_{ij}+{\mathrm{log}}\,\left(1+{e}^{{\mu }_{ij}-{\nu }_{ij}}\right)&\,\text{otherwise,}\,\end{array}\right.$$

(26)

which ensures that Y_ij is always a manageable number.

An important practical consideration is verifying the convergence of the Monte Carlo algorithm. HMC mixes rapidly, but, like all Monte Carlo methods, it can sometimes become trapped at local optima. To ensure representative sampling of the posterior distribution, we therefore perform multiple Monte Carlo runs from random initial states and if any of the runs converges to a region of significantly smaller probability than the others then we repeat the entire calculation. In the example calculations given in the paper we perform four runs, with an equilibration period of 5000 Monte Carlo steps each, followed by taking 500 samples.

Quantifying error using posterior predictive assessment

A crucial part of the model fitting process is assessing whether the model is a good fit to the data. In the main text we argue that a so-called posterior predictive test is a good way of making this assessment. The idea is to generate a new artificial data set $\widetilde{{\boldsymbol{M}}}$ from the model using the values of the model parameters derived from the fit to the input data M. If we find that $\widetilde{{\boldsymbol{M}}}$ looks similar to the input data then our model has done a good job of capturing the structure of the data.

To carry out this procedure we need to calculate the posterior predictive distribution for species pair i, j given by

$$P({\widetilde{M}}_{ij}| {\boldsymbol{M}})=\mathop{\sum}\limits_{{\boldsymbol{B}}}\int P({\widetilde{M}}_{ij}| {\boldsymbol{B}},\theta )P({\boldsymbol{B}},\theta | {\boldsymbol{M}})d\theta .$$

(27)

Since the likelihood P(M∣B, θ), Eq. (3), factors into separate terms for each plant–pollinator pair i, j, this expression can with a little work be simplified to

$$P({\widetilde{M}}_{ij}| {\boldsymbol{M}}) = \int P(\theta | {\boldsymbol{M}})\left[{Q}_{ij}P({\widetilde{M}}_{ij}| {B}_{ij}=1,\theta )+(1-{Q}_{ij})P({\widetilde{M}}_{ij}| {B}_{ij}=0,\theta )\right]d\theta ,$$

(28)

and the integral can then be approximated by simply averaging over the set of sampled values of θ.

Two particularly useful statistics for the posterior predictive test are the mean and the variance of ${\widetilde{M}}_{ij}$, which in this case are equal since ${\widetilde{M}}_{ij}$ by definition has a Poisson distribution for given B and θ. Both are to a good approximation given by

$${\lambda }_{ij}\simeq \frac{1}{n}\mathop{\sum }\limits_{k=1}^{n}\left[{Q}_{ij}({\theta }_{k}){\mu }_{ij}({B}_{ij}=1)+\big(1-{Q}_{ij}({\theta }_{k})\big){\mu }_{ij}({B}_{ij}=0)\right],$$

(29)

where μ_ij is the mean defined in Eq. (14).

Description of the data sets

The data analyzed in Figs. 1–4 were gathered by Kaiser-Bunbury et al.⁴¹ on inselbergs (steep-sided monolithic rocky outcroppings) on the tropical granitic island of Mahé, located in the Indian Ocean. The vegetation on the inselbergs is characterized by short trees, shrubs, and an absence of flowering herbs. The data we analyze includes records of the visits of pollinator species to all plant species found in each of the eight inselbergs, observed between September 2012 and April 2013 during the island’s eight-month-long tropical flowering season. Species visiting flowers were recorded as pollinators if they touched the sexual parts of the flowers within a standard observation window of 30 minutes⁵⁴. Floral abundances were obtained by counting flowers in 1-meter cubes randomly located along transects spanning the inselbergs. The visit data were used to generate 64 data matrices of plant–pollinator interactions, one for each period and location. Our primary analysis focuses on the matrix for the site known as Trois-Frères as observed during the month of December 2012. We chose this data set primarily because it is relatively small and hence easy to visualize.

The data analyzed in Fig. 5 were gathered by Inouye and Pyke⁴⁴ in the Kosciusko National Park, Australia, between December 21, 1983, and March 30, 1984. The observations were made in 26 plots of 2m × 2m, chosen before the flowering season and encompassing an alpine zone at elevations ranging from 1940 to 2040 meters and a montane habitat at elevations of 1860 to 1920 meters. Flowers were counted roughly every second day. Insect visitation data were collected through incidental observations made during the phenological censuses of the flowers as well as dedicated observation periods of 10 minutes length, spread throughout the study duration. The data set of Inouye and Pyke⁴⁴ is only one of several data sets re-analyzed by Jordano et al.⁴³. We chose this data set because it is somewhat undersampled, making it a good example of a situation where our method can improve network estimates.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The Mahé visitation data used in this study are available as supplementary material to Kaiser-Bunbury et al.⁴¹. The data of Inouye and Pyke⁴⁴ analyzed in Fig. 5 can be downloaded from the Web of Life data base⁵⁵, available at http://www.web-of-life.es, under the network identifier M_PL_019.

Code availability

Reference implementations in stan and python of the methods described in this study are freely available online⁵⁶.

References

Martinez, N. D. Artifacts or attributes? Effects of resolution on the Little Rock Lake food web. Ecol. Monogr. 61, 367–392 (1991).
Article Google Scholar
Bascompte, J., Jordano, P., Melian, C. J. & Olesen, J. M. The nested assembly of plant-animal mutualistic networks. Proc. Natl. Acad. Sci. USA 100, 9383–9387 (2003).
Article ADS CAS PubMed PubMed Central Google Scholar
Thébault, E. & Fontaine, C. Stability of ecological communities and the architecture of mutualistic and trophic networks. Science 329, 853–856 (2010).
Article ADS PubMed CAS Google Scholar
Valdovinos, F. S. Mutualistic networks: Moving closer to a predictive theory. Ecol. Lett. 22, 1517–1534 (2019).
Article PubMed Google Scholar
Brose, U., Williams, R. J. & Martinez, N. D. Allometric scaling enhances stability in complex food webs. Ecol. Lett. 9, 1228–1236 (2006).
Article PubMed Google Scholar
Bascompte, J. & Jordano, P. Mutualistic Networks. (Princeton University Press, Princeton, NJ, 2014).
Google Scholar
Valdovinos, F. S. et al. Niche partitioning due to adaptive foraging reverses effects of nestedness and connectance on pollination network stability. Ecol. Lett. 19, 1277–1286 (2016).
Article PubMed Google Scholar
Thompson, J. N. The Coevolutionary Process. (University of Chicago Press, Chicago, IL, 1994).
Book Google Scholar
Ollerton, J., Winfree, R. & Tarrant, S. How many flowering plants are pollinated by animals? Oikos 120, 321–326 (2011).
Article Google Scholar
Ollerton, J. Pollinator diversity: Distribution, ecological function, and conservation. Annu. Rev. Ecol. Evol. Syst. 48, 353–376 (2017).
Article Google Scholar
Potts, S. G. et al. Safeguarding pollinators and their values to human well-being. Nature 540, 220–229 (2016).
Article ADS CAS PubMed Google Scholar
Garibaldi, L. A. et al. Wild pollinators enhance fruit set of crops regardless of honey bee abundance. Science 339, 1608–1611 (2013).
Article ADS CAS PubMed Google Scholar
Vázquez, D. P., Blüthgen, N., Cagnolo, L. & Chacoff, N. P. Uniting pattern and process in plant-animal mutualistic networks: a review. Ann. Bot. 103, 1445–1457 (2009).
Article PubMed PubMed Central Google Scholar
Gibson, R. H., Knott, B., Eberlein, T. & Memmott, J. Sampling method influences the structure of plant-pollinator networks. Oikos 120, 822–831 (2011).
Article Google Scholar
Blüthgen, N., Fründ, J., Vázquez, D. P. & Menzel, F. What do interaction network metrics tell us about specialization and biological traits? Ecology 89, 3387–3399 (2008).
Article PubMed Google Scholar
Blüthgen, N. Why network analysis is often disconnected from community ecology: a critique and an ecologist’s guide. Basic Appl. Ecol. 11, 185–195 (2010).
Article Google Scholar
Cirtwill, A. R., Eklöf, A., Roslin, T., Wootton, K. & Gravel, D. A quantitative framework for investigating the reliability of empirical network construction. Methods Ecol. Evol. 10, 902–911 (2019).
Article Google Scholar
Poisot, T. et al. The structure of probabilistic networks. Methods Ecol. Evol. 7, 303–312 (2016).
Article Google Scholar
Parchas, P., Gullo, F., Papadias, D. & Bonchi, F. Uncertain graph processing through representative instances. ACM Trans. Database Syst. 40, 1–39 (2015).
Article MathSciNet Google Scholar
Khan, A., Ye, Y. & Chen, L. On Uncertain Graphs. (Morgan & Claypool Publishers, San Rafael, CA, 2018).
Book MATH Google Scholar
Bascompte, J. & Jordano, P. Plant-animal mutualistic networks: the architecture of biodiversity. Annu. Rev. Ecol. Evol. Syst. 38, 567–593 (2007).
Article MATH Google Scholar
Payrató-Borràs, C., Hernández, L. & Moreno, Y. Breaking the spell of nestedness: the entropic origin of nestedness in mutualistic systems. Phys. Rev. X 9, 031024 (2019).
Google Scholar
Nielsen, A. & Bascompte, J. Ecological networks, nestedness and sampling effort. J. Ecol. 95, 1134–1141 (2007).
Article Google Scholar
Petanidou, T., Kallimanis, A. S., Tzanopoulos, J., Sgardelis, S. P. & Pantis, J. D. Long-term observation of a pollination network: Fluctuation in species and interactions, relative invariance of network structure and implications for estimates of specialization. Ecol. Lett. 11, 564–575 (2008).
Article PubMed Google Scholar
Hegland, S. J., Dunne, J., Nielsen, A. & Memmott, J. How to monitor ecological communities cost-efficiently: The example of plant-pollinator networks. Biol. Cons. 143, 2092–2101 (2010).
Article Google Scholar
Chacoff, N. P. et al. Evaluating sampling completeness in a desert plant-pollinator network. J. Anim. Ecol. 81, 190–200 (2012).
Article PubMed Google Scholar
Rivera-Hutinel, A., Bustamante, R. O., Marín, V. H. & Medel, R. Effects of sampling completeness on the structure of plant-pollinator networks. Ecology 93, 1593–1603 (2012).
Article CAS PubMed Google Scholar
Vázquez, D. P. et al. Species abundance and asymmetric interaction strength in ecological networks. Oikos 116, 1120–1127 (2007).
Article Google Scholar
Bartomeus, I. Understanding linkage rules in plant-pollinator networks by using hierarchical models that incorporate pollinator detectability and plant traits. PLoS ONE 8, e69200 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Fründ, J., McCann, K. S. & Williams, N. M. Sampling bias is a challenge for quantifying specialization and network structure: Lessons from a quantitative niche model. Oikos 125, 502–513 (2016).
Article Google Scholar
Brugere, I., Gallagher, B. & Berger-Wolf, T. Y. Network structure inference, a survey: Motivations, methods, and applications. ACM Comput. Surv. 51, 24:1–24:39 (2018).
Google Scholar
Farine, D. R. & Strandburg-Peshkin, A. Estimating uncertainty and reliability of social network data using bayesian inference. R. Soc. Open Sci 2, 150367 (2015).
Article ADS PubMed PubMed Central Google Scholar
Butts, C. T. Revisiting the foundations of network analysis. Science 325, 414–416 (2009).
Article ADS MathSciNet CAS PubMed MATH Google Scholar
Vázquez, D. P., Chacoff, N. & Cagnolo, L. Evaluating multiple determinants of the structure of plant-animal mutualistic networks. Ecology 90, 2039–2046 (2009).
Article PubMed Google Scholar
Jordano, P. Patterns of mutualistic interactions in pollination and seed dispersal: connectance, dependence asymmetries, and coevolution. Am. Nat. 129, 657–677 (1987).
Article Google Scholar
Jordano, P. Sampling networks of ecological interactions. Funct. Ecol. 30, 1883–1893 (2016).
Article Google Scholar
Kaiser-Bunbury, C. N., Muff, S., Memmott, J., Müller, C. B. & Caflisch, A. The robustness of pollination networks to the loss of species and interactions: a quantitative approach incorporating pollinator behaviour. Ecol. Lett. 13, 442–452 (2010).
Article PubMed Google Scholar
Ramos-Jiliberto, R., Valdovinos, F. S., de Espanés, P. M. & Flores, J. D. Topological plasticity increases robustness of mutualistic networks. J. Anim. Ecol. 81, 896–904 (2012).
Article PubMed Google Scholar
Kaiser-Bunbury, C. N., Vázquez, D. P., Stang, M. & Ghazoul, J. Determinants of the microstructure of plant–pollinator networks. Ecology 95, 3314–3324 (2014).
Article Google Scholar
Gelman, A., Meng, X.-L. & Stern, H. Posterior predictive assessment of model fitness via realized discrepancies. Stat. Sin. 6, 733–760 (1996).
MathSciNet MATH Google Scholar
Kaiser-Bunbury, C. N. et al. Ecosystem restoration strengthens pollination network resilience and function. Nature 542, 223–227 (2017).
Article ADS CAS PubMed Google Scholar
Almeida-Neto, M., Guimaraes, P., Guimaraes, P. R. Jr., Loyola, R. D. & Ulrich, W. A consistent metric for nestedness analysis in ecological systems: Reconciling concept and measurement. Oikos 117, 1227–1239 (2008).
Article Google Scholar
Jordano, P., Bascompte, J. & Olesen, J. M. Invariant properties in coevolutionary networks of plant-animal interactions. Ecol. Lett. 6, 69–81 (2003).
Article Google Scholar
Inouye, D. W. & Pyke, G. H. Pollination biology in the snowy mountains of Australia: comparisons with montane Colorado, USA. Aust. J. Ecol. 13, 191–205 (1988).
Article Google Scholar
Ovaskainen, O., Abrego, N., Halme, P. & Dunson, D. Using latent variable models to identify large networks of species-to-species associations at different spatial scales. Methods Ecol. Evol. 7, 549–555 (2016).
Article Google Scholar
Valdovinos, F. S., Ramos-Jiliberto, R., Garay-Narváez, L., Urbani, P. & Dunne, J. A. Consequences of adaptive behaviour for the structure and dynamics of food webs. Ecol. Lett. 13, 1546–1559 (2010).
Article PubMed Google Scholar
Newman, M. E. J. Network structure from rich but noisy data. Nat. Phys. 14, 542–545 (2018).
Article CAS Google Scholar
Newman, M. E. J. Estimating network structure from unreliable measurements. Phys. Rev. E 98, 062321 (2018).
Article ADS CAS Google Scholar
Peixoto, T. P. Reconstructing networks with unknown and heterogeneous errors. Phys. Rev. X 8, 041011 (2018).
Google Scholar
Pichler, M., Boreux, V., Klein, A., Schleuning, M. & Hartig, F. Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks. Methods Ecol. Evol. 11, 281–293 (2019).
Article Google Scholar
Gelman, A. et al. Bayesian Data Analysis. 3rd edition (Chapman and Hall/CRC, New York, NY, 2013).
Book Google Scholar
Betancourt, M. A conceptual introduction to Hamiltonian Monte Carlo. Preprint at https://arxiv.org/abs/1701.02434 (2017).
Carpenter, B. et al. Stan: A probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).
Article Google Scholar
Kaiser-Bunbury, C. N., Valentin, T., Mougal, J., Matatiken, D. & Ghazoul, J. The tolerance of island plant-pollinator networks to alien plants. J. Ecol. 99, 202–213 (2011).
Article Google Scholar
Fortuna, M. A., Ortega, R. & Bascompte, J. The web of life. Preprint at https://arxiv.org/abs/1403.2575 (2014).
Young, J.-G. plant-pollinator-inference (2021), https://doi.org/10.5281/zenodo.4759159.

Download references

Acknowledgements

We thank Alec Kirkley, George Cantwell, and Maria Riolo for helpful discussions. This work was funded in part by the James S. McDonnell Foundation (J.-G.Y.) and the US National Science Foundation under grants DEB–1834497 (F.S.V.) and DMS–1710848 and DMS–2005899 (M.E.J.N.), as well as University of Michigan MICDE grant U061182 (F.S.V.).

Author information

Authors and Affiliations

Department of Computer Science, University of Vermont, Burlington, VT, USA
Jean-Gabriel Young
Vermont Complex Systems Center, University of Vermont, Burlington, VT, USA
Jean-Gabriel Young
Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, USA
Jean-Gabriel Young, Fernanda S. Valdovinos & M. E. J. Newman
Department of Environmental Science and Policy, University of California, Davis, CA, USA
Fernanda S. Valdovinos
Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
Fernanda S. Valdovinos
Department of Physics, University of Michigan, Ann Arbor, MI, USA
M. E. J. Newman

Authors

Jean-Gabriel Young
View author publications
You can also search for this author in PubMed Google Scholar
Fernanda S. Valdovinos
View author publications
You can also search for this author in PubMed Google Scholar
M. E. J. Newman
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.S.V. and M.E.J.N. conceptualized the study. All authors developed the model. J.-G.Y. and M.E.J.N. developed the Bayesian inference framework, J.-G.Y. implemented the model and performed the numerical experiments. All authors analyzed the results and contributed to writing the manuscript.

Corresponding author

Correspondence to Jean-Gabriel Young.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Carsten Dormann, Christopher Kaiser-Bunbury and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Young, JG., Valdovinos, F.S. & Newman, M.E.J. Reconstruction of plant–pollinator networks from observational data. Nat Commun 12, 3911 (2021). https://doi.org/10.1038/s41467-021-24149-x

Download citation

Received: 11 December 2019
Accepted: 03 June 2021
Published: 23 June 2021
DOI: https://doi.org/10.1038/s41467-021-24149-x

This article is cited by

All paths lead to hubs in the spectroscopic networks of water isotopologues H216O and H218O
- Roland Tóbiás
- Meissa L. Diouf
- Attila G. Császár
Communications Chemistry (2024)
Compressing network populations with modal networks reveal structural diversity
- Alec Kirkley
- Alexis Rojas
- Jean-Gabriel Young
Communications Physics (2023)
Hypergraph reconstruction from uncertain pairwise observations
- Simon Lizotte
- Jean-Gabriel Young
- Antoine Allard
Scientific Reports (2023)
Statistical inference links data and theory in network science
- Leto Peel
- Tiago P. Peixoto
- Manlio De Domenico
Nature Communications (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.