Improving assessment accuracy for lake biological condition by classifying lakes with diatom typology, varying metrics and modeling multimetric indices
Introduction
Assessments of biological condition are important for managing freshwater resources (European Union, 2000, USEPA, 2007a, USEPA, 2007b). In lakes, diatoms have a long history of use in paleoecological studies that document lake responses to a wide variety of human disturbances, because diatoms are sensitive to many environmental changes and both current and past assemblages are preserved in lake sediments (Smol and Stoermer, 2010). Diatoms are also important primary producers, elements of food webs, and sources of biodiversity in lakes (Mann and Droop, 1996); thus diatoms are important elements of biological condition in lakes (sensu Davies and Jackson, 2006). Diatom assemblages may play a unique role in understanding biological integrity, because they likely respond to different types of disturbances than lake invertebrates and fish do, as is the case in streams (O'Connor et al., 2000, Hering et al., 2006, Carlisle et al., 2008, Beck and Hatch, 2009). As a result, diatoms should be particularly valuable in the assessment of current lake conditions as well as in paleoecological studies.
Relationships among natural environmental factors, human disturbance and metrics are complicated. Relationships between human disturbance and metrics can be influenced by the effects of the natural environment on both metrics and disturbance (Stoddard et al., 2008, Hawkins et al., 2010, Schoolmaster et al., 2013). Thus, one of the challenges in assessing ecological condition across large spatial scales is distinguishing effects of human disturbance from natural variation (Stevenson et al., 2013). Natural variability in diatom assemblage composition is great at continental spatial scales and may be related to species biogeographies and the high sensitivity of diatom species composition to naturally varying environmental factors. Stevenson et al. (2009) showed that a diatom metric for trophic status was affected as much by natural variability among streams as by human disturbance. A priori classification of sites by regions, a priori classification of sites by typologies, site-specific modeling of expected reference condition, and varying metrics among site groups are four approaches that have been used to control natural variation in ecological assessments (Whittier et al., 2007, Hawkins et al., 2010).
Landscape regionalizations and aquatic biota assemblage composition have been used to group sites into classes to account for natural variation among sites (Hawkins et al., 2010). Regionalization schemes, such as Omernik's ecoregions (Omernik, 1987), have been used extensively in freshwater assessment, particularly in the US (USEPA, 2010). Ecoregions and ecological drainage units (EDUs) are assumed to capture a significant amount of the natural variation in metrics or multimetric indices (MMIs) caused by differences in climate, geology, hydrology, soils, and surrounding vegetation. Regionalization schemes, however, cannot account for biotic response to natural variation within an ecoregion (Hawkins et al., 2010). Biological typologies assign sites to groups (i.e. typologies) by similarity in species composition of assemblages at reference sites. Biological typologies are not spatially constrained, so they can account for natural variation within and across regions. Biological typologies are used to account for natural variation in species composition among habitats in RIVPACS (Wright et al., 2000), a widely used approach for stream bioassessment in Europe and Australia.
Site-specific modeling of expected reference condition enables individual metrics to be adjusted for natural variation among sites. The adjusted metric value is the difference between the unadjusted metric value and the modeled expected reference value of that metric for the site. Models of the expected reference metric value for a site are fitted with reference site data, including unadjusted metric values and a suite of environmental variables that are affected relatively little by humans. A variety of statistical techniques have been used to model relationships between individual metrics or MMIs and natural gradients, such as multiple linear regression (MLR) (Stevenson et al., 2013), classification and regression trees (CART, Cao et al., 2007), and random forest (RF, Hollister et al., 2016). Linear regression and CART have an advantage over other techniques because they are easier for stakeholders to understand. But more advanced modeling techniques that involve machine learning may perform better. For example, both RF and CART can model nonlinear relationships with interactions better than MLR. Moreover, RF is less susceptible to overfitting than CART and would therefore provide more accurate predictions than CART when used with new data (Breiman, 2001, Cutler et al., 2007). The choice of technique might depend on sample size and on non-linear interactions among multiple variables (Smith et al., 2013), because machine learning techniques usually require larger sample sizes for precise models.
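The adjustment described above (observed metric minus the modeled reference expectation) can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a single natural covariate and a simple least-squares line fitted to reference sites only, and the function names, the covariate, and all numbers are hypothetical values chosen for illustration.

```python
from statistics import mean


def fit_reference_model(x_ref, y_ref):
    """Least-squares line y = intercept + slope * x, fitted on reference sites only."""
    x_bar, y_bar = mean(x_ref), mean(y_ref)
    sxx = sum((x - x_bar) ** 2 for x in x_ref)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_ref, y_ref))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def adjust_metric(observed, x, model):
    """Adjusted value = observed metric minus the modeled reference expectation."""
    intercept, slope = model
    return observed - (intercept + slope * x)


# Hypothetical reference sites: one natural covariate and one metric value each.
ref_covariate = [0.0, 1.0, 2.0, 3.0]
ref_metric = [10.0, 12.0, 14.0, 16.0]
model = fit_reference_model(ref_covariate, ref_metric)

# Hypothetical test site: covariate 2.0, observed metric 11.0.
# The reference expectation at 2.0 is 14.0, so the adjusted value is -3.0.
adjusted = adjust_metric(11.0, 2.0, model)
print(model, adjusted)
```

A CART or RF model could stand in for the fitted line here; only the prediction step inside `adjust_metric` would change, while the subtraction stays the same.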
Performance of MMIs could be increased if different metrics are used in different ecoregions, because human activities and the stressors they produce vary greatly among ecoregions (Ellis and Ramankutty, 2008) and sensitivity of metrics differs among stressors (Whittier et al., 2007). Both the types and intensity of human activities vary among ecoregions, with extensive agriculture in some ecoregions and more patchy urban and agricultural activities in others (USEPA, 2013). Responses of stream diatom metrics to a nutrient-dominated agricultural gradient likely differ from responses to a multistressor gradient with both urban and agricultural activities (Tang et al., 2016). Whittier et al. (2007) found that using different metrics in different ecoregions provided the best MMI performance, which indicates that some metrics did not respond to human disturbance as much in some ecoregions as in others. Performance of MMIs could also be increased if different metrics were used in different groups of sites defined by biological typology. For example, fish and invertebrate species richness differed between cold-water and warm-water habitats (Mebane et al., 2003, Hughes et al., 2004). However, a trade-off exists between consistency and sensitivity when deciding whether to use different biotic metrics for MMIs in different ecoregions. MMIs might become more sensitive to human disturbance if different metrics are used among ecoregions (or site groups defined by biological typology), but changing metrics also changes what we are assessing and therefore reduces consistency in assessments across groups.
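The idea of choosing metrics separately for each site group can be sketched in Python as follows. The selection rule used here (largest absolute Pearson correlation with a disturbance score) is an assumption made for illustration and is not stated in the text as the criterion used in this study; the group names, metric names, and values are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = sum((x - x_bar) ** 2 for x in xs) ** 0.5
    sy = sum((y - y_bar) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def most_responsive_metric(disturbance, metrics):
    """Name of the metric with the largest |r| against the disturbance score."""
    return max(metrics, key=lambda name: abs(pearson(disturbance, metrics[name])))


# Hypothetical site groups (ecoregions or typologies), each with its own sites.
groups = {
    "group_A": {
        "disturbance": [1.0, 2.0, 3.0, 4.0],
        "metrics": {
            "metric_1": [9.0, 7.0, 5.0, 3.0],  # declines steadily with disturbance
            "metric_2": [4.0, 6.0, 4.0, 6.0],  # weak, inconsistent response
        },
    },
    "group_B": {
        "disturbance": [1.0, 2.0, 3.0, 4.0],
        "metrics": {
            "metric_1": [5.0, 7.0, 5.0, 7.0],  # weak, inconsistent response
            "metric_2": [2.0, 4.0, 6.0, 8.0],  # increases steadily with disturbance
        },
    },
}

# Select the metric separately within each group.
selected = {
    name: most_responsive_metric(data["disturbance"], data["metrics"])
    for name, data in groups.items()
}
print(selected)
```

In this toy data a different metric is selected in each group, which is the situation that creates the trade-off between sensitivity and consistency noted above.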
In the present study, we evaluated different methods for improving the performance of a nationwide diatom MMI for lakes with the US Environmental Protection Agency's dataset from the 2007 National Lakes Assessment (NLA). We evaluated three hypotheses: (1) performance of MMIs will be greater when grouping sites by diatom typology than by ecoregions; (2) MMIs generated by selecting metrics for each site group (typologies or ecoregions) will perform better than by using the same set of metrics in all site groups; and (3) different statistical techniques (e.g. MLR, CART, RF) for adjusting metrics for natural variability will perform best in different situations. To do this, we grouped sites by ecoregions and diatom typology and calculated site-specific models of expected reference condition for each group of sites by ecoregion or typology. We then compared metric and MMI performance using a standard set of statistics that have been used in other evaluations of ecological assessment methods.
Data sets
The NLA was conducted by the United States Environmental Protection Agency (USEPA). The NLA provides a nationwide dataset and analysis in which the same standardized field and laboratory protocols and the same data analyses were used for individual biological assemblages (http://water.epa.gov/type/lakes/lakessurvey_index.cfm). A total of 1031 lakes were sampled for the NLA. Of these, 909 were selected randomly with a probabilistic sampling design from the pool of all USA lakes, and 122 lakes were hand-picked to serve as
Site grouping by diatom typology
We grouped the 144 reference sites into 5 diatom typologies of reference lakes using K-means clustering with 5 centers. With 5 centers, among-group variation accounted for 60.1% of all variation. Adding more clusters would have provided much less improvement in variation explained per added cluster than the first five clusters did, and would not have left a sufficient number of reference sites in each cluster for metric modeling. The numbers of reference sites
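As an illustration of the clustering step, the Python sketch below runs a basic K-means (Lloyd's algorithm) and reports the share of variation explained by the groups. It assumes that "among group variation explained" corresponds to one minus the ratio of within-group to total sum of squares. The toy points, the starting centers, and the function names are hypothetical, and this is not the authors' implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, start_centers, max_iter=100):
    """Lloyd's algorithm from the given starting centers; returns (labels, centers)."""
    centers = list(start_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = centroid(members)
    return labels, centers


def variation_explained(points, labels, centers):
    """1 - within-group SS / total SS, i.e. the among-group share of variation."""
    grand = centroid(points)
    total_ss = sum(squared_distance(p, grand) for p in points)
    within_ss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return 1.0 - within_ss / total_ss


# Toy data: four sites described by two hypothetical assemblage axes.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
explained = variation_explained(points, labels, centers)
print(labels, centers, round(explained, 4))
```

Repeating the calculation for increasing numbers of centers and comparing the gain in `explained` at each step is one way to examine the diminishing improvement per added cluster described above.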
Effect of site grouping methods and metric modeling on MMI performance
Hierarchical metric modeling, with site grouping and site-specific metric modeling, improved lake diatom MMI performance compared to accounting for natural variation in MMIs by either ecoregions or site-specific models alone. Stevenson et al. (2013) argued that performance of an adjusted MMI (NLA-MLR) was better than that of an unadjusted MMI for the USEPA's NLA data, where adjustments for natural variation were made with site-specific models for MMIs (not individual metrics as in this study). In that study,
Conclusions
Hierarchical modeling improved diatom MMI performance for lakes with a combination of site grouping and modeling expected reference values of metrics within site groups. Modeled MMIs within diatom typologies had the highest overall performance and sensitivity to the human disturbance gradient (HDG) when evaluated with all 1031 sites at a national scale. However, when MMI performance was evaluated for each site group to assess consistency and to follow common USEPA methods, there was little performance difference between
Acknowledgements
We thank K. A. Blocksom and J. van Sickle for providing an original version of R code to calculate diatom multimetric indices. RJS was partially supported by a cooperative agreement with the USEPA (Grant R835203).
References (45)
- et al. Developing an optimal river typology for biological elements within the Water Framework Directive. Water Res. (2005)
- et al. Complementarity-based conservation prioritization using a community classification, and its application to riverine ecosystems. Biol. Conserv. (2010)
- et al. An algorithmic and information-theoretic approach to multimetric index construction. Ecol. Indic. (2013)
- et al. A comparison of random forest regression and multiple linear regression for prediction in neuroscience. J. Neurosci. Methods (2013)
- et al. Accounting for regional variation in both natural environment and human disturbance to improve performance of multimetric indices of lotic benthic diatoms. Sci. Total Environ. (2016)
- et al. A review of research on the development of lake indices of biotic integrity. Environ. Rev. (2009)
- A performance comparison of metric scoring methods for a multimetric index for Mid-Atlantic Highlands streams. Environ. Manag. (2003)
- Random forests. Mach. Learn. (2001)
- et al. Modeling natural environmental gradients improves the accuracy and precision of diatom-based indicators. J. N. Am. Benthol. Soc. (2007)
- et al. Biological assessments of Appalachian streams based on predictive models for fish, macroinvertebrate, and diatom assemblages. J. N. Am. Benthol. Soc. (2008)
- Paleoecological analysis of lake acidification trends in North America and Europe using diatoms and chrysophytes
- Random forests for classification in ecology. Ecology
- The biological condition gradient: a descriptive model for interpreting change in aquatic ecosystems. Ecol. Appl.
- A comparison of the European Water Framework Directive physical typology and RIVPACS-type models as alternative methods of establishing reference conditions for benthic macroinvertebrates. Hydrobiologia
- Assessing water quality changes in the lakes of the northeastern United States using sediment diatoms. Can. J. Fish. Aquat. Sci.
- A working guide to boosted regression trees. J. Anim. Ecol.
- Putting people in the map: anthropogenic biomes of the world. Front. Ecol. Environ.
- Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. The European Parliament and the Council of the European Union. Off. J. Eur. Communities
- The reference condition: predicting benchmarks for ecological and water-quality assessments. J. N. Am. Benthol. Soc.
- Assessment of European streams with diatoms, macrophytes, macroinvertebrates and fish: a comparative metric-based analysis of organism response to stress. Freshw. Biol.
- Modelling lake trophic state: a random forest approach. Ecosphere
- A biointegrity index (IBI) for coldwater streams of western Oregon and Washington. T. Am. Fish. Soc.