Vegetation data ordination
Nathan Roe
2025-01-02
Source:vignettes/ordinating_vegetation_data.Rmd
ordinating_vegetation_data.Rmd
Ordination is a multivariate statistical technique that is
well-suited for analyzing ecological communities. Ordination allows for
many ecologically useful relationships to be evaluated - all
correlative, of course (this is ecology after all) - including:
-
similarity/dissimilarity in vegetative composition between sites
-
relationship between community classifications and environmental
gradients
- relationship between community classifications and
species
- relationship between species and environmental gradients
This vignette is not intended to explain the mechanics of ordination techniques, distance measures, or how to interpret ordination results. There are many excellent sources for this purpose. I would recommend reading Analysis of Ecological Communities by McCune and Grace for a primer on these topics.
In this vignette I will present the use of Non-metric
Multidimensional Scaling (NMDS) with a Bray-Curtis distance measure for
creating an ordination. Much of the workflow will not be built into
ecositer
functions because it relies heavily on functions
from other packages, and incorporating them into functions will conceal
too many of parameters that can be, and should be, manipulated in this
process.
I will be using the CA792 (Sequoia and Kings Canyon National Park) Soil Survey for this demonstration.
For this example, I will be working with pedon and vegetation data
stored in the ecositer.data
package.
Additional summary soil variable calculation, as demonstrated in the Summarize Ecological Data vignette.
CA792_pedon_summary <- ecositer::summarize_pedon_soil_properties(SS = FALSE,
r_object = CA792_pedon_data,
byDepth = list(c(0, 25), c(0, 50)))
## Warning in max(h[[bottom]][no.contact.idx], na.rm = TRUE): no non-missing
## arguments to max; returning -Inf
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
Climate variable calculation, as demonstrated in the Summarize Ecological Data vignette.
CA792_clim <- ecositerSpatial::site_prism_annual_normals(site_df = CA792_veg_data,
prism_dir = "C:/Users/Nathan.Roe/Documents/PRISM_R/annual",
id = "siteiid",
x = "utmeasting",
y = "utmnorthing",
EPSG = "EPSG:32611")
Run through QC functions for vegetation data
CA792_veg_data <- ecositer::QC_aggregate_abundance(veg_df = CA792_veg_data)
## Note -> akstratumcoverclasspct is the only abundance column used in this dataset
# manual review process because of multiple vegplots for sites with tied number of records
CA792_veg_data <- CA792_veg_data |> dplyr::filter(!vegplotiid %in% c(276394, 276184, 276395, 276185, 276228, 196369, 990978, 954117, 316789))
# now we can run QC_best_vegplot without error
CA792_veg_data <- ecositer::QC_best_vegplot_for_site(veg_df = CA792_veg_data)
## Warning -> There are sites with multiple vegetation plots. Reviewing these sites is preferable to automated selection. To view which sites have multiple vegetation plots:
## 'Your veg_df' |> dplyr::group_by(siteiid) |>
## dplyr::summarise(unique_vegplots = dplyr::n_distinct(vegplotiid)) |>
## dplyr::filter(unique_vegplots > 1)
CA792_veg_data <- ecositer::QC_update_taxonomy(veg_df = CA792_veg_data)
## Note -> The following taxonomical changes have been made.
## Castanopsis sempervirens changed to Chrysolepis sempervirens
## Yucca whipplei changed to Hesperoyucca whipplei
## Disporum hookeri var. oreganum changed to Prosartes hookeri var. oregana
## Quercus garryana var. breweri changed to Quercus garryana var. fruticosa
## Rhodomyrtus tomentosus changed to Rhodomyrtus tomentosa
## Juncus balticus changed to Juncus arcticus ssp. littoralis
## Salix arctica changed to Salix arctica
## Pinus latifolia changed to Pinus engelmannii
## Castilleja acuminata changed to Castilleja septentrionalis
## Ceanothus integerrimus var. californicus changed to Ceanothus integerrimus
## Juniperus occidentalis ssp. australis changed to Juniperus grandis
## Cymopterus terebinthinus changed to Pteryxia terebinthina var. terebinthina
## Sambucus cerulea changed to Sambucus nigra ssp. cerulea
## Chrysothamnus nauseosus changed to Ericameria nauseosa ssp. nauseosa var. nauseosa
## Vaccinium caespitosum changed to Vaccinium cespitosum
## Acacia constricta changed to Vachellia constricta
## Hosackia oblongifolia changed to Lotus oblongifolius var. oblongifolius
## Vaccinium uliginosum ssp. occidentale changed to Vaccinium uliginosum
## Platanthera leucostachys changed to Platanthera dilatata var. leucostachys
## Cercocarpus betuloides var. betuloides changed to Cercocarpus montanus var. glaber
## Heracleum lanatum changed to Heracleum maximum
## Ribes cereum var. inebrians changed to Ribes cereum var. pedicellare
## Cercis occidentalis changed to Cercis canadensis var. texensis
## Cercocarpus betuloides changed to Cercocarpus montanus var. glaber
## Acacia millefolia changed to Mariosousa millefolia
## Anemone occidentalis changed to Pulsatilla occidentalis
## Smilacina stellata changed to Maianthemum stellatum
## Aster breweri changed to Eucephalus breweri
## Artemisia norvegica ssp. saxatilis changed to Artemisia arctica ssp. arctica
## Calyptridium monospermum changed to Cistanthe monosperma
## Rhamnus rubra changed to Frangula rubra ssp. rubra
## Penstemon newberryi var. newberryi changed to Penstemon newberryi ssp. newberryi
## Deschampsia cespitosa ssp. cespitosa changed to Deschampsia cespitosa
## Caltha leptosepala var. biflora changed to Caltha leptosepala ssp. howellii
## Sedum roseum changed to Rhodiola rosea
## Eremocarpus setigerus changed to Croton setigerus
## Caulanthus coulteri var. lemmonii changed to Caulanthus lemmonii
## Calyptridium umbellatum changed to Cistanthe umbellata var. umbellata
## Micranthes tolmiei changed to Saxifraga tolmiei
CA792_veg_data <- ecositer::QC_completeness_criteria(veg_df = CA792_veg_data,
min_unique_species = 4,
min_perc_to_species = 70,
min_perc_with_abund = 80)
Remove reccords with missing abundance or missing plantsym. Remove genus level observations.
CA792_veg_data <- CA792_veg_data |> dplyr::filter(!is.na(pct_cover) | !is.na(plantsym))
species_level <- grep(pattern = " ", x = CA792_veg_data$plantsciname)
CA792_veg_data <- CA792_veg_data[species_level,]
Sum species abundances together when there are multiple records of the same species in the same vegplotiid. This is usually the result of species occurring in different strata.
CA792_veg_data <- CA792_veg_data |> dplyr::group_by(siteiid, usiteid, vegplotiid, ecositeid, ecostateid, commphaseid, plantsym) |>
dplyr::summarise(pct_cover = sum(pct_cover, na.rm = TRUE), .groups = "drop")
Pivot data wider - preferred for ordination
grouping_columns <- c("siteiid", "usiteid", "vegplotiid", "ecositeid", "ecostateid", "commphaseid", "plantsym", "pct_cover")
CA792_veg_data_wide <- CA792_veg_data |> dplyr::select(dplyr::all_of(grouping_columns)) |>
tidyr::pivot_wider(names_from = plantsym, values_from = pct_cover)
Restrict to just one LRU. Restricting to one LRU make it easier to visualize all the ecosites with colors and provide a more concentrated focus on a smaller number of sites.
AD <- grep(pattern = "AD", x = CA792_veg_data_wide$ecositeid)
#AB <- grep(pattern = "AH", x = CA792_veg_data_wide$ecositeid)
CA792_veg_data_wide <- CA792_veg_data_wide[c(AD),]
Before removing rare species, I am going to assemble the environmental dataframe, as this is likely to have a considerable amount of missing data and will likely restrict what sites are included in the analysis more than rare species.
CA792_site_props <- aqp::site(CA792_pedon_summary$full_profile)
columns_of_interest <- c("siteiid", "slope_field", "elev_field", "d_L_lowest_mineral", "m_L_lowest_mineral",
"o_surf_thk", "full_prof_clay_wtd", "full_prof_ph_wtd",
"full_prof_frag_vol_tot_wtd")
colSums(!is.na(CA792_site_props[columns_of_interest])) / nrow(CA792_site_props) * 100
## siteiid slope_field
## 100.00000 98.43478
## elev_field d_L_lowest_mineral
## 100.00000 88.34783
## m_L_lowest_mineral o_surf_thk
## 88.34783 100.00000
## full_prof_clay_wtd full_prof_ph_wtd
## 98.86957 92.60870
## full_prof_frag_vol_tot_wtd
## 100.00000
I am going to remove d_L_lowest_mineral and m_L_lowest_mineral. Remember, any missing value in any column leads to the entire row (site) being removed. It is a difficult balance deciding what environmental variables to keep at the cost of losing sites.
CA792_site_props_reduced <- CA792_site_props[, columns_of_interest[!columns_of_interest %in% c("d_L_lowest_mineral", "m_L_lowest_mineral")]]
Joining in climate data
CA792_site_props_reduced$siteiid <- as.character(CA792_site_props_reduced$siteiid)
CA792_site_props_reduced <- CA792_site_props_reduced |> dplyr::left_join(CA792_clim)
Remove rows with any NA values
CA792_env_df <- CA792_site_props_reduced[complete.cases(CA792_site_props_reduced), ]
98 sites were lost, 1052 remain - not bad! Now we will reduce the veg data down to just those sites that are in the environmental dataframe.
Create species only dataframe - all group columns removed. This is common in the ordination process.
CA792_comm_df <- CA792_veg_data_wide[, c(1, 7:ncol(CA792_veg_data_wide))] |> as.data.frame()
rownames(CA792_comm_df) <- CA792_comm_df$siteiid
CA792_comm_df <- CA792_comm_df[,-1]
CA792_comm_df[is.na(CA792_comm_df)] <- 0
Remove rare species
frequency_threshold <- 10
CA792_comm_df <- CA792_comm_df[, colSums(CA792_comm_df > 0) >= frequency_threshold]
Remove sites that have not species after rare species are removed
CA792_comm_df <- CA792_comm_df[rowSums(CA792_comm_df != 0) > 0,]
There are 296 siteids in veg dataset, so we are more restricted by veg data and environment, generally speaking. We do need to now reduce the environmental dataframe to only include siteiids in the veg data.
# will improve this to avoid using rownames in this way in the future
CA792_env_df <- CA792_env_df |> dplyr::filter(siteiid %in% rownames(CA792_comm_df))
There are a couple of siteiids in the environmental data that have multiple rows. This is caused by lab data. For this purpose, I am just going to take the first record. For a real analysis, these should be inspected and the best record chosen.
CA792_env_df <- CA792_env_df |> dplyr::distinct(siteiid, .keep_all = TRUE)
Begin ordination
my_nmds <- vegan::metaMDS(CA792_comm_df, autotransform = TRUE)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2229301
## Run 1 stress 0.2238546
## Run 2 stress 0.2241674
## Run 3 stress 0.2242332
## Run 4 stress 0.2238695
## Run 5 stress 0.2240236
## Run 6 stress 0.2236443
## Run 7 stress 0.2241277
## Run 8 stress 0.224291
## Run 9 stress 0.2253973
## Run 10 stress 0.2237568
## Run 11 stress 0.223833
## Run 12 stress 0.2242867
## Run 13 stress 0.2233153
## ... Procrustes: rmse 0.007346952 max resid 0.06169108
## Run 14 stress 0.2235029
## Run 15 stress 0.2232925
## ... Procrustes: rmse 0.015232 max resid 0.1798811
## Run 16 stress 0.2246379
## Run 17 stress 0.222944
## ... Procrustes: rmse 0.006576951 max resid 0.05983765
## Run 18 stress 0.2241744
## Run 19 stress 0.2240259
## Run 20 stress 0.2240704
## *** Best solution was not repeated -- monoMDS stopping criteria:
## 19: stress ratio > sratmax
## 1: scale factor of the gradient < sfgrmin
Envfit
env.envfit <- vegan::envfit(my_nmds, CA792_env_df[,-1], choices = c(1,2), permutations = 9999, na.rm = TRUE)
species.envfit <- vegan::envfit(my_nmds, CA792_comm_df, permutation= 9999)
Species scores
species.scores <- as.data.frame(vegan::scores(my_nmds, "species")) |>
tibble::rownames_to_column("species") |> cbind(pval = species.envfit$vectors$pvals) |>
dplyr::filter(pval <= 0.05)
Site scores
site.scrs <- as.data.frame(vegan::scores(my_nmds, display = "sites")) |> tibble::rownames_to_column("siteiid") |>
dplyr::left_join(CA792_veg_data_wide |> dplyr::select(usiteid, siteiid, ecositeid))
Environmental scores
env.scores <- as.data.frame(vegan::scores(env.envfit, display = "vectors")) |>
tibble::rownames_to_column("env.variables")
env.scores <- cbind(env.scores, pval = env.envfit$vectors$pvals,
r = env.envfit$vectors$r)
sig.env.scores <- subset(env.scores, r >= 0.02)
Plotting ordination
# plotting
pointSize = 5
textSize = 10
spaceLegend = 0.3
bar <- ggplot2::ggplot(site.scrs, ggplot2::aes(x = NMDS1, y = NMDS2)) +
ggplot2::geom_point(ggplot2::aes(color = factor(ecositeid),
text = usiteid)) +
ggplot2::labs(title = "NMDS Ordination of SEKI NP Upper Montane Temp. Regime", color = "Ecosite") +
ggplot2::guides(shape = ggplot2::guide_legend(override.aes = list(size = pointSize)),
color = ggplot2::guide_legend(override.aes = list(size = pointSize))) +
ggplot2::theme(legend.title = ggplot2::element_text(size = textSize),
legend.text = ggplot2::element_text(size = textSize),
legend.key.size = ggplot2::unit(spaceLegend, "lines")) +
ggplot2::geom_text(data = species.scores, ggplot2::aes(x = NMDS1, y = NMDS2, label = species), size = 2) +
ggplot2::geom_segment(data = sig.env.scores, ggplot2::aes(x = 0, xend = NMDS1, y = 0,
yend = NMDS2),
arrow = ggplot2::arrow(length = ggplot2::unit(0.25, "cm")),
colour = "grey10", lwd = 0.3) +
ggplot2::geom_text(data = sig.env.scores, ggplot2::aes(x = NMDS1, y = NMDS2,
label = env.variables), cex = 4)
## Warning in ggplot2::geom_point(ggplot2::aes(color = factor(ecositeid), text =
## usiteid)): Ignoring unknown aesthetics: text