Summarizing ecological data

About

ecositer has a category of functions for summarizing data, all of which use a “summarize_” prefix. Quality control (QC) should be performed before summarizing; see the QCing vegetation data article.

Summarizing vegetation data by ecosite

summarize_veg_by_ecosite() is used to summarize vegetation data by ecosite, state, and phase. This function evaluates several aspects of data quality and, in some cases, performs QC itself. For example, it runs QC_aggregate_abundance() by default, and it reports when multiple state names are associated with the same stateid. Read the messages this function produces and make an effort to address the data quality issues they describe.

Load data

D104_veg <- ecositer::create_veg_df(from = "web_report",
                                 ecositeid = "F022AD104CA")

Summarize data

D104_summary <- ecositer::summarize_veg_by_ecosite(veg_df = D104_veg)

## Warning in ecositer::QC_aggregate_abundance(veg_df): Multiple abundance columns
## are used in this dataset: akstratumcoverclasspct, speciescancovpct

## Warning -> There are sites with multiple vegetation plots. Reviewing these sites is preferable to automated selection. To view which sites have multiple vegetation plots:
##               'Your veg_df' |> dplyr::group_by(siteiid) |>
##                                dplyr::summarise(unique_vegplots = dplyr::n_distinct(vegplotiid)) |>
##                                dplyr::filter(unique_vegplots > 1)

## Note -> The following taxonomical changes have been made.

## Castanopsis sempervirens changed to Chrysolepis sempervirens 
## Pinus latifolia changed to Pinus engelmannii

## Warning -> 17 records missing abundance. These records are removed from results.

Note the warnings produced, such as the one about sites with multiple vegetation plots. Some messages also supply code for inspecting the situation, which can be run directly on your veg_df:

D104_veg |> dplyr::group_by(siteiid) |>
  dplyr::summarise(unique_vegplots = dplyr::n_distinct(vegplotiid)) |>
  dplyr::filter(unique_vegplots > 1)

## # A tibble: 14 × 2
##    siteiid unique_vegplots
##    <chr>             <int>
##  1 1146902               2
##  2 1169814               2
##  3 1169819               2
##  4 1169825               2
##  5 1169826               2
##  6 1169880               2
##  7 912091                2
##  8 912100                2
##  9 912136                2
## 10 912150                2
## 11 912154                2
## 12 949201                2
## 13 949214                2
## 14 968053                2

Warnings about multiple state names being associated with the same stateid also deserve attention. Sometimes the names differ only in capitalization, but I have seen many cases in NASIS where completely different state names are associated with the same stateid, so it is worth checking. For the sake of standardization, it is also worth making the names exactly the same (i.e., using the same capitalization).
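One way to separate names that differ only in capitalization from names that are genuinely different is to count distinct names per stateid twice: once as written and once after normalizing case and whitespace. The sketch below uses base R and a small hypothetical table; the column names `stateid` and `statename` are placeholders and may not match the columns in your veg_df.

```r
# Hypothetical example data; `stateid` and `statename` are placeholder names,
# not necessarily the column names used by ecositer.
states <- data.frame(
  stateid   = c(1, 1, 2, 2, 3),
  statename = c("Reference State", "reference state",
                "Shrub State", "Annual Grass State", "Burned State")
)

# Number of distinct names per stateid, exactly as written
n_exact <- tapply(states$statename, states$stateid,
                  function(x) length(unique(x)))

# Number of distinct names after ignoring capitalization and extra spaces
n_normalized <- tapply(states$statename, states$stateid,
                       function(x) length(unique(tolower(trimws(x)))))

# stateids with more than one spelling of the state name
inconsistent <- names(n_exact)[n_exact > 1]
# of those, stateids whose names differ only in capitalization/whitespace
case_only <- names(n_exact)[n_exact > 1 & n_normalized == 1]
# stateids with genuinely different names (worth a closer look in NASIS)
different <- names(n_normalized)[n_normalized > 1]

print(case_only)
print(different)
```

Here stateid 1 is flagged as a capitalization difference and stateid 2 as two different names.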

To examine the summarized data, type D104_summary followed by a ‘$’; in RStudio, autocomplete then shows the elements available at that level. The object returned by summarize_veg_by_ecosite() is a list. Lists are extremely flexible data structures that work well for storing complex, heterogeneous data.
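To illustrate how nested lists work, here is a small hypothetical list that loosely mimics the ecosite, STM, state, and community nesting used in the access path for D104_summary. The structure and values are simplified placeholders, not actual ecositer output. `$` and `[[` both extract a single named element; `[[` is handy when the name is stored in a variable.

```r
# A toy nested list loosely mimicking ecosite -> STM -> state -> community.
# Structure and values are hypothetical and simplified.
toy_summary <- list(
  F022AD104CA = list(
    STM = list(
      state1 = list(
        comm1 = list(
          species_summary = data.frame(
            plantsym  = c("ABCO", "PIJE"),
            constancy = c(87.5, 62.5)
          )
        )
      )
    )
  )
)

# Chained `$` moves down one level per step
spp1 <- toy_summary$F022AD104CA$STM$state1$comm1$species_summary

# `[[` does the same, and accepts names stored in variables
eco <- "F022AD104CA"
spp2 <- toy_summary[[eco]][["STM"]][["state1"]][["comm1"]][["species_summary"]]

# names() lists the elements available at a given level
print(names(toy_summary$F022AD104CA$STM))

# str() prints the nested structure compactly
str(toy_summary, max.level = 3)
```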

The “$” extracts the named element at that level of the list, and chaining several “$” operators moves down through the nested levels. Here is an example of accessing a data frame stored within the list. I use head() to keep the output concise, but I recommend View() to see the full output and sort by columns.

D104_summary$F022AD104CA$STM$state1$comm1$species_summary |> head()

##    plantsym          plantsciname       plantnatvernm constancy    mean median
##      <char>                <char>              <char>     <num>   <num>  <num>
## 1:     ABCO        Abies concolor           white fir      87.5 31.0875  28.15
## 2:     PIJE        Pinus jeffreyi        Jeffrey pine      62.5  6.3125   2.10
## 3:    ARPA6 Arctostaphylos patula greenleaf manzanita      50.0  1.0250   0.05
## 4:   CADE27  Calocedrus decurrens       incense cedar      50.0  8.9125   0.05
## 5:    GALIU                Galium            bedstraw      50.0  0.5375   0.05
## 6:     PREM     Prunus emarginata       bitter cherry      50.0  0.6500   0.05
##      min   max   sum  20th  80th sites_present sites_absent
##    <num> <num> <num> <num> <num>         <num>        <num>
## 1:     0  84.0 248.7  0.14 55.06             7            1
## 2:     0  28.3  50.5  0.00  9.20             5            3
## 3:     0   7.0   8.2  0.00  0.64             4            4
## 4:     0  47.0  71.3  0.00 12.88             4            4
## 5:     0   3.0   4.3  0.00  0.70             4            4
## 6:     0   4.0   5.2  0.00  0.64             4            4
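The rows above appear consistent with constancy being the percentage of plots in which a species is present (7 of 8 plots for ABCO gives 87.5) and mean being the sum divided by all plots, including plots where the species is absent (248.7 / 8 = 31.0875 for ABCO). This is inferred from the printed output rather than from the function's documentation, so treat it as an assumption. The sketch below reproduces that arithmetic on a hypothetical cover vector.

```r
# Hypothetical cover values (%) for one species in 8 plots of a community.
# 0 = species not recorded in that plot (assumed to be how absences enter).
cover <- c(0, 0.1, 0, 7.0, 0, 0.5, 0, 0.6)

sites_present <- sum(cover > 0)
sites_absent  <- sum(cover == 0)

# Constancy: percent of plots where the species occurs
constancy <- 100 * sites_present / length(cover)

# Absent plots count as 0, so the mean is the sum divided by all plots
mean_cover   <- sum(cover) / length(cover)
median_cover <- median(cover)

print(c(constancy = constancy, mean = mean_cover, median = median_cover))
```

With these values, constancy is 50, the mean is 1.025, and the median is 0.05, because half of the plots contribute zeros.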

Summarizing pedon-level soil data

When analyzing ecological communities, it is often helpful to work with summarized soil properties that are not directly available from pedon data. Properties of interest could include depth, texture, color, rock fragments, pH, etc. Users may be interested in these properties within particular depth ranges, across the entire soil profile, or within a particular master horizon. Examples include the weighted average clay percentage in the upper 100 cm or the thickness of surface O horizons (i.e., using the O master horizon).
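A thickness-weighted average within a depth range, such as clay in the upper 100 cm, can be computed by clipping each horizon to the range and weighting its value by the clipped thickness. The base R sketch below shows that idea on a hypothetical pedon; the helper `wtd_avg_in_range()` is illustrative only and is not part of ecositer, and ecositer may handle edge cases (missing horizons, shallow pedons) differently.

```r
# Hypothetical helper (not part of ecositer): thickness-weighted average of a
# horizon property within [top_lim, bottom_lim] (cm).
wtd_avg_in_range <- function(top, bottom, value, top_lim = 0, bottom_lim = 100) {
  # Clip horizon boundaries to the depth range of interest
  clipped_top    <- pmax(top, top_lim)
  clipped_bottom <- pmin(bottom, bottom_lim)
  thk <- pmax(clipped_bottom - clipped_top, 0)

  # Skip horizons with no thickness in the range or a missing value
  keep <- thk > 0 & !is.na(value)
  if (!any(keep)) return(NA_real_)
  sum(value[keep] * thk[keep]) / sum(thk[keep])
}

# Hypothetical pedon: horizon depths (cm) and clay (%)
hz <- data.frame(
  hzname = c("A", "Bt1", "Bt2", "C"),
  top    = c(0, 20, 50, 90),
  bottom = c(20, 50, 90, 150),
  clay   = c(12, 20, 28, 15)
)

clay_0_100 <- wtd_avg_in_range(hz$top, hz$bottom, hz$clay, 0, 100)
clay_0_25  <- wtd_avg_in_range(hz$top, hz$bottom, hz$clay, 0, 25)
print(c(clay_0_100 = clay_0_100, clay_0_25 = clay_0_25))
```

Here only 10 cm of the C horizon falls within 0 to 100 cm, giving (12·20 + 20·30 + 28·40 + 15·10) / 100 = 21.1. If a pedon is shallower than the range, this sketch averages over the thickness that is present.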

summarize_pedon_soil_properties() generates a variety of summarized properties that are useful for ecological analysis. I would like to develop this function further, so if you have ideas, please share!

For this example, I am using an example dataset from CA792, the Sequoia and Kings Canyon National Park Soil Survey, from the ecositer.data package.

data(CA792_pedon_data, package = "ecositer.data")

This data can be used in the r_object argument of summarize_pedon_soil_properties(). Alternatively, you could use the SS or static_location arguments to access NASIS data elsewhere.

CA792_pedon_summary <- ecositer::summarize_pedon_soil_properties(SS = FALSE,
                                           r_object = CA792_pedon_data,
                                           byDepth = list(c(0, 25), c(0, 50)))

## Warning in max(h[[bottom]][no.contact.idx], na.rm = TRUE): no non-missing
## arguments to max; returning -Inf

## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table

aqp::site(CA792_pedon_summary[["full_profile"]])[, c("siteiid", "d_L_lowest_mineral", "o_surf_thk", "b_clay_wtd", "full_prof_ph_wtd")] |> head()

##   siteiid d_L_lowest_mineral o_surf_thk b_clay_wtd full_prof_ph_wtd
## 1 1076986           51.40723          5  15.411765         6.210000
## 2 1076988           51.41491          5  14.564815         6.092500
## 3 1076989           71.55664          5   7.684211         5.340800
## 4 1076990                 NA         10  12.533333         5.361905
## 5 1076991           51.40723          8  34.000000         6.162000
## 6 1076992           51.40723          5  27.305085         5.693333
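Column names such as o_surf_thk suggest a summary of surface O-horizon thickness. The sketch below shows one plausible way to compute such a value: add up the thickness of the uninterrupted run of O horizons starting at the top of the profile. It is a hypothetical illustration, not necessarily how ecositer calculates o_surf_thk, and it simplifies the master horizon to the first letter of the designation after stripping any leading discontinuity number.

```r
# Hypothetical helper (not part of ecositer): thickness (cm) of the contiguous
# O horizons at the surface. Returns 0 if the profile does not start with O.
surface_o_thickness <- function(hzname, top, bottom) {
  # Order horizons from the surface down
  ord <- order(top)
  hzname <- hzname[ord]
  top    <- top[ord]
  bottom <- bottom[ord]

  # Simplified master horizon: first letter after any leading digits (e.g. "2Bt")
  master <- substr(sub("^[0-9]+", "", hzname), 1, 1)
  is_o <- !is.na(master) & master == "O"

  # TRUE only for the uninterrupted run of O horizons starting at the surface
  run <- cumprod(is_o) == 1
  if (!any(run)) return(0)
  sum(bottom[run] - top[run])
}

# Hypothetical pedon with Oi and Oe horizons over mineral horizons
o_thk <- surface_o_thickness(c("Oi", "Oe", "A", "Bw"),
                             c(0, 3, 8, 20),
                             c(3, 8, 20, 60))
print(o_thk)
```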

Summarizing site climate data

Climate is a critical factor driving the distribution of ecological communities and individual species. ecositerSpatial::site_prism_annual_normals() can be used to extract PRISM climate properties for specific sites.

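CA792_clim <- ecositerSpatial::site_prism_annual_normals(site_df = CA792_veg_data,
                                           # local directory of PRISM annual normals (adjust to your machine)
                                           prism_dir = "C:/Users/Nathan.Roe/Documents/PRISM_R/annual",
                                           id = "siteiid",
                                           # columns holding site coordinates
                                           x = "utmeasting",
                                           y = "utmnorthing",
                                           # EPSG:32611 = WGS 84 / UTM zone 11N
                                           EPSG = "EPSG:32611")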

head(CA792_clim)

##   siteiid      ppt tmean   tmin    tmax tdmean vpdmin vpdmax
## 1 1093605  920.948 10.57 4.8099 16.3299  -0.35   3.81  14.71
## 2 1093606  932.602 10.01 5.3700 14.6500  -0.06   3.99  12.56
## 3 1093607  948.341  9.82 4.7699 14.8700  -0.65   3.86  13.17
## 4 1093608 1080.049  9.34 4.5199 14.1600  -0.84   3.78  12.39
## 5 1093609 1309.006  7.54 1.9600 13.1300  -3.29   3.29  12.26
## 6 1093611 1309.006  7.54 1.9600 13.1300  -3.29   3.29  12.26

Nathan Roe

2025-01-02