Summarizing ecological data
Nathan Roe
2025-01-02
Source:vignettes/summarizing_ecological_data.Rmd
About
ecositer has a category of functions used for summarizing, all of which use a “summarize_” prefix. QC should be performed prior to summarizing; see “QCing vegetation data”.
Summarizing vegetation data by ecosite
summarize_veg_by_ecosite() is used to summarize vegetation data by ecosite, state, and phase. This function evaluates several elements of data quality and, in some cases, performs QC. An example of performing QC is QC_aggregate_abundance(), which is run by default in this function. An example of evaluating data quality is reporting whether multiple state names are associated with a single stateid. Read the messages produced by this function and make an effort to address the data quality conditions they describe.
Load data
D104_veg <- ecositer::create_veg_df(from = "web_report",
ecositeid = "F022AD104CA")
Summarize data
D104_summary <- ecositer::summarize_veg_by_ecosite(veg_df = D104_veg)
## Warning in ecositer::QC_aggregate_abundance(veg_df): Multiple abundance columns
## are used in this dataset: akstratumcoverclasspct, speciescancovpct
## Warning -> There are sites with multiple vegetation plots. Reviewing these sites is preferable to automated selection. To view which sites have multiple vegetation plots:
## 'Your veg_df' |> dplyr::group_by(siteiid) |>
## dplyr::summarise(unique_vegplots = dplyr::n_distinct(vegplotiid)) |>
## dplyr::filter(unique_vegplots > 1)
## Note -> The following taxonomical changes have been made.
## Castanopsis sempervirens changed to Chrysolepis sempervirens
## Pinus latifolia changed to Pinus engelmannii
## Warning -> 17 records missing abundance. These records are removed from results.
Note the warnings produced, such as the one reporting sites with multiple vegetation plots. The code to inspect such situations is supplied in the warning itself.
D104_veg |> dplyr::group_by(siteiid) |>
dplyr::summarise(unique_vegplots = dplyr::n_distinct(vegplotiid)) |>
dplyr::filter(unique_vegplots > 1)
## # A tibble: 14 × 2
## siteiid unique_vegplots
## <chr> <int>
## 1 1146902 2
## 2 1169814 2
## 3 1169819 2
## 4 1169825 2
## 5 1169826 2
## 6 1169880 2
## 7 912091 2
## 8 912100 2
## 9 912136 2
## 10 912150 2
## 11 912154 2
## 12 949201 2
## 13 949214 2
## 14 968053 2
For the multiple-state-name condition mentioned earlier, the different state names here differ only in capitalization. I have seen many other cases in NASIS where completely different state names are associated with the same stateid, so it is worth checking this. Also, for the sake of standardization, it is worth making the names exactly the same (i.e., using the same capitalization).
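A check for state names that differ only in capitalization can follow the same dplyr pattern as the vegetation plot check. The sketch below uses a made-up data frame, and the column names stateid and statename are assumptions for illustration; the corresponding columns in your veg_df may be named differently.

library(dplyr)

# Hypothetical data: `stateid` and `statename` are assumed column names
toy_states <- data.frame(
  stateid   = c("1", "1", "2", "2", "3"),
  statename = c("Reference State", "Reference state",
                "Shrub State", "Burned State", "Meadow State")
)

state_name_check <- toy_states |>
  group_by(stateid) |>
  summarise(
    n_names = n_distinct(statename),
    # Names that differ only in capitalization collapse after tolower()
    n_names_ignoring_case = n_distinct(tolower(statename)),
    .groups = "drop"
  ) |>
  # Keep only stateids with more than one spelling of the state name
  filter(n_names > 1) |>
  # TRUE when capitalization is the only difference
  mutate(case_only = n_names_ignoring_case == 1)

state_name_check

Rows where case_only is FALSE point to genuinely different state names under one stateid, which are the cases most worth reviewing in NASIS.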
To examine the summarized data, type D104_summary followed by a “$”. The data structure output by summarize_veg_by_ecosite() is called a list. Lists are an extremely flexible data structure that works well for storing complex, heterogeneous data.
The “$” allows you to access the elements at that location of the list. Here is an example of accessing a dataframe stored within the list structure. In this example, I am using head() to concisely show the output, but I recommend using View() to see the full output and be able to sort by columns.
D104_summary$F022AD104CA$STM$state1$comm1$species_summary |> head()
## plantsym plantsciname plantnatvernm constancy mean median
## <char> <char> <char> <num> <num> <num>
## 1: ABCO Abies concolor white fir 87.5 31.0875 28.15
## 2: PIJE Pinus jeffreyi Jeffrey pine 62.5 6.3125 2.10
## 3: ARPA6 Arctostaphylos patula greenleaf manzanita 50.0 1.0250 0.05
## 4: CADE27 Calocedrus decurrens incense cedar 50.0 8.9125 0.05
## 5: GALIU Galium bedstraw 50.0 0.5375 0.05
## 6: PREM Prunus emarginata bitter cherry 50.0 0.6500 0.05
## min max sum 20th 80th sites_present sites_absent
## <num> <num> <num> <num> <num> <num> <num>
## 1: 0 84.0 248.7 0.14 55.06 7 1
## 2: 0 28.3 50.5 0.00 9.20 5 3
## 3: 0 7.0 8.2 0.00 0.64 4 4
## 4: 0 47.0 71.3 0.00 12.88 4 4
## 5: 0 3.0 4.3 0.00 0.70 4 4
## 6: 0 4.0 5.2 0.00 0.64 4 4
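If the nesting is hard to follow, it can help to practice on a small list first. The sketch below builds a toy list that only loosely mirrors the structure shown above; the element name ECOSITE1 is hypothetical, and the two rows are a tiny excerpt for illustration rather than real output of summarize_veg_by_ecosite().

# Toy nested list; the structure is an assumption made for illustration
toy_summary <- list(
  ECOSITE1 = list(
    STM = list(
      state1 = list(
        comm1 = list(
          species_summary = data.frame(
            plantsym  = c("PIJE", "ABCO"),
            constancy = c(62.5, 87.5)
          )
        )
      )
    )
  )
)

# names() lists the elements available at a given level of the list
names(toy_summary)
names(toy_summary$ECOSITE1$STM)

# str() with max.level prints a compact outline of the nesting
str(toy_summary, max.level = 3)

# `[[ ]]` works like `$` but also accepts a name stored in a variable
ecosite <- "ECOSITE1"
spp <- toy_summary[[ecosite]]$STM$state1$comm1$species_summary

# Sort by a column in the console, as an alternative to View()
spp_sorted <- spp[order(-spp$constancy), ]
spp_sorted

The same names() and str() calls can be run on D104_summary to see which states and communities are available before drilling down with “$”.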
Summarizing pedon level soil data
When analyzing ecological communities, it is often helpful to work with summarized soil properties that are not directly accessible from pedon data. Properties of interest could include depth, texture, color, rock fragments, pH, etc. Users may be interested in these properties within particular depth ranges, across the entire soil profile, or within a particular master horizon. Examples include the weighted average clay percentage in the first 100 cm or the thickness of the surface O horizons (i.e., using the O master horizon).
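To make these two example properties concrete, here is a minimal base R sketch of a thickness-weighted average within a depth range and of surface O horizon thickness. The horizon data and the helper wtd_mean_in_interval() are hypothetical and written only for illustration; this is not necessarily how ecositer computes these properties internally.

# Hypothetical horizons for one pedon (depths in cm, clay in percent),
# assumed to be sorted from the surface downward
horizons <- data.frame(
  hzname = c("Oi", "A", "Bw", "C"),
  top    = c(0, 5, 20, 60),
  bottom = c(5, 20, 60, 150),
  clay   = c(NA, 10, 20, 30)
)

# Thickness-weighted mean of a property within the interval [upper, lower].
# Horizons with a missing value are left out of the weighting.
wtd_mean_in_interval <- function(hz, property, upper, lower) {
  # Thickness of each horizon that falls inside the interval
  overlap <- pmax(0, pmin(hz$bottom, lower) - pmax(hz$top, upper))
  keep <- overlap > 0 & !is.na(hz[[property]])
  sum(hz[[property]][keep] * overlap[keep]) / sum(overlap[keep])
}

# (10 * 15 + 20 * 40 + 30 * 40) / 95, about 22.6 percent clay
clay_0_100 <- wtd_mean_in_interval(horizons, "clay", upper = 0, lower = 100)

# Surface O thickness: sum the unbroken run of O horizons starting at the top
is_O <- grepl("^O", horizons$hzname)
surface_O <- as.logical(cumprod(is_O))
o_surf_thk <- sum((horizons$bottom - horizons$top)[surface_O])

Note that the C horizon extends to 150 cm but contributes only the 40 cm that lie above the 100 cm limit.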
summarize_pedon_soil_properties() generates a variety of summarized properties that are useful for ecological analysis. I would like to develop this function further, so if you have ideas, please share!
For this example, I am using an example dataset from CA792, the Sequoia and Kings Canyon National Park Soil Survey, from the ecositer.data package.
data(CA792_pedon_data, package = "ecositer.data")
This data can be used in the r_object argument of summarize_pedon_soil_properties(). Alternatively, you could use the SS or static_location arguments to access NASIS data elsewhere.
CA792_pedon_summary <- ecositer::summarize_pedon_soil_properties(SS = FALSE,
r_object = CA792_pedon_data,
byDepth = list(c(0, 25), c(0, 50)))
## Warning in max(h[[bottom]][no.contact.idx], na.rm = TRUE): no non-missing
## arguments to max; returning -Inf
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
## Warning in aqp::texcl_to_ssc(x$texcl): not all the user supplied texcl values
## match the lookup table
aqp::site(CA792_pedon_summary[["full_profile"]])[, c("siteiid", "d_L_lowest_mineral", "o_surf_thk", "b_clay_wtd", "full_prof_ph_wtd")] |> head()
## siteiid d_L_lowest_mineral o_surf_thk b_clay_wtd full_prof_ph_wtd
## 1 1076986 51.40723 5 15.411765 6.210000
## 2 1076988 51.41491 5 14.564815 6.092500
## 3 1076989 71.55664 5 7.684211 5.340800
## 4 1076990 NA 10 12.533333 5.361905
## 5 1076991 51.40723 8 34.000000 6.162000
## 6 1076992 51.40723 5 27.305085 5.693333
Summarizing site climate data
Climate is a critical factor driving the distribution of ecological communities and individual species. ecositerSpatial::site_prism_annual_normals() can be used to extract PRISM climate properties associated with specific sites.
CA792_clim <- ecositerSpatial::site_prism_annual_normals(site_df = CA792_veg_data,
                                                         prism_dir = "C:/Users/Nathan.Roe/Documents/PRISM_R/annual",
                                                         id = "siteiid",
                                                         x = "utmeasting",
                                                         y = "utmnorthing",
                                                         EPSG = "EPSG:32611")
head(CA792_clim)
## siteiid ppt tmean tmin tmax tdmean vpdmin vpdmax
## 1 1093605 920.948 10.57 4.8099 16.3299 -0.35 3.81 14.71
## 2 1093606 932.602 10.01 5.3700 14.6500 -0.06 3.99 12.56
## 3 1093607 948.341 9.82 4.7699 14.8700 -0.65 3.86 13.17
## 4 1093608 1080.049 9.34 4.5199 14.1600 -0.84 3.78 12.39
## 5 1093609 1309.006 7.54 1.9600 13.1300 -3.29 3.29 12.26
## 6 1093611 1309.006 7.54 1.9600 13.1300 -3.29 3.29 12.26