The goal of this analysis was to visualize the phenology of wild-bee communities in Maryland, Delaware, and DC. Previously, I plotted the proportional abundance over time for the most abundant genera to separate bees into spring (before May 15th) and summer/fall (after May 15th) communities. Here, I wanted to verify that May 15th is a decent split point and show why I chose to analyze only two 'seasons.'
To accomplish this, I used NMDS to visualize the similarities in bee community composition according to sampling date. We'd expect that samples close together in time will be more similar, but how similar? Do specific months or seasons cluster together? When are the major shifts in composition of wild-bee communities?
For all these analyses, I used the proportional abundance of each wild-bee species. I selected proporational abundance rather than raw abundance because there was not equal sampling effort across the dataset.
Also note, that the original data were sampled over an 11 year period (2003-2013), but here I'm taking only the day and month component of the sampling date, pretending that all bees were sampled in the same year. This approach ignores phenological differences between years (year with early spring vs. late spring), which could explain some of the variation within months.
library(reshape2); library(dplyr); library(data.table); library(lubridate); library(tidyr)
mdde <- read.csv('./data/Droege_MDDE_cleaned.csv')
mdde <- dplyr::filter(mdde, name != 'Apis mellifera')
I'm using the midpoint between the beginning and the end of the sampling period, represented by the day of year.
mdde$dateplot <- as.Date(mdde$mid_DOY, origin="2017-12-31")
I trimmed off the very beginning and end of the season because the proportional abundance values are very strange, because total number of bees caught was so low. I summarized proportional abundance per week for a similar reason. Some individual samples were only a few bees, but aggregating samples captured in the same week minimizes this issue.
Rather than summing by week, a more robust way to do this would be to model abundance over time with a smoothing function, like a generalized additive model (GAM). But, for now, I'll use the week aggregate abundance.
toplot <- dplyr::select(mdde, SPECIMEN, name, Genus, Abundance, week2, dateplot) %>%
dplyr::filter(dateplot > '2018-03-13' & dateplot < '2018-11-09') %>%
dplyr::group_by(Genus, name, week2) %>%
dplyr::mutate(SpeciesWeekAbund = sum(Abundance)) %>%
dplyr::ungroup() %>%
dplyr::group_by(week2) %>%
dplyr::mutate(WeekAbund = sum(Abundance), PropAbund = SpeciesWeekAbund/WeekAbund) %>%
dplyr::ungroup() %>%
dplyr::group_by(Genus, name, week2, dateplot) %>%
summarise(PropAbund = max(PropAbund), WeekAbund = max(WeekAbund))
head(toplot)
## # A tibble: 6 x 6
## # Groups: Genus, name, week2 [5]
## Genus name week2 dateplot PropAbund WeekAbund
## <chr> <chr> <int> <date> <dbl> <int>
## 1 Agapostemon Agapostemon sericeus 13 2018-04-01 0.00113 2655
## 2 Agapostemon Agapostemon sericeus 15 2018-04-13 0.000160 6243
## 3 Agapostemon Agapostemon sericeus 16 2018-04-16 0.000204 4903
## 4 Agapostemon Agapostemon sericeus 17 2018-04-27 0.000201 4973
## 5 Agapostemon Agapostemon sericeus 19 2018-05-07 0.000702 4276
## 6 Agapostemon Agapostemon sericeus 19 2018-05-10 0.000702 4276
I filled missing data with zeros, assuming, if a species was not present, that it was a true absence. There are very cool models that account for detection probability (aka what is the probability that an undetected species actually not there?), but that's beyond our scope (plus I don't really know much about the nuts and bolts of those analyses)
I assigned the wide dataframe rownmaes to be the date variable for visualization purposes.
wide_reverse <- dplyr::ungroup(toplot) %>%
dplyr::select(-week2, -Genus, -WeekAbund) %>%
tidyr::pivot_wider(names_from=name, values_from = PropAbund)
wide_reverse[is.na(wide_reverse)] <- 0
dates <- wide_reverse$dateplot
wide_reverse <- wide_reverse[, !names(wide_reverse) %in% c('dateplot', 'Genus')]
rownames(wide_reverse) <- paste(as.character(lubridate::month(dates, label=T)),
as.character(lubridate::mday(dates)), sep=".")
My ordination solution is not converging after 500 iterations, but the stress value is OK, so I'm going to forge ahead. I tried both with and without transformations and didn't see very much difference, so I chose not to transform the data.
ord <- vegan::metaMDS(wide_reverse, distance='bray', trymax=50, autotransform = F, halfchange=T)
ord
#plot NMDS results, all habitat types
vegan::ordiplot(ord, display="sites", type="text")
vegan::ordiplot(ord, display="species", type="text")
Yikes. Maybe there is some separation by date, but we can't see much of anything, especially in the second figure. We need a more informative, prettier version.
This package was written for molecular data, so requires some variable names that don't make sense. But, with the data in the right format, it's very handy to make prettier ordination plots.
The amp_ordinate function requires integer data, so I multiplied my proportional abundance data by 100,000. This code chunk also reorganizes the data into the species (rows) x sample (column) format we need. The last weird quirk is that this function requires kingdom, phylum, class, order, and family metadata for each species. I made up fake data for these columns.
library(ampvis2); library(ggplot2)
#get species data in correct format
speciesdate <- dplyr::mutate(toplot, rounded_abund = round(PropAbund*100000),
Genus_species = as.factor(name),
Date = as.factor(dateplot)) %>%
dplyr::ungroup() %>%
dplyr::select(-week2, -WeekAbund, -dateplot, -PropAbund) %>%
tidyr::spread(key=Date, value=rounded_abund) %>%
replace(is.na(.), 0) %>%
dplyr::mutate(Species=Genus_species, OTU=Genus_species) %>%
dplyr::select(-name, -Genus_species) %>%
dplyr::mutate(Kingdom = 'Animals', Phylum = "A", Class= "B", Order="C", Family='bees') %>%
dplyr::select(Kingdom, Phylum, Class, Order, Family, Genus, Species, OTU, dplyr::everything())
#reorganize columns to be in more logical order
a <- speciesdate[,1:7]
b <- speciesdate[,8:length(speciesdate)]
speciesdate <- cbind(b,a)
In our case, metadata is just the sampling date and a variable indicating the sampling month. I used the sampling month to draw polygons around sampling dates in the same month.
md <- dplyr::ungroup(toplot) %>%
dplyr::select(dateplot) %>%
dplyr::filter(!duplicated(dateplot)) %>%
dplyr::mutate(month = lubridate::month(dateplot, label=T)) %>%
dplyr::rename(SampleID=dateplot)
This function has lots of options to change the appearance of the output. Here's I played with the option to draw polygons around groups of points, label and colorcode the polgyons.
amp_ordinate also accepts arguements fom metaMDS, like 'trymax' which sets the maximum of iterations to try if the ordination does not converge.
ampdata <- amp_load(otutable = speciesdate, metadata = md)
t <- amp_ordinate(data=ampdata, type='NMDS', distmeasure='bray', transform='none',filter_species = 0,
species_plot = F, species_label_taxonomy = 'OTU',
sample_colorframe = 'month',
sample_colorframe_label="month",
sample_colorframe_label_size=5,
sample_color_by='month', #sample_point_size=4,
species_nlabels = 0, print_caption = T,
detailed_output = T,
trymax=250)
amp_ordinate results can be customized with standard ggplot2 code. Here, I used a custom color palette. I generate them with Colorgorical (colorgorical.com), it's quite handy to make palettes with more than 4-5 colors.
colors <- c("#208eb7", "#fb0998", "#298837", "#732576", "#2580fe", "#474556", "#b07fbd", "#4115d0", "#89975b")
t$plot + scale_color_manual(values= colors) +
scale_fill_manual(values= colors) +
guides(color=guide_legend(ncol=1)) +
theme_classic(base_size=16)
#save plot
ggplot2::ggsave(filename='./figures/BeeComm_NMDS_byMonth.svg', width=6.5, height=5)
amp_ordinate also allows the user to label a specified number of the most important species. Here, I chose 20, solely based on ability to read the labels. The only downside to this option is that amp_ordinate is recalculating the ordination results, so the polygon shapes don't exactly match the previous figure.
tlab <- amp_ordinate(data=ampdata, type='NMDS', distmeasure='bray', transform='none',
filter_species = 0,
species_plot = T, species_label_taxonomy = 'OTU',
sample_colorframe = 'month',
#sample_colorframe_label="month",
sample_color_by='month',
species_nlabels = 18,
species_label_size = 3,
print_caption = F,
detailed_output = T,
trymax=250)
tlab$plot + scale_color_manual(values= colors) +
scale_fill_manual(values= colors) +
guides(color=guide_legend(ncol=1)) +
theme_classic(base_size=16)
#save plot
ggplot2::ggsave(filename='./figures/BeeComm_NMDS_byMonth_SpeciesNames.svg', width=6.5, height=5)
I was also interested in looking at the specific dates in May because some are close to Mar/April and some more similar to the summer months. I couldn't figure out any way to just label the dates in May, but amp_ordinate does have a very cool option to generate an interactive plot. Then, the user can hover over any point and see the associated meta-data, which for our datset includes the sampling date.
tlabdate <- amp_ordinate(data=ampdata, type='NMDS', distmeasure='bray', transform='none',
filter_species = 0,
species_plot = F,
sample_colorframe = 'month',
sample_color_by='month',
sample_plotly = T,
print_caption = F,
detailed_output = T,
trymax=250)
tlabdate