Non-metric dimensional scaling (NMDS) analysis of wild bee phenology

The goal of this analysis was to visualize the phenology of wild-bee communities in Maryland, Delaware, and DC. Previously, I plotted the proportional abundance over time for the most abundant genera to separate bees into spring (before May 15th) and summer/fall (after May 15th) communities. Here, I wanted to verify that May 15th is a decent split point and show why I chose to analyze only two 'seasons.'

To accomplish this, I used NMDS to visualize the similarities in bee community composition according to sampling date. We'd expect that samples close together in time will be more similar, but how similar? Do specific months or seasons cluster together? When are the major shifts in composition of wild-bee communities?

For all these analyses, I used the proportional abundance of each wild-bee species. I selected proporational abundance rather than raw abundance because there was not equal sampling effort across the dataset.

Also note, that the original data were sampled over an 11 year period (2003-2013), but here I'm taking only the day and month component of the sampling date, pretending that all bees were sampled in the same year. This approach ignores phenological differences between years (year with early spring vs. late spring), which could explain some of the variation within months.

Load necessary R libraries and bee observation dataset

library(reshape2); library(dplyr); library(data.table); library(lubridate); library(tidyr)

mdde <- read.csv('./data/Droege_MDDE_cleaned.csv')

Remove observations of honey bees (Apis mellifera L.)

mdde <- dplyr::filter(mdde, name != 'Apis mellifera')

Make new 'date' variable that assigns all sampling dates to the same year

I'm using the midpoint between the beginning and the end of the sampling period, represented by the day of year.

mdde$dateplot <- as.Date(mdde$mid_DOY, origin="2017-12-31")

Trim phenology and calculate proportional abundance per species per week

I trimmed off the very beginning and end of the season because the proportional abundance values are very strange, because total number of bees caught was so low. I summarized proportional abundance per week for a similar reason. Some individual samples were only a few bees, but aggregating samples captured in the same week minimizes this issue.

Rather than summing by week, a more robust way to do this would be to model abundance over time with a smoothing function, like a generalized additive model (GAM). But, for now, I'll use the week aggregate abundance.

toplot <- dplyr::select(mdde, SPECIMEN, name, Genus, Abundance, week2, dateplot) %>%
  dplyr::filter(dateplot > '2018-03-13' & dateplot < '2018-11-09') %>%
  dplyr::group_by(Genus, name, week2) %>%
  dplyr::mutate(SpeciesWeekAbund = sum(Abundance)) %>%
  dplyr::ungroup() %>%
  dplyr::group_by(week2) %>%
  dplyr::mutate(WeekAbund = sum(Abundance), PropAbund = SpeciesWeekAbund/WeekAbund) %>%
  dplyr::ungroup() %>%
  dplyr::group_by(Genus, name, week2, dateplot) %>%
  summarise(PropAbund = max(PropAbund), WeekAbund = max(WeekAbund))

head(toplot)
## # A tibble: 6 x 6
## # Groups:   Genus, name, week2 [5]
##   Genus       name                 week2 dateplot   PropAbund WeekAbund
##   <chr>       <chr>                <int> <date>         <dbl>     <int>
## 1 Agapostemon Agapostemon sericeus    13 2018-04-01  0.00113       2655
## 2 Agapostemon Agapostemon sericeus    15 2018-04-13  0.000160      6243
## 3 Agapostemon Agapostemon sericeus    16 2018-04-16  0.000204      4903
## 4 Agapostemon Agapostemon sericeus    17 2018-04-27  0.000201      4973
## 5 Agapostemon Agapostemon sericeus    19 2018-05-07  0.000702      4276
## 6 Agapostemon Agapostemon sericeus    19 2018-05-10  0.000702      4276

Make wide version of dataset in correct format for metaMDS function (vegan package)

I filled missing data with zeros, assuming, if a species was not present, that it was a true absence. There are very cool models that account for detection probability (aka what is the probability that an undetected species actually not there?), but that's beyond our scope (plus I don't really know much about the nuts and bolts of those analyses)

I assigned the wide dataframe rownmaes to be the date variable for visualization purposes.

wide_reverse <- dplyr::ungroup(toplot) %>%
  dplyr::select(-week2, -Genus, -WeekAbund) %>%
  tidyr::pivot_wider(names_from=name,  values_from = PropAbund)

wide_reverse[is.na(wide_reverse)] <- 0

dates <- wide_reverse$dateplot
wide_reverse <- wide_reverse[, !names(wide_reverse) %in% c('dateplot', 'Genus')]
rownames(wide_reverse) <- paste(as.character(lubridate::month(dates, label=T)), 
                                 as.character(lubridate::mday(dates)), sep=".")

Run NMDS ordination and look at ordination plots created with metaMDS

My ordination solution is not converging after 500 iterations, but the stress value is OK, so I'm going to forge ahead. I tried both with and without transformations and didn't see very much difference, so I chose not to transform the data.

ord <- vegan::metaMDS(wide_reverse, distance='bray', trymax=50, autotransform = F, halfchange=T)
ord

#plot NMDS results, all habitat types
vegan::ordiplot(ord, display="sites", type="text")

vegan::ordiplot(ord, display="species", type="text")

Yikes. Maybe there is some separation by date, but we can't see much of anything, especially in the second figure. We need a more informative, prettier version.

Try amp_ordinate (ampvis2 package) to make a prettier ordination plot

This package was written for molecular data, so requires some variable names that don't make sense. But, with the data in the right format, it's very handy to make prettier ordination plots.

The amp_ordinate function requires integer data, so I multiplied my proportional abundance data by 100,000. This code chunk also reorganizes the data into the species (rows) x sample (column) format we need. The last weird quirk is that this function requires kingdom, phylum, class, order, and family metadata for each species. I made up fake data for these columns.

library(ampvis2); library(ggplot2)

#get species data in correct format
speciesdate <- dplyr::mutate(toplot, rounded_abund = round(PropAbund*100000),
                             Genus_species = as.factor(name),
                             Date = as.factor(dateplot)) %>%
  dplyr::ungroup() %>%
  dplyr::select(-week2, -WeekAbund, -dateplot, -PropAbund) %>%
  tidyr::spread(key=Date, value=rounded_abund) %>%
  replace(is.na(.), 0) %>%
  dplyr::mutate(Species=Genus_species, OTU=Genus_species) %>%
  dplyr::select(-name, -Genus_species) %>%
  dplyr::mutate(Kingdom = 'Animals', Phylum = "A", Class= "B", Order="C", Family='bees') %>%
  dplyr::select(Kingdom, Phylum, Class, Order, Family, Genus, Species, OTU, dplyr::everything())

#reorganize columns to be in more logical order
a <- speciesdate[,1:7]
b <- speciesdate[,8:length(speciesdate)]

speciesdate <- cbind(b,a)

Make meta-data data frame for amp_ordinate function

In our case, metadata is just the sampling date and a variable indicating the sampling month. I used the sampling month to draw polygons around sampling dates in the same month.

md <- dplyr::ungroup(toplot) %>%
  dplyr::select(dateplot) %>%
  dplyr::filter(!duplicated(dateplot)) %>%
  dplyr::mutate(month = lubridate::month(dateplot, label=T)) %>%
  dplyr::rename(SampleID=dateplot)

Run amp_ordinate function

This function has lots of options to change the appearance of the output. Here's I played with the option to draw polygons around groups of points, label and colorcode the polgyons.

amp_ordinate also accepts arguements fom metaMDS, like 'trymax' which sets the maximum of iterations to try if the ordination does not converge.

ampdata <- amp_load(otutable = speciesdate, metadata = md)

t <- amp_ordinate(data=ampdata, type='NMDS', distmeasure='bray', transform='none',filter_species = 0,
                  species_plot = F, species_label_taxonomy = 'OTU',
                  sample_colorframe = 'month',
                  sample_colorframe_label="month",
                  sample_colorframe_label_size=5,
                  sample_color_by='month', #sample_point_size=4,
                  species_nlabels = 0, print_caption = T,
                  detailed_output = T,
                  trymax=250)

Plot ordination results showing month groupings

amp_ordinate results can be customized with standard ggplot2 code. Here, I used a custom color palette. I generate them with Colorgorical (colorgorical.com), it's quite handy to make palettes with more than 4-5 colors.

colors <- c("#208eb7", "#fb0998", "#298837", "#732576", "#2580fe", "#474556", "#b07fbd", "#4115d0", "#89975b")

t$plot + scale_color_manual(values= colors) + 
  scale_fill_manual(values= colors) + 
  guides(color=guide_legend(ncol=1)) +
  theme_classic(base_size=16)

#save plot
ggplot2::ggsave(filename='./figures/BeeComm_NMDS_byMonth.svg', width=6.5, height=5)

Run amp_ordinate function to show some species

amp_ordinate also allows the user to label a specified number of the most important species. Here, I chose 20, solely based on ability to read the labels. The only downside to this option is that amp_ordinate is recalculating the ordination results, so the polygon shapes don't exactly match the previous figure.

tlab <- amp_ordinate(data=ampdata, type='NMDS', distmeasure='bray', transform='none', 
                     filter_species = 0,
                     species_plot = T, species_label_taxonomy = 'OTU',
                     sample_colorframe = 'month',
                     #sample_colorframe_label="month",
                     sample_color_by='month',
                     species_nlabels = 18, 
                     species_label_size = 3,
                     print_caption = F,
                     detailed_output = T,
                     trymax=250)

tlab$plot + scale_color_manual(values= colors) + 
  scale_fill_manual(values= colors) + 
  guides(color=guide_legend(ncol=1)) +
  theme_classic(base_size=16)

#save plot
ggplot2::ggsave(filename='./figures/BeeComm_NMDS_byMonth_SpeciesNames.svg', width=6.5, height=5)

Use amp_ordinate to make cool interactive ordination plot

I was also interested in looking at the specific dates in May because some are close to Mar/April and some more similar to the summer months. I couldn't figure out any way to just label the dates in May, but amp_ordinate does have a very cool option to generate an interactive plot. Then, the user can hover over any point and see the associated meta-data, which for our datset includes the sampling date.

tlabdate <- amp_ordinate(data=ampdata, type='NMDS', distmeasure='bray', transform='none', 
                     filter_species = 0,
                     species_plot = F,
                     sample_colorframe = 'month',
                     sample_color_by='month',
                     sample_plotly = T,
                     print_caption = F,
                     detailed_output = T,
                     trymax=250)

tlabdate