Differential abundance

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • Which features vary in abundance with categorical variables?

Objectives
  • use ALDEx2 and ANCOM to perform differential abundance analysis.

Load data

We will use the core FeatureTable object we saved earlier for this task.

load("data/pond_core_featuretable.Rdata")

Differential abundance

In addition to wondering which ASVs vary in abundance with continuous variables (e.g. salinity, dissolved oxygen), you probably also want to know which ASVs vary in abundance with categorical variables (e.g. Month, Fraction). For differential abundance testing, we’ll use ALDEx2 (ANOVA-Like Differential Expression) here. Another common method you’ll see in papers is ANCOM. Both are CoDA compatible, so long as your counts have been log-ratio transformed. ALDEx2 will do the transformation for us, so we can feed it an untransformed ASV table. Unfortunately, ALDEx2 doesn’t currently support testing more than two groups at once, so we’ll have to test some combinations by hand. Luckily, the feature table makes it really easy to select samples based on metadata.

# October by fraction
oct_one_ft  <-  pond_core_ft$keep_samples(Fraction == "1um" & Month ==
"October")
oct_two_ft <-pond_core_ft$keep_samples(Fraction == "0.2um" & Month ==
"October")
# November by fraction
nov_one_ft <- pond_core_ft$keep_samples(Fraction == "1um" & Month ==
"November")
nov_two_ft <- pond_core_ft$keep_samples(Fraction == "0.2um" & Month ==
"November")
# December by fraction
dec_one_ft <- pond_core_ft$keep_samples(Fraction == "1um" & Month ==
"December")
dec_two_ft <- pond_core_ft$keep_samples(Fraction == "0.2um" & Month ==
"December")
# Fraction
one_ft <- pond_core_ft$keep_samples(Fraction == "1um")
two_ft <- pond_core_ft$keep_samples(Fraction == "0.2um")
# October
oct_ft <- pond_core_ft$keep_samples(Month == "October")
# November
nov_ft <- pond_core_ft$keep_samples(Month == "November")
# December
dec_ft <- pond_core_ft$keep_samples(Month == "December")

Let’s start by comparing ASV abundances between the size fractions. We’ll stick the two fraction tables back together, make a conditions vector, and run ALDEx2.

# combine the ASV tables from two fraction feature tables
fraction <- rbind(one_ft$data, two_ft$data)
# make a conditions vector so aldex knows which samples belong in each
# category (there are 12 1 μm samples and 12 0.2 μm samples)
conds_fraction <- c(rep("1um", 12), rep("0.2um", 12))
# run aldex
aldex_fraction <- aldex(t(fraction), # aldex wants samples to be columns
conds_fraction,
test = "t", # use a t-test for diff. abundance
effect = TRUE) # calculate ASV effect size

If you look at the aldex_fraction table, you’ll notice the columns have sort of weird names. Here’s what they mean:

# plot the results
par(mfrow=c(1,2)) # a graphical parameter that sets up the following plots to line up on one row and in two columns
aldex.plot(aldex_fraction, type = "MA") # Bland-Altman style plot
aldex.plot(aldex_fraction, type = "MW") # Effect plot

The points on the plots can be interpreted in the same way. Each dot is an ASV. Red dots are ASVs with significantly different abundances in the two groups. Gray dots are ASVs that are highly abundant, but not significant. Black dots are rare ASVs that are not significant. The first plot (type = "MA") is a Bland-Altman or Tukey Mean Difference plot (different names are used in different fields). The x-axis is the centered log-ratio abundance of the ASVs and the y-axis is the median difference in abundance of the ASV in each size fraction. ASVs near the top of the plot are more abundant in the 1μm fraction and ASVs nearer the bottom are more abundant in the 0.2μm fraction. The second plot (type = "MW") is an effect plot. It has the same y-axis, but the x-axis is the “dispersion” of each ASV. The dispersion is really the estimated pooled standard deviation for each ASV. The gray dotted lines represent an estimated effect size of  1. Effect size is a measure of the strength of a relationship. The effect size metric used in ALDEx2 is more robust than the p-values, so the authors recommend examining ASVs based on effect size rather than p-value. Specifically, they recommend examining ASVs with an effect size of ≥1 or ≤-1 to avoid analyzing false positives. You can make a volcano plot of the data by plotting dif.btw (median difference between groups) on the x-axis and p-value on the y-axis. I’m not really a fan of volcano plots, but they are common (especially with ANCOM), so I’ll show you how to make one.

aldex_fraction %>%
ggplot()+
geom_point(aes(x = diff.btw,
y = we.eBH,
color = ifelse(effect >= 1 | effect <= -1, "Effect size ≥ 1 or ≤ -1", "Effect size  1 and ≥ -1"))) +
geom_hline(yintercept = 0.05,
color = "gray70") +
scale_color_manual(values = c("black", "red")) +
labs(color = "Effect size",
title = "Volcano plot",
x = "Median difference between groups",
y = "Expected BH adjusted p-value") +
theme_bw()

Again, each point represents one ASV. ASVs to the left of x = 0 are more abundant in the 0.2μm fraction and ASVs to the right are more abundant in the 1μm fraction. The gray line indicates a p-value of 0.05, so any ASVs below that line returned a significant p-value. ASVs are colored based on effect size, with red points indicating ASVs with an effect size of ≥ 1 or ≤ -1. As you can see, some of the ASVs that have significant p-values but small effect sizes are less likely to be of biological interest. If you wanted to identify the ASVs in red in the Volcano plot, you can subset the ALDEx2 table like this:

fraction_effective_asvs <- subset(aldex_fraction, effect >= 1 | effect
<= -1)
dim(fraction_effective_asvs) # displays the table dimensions

Looking at the table dimensions, we can see that there are 80 ASVs with an effect size

= 1 or <= -1 that would warrant further consideration. As practice, compare ASVs between months and between size fractions within months. For at least one comparison, construct the plots and do the subsetting. For Month, you will have to test three combinations: October-November, October-December, and November-December.

Key Points

  • Use clr transformation to transform the data