Compiled: 07-03-2024

Source: vignettes/basic_tutorial.Rmd

Data-driven models are most useful when they are generalizable across different datasets. Thinking about co-expression networks from this perspective, we need a way to quantify and test the conservation of co-expression patterns identified in one dataset across external datasets. In this tutorial, we demonstrate module preservation analysis with hdWGCNA, a statistical framewoerk that allows us to test the degree to which co-expression modules identified in one dataset are “preserved” in another dataset. This tutorial builds off of a previous tutorial, where we projected the co-expression modules from a reference dataset to a query dataset.

First we must load the data and the required libraries:

# single-cell analysis package
library(Seurat)

# plotting and data science packages
library(tidyverse)
library(cowplot)
library(patchwork)

# co-expression network analysis packages:
library(WGCNA)
library(hdWGCNA)

# network analysis & visualization package:
library(igraph)

# using the cowplot theme for ggplot
theme_set(theme_cowplot())

# set random seed for reproducibility
set.seed(12345)

# load the Zhou et al snRNA-seq dataset
seurat_ref <- readRDS('data/Zhou_control.rds')

# load the Morabito & Miyoshi 2021 snRNA-seq dataset
seurat_query <- readRDS(file=paste0(data_dir, 'Morabito_Miyoshi_2021_control.rds'))

Module preservation analysis

In their 2011 paper titled “Is My Network Module Preserved and Reproducible?”, Langfelder et al discuss statistical methods for module preservation analysis in co-expression network analysis. Notably, module preservation analysis can be used to assess the reproducibility of co-expression networks, but it can also be used for specific biological analyses, for example to identify which modules are significantly preserved across different disease conditions, tissues, developmental stages, or even evolutionary time.

The first step in module preservation analysis is to project modules from a reference to a query dataset, which is explained in more detail in the previous tutorial.

seurat_query <- ProjectModules(
  seurat_obj = seurat_query,
  seurat_ref = seurat_ref,
  wgcna_name = "tutorial",
  wgcna_name_proj="projected",
  assay="RNA" # assay for query dataset
)

Next, we set up the expression matrices for the query and reference datasets. Either the single-cell or the metacell gene expression matrices can be used here by setting the use_metacells flag in SetDatExpr. In general we recommend using the same matrix that was used to construct the network for the reference (in this case, the metacell matrix). If you have not already constructed metacells in the query dataset, you can do that prior to module preservation analysis.

See code to consruct metacells in the query dataset
# construct metacells:
seurat_query <- MetacellsByGroups(
  seurat_obj = seurat_query,
  group.by = c("cell_type", "Sample"),
  k = 25,
  max_shared = 12,
  reduction = 'harmony',
  ident.group = 'cell_type'
)
seurat_query <- NormalizeMetacells(seurat_query)
# set expression matrix for reference dataset
seurat_ref <- SetDatExpr(
  seurat_ref,
  group_name = "INH",
  group.by = "cell_type"
)

# set expression matrix for query dataset:
seurat_query <- SetDatExpr(
  seurat_query,
  group_name = "INH",
  group.by = "cell_type"
)

Now we can run the ModulePreservation function. Note that this function does take a while to run since it is a permutation test, but it can be sped up by lowering the n_permutations parameter, but we do not recommend any lower than 100 permutations. Also note that the parallel option currently does not work. For demonstration purposes, we ran the code for this tutorial using only 20 permutations so it would run quickly.

ModulePreservation takes a name parameter, which is used to store the results of this module perturbation test.

# run module preservation function
seurat_query <- ModulePreservation(
  seurat_query,
  seurat_ref = seurat_ref,
  name="Zhou-INH",
  verbose=3,
  n_permutations=250 # n_permutations=20 used for the tutorial
)

Visualize preservation stats

The module perturbation test produces a number of different preservation statistics, which you can inspect in the following tables.

# get the module preservation table
mod_pres <- GetModulePreservation(seurat_query, "Zhou-INH")$Z
obs_df <- GetModulePreservation(seurat_query, "Zhou-INH")$obs

In the past we linked to the original WGCNA documentation website for more detailed explanations of thes statistics, but unfortunately at the time of writing this tutorial the website has been taken offline. Please refer to the original publication in PLOS Computational Biology for further explanations.

hdWGCNA includes the function PlotModulePreservation to visualize the statistics we computed in the previous section. This function generates a scatter plot showing the module size versus the module preservation stats. The summary statistics aggregate the individual preservation and quality statistics into a single metric so we generally recommend primarily interpreting the results with these plots.

plot_list <- PlotModulePreservation(
  seurat_query,
  name="Zhou-INH",
  statistics = "summary"
)

wrap_plots(plot_list, ncol=2)

Here we can see three different shades in the background. These represent cutoff values for interpreting the Z summary statistics. The following guidelines help us understand the preservation of these modules.

  • Z > 2, the module is not preserved in the query dataset.
  • 10 > Z > 2, the module is moderately preserved in the query dataset.
  • Z > 10, the module is highly preserved in the query dataset.

The following code shows how to plot additional module preservation stats.

# plot ranking stats
plot_list <- PlotModulePreservation(
  seurat_query,
  name="Zhou-INH",
  statistics = "rank"
)


wrap_plots(plot_list, ncol=2)

Plot all of the different stats:

plot_list <- PlotModulePreservation(
  seurat_query,
  name="Zhou-INH",
  statistics = "all",
  plot_labels=FALSE
)

wrap_plots(plot_list, ncol=6)