Compiled: 26-02-2024

Source: vignettes/ST_basics.Rmd

This tutorial covers the basics of using hdWGCNA to perform co-expression network analysis on spot-based spatial transcriptomics (ST) data like 10X Genomics Visium. There are many different ST protocols, and datasets arising from these protocols tend to have different properties. For example, some ST protocols have single-cell resolution but only profile a subset of the transcriptome. On the other hand, some ST protocols like Visium and Slide-Seq (Curio Bioscience) profile the whole transcriptome. Aside from hdWGCNA, in general datasets from different ST protocols may require different data analysis steps.

In this tutorial, we demonstrate hdWGCNA using two ST datasets from different technologies. First, we use hdWGCNA to analyze a Visium ST dataset containing an anterior and posterior saggital section from the mouse brain. Next, we use hdWGCNA to analyze a Curio Seeker (Slide-Seq) dataset of the mouse hippocampus. Overall this tutorial is similar to the hdWGCNA in single-cell data tutorial, and we recommend exploring that tutorial prior to this tutorial.

Load required libraries

First we will load the required R libraries for this tutorial.

# single-cell analysis package
library(Seurat)

# package to install the mouse brain dataset
library(SeuratData)

# plotting and data science packages
library(tidyverse)
library(cowplot)
library(patchwork)

# co-expression network analysis packages:
library(WGCNA)
library(hdWGCNA)

# enable parallel processing for network analysis (optional)
enableWGCNAThreads(nThreads = 8)

# using the cowplot theme for ggplot
theme_set(theme_cowplot())

# set random seed for reproducibility
set.seed(12345)

10X Visium Dataset

In this section we demonstrate hdWGCNA in ST datasets using a publicly available Visium dataset of the mouse brain.

Download data and perform clustering analysis

Here we use SeuratData to download the mouse brain ST dataset, and we will process the dataset using Seurat.

Note on SeuratData download

In our own testing, we had difficulty running InstallData on our institution’s compute cluster, so we ran these commands locally and then copied the .rds file containing the Seurat object to the cluster for subsequent analysis.

# download the mouse brain ST dataset (stxBrain)
SeuratData::InstallData("stxBrain")

# load the anterior and posterior samples
brain <- LoadData("stxBrain", type = "anterior1")
brain$region <- 'anterior'
brain2 <- LoadData("stxBrain", type = "posterior1")
brain2$region <- 'posterior'

# merge into one seurat object
seurat_obj <- merge(brain, brain2)
seurat_obj$region <- factor(as.character(seurat_obj$region), levels=c('anterior', 'posterior'))

# save unprocessed object
saveRDS(seurat_obj, file='mouse_brain_ST_unprocessed.rds')

hdWGCNA requires the spatial coordinates to be stored in the seurat_obj@meta.data slot. Here we extract the image coordinates for the two samples, merge into a dataframe, and add it into the seurat_obj@meta.data. Specifically, the seurat_obj@meta.data must have columns named row, col, imagerow, and imagecol (shown below), otherwise the downstream steps will not work.

# make a dataframe containing the image coordinates for each sample
image_df <- do.call(rbind, lapply(names(seurat_obj@images), function(x){
  seurat_obj@images[[x]]@coordinates
}))

# merge the image_df with the Seurat metadata
new_meta <- merge(seurat_obj@meta.data, image_df, by='row.names')

# fix the row ordering to match the original seurat object
rownames(new_meta) <- new_meta$Row.names
ix <- match(as.character(colnames(seurat_obj)), as.character(rownames(new_meta)))
new_meta <- new_meta[ix,]

# add the new metadata to the seurat object
seurat_obj@meta.data <- new_meta

head(image_df)
    tissue row col imagerow imagecol
AAACAAGTATCTCCCA-1_1      1  50 102     7475     8501
AAACACCAATAACTGC-1_1      1  59  19     8553     2788
AAACAGAGCGACTCCT-1_1      1  14  94     3164     7950
AAACAGCTTTCAGAAG-1_1      1  43   9     6637     2099
AAACAGGGTCTATATT-1_1      1  47  13     7116     2375
AAACATGGTGAGAGGA-1_1      1  62   0     8913     1480

Now we perform clustering analysis using Seurat.

# normalization, feature selection, scaling, and PCA
seurat_obj <- seurat_obj %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA()

# Louvain clustering and umap
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:30)
seurat_obj <- FindClusters(seurat_obj,verbose = TRUE)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

# set factor level for anterior / posterior
seurat_mouse_vis$region <- factor(as.character(seurat_mouse_vis$region), levels=c('anterior', 'posterior'))

# show the UMAP
p1 <- DimPlot(seurat_obj, label=TRUE, reduction = "umap", group.by = "seurat_clusters") + NoLegend()
p1
p2 <- SpatialDimPlot(seurat_obj, label = TRUE, label.size = 3)
p2
We used the following cluster labels for this analysis.
See cluster labels
seurat_clusters annotation
0 Caudoputamen
1 White matter
2 Cortex L5
3 White matter
4 Cortex L6
5 Medulla
6 Pons
7 Cortex L2/3
8 Cerebellum molecular layer
9 Hippocampus
10 Cortex L1, Vasculature
11 Olfactory bulb outer
12 Cerebellum arbor vitae
13 Thalamus
14 Nucleus accumbens
15 Piriform area
16 Hypothalamums
17 Fiber tracts
18 Olfactory bulb inner
19 Cerebellum granular layer
20 Ventricles
21 Olfactory bulb fiber tracts
# add annotations to Seurat object
annotations <- read.csv('annotations.csv')
ix <- match(seurat_obj$seurat_clusters, annotations$seurat_clusters)
seurat_obj$annotation <- annotations$annotation[ix]

# set idents
Idents(seurat_obj) <- seurat_obj$annotation

p3 <- SpatialDimPlot(seurat_obj, label = TRUE, label.size = 3)
p3 + NoLegend()

Construct metaspots with hdWGCNA

Visium ST generates sparse gene expression profiles in each spot, thus introducing the same potential pitfalls as single-cell data for co-expression network analysis. To alleviate these issues, hdWGCNA includes a data aggregation approach to produce spatial metaspots, similar to our metacell algorithm. This approach aggregates neighboring spots based on spatial coordinates rather than their transcriptomes. This procedure is performed in hdWGCNA using the MetaspotsByGroups function.

Here we set up the data for hdWGCNA and run MetaspotsByGroups. Similar to MetacellsByGroups, the group.by parameter slices the Seurat object to construct metaspots separately for each group. Here we are just grouping by the ST slides to perform this step separately for the anterior and posterior sample, but you could specify cluster or anatomical regions as well to suit your analysis.

seurat_obj <- SetupForWGCNA(
  seurat_obj,
  gene_select = "fraction",
  fraction = 0.05,
  wgcna_name = "vis"
)

seurat_obj <- MetaspotsByGroups(
  seurat_obj,
  group.by = c("region"),
  ident.group = "region",
  assay = 'Spatial'
)
seurat_obj  <- NormalizeMetacells(seurat_obj)

The metaspot object is used everywhere that the metacell object would be used for downstream analysis. For example, to extract the metaspot object, you can run the GetMetacellObject function.

m_obj <- GetMetacellObject(seurat_obj)
m_obj
An object of class Seurat
31053 features across 1505 samples within 1 assay
Active assay: Spatial (31053 features, 0 variable features)

Co-expression network analysis

Now we are ready to perform co-expression network analysis using an identical pipeline to the single-cell workflow. For this analysis, we are performing brain-wide network analysis using all spots from all regions, but this analysis could be adjusted to perform network analysis on specific regions.

# set up the expression matrix, set group.by and group_name to NULL to include all spots
seurat_obj  <- SetDatExpr(
  seurat_obj,
  group.by=NULL,
  group_name = NULL
)

# test different soft power thresholds
seurat_obj <- TestSoftPowers(seurat_obj)
plot_list <- PlotSoftPowers(seurat_obj)

wrap_plots(plot_list, ncol=2)
# construct co-expression network:
seurat_obj <- ConstructNetwork(
  seurat_obj,
  tom_name='test',
  overwrite_tom=TRUE
)

# plot the dendrogram
PlotDendrogram(seurat_obj, main='Spatial hdWGCNA dendrogram')

Next, we compute module eigengenes (MEs) and eigengene-based connectivities (kMEs) using the ModuleEigengenes and ModuleConnectivity functions respectively.

seurat_obj <- ModuleEigengenes(seurat_obj)
seurat_obj <- ModuleConnectivity(seurat_obj)

Here we reset the module names with the prefix “SM” (spatial modules). This step is optional.

seurat_obj <- ResetModuleNames(
  seurat_obj,
  new_name = "SM"
)

Data visualization

Here we visualize module eigengenes using the Seurat functions DotPlot and SpatialFeaturePlot. For network visualization, please refer to the Network visualization tutorial.

# get module eigengenes and gene-module assignment tables
MEs <- GetMEs(seurat_obj)
modules <- GetModules(seurat_obj)
mods <- levels(modules$module); mods <- mods[mods != 'grey']

# add the MEs to the seurat metadata so we can plot it with Seurat functions
seurat_obj@meta.data <- cbind(seurat_obj@meta.data, MEs)

# plot with Seurat's DotPlot function
p <- DotPlot(seurat_obj, features=mods, group.by = 'annotation', dot.min=0.1)

# flip the x/y axes, rotate the axis labels, and change color scheme:
p <- p +
  coord_flip() +
  RotatedAxis() +
  scale_color_gradient2(high='red', mid='grey95', low='blue') +
  xlab('') + ylab('')

p

We can visualize the MEs directly on the spatial coordinates using SpatialFeaturePlot.

p <- SpatialFeaturePlot(
  seurat_obj,
  features = mods,
  alpha = c(0.1, 1),
  ncol = 8
)

p

Next we visualize the co-expression network using UMAP. Please refer to the network visualization tutorial for more details about visualizing the co-expression network.

# perform UMAP embedding on the co-expression network
seurat_obj <- RunModuleUMAP(
  seurat_obj,
  n_hubs = 5,
  n_neighbors=15,
  min_dist=0.3,
  spread=1
)

# make the network plot
ModuleUMAPPlot(
  seurat_obj,
  edge.alpha=0.5,
  sample_edges=TRUE,
  keep_grey_edges=FALSE,
  edge_prop=0.075, 
  label_hubs=5 
)

Curio Seeker Dataset

In this section we perform a similar analysis using a mouse hippocampus dataset from Curio Bioscience. This dataset is not publicly available, but we obtained this dataset directly from Curio by filling out this form on their website. Curio provided a fully processed Seurat object, so for this tutorial we are simply using the Seurat object and clustering that they provided.

Load dataset

First we load the Seurat object containing the mouse hippocampus dataset provided by Curio. We will also make a few plots to show the clustering in transcriptome space (UMAP reduction) and in the biological coordinates. Note that this dataset provides a “dim reduction” called “SPATIAL” which contains the 2D biological coordinates.

# load the Seurat object
seurat_curio <- readRDS(paste0(data_dir, 'curio_datasets/Mouse_hippocampus_v1pt1/Mouse_hippocampus_seurat.rds'))

# make a dimplot in transcriptome space
p1 <- DimPlot(seurat_curio, label=TRUE, reduction = "umap") +
  NoLegend() + umap_theme() + ggtitle('UMAP')

# make a dimplot in biological space
p2 <- DimPlot(seurat_curio, reduction='SPATIAL', pt.size=0.5) + 
  umap_theme() + ggtitle('Spatial')

p1 | p2

This plot is a little crowded so can make another DimPlot split by cluster to see each
cluster on its own.

p <- DimPlot(
  seurat_curio, 
  split.by='seurat_clusters',
  reduction = 'SPATIAL',
  ncol=6
) + NoLegend() + umap_theme() 

p

Construct metacells with hdWGCNA

While Visium ST uses a grid of spots to assign molecular barcodes in space, Curio Seeker uses small beads to capture transcriptome information at 10 microns. Importantly, these beads are not regularly spaced like in Visium. Therefore, the MetaspotsByGroups function that we used for the Visium dataset is not appropriate for Curio Seeker data. Instead, we will use MetacellsByGroups but we will specify the reduction as the spatial information, so cells that are close together in biological space will be aggregated into metacells.

# Change the default assay to "RNA", SCTransform is generally not recommended for hdWGCNA.
seurat_curio <- DefaultAssay(seurat_curio, 'RNA')

seurat_curio <- SetupForWGCNA(
  seurat_curio,
  gene_select = "fraction", 
  fraction = 0.01, 
  wgcna_name = "curio" 
)

# construct metacells  in each group
seurat_curio <- MetacellsByGroups(
  seurat_obj = seurat_curio,
  group.by = c("seurat_clusters"), 
  ident.group = 'seurat_clusters',
  reduction = 'SPATIAL',
  k = 50, 
  max_shared = 15, 
  assay = 'RNA'
)

# normalize metacell expression matrix:
seurat_curio <- NormalizeMetacells(seurat_curio)

We can calculate the average X and Y coordinates for each metacell and plot them to check if they are aggregated based on spatial proximity for each cluster as we expect.

Show code for calculating metacell spatial coordinates
# get the metacell object
m_obj <- GetMetacellObject(seurat_curio)
m_obj$seurat_clusters <- factor(
  m_obj$seurat_clusters,
  levels = levels(seurat_curio$seurat_clusters)
)

# get the image coordinates from the seurat obj and add to the metadata
spatial_coords <- seurat_curio@images$slice1@coordinates
seurat_curio@meta.data$spatial_x <- spatial_coords$x * -1
seurat_curio@meta.data$spatial_y <- spatial_coords$y
seurat_curio$barcode <- colnames(seurat_curio)

# loop over each metacell and calculate the average X and Y
meta_spatial_coords <- do.call(rbind, lapply(1:ncol(m_obj), function(i){
  x <- m_obj@meta.data$cells_merged[i]
  cur_bcs <- str_split(x, ',')[[1]]
  cur_df <- subset(seurat_curio@meta.data, barcode %in% cur_bcs)
  cur_x <- mean(cur_df$spatial_x)
  cur_y <- mean(cur_df$spatial_y)
  data.frame(spatial_1 = cur_y, spatial_2=cur_x)
}))
rownames(meta_spatial_coords) <- colnames(m_obj)

# add this as a dim reduct
m_obj@reductions$spatial <- CreateDimReducObject(
  embeddings = as.matrix(meta_spatial_coords),
  key = 'spatial'
)

p <- DimPlot(
  m_obj, 
  split.by='seurat_clusters',
  reduction = 'spatial',
  ncol=6
) + NoLegend() + umap_theme() 

p

Co-expression network analysis

Now that we have our metacells, we next carry out co-expression network analysis with hdWGCNA, similar to as we have done before. Here we perform network analysis on cluster 6, which appears to contain the pyramidal layer of the hippocampus and the granular layer of the dentate gyrus.

# set up the expression matrix
seurat_curio  <- SetDatExpr(
  seurat_curio,
  group.by='seurat_clusters',
  group_name = '6'
)

# test different soft power thresholds
seurat_curio <- TestSoftPowers(seurat_curio)

# construct co-expression network:
seurat_curio <- ConstructNetwork(
  seurat_curio,
  tom_name='curio',
  overwrite_tom=TRUE
)

# compute module eigengenes & connectivity
seurat_curio <- ModuleEigengenes(seurat_curio)
seurat_curio <- ModuleConnectivity(
  seurat_curio,
  group.by = 'seurat_clusters', 
  group_name = '6'
)

# plot the dendrogram 
PlotDendrogram(seurat_curio, main='Curio hdWGCNA dendrogram')

Next we plot the expression of these modules in the tissue coordinates.

plot_list <- ModuleFeaturePlot(
  seurat_curio,
  reduction = 'SPATIAL',
  restrict_range=FALSE
)

wrap_plots(plot_list, ncol=4)

This concludes the part of the tutorial for the Curio dataset, but from here we could proceed to perform downstream visualization and analysis.

Conclusion and next steps

In this tutorial we demonstrated the core functions for performing co-expression network analysis in two types of ST datasets: Visium and Slide-Seq. From here we encourage you to explore our other tutorials for downstream analysis of these co-expression networks.

Session Info
SessionInfo()
Put it here