Building a custom reference atlas for ProjecTILs
ProjecTILs comes with a number of pre-calculated reference atlases (see TIL atlas and viral infection atlas). However, you may want to use your own custom atlas as a reference for dataset projection. This vignette will walk you through the steps required to convert your custom single-cell map into a reference atlas for ProjecTILs.
We are going to use a custom T cell reference described in this paper, and constructed with the following code: STACAS tutorial.
You can download the integrated map as a Seurat object from: stacas.cd4cd8.integrated.rds
Note 1: For speed and simplicity, here we are using a downsampled version of the original data.
Note 2: This is an illustrative example for object conversion and is not meant to be a robust reference atlas of T cell transcriptomics states.
Read in single-cell data
After loading the required packages, read the single-cell data into memory.
Check structure of this object:
An object of class Seurat
80681 features across 7008 samples within 2 assays
Active assay: integrated (26867 features, 500 variable features)
1 other assay present: RNA
2 dimensional reductions calculated: pca, umap
If you apply the following procedure to your own data, just make sure that you have a complete Seurat object with PCA and UMAP embeddings. We assume here that data have been integrated for batch correction (therefore the integrated assay); you may have to adapt the code accordingly if that is not the case for your custom atlas.
Prepare atlas for ProjecTILs
To be able to use the custom atlas as a reference in ProjecTILs, we need to recalculate PCA and UMAP embeddings externally of Seurat, using respectively the prcomp and umap packages. This is because the umap package implements a neat function (predict) that allows projection of new data into a pre-computed UMAP space.
- Re-compute PCA embeddings using prcomp
set.seed(1234)
which.assay="integrated"
varfeat <- data.seurat@assays[[which.assay]]@var.features
refdata <- data.frame(t(data.seurat@assays[[which.assay]]@data[varfeat,]))
refdata <- refdata[, sort(colnames(refdata))]
ref.pca <- prcomp(refdata, rank. = 50, scale. = TRUE, center = TRUE, retx=TRUE)
ref.pca$rotation[1:5,1:5]
PC1 PC2 PC3 PC4 PC5
AA467197 -0.054431991 0.053409005 -0.052268912 0.011614551 -0.022038049
Abi3bp -0.005532799 -0.014533122 0.005754474 -0.002245197 -0.001308135
Actb -0.042069262 0.011908243 -0.021182447 0.074690838 -0.021145342
Actbl2 -0.005367201 0.010318191 0.022109412 -0.028107566 -0.005608879
Actg1 -0.075794214 -0.005539866 -0.018269632 0.083783079 0.004597240
- Re-compute UMAP embeddings using umap
library(umap)
seed=1234
n.neighbors=30
min.dist=0.3
metric="cosine"
ndim=10
umap.config <- umap.defaults
umap.config$n_neighbors = n.neighbors
umap.config$min_dist = min.dist
umap.config$metric = metric
umap.config$n_components = 2
umap.config$random_state = seed
umap.config$transform_state = seed
ref.umap <- umap(ref.pca$x[,1:ndim], config=umap.config)
colnames(ref.umap$layout) <- c("UMAP_1","UMAP_2")
ref.umap
- Overwrite Seurat UMAP configuration with the one calculated externally
data.seurat@reductions$umap@cell.embeddings <- ref.umap$layout
data.seurat@reductions$pca@cell.embeddings <- ref.pca$x
data.seurat@reductions$pca@feature.loadings <- ref.pca$rotation
colnames(data.seurat@reductions$pca@cell.embeddings) <- gsub("PC(\\d+)", "PC_\\1", colnames(ref.pca$x), perl=TRUE)
colnames(data.seurat@reductions$pca@feature.loadings) <- gsub("PC(\\d+)", "PC_\\1", colnames(ref.pca$rotation), perl=TRUE)
#Store the complete PCA and UMAP object in @misc
data.seurat@misc$pca_object <- ref.pca
data.seurat@misc$umap_object <- ref.umap
data.seurat@misc$projecTILs="custom_atlas"
See the new data embedding
To use the integrated data as a reference atlas, we may want to annotate clusters of cells that form cell subtypes/states. First we perform unsupervised clustering:
set.seed(1234)
ndim=10
resol=0.7
#1) Find neighbors from PCA reduction
DefaultAssay(data.seurat) <- "integrated"
data.seurat <- FindNeighbors(data.seurat, reduction = "pca", dims = 1:ndim, k.param = 30)
#2) Find the clusters
data.seurat <- FindClusters(data.seurat, resolution = resol)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 7008
Number of edges: 396876
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8402
Number of communities: 10
Elapsed time: 0 seconds
data.seurat@meta.data$cluster <- factor(data.seurat@meta.data[[sprintf("%s_snn_res.%s",which.assay,resol)]])
DimPlot(data.seurat, reduction="umap", label = TRUE, group.by = "cluster")
There are many types of analysis that can be applied to annotate cell clusters (e.g. differential expression analysis, expression of known markers, signature analysis, etc.) and it is beyond the scope of this toy example to do a thorough cluster annotation.
Regardless of the type of analysis performed, clusters or groups of clusters that reflect biologically meaningful properties of the cells can be assigned a label. In ProjecTILs we use the “functional.cluster” metadata field to store this cluster annotation for reference atlases.
f.clusters <- rep("Unknown", dim(data.seurat)[2])
names(f.clusters) <- colnames(data.seurat)
f.clusters[data.seurat$cluster %in% c(0)] = "Treg"
f.clusters[data.seurat$cluster %in% c(2)] = "CD8_Exhausted"
f.clusters[data.seurat$cluster %in% c(6,7)] = "CD8_PrecExhausted"
f.clusters[data.seurat$cluster %in% c(3,9)] = "CD4_Tfh"
f.clusters[data.seurat$cluster %in% c(5,8)] = "CD4_Th1"
f.clusters[data.seurat$cluster %in% c(4)] = "CD8_EM"
f.clusters[data.seurat$cluster %in% c(1)] = "CD8_NaiveLike"
data.seurat <- AddMetaData(data.seurat, as.factor(f.clusters), col.name = "functional.cluster")
table(f.clusters)
f.clusters
CD4_Tfh CD4_Th1 CD8_EM CD8_Exhausted
996 796 903 1088
CD8_NaiveLike CD8_PrecExhausted Treg
1142 876 1207
Idents(data.seurat) <- "functional.cluster"
genes.show <- c("Cd4","Cd8b1","Foxp3","Pdcd1","Ccr7","Havcr2","Xcl1","Tox","Gzmk")
VlnPlot(data.seurat, features=genes.show, pt.size = 0, assay="RNA", ncol=3)
Save your custom atlas to disk.
Now the atlas is ready for use as a ProjecTILs reference. Load the custom reference atlas from any script and start projecting new data!
[1] "Loading Custom Reference Atlas..."
[1] "Loaded Custom Reference map custom_atlas"
Conclusion
In summary, the essential steps needed to convert your custom atlas as a reference for ProjecTILs are:
- Recalculating the PCA and UMAP embeddings of your Seurat object, and store them in the appropriate slots.
- Populating the “functional.cluster” metadata field with your reference cell state annotations.
ProjecTILs GitHub page - Repository
ProjecTILs case studies - INDEX - Repository