This demo will walk you through a complete dataset projection and analysis using ProjecTILs. We will use as a query set the Tetramer+ CD8+ T cells from the study by Miller et al., Nature Immunol (2019)

First, install ProjecTILs. For detailed installation instructions refer to the ProjecTILs Github page

remotes::install_github("carmonalab/ProjecTILs")
library(ProjecTILs)

Load reference atlas and query data

First, load the default reference TIL atlas. If no reference map file is provided, the function load.reference.map() will automatically download it from https://doi.org/10.6084/m9.figshare.12478571

ref <- load.reference.map()
## [1] "Loading Default Reference Atlas..."
## [1] "/Users/mass/Documents/Projects/Cell_clustering/ProjecTILs.demo/ref_TILAtlas_mouse_v1.rds"
## [1] "Loaded Reference map ref_TILAtlas_mouse_v1"

Load query data - Miller et al., Nature Immunol (2019)

#A sample data set is provided with the ProjecTILs package
querydata <- ProjecTILs::query_example_seurat

More generally, it is possible to load a query matrix with gene names and barcodes (e.g. 10X format or raw counts)

##10X format
fname2 <- "./sample_data"
querydata2 <- read.sc.query(fname2, type="10x")

##Raw count matrix from GEO
library(GEOquery)
geo_acc <- "GSE86028"
getGEOSuppFiles(geo_acc)

fname3 <- sprintf("%s/GSE86028_TILs_sc_wt_mtko.tpm.log2.txt.gz", geo_acc)
querydata3 <- read.sc.query(fname3, type = "raw.log2")

Run Projection algorithm

In this case, the query data was previously normalized, so we can set skip.normalize=T

query.projected <- make.projection(querydata, ref=ref, skip.normalize = T)
## [1] "Using assay RNA for query"
## celltype.pred
##          CD4T          CD8T     Non-Tcell Tcell_unknown          Treg 
##             1          1288           183            17            11 
##       unknown 
##             1 
## CD8Tstate
##          Naive EffectorMemory     MemoryLike      Exhausted        unknown 
##              5             15             62           1201              5 
## [1] "184 out of 1501 ( 12 % ) non-pure T cells removed.  Use filter.cells=FALSE to avoid pre-filtering (NOT RECOMMENDED)"
## [1] "Aligning query to reference map for batch-correction..."
## 
## Projecting corrected query onto Reference PCA space
## [1] "Projecting corrected query onto Reference UMAP space"

Plot projection of new data over the reference in UMAP space. The contour lines display the density of projected query cells onto the reference map.

plot.projection(ref, query.projected)

We can also use different cell annotations to colour the reference map. For example, show the cell types predicted by the supervised classifier TILPRED

plot.projection(ref, query.projected, labels.col = "TILPRED")

Predict cell states

Predict the cell states in the query set using a nearest-neighbor algorithm

query.projected <- cellstate.predict(ref=ref, query=query.projected)
table(query.projected$functional.cluster)
## 
## CD8_EffectorMemory      CD8_NaiveLike            CD8_Tex           CD8_Tpex 
##                 51                 12               1188                 58 
##               Treg 
##                  8

Plot the predicted composition of the query in terms of reference T cell subtypes

plot.statepred.composition(ref, query.projected)

How do the gene expression levels compare between reference and query for the different cell states?

plot.states.radar(ref, query=query.projected)

Find discriminant dimensions

The dimensions in UMAP space summarize the main axes of variability of the reference map. What if the query data contains novel states? We can search for additional, maximally discriminant dimensions (either in ICA or PCA space) that explain new variability in the query set.

top.ica <- find.discriminant.dimensions(ref=ref, query=query.projected, reduction="ica")
head(top.ica)
##             stat  stat_abs p_val
## ICA_22 0.7298784 0.7298784     0
## ICA_7  0.7105181 0.7105181     0
## ICA_21 0.7017093 0.7017093     0
## ICA_33 0.6907275 0.6907275     0
## ICA_40 0.5934850 0.5934850     0
## ICA_25 0.5748989 0.5748989     0

See driver genes for the top 3 discriminant components

VizDimLoadings(ref, reduction = "ica", nfeatures = 10, dims=c(22,7,21), ncol=3)

Plot UMAP with extra dimension on the z-axis

Plot the most discriminant ICA component. These 3D plots are based on plotly and are interactive, when you generate them on your machine you will able to rotate the view and explore the cloud of points.

plot3d <- plot.discriminant.3d(ref, query.projected, extra.dim="ICA_22")
plot3d

Plot the cycling score calculated by the TILPRED cycling signature

plot3d <- plot.discriminant.3d(ref, query.projected, extra.dim="cycling.score")
plot3d

Focus the plot only on a specific state

plot3d <- plot.discriminant.3d(ref, query.projected, extra.dim="cycling.score", query.state="CD8_Tex")
plot3d

Perturbation vs. control

If we have a control vs. a perturbed sample (e.g. a genetically engineered strain), we can search for discriminant dimensions of the perturbed sample vs. the control (otherwise, by default this analysis is performed against the reference map)

#Simulate a condition which e.g. increases Gzmb expression compared to control
query.control <- subset(query.projected, subset=`Gzmb` < 1.5)
query.perturb <- subset(query.projected, subset=`Gzmb` >= 1.5)

plot.states.radar(ref, query=list("Control" = query.control, "Query" = query.perturb))

In this toy example, where we simulated a condition that increases Gzmb expression compared to control, we expect some gene module associated with granzymes to drive the discriminant analysis:

top.ica.wcontrol <- find.discriminant.dimensions(ref=ref, query=query.perturb, query.control=query.control)
head(top.ica.wcontrol)
##             stat  stat_abs        p_val
## ICA_26 0.4458653 0.4458653 0.000000e+00
## ICA_24 0.2915033 0.2915033 1.451378e-08
## ICA_42 0.2550725 0.2550725 2.932201e-06
## ICA_19 0.2523444 0.2523444 4.240935e-06
## ICA_28 0.2450128 0.2450128 1.121095e-05
## ICA_6  0.2190395 0.2190395 2.786904e-04
VizDimLoadings(ref, reduction = "ica", nfeatures = 10, dims=c(26,24,42), ncol=3)

Now we can plot the ICA dimension that captured the genetic changes associated to the perturbation of increasing Gzmb

plot3d <- plot.discriminant.3d(ref, query=query.perturb, query.control=query.control, extra.dim="ICA_26")
plot3d

Using a random subsetting, p-values should not be significant:

rand.list <- ProjecTILs:::randomSplit(query.projected, n=2, seed=1)
top.ica.ks.rand <- find.discriminant.dimensions(ref=ref, query=rand.list[[1]], query.control=rand.list[[2]], reduction="ica")
top.ica.ttest.rand <- find.discriminant.dimensions(ref=ref, query=rand.list[[1]], query.control=rand.list[[2]], reduction="ica", test = "t-test")

Further information

ProjecTILs repository

ProjecTILs case studies - INDEX - Repository

Publication: pre-print