FAQ
FAQ.Rmd
1. What type of input does scX accept?
scX is a “plug and play” tool that pre-processes the data using only
the raw count matrix, allowing the dataset to be explored in depth.
Nonetheless, it is advisable to input an SCE object
when using the createSCEobject()
function. The function,
however, also accepts the raw count matrix (with genes
as rows and cells as columns) or a Seurat object. In
the latter two cases, it automatically transforms the inputs into an SCE
object.
If you supply cell metadata, it must be contained in a dataframe with the cell names as rownames. Typical examples of metadata are cell type, sampleid, percentage of mitochondrial reads.
# Input = SingleCellExperiment object (sce)
cseo <- createSCEobject(xx = sce)
# Input = Seurat object (seu)
cseo <- createSCEobject(xx = seu)
# Alternatively, Input = combine a matrix (mat) from one of these classes:
# c("dgCMatrix", "Matrix", "matrix") and metadata as data.frame (df)
cseo <- createSCEobject(xx = mat, metadata = df)
scX is an extremely valuable tool for visualising and analysing
single-cell or single-nuclei RNA sequencing data. To maximise its
potential, it is important to have a pre-processed object, i.e. one that
has already undergone QC filtering for genes and cells, as well as the
appropriate normalisation for the type of data. If you have already
computed partitions, such as a Louvain-Leiden partition to identify cell
types or states, you can pass them as input to be visualised in PCA,
UMAP, TSNE and obtain markers, DEGs between the different clusters, and
so on. In that case use the partitionVars
parameter. To
visualize any additional information in plots, it must be passed through
the metadataVars
parameter.
cseo <- createSCEobject(xx = scXample,
partitionVars = c("inferred_cell_type"),
metadataVars = c("pseudotime", "sex"))
1.1 How to pass a normalized count matrix as input of
createSCEobject()
?
If you possess the normalized count matrix rather than the raw count
matrix, you may transform it into an SCE object before providing it to
createSCEobject()
sce <- SingleCellExperiment(list(logcounts=norm.mat))
cseo <- createSCEobject(xx = sce)
1.2 How to use Scanpy (.h5ad) or .loom objets?
For this it’s necessary to convert them to a SingleCellExperiment or Seurat objects. In this tutorial we will explain how to convert them as SingleCellExperiment using the zellkonverter package.
sce <- readH5AD(file,
reader = "R"
#Then run the createSCEobject as in 1)
)
From .loom, using the sceasy package
sceasy::convertFormat('filename.loom', from="loom", to="sce",
outFile='filename.rds')
sce <- readRDS('filename.rds')
2. Is there a maximum size limit for input objects that scX is capable of processing?
The constraints of scX are determined by the hardware of the computer, including the RAM and CPU. A typical PC (16GB, i5-8400) can efficiently handle datasets of medium size, containing up to 30k cells. Be aware that the number of partitions and clusters will also affect the processing time of a dataset.
To ensure smooth application and visualisation plots for datasets
with more than 50k cells, the createSCEobject()
function
subsamples the SCE object with 50k random cells, storing the cell names
in the CELLS2KEEP
output.
- If you have a cluster with only a few cells that you wish to retain,
you can pass their names to the
cells2keep
parameter.
c2k <- colnames(scXample)[scXample$inferred_cell_type=="COP"]
cseo <- createSCEobject(xx = scXample,
partitionVars = c("inferred_cell_type"),
cells2keep = c2k) # Keeping COP cells in case of subsampling data for visualization
- If you need to utilize over 50k cells and possess a 200k cell SCE object, implementing a stratified subsampling to preserve half the cells in each cluster is a viable option. This will result in the app exhibiting 100k cells during visualization.
library(dplyr)
# sce: SCE object of a dataset with 200k cells
sce$cellnames <- colnames(sce)
c2k <- colData(sce) %>% as.data.frame %>%
group_by(inferred_cell_type) %>%
sample_frac(0.5,replace = F) # sample half of the cells in every cluster
cseo <- createSCEobject(xx = sce,
partitionVars = c("inferred_cell_type"),
metadataVars = c("pseudotime", "sex"))
# Change the output of the subsampling function to be able to use 100k cells:
cseo$CELLS2KEEP <- c2k$cellnames
launch_scX(cseo)
- If you are willing to attempt to utilize all cells within the
application with the potential drawback of running low on memory or
experiencing extended waiting periods between the plots, you can modify
the parameter to
all
prior to launchinglaunch_scX()
.
cseo <- createSCEobject(xx = sce,
partitionVars = c("inferred_cell_type"),
metadataVars = c("pseudotime", "sex"))
# Change the output of the subsampling function:
cseo$CELLS2KEEP <- 'all'
launch_scX(cseo)
Important note: All the calculations in the
preprocessing stage apply to every cell within the dataset, regardless
of the subsampling provided by the cells2keep
parameter for
visualization purposes.
3. How to deploy a shiny app in shinyapps.io?
One way to deploy an app online is to use the shinyapps service. The first step is to create an account at https://www.shinyapps.io/. A user guide can be found here.
- Save the SCE object in
/data
folder - Create an app.R script:
library(scX)
# Loading SCE object
load('data/data.Rdata')
launch_scX(cseo,point.size = 50)
- Then run:
library(rsconnect)
rsconnect::setAccountInfo(name="<ACCOUNT>",
token="<TOKEN>",
secret="<SECRET>")
rsconnect::deployApp('path/to/your/app.R')
Another option could be R Shiny Server, for more details please visit the following link. Finally, Amazon Web Services (AWS) can also be used. A step-by-step tutorial on how a shiny app can be deployed with AWS can be found here.
4. If I want to change any preprocessing parameters, do I need to go through all the steps again to relaunch the app?
No! createSCEobject() can take as input a SingleCellExpression object previously normalized (logcounts assay) and with reduced dims already estimated. In this case normalization and dimensionality reduction steps could be avoided.
For instance, if you initiated the application by using test.type = “wilcox” in paramFindMarkers but now intend to re-launch it utilizing a different sort of test, like t-test, just use the createSCEobject() function with the updated parameter, indicating that you do not require the other features to be recalculated.
createSCEobject(
xx = scXample,
assay.name.raw = "counts",
assay.name.normalization = "logcounts",
partitionVars = "inferred_cell_type",
metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
calcRedDim = FALSE,
paramFindMarkers = list(test.type = "t", pval.type = "all", direction = "up")
)
5. Can I remove bad quality cells within the app?
You cannot filter cells inside the app. Instead you can identify and download the cells id selected with lasso or box tool in “Find new markers” tab inside “Markers”. Then you can run these lines of code to filter those cells from the SCE object and redo the preprocessing steps.
## Following the quick start guide example:
#library(scX)
#cseo <- createSCEobject(xx = scXample,
# partitionVars = "inferred_cell_type",
# metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
# descriptionText = "Quick Start Guide")
#launch_scX(cseo)
## Inside the app select some cells with lasso or box tool in "Find new markers" tab and click 'Download Selected Cells'
# Load cells id:
cells2filter <- read.csv("/path/to/downloaded/file.csv")
# Subsetting cells from SCE object:
sce <- cseo$SCE[,!colnames(cseo$SCE) %in% cells2filter$Selected_Cells]
# We run the function to compute all calculations again with the new object. The input object now has "logcounts" assay.
# In order to compute a new normalization we set 'assay.name.normalization' to empty string and it will overwrite it:
cseo <- createSCEobject(xx = sce,
assay.name.normalization = "",
partitionVars = "inferred_cell_type",
metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
descriptionText = "Quick Start Guide")
launch_scX(cseo)
6. Can I use a list of marker genes to analyze within the scX app?
Yes, you can pass a list of marker genes for every cluster in any
partition of the data in the form of DataFrame through
markerList
. The preprocess function will compute marker
statistics per gene for all clusters in the partitions passed through
partitionVars
but it will subset the list of genes to those
present in the DataFrame.
# Custom marker genes for some clusters
COP <- c("Neu4", "Bmp4", "Gpr17")
NFOL1 <- c("Tcf7l2", "Frmd4a", "Pik3r3")
NFOL2 <- c("Fam107b", "Rras2", "Tmem2", "Cnksr3")
MFOL1 <- c("Ctps", "Mag", "RIPTIDE")
MFOL2 <- c("Wfdc18")
# data.frame preparation
partition <- c(rep("inferred_cell_type", times = length(COP) + length(NFOL1) + length(NFOL2) + length(MFOL1) + length(MFOL2)))
cluster <- c(rep("COP", length(COP)), rep("NFOL1", length(NFOL1)), rep("NFOL2", length(NFOL2)), rep("MFOL1", length(MFOL1)), rep("MFOL2", length(MFOL2)))
gene <- c(COP, NFOL1, NFOL2, MFOL1, MFOL2)
# data.frame creation
ml <- data.frame(Partition = partition, Cluster = cluster, Gene = gene)
# Code Example:
library(scX)
cseo <- createSCEobject(xx = scXample,
partitionVars = "inferred_cell_type",
metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
markerList = ml
descriptionText = "Quick Start Guide")
launch_scX(cseo)
7. How can I use scX docker image for singlecell analysis?
Firstly, you have to pull scX docker image from DockerHub:
docker pull chernolabs/scx
After that you will need an R script to run scX analysis and a Dockerfile to build the container. Suppose you have your SCE object saved in /path/to/host/directory/data/sce.rds and an R script called analysis_scX.R as follows:
# Library
suppressPackageStartupMessages(library(scX))
# Loading SCE object
sce <- readRDS("/home/data/sce.rds")
# Analysis
cseo <- createSCEobject(
xx = sce,
descriptionText = "Using scX with Docker!"
)
#Launch app
launch_scX(cseo, port = 9192, host = "0.0.0.0", launch.browser = FALSE)
We are setting the port 9192 (you can choose your favorite port) and launch.browser = FALSE. In the same directory, you have to create a Dockerfile similar to this one:
FROM msbeckel/scx:0.3.0
RUN mkdir /home/shiny-app
COPY scx_analysis.R /home/shiny-app/make_scX.R
RUN chmod -R 755 /home/shiny-app/
RUN mkdir /data && chown shiny:shiny /data
EXPOSE 9192
CMD Rscript /home/shiny-app/scx_analysis.R
With the Dockerfile complete, image is built with:
docker build -t scx_example ./path/to/Dockerfile/directory
and can be run as a standalone container with:
docker run --name scx_example_app \
-v /path/to/host/directory/data:/home/data \
-p 9192:9192 \
scx_example