
1. What type of input does scX accept?

scX is a “plug and play” tool that pre-processes the data starting from just the raw count matrix, allowing the dataset to be explored in depth. Nonetheless, it is advisable to supply an SCE (SingleCellExperiment) object to the createSCEobject() function. The function also accepts the raw count matrix (with genes as rows and cells as columns) or a Seurat object; in these two cases it automatically converts the input into an SCE object.

If you supply cell metadata, it must be contained in a data.frame with the cell names as rownames. Typical examples of metadata are cell type, sample ID, and the percentage of mitochondrial reads (a minimal sketch of such a data.frame is shown after the code block below).

# Input = SingleCellExperiment object (sce)
cseo <- createSCEobject(xx = sce)

# Input = Seurat object (seu)
cseo <- createSCEobject(xx = seu)

# Alternatively, Input = a count matrix (mat) of one of these classes:
# c("dgCMatrix", "Matrix", "matrix"), together with metadata as a data.frame (df)
cseo <- createSCEobject(xx = mat, metadata = df)
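
As a reference, here is a minimal sketch of a metadata data.frame matching the layout described above. The column names (cell_type, sample_id, percent_mito) are illustrative and not required by scX; only the rownames matter, and they must match the cell names of the count matrix.

# Hypothetical metadata: one row per cell, cell names as rownames
df <- data.frame(
  cell_type    = c("TypeA", "TypeB", "TypeA"),   # e.g. cell-type annotation
  sample_id    = c("S1", "S1", "S2"),            # e.g. sample of origin
  percent_mito = c(2.1, 3.5, 1.8),               # e.g. % of mitochondrial reads
  row.names    = colnames(mat)[1:3]              # rownames = cell names in 'mat'
)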

scX is a valuable tool for visualising and analysing single-cell or single-nucleus RNA sequencing data. To make the most of it, it is important to provide a pre-processed object, i.e. one that has already undergone QC filtering of genes and cells, as well as the normalisation appropriate for the type of data. If you have already computed partitions, such as a Louvain or Leiden partition identifying cell types or states, you can pass them through the partitionVars parameter to visualise them in PCA, UMAP and t-SNE, obtain markers, compute DEGs between clusters, and so on. Any additional information to be visualised in plots must be passed through the metadataVars parameter.

cseo <- createSCEobject(xx = scXample, 
                        partitionVars = c("inferred_cell_type"), 
                        metadataVars = c("pseudotime", "sex"))

1.1 How to pass a normalized count matrix as input to createSCEobject()?

If you have the normalized count matrix rather than the raw count matrix, you can wrap it in an SCE object before passing it to createSCEobject():

library(SingleCellExperiment)
sce <- SingleCellExperiment(list(logcounts = norm.mat))
cseo <- createSCEobject(xx = sce)

1.2 How to use Scanpy (.h5ad) or .loom objects?

For this it is necessary to convert them to SingleCellExperiment or Seurat objects. In this tutorial we explain how to convert them to a SingleCellExperiment object using the zellkonverter package.

library(zellkonverter)
sce <- readH5AD(file, reader = "R")
# Then run createSCEobject() as in 1)

From .loom, use the sceasy package:

sceasy::convertFormat('filename.loom', from="loom", to="sce",
                       outFile='filename.rds')
sce <- readRDS('filename.rds')
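
Then pass the converted object to createSCEobject() as in 1):

cseo <- createSCEobject(xx = sce)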

2. Is there a maximum size limit for input objects that scX is capable of processing?

The constraints of scX are determined by the hardware of the computer, in particular the available RAM and CPU. A typical PC (16 GB RAM, Intel i5-8400) can efficiently handle medium-sized datasets of up to 30k cells. Be aware that the number of partitions and clusters will also affect the processing time of a dataset.

To keep the app and its visualisation plots responsive for datasets with more than 50k cells, the createSCEobject() function subsamples the SCE object to 50k random cells, storing the names of the retained cells in the CELLS2KEEP element of its output.
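
A quick sanity check, assuming the subsampling was triggered and CELLS2KEEP holds the character vector of retained cell names (as in the examples below):

# Number of cells that will be displayed in the app (at most 50k after subsampling)
length(cseo$CELLS2KEEP)
head(cseo$CELLS2KEEP)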

  • If you have a cluster with only a few cells that you wish to retain, you can pass their names to the cells2keep parameter.
c2k <- colnames(scXample)[scXample$inferred_cell_type=="COP"]
cseo <- createSCEobject(xx = scXample, 
                        partitionVars = c("inferred_cell_type"), 
                        cells2keep = c2k) # Keeping COP cells in case of subsampling data for visualization
  • If you need to use more than 50k cells and have, say, a 200k-cell SCE object, a stratified subsampling that keeps half of the cells in each cluster is a viable option. The app will then display 100k cells during visualization.
library(dplyr)
# sce: SCE object of a dataset with 200k cells
sce$cellnames <- colnames(sce)
c2k <- colData(sce) %>% as.data.frame() %>% 
        group_by(inferred_cell_type) %>% 
        sample_frac(0.5, replace = FALSE) # sample half of the cells in every cluster
cseo <- createSCEobject(xx = sce, 
                        partitionVars = c("inferred_cell_type"), 
                        metadataVars = c("pseudotime", "sex"))

# Change the output of the subsampling function to be able to use 100k cells:
cseo$CELLS2KEEP <- c2k$cellnames
launch_scX(cseo)
  • If you are willing to use all cells within the application, with the potential drawbacks of running low on memory or long waiting times between plots, you can set CELLS2KEEP to 'all' before calling launch_scX().
cseo <- createSCEobject(xx = sce, 
                        partitionVars = c("inferred_cell_type"), 
                        metadataVars = c("pseudotime", "sex"))

# Change the output of the subsampling function:
cseo$CELLS2KEEP <- 'all'
launch_scX(cseo)

Important note: all calculations in the preprocessing stage are applied to every cell in the dataset; the subsampling controlled by the cells2keep parameter affects visualization only.

3. How to deploy a shiny app on shinyapps.io?

One way to deploy an app online is to use the shinyapps service. The first step is to create an account at https://www.shinyapps.io/. A user guide can be found here.

  1. Save the preprocessed object (the cseo output of createSCEobject()) in the /data folder (a minimal save sketch is shown after step 3)
  2. Create an app.R script:
library(scX)
# Load the preprocessed object (cseo)
load('data/data.Rdata')
launch_scX(cseo, point.size = 50)
  3. Then run:
library(rsconnect)

rsconnect::setAccountInfo(name="<ACCOUNT>", 
                          token="<TOKEN>",
                          secret="<SECRET>")

rsconnect::deployApp('path/to/your/app.R')
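
The data.Rdata file loaded in step 2 is assumed to contain the cseo object produced by createSCEobject(); a minimal way to create it (step 1) is:

# Save the preprocessed object into the app's data/ folder
save(cseo, file = "data/data.Rdata")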

Another option is Shiny Server; for more details please visit the following link. Finally, Amazon Web Services (AWS) can also be used; a step-by-step tutorial on how a shiny app can be deployed with AWS can be found here.

4. If I want to change any preprocessing parameters, do I need to go through all the steps again to relaunch the app?

No! createSCEobject() can take as input a SingleCellExperiment object that has already been normalized (logcounts assay) and that already has reduced dimensions estimated. In this case the normalization and dimensionality reduction steps can be skipped.

For instance, if you initially launched the application with test.type = “wilcox” in paramFindMarkers but now want to re-launch it with a different kind of test, such as the t-test, simply call createSCEobject() again with the updated parameter, indicating that the other features do not need to be recalculated.

cseo <- createSCEobject(
  xx = scXample,
  assay.name.raw = "counts",
  assay.name.normalization = "logcounts",
  partitionVars = "inferred_cell_type", 
  metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
  calcRedDim = FALSE,
  paramFindMarkers = list(test.type = "t", pval.type = "all", direction = "up")
)
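
Then relaunch the app with the updated object:

launch_scX(cseo)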

5. Can I remove bad quality cells within the app?

You cannot filter cells inside the app. Instead, you can identify and download the IDs of the cells selected with the lasso or box tool in the “Find new markers” tab inside “Markers”. Then run the following lines of code to filter those cells from the SCE object and redo the preprocessing steps.

## Following the quick start guide example:
#library(scX)
#cseo <- createSCEobject(xx = scXample, 
#                        partitionVars = "inferred_cell_type", 
#                        metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
#                        descriptionText = "Quick Start Guide")
#launch_scX(cseo)
## Inside the app select some cells with lasso or box tool in "Find new markers" tab and click 'Download Selected Cells'

# Load cells id:
cells2filter <- read.csv("/path/to/downloaded/file.csv")
 
# Subsetting cells from SCE object:
sce <- cseo$SCE[,!colnames(cseo$SCE) %in% cells2filter$Selected_Cells]

# We run the function again to recompute everything with the new object. The input object now has a "logcounts" assay.
# To compute a new normalization we set 'assay.name.normalization' to an empty string so that it is overwritten:
cseo <- createSCEobject(xx = sce,
                        assay.name.normalization = "",
                        partitionVars = "inferred_cell_type",
                        metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
                        descriptionText = "Quick Start Guide")
launch_scX(cseo)

6. Can I use a list of marker genes to analyze within the scX app?

Yes, you can pass a list of marker genes for every cluster of any partition of the data, as a data.frame, through the markerList parameter. The preprocessing function will compute marker statistics per gene for all clusters of the partitions passed through partitionVars, but it will subset the genes to those present in the data.frame.

# Custom marker genes for some clusters
COP <- c("Neu4", "Bmp4", "Gpr17")
NFOL1 <- c("Tcf7l2", "Frmd4a", "Pik3r3")
NFOL2 <- c("Fam107b", "Rras2", "Tmem2", "Cnksr3")
MFOL1 <- c("Ctps", "Mag", "RIPTIDE")
MFOL2 <- c("Wfdc18")

# data.frame preparation
partition <- c(rep("inferred_cell_type", times = length(COP) + length(NFOL1) + length(NFOL2) + length(MFOL1) + length(MFOL2)))
cluster <- c(rep("COP", length(COP)), rep("NFOL1", length(NFOL1)), rep("NFOL2", length(NFOL2)), rep("MFOL1", length(MFOL1)), rep("MFOL2", length(MFOL2)))
gene <- c(COP, NFOL1, NFOL2, MFOL1, MFOL2)

# data.frame creation
ml <- data.frame(Partition = partition, Cluster = cluster, Gene = gene)

# Code Example:
library(scX)
cseo <- createSCEobject(xx = scXample, 
                        partitionVars = "inferred_cell_type", 
                        metadataVars = c("source_name", "age", "sex", "strain", "treatment", "pseudotime"),
                        markerList = ml,
                        descriptionText = "Quick Start Guide")
launch_scX(cseo)

7. How can I use the scX Docker image for single-cell analysis?

First, you have to pull the scX Docker image from Docker Hub:

docker pull chernolabs/scx

After that you will need an R script to run the scX analysis and a Dockerfile to build the container. Suppose you have your SCE object saved in /path/to/host/directory/data/sce.rds and an R script called scx_analysis.R as follows:

# Library
suppressPackageStartupMessages(library(scX))

# Loading SCE object
sce <- readRDS("/home/data/sce.rds")

# Analysis
cseo <- createSCEobject(
    xx = sce,
    descriptionText = "Using scX with Docker!"
)

#Launch app
launch_scX(cseo, port = 9192, host = "0.0.0.0", launch.browser = FALSE)

Here we set port 9192 (you can choose your favorite port) and launch.browser = FALSE. In the same directory, you have to create a Dockerfile similar to this one:

FROM chernolabs/scx

RUN mkdir /home/shiny-app

COPY scx_analysis.R /home/shiny-app/scx_analysis.R

RUN chmod -R 755 /home/shiny-app/
RUN mkdir /data && chown shiny:shiny /data

EXPOSE 9192

CMD Rscript /home/shiny-app/scx_analysis.R

With the Dockerfile complete, the image is built with:

docker build -t scx_example ./path/to/Dockerfile/directory

and can be run as a standalone container with:

docker run --name scx_example_app \
    -v /path/to/host/directory/data:/home/data \
    -p 9192:9192 \
    scx_example
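
Once the container is running, the app should be reachable from a browser at http://localhost:9192 (the port mapped with -p above).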