For this tutorial, we demonstrate the conversion utilities in scanalysis, which streamline analysis by allowing functions from Bioconductor and Seurat to be used interchangeably. In the current implementations of Seurat::as.SingleCellExperiment and Seurat::as.Seurat, much information is lost, which prevents downstream analysis and causes errors if the object was converted at some point. Some examples of this are shown below before we begin to use the functionality in this package.
In this example, we aim to integrate two Seurat objects with the overall analysis plan as below:
1. Normalize with SCTransform (includes identifying highly variable genes)
2. Convert to SingleCellExperiment and back to Seurat objects (many possible reasons for this in practice, but we keep it general by just converting back and forth)
We will note some roadblocks using the Seurat conversion implementation and introduce our implementation. Note that our implementation builds on the Seurat implementation and extends it.
First we load scanalysis and load our example Seurat object.
library(scanalysis)
library(magrittr)  # provides the %>% pipe used throughout
seurat_original = Seurat::pbmc_small
Next we normalize using SCTransform, which also identifies the highly variable genes.
seurat_normalized = seurat_original %>%
Seurat::SCTransform(verbose = FALSE)
Next we convert to a SingleCellExperiment object, using the Seurat implementation.
sce_converted = seurat_normalized %>%
Seurat::as.SingleCellExperiment()
We now convert this back into a Seurat object, and note the information lost in the conversion process:
seurat_converted = sce_converted %>%
Seurat::as.Seurat()
First we try to look for the highly variable genes in the converted object.
# Variable Features post-conversion:
head(Seurat::VariableFeatures(seurat_converted))
## logical(0)
# Variable Features pre-conversion:
head(Seurat::VariableFeatures(seurat_normalized))
## [1] "NKG7" "PPBP" "GNLY" "PF4" "GNG11" "GZMB"
In the converted object, the highly variable genes (HVGs) are lost. This is expected, as they were never stored in the SingleCellExperiment (SCE) object in the first place.
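A comparison like the one above can be wrapped in a small helper so that it is easy to repeat after any conversion. The following is a minimal sketch: compare_variable_features is a hypothetical name and is not part of Seurat or scanalysis, and the toy input simply mirrors the output printed above.

```r
# Hypothetical helper (not part of Seurat or scanalysis): summarise which
# variable features were lost or gained across a conversion round trip.
compare_variable_features <- function(pre, post) {
  pre <- as.character(pre)    # also turns logical(0) into character(0)
  post <- as.character(post)
  list(
    n_pre = length(pre),
    n_post = length(post),
    lost = setdiff(pre, post),      # present before, missing after
    gained = setdiff(post, pre),    # missing before, present after
    identical_order = identical(pre, post)
  )
}

# Toy input mirroring the output above: nothing survives the round trip.
hvg_check <- compare_variable_features(
  pre = c("NKG7", "PPBP", "GNLY", "PF4", "GNG11", "GZMB"),
  post = logical(0)
)
hvg_check$lost

# With the objects from this tutorial (assumes they exist in the session):
# compare_variable_features(
#   Seurat::VariableFeatures(seurat_normalized),
#   Seurat::VariableFeatures(seurat_converted)
# )
```

Working on plain character vectors keeps the helper independent of any particular Seurat version.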
Next we look at some assay-specific information. When SCTransform is run, it creates a new assay named SCT (or alternate experiment in SingleCellExperiment terminology). The details shown below generalize to any other assays, such as hashtags (commonly named HTO) or antibodies in a CITE-seq experiment (commonly named ADT).
# Alternate Experiment (SingleCellExperiment) names:
SingleCellExperiment::altExps(sce_converted)
## List of length 0
## names(0):
# Assay (Seurat) names post-conversion:
Seurat::Assays(seurat_converted)
## [1] "RNA"
# Assay (Seurat) names pre-conversion:
Seurat::Assays(seurat_normalized)
## [1] "RNA" "SCT"
Here, we see that the SCT assay was not converted into an alternate experiment in the SingleCellExperiment object, and that the converted Seurat object subsequently lost that information. This means that downstream analysis on the normalized data would not be possible after converting to SingleCellExperiment and back. If there were other assays such as ADT or HTO in the Seurat object, these would likewise be lost in the SingleCellExperiment and subsequently in the Seurat object.
We illustrate an example where this is an issue in an integration workflow. To stay within the scope of this tutorial, we begin to integrate an object with itself, following the SCTransform tab of the Seurat integration vignette. In practice, one would not integrate an object with itself, but doing so still illustrates the need for comprehensive conversion between object types.
integration_list = list(seurat_converted, seurat_converted)
features = Seurat::SelectIntegrationFeatures(object.list = integration_list, nfeatures = 3000)
integration_list = Seurat::PrepSCTIntegration(integration_list, anchor.features = features)
## Error: The following assays have not been processed with SCTransform:
## object: 1 - assay: RNA
## object: 2 - assay: RNA
This fails because the SCT assay information was not carried over. In fact, even if the assay were converted into an alternate experiment and back into an assay, it would still fail: the information in each slot of an Assay object must be carried into the SingleCellExperiment and back. In this particular case, the data from the misc slot of each assay must be transferred to the metadata of the SingleCellExperiment. Other information, such as VariableFeatures in the Seurat object, requires the addition of some new slots to the SingleCellExperiment object, which we detail below. For now, we illustrate how our implementation avoids all the above problems.
We start with the normalized object, and convert into SCE and immediately back into Seurat.
sce_converted_new = seurat_normalized %>%
seurat_to_sce()
seurat_converted_new = sce_converted_new %>%
sce_to_seurat()
Now we try to look for the highly variable genes in the converted object.
# Variable Features post-conversion:
head(Seurat::VariableFeatures(seurat_converted_new))
## [1] "NKG7" "PPBP" "GNLY" "PF4" "GNG11" "GZMB"
# Variable Features pre-conversion:
head(Seurat::VariableFeatures(seurat_normalized))
## [1] "NKG7" "PPBP" "GNLY" "PF4" "GNG11" "GZMB"
The highly variable genes were successfully retained during the conversion process.
Now we look at the assay information:
# Alternate Experiment (SingleCellExperiment) names:
SingleCellExperiment::altExps(sce_converted_new)
## List of length 1
## names(1): RNA
# Assay (Seurat) names post-conversion:
Seurat::Assays(seurat_converted_new)
## [1] "SCT" "RNA"
# Assay (Seurat) names pre-conversion:
Seurat::Assays(seurat_normalized)
## [1] "RNA" "SCT"
# Default Assay post-conversion:
Seurat::DefaultAssay(seurat_converted_new)
## [1] "SCT"
# Default Assay pre-conversion:
Seurat::DefaultAssay(seurat_normalized)
## [1] "SCT"
The assays once again were successfully retained during the conversion process.
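Note that the assay order differs after the round trip ("SCT" "RNA" versus "RNA" "SCT"), so a check for retained assays should compare sets rather than ordered vectors. A minimal sketch of such a check follows; assay_roundtrip_ok is a hypothetical helper, not a function from Seurat or scanalysis, and the second call uses an assumed post-conversion default assay purely for illustration.

```r
# Hypothetical helper: TRUE when the same set of assays and the same
# default assay are present before and after a conversion round trip.
assay_roundtrip_ok <- function(pre_assays, post_assays,
                               pre_default, post_default) {
  setequal(pre_assays, post_assays) && identical(pre_default, post_default)
}

# Values taken from the scanalysis round trip shown above.
ok_new <- assay_roundtrip_ok(
  pre_assays = c("RNA", "SCT"), post_assays = c("SCT", "RNA"),
  pre_default = "SCT", post_default = "SCT"
)

# Assay names from the Seurat round trip shown earlier; the post-conversion
# default assay ("RNA") is an assumption made for this illustration.
ok_old <- assay_roundtrip_ok(
  pre_assays = c("RNA", "SCT"), post_assays = "RNA",
  pre_default = "SCT", post_default = "RNA"
)

# With real objects (assumes they exist in the session):
# assay_roundtrip_ok(
#   Seurat::Assays(seurat_normalized), Seurat::Assays(seurat_converted_new),
#   Seurat::DefaultAssay(seurat_normalized),
#   Seurat::DefaultAssay(seurat_converted_new)
# )
```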
And the integration doesn’t fail:
integration_list = list(seurat_converted_new, seurat_converted_new)
features = Seurat::SelectIntegrationFeatures(object.list = integration_list, nfeatures = 3000)
integration_list = Seurat::PrepSCTIntegration(integration_list, anchor.features = features)
We don't finish the integration, as this was just to illustrate an error caused by the Seurat conversion implementation.
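One way to surface this kind of problem before calling the integration functions is to check that every object in the list still carries an SCT assay. The sketch below uses a hypothetical helper, missing_sct, which is not part of Seurat or scanalysis. As the failed example shows, the presence of the assay name is necessary but not sufficient, since the assay's internal slots must also have survived the conversion.

```r
# Hypothetical pre-flight check: given one character vector of assay names
# per object, return the positions of objects lacking the required assay.
missing_sct <- function(assay_names_per_object, required = "SCT") {
  has_required <- vapply(
    assay_names_per_object,
    function(assay_names) required %in% assay_names,
    logical(1)
  )
  which(!has_required)
}

# Assay names as printed earlier for the two round trips.
bad_old <- missing_sct(list("RNA", "RNA"))
bad_new <- missing_sct(list(c("SCT", "RNA"), c("SCT", "RNA")))

# With real objects (assumes integration_list exists in the session):
# missing_sct(lapply(integration_list, Seurat::Assays))
```

An empty result only says the assay name is present; the integration call itself remains the definitive test.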
Now we briefly describe how the conversion is implemented and where each piece of information lives in the Seurat and SingleCellExperiment objects. As both of these data types may evolve over time, these conversion implementations will need to evolve as well. If you encounter any loss of data not covered by this implementation, we ask that you create an issue so that we are aware and can implement a solution.
Each Assay in the Seurat object is stored as an altExp in the SingleCellExperiment object. Of note, the SingleCellExperiment stores its altExps as SingleCellExperiments nested under one main SingleCellExperiment. The Seurat object, by contrast, treats all Assays at the same level and marks one of them with a DefaultAssay attribute. When objects are converted, the Assay to be treated as the main one can be specified in the default_assay argument to seurat_to_sce. Interactively swapping the main experiment in the SingleCellExperiment, as would be done using the DefaultAssay function in Seurat, is not currently supported. However, all functions in this package take an altExp argument where it makes sense, and if altExp = NULL, they use the main, highest-level data.
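The altExp = NULL convention described above can be sketched as follows. pick_experiment is a hypothetical helper written for illustration and is not the package's own code; the toy SingleCellExperiment is built only if that package is installed, and the commented seurat_to_sce call assumes that default_assay accepts an assay name, as the text describes.

```r
# Hypothetical helper mirroring the convention: altExp = NULL returns the
# main experiment, otherwise the named alternate experiment is returned.
pick_experiment <- function(sce, altExp = NULL) {
  if (is.null(altExp)) {
    return(sce)
  }
  SingleCellExperiment::altExp(sce, altExp)
}

if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  # Toy data: 2 genes by 3 cells, reused for the main and alternate experiment.
  toy_counts <- matrix(
    1:6, nrow = 2,
    dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
  )
  toy_alt <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = toy_counts)
  )
  toy_sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = toy_counts),
    altExps = list(RNA = toy_alt)
  )
  main_data <- pick_experiment(toy_sce)         # main, highest-level data
  rna_data <- pick_experiment(toy_sce, "RNA")   # nested alternate experiment
}

# Choosing the main experiment at conversion time (assumes scanalysis is
# loaded and seurat_normalized exists):
# sce_rna_main <- seurat_to_sce(seurat_normalized, default_assay = "RNA")
# SingleCellExperiment::altExpNames(sce_rna_main)
```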
Within each object, information from each Assay is added into the corresponding (altExp) SingleCellExperiment. Information from the misc slot in the Seurat objects is transferred to the metadata attribute of the SingleCellExperiment. There is no slot for the scaled data in a SingleCellExperiment, so it is stored in metadata(sce)$scaled. The same is true of the variable features, which are stored in metadata(sce)$variable_features.
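As a rough illustration of reading these entries back, the sketch below looks up the two fields named above in a metadata list. get_stored is a hypothetical helper, and the toy list only imitates the layout described in the text; with a real converted object the list would come from S4Vectors::metadata().

```r
# Hypothetical accessor: look up one of the entries that the conversion is
# described as placing in the metadata, warning if it is absent.
get_stored <- function(meta, field = c("variable_features", "scaled")) {
  field <- match.arg(field)
  value <- meta[[field]]
  if (is.null(value)) {
    warning("No '", field, "' entry found in the metadata.")
  }
  value
}

# Toy metadata list imitating the layout described above.
toy_meta <- list(
  variable_features = c("NKG7", "PPBP"),
  scaled = matrix(0, nrow = 2, ncol = 3)
)
stored_hvgs <- get_stored(toy_meta, "variable_features")
stored_scaled <- get_stored(toy_meta, "scaled")

# With a converted object (assumes sce_converted_new exists in the session):
# get_stored(S4Vectors::metadata(sce_converted_new), "variable_features")
# get_stored(
#   S4Vectors::metadata(SingleCellExperiment::altExp(sce_converted_new, "RNA")),
#   "scaled"
# )
```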