This repository describes how to run a pySCENIC gene regulatory network inference analysis alongside a basic "best practices" expression analysis for single-cell data. It includes:

* Standalone Jupyter notebooks for an interactive analysis
* A Nextflow DSL1 workflow, which provides a semi-automated and streamlined method for running these steps
* Details on pySCENIC installation, usage, and downstream analysis
See also the associated publication in Nature Protocols: https://doi.org/10.1038/s41596-020-0336-2.
For an advanced implementation of the steps in this protocol, see VSN Pipelines, a Nextflow DSL2 implementation of pySCENIC with comprehensive and customizable pipelines for expression analysis. This includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).
We recommend using this notebook as a template for running an interactive analysis in Jupyter. See the installation instructions for information on setting up a kernel with pySCENIC and other required packages.
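As a minimal sketch of the kernel setup (the environment name, Python version, and exact package set here are assumptions; see the installation instructions for the authoritative list):

```bash
# Create and activate a dedicated environment (name is arbitrary)
conda create -n scenic_protocol python=3.7
conda activate scenic_protocol

# Install pySCENIC plus packages used in the notebooks
pip install pyscenic scanpy ipykernel

# Register the environment as a Jupyter kernel
python -m ipykernel install --user --name scenic_protocol
```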
The following tools are required to run the steps in this Nextflow pipeline:

* Nextflow
* A container system, either of:
  * Docker
  * Singularity
The following container images will be pulled by Nextflow as needed:

* Docker: aertslab/pyscenic:latest
* Singularity: aertslab/pySCENIC:latest
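If you prefer to fetch the images before running the pipeline (optional, since Nextflow pulls them automatically), something like the following should work:

```bash
# Pull the Docker image
docker pull aertslab/pyscenic:latest

# Or build a local Singularity image from the Docker image
singularity pull docker://aertslab/pyscenic:latest
```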
A quick test can be accomplished using the test profile, which automatically pulls the testing dataset (described in full below):
```bash
nextflow run aertslab/SCENICprotocol \
    -profile docker,test
```
This small test dataset takes approximately 70s to run using 6 threads on a standard desktop computer.
Alternatively, the same data can be run with a more verbose approach (this better illustrates how to substitute other data into the pipeline). Download a minimal set of SCENIC database files for a human dataset (approximately 78 MB):
```bash
mkdir example && cd example/

# Transcription factors:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/test_TFs_tiny.txt

# Motif-to-TF annotation database:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/motifs.tbl

# Ranking databases:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/genome-ranking.feather

# Finally, get a tiny sample expression matrix (loom format):
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/expr_mat_tiny.loom
```
Either Docker or Singularity images can be used by specifying the appropriate profile (`-profile docker` or `-profile singularity`).
Please note that for the tiny test dataset to run successfully, the default thresholds need to be lowered.
```bash
nextflow run aertslab/SCENICprotocol \
    -profile docker \
    --loom_input expr_mat_tiny.loom \
    --loom_output pyscenic_integrated-output.loom \
    --TFs test_TFs_tiny.txt \
    --motifs motifs.tbl \
    --db *feather \
    --thr_min_genes 1
```
By default, this pipeline uses the container specified by the `--pyscenic_container` parameter. This is currently set to `aertslab/pyscenic:0.9.19`, a container with both pySCENIC and Scanpy `1.4.4.post1` installed. A custom container (e.g. one built on a local machine) can be used by passing its name to the `--pyscenic_container` parameter.
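As an illustration (the image name `my-pyscenic:dev` and the presence of a local Dockerfile are hypothetical), a locally built image could be substituted like this:

```bash
# Build a local image from your own Dockerfile
docker build -t my-pyscenic:dev .

# Point the pipeline at the custom container
nextflow run aertslab/SCENICprotocol \
    -profile docker \
    --pyscenic_container my-pyscenic:dev \
    --loom_input expr_mat_tiny.loom \
    --loom_output pyscenic_integrated-output.loom \
    --TFs test_TFs_tiny.txt \
    --motifs motifs.tbl \
    --db *feather
```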
The output of this pipeline is a loom-formatted file (by default: `output/pyscenic_integrated-output.loom`) containing:

* The original expression matrix
* The pySCENIC-specific results:
  * Regulons (TFs and their target genes)
  * AUCell matrix (cell enrichment scores for each regulon)
  * Dimensionality reduction embeddings based on the AUCell matrix (t-SNE, UMAP)
* Results from the parallel best-practices analysis using highly variable genes:
  * Dimensionality reduction embeddings (t-SNE, UMAP)
  * Louvain clustering annotations
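A minimal sketch of inspecting this file with loompy (the exact attribute names printed depend on the pySCENIC version and are not guaranteed):

```python
import loompy

# Open the integrated output loom and list what it contains
with loompy.connect("output/pyscenic_integrated-output.loom") as ds:
    print(ds.shape)      # (n_genes, n_cells) of the expression matrix
    print(ds.ra.keys())  # row (gene) attributes, e.g. regulon assignments
    print(ds.ca.keys())  # column (cell) attributes, e.g. AUCell scores, embeddings, clusters
```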
On the conda installation page it says `conda create -n scenic_protocol python=3.6`; however, when running `pip install pyscenic`, it reports that pandas 1.3.5 is not available, and according to pandas, 1.3.5 requires Python 3.7 or higher. (I installed successfully after re-creating the environment with `python=3.7`; opening this issue to suggest modifying the installation instructions. The working setup is sketched below.)
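A minimal sketch of the workaround described in this issue (assuming conda is installed):

```bash
# Python 3.7 satisfies the pandas 1.3.5 requirement pulled in by pySCENIC
conda create -n scenic_protocol python=3.7
conda activate scenic_protocol
pip install pyscenic
```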
I've generated a cisTarget database for zebrafish, but I'm stuck at the initializeScenic step. May I ask how to bypass the requirement in `initializeScenic()` that `org` should be one of: mgi, hgnc, dmel?
Bumps certifi from 2019.9.11 to 2022.12.7.
Commits:

* `9e9e840` 2022.12.07
* `b81bdb2` 2022.09.24
* `939a28f` 2022.09.14
* `aca828a` 2022.06.15.2
* `de0eae1` Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...
* `b8eb5e9` 2022.06.15.1
* `47fb7ab` Fix deprecation warning on Python 3.11 (#199)
* `b0b48e0` fixes #198 -- update link in license
* `9d514b4` 2022.06.15
* `4151e88` Add py.typed to MANIFEST.in to package in sdist (#196)
Bumps joblib from 0.14.0 to 1.2.0.
Sourced from joblib's changelog.
Release 1.2.0

* Fix a security issue where `eval(pre_dispatch)` could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327
* Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide. joblib/joblib#1256
* Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263
* Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with `mmap_mode != None`, as the resulting `numpy.memmap` object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform-specific assembly. joblib/joblib#1254
* Vendor cloudpickle 2.2.0, which adds support for PyPy 3.8+.
* Vendor loky 3.3.0, which fixes several bugs including:
  * robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);
  * avoiding leaking worker processes in case of nested loky parallel calls;
  * reliably spawning the correct number of reusable workers.

Release 1.1.0

* Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181
* Fix joblib.Memory bug with the `ignore` parameter when the cached function is a decorated function.

... (truncated)
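For context on the `pre_dispatch` fix above, a minimal sketch (not from the changelog) of the kind of usage that remains supported: simple numeric expressions over `n_jobs` are still evaluated, while arbitrary code no longer is.

```python
from joblib import Parallel, delayed

# pre_dispatch controls how many tasks are dispatched ahead of time;
# basic numeric expressions like "2 * n_jobs" remain supported in joblib >= 1.2.0.
results = Parallel(n_jobs=2, pre_dispatch="2 * n_jobs")(
    delayed(pow)(i, 2) for i in range(8)
)
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```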
Commits:

* `5991350` Release 1.2.0
* `3fa2188` MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
* `cea26ff` CI test the future loky-3.3.0 branch (#1338)
* `8aca6f4` MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
* `067ed4f` XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
* `ac4ebd5` MAINT add back pytest warnings plugin (#1337)
* `a23427d` Test child raises parent exits cleanly more reliable on macos (#1335)
* `ac09691` [MAINT] various test updates (#1334)
* `4a314b1` Vendor loky 3.2.0 (#1333)
* `bdf47e9` Make test_parallel_with_interactively_defined_functions_default_backend timeo...
The example for creating a SCope-compatible loom file calls an object's metadata. However, the loom file, at least as it is output by pySCENIC, seems to contain no metadata field.
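A hedged sketch for checking whether a loom file carries the SCope `MetaData` global attribute (the attribute name follows SCope conventions, and in some pySCENIC/SCope versions the value is base64/zlib-compressed rather than plain JSON):

```python
import json
import loompy

# Look for the SCope "MetaData" global attribute in a pySCENIC output loom
with loompy.connect("pyscenic_output.loom") as ds:
    if "MetaData" in ds.attrs.keys():
        meta = json.loads(ds.attrs["MetaData"])
        print(sorted(meta.keys()))
    else:
        print("No MetaData global attribute in this file.")
```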
Changes summary:
* GRN inference step: multiprocessing is now used by default in place of dask to run GRNBoost2
* Images are now always pulled from DockerHub
* Requirements file updated
* New tiny test dataset (reduced run time) and test profile
* Additions to the PBMC10k tutorial (AUC threshold)
* Code and readme cleanup
* An additional (optional) parameter passes a fixed seed to the GRNBoost2 algorithm (e.g. `--seed 777`; see the example below)
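A sketch of how the seed parameter might be combined with the tiny test invocation shown earlier (input filenames as in that example):

```bash
nextflow run aertslab/SCENICprotocol \
    -profile docker \
    --loom_input expr_mat_tiny.loom \
    --loom_output pyscenic_integrated-output.loom \
    --TFs test_TFs_tiny.txt \
    --motifs motifs.tbl \
    --db *feather \
    --thr_min_genes 1 \
    --seed 777
```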