A scalable SCENIC workflow for single-cell gene regulatory network analysis

aertslab, updated 2022-12-08 07:04:22

This repository describes how to run a pySCENIC gene regulatory network inference analysis alongside a basic "best practices" expression analysis for single-cell data. This includes:

  • Standalone Jupyter notebooks for an interactive analysis
  • A Nextflow DSL1 workflow, which provides a semi-automated and streamlined method for running these steps
  • Details on pySCENIC installation, usage, and downstream analysis

See also the associated publication in Nature Protocols: https://doi.org/10.1038/s41596-020-0336-2.

For an advanced implementation of the steps in this protocol, see VSN Pipelines, a Nextflow DSL2 implementation of pySCENIC with comprehensive and customizable pipelines for expression analysis. This includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).

Overview

SCENIC workflow diagram


Quick start

Running the pySCENIC pipeline in a Jupyter notebook

We recommend using this notebook as a template for running an interactive analysis in Jupyter. See the installation instructions for information on setting up a kernel with pySCENIC and other required packages.

Running the Nextflow pipeline on the example dataset

Requirements (Nextflow/containers)

The following tools are required to run the steps in this Nextflow pipeline:

  • Nextflow
  • A container system, either of:
    • Docker
    • Singularity

The following container images will be pulled by nextflow as needed:

  • Docker: aertslab/pyscenic:latest.
  • Singularity: aertslab/pySCENIC:latest.
  • See also here.

Using the test profile

A quick test can be accomplished using the test profile, which automatically pulls the testing dataset (described in full below):

nextflow run aertslab/SCENICprotocol \
    -profile docker,test

This small test dataset takes approximately 70s to run using 6 threads on a standard desktop computer.

Download testing dataset

Alternatively, the same data can be run with a more verbose approach, which better illustrates how to substitute other data into the pipeline. Download a minimum set of SCENIC database files for a human dataset (approximately 78 MB).

mkdir example && cd example/
# Transcription factors:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/test_TFs_tiny.txt
# Motif to TF annotation database:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/motifs.tbl
# Ranking databases:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/genome-ranking.feather
# Finally, get a tiny sample expression matrix (loom format):
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/expr_mat_tiny.loom
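
Before launching the pipeline, it can help to confirm that all four files arrived intact. The following is a minimal sketch and not part of the repository: the helper name `missing_inputs` is hypothetical, and it only checks that each file named in the wget commands above exists and is non-empty in the given directory.

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_FILES = [
    "test_TFs_tiny.txt",       # transcription factors
    "motifs.tbl",              # motif-to-TF annotations
    "genome-ranking.feather",  # ranking database
    "expr_mat_tiny.loom",      # tiny expression matrix
]


def missing_inputs(directory=".", expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in `directory`."""
    base = Path(directory)
    return [
        name
        for name in expected
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All example input files are present.")
```

An empty file is treated as missing because an interrupted download can leave a zero-byte file behind.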

Running the example pipeline

Either Docker or Singularity images can be used by specifying the appropriate profile (-profile docker or -profile singularity). Please note that for the tiny test dataset to run successfully, the default thresholds need to be lowered; the example below does this with --thr_min_genes 1.

Using loom input

nextflow run aertslab/SCENICprotocol \
    -profile docker \
    --loom_input expr_mat_tiny.loom \
    --loom_output pyscenic_integrated-output.loom \
    --TFs test_TFs_tiny.txt \
    --motifs motifs.tbl \
    --db *feather \
    --thr_min_genes 1
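
When the pipeline is launched from a script or notebook rather than typed by hand, the same invocation can be assembled programmatically. The sketch below is illustrative only: `build_nextflow_command` is a hypothetical helper, not part of SCENICprotocol, and it simply reproduces the flags shown above. Because Python, unlike the shell, does not expand `*feather`, the ranking database is passed by its file name from the download step.

```python
import shlex


def build_nextflow_command(params, profile="docker", pipeline="aertslab/SCENICprotocol"):
    """Assemble the argument list for `nextflow run` from a dict of pipeline parameters."""
    command = ["nextflow", "run", pipeline, "-profile", profile]
    for name, value in params.items():
        # Pipeline parameters take two dashes; Nextflow's own options (e.g. -profile) take one.
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    # Same values as the command-line example above.
    params = {
        "loom_input": "expr_mat_tiny.loom",
        "loom_output": "pyscenic_integrated-output.loom",
        "TFs": "test_TFs_tiny.txt",
        "motifs": "motifs.tbl",
        "db": "genome-ranking.feather",
        "thr_min_genes": 1,
    }
    command = build_nextflow_command(params)
    # Print a copy-pasteable, shell-quoted version of the command.
    print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting issues.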

By default, this pipeline uses the container specified by the --pyscenic_container parameter. This is currently set to aertslab/pyscenic:0.9.19, an image with both pySCENIC and Scanpy 1.4.4.post1 installed. A custom container (e.g. one built on a local machine) can be used by passing its name to the --pyscenic_container parameter.

Expected output

The output of this pipeline is a loom-formatted file (by default: output/pyscenic_integrated-output.loom) containing:

  • The original expression matrix
  • The pySCENIC-specific results:
    • Regulons (TFs and their target genes)
    • AUCell matrix (cell enrichment scores for each regulon)
    • Dimensionality reduction embeddings based on the AUCell matrix (t-SNE, UMAP)
  • Results from the parallel best-practices analysis using highly variable genes:
    • Dimensionality reduction embeddings (t-SNE, UMAP)
    • Louvain clustering annotations

General requirements for this workflow

  • Python version 3.6 or greater
  • Tested on various Unix-like systems (Ubuntu 18.04, CentOS 7.6.1810, macOS 10.14.5)

References and more information

SCENIC

SCope

Scanpy

Issues

Installation says Python 3.6 but pandas 1.3.5 requires Python 3.7+

opened on 2023-03-01 10:05:36 by Jiayi-Zheng

The conda installation page says conda create -n scenic_protocol python=3.6; however, when running pip install pyscenic, pip reports that pandas 1.3.5 is not available, and according to pandas, version 1.3.5 is only available for Python 3.7 or higher.

(Installed successfully after re-creating the environment with python=3.7; opening this issue to suggest updating the installation instructions.)
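
A guard along these lines can catch the mismatch before any packages are installed. It is only a sketch: the 3.7 minimum is taken from the report above rather than from pySCENIC's own packaging metadata, and `python_is_supported` is a hypothetical name.

```python
import sys

# Minimum taken from the issue report above (pandas 1.3.5 needs Python 3.7+);
# this is an assumption, not a value read from pySCENIC's metadata.
MINIMUM_PYTHON = (3, 7)


def python_is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; create the environment with python=3.7 or later.")
```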

How to apply SCENIC to another species (zebrafish) besides human, mouse, and fly?

opened on 2023-01-04 06:35:10 by kimjisoo18

I've generated a cisTarget database for zebrafish, but I'm stuck at the initializeScenic step. May I ask how to bypass the requirement in initializeScenic() that 'org' should be one of: mgi, hgnc, dmel?

Bump certifi from 2019.9.11 to 2022.12.7

opened on 2022-12-08 07:04:21 by dependabot[bot]

Bumps certifi from 2019.9.11 to 2022.12.7.

Problem installing VSN_Pipelines

opened on 2022-10-25 21:47:16 by StevenTur

Bump joblib from 0.14.0 to 1.2.0

opened on 2022-09-30 18:35:40 by dependabot[bot]

Bumps joblib from 0.14.0 to 1.2.0.

Changelog

Sourced from joblib's changelog.

Release 1.2.0

  • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

  • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

  • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

  • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

  • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

  • Vendor loky 3.3.0 which fixes several bugs including:

    • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

    • avoiding leaking worker processes in case of nested loky parallel calls;

    • reliably spawning the correct number of reusable workers.

Release 1.1.0

  • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

  • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

... (truncated)

Commits
  • 5991350 Release 1.2.0
  • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
  • cea26ff CI test the future loky-3.3.0 branch (#1338)
  • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
  • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
  • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
  • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
  • ac09691 [MAINT] various test updates (#1334)
  • 4a314b1 Vendor loky 3.2.0 (#1333)
  • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
  • Additional commits viewable in compare view


Metadata for aucell output

opened on 2022-09-06 23:02:49 by LinearParadox

The example for creating a SCope-compatible loom file accesses an object's metadata. However, the loom file, at least as output by pySCENIC, seems to contain no metadata field.

Releases

v0.2.0 2020-02-14 10:14:19

Changes summary:

  • GRN inference step: multiprocessing is now used by default in place of dask to run GRNBoost2
  • Images are now always pulled from DockerHub
  • Requirements file updated
  • New tiny test dataset (reduced run time) and test profile
  • Additions to the PBMC10k tutorial (AUC threshold)
  • Code and readme cleanup
  • Additional (optional) parameter passes a fixed seed to the GRNBoost2 algorithm (e.g. --seed 777)