Chanjo is coverage analysis for clinical sequencing. It is implemented in Python with a command line interface that adheres to the UNIX pipeline philosophy.
If you find chanjo useful in your project, please cite the article.
Chanjo is distributed through pip. Install the latest stable release by running:

```bash
pip install chanjo
```
... or locally for development:

```bash
git clone https://github.com/Clinical-Genomics/chanjo.git
cd chanjo
conda install --channel bioconda sambamba
pip install -r requirements-dev.txt --editable .
```
Chanjo exposes a decomposable command line interface with a nifty config file implementation.
```bash
chanjo init --setup
chanjo load /path/to/sambamba.output.bed
chanjo calculate mean
{"metrics": {"completeness_10": 90.92, "mean_coverage": 193.85}, "sample_id": "sample1"}
```
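The JSON line printed by `chanjo calculate mean` is easy to consume programmatically. A minimal Python sketch, using the example output above as a literal:

```python
import json

# Example output line from `chanjo calculate mean`, copied from above.
line = '{"metrics": {"completeness_10": 90.92, "mean_coverage": 193.85}, "sample_id": "sample1"}'

record = json.loads(line)
# Pick out the per-sample metrics.
sample = record["sample_id"]
mean_cov = record["metrics"]["mean_coverage"]
completeness = record["metrics"]["completeness_10"]
print(f"{sample}: mean={mean_cov}, completeness_10={completeness}%")
# → sample1: mean=193.85, completeness_10=90.92%
```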
When running the dockerized version of Chanjo the setup process is slightly different. Chanjo depends on a configuration file `config.yaml` and either an SQLite database `chanjo.coverage.sqlite3` or a MySQL database, which are created at initialization. For convenience, we provide a docker-compose file containing a mariadb (MySQL-based) service and the chanjo command line that can be used to set up a demo instance of Chanjo.

Since the database setup (`chanjo init` command) and the sample data insertion are executed by two distinct instances of the same service (chanjo-cli), Docker volumes must be used to make sure the database instance retains its data between the two steps.

The following examples demonstrate how to set up Chanjo using the docker-compose file with the default definition of exons (init demo files are present in the folder `chanjo/init/demo-files`). The config file and the created database will be stored on the host in a folder named `data`, which is mirrored by the folder `/home/worker/data` in the chanjo container. Other exon definitions can be used by mounting them to the container.
```bash
docker-compose build
```

```bash
docker-compose run --rm -v "${PWD}/data:/home/worker/data" -v "${PWD}/data/database:/home/worker/data/database" chanjo-cli bash -c "chanjo -d mysql+pymysql://chanjoUser:[email protected]/chanjo4_test init --auto /home/worker/data && chanjo --config /home/worker/data/chanjo.yaml link /home/worker/data/hgnc.grch37p13.exons.bed"
```
This initial step will create a `data` folder containing 2 files:

- `hgnc.grch37p13.exons.bed` --> exon definitions
- `chanjo.yaml` --> contains the database URI, so in the next step you can use this config file instead of `-d mysql+pymysql://chanjoUser:[email protected]/chanjo4_test`
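The database URI passed to `chanjo -d` (and stored in `chanjo.yaml`) follows the SQLAlchemy URL format: `dialect+driver://user:password@host/database`. A small sketch of how the pieces fit together, using hypothetical placeholder credentials (substitute whatever your MySQL/MariaDB service actually uses):

```python
# Hypothetical values for illustration only; not real credentials.
user = "chanjoUser"
password = "secret"       # placeholder password
host = "mariadb"          # assumed hostname of the MariaDB service
database = "chanjo4_test"

# SQLAlchemy-style URL, as accepted by `chanjo -d` or the config file.
uri = f"mysql+pymysql://{user}:{password}@{host}/{database}"
print(uri)  # → mysql+pymysql://chanjoUser:secret@mariadb/chanjo4_test
```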
```bash
docker-compose run --rm -v "${PWD}/data:/home/worker/data" -v "${PWD}/data/database:/home/worker/data/database" chanjo-cli bash -c "chanjo --config /home/worker/data/chanjo.yaml load /home/worker/app/chanjo/init/demo-files/sample1.coverage.bed"
```
The same two steps can instead be run against the default SQLite database (no `-d` option at init, and the SQLite file mounted into the container for the load step):

```bash
docker-compose run --rm -v "${PWD}/data:/home/worker/data" -v "${PWD}/data/database:/home/worker/data/database" chanjo-cli bash -c "chanjo init --auto /home/worker/data && chanjo --config /home/worker/data/chanjo.yaml link /home/worker/data/hgnc.grch37p13.exons.bed"
docker-compose run --rm -v "${PWD}/data/chanjo.coverage.sqlite3:/home/worker/app/chanjo.coverage.sqlite3" -v "${PWD}/data:/home/worker/data" chanjo-cli bash -c "chanjo --config /home/worker/data/chanjo.yaml load /home/worker/app/chanjo/init/demo-files/sample1.coverage.bed"
```
The official documentation is hosted on Read the Docs.
If you are looking to learn more about handling sequence coverage data in clinical sequencing, feel free to download and skim through my own Master's thesis and article references.
Chanjo leverages Sambamba to annotate coverage and completeness for a general BED-file. The output can then easily be loaded into a SQL database that enables investigation of coverage across regions and samples. The database also works as an API to downstream tools like the Chanjo Coverage Report generator.
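The two metrics Chanjo stores per region can be illustrated with a toy computation (this is not Chanjo's code, just the definitions): mean coverage is the average read depth across the region, and completeness at cutoff 10 is the fraction of bases covered at depth ≥ 10.

```python
# Hypothetical per-base read depths across one small region.
depths = [12, 8, 30, 0, 25, 14, 9, 40]

# Mean coverage: average depth over all bases in the region.
mean_coverage = sum(depths) / len(depths)
# Completeness at cutoff 10: fraction of bases with depth >= 10.
completeness_10 = sum(d >= 10 for d in depths) / len(depths)

print(mean_coverage)    # → 17.25
print(completeness_10)  # → 0.625
```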
Chanjo is not the right choice if you care about coverage for every base across the entire genome. Detailed histograms are something BEDTools already handles with confidence.
MIT. See the LICENSE file for more details.
Anyone can help make this project better - read CONTRIBUTION to get started!
Bumps ipython from 7.13.0 to 8.10.0.

Sourced from ipython's releases.

See https://pypi.org/project/ipython/

We do not use GitHub release anymore. Please see PyPI https://pypi.org/project/ipython/

Commits:
- 15ea1ed release 8.10.0
- 560ad10 DOC: Update what's new for 8.10 (#13939)
- 7557ade DOC: Update what's new for 8.10
- 385d693 Merge pull request from GHSA-29gw-9793-fvw7
- e548ee2 Swallow potential exceptions from showtraceback() (#13934)
- 0694b08 MAINT: mock slowest test. (#13885)
- 8655912 MAINT: mock slowest test.
- a011765 Isolate the attack tests with setUp and tearDown methods
- c7a9470 Add some regression tests for this change
- fd34cf5 Swallow potential exceptions from showtraceback()

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Bumps certifi from 2020.4.5.1 to 2022.12.7.

Commits:
- 9e9e840 2022.12.07
- b81bdb2 2022.09.24
- 939a28f 2022.09.14
- aca828a 2022.06.15.2
- de0eae1 Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...
- b8eb5e9 2022.06.15.1
- 47fb7ab Fix deprecation warning on Python 3.11 (#199)
- b0b48e0 fixes #198 -- update link in license
- 9d514b4 2022.06.15
- 4151e88 Add py.typed to MANIFEST.in to package in sdist (#196)
Bumps joblib from 0.14.1 to 1.2.0.

Sourced from joblib's changelog.

Release 1.2.0

- Fix a security issue where `eval(pre_dispatch)` could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327
- Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide. joblib/joblib#1256
- Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263
- Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with `mmap_mode != None`, as the resulting `numpy.memmap` object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform-specific assembly. joblib/joblib#1254
- Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.
- Vendor loky 3.3.0 which fixes several bugs including:
  - robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);
  - avoiding leaking worker processes in case of nested loky parallel calls;
  - reliably spawning the correct number of reusable workers.

Release 1.1.0

- Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181
- Fix joblib.Memory bug with the `ignore` parameter when the cached function is a decorated function.

... (truncated)

Commits:
- 5991350 Release 1.2.0
- 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
- cea26ff CI test the future loky-3.3.0 branch (#1338)
- 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
- 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
- ac4ebd5 MAINT add back pytest warnings plugin (#1337)
- a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
- ac09691 [MAINT] various test updates (#1334)
- 4a314b1 Vendor loky 3.2.0 (#1333)
- bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
Bumps mkdocs from 1.1 to 1.2.3.

Sourced from mkdocs's releases.

1.2.3

MkDocs 1.2.3 is a bugfix release for MkDocs 1.2.

Aside: MkDocs has a new chat room on Gitter/Matrix. More details.

Improvements:
- Built-in themes now also support these languages:
- Third-party plugins will take precedence over built-in plugins with the same name (#2591)
- Bugfix: Fix ability to load translations for some languages: core support (#2565) and search plugin support with fallbacks (#2602)
- Bugfix (regression in 1.2): Prevent directory traversal in the dev server (#2604)
- Bugfix (regression in 1.2): Prevent webserver warnings from being treated as a build failure in strict mode (#2607)
- Bugfix: Correctly print colorful messages in the terminal on Windows (#2606)
- Bugfix: Python version 3.10 was displayed incorrectly in `--version` (#2618)

Other small improvements; see commit log.

1.2.2

MkDocs 1.2.2 is a bugfix release for MkDocs 1.2 -- make sure you've seen the "major" release notes as well.

- Bugfix (regression in 1.2): Fix serving files/paths with Unicode characters (#2464)
- Bugfix (regression in 1.2): Revert livereload file watching to use polling observer (#2477)
  This had to be done to reasonably support usages that span virtual filesystems such as non-native Docker and network mounts. This goes back to the polling approach, very similar to that which was always used prior, meaning most of the same downsides with latency and CPU usage.
- Revert from 1.2: Remove the requirement of a `site_url` config and the restriction on `use_directory_urls` (#2490)
- Bugfix (regression in 1.2): Don't require trailing slash in the URL when serving a directory index in `mkdocs serve` server (#2507)
  Instead of showing a 404 error, detect if it's a directory and redirect to a path with a trailing slash added, like before.
- Bugfix: Fix `gh_deploy` with config-file in the current directory (#2481)
- Bugfix: Fix reversed breadcrumbs in "readthedocs" theme (#2179)
- Allow "mkdocs.yaml" as the file name when '--config' is not passed (#2478)

... (truncated)

Commits:
- d167eab Release 1.2.3 (#2614)
- 5629b09 Re-format translation files to pass a lint check (#2621)
- 2c4679b Re-format translation files to pass a lint check (#2620)
- 9262cc5 Fix the code to abbreviate Python's version (#2618)
- 8345850 Add hint about `-f`/`--config-file` in configuration documentation (#2616)
- 815af48 Added translation for Brazilian Portuguese (#2535)
- 6563439 Update contact instructions: announce chat, preference for issues (#2610)
- 6b72eef We can again announce support of zh_CN locale (#2609)
- b18ae29 Drop `assert_mock_called_once` compat method from tests (#2611)
- 7a27572 Isolate strict warning counter to just the ongoing build (#2607)
Bumps nltk from 3.5 to 3.6.6.

Sourced from nltk's changelog.

Version 3.7 2022-02-09
- Improve and update the NLTK team page on nltk.org (#2855, #2941)
- Drop support for Python 3.6, support Python 3.10 (#2920)

Version 3.6.7 2021-12-28
- Resolve IndexError in `sent_tokenize` and `word_tokenize` (#2922)

Version 3.6.6 2021-12-21
- Refactor `gensim.doctest` to work for gensim 4.0.0 and up (#2914)
- Add Precision, Recall, F-measure, Confusion Matrix to Taggers (#2862)
- Added warnings if .zip files exist without any corresponding .csv files. (#2908)
- Fix `FileNotFoundError` when the `download_dir` is a non-existing nested folder (#2910)
- Rename omw to omw-1.4 (#2907)
- Resolve ReDoS opportunity by fixing incorrectly specified regex (#2906)
- Support OMW 1.4 (#2899)
- Deprecate Tree get and set node methods (#2900)
- Fix broken inaugural test case (#2903)
- Use Multilingual Wordnet Data from OMW with newer Wordnet versions (#2889)
- Keep NLTK's "tokenize" module working with pathlib (#2896)
- Make prettyprinter to be more readable (#2893)
- Update links to the nltk book (#2895)
- Add `CITATION.cff` to nltk (#2880)
- Resolve serious ReDoS in PunktSentenceTokenizer (#2869)
- Delete old CI config files (#2881)
- Improve Tokenize documentation + add TokenizerI as superclass for TweetTokenizer (#2878)
- Fix expected value for BLEU score doctest after changes from #2572
- Add multi Bleu functionality and tests (#2793)
- Deprecate 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer (#2883)
- Allow empty string in CFG's + more (#2888)
- Partition `tree.py` module into `tree` package + pickle fix (#2863)
- Fix several TreebankWordTokenizer and NLTKWordTokenizer bugs (#2877)
- Rewind Wordnet data file after each lookup (#2868)
- Correct init call for SyntaxCorpusReader subclasses (#2872)
- Documentation fixes (#2873)
- Fix levenstein distance for duplicated letters (#2849)
- Support alternative Wordnet versions (#2860)
- Remove hundreds of formatting warnings for nltk.org (#2859)
- Modernize `nltk.org/howto` pages (#2856)
- Fix Bleu Score smoothing function from taking log(0) (#2839)
- Update third party tools to newer versions and removing MaltParser fixed version (#2832)
- Fix TypeError: _pretty() takes 1 positional argument but 2 were given in sem/drt.py (#2854)
- Replace `http` with `https` in most URLs (#2852)

Thanks to the following contributors to 3.6.6: Adam Hawley, BatMrE, Danny Sepler, Eric Kafe, Gavish Poddar, Panagiotis Simakis, RnDevelover, Robby Horvath, Tom Aarsen, Yuta Nakamura, Mohaned Mashaly

... (truncated)

Commits:
- 4862b09 updates for 3.6.6
- 6b60213 Refactor `gensim.doctest` to work for gensim 4.0.0 and up (#2914)
- 59aa3fb Fix decode error for bllip parser (#2897)
- a28d256 Add Precision, Recall, F-measure, Confusion Matrix to Taggers (#2862)
- 72d9885 Added warnings if .zip files exist without any corresponding .csv files. (#2908)
- dea7b44 Fix `FileNotFoundError` when the `download_dir` is a non-existing nested fold...
- abbe86b Undo #2909 due to unexpected test failure
- c075dab Allow commits with `/nocache` to not use the cache (#2909)
- d6d513d Renamed omw to omw-1.4 (#2907)
- 2a50a3e Resolve ReDoS opportunity by fixing incorrectly specified regex (#2906)
```
pymysql.err.OperationalError: (1055, "Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'chanjo4.transcript_stat.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by")
```
A brand new release of Chanjo is upon us :tada:
This version integrates with Sambamba and drops Pysam/Samtools as a dependency. It also expands on the possibilities to directly calculate coverage metrics from the command line! Please refer to the updated documentation and the CHANGELOG for more on what's new - enjoy!
And a big thanks to @moonso who is joining as a new core contributor for this release :smile:
Get started by:

```bash
pip install --upgrade chanjo
```
- We have really done a lot of work to clean up the interface
- You can now directly query the database for interesting coverage metrics
- We rely on Sambamba to generate coverage stats from BAM files
Chanjo can now be referenced in scientific journals using DOI.
Chanjo will now automatically keep track of which cutoff you used for completeness, creation dates, and source BAM file for each sample.
The docs have also been updated with info on the SQL structure.
Chanjo is now able to handle multiple samples and store them in the same SQLite database. This is great for comparing different samples and keeping all coverage annotations in one place.
A lot of (breaking) changes have had to be made to get to this point. Do check out the updated "chanjo-autopilot" help text if you plan on using the CLI.
I've now also limited Chanjo in scope: it will be focused on reading coverage data and annotating elements. Everything else downstream will be handled by a separate, although connected, package. Announcement later :)
Lastly Chanjo now extends the SQL schema structure found in my own new Elemental DB project.
NOTICE: I've now decided the focus for Chanjo, which will be threefold:
1. Setting up a datastore of elements
2. Getting coverage data as a list of read depths
3. Annotating elements with relevant coverage metrics
Stay tuned to find out more about projects integrating with Chanjo downstream!