Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Ecogenomics, updated 2022-10-31 13:53:30

CheckM


Installing and using CheckM

Please see the project home page for usage details and installation instructions: https://github.com/Ecogenomics/CheckM/wiki

We do not recommend installing CheckM from the master branch. This may be unstable. Please install an official release of CheckM or use pip.

Estimating quality of CPR genomes

Information about obtaining improved quality estimates for CPR (Patescibacteria) genomes can be found here: https://github.com/Ecogenomics/CheckM/wiki/Workflows#using-cpr-marker-set

Migration to Python 3

CheckM has been ported to Python 3 to accommodate Python 2 reaching end of life on January 1, 2020. CheckM >=1.1.0 requires Python 3. Python 2 will no longer be actively supported. Apologies for any issues this may cause.

Massive thanks to baudrly, Vini Salazar, and Asaf Peer for initial Python 2 to 3 porting.

Python 2 to 3 Validation

The port of CheckM to Python 3 was validated on a set of 1,000 genomes randomly selected from the GTDB R89 representative genomes. Results were compared to those generated with CheckM v1.0.18, the last Python 2 version of CheckM. Identical results were obtained for the 'lineage_wf', 'taxonomy_wf', and 'ssu_finder' methods across this set of test genomes. Other CheckM methods have been executed on a small set of 3 genomes to verify they run to completion under Python 3.

Removed Functionality

The following features have been removed from CheckM v1.1.x in order to simplify the code base and focus CheckM and support requests on critical functionality:

  • bin_qa_plot: non-critical, rarely used plot which does not scale to the large numbers of MAGs now being recovered
  • par_plot: non-critical plot; the same information is better presented in the reference distribution plots
  • cov_pca, tetra_pca: alternatives to these static plots exist in tools such as Anvi'o
  • len_plot: rarely used plot which is largely redundant with the len_hist and nx_plot plots
  • bin_union, bin_compare: feature-rich alternatives now exist, such as DAS Tool and UniteM

Bug Reports

Please report bugs through the GitHub issues system.

Copyright © 2014 Donovan Parks, Connor Skennerton, Michael Imelfort. See LICENSE for further details.

Issues

Customizing CheckM Dataset & Gene Marker Selection for a Species

opened on 2023-03-21 02:29:49 by Leytoncito

On some occasions, CheckM's broad marker set does not encompass all species we wish to analyze, so it is necessary to create a customized dataset. As I am new to using CheckM, I am unsure of how to select the appropriate gene markers for the species I am studying. My idea is to construct a pangenome from complete reference genomes, identify the single-copy core genes, align them, build HMM profiles using hmmbuild, and finally follow the workflow in CheckM's documentation. For example, for species X, I obtained a core genome of ~1800 genes. I wonder if it is necessary to use all of these genes and if it is acceptable to include other species in the pangenome analysis to obtain a set of differentiating genes.

Can you please help me with this question? Thank you in advance.

Benjamin
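
The gene-selection step described in this question can be sketched as follows. This is only an illustration under stated assumptions, not CheckM's own procedure: the input layout (a mapping from gene family to per-genome copy numbers), the function name `single_copy_core`, and the `min_fraction` threshold are all hypothetical.

```python
def single_copy_core(copy_numbers, genomes, min_fraction=1.0):
    """Return gene families that are single-copy in at least `min_fraction` of genomes.

    copy_numbers: dict mapping gene family -> dict mapping genome -> copy number
                  (hypothetical layout; a missing genome means zero copies).
    genomes:      list of all reference genomes in the pangenome.
    """
    selected = []
    for family, counts in copy_numbers.items():
        # Reject families that are duplicated in any genome.
        if any(counts.get(genome, 0) > 1 for genome in genomes):
            continue
        # Count genomes carrying exactly one copy.
        present = sum(1 for genome in genomes if counts.get(genome, 0) == 1)
        if present / len(genomes) >= min_fraction:
            selected.append(family)
    return sorted(selected)


# Toy pangenome with three genomes and made-up gene families.
genomes = ["g1", "g2", "g3"]
copy_numbers = {
    "famA": {"g1": 1, "g2": 1, "g3": 1},  # single-copy in every genome
    "famB": {"g1": 1, "g2": 2, "g3": 1},  # duplicated in g2
    "famC": {"g1": 1, "g2": 1},           # absent from g3
}
strict_core = single_copy_core(copy_numbers, genomes)
relaxed_core = single_copy_core(copy_numbers, genomes, min_fraction=0.6)
```

Lowering `min_fraction` keeps families that are missing from a few references, which may matter if some reference genomes are themselves incomplete.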

bin_qa_plot

opened on 2023-01-30 08:31:51 by Anna-pro

Hello, I would like to use "bin_qa_plot" for a better visualization, but noticed that this function is not available any more. Is there still a way to produce it or something similar? I have a list of ~35 genomes. Thank you!
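
For a few dozen genomes, a rough overview can be produced directly from a tab-separated quality table. The sketch below is not a replacement for bin_qa_plot; it assumes a table with columns named `Bin Id`, `Completeness`, and `Contamination`, and the sample values are invented for illustration.

```python
import csv
import io

# Made-up example rows; the column names are an assumption about the table layout.
SAMPLE_TSV = (
    "Bin Id\tCompleteness\tContamination\n"
    "bin_1\t98.5\t1.2\n"
    "bin_2\t75.0\t4.8\n"
    "bin_3\t42.3\t0.0\n"
)


def read_quality(handle):
    """Parse (bin id, completeness, contamination) tuples from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Bin Id"], float(row["Completeness"]), float(row["Contamination"]))
        for row in reader
    ]


def text_bars(rows, width=40):
    """Render one text bar per bin, sorted by decreasing completeness."""
    lines = []
    for bin_id, comp, cont in sorted(rows, key=lambda r: r[1], reverse=True):
        filled = round(comp / 100 * width)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{bin_id:<10} {bar} {comp:5.1f}% complete, {cont:4.1f}% contaminated")
    return lines


rows = read_quality(io.StringIO(SAMPLE_TSV))
lines = text_bars(rows)
print("\n".join(lines))
```

To use a real table, replace `io.StringIO(SAMPLE_TSV)` with an open file handle.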

Unexpected error: <class 'RecursionError'>

opened on 2022-12-25 07:08:35 by MingyueCheng

Command: checkm lineage_wf -x fa -t 20 --pplacer_threads 20 --tmpdir ./checkm_tmp ./bin ./checkm

Can anybody help me? Many thanks! I just want a summary table for roughly 30,000 metagenomic bins, with completeness, contamination, strain heterogeneity, and similar statistics. Did this error occur because of the large number of bins?
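
The traceback in this report ends inside a Newick reader that calls itself once per nested clade, so a deeply nested tree can exceed Python's default recursion limit. The toy parser below is not dendropy's code; it only reproduces that failure mode and shows that raising the limit with `sys.setrecursionlimit` lets the same input parse. Whether that is a safe workaround for CheckM itself is an assumption that has not been confirmed here, and very high limits can exhaust the interpreter's stack.

```python
import sys


def nested_newick(depth):
    """Build a ladder-shaped Newick string with `depth` nested clades."""
    return "(" * depth + "a" + ",b)" * depth + ";"


def parse_clade(text, pos, level):
    """Recursively parse one clade; return (next position, deepest nesting level)."""
    deepest = level
    if text[pos] == "(":
        pos += 1  # consume '('
        while True:
            pos, child_deepest = parse_clade(text, pos, level + 1)
            deepest = max(deepest, child_deepest)
            if text[pos] == ",":
                pos += 1  # another child follows
            elif text[pos] == ")":
                pos += 1  # clade is complete
                break
            else:
                raise ValueError(f"unexpected character at position {pos}")
    else:
        # Leaf label: consume characters up to the next delimiter.
        while text[pos] not in ",();":
            pos += 1
    return pos, deepest


def deepest_level(newick, recursion_limit):
    """Parse `newick` under a temporary recursion limit and report its nesting depth."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(recursion_limit)
    try:
        _, deepest = parse_clade(newick, 0, 0)
        return deepest
    finally:
        sys.setrecursionlimit(old_limit)


tree = nested_newick(3000)
try:
    deepest_level(tree, 1000)
    failed_with_default = False
except RecursionError:
    failed_with_default = True

depth_with_raised_limit = deepest_level(tree, 10000)
```

Splitting the bins into smaller batches and running each batch separately is another option that avoids changing interpreter settings.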

[2022-12-24 19:34:28] INFO: CheckM v1.2.2
[2022-12-24 19:34:28] INFO: checkm lineage_wf -x fa -t 20 --pplacer_threads 20 --tmpdir ./checkm_tmp ./bin ./checkm
[2022-12-24 19:34:28] INFO: CheckM data: /opt/service/miniconda3/envs/r4_py37_env/checkm_data
[2022-12-24 19:34:28] INFO: [CheckM - tree] Placing bins in reference genome tree.
[2022-12-24 20:51:16] INFO: Identifying marker genes in 33728 bins with 20 threads:
[2022-12-25 05:16:17] INFO: Saving HMM info to file.
[2022-12-25 05:16:28] INFO: Calculating genome statistics for 33728 bins with 20 threads:
[2022-12-25 05:20:48] INFO: Extracting marker genes to align.
[2022-12-25 05:20:48] INFO: Parsing HMM hits to marker genes:
[2022-12-25 05:53:53] INFO: Extracting 43 HMMs with 20 threads:
[2022-12-25 05:53:55] INFO: Aligning 43 marker genes with 20 threads:
[2022-12-25 06:09:50] INFO: Reading marker alignment files.
[2022-12-25 06:09:53] INFO: Concatenating alignments.
[2022-12-25 06:09:56] INFO: Placing 33728 bins into the genome tree with pplacer (be patient).
[2022-12-25 09:57:01] INFO: { Current stage: 14:22:33.383 || Total: 14:22:33.383 }
[2022-12-25 09:57:03] INFO: [CheckM - lineage_set] Inferring lineage-specific marker sets.
[2022-12-25 09:57:03] INFO: Reading HMM info from file.
[2022-12-25 09:57:15] INFO: Parsing HMM hits to marker genes:
[2022-12-25 10:22:47] INFO: Determining marker sets for each genome bin.

Unexpected error: <class 'RecursionError'>

ERROR detail:

Traceback (most recent call last):
  File "/opt/service/miniconda3/envs/r4_py37_env/bin/checkm", line 856, in <module>
    checkmParser.parseOptions(args)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/checkm/main.py", line 980, in parseOptions
    self.lineageSet(options)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/checkm/main.py", line 265, in lineageSet
    resultsParser, options.unique, options.multi)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/checkm/treeParser.py", line 485, in getBinMarkerSets
    tree = dendropy.Tree.get_from_path(treeFile, schema='newick', rooting="force-rooted", preserve_underscores=True)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/datamodel/basemodel.py", line 219, in get_from_path
    **kwargs)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/datamodel/treemodel.py", line 2663, in _parse_and_create_from_stream
    global_annotations_target=None)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/ioservice.py", line 375, in read_tree_lists
    global_annotations_target=global_annotations_target)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 326, in _read
    tree_factory=tree_factory):
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 304, in tree_iter
    taxon_symbol_map_fn=taxon_symbol_mapper.require_taxon_for_symbol)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 386, in _parse_tree_statement
    is_internal_node=None)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 562, in _parse_tree_node_description
    is_internal_node=is_new_internal_node,
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 562, in _parse_tree_node_description
    is_internal_node=is_new_internal_node,
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 562, in _parse_tree_node_description
    is_internal_node=is_new_internal_node,
  [Previous line repeated 979 more times]
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/dataio/newickreader.py", line 553, in _parse_tree_node_description
    new_node = tree.node_factory();
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/datamodel/treemodel.py", line 3027, in node_factory
    return Node(**kwargs)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/datamodel/treemodel.py", line 1028, in __init__
    length=kwargs.pop("edge_length", None))
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/datamodel/treemodel.py", line 1002, in edge_factory
    return Edge(**kwargs)
  File "/opt/service/miniconda3/envs/r4_py37_env/lib/python3.7/site-packages/dendropy/datamodel/treemodel.py", line 747, in __init__
    basemodel.DataObject.__init__(self, label=kwargs.pop("label", None))
RecursionError: maximum recursion depth exceeded while calling a Python object

Is there a possibility to have the distance between the putative genome and the closest reference genome?

opened on 2022-12-20 10:37:54 by dgrissa

As part of my research work, I would like to understand the different reasons why genomes receive poor completeness scores from CheckM. For that, I would like to know the distance between the putative genome and the closest reference genome. Would that be possible or easy to compute? Thank you in advance for your answer.

checkm coverage issue

opened on 2022-11-28 10:31:46 by confusedWXX

Hi experts, I am confused by the results of checkm coverage. More than half of my reads fail QC according to the checkm program, even though my reads were already cleaned and passed QC beforehand, which seems odd. Another issue is that checkm coverage often gets stuck while processing reads. Does this mean my workstation does not have enough RAM?

Thanks a lot!

Best Regards, Wenxuan

Are bin_compare and bin_qa_plot functions still working?

opened on 2022-11-17 16:50:32 by snehamurthy21

Hello,

I was trying to compare bins across different algorithms and came across the functions bin_compare and bin_qa_plot in checkm, but both seem to no longer work. Have they been removed, or am I doing something wrong?

### checkm bin_compare
usage: checkm {data,tree,tree_qa,lineage_set,taxon_list,taxon_set,analyze,qa,lineage_wf,taxonomy_wf,gc_plot,coding_plot,tetra_plot,dist_plot,gc_bias_plot,nx_plot,len_hist,marker_plot,unbinned,coverage,tetra,profile,ssu_finder,merge,outliers,modify,unique,test} ...
checkm: error: argument subparser_name: invalid choice: 'bin_compare' (choose from 'data', 'tree', 'tree_qa', 'lineage_set', 'taxon_list', 'taxon_set', 'analyze', 'qa', 'lineage_wf', 'taxonomy_wf', 'gc_plot', 'coding_plot', 'tetra_plot', 'dist_plot', 'gc_bias_plot', 'nx_plot', 'len_hist', 'marker_plot', 'unbinned', 'coverage', 'tetra', 'profile', 'ssu_finder', 'merge', 'outliers', 'modify', 'unique', 'test')
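
The "invalid choice" message above is the standard response of Python's argparse when a subcommand has not been registered, which fits a command that has been removed. The sketch below imitates that behaviour with a small stand-in parser; it is not CheckM's actual parser, and the three registered subcommands are just a sample.

```python
import argparse
import contextlib
import io


def build_parser():
    """Build a stand-in parser with a few registered subcommands."""
    parser = argparse.ArgumentParser(prog="checkm")
    subparsers = parser.add_subparsers(dest="subparser_name")
    for name in ("lineage_wf", "qa", "nx_plot"):
        subparsers.add_parser(name)
    return parser


def try_command(argv):
    """Return (0, subcommand) on success or (exit code, error text) on failure."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            args = build_parser().parse_args(argv)
        return 0, args.subparser_name
    except SystemExit as exc:
        # argparse reports unknown subcommands on stderr and exits with code 2.
        return exc.code, stderr.getvalue()


exit_code, message = try_command(["bin_compare"])
print(exit_code)
print(message)
```

A subcommand missing from the "choose from" list is therefore not available in the installed version, rather than being invoked incorrectly.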

Releases

v1.2.2 2022-10-31 13:52:47

  • support gzipped protein files as input (thanks to alienzj for the PR)

v1.2.1 2022-08-03 22:28:24

  • renamed private classes to resolve multiprocessing AttributeError (thanks to misialq for the PR!)

v1.2.0 2022-04-30 17:44:24

  • modified how bin IDs are identified to improve output tables when processing compressed files (e.g. output bin ID will be my_bin instead of my_bin.fna)
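
The naming behaviour described for v1.2.0 can be illustrated with a short sketch. This is an assumption about the general idea (drop a trailing `.gz`, then drop the file extension), not CheckM's actual implementation, and `bin_id_from_path` is a hypothetical helper name.

```python
import os


def bin_id_from_path(path):
    """Derive a bin ID by removing the directory, an optional .gz suffix, and the extension."""
    name = os.path.basename(path)
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    bin_id, _extension = os.path.splitext(name)
    return bin_id


compressed_id = bin_id_from_path("bins/my_bin.fna.gz")
plain_id = bin_id_from_path("bins/my_bin.fna")
```

Both calls yield the same ID, so tables built from compressed and uncompressed inputs stay comparable.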

v1.1.11 2022-04-21 22:40:24

  • fixed error with inverse_transformed being deprecated in newer versions of Matplotlib
  • updated minimum required version of Python packages

v1.1.10 2022-04-21 22:40:24

  • fixed bug with missing bCalledGenes flag that was impacting a number of commands

v1.1.9 2022-04-14 19:41:34

  • fixed support for gzip input FASTA files
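
Reading a FASTA file that may or may not be gzip-compressed can be handled along these lines. This is a generic sketch, not CheckM's code: the helper names are hypothetical, and compression is detected from the `.gz` suffix alone.

```python
import gzip


def open_fasta(path):
    """Open a FASTA file in text mode, using gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_sequences(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open_fasta(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Calling `count_sequences` on `my_bin.fna` and on `my_bin.fna.gz` should give the same number when both hold the same records.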