Organelle Genome Assembly Toolkit (Chloroplast/Mitochondrial/ITS)

Kinggerm, updated 🕥 2023-03-30 21:21:49

GetOrganelle

Notice: please update to 1.7.5+, which fixes a bug in the multiplicity estimation of self-loop vertices.

This toolkit assembles organelle genomes from genome skimming data.

It achieved the best overall performance on both simulated and real data, and was recommended as the default for chloroplast genome assembly in a third-party comparison paper (Freudenthal et al. 2020. Genome Biology).

Please state the versions of GetOrganelle and its dependencies in your manuscript for reproducible science.

Citation: Jian-Jun Jin, Wen-Bin Yu, Jun-Bo Yang, Yu Song, Claude W. dePamphilis, Ting-Shuang Yi, De-Zhu Li. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21, 241 (2020). https://doi.org/10.1186/s13059-020-02154-5

License: GPL https://www.gnu.org/licenses/gpl-3.0.html

Please also cite the dependencies if used:

SPAdes: Bankevich, A., S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko, S. Pham, A. D. Prjibelski, A. V. Pyshkin, A. V. Sirotkin, N. Vyahhi, G. Tesler, M. A. Alekseyev and P. A. Pevzner. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19: 455-477.

Bowtie2: Langmead, B. and S. L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357-359.

BLAST+: Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer and T. L. Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

Bandage: Wick, R. R., M. B. Schultz, J. Zobel and K. E. Holt. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31: 3350-3352.

Installation & Initialization

GetOrganelle is currently maintained under Python 3.7.0, but is designed to be compatible with versions higher than 3.5.1 and 2.7.11. It was built for Linux and macOS. Windows Subsystem for Linux is currently not supported; we are working on this.

  • The easiest way to install GetOrganelle and its dependencies is using conda:

    conda install -c bioconda getorganelle

You have to install Anaconda or Miniconda before using the above command. If you don't like conda, or want to follow the latest updates, you can find more installation options here (my preference).

  • After installation of GetOrganelle v1.7+, please download and initialize the database of your preferred organelle genome type (embplant_pt, embplant_mt, embplant_nr, fungus_mt, fungus_nr, animal_mt, and/or other_pt). Supposing you are assembling chloroplast genomes:

    get_organelle_config.py --add embplant_pt,embplant_mt

If the connection keeps failing, please manually download the latest database from GetOrganelleDB and initialize it from local files.

The database will be located at ~/.GetOrganelle by default, which can be changed via the command line parameter --config-dir, or via the shell environment variable GETORG_PATH (see more here).
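
To see which directory a given setup would point at, the lookup described above can be sketched in Python. This is an illustration only, not GetOrganelle's own code: the function name `resolve_database_dir` is hypothetical, and the precedence (an explicit --config-dir value first, then GETORG_PATH, then the default) is an assumption, since the text above names both options without stating which one wins.

```python
import os
from pathlib import Path

# Default location stated in this README.
DEFAULT_DIR = "~/.GetOrganelle"


def resolve_database_dir(config_dir=None, environ=None):
    """Return the directory where the databases are expected to live.

    ASSUMPTION: a value passed as --config-dir takes priority over the
    GETORG_PATH environment variable; the README does not state the order.
    """
    environ = os.environ if environ is None else environ
    chosen = config_dir or environ.get("GETORG_PATH") or DEFAULT_DIR
    return Path(chosen).expanduser()


if __name__ == "__main__":
    print(resolve_database_dir())
```

Passing a plain dictionary as `environ` makes it easy to try out different settings without touching the real shell environment.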

Test

Download a simulated Arabidopsis thaliana WGS dataset:

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz

then verify the integrity of downloaded files using md5sum:

md5sum Arabidopsis_simulated.*.fq.gz
# 935589bc609397f1bfc9c40f571f0f19  Arabidopsis_simulated.1.fq.gz
# d0f62eed78d2d2c6bed5f5aeaf4a2c11  Arabidopsis_simulated.2.fq.gz
# Please re-download the reads if your md5 values do not match those above
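
Where md5sum is not available (for example on a default macOS install), the same check can be sketched in Python with the standard library. The expected values are copied from the listing above; the helper names `md5_of_file` and `check_downloads` are made up for this illustration.

```python
import hashlib
from pathlib import Path

# Expected checksums copied from the md5sum listing above.
EXPECTED_MD5 = {
    "Arabidopsis_simulated.1.fq.gz": "935589bc609397f1bfc9c40f571f0f19",
    "Arabidopsis_simulated.2.fq.gz": "d0f62eed78d2d2c6bed5f5aeaf4a2c11",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory=".", expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    report = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            report[name] = "missing"
        else:
            report[name] = "ok" if md5_of_file(path) == md5 else "mismatch"
    return report


if __name__ == "__main__":
    for name, status in check_downloads().items():
        print(f"{status:9s}{name}")
```

Any file reported as mismatch or missing should be downloaded again before running the test assembly.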

then do the fast plastome assembly (memory: ~600MB, CPU time: ~60s):

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10

You should get a running log similar to the one here and the same result as here.

Find more real data examples at GetOrganelle/wiki/Examples, GetOrganelleGallery and GetOrganelleComparison.

Instruction

Find more organelle genome assembly instruction at GetOrganelle/wiki.

In most cases, all you need to do is type in one simple command as suggested in Recipes. But you are still strongly recommended to read the following minimal introduction:

Starting from Reads

The green workflow in the flowchart below shows the processes of get_organelle_from_reads.py.

  • Input data

    Currently, get_organelle_from_reads.py is written for Illumina paired-end/single-end data (fastq or fastq.gz). We recommend using adapter-trimmed raw reads without quality control. Usually, >1G per end is enough for the plastome of most normal angiosperm samples, and >5G per end is enough for mitochondrial genome assembly. Since v1.6.2, get_organelle_from_reads.py automatically estimates the amount of read data it needs, so you neither have to specify it nor reduce the data yourself (see flags --reduce-reads-for-coverage and --max-reads).

  • Main Options

    • -w The word size, like the kmer in assembly, is crucial to the feasibility and efficiency of this process. The best word size varies with the data and is affected by read length, read quality, base coverage, organelle DNA percentage and other factors. By default, GetOrganelle automatically estimates a proper word size based on the data characteristics. Although the automatically estimated word size guarantees neither the best performance nor the best result, you do not need to adjust this value (-w) if a complete/circular organelle genome assembly is produced, because the circular result generated by GetOrganelle is highly consistent under different options and seeds. The automatically estimated word size may be off in some animal mitogenome data due to inaccurate coverage estimation, in which case you should fine-tune it yourself.

    • -k The best kmer(s) depend on a wide variety of factors too. Although more kmer values increase the run time, you are recommended to use a wide range of kmers to benefit from the power of SPAdes. Empirically, you should include at least one small kmer (e.g. 21) and one large kmer (e.g. 85) for a successful organelle genome assembly. The largest kmer in the gradient may be crucial to the success rate of achieving the complete circular organelle genome.

    • -s GetOrganelle takes the seed (fasta format; if this is not provided, the default is GetOrganelleLib/SeedDatabase/*.fasta) as a probe and recruits target reads in successive rounds (the extending process). The default seed works for most samples, but using a complete organelle genome sequence of a related species as the seed helps the assembly in many cases (e.g. degraded DNA samples, fast-evolving animal/fungal samples; see more here).

  • Key Results

    The key output files include

    • *.path_sequence.fasta, each fasta file represents one type of genome structure
    • *.selected_graph.gfa, the organelle-only assembly graph
    • get_org.log.txt, the log file
    • extended_K*.assembly_graph.fastg, the raw assembly graph
    • extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg, a simplified assembly graph
    • extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv, a tab-format contig label file for Bandage visualization

    You may delete the files other than those above if the resulting genome is complete (indicated in the log file and the name of the *.fasta). You are expected to obtain the complete organelle genome assembly for most animal/fungal mitogenomes and plant chloroplast genomes (see here for nuclear ribosomal DNAs) with the recommended recipes.

    If GetOrganelle fails to generate the complete circular genome (producing *scaffolds*path_sequence.fasta), please follow here to adjust your parameters for a second run. You could also use the incomplete sequence to conduct downstream analysis.
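
When many samples are processed in a batch, the file-name convention described above can be used for a quick first pass over the output folders. The following Python sketch relies only on the pattern stated in this section (a *scaffolds*path_sequence.fasta name for a run that did not reach a complete genome); the function name is hypothetical, and the log file remains the authoritative record of completeness.

```python
from pathlib import Path


def summarize_path_sequences(output_dir):
    """Group *.path_sequence.fasta files by whether the name contains 'scaffolds'.

    Files under 'scaffolds' come from runs that did not produce a complete
    genome; files under 'other' still need to be confirmed against
    get_org.log.txt.
    """
    summary = {"scaffolds": [], "other": []}
    for fasta in sorted(Path(output_dir).glob("*.path_sequence.fasta")):
        key = "scaffolds" if "scaffolds" in fasta.name else "other"
        summary[key].append(fasta.name)
    return summary


if __name__ == "__main__":
    # "Arabidopsis_simulated.plastome" is the output folder of the Test section.
    print(summarize_path_sequences("Arabidopsis_simulated.plastome"))
```

A folder whose "scaffolds" list is not empty is a candidate for a second run with adjusted parameters.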

Starting from Assembly

The blue workflow in the chart below shows the processes of get_organelle_from_assembly.py.

  • Input data & Main Options

    • -g The input must be a FASTG or GFA formatted assembly graph file.

    • If you input an assembly graph assembled from total DNA sequencing using a third-party de novo assembler (e.g. Velvet), the assembly graph may include a large number of non-target contigs. You may want to use --min-depth and --max-depth to greatly reduce the computational burden for target extraction.

    • If you input an organelle-equivalent assembly graph (e.g. manually curated and exported using Bandage), you may use --no-slim.

  • Key Results

    The key output files include

    • *.path_sequence.fasta, one fasta file represents one type of genome structure
    • *.fastg, the organelle-related assembly graph to report for improvement and debugging
    • *.selected_graph.gfa, the organelle-only assembly graph
    • get_org.log.txt, the log file
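
Before choosing values for --min-depth and --max-depth, it can help to look at the depth recorded for each segment of a GFA graph. The Python sketch below reads the S lines of a GFA 1 file and collects one optional tag per segment. It is not GetOrganelle's parser: the function names are hypothetical, and the default tag name `DP` is an assumption, because different assemblers store depth under different tags (or not at all), so check a few S lines of your own file first.

```python
def read_segment_tags(gfa_lines):
    """Yield (name, length, tags) for every segment (S) line of a GFA 1 file."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue
        name, sequence = fields[1], fields[2]
        tags = {}
        for field in fields[3:]:
            parts = field.split(":", 2)  # optional fields look like TAG:TYPE:VALUE
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        # A sequence of "*" means it is not stored; fall back to the LN tag.
        if sequence == "*" and "LN" in tags:
            length = int(tags["LN"])
        else:
            length = len(sequence)
        yield name, length, tags


def depth_by_segment(gfa_lines, depth_tag="DP"):
    """Map segment name to depth, or None when the tag is absent.

    ASSUMPTION: depth is stored under `depth_tag`; adjust it to your file.
    """
    depths = {}
    for name, _length, tags in read_segment_tags(gfa_lines):
        value = tags.get(depth_tag)
        depths[name] = float(value) if value is not None else None
    return depths


if __name__ == "__main__":
    example = ["S\t1\tACGT\tDP:f:10.5\n", "S\t2\t*\tLN:i:7\n"]
    print(depth_by_segment(example))
```

Segments reported as None carry no depth under the chosen tag, which is worth knowing before applying any depth-based filter to the graph.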

GetOrganelle flowchart

flowchart

Recipes

Please refer to the GetOrganelle FAQ to fine-tune the arguments, especially concerning word size, memory, and clock time.

From Reads

  • Embryophyta

    To assemble an Embryophyta plant plastid genome (plastome), e.g. using 2G raw data of 150 bp paired reads, typically I use:

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output -R 15 -k 21,45,65,85,105 -F embplant_pt

    or in a draft way:

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output --fast -k 21,65,105 -w 0.68 -F embplant_pt

    or in a slow and memory-economic way:

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output -R 30 -k 21,45,65,85,105 -F embplant_pt --memory-save

    To assemble an Embryophyta plant mitochondrial genome (mitogenome), usually you need more than 5G raw data:

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o mitochondria_output -R 20 -k 21,45,65,85,105 -P 1000000 -F embplant_mt
    # 1. Please use the FASTG file as the final output for downstream manual processing. Until further updates, the FASTA output for plant mitochondrial genomes with numerous repeats may be error-prone.
    # 2. The embplant_mt mode was not tested in the GetOrganelle paper due to the complexity of plant mitogenomes and the limitations of short reads, so there is room for improvement in the argument choices.

    To assemble Embryophyta plant nuclear ribosomal RNA (18S-ITS1-5.8S-ITS2-26S):

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o nr_output -R 10 -k 35,85,115 -F embplant_nr
    # Please also take a look at this FAQ https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#why-does-getorganelle-generate-a-circular-genome-or-not-for-embplant_nrfungus_nr

  • Non-embryophyte

    Non-embryophyte plastomes and mitogenomes can be divergent from those of embryophytes. We have not explored them very much, but many users have successfully assembled them with GetOrganelle using the default database or a customized database.

    There is a built-in other_pt mode and a prepared default database for non-embryophyte plastomes. I would start with -F other_pt and options similar to those in the embplant_pt part. However, there is no such built-in mode for non-embryophyte mitogenomes. Considering that the sequences may be highly divergent from embplant_mt, besides using options similar to those in the embplant_mt part, I would make a pair of customized seed database and label database, then use them to run GetOrganelle following the guidance here.

  • Fungus

    To assemble a fungus mitochondrial genome:

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F fungus_mt -o fungus_mt_out
    # if you fail with the default database, use your own seed database and label database with "-s" and "--genes"

    To assemble fungus nuclear ribosomal RNA (18S-ITS1-5.8S-ITS2-28S):

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F fungus_nr -o fungus_nr_out
    # if you fail with the default database, use your own seed database and label database with "-s" and "--genes"
    # Please also take a look at this FAQ https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#why-does-getorganelle-generate-a-circular-genome-or-not-for-embplant_nrfungus_nr

  • Animal

    To assemble an animal mitochondrial genome:

    get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F animal_mt -o animal_mt_out
    # if you fail with the default database, rerun it using your own seed database (or the output of a first GetOrganelle run) and label database with "-s" and "--genes"

    Animal nuclear ribosomal RNA will be available in the future. Issue136 is the place to follow.
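
The recipes above differ mainly in the values given to -F, -R, -k, -w and -s, so when running many samples it may be convenient to assemble the command line in a script. The Python sketch below only builds the argument list from flags that appear in this README; the function name `build_reads_command` and its parameter names are made up for illustration, and nothing is validated against the tool itself.

```python
import shlex


def build_reads_command(forward, reverse, output, organelle_type,
                        rounds=None, kmers=None, word_size=None, seed=None):
    """Build an argument list for get_organelle_from_reads.py.

    Only flags shown in the recipes above are used: -1, -2, -o, -F, -R, -k, -w, -s.
    """
    cmd = ["get_organelle_from_reads.py", "-1", forward, "-2", reverse,
           "-o", output, "-F", organelle_type]
    if rounds is not None:
        cmd += ["-R", str(rounds)]
    if kmers:
        cmd += ["-k", ",".join(str(k) for k in kmers)]
    if word_size is not None:
        cmd += ["-w", str(word_size)]
    if seed:
        cmd += ["-s", seed]
    return cmd


if __name__ == "__main__":
    # Reproduces the first Embryophyta plastome recipe.
    cmd = build_reads_command("forward.fq", "reverse.fq", "plastome_output",
                              "embplant_pt", rounds=15, kmers=[21, 45, 65, 85, 105])
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run; keeping it as a list avoids shell quoting problems with unusual file names.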

From Assembly Graph

There are as many available organelle types as in the From Reads section (see more with get_organelle_from_assembly.py -h), and the simplest usage is not that different. Here is an example to extract the plastid genome from an existing assembly graph (*.fastg/*.gfa; e.g. from long-read sequencing assemblies):

get_organelle_from_assembly.py -F embplant_pt -g ONT_assembly_graph.gfa

Arguments

See brief descriptions of the arguments by typing:

get_organelle_from_reads.py -h

or see the detailed descriptions:

get_organelle_from_reads.py --help

The same brief -h and verbose --help menus can be found for get_organelle_from_assembly.py.

You may also find a summary of the above information here at Usage.

Contact

Please check the GetOrganelle wiki page first. If your question is specific to a run, please attach the get_org.log.txt file and the post-slimming assembly graph (assembly_graph.fastg.extend_*.fastg, which could be in Bandage-visualized *.png format to protect your data privacy).

Although older versions like 1.6.3/1.7.1/1.7.6 may be more stable, we always strongly encourage you to keep updated. GetOrganelle is actively updated with new fixes and new features, but new bugs too. So if you catch one, please do not be surprised, and report it to us. We usually respond quickly to bugs.

This was previously located at GetOrganelle Issues where you may find old Q&A

Please avoid duplicate and miscellaneous issues

  • GoogleGroups

  • QQ group (ID: 908302723): only for mutual help, and we will no longer reply to questions there

Do NOT write to us directly with your questions; instead, please post them publicly, using the above platforms (we will be informed automatically) or any other platform (and inform us of it). Our emails ([email protected], [email protected]) are only for receiving public question alerts and private data (if applicable) associated with those public questions. When you send your private data to us, include in the email a link to where you posted the question. Our only reply emails will be receipt confirmations, while our answers will be posted in a public place.

Issues

ERROR: Assembling failed due to SPAdes failure

opened on 2023-03-26 10:53:41 by njbayonav

I installed the latest version of getorganelle: 1.7.7.0, and then it would not work with Spades as many have pointed out. Thus I downloaded the dependencies from the authors (GetOrganelleDep), and what I did was to edit the file 'spades_init.py' within the bin of getorganelle to replace 'spades_home = abspath(dirname(realpath(__file__)))' with 'spades_home = "path_to_/GetOrganelleDep/macOS/SPAdes"'. I also changed the spades.py path.

When I try to run the test, it runs, but the pre-assembly and assembly fail. Please see the log below:

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2023-03-26 06:32:31,652 - INFO: Pre-reading fastq ...
2023-03-26 06:32:31,652 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2023-03-26 06:32:31,752 - INFO: Estimating reads to use finished.
2023-03-26 06:32:31,752 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2023-03-26 06:32:31,863 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2023-03-26 06:32:31,969 - INFO: Counting read qualities ...
2023-03-26 06:32:32,111 - INFO: Identified quality encoding format = Illumina 1.8+
2023-03-26 06:32:32,112 - INFO: Phred offset = 33
2023-03-26 06:32:32,113 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2023-03-26 06:32:32,169 - INFO: Mean error rate = 0.0019
2023-03-26 06:32:32,171 - INFO: Counting read lengths ...
2023-03-26 06:32:32,348 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2023-03-26 06:32:32,348 - INFO: Reads used = 91563+91563
2023-03-26 06:32:32,348 - INFO: Pre-reading fastq finished.

2023-03-26 06:32:32,348 - INFO: Making seed reads ...
2023-03-26 06:32:32,349 - INFO: Seed bowtie2 index existed!
2023-03-26 06:32:32,349 - INFO: Mapping reads to seed bowtie2 index ...
2023-03-26 06:32:42,606 - INFO: Mapping finished.
2023-03-26 06:32:42,606 - INFO: Seed reads made: Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq (14144302 bytes)
2023-03-26 06:32:42,606 - INFO: Making seed reads finished.

2023-03-26 06:32:42,606 - INFO: Checking seed reads and parameters ...
2023-03-26 06:32:42,606 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2023-03-26 06:32:42,606 - INFO: If the result graph is not a circular organelle genome,
2023-03-26 06:32:42,607 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2023-03-26 06:32:45,467 - INFO: Pre-assembling mapped reads ...
2023-03-26 06:32:45,963 - INFO: Retrying with more reads ..
2023-03-26 06:33:05,013 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
2023-03-26 06:33:08,240 - INFO: Estimated embplant_pt-hitting base-coverage = 52.85
2023-03-26 06:33:08,434 - INFO: Estimated word size(s): 98
2023-03-26 06:33:08,434 - INFO: Setting '-w 98'
2023-03-26 06:33:08,434 - INFO: Setting '--max-extending-len inf'
2023-03-26 06:33:08,515 - INFO: Checking seed reads and parameters finished.

2023-03-26 06:33:08,515 - INFO: Making read index ...
2023-03-26 06:33:09,377 - INFO: 178623 candidates in all 183126 reads
2023-03-26 06:33:09,377 - INFO: Pre-grouping reads ...
2023-03-26 06:33:09,377 - INFO: Setting '--pre-w 98'
2023-03-26 06:33:09,391 - INFO: 4074/4074 used/duplicated
2023-03-26 06:33:09,629 - INFO: 517 groups made.
2023-03-26 06:33:09,637 - INFO: Making read index finished.

2023-03-26 06:33:09,637 - INFO: Extending ...
2023-03-26 06:33:09,637 - INFO: Adding initial words ...
2023-03-26 06:33:11,324 - INFO: AW 1113742
2023-03-26 06:33:12,991 - INFO: Round 1: 178623/178623 AI 40378 AW 1126044
2023-03-26 06:33:13,871 - INFO: Round 2: 178623/178623 AI 40411 AW 1126346
2023-03-26 06:33:14,754 - INFO: Round 3: 178623/178623 AI 40411 AW 1126346
2023-03-26 06:33:14,754 - INFO: No more reads found and terminated ...
2023-03-26 06:33:15,065 - INFO: Extending finished.

2023-03-26 06:33:15,066 - INFO: Separating extended fastq file ...
2023-03-26 06:33:15,144 - INFO: Setting '-k 21,55,85,115'
2023-03-26 06:33:15,144 - INFO: Assembling using SPAdes ...
2023-03-26 06:33:15,170 - INFO: spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2023-03-26 06:33:15,226 - ERROR: Assembling failed.

Total cost 45.03 s
Thank you!


If I then try to run spades.py --test, I get:

Traceback (most recent call last):
  File "/Users/oxfordlab/anaconda3/envs/getorganelle/bin/spades.py", line 26, in <module>
    import support
ModuleNotFoundError: No module named 'support'

Any suggestion? Thanks!

Running getorganelle with raw target sequencing data error

opened on 2023-03-22 13:47:55 by Oulo

I tried running getorganelle using raw A353 data but the resulting graph was just scaffolds and not a complete circle. Is there another way to use getorganelle for target sequence data?

Using the GFA file (p_utg) exported by hifiasm for assembly with the error "Slimming failed."

opened on 2023-03-17 07:00:13 by sunjinkkk

An error occurs when assembling asm.p_utg.gfa using hifiasm output, but it is normal in asm.p_ctg.gfa. GetOrganelle is installed by conda v1.7.7.0. The command is get_organelle_from_assembly.py -t 40 -F embplant_mt -g ldsg.asm.hic.p_utg.gfa -o test6_from_assembly_mt

The log file is as follows:

GetOrganelle v1.7.7.0

get_organelle_from_assembly.py isolates organelle genomes from assembly graph.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
PLATFORM: Linux node2 4.18.0-305.3.1.el8.x86_64 #1 SMP Tue Jun 1 16:14:33 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.2; sympy 1.11.1; scipy 1.10.1
DEPENDENCIES: Blast 2.13.0
GETORG_PATH=/home/stu_sunjin/.GetOrganelle
LABEL DB: embplant_mt 0.0.1; embplant_pt 0.0.1
WORKING DIR: /home/stu_sunjin/data/luodi
/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/get_organelle_from_assembly.py -t 40 -F embplant_mt -g ldsg.asm.hic.p_utg.gfa -o test6_from_assembly_mt

2023-03-17 14:42:00,209 - INFO: Processing assembly graph ...
2023-03-17 14:42:00,995 - INFO: Processing assembly graph finished.

2023-03-17 14:42:00,995 - INFO: Slimming assembly graph ...
2023-03-17 14:50:25,222 - ERROR: Slimming test6_from_assembly_mt/initial_assembly_graph.gfa failed. Please check *slim.log.txt for details.
2023-03-17 14:50:25,222 - ERROR:
2023-03-17 14:42:01,927 - INFO: Slimming file 1/1: test6_from_assembly_mt/initial_assembly_graph.gfa
2023-03-17 14:42:14,380 - INFO: Parsing input finished.
2023-03-17 14:42:24,198 - INFO: Preparing fasta file finished.
2023-03-17 14:42:24,199 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_mt ...
2023-03-17 14:42:24,199 - INFO: Executing BLAST ...
2023-03-17 14:46:16,261 - INFO: Executing BLAST finished.
2023-03-17 14:46:16,261 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_mt finished.
2023-03-17 14:46:16,279 - INFO: Parsing blast result finished.
2023-03-17 14:46:16,279 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_pt ...
2023-03-17 14:46:16,279 - INFO: Executing BLAST ...
2023-03-17 14:50:20,257 - INFO: Executing BLAST finished.
2023-03-17 14:50:20,257 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_pt finished.
2023-03-17 14:50:20,354 - INFO: Parsing blast result finished.
2023-03-17 14:50:20,385 - INFO: No enough coverage information found.
2023-03-17 14:50:20,395 - INFO: Mapping names ...
2023-03-17 14:50:25,005 - ERROR: division by zero
2023-03-17 14:50:25,005 - ERROR: Slimming file 1/1: test6_from_assembly_mt/initial_assembly_graph.gfa failed!

2023-03-17 14:50:25,005 - ERROR: Traceback (most recent call last):
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/slim_graph.py", line 1070, in main
    raise e
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/slim_graph.py", line 1041, in main
    reduce_matrix(in_names=in_names_r, ex_names=ex_names_r, seq_matrix=this_matrix,
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/slim_graph.py", line 754, in reduce_matrix
    assembly_graph.reduce_to_subgraph(bait_vertices=in_names,
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/lib/python3.10/site-packages/GetOrganelleLib/assembly_parser.py", line 2002, in reduce_to_subgraph
    max(1, self.vertex_info[next_v].cov / base_cov)
ZeroDivisionError: division by zero

2023-03-17 14:50:25,222 - ERROR: Traceback (most recent call last):
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/get_organelle_from_assembly.py", line 1019, in main
    exit()
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/lib/python3.10/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None

Total cost 506.30 s

For trouble-shooting, please
Firstly, check https://github.com/Kinggerm/GetOrganelle/wiki/FAQ
Secondly, check if there are open/closed issues related at https://github.com/Kinggerm/GetOrganelle/issues
If your problem was still not solved, please open an issue at https://github.com/Kinggerm/GetOrganelle/issues
please provide the get_org.log.txt and the the slimmed_assembly_graph. file(s) (can be visualized as .png to protect your data privacy) if possible!

Error with OS return value: 22; Can anyone please help me with a solution ?

opened on 2023-03-15 19:15:46 by avvypaks

2023-03-15 15:07:43,679 - INFO: Separating extended fastq file ...
2023-03-15 15:07:43,763 - INFO: Setting '-k 21,55,85,115'
2023-03-15 15:07:43,763 - INFO: Assembling using SPAdes ...
2023-03-15 15:07:43,787 - INFO: spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2023-03-15 15:07:44,373 - ERROR: Error with running SPAdes: == Error == system call for: "['/Users/innovation_user/opt/anaconda3/envs/fassembly/bin/spades-hammer', '/Users/innovation_user/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 22
2023-03-15 15:07:44,396 - ERROR: Assembling failed.

error using different seed

opened on 2023-02-09 15:04:10 by ilenia6

I read that your tool has also been used to assemble the rDNA locus. I would like to assemble long reads from a single specific gene locus (human) coming from a sequencing experiment in which we should have an enrichment of that locus. With preliminary data (although in this first experiment this locus didn't produce a good enrichment level) I used the following command line:

get_organelle_from_reads.py -u reads.fastq -o output/ -s seq.fasta -F anonym --genes seq.fasta

Where reads.fastq contained the reads to assemble, the seed file corresponded to the reference locus and genes corresponded to the same sequence, i.e., a single sequence of the human gene locus we want to assemble. However, the following error occurred:

Pre-assembling failed. The estimations for anonym-hitting base-coverage and word size may be misleading.

Would you suggest some adjustments in order to use get organelle also in such cases?

Is this command line correct? Is there a minimum coverage of the locus needed? Did you already use your tool with long reads?

Thank you in advance!

Error assembling the Arabidopsis test dataset (SPAdes errors)

opened on 2023-01-06 20:23:35 by jsmillergh

Hello,

I am attempting to assemble the Arabidopsis plastome using the provided test dataset (simulated Arabidopsis thaliana WGS dataset). After (i) installing/updating GetOrganelle, (ii) initializing the organelle type, and (iii) running GetOrganelle, I believe I see two SPAdes errors in the log file but am not sure how to resolve them.

Both the GetOrganelle and the SPAdes log files are attached.

A great many thanks for advice or assistance.

-Jill

get_org.log.txt spades.log

Releases

GetOrganelle v1.7.7.0 2022-12-05 16:09:54

  1. Assembly.merge_all_possible_nodes: fix a bug which will result in bad contig names
  2. add Utilities/get_annotated_regions_from_gb.py (#197 )
  3. fix a bug in seq_parser.SequenceList.remove
  4. Utilities/slim_graph.py: --percent and --blast-options added; exception catch
  5. fix a bug in get_organelle_config.py:pipe_control_func.py:SEED_DB_HASH (#199 )
  6. print slim_graph.py error (in from_assembly) and blastn error
  7. fix a bug in slim_graph.py (#196 )
  8. update Assembly improvement in gfa/fastg parser/writer: storing other attributes if unknown; output unequal overlaps

GetOrganelle v1.7.6.1 2022-05-07 01:18:31

  1. improve the target component recognition on non-circular cases (https://github.com/Kinggerm/GetOrganelle/discussions/138 & https://github.com/Kinggerm/GetOrganelle/issues/141 )
  2. assembly_graph.py & statistical_func.py: specify scipy error (https://github.com/Kinggerm/GetOrganelle/issues/132 )
  3. compatible with the newly released GetOrganelleDB v0.0.1.minima (https://github.com/Kinggerm/GetOrganelle/issues/64 )
  4. get_organelle_config.py: fix a bug when there was no available directory made before running --config-dir would be invalid; fix a bug for malfunctioning --verbose

GetOrganelle v1.7.5.3 2022-01-21 16:20:51

several bug fixes. check versions.py for details.

GetOrganelle v1.7.5.0 2021-05-13 09:10:49

  1. assembly_parser.py: fix a bug in estimation of the multiplicity of self-loop vertices, which were falsely forced to be at least 2. (detected in a case of Yan [email protected])
  2. other minor bug/typo fixes.

GetOrganelle v1.7.4.1 2021-04-16 07:21:01

main updates: bug fixes.

see more @versions.py

GetOrganelle v1.7.4 2021-04-14 14:15:29

Main updates:

  1. assembly_parser.py: recording every overlap value rather than using a universal value
  2. optparse -> argparse
  3. output name prefix: filtered -> extended
  4. other improvements
  5. many bug fixes

see more @versions.py
