Notice: please update to 1.7.5+, which fixed a bug in the multiplicity estimation of self-loop vertices.
This toolkit assembles organelle genomes from genome skimming data.
It achieved the best overall performance on both simulated and real data and was recommended as the default for chloroplast genome assembly in a third-party comparison paper (Freudenthal et al. 2020. Genome Biology).
Please state the versions of GetOrganelle and its dependencies in your manuscript for reproducible science.
Citation: Jian-Jun Jin, Wen-Bin Yu, Jun-Bo Yang, Yu Song, Claude W. dePamphilis, Ting-Shuang Yi, De-Zhu Li. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21, 241 (2020). https://doi.org/10.1186/s13059-020-02154-5
License: GPL https://www.gnu.org/licenses/gpl-3.0.html
Please also cite the dependencies if used:
GetOrganelle is currently maintained under Python 3.7.0, but is designed to be compatible with Python versions higher than 3.5.1 and 2.7.11. It was built for Linux and macOS. Windows Subsystem for Linux is currently not supported; we are working on this.
The easiest way to install GetOrganelle and its dependencies is using conda:
conda install -c bioconda getorganelle
You have to install Anaconda or Miniconda before using the above command. If you don't like conda, or want to follow the latest updates, you can find more installation options here (my preference).
After installation of GetOrganelle v1.7+, please download and initialize the database of your preferred organelle genome type (embplant_pt, embplant_mt, embplant_nr, fungus_mt, fungus_nr, animal_mt, and/or other_pt). Supposing you are assembling chloroplast genomes:
get_organelle_config.py --add embplant_pt,embplant_mt
If the connection keeps failing, please manually download the latest database from GetOrganelleDB and initialize it from local files.
The database will be located at ~/.GetOrganelle by default, which can be changed via the command line parameter --config-dir, or via the shell environment variable GETORG_PATH (see more here).
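To check where the database is expected before running anything, the lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name resolve_config_dir is hypothetical, and the precedence (command line value first, then GETORG_PATH, then the default) is an assumption rather than something this README states.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_dir(cli_config_dir: Optional[str] = None,
                       environ: Optional[Mapping[str, str]] = None) -> Path:
    """Guess the database directory (hypothetical helper, not part of GetOrganelle)."""
    env = os.environ if environ is None else environ
    if cli_config_dir:
        # Value that would be passed to --config-dir (assumed to win over the variable).
        return Path(cli_config_dir).expanduser()
    if env.get("GETORG_PATH"):
        # Shell environment variable mentioned above.
        return Path(env["GETORG_PATH"]).expanduser()
    # Default location mentioned above.
    return Path.home() / ".GetOrganelle"
```

Calling resolve_config_dir() with no arguments returns the directory to inspect for the current shell session.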
Download a simulated Arabidopsis thaliana WGS dataset:
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz
then verify the integrity of downloaded files using md5sum:
md5sum Arabidopsis_simulated.*.fq.gz
# 935589bc609397f1bfc9c40f571f0f19 Arabidopsis_simulated.1.fq.gz
# d0f62eed78d2d2c6bed5f5aeaf4a2c11 Arabidopsis_simulated.2.fq.gz
# Please re-download the reads if your md5 values do not match the ones above
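If md5sum is not available on your system, the same check can be sketched with Python's standard hashlib module. The expected checksums are the ones listed above; the helper names md5_of_file and find_mismatches are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the md5sum output above.
EXPECTED_MD5 = {
    "Arabidopsis_simulated.1.fq.gz": "935589bc609397f1bfc9c40f571f0f19",
    "Arabidopsis_simulated.2.fq.gz": "d0f62eed78d2d2c6bed5f5aeaf4a2c11",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory=".", expected=EXPECTED_MD5):
    """Return the file names that are missing or whose md5 differs."""
    bad = []
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of_file(path) != checksum:
            bad.append(name)
    return bad
```

An empty list from find_mismatches() means both read files match; any name returned should be downloaded again.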
then do the fast plastome assembly (memory: ~600MB, CPU time: ~60s):
get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10
You should get a running log similar to the one here and the same result as here.
Find more real data examples at GetOrganelle/wiki/Examples, GetOrganelleGallery and GetOrganelleComparison.
Find more organelle genome assembly instructions at GetOrganelle/wiki.
In most cases, all you actually need to do is type in one simple command as suggested in Recipes. But you are still highly recommended to read the following minimal introductions:
The green workflow in the flowchart below shows the processes of get_organelle_from_reads.py.
Input data
Currently, get_organelle_from_reads.py was written for Illumina paired-end/single-end data (fastq or fastq.gz). We recommend using adapter-trimmed raw reads without quality control.
Usually, >1G per end is enough for plastome assembly for most normal angiosperm samples, and >5G per end is enough for mitochondrial genome assembly.
Since v1.6.2, get_organelle_from_reads.py automatically estimates the amount of read data it needs, so neither user assignment nor data reduction is required (see flags --reduce-reads-for-coverage and --max-reads).
Main Options
-w
The word size, like the kmer in assembly, is crucial to the feasibility and efficiency of this process.
The best word size varies with the data and is affected by read length, read quality, base coverage, organelle DNA percentage and other factors.
By default, GetOrganelle automatically estimates a proper word size based on the characteristics of the data.
Although the automatically estimated word size guarantees neither the best performance nor the best result, you do not need to adjust this value (-w) if a complete/circular organelle genome assembly is produced, because the circular result generated by GetOrganelle is highly consistent under different options and seeds.
The automatically estimated word size may be unreliable for some animal mitogenome data due to inaccurate coverage estimation, in which case you should fine-tune it manually.
-k
The best kmer(s) depend on a wide variety of factors too.
Although more kmer values increase the running time, you are recommended to use a wide range of kmers to benefit from the power of SPAdes.
Empirically, you should include at least one small kmer (e.g. 21) and one large kmer (e.g. 85) for a successful organelle genome assembly.
The largest kmer in the gradient may be crucial to the success rate of achieving the complete circular organelle genome.
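The advice above can be turned into a quick sanity check on a -k value before launching a run. The sketch below is not part of GetOrganelle: check_kmers is a hypothetical helper, and the thresholds small_max=35 and large_min=85 are illustrative assumptions chosen so that every recipe in this README passes. The odd, below-128 and ascending-order rules are the constraints SPAdes itself documents for its -k option.

```python
def check_kmers(kmer_arg, small_max=35, large_min=85):
    """Return a list of problems found in a '-k' value such as '21,45,65,85,105'."""
    kmers = [int(k) for k in kmer_arg.split(",")]
    problems = []
    # SPAdes only accepts odd kmer sizes smaller than 128.
    if any(k % 2 == 0 or k >= 128 for k in kmers):
        problems.append("SPAdes expects odd kmer sizes below 128")
    # SPAdes expects the list in ascending order.
    if kmers != sorted(kmers):
        problems.append("SPAdes expects kmer sizes in ascending order")
    # Illustrative thresholds for "one small and one large kmer".
    if min(kmers) > small_max:
        problems.append(f"no small kmer (<= {small_max})")
    if max(kmers) < large_min:
        problems.append(f"no large kmer (>= {large_min})")
    return problems
```

For example, check_kmers("21,45,65,85,105") reports no problems, while check_kmers("45,65") flags both a missing small kmer and a missing large kmer.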
-s
GetOrganelle takes the seed (fasta format; if this is not provided, the default is GetOrganelleLib/SeedDatabase/*.fasta) as a probe, and the script recruits target reads in successive rounds (the extending process).
The default seed works for most samples, but using a complete organelle genome sequence of a related species as the seed would help the assembly in many cases (e.g. degraded DNA samples, fast-evolving animal/fungal samples; see more here).
Key Results
The key output files include
*.path_sequence.fasta, each fasta file represents one type of genome structure
*.selected_graph.gfa, the organelle-only assembly graph
get_org.log.txt, the log file
extended_K*.assembly_graph.fastg, the raw assembly graph
extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg, a simplified assembly graph
extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv, a tab-format contig label file for bandage visualization
You may delete the files other than above if the resulting genome is complete (indicated in the log file and the name of the *.fasta).
You are expected to obtain the complete organelle genome assembly for most animal/fungal mitogenomes and plant chloroplast genomes (see here for nuclear ribosomal DNAs) with the recommended recipes.
If GetOrganelle fails to generate the complete circular genome (and produces *scaffolds*path_sequence.fasta instead), please follow here to adjust your parameters for a second run.
You could also use the incomplete sequence to conduct downstream analysis.
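When many samples are processed in a batch, it can help to sort the results by file name. The sketch below relies only on the naming described above, and assumes that "scaffolds" appears in the name of a *.path_sequence.fasta file only when the result is incomplete; the log file remains the authoritative indicator. summarize_results is a hypothetical helper, not a GetOrganelle function.

```python
from pathlib import Path


def summarize_results(output_dir):
    """Split *.path_sequence.fasta files into complete and scaffold-level results."""
    complete, incomplete = [], []
    for fasta in sorted(Path(output_dir).glob("*.path_sequence.fasta")):
        # Assumption: "scaffolds" marks an incomplete (non-circular) result.
        if "scaffolds" in fasta.name:
            incomplete.append(fasta.name)
        else:
            complete.append(fasta.name)
    return {"complete": complete, "incomplete": incomplete}
```

For the test run above, summarize_results("Arabidopsis_simulated.plastome") would list the result files of that output directory under the two keys.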
The blue workflow in the flowchart below shows the processes of get_organelle_from_assembly.py.
Input data & Main Options
-g
The input must be a FASTG or GFA formatted assembly graph file.
If you input an assembly graph assembled from total DNA sequencing using a third-party de novo assembler (e.g. Velvet), the assembly graph may include a great number of non-target contigs.
You may want to use --min-depth and --max-depth to greatly reduce the computational burden for target extraction.
If you input an organelle-equivalent assembly graph (e.g. manually curated and exported using Bandage), you may use --no-slim.
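To illustrate what a depth filter on an assembly graph means, here is a minimal stand-alone sketch for GFA files. It does not reproduce what --min-depth and --max-depth do internally: filter_gfa_by_depth is a hypothetical helper, and reading the depth from a DP tag on segment lines is an assumption, since different assemblers store coverage under different tags.

```python
def filter_gfa_by_depth(gfa_lines, min_depth=0.0, max_depth=float("inf"), depth_tag="DP"):
    """Drop GFA segments whose depth is out of range, plus links touching them."""
    lines = [line.rstrip("\n") for line in gfa_lines]
    dropped = set()
    for line in lines:
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        # Optional tags follow the name and sequence columns, e.g. DP:f:50.0
        for tag in fields[3:]:
            parts = tag.split(":", 2)
            if len(parts) == 3 and parts[0] == depth_tag:
                if not (min_depth <= float(parts[2]) <= max_depth):
                    dropped.add(fields[1])
    kept = []
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "S" and fields[1] in dropped:
            continue
        # Link lines: L <from> <orient> <to> <orient> <overlap>
        if fields[0] == "L" and (fields[1] in dropped or fields[3] in dropped):
            continue
        kept.append(line)
    return kept
```

Segments without the chosen tag are kept untouched, so the sketch never discards contigs whose depth it cannot read.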
Key Results
The key output files include
*.path_sequence.fasta, one fasta file represents one type of genome structure
*.fastg, the organelle related assembly graph to report for improvement and debug
*.selected_graph.gfa, the organelle-only assembly graph
get_org.log.txt, the log file
Please refer to the GetOrganelle FAQ to fine-tune the arguments, especially concerning word size, memory, and clock time.
Embryophyta
To assemble an Embryophyta plant plastid genome (plastome), e.g. using 2G raw data of 150 bp paired reads, typically I use:
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output -R 15 -k 21,45,65,85,105 -F embplant_pt
or in a draft way:
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output --fast -k 21,65,105 -w 0.68 -F embplant_pt
or in a slow and memory-economic way:
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output -R 30 -k 21,45,65,85,105 -F embplant_pt --memory-save
To assemble an Embryophyta plant mitochondrial genome (mitogenome), usually you need more than 5G raw data:
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o mitochondria_output -R 20 -k 21,45,65,85,105 -P 1000000 -F embplant_mt
# 1. please use the FASTG file as the final output for downstream manual processing. until further updates, the FASTA output of plant mitochondria genome of numerous repeats may be error-prone
# 2. embplant_mt mode was not tested in the GetOrganelle paper due to the complexity of plant mitogenomes and the defects of short reads. So there is room for improvement in the argument choices.
To assemble Embryophyta plant nuclear ribosomal RNA (18S-ITS1-5.8S-ITS2-26S):
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o nr_output -R 10 -k 35,85,115 -F embplant_nr
# Please also take a look at this FAQ https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#why-does-getorganelle-generate-a-circular-genome-or-not-for-embplant_nrfungus_nr
Non-embryophyte
Non-embryophyte plastomes and mitogenomes can be divergent from those of embryophytes. We have not explored them very much, but many users have successfully assembled them with GetOrganelle using the default database or a customized database.
There is a built-in other_pt mode and a prepared default database for non-embryophyte plastomes. I would start with -F other_pt and options similar to those in the embplant_pt part. However, there is no such built-in mode for non-embryophyte mitogenomes. Considering that the sequences may be highly divergent from embplant_mt, besides using options similar to those in the embplant_mt part, I would make a pair of customized seed database and label database, then use them to run GetOrganelle following the guidance here.
Fungus
To assemble a fungus mitochondrial genome:
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F fungus_mt -o fungus_mt_out
# if you fail with the default database, use your own seed database and label database with "-s" and "--genes"
To assemble fungus nuclear ribosomal RNA (18S-ITS1-5.8S-ITS2-28S):
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F fungus_nr -o fungus_nr_out
# if you fail with the default database, use your own seed database and label database with "-s" and "--genes"
# Please also take a look at this FAQ https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#why-does-getorganelle-generate-a-circular-genome-or-not-for-embplant_nrfungus_nr
Animal
To assemble an animal mitochondrial genome:
get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -R 10 -k 21,45,65,85,105 -F animal_mt -o animal_mt_out
# if you fail with the default database, rerun it using your own seed database (or the output of a first GetOrganelle run) and label database with "-s" and "--genes"
Animal nuclear ribosomal RNA will be available in the future. Issue136 is the place to follow.
There are as many available organelle types as in the From Reads section (see more with get_organelle_from_assembly.py -h), and the simplest usage is not that different. Here is an example of extracting the plastid genome from an existing assembly graph (*.fastg/*.gfa; e.g. from long-read sequencing assemblies):
get_organelle_from_assembly.py -F embplant_pt -g ONT_assembly_graph.gfa
See brief descriptions of the arguments by typing:
get_organelle_from_reads.py -h
or see the detailed descriptions:
get_organelle_from_reads.py --help
The same brief -h and verbose --help menus can be found for get_organelle_from_assembly.py.
You may also find a summary of above information here at Usage.
Please check the GetOrganelle wiki page first. If your question is specific to a particular run, please attach the get_org.log.txt file and the post-slimming assembly graph (assembly_graph.fastg.extend_*.fastg; this could be a Bandage-visualized *.png image to protect your data privacy).
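Before posting, it can be useful to pull the warnings and errors out of get_org.log.txt so the relevant lines are easy to quote. The sketch below assumes the "timestamp - LEVEL: message" layout seen in GetOrganelle logs; split_log_entries and problems are hypothetical helper names. It also copes with logs that were pasted as one long line.

```python
import re

# Matches prefixes such as "2023-03-26 06:33:15,226 - ERROR: ".
ENTRY = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - ([A-Z]+): ?")


def split_log_entries(text):
    """Split log text into (timestamp, level, message) tuples."""
    parts = ENTRY.split(text)
    # re.split with two groups yields [prefix, ts1, level1, msg1, ts2, level2, msg2, ...]
    return [(parts[i], parts[i + 1], parts[i + 2].strip())
            for i in range(1, len(parts), 3)]


def problems(text):
    """Keep only WARNING and ERROR entries."""
    return [entry for entry in split_log_entries(text)
            if entry[1] in ("WARNING", "ERROR")]
```

Reading the file with open("get_org.log.txt").read() and passing the text to problems() gives a short list to include in the report alongside the full log.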
Although older versions like 1.6.3/1.7.1/1.7.6 may be more stable, we always strongly encourage you to keep updated. GetOrganelle is actively updated with new fixes and new features, but new bugs too. So if you catch one, please do not be surprised, and report it to us. We usually respond quickly to bugs.
This was previously located at GetOrganelle Issues where you may find old Q&A
Please avoid duplicate and miscellaneous issues
QQ group (ID: 908302723): only for mutual help, and we will no longer reply to questions there
Do NOT write to us directly with your questions; instead, please post them publicly, using the above platforms (we will be informed automatically) or any other platform (please inform us of it). Our emails ([email protected], [email protected]) are only for receiving alerts about public questions and private data (if applicable) associated with those public questions. When you send your private data to us, include in the email a link to where you posted the question. Our only email reply will be a confirmation of receipt, while our answers will be posted in a public place.
I installed the latest version of GetOrganelle (1.7.7.0), and it would not work with SPAdes, as many have pointed out. I therefore downloaded the dependencies from the authors (GetOrganelleDep) and edited the file 'spades_init.py' within the bin of getorganelle, replacing 'spades_home = abspath(dirname(realpath(__file__)))' with 'spades_home = "path_to_/GetOrganelleDep/macOS/SPAdes"'. I also changed the spades.py path.
get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite
2023-03-26 06:32:31,652 - INFO: Pre-reading fastq ...
2023-03-26 06:32:31,652 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2023-03-26 06:32:31,752 - INFO: Estimating reads to use finished.
2023-03-26 06:32:31,752 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2023-03-26 06:32:31,863 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2023-03-26 06:32:31,969 - INFO: Counting read qualities ...
2023-03-26 06:32:32,111 - INFO: Identified quality encoding format = Illumina 1.8+
2023-03-26 06:32:32,112 - INFO: Phred offset = 33
2023-03-26 06:32:32,113 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2023-03-26 06:32:32,169 - INFO: Mean error rate = 0.0019
2023-03-26 06:32:32,171 - INFO: Counting read lengths ...
2023-03-26 06:32:32,348 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2023-03-26 06:32:32,348 - INFO: Reads used = 91563+91563
2023-03-26 06:32:32,348 - INFO: Pre-reading fastq finished.
2023-03-26 06:32:32,348 - INFO: Making seed reads ...
2023-03-26 06:32:32,349 - INFO: Seed bowtie2 index existed!
2023-03-26 06:32:32,349 - INFO: Mapping reads to seed bowtie2 index ...
2023-03-26 06:32:42,606 - INFO: Mapping finished.
2023-03-26 06:32:42,606 - INFO: Seed reads made: Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq (14144302 bytes)
2023-03-26 06:32:42,606 - INFO: Making seed reads finished.
2023-03-26 06:32:42,606 - INFO: Checking seed reads and parameters ...
2023-03-26 06:32:42,606 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2023-03-26 06:32:42,606 - INFO: If the result graph is not a circular organelle genome,
2023-03-26 06:32:42,607 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2023-03-26 06:32:45,467 - INFO: Pre-assembling mapped reads ...
2023-03-26 06:32:45,963 - INFO: Retrying with more reads ..
2023-03-26 06:33:05,013 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
2023-03-26 06:33:08,240 - INFO: Estimated embplant_pt-hitting base-coverage = 52.85
2023-03-26 06:33:08,434 - INFO: Estimated word size(s): 98
2023-03-26 06:33:08,434 - INFO: Setting '-w 98'
2023-03-26 06:33:08,434 - INFO: Setting '--max-extending-len inf'
2023-03-26 06:33:08,515 - INFO: Checking seed reads and parameters finished.
2023-03-26 06:33:08,515 - INFO: Making read index ...
2023-03-26 06:33:09,377 - INFO: 178623 candidates in all 183126 reads
2023-03-26 06:33:09,377 - INFO: Pre-grouping reads ...
2023-03-26 06:33:09,377 - INFO: Setting '--pre-w 98'
2023-03-26 06:33:09,391 - INFO: 4074/4074 used/duplicated
2023-03-26 06:33:09,629 - INFO: 517 groups made.
2023-03-26 06:33:09,637 - INFO: Making read index finished.
2023-03-26 06:33:09,637 - INFO: Extending ...
2023-03-26 06:33:09,637 - INFO: Adding initial words ...
2023-03-26 06:33:11,324 - INFO: AW 1113742
2023-03-26 06:33:12,991 - INFO: Round 1: 178623/178623 AI 40378 AW 1126044
2023-03-26 06:33:13,871 - INFO: Round 2: 178623/178623 AI 40411 AW 1126346
2023-03-26 06:33:14,754 - INFO: Round 3: 178623/178623 AI 40411 AW 1126346
2023-03-26 06:33:14,754 - INFO: No more reads found and terminated ...
2023-03-26 06:33:15,065 - INFO: Extending finished.
2023-03-26 06:33:15,066 - INFO: Separating extended fastq file ...
2023-03-26 06:33:15,144 - INFO: Setting '-k 21,55,85,115'
2023-03-26 06:33:15,144 - INFO: Assembling using SPAdes ...
2023-03-26 06:33:15,170 - INFO: spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2023-03-26 06:33:15,226 - ERROR: Assembling failed.
Total cost 45.03 s
Thank you!
If then I try to run: spades.py --test, I get:
Traceback (most recent call last):
File "/Users/oxfordlab/anaconda3/envs/getorganelle/bin/spades.py", line 26, in
Any suggestion? Thanks!
I tried running getorganelle using raw A353 data, but the resulting graph was just scaffolds and not a complete circle. Is there another way to use getorganelle for target sequence data?
An error occurs when assembling asm.p_utg.gfa using hifiasm output, but it is normal in asm.p_ctg.gfa. GetOrganelle is installed by conda v1.7.7.0. The command is get_organelle_from_assembly.py -t 40 -F embplant_mt -g ldsg.asm.hic.p_utg.gfa -o test6_from_assembly_mt
The log file is as follows:
GetOrganelle v1.7.7.0
get_organelle_from_assembly.py isolates organelle genomes from assembly graph.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.
Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
PLATFORM: Linux node2 4.18.0-305.3.1.el8.x86_64 #1 SMP Tue Jun 1 16:14:33 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.2; sympy 1.11.1; scipy 1.10.1
DEPENDENCIES: Blast 2.13.0
GETORG_PATH=/home/stu_sunjin/.GetOrganelle
LABEL DB: embplant_mt 0.0.1; embplant_pt 0.0.1
WORKING DIR: /home/stu_sunjin/data/luodi
/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/get_organelle_from_assembly.py -t 40 -F embplant_mt -g ldsg.asm.hic.p_utg.gfa -o test6_from_assembly_mt
2023-03-17 14:42:00,209 - INFO: Processing assembly graph ...
2023-03-17 14:42:00,995 - INFO: Processing assembly graph finished.
2023-03-17 14:42:00,995 - INFO: Slimming assembly graph ...
2023-03-17 14:50:25,222 - ERROR: Slimming test6_from_assembly_mt/initial_assembly_graph.gfa failed. Please check *slim.log.txt for details.
2023-03-17 14:50:25,222 - ERROR:
2023-03-17 14:42:01,927 - INFO: Slimming file 1/1: test6_from_assembly_mt/initial_assembly_graph.gfa
2023-03-17 14:42:14,380 - INFO: Parsing input finished.
2023-03-17 14:42:24,198 - INFO: Preparing fasta file finished.
2023-03-17 14:42:24,199 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_mt ...
2023-03-17 14:42:24,199 - INFO: Executing BLAST ...
2023-03-17 14:46:16,261 - INFO: Executing BLAST finished.
2023-03-17 14:46:16,261 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_mt finished.
2023-03-17 14:46:16,279 - INFO: Parsing blast result finished.
2023-03-17 14:46:16,279 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_pt ...
2023-03-17 14:46:16,279 - INFO: Executing BLAST ...
2023-03-17 14:50:20,257 - INFO: Executing BLAST finished.
2023-03-17 14:50:20,257 - INFO: Executing BLAST to /home/stu_sunjin/.GetOrganelle/LabelDatabase/embplant_pt finished.
2023-03-17 14:50:20,354 - INFO: Parsing blast result finished.
2023-03-17 14:50:20,385 - INFO: No enough coverage information found.
2023-03-17 14:50:20,395 - INFO: Mapping names ...
2023-03-17 14:50:25,005 - ERROR: division by zero
2023-03-17 14:50:25,005 - ERROR: Slimming file 1/1: test6_from_assembly_mt/initial_assembly_graph.gfa failed!
2023-03-17 14:50:25,005 - ERROR: Traceback (most recent call last):
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/slim_graph.py", line 1070, in main
    raise e
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/slim_graph.py", line 1041, in main
    reduce_matrix(in_names=in_names_r, ex_names=ex_names_r, seq_matrix=this_matrix,
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/slim_graph.py", line 754, in reduce_matrix
    assembly_graph.reduce_to_subgraph(bait_vertices=in_names,
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/lib/python3.10/site-packages/GetOrganelleLib/assembly_parser.py", line 2002, in reduce_to_subgraph
    max(1, self.vertex_info[next_v].cov / base_cov)
ZeroDivisionError: division by zero
2023-03-17 14:50:25,222 - ERROR: Traceback (most recent call last):
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/bin/get_organelle_from_assembly.py", line 1019, in main
    exit()
  File "/home/stu_sunjin/biosoft/miniconda/envs/getorganelle/lib/python3.10/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None
Total cost 506.30 s
For trouble-shooting, please
Firstly, check https://github.com/Kinggerm/GetOrganelle/wiki/FAQ
Secondly, check if there are open/closed issues related at https://github.com/Kinggerm/GetOrganelle/issues
If your problem was still not solved, please open an issue at https://github.com/Kinggerm/GetOrganelle/issues
please provide the get_org.log.txt and the the slimmed_assembly_graph. file(s) (can be visualized as .png to protect your data privacy) if possible!
023-03-15 15:07:43,679 - INFO: Separating extended fastq file ...
2023-03-15 15:07:43,763 - INFO: Setting '-k 21,55,85,115'
2023-03-15 15:07:43,763 - INFO: Assembling using SPAdes ...
2023-03-15 15:07:43,787 - INFO: spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2023-03-15 15:07:44,373 - ERROR: Error with running SPAdes:
== Error == system call for: "['/Users/innovation_user/opt/anaconda3/envs/fassembly/bin/spades-hammer', '/Users/innovation_user/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 22
2023-03-15 15:07:44,396 - ERROR: Assembling failed.
I read that your tool was also used to assemble the rDNA locus. I would like to assemble long reads from a single specific gene locus (human) coming from a sequencing experiment in which we should have an enrichment of that locus. With preliminary data (although in this first experiment this locus did not produce a good enrichment level) I used the following command line:
get_organelle_from_reads.py -u reads.fastq -o output/ -s seq.fasta -F anonym --genes seq.fasta
Where reads.fastq contained the reads to assemble, the seed file corresponded to the reference locus, and genes corresponded to the same sequence, i.e., a single sequence of the human gene locus we want to assemble. However, the following error occurred:
Pre-assembling failed. The estimations for anonym-hitting base-coverage and word size may be misleading.
Would you suggest some adjustments in order to use GetOrganelle in such cases as well?
Is this command line correct? Is there a minimum coverage of the locus needed? Did you already use your tool with long reads?
Thank you in advance!
Hello,
I am attempting to assemble the Arabidopsis plastome using the provided test dataset (simulated Arabidopsis thaliana WGS dataset). After (i) installing/updating GetOrganelle, (ii) initializing the organelle type, and (iii) running GetOrganelle, I believe I see two SPAdes errors in the log file but am not sure how to resolve them.
Both the GetOrganelle and the SPAdes log files are attached.
A great many thanks for advice or assistance.
-Jill
several bug fixes. check versions.py for details.
main updates: bug fixes.
see more @versions.py
Main updates:
1. assembly_parser.py: recording every overlap value rather than using a universal value
2. optparse -> argparse
3. output name prefix: filtered -> extended
4. other improvements
5. many bug fixes
see more @version.py