A package to annotate protein sequences

PedroMTQ, updated 2022-07-02 16:29:01

MANTIS


Mantis is a standalone tool for protein function annotation. It uses HMMER or Diamond to match sequences against multiple reference datasets, and it accepts an amino acid sequence FASTA file as input.

The main goals of this tool are to:
- consider multiple protein domains
- annotate with taxonomy resolution
- use different reference datasets and provide a consensus annotation
- be easy to set up and customize
- scale well with multiple samples and/or metagenomes

If you only have loose reads, you need to assemble them first. Once you have assembled reads or genomes, you need to predict the protein-coding regions (gene prediction, e.g. with Prodigal) to convert your data into a protein FASTA file that Mantis can then use.
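
As a quick sanity check before running Mantis, the sketch below guesses whether a FASTA file holds protein or nucleotide sequences. It is only a heuristic and not part of Mantis: the helper names (`read_fasta`, `looks_like_protein`) and the 90% cutoff are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Letters that dominate nucleotide sequences (DNA, RNA and the ambiguity code N).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Drop a trailing '*' (stop symbol written by some gene predictors).
            records[current_id] += line.upper().rstrip("*")
    return records


def looks_like_protein(sequence: str, threshold: float = 0.9) -> bool:
    """Return False when nearly all letters are nucleotide letters (hypothetical cutoff)."""
    if not sequence:
        return False
    nucleotide_hits = sum(ch in NUCLEOTIDE_LETTERS for ch in sequence)
    return nucleotide_hits / len(sequence) < threshold
```

With a real file, `read_fasta(open("target.faa"))` would return the records, and any sequence for which `looks_like_protein` is false probably still needs gene prediction.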

MANTIS IS NO LONGER IN ACTIVE DEVELOPMENT, but I'm still maintaining it. I will try to answer any issues and fix any problems as soon as possible (usually over the weekend).

Mantis is compatible with genomes and metagenomes.

Current release info

| Name | Downloads | Version | Platforms | Latest release |
| --- | --- | --- | --- | --- |
| Conda Recipe | Conda Downloads | Conda Version | Conda Platforms | Conda Platforms |

Citation

If you use Mantis, please cite the respective paper: https://doi.org/10.1093/gigascience/giab042

Wiki

Do you have any questions you can't find the answer to in here? Please read the wiki.

Still can't find the answer? Just post an issue and I'll answer as soon as possible!

Workflow overview

[Figure: Mantis workflow overview]

Installation

  1. conda install -c bioconda mantis_pfa
  2. mantis setup

Mantis is now ready to run with: mantis run -i target_faa

Mantis can only run on Linux or macOS systems. If you want to run Mantis on macOS, make sure you use Python 3.7.

Customization

Custom references can be added in config/MANTIS.cfg by adding their absolute path or folder path, for example:

    custom_ref=/path/to/ref_folder/file.hmm
    custom_ref=/path/to/ref_folder/file.dmnd
    custom_ref=/path/to/ref_folder/
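
The `custom_ref` entries above can be checked before a run with a small parser. This is a minimal sketch, not Mantis's own config reader: it assumes one `key=value` pair per line, that lines starting with `#` are comments, and that `custom_ref` may appear several times. The function names are hypothetical.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_cfg_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect key=value pairs; repeated keys (e.g. custom_ref) keep all values."""
    settings = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' comments carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()].append(value.strip())
    return dict(settings)


def non_absolute_refs(settings: Dict[str, List[str]]) -> List[str]:
    """Return custom_ref values that are not absolute POSIX paths."""
    return [p for p in settings.get("custom_ref", []) if not posixpath.isabs(p)]
```

Reading a real file would look like `parse_cfg_lines(open("config/MANTIS.cfg"))`; any path returned by `non_absolute_refs` should be replaced with its absolute form.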

Alternatively you may add them to the custom_refs folder, for example:

    Mantis/References/Custom_references/custom1/custom1.hmm
    Mantis/References/Custom_references/custom2/custom2.dmnd

You may also redefine the custom_refs folder path by adding your preferred path to custom_refs_folder in the config/MANTIS.cfg file, for example:

    custom_refs_folder=path/to/custom_refs/

To integrate metadata, each custom reference folder should contain a metadata.tsv file - see Custom References for more details.
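
To see which custom references are in place and which still lack metadata, the folder layout described above can be scanned with a few lines of Python. This is a sketch under assumptions: it expects one sub-folder per reference, treats `.hmm` and `.dmnd` files as references, and only checks that `metadata.tsv` exists, not that its content is valid. `scan_custom_refs` is a hypothetical helper, not a Mantis function.

```python
from pathlib import Path
from typing import Dict, List

# File extensions used for custom references in the examples above.
REFERENCE_SUFFIXES = {".hmm", ".dmnd"}


def scan_custom_refs(custom_refs_folder: str) -> List[Dict[str, object]]:
    """List each reference sub-folder, its reference files, and metadata presence."""
    report = []
    root = Path(custom_refs_folder)
    for ref_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        ref_files = sorted(
            f.name for f in ref_dir.iterdir() if f.suffix in REFERENCE_SUFFIXES
        )
        report.append(
            {
                "name": ref_dir.name,
                "ref_files": ref_files,
                "has_metadata": (ref_dir / "metadata.tsv").is_file(),
            }
        )
    return report
```

Calling `scan_custom_refs("Mantis/References/Custom_references/")` would return one entry per sub-folder, so references with `has_metadata` set to false are easy to spot.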

Functions

1. Help

    mantis -h

2. Setup databases

    mantis setup

3. Check installation

    mantis check

4. Check SQL metadata files

    mantis check_sql

5. Annotate one sample

    mantis run -i target.faa -o output_folder -od organism_details -et evalue_threshold -ov overlap_value -mc custom_MANTIS.cfg

example:

    mantis run -i mantis/tests/test_sample.faa -od "Escherichia coli"

6. Annotate multiple samples

    mantis run -i target.tsv -o output_folder -et evalue_threshold -ov overlap_value -mc custom_MANTIS.cfg

example:

    mantis run -i mantis/tests/test_file.tsv
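
When Mantis is called from a pipeline, it can help to assemble the `mantis run` command programmatically. The sketch below only uses the flags listed above (`-i`, `-o`, `-od`, `-et`, `-ov`, `-mc`); the function name and its parameter names are hypothetical, and it builds the argument list without executing anything.

```python
from typing import List, Optional


def build_run_command(
    input_path: str,
    output_folder: Optional[str] = None,
    organism_details: Optional[str] = None,
    evalue_threshold: Optional[float] = None,
    overlap_value: Optional[float] = None,
    config_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for 'mantis run', skipping options left as None."""
    command = ["mantis", "run", "-i", input_path]
    optional_flags = [
        ("-o", output_folder),
        ("-od", organism_details),
        ("-et", evalue_threshold),
        ("-ov", overlap_value),
        ("-mc", config_path),
    ]
    for flag, value in optional_flags:
        if value is not None:
            command.extend([flag, str(value)])
    return command
```

Once Mantis is installed, the returned list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids quoting problems with values such as "Escherichia coli".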

Output files

There are 3 output files:
- output_annotation.tsv, which has all hits and their coordinates and e-values;
- integrated_annotation.tsv, which has all hits, their coordinates and e-values, as well as the respective hit metadata;
- consensus_annotation.tsv, which has all hits and their respective metadata from the best reference sources consensus.

The first two files can have the same query sequence in several lines (query sequence/reference source) while the consensus_annotation.tsv will only have one line per query sequence (consensus/query).
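
The difference in row counts can be verified with a short script. This is a sketch that assumes each file is tab-separated, starts with a header row, and stores the query sequence ID in the first column; check these assumptions against your own output files. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List


def count_rows_per_query(lines: Iterable[str], has_header: bool = True) -> Counter:
    """Count how many rows each query ID (assumed first column) occupies."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    return Counter(row[0] for row in reader if row)


def queries_with_multiple_rows(lines: Iterable[str]) -> List[str]:
    """Return query IDs that appear on more than one row, sorted by name."""
    counts = count_rows_per_query(lines)
    return sorted(query for query, n in counts.items() if n > 1)
```

For example, `queries_with_multiple_rows(open("consensus_annotation.tsv"))` is expected to return an empty list, while the same call on integrated_annotation.tsv may list queries that were hit by several reference sources.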

GFF-formatted output files can also be generated, as well as a KEGG module completeness TSV. Please see the Output page for information on the additional output files.

Further details

License and copyright

This project is available under the MIT license.

References and acknowledgements

Queirós, Pedro, Novikova, Polina, Wilmes, Paul and May, Patrick. "Unification of functional annotation descriptions using text mining". Biological Chemistry, 2021. https://doi.org/10.1515/hsz-2021-0125

S. R. Eddy. HMMER: biosequence analysis using profile hidden Markov models. HMMER v.3.2.1 www.hmmer.org

Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature methods, 12(1), 59–60. https://doi.org/10.1038/nmeth.3176

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork Nucleic Acids Res. 2019 Jan 8; 47(Database issue): D309–D314. https://doi.org/10.1093/nar/gky1085

The Pfam protein families database in 2019: S. El-Gebali, J. Mistry, A. Bateman, S.R. Eddy, A. Luciani, S.C. Potter, M. Qureshi, L.J. Richardson, G.A. Salazar, A. Smart, E.L.L. Sonnhammer, L. Hirsh, L. Paladin, D. Piovesan, S.C.E. Tosatto, R.D. Finn Nucleic Acids Research (2019) https://doi.org/10.1093/nar/gky995

Aramaki T., Blanc-Mathieu R., Endo H., Ohkubo K., Kanehisa M., Goto S., Ogata H. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics. 2019 Nov 19. pii: btz859. https://doi.org/10.1093/bioinformatics/btz859.

Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Marchler GH, Song JS, Thanki N, Yamashita RA, Yang M, Zhang D, Zheng C, Lanczycki CJ, Marchler-Bauer A. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 2020 Jan 8;48(D1):D265-D268. doi: 10.1093/nar/gkz991. PMID: 31777944; PMCID: PMC6943070.

Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Jaime Huerta-Cepas, Kristoffer Forslund, Luis Pedro Coelho, Damian Szklarczyk, Lars Juhl Jensen, Christian von Mering and Peer Bork. Mol Biol Evol (2017). doi:10.1093/molbev/msx148

Saier MH, Reddy VS, Moreno-Hagelsieb G, Hendargo KJ, Zhang Y, Iddamsetty V, Lam KJK, Tian N, Russum S, Wang J, Medrano-Soto A. The Transporter Classification Database (TCDB): 2021 update. Nucleic Acids Res. 2021 Jan 8;49(D1):D461-D467. doi: 10.1093/nar/gkaa1004. PMID: 33170213; PMCID: PMC7778945.

Issues

Installation: Unsatisfiable error

opened on 2022-08-26 12:04:31 by susheelbhanu

Hey @PedroMTQ

The current installation instructions might need to be updated. Pulling the tool using conda install -c bioconda mantis_pfa was throwing Unsatisfiable errors.

Replacing it with conda install -c conda-forge -c bioconda mantis_pfa seems to work.

Thanks!

setup gets stuck

opened on 2022-07-11 13:00:42 by ekg

On two very different systems I'm getting stuck at the same place in the setup.

    mantis setup
    ...
    Merging profiles in /lizardfs/erikg/miniconda3/lib/python3.8/site-packages/References/NCBI/986/to_merge/
    Concatenating files into /lizardfs/erikg/miniconda3/lib/python3.8/site-packages/References/NCBI/986/986_merged.hmm

The setup process simply hangs. On my laptop, I had to kill it. But, on a remote server I'll wait to see if it progresses. Nothing is running as far as htop says and no data is being written.

Issue with Mantis Setup

opened on 2022-07-01 19:31:15 by jblakele

Hi,

I am trying to install mantis on my labs server. The conda install went fine but the setup step did not work. Could you help me with this?

I run: mantis setup

Then get the following string of text. It looks like it's getting hung up at the downloading eggNOG step. I was able to download eggnog.db.gz separately, none of the other references were downloaded. Is it possible to set this up outside of setup or is it a bug?

I appreciate the help, Alfredo

Executing command: /tools/miniconda3/envs/MANTIS-Annotation/bin/mantis setup -c 20 -m 100

Setting up databases

Using default MANTIS.cfg: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/config/MANTIS.cfg
Default references folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/
Resources folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/Resources/
Custom references folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/Custom_references/
TAX NOG references folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/NOG/
TAX NCBI references folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/NCBI/
Pfam reference folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/pfam/
KOfam reference folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/kofam/
TCDB reference folder: /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/tcdb/
------------------------------------------# Weights: nog:0.8, pfam:0.9

Will check data for the following NOGT: 995019,119603,7147,526524,1506553,68525,91061,228398,200643,1313,147541,1303,11632,2836,506,203682,33958,59732,118969,1100069,9397,651137,33154,50557,97050,81852,11989,38820,1164882,167375,85018,58840,69657,1307,119060,135625,200940,206389,713636,33183,400634,35301,40674,7148,40117,85016,766,11157,5125,186822,155619,135614,204458,1129,53335,29,85008,1268,183925,4890,72273,186807,33554,119089,1090,583,183963,267888,28211,10239,8782,265,35268,10699,46205,213118,204457,157897,117743,976,9989,90964,267893,7399,85012,586,91835,2,135624,68298,135618,326319,8459,10,285107,544448,5042,548681,119043,572511,314146,772,7711,76831,561,72275,5878,52959,671232,6231,1297,33342,186928,5863,200930,213481,1212,43988,7898,28221,81850,439488,85019,136846,265975,225057,121069,29000,267890,136845,7214,200918,5653,5148,204441,3699,85026,363408,186827,35237,85004,34397,4776,28890,85013,150247,237,35278,186828,622450,1,34383,33208,203494,85021,267889,35325,2433,766764,119065,191028,7088,186821,85005,110618,4891,45404,61432,10656,204428,28216,355688,68892,539002,112252,302485,1142,1570339,10841,57723,1239,10404,186813,186824,311790,76804,33090,290174,183967,68295,267894,35493,189330,244698,45401,60136,541000,45667,245186,1386,256005,224756,118884,213115,4751,75682,508458,1762,142182,33213,201174,186804,5151,2063,84998,451870,629,147548,85010,206350,283735,301297,613,5204,830,71274,675063,1060,186823,558415,1357,135613,252356,118882,203691,32061,289201,547,544,1236,85017,10860,136849,186820,171551,41294,909932,84406,213113,1653,136843,119069,6236,85020,7742,252301,32199,93682,6656,246874,135623,414999,1305,216572,551,2157,34037,422676,5338,74030,69277,136841,213462,468,186818,451867,5819,1511857,768503,117747,84992,92860,538999,31979,326457,74385,53433,85014,83612,206351,85009,815,1161,85025,186801,255475,183980,5794,82986,639021,34384,29258,4893,84995,2759,32003,2323,28883,91561,104264,147550,189775,191675,464095,82117,629295,1117,200795,1189,5796,1047
4,9263,80864,314294,423358,1150,52604,1016,358033,135619,451866,5809,119066,39782,85023,74201,308865,171550,10744,5234,4447,204037,33867,28037,5139,590,1283313,186806,29547,145357,554915,183939,147545,5129,452284,335928,10662,3041,9443,122277,129337,182709,200783,32066,1028384,31993,119045,69541,28889,35718,9604,34008,82115,10477,1224,183968,178469,204432,52018
/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/NOG/eggnog.db.gz
Downloading from http://eggnogdb.embl.de/download/emapperdb-5.0.2/eggnog.db.gz. The file will be kept in /tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/References/NOG/
Did not manage to download the following url correctly: http://eggnogdb.embl.de/download/emapperdb-5.0.2/eggnog.db.gz
Traceback (most recent call last):
  File "/tools/miniconda3/envs/MANTIS-Annotation/bin/mantis", line 11, in <module>
    sys.exit(main())
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/main.py", line 184, in main
    setup_databases(chunk_size=chunk_size, no_taxonomy=no_taxonomy,
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/Assembler.py", line 17, in setup_databases
    mantis.setup_databases()
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/utils.py", line 588, in wrapper
    res = f(self, *args, **kwargs)
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/Database_generator.py", line 42, in setup_databases
    passed_tax_check=self.prepare_queue_setup_databases_tax()
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/Database_generator.py", line 136, in prepare_queue_setup_databases_tax
    self.download_and_unzip_eggnogdb()
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/Database_generator.py", line 853, in download_and_unzip_eggnogdb
    download_file(url, output_folder=self.mantis_paths['NOG'], stdout_file=stdout_file)
  File "/tools/miniconda3/envs/MANTIS-Annotation/lib/python3.9/site-packages/mantis/utils.py", line 725, in download_file
    raise Exception
Exception

Setup process running out of memory and spawning too many processes

opened on 2021-12-07 15:46:16 by VGalata

Mantis seems to spawn more processes than available cores and to run out of memory during the setup step. I ran it with 5 cores and 20 Gb, and during the metadata extraction with 5 workers the job spawned even more sub-processes and crashed. The error message from the slurm job was "out of memory".

I will try to run the job with 12 cores and 48 Gb to see whether the memory issue will appear again.

Used version: 14f75ac

CMD:

    python submodules/mantis/ setup_databases --mantis_config mantis/mantis.config

Config:

    nog_dmnd_ref_folder=/work/projects/ecosystem_biology/data/mantis_references/NOG/
    pfam_ref_folder=/work/projects/ecosystem_biology/data/mantis_references/pfam/
    kofam_ref_folder=/work/projects/ecosystem_biology/data/mantis_references/kofam/
    ncbi_ref_folder=/work/projects/ecosystem_biology/data/mantis_references/NCBI/
    tcdb_ref_folder=/work/projects/ecosystem_biology/data/mantis_references/tcdb/
    ncbi_weight=0.9
    nog_weight=0.8
    pfam_weight=0.9
    uniprot_ec_weight=0.9

Conda YAML:

    channels:
      - anaconda
      - conda-forge
      - bioconda
      - defaults
    dependencies:
      - cython=0.29.21
      - hmmer=3.3.1
      - nltk=3.5
      - numpy=1.19.1
      - psutil=5.7.2
      - python=3.8.5
      - requests=2.24.0
      - sqlite=3.33.0

Log file

Screenshots: Screenshot from 2021-12-07 16-10-12 Screenshot from 2021-12-07 16-14-37

Add peptide annotation

opened on 2021-09-30 11:42:57 by PedroMTQ

As pointed out by @hbckleikamp, Diamond needs to be set up differently in order to annotate peptides. See https://github.com/bbuchfink/diamond/discussions/469 for details from the author.

operon prediction

opened on 2021-05-19 10:07:45 by PedroMTQ

Add putative operon prediction by coordinate and functional clustering

Releases

1.5.5 2022-07-02 16:06:38

Updated tcdb download to adjust for updated uniprot api calls. Refactored the code a bit.

1.5.4 2022-04-23 08:23:26

updated taxonomy db to dynamically get the correct files from latest release in gtdb

1.5.3 2022-04-16 12:44:20

Release 1.5.1 2022-04-09 07:45:51

Change sample_kos.tsv format to be compatible with https://www.genome.jp/kegg/mapper/ input format

1.5.0 2022-03-03 13:16:37

added sorting to ref_files, they are now redundant if multiple hits from the same file are present, but you can now get the file-hit by their indexes; also added sorting to removal of redundant descriptions to ensure reproducible results

1.4.9 2022-03-02 07:57:35

Added resources_folder to config file so that the user can set where to download data for taxonomy and translation

Pedro Queirós

Nutritionist (BSc U.Porto), Bioinformatician (MSc U.Minho), PhD student (U.Luxembourg)

GitHub Repository

protein-annotation bioinformatics hmmer mantis protein-function-prediction