See the documentation, the paper, and the databases.
If you find PopPUNK useful, please cite us:
Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research 29:304-316 (2019). doi:10.1101/gr.241455.118
You can also run your command with --citation to get a list of citations and a suggested methods paragraph.
We will retire the PopPUNK website. Databases have been expanded, and can be found here: https://www.bacpop.org/poppunk/.
The change in scikit-learn's API in v1.0.0 and above means that HDBSCAN models fitted with sklearn <=v0.24 will give an error when loaded. If you run into this, the solution is one of:
- Downgrade sklearn to v0.24.
- Run model refinement to turn your model into a boundary model instead (this will change clusters).
- Refit your model in an environment with sklearn >=v1.0.
If this is a common problem, let us know, as we could write a script to 'upgrade' HDBSCAN models. See issue #213 for more details.
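You can check which side of this change your environment is on before loading an old HDBSCAN model. The Python sketch below does this. It assumes, as the text above implies, that models fitted with sklearn <=v0.24 only load under sklearn <v1.0. The helpers `parse_version` and `old_hdbscan_model_loads` are hypothetical and not part of PopPUNK.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '0.24.2' -> (0, 24, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad so that '1.0' compares equal to '1.0.0'
    return tuple(parts)


def old_hdbscan_model_loads(installed_sklearn: str) -> bool:
    """Hypothetical check: an HDBSCAN model fitted with sklearn <=0.24 is assumed to need sklearn <1.0."""
    return parse_version(installed_sklearn) < (1, 0, 0)


if __name__ == "__main__":
    try:
        sk_version = version("scikit-learn")  # PyPI distribution name of sklearn
    except PackageNotFoundError:
        print("scikit-learn is not installed")
    else:
        if old_hdbscan_model_loads(sk_version):
            print(f"sklearn {sk_version}: old HDBSCAN models should load")
        else:
            print(f"sklearn {sk_version}: downgrade, refine to a boundary model, or refit")
```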
We have fixed a number of bugs which may affect the use of poppunk_assign with --update-db. We have also fixed a number of bugs with GPU distances. These are 'advanced' features and are not likely to be encountered in most cases, but if you do wish to use either of these features please make sure that you are using PopPUNK >=v2.4.0 with pp-sketchlib >=v1.7.0.
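The version requirement above can be checked with a short Python sketch. A few parts of it are assumptions:
- The distribution names `poppunk` and `pp-sketchlib` passed to `importlib.metadata.version` may not match what conda exposes.
- Conda installs may not expose the same package metadata at all.
- `advanced_features_supported` is a hypothetical helper, not a PopPUNK function.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. 'v2.4.0' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad so that '2.4' compares equal to '2.4.0'
    return tuple(parts)


def advanced_features_supported(poppunk: str, sketchlib: str) -> bool:
    """True if the pair includes the --update-db and GPU distance fixes (PopPUNK >=2.4.0, pp-sketchlib >=1.7.0)."""
    return parse_version(poppunk) >= (2, 4, 0) and parse_version(sketchlib) >= (1, 7, 0)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if its metadata is not found."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    pp = installed_version("poppunk")       # assumed distribution name
    sk = installed_version("pp-sketchlib")  # assumed distribution name
    if pp is None or sk is None:
        print("could not find metadata for PopPUNK and/or pp-sketchlib:", pp, sk)
    else:
        print("--update-db / GPU fixes present:", advanced_features_supported(pp, sk))
```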
We have discovered a bug affecting the interaction of pp-sketchlib and PopPUNK. If you have used PopPUNK >=v2.0.0 with pp-sketchlib <v1.5.1, label order may be incorrect (see issue #95).
Please upgrade to PopPUNK >=v2.2 and pp-sketchlib >=v1.5.1. If this is not possible, you can either:
- Run scripts/poppunk_pickle_fix.py on your .dists.pkl file and re-run model fits.
- Create the database with poppunk_sketch directly, rather than poppunk --create-db.
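The affected and recommended version combinations above can be expressed as two small predicates. This is a hypothetical Python helper, not part of PopPUNK. It only encodes the ranges stated here:
- Affected: PopPUNK >=v2.0.0 with pp-sketchlib <v1.5.1.
- Recommended: PopPUNK >=v2.2 with pp-sketchlib >=v1.5.1.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '1.5.1' -> (1, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad so that '2.0' compares equal to '2.0.0'
    return tuple(parts)


def label_order_may_be_wrong(poppunk: str, sketchlib: str) -> bool:
    """Affected combination from issue #95: PopPUNK >=2.0.0 with pp-sketchlib <1.5.1."""
    return parse_version(poppunk) >= (2, 0, 0) and parse_version(sketchlib) < (1, 5, 1)


def is_recommended(poppunk: str, sketchlib: str) -> bool:
    """Recommended upgrade target: PopPUNK >=2.2 with pp-sketchlib >=1.5.1."""
    return parse_version(poppunk) >= (2, 2, 0) and parse_version(sketchlib) >= (1, 5, 1)
```

If `label_order_may_be_wrong` is true for the versions you used, one of the two workarounds above applies.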
These instructions are for the command line version. For more details, see the installation section of the documentation.
There are other interfaces, in-browser and through Galaxy, detailed here.
The easiest way to install PopPUNK is through conda, which is most easily accessed by first installing miniconda. PopPUNK can then be installed by running:
conda install poppunk
If the package cannot be found you will need to add the necessary channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
See the quickstart guide for a brief tutorial.
A Docker image is available:
docker pull mrcide/poppunk:bacpop-20
Versions
Command used and output returned
Describe the bug
10^7 may not be enough, but it should definitely be possible to change this in visualise.
Versions
PopPUNK v2.5.0
PopPUNK (POPulation Partitioning Using Nucleotide Kmers) (with backend: sketchlib v2.0.0 sketchlib: /opt/conda/lib/python3.9/site-packages/pp_sketchlib.cpython-39-x86_64-linux-gnu.so)
Command used and output returned
poppunk --fit-model refine --model-dir /tmp/Ecoli_n79k_QCd_dbscan_k18_k32 --ref-db /tmp/Ecoli_n79k_db_k18_k32_221115 --output /tmp/Ecoli_n79k_QCd_dbscan_refine_multi_1 --multi-boundary 30 --threads 20
Describe the bug
Below is the output I get, showing both the progress of the run and the error. The plan is to run poppunk_iterate after this.
Graph-tools OpenMP parallelisation enabled: with 20 threads
Mode: Fitting refine model to reference database
Loading DBSCAN model
Completed model loading
Loaded previous model of type: dbscan
Initial model-based network construction based on DBSCAN fit
Trying to optimise score globally
Search range (0.001,0.057) to (0.014,0.304)
Searching core intercept from 0.006 to 0.042
Searching accessory intercept from 0.064 to 0.448
█████████████████████████████████| 40/40
Trying to optimise score locally
Optimization terminated successfully;
The returned value satisfies the termination criteria
(using xtol = 1e-05 )
Creating multiple boundary fits
Search range (0.000,0.044) to (0.006,0.164)
Searching core intercept from 0.004 to 0.022
Searching accessory intercept from 0.044 to 0.231
█▏ | 1/30
Traceback (most recent call last):
File "/opt/conda/bin/poppunk", line 11, in
Versions
PopPUNK v2.5.0 in a Singularity container. Link to the Dockerfile: https://github.com/StaPH-B/docker-builds/tree/master/poppunk/2.5.0. I am not sure about the pp-sketchlib version because the command returned "executable file not found in $PATH".
Command used and output returned
poppunk --create-db --output ppdb_13_29_4_0.05_4l --r-files rlist.txt --threads 30 --plot-fit 5 --min-k 13 --max-k 29 --k-step 4 --min-cluster-prop 1e-05 --max-zero-dist 0.005
poppunk --fit-model dbscan --ref-db ppdb_13_29_4_0.05_4 --output results_13_29_4_0.05_4 --threads 30
Describe the bug
My main issue is that poppunk fails to cluster genomes appropriately when I run the tool on 15000 genomes (before scaling up, I ran the tool on about 500 genomes and clustering completed successfully).
First, I ran poppunk with default parameters, and everything clustered into a single poppunk cluster. Then, I set up an experiment where I altered the following parameters: min-k (13 or 16), max-k (29 or 31), k-step (3 or 4), max-zero-dist (0.005 or 0.05) and min-cluster-prop (0.00001 or 0.0001). The name 13_29_4_0.05_4 in the section above indicates min-k 13, max-k 29, k-step 4, max-zero-dist 0.05 and min-cluster-prop 0.0001 (the last field is on a -log10 scale).
Most parameter combinations produced terrible results. Surprisingly, one completed successfully: the one with default parameters. So I ran poppunk with default parameters again, and it failed again. The most obvious cause is that I keep overlooking something (I am working on it), but if this is not the case, I am wondering whether poppunk generates random numbers anywhere. Can you please clarify this? Many thanks.
Main changes:
- Lineage fits now use reciprocal best match with --reciprocal-only, --count-unique-distances and --max-search-depth, which gives better results.
- Fixes for threshold model assignment
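The sketch below illustrates the general idea of reciprocal best matches. It is not PopPUNK's lineage code, and it does not model what --count-unique-distances or --max-search-depth do. Given a symmetric distance matrix, it keeps an edge between two samples only when each is among the other's k nearest neighbours. The function names and example distances are hypothetical.

```python
from typing import List, Set, Tuple


def nearest_neighbours(dist: List[List[float]], k: int) -> List[Set[int]]:
    """For each sample, the indices of its k closest other samples (ties broken by index)."""
    n = len(dist)
    result = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        result.append({j for _, j in others[:k]})
    return result


def reciprocal_edges(dist: List[List[float]], k: int) -> Set[Tuple[int, int]]:
    """Undirected edges (i, j) where i and j are each in the other's k nearest neighbours."""
    nn = nearest_neighbours(dist, k)
    edges = set()
    for i, neighbours in enumerate(nn):
        for j in neighbours:
            if i in nn[j]:  # keep only matches that are reciprocated
                edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical symmetric distances between four samples
example_dists = [
    [0.00, 0.01, 0.05, 0.20],
    [0.01, 0.00, 0.02, 0.20],
    [0.05, 0.02, 0.00, 0.03],
    [0.20, 0.20, 0.03, 0.00],
]
```

With k=1, one-directional nearest-neighbour edges would be (0, 1), (1, 2) and (2, 3). Requiring reciprocity keeps only (0, 1): sample 2's nearest neighbour is 1, but sample 1's is 0.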
Full Changelog: https://github.com/bacpop/PopPUNK/compare/v2.5.0...v2.6.0
Minimum sketchlib version for this release is v2.0.0
New features:
- Dendropy replaced with faster & more reliable alternatives #203
- A new logo #202
- Improve iterative PopPUNK code
- Documentation update and improvements #191
- Deal better with name clash when querying #190
- Make manual start a bit easier to use #174
- Replace t-SNE with mandrake
- Output .microreact files, and allow direct creation of Microreact instances with an API key
- Various QC additions to help with multi-cluster merges #194
Bug fixes:
- Various fixes to cytoscape visualisation #185 #196 #210
- Hide progress bars when using --plot-fit
- Stop always checking query-query dists when clustering (and potential bug adding them to network twice)
- Fix N QC when working with reads #207
Full Changelog: https://github.com/bacpop/PopPUNK/compare/v2.4.0...v2.5.0
Minimum sketchlib version for this release is v1.7.0
Using --gpu-graph requires cudf and cugraph to be installed from the nvidia conda channel (these are not part of the standard installation).
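Before passing --gpu-graph, it may help to confirm that the extra packages are importable. The minimal Python sketch below does this. It assumes the import names are `cudf` and `cugraph`, which are the RAPIDS module names. `missing_gpu_graph_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_gpu_graph_modules(modules: Iterable[str] = ("cudf", "cugraph")) -> List[str]:
    """Return the names in `modules` that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_gpu_graph_modules()
    if missing:
        print("Install from the nvidia conda channel before using --gpu-graph:", ", ".join(missing))
    else:
        print("cudf and cugraph found")
```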
New features:
- Adds minimum spanning tree computation and visualisation #141 #148
- Add two new network scores based on betweenness #146
- Move boundary code into a C++ extension in this package #146 #158
- Adds GPU accelerated graphs #87 #148
- Adds a docker container which is used for web.poppunk.net #151 #162
- New GitHub Actions for testing and building the web API #151
- Add progress bars for model assignment #155
- Parallelise model assignment #155
- Adds the VLKC terminology, and 'unword' cluster names #161
Bug fixes:
- Correctly specify thread count with rapidnj #139
- Regenerate random match changes after --update-db #149
- Fix issue with label order when using --update-db more than once #152
- Update some scripts/ to work with newer versions of numpy and scikit-learn #160
- Keep hyphens in sample names in trees #159
- Fix a plot name #158
- Pin some package versions #140 #142
This is a major (API-breaking) update which moves the assign and visualisation functions into their own programs, to make the program more modular. The minimum version of pp-sketchlib required is 1.6.0.
New features:
- Lineage assign mode uses matrix code in pp-sketchlib #108
- New algorithm for clique pruning #110
- Visualisation and query moved out of main, and into their own programs #112 #115 #129
- Simpler CLI defaults #125
- Updated documentation #122
- Add edge weights to graph #123
- Add API for use of poppunk_assign with an HTTP server #124 #131
- Add corrected/uncorrected distances when plotting k-mer fits #136
Bug fixes:
- More stable generation of documentation #132
- Fixes continue mode for QC function #134
- Fixes long length QC fail #137
The first bug fix will affect many results, and all users are encouraged to upgrade.
New features:
- More thorough sample QC using pp-sketchlib features (#101)
- Update to pp-sketchlib v1.5.1 (#104)
Bug fixes:
- Misordered labels with older versions of pp-sketchlib (#95)
- TypeError with visualisations (#99)
- networkx still used in reference prune program (#97)
- --lineage-clustering mode (#72)
- --refine-model default boundary (#94)
NB: Python >=3.8 is now required (#81, #76)
Pathogen Informatics and Modelling @ EMBL-EBI / Bacterial Evolutionary Epidemiology Group @ Imperial College London