XRay Estimation and Refinement Using Similarity (XERUS)

pedrobcst, updated 🕥 2023-01-30 11:50:06

XERUS (X-Ray Estimation and Refinement Using Similarity)



This is the main repository for:

  • Baptista de Castro, P., Terashima, K., Esparza Echevarria, M.G., Takeya, H. and Takano, Y. (2022), XERUS: An Open-Source Tool for Quick XRD Phase Identification and Refinement Automation. Adv. Theory Simul. 2100588. https://doi.org/10.1002/adts.202100588

For the Xerus version that was published in the paper, please refer to release 1.0r1.


Welcome to the Xerus project. Xerus is an open-source Python wrapper / plugin around the GSASII Scriptable package that automates phase quantification and Rietveld analysis by combining similarity calculations of simulated patterns (Pearson's correlation) with quick Rietveld refinements.

Xerus is only possible due to the existence of the following projects:

  • COD (Crystallography Open Database)
  • The Materials Project (MP)
  • AFLOW Database
  • OQMD (Open Quantum Materials Database)
  • GSASII Scriptable Engine
  • pymatgen

Xerus is designed to perform analysis through Jupyter notebooks, providing an easy-to-use API that covers everything from phase quantification to Rietveld optimization.

The main mechanisms behind Xerus stem from the following papers:

  • Clustering of XRD patterns using similarity:

    • Iwasaki, Y., Kusne, A.G. & Takeuchi, I. Comparison of dissimilarity measures for cluster analysis of X-ray diffraction data from combinatorial libraries. npj Comput Mater 3, 4 (2017). https://doi.org/10.1038/s41524-017-0006-2
  • Optimization of Rietveld refinements:

    • Ozaki, Y., Suzuki, Y., Hawai, T. et al. Automated crystal structure analysis based on blackbox optimisation. npj Comput Mater 6, 75 (2020). https://doi.org/10.1038/s41524-020-0330-9

In addition, Xerus uses our own iterative pattern removal process, coupled with quick Rietveld refinements, which allows for multiphase characterization.
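To give a feeling for the similarity step, the sketch below computes the Pearson correlation between a toy "experimental" pattern and two toy "simulated" candidates. It is only an illustration and not the Xerus implementation: the patterns are synthetic sums of Gaussian peaks, and the function names, peak positions and peak width are all made up for this example.

```python
import math


def gaussian_pattern(angles, peak_positions, width=0.15):
    """Build a toy diffraction pattern as a sum of Gaussian peaks."""
    return [
        sum(math.exp(-0.5 * ((x - p) / width) ** 2) for p in peak_positions)
        for x in angles
    ]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# 2-theta grid from 10 to 90 degrees in steps of 0.02 (arbitrary choice).
angles = [10.0 + 0.02 * i for i in range(4001)]

experimental = gaussian_pattern(angles, [28.4, 47.3, 56.1])
candidate_a = gaussian_pattern(angles, [28.4, 47.3, 56.1, 69.1])  # shares peaks
candidate_b = gaussian_pattern(angles, [31.7, 45.5, 66.2])  # no shared peaks

score_a = pearson(experimental, candidate_a)
score_b = pearson(experimental, candidate_b)
print(f"candidate_a: {score_a:.3f}, candidate_b: {score_b:.3f}")
```

The candidate that shares peaks with the experimental pattern scores much higher than the one that does not, which is the property that makes a correlation-based ranking useful before any refinement is attempted.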


In this section we will briefly introduce how to install Xerus in the easiest manner possible.


Before installing Xerus: OS and service requirements

System OS

  • Xerus only supports Linux-based and macOS systems. Xerus was mostly developed on Ubuntu 20.04 and has also been tested on CentOS 7.x and macOS (Intel) systems. M1 machines were not available for testing.

NOTE: As of version 1.1b we started PARTIALLY supporting Windows (currently under testing). All features related to PHASE MATCHING and SEARCHING seem to be working (Win10 python 3.8, Win11 python 3.8). However, refinement optimization seems to not work in Windows yet. There is no ETA to support this. We recommend still using UNIX based systems (Linux/macOS).

Materials Project APIKey

Xerus relies on the Materials Project API for downloading crystal structures within the requested chemical space. Therefore, please register and obtain a suitable API key (free) at www.materialsproject.org. After registering, you can check your API key by clicking the API tab at the upper right side of the website.


Xerus relies on a MongoDB server for caching crystal structures downloaded from the database providers (COD, AFLOW and MP). To install and run the community version (free), please follow the steps listed in:

  • Ubuntu Installation Steps
  • MacOS Installation Steps
  • Windows Installation Steps


Package Installation

Xerus can currently only be installed via conda. Therefore, if you do not have conda, please install it first by following the official conda installation instructions. To install Xerus in a virtual environment with Python 3.8, follow these steps:

    conda create -n Xerus python==3.8 anaconda
    conda activate Xerus
    git clone http://www.github.com/pedrobcst/Xerus/
    cd Xerus
    pip install -e .

:warning: You might have trouble installing pymatgen if gcc is not present on your system. In that case, install a compiler first (for example, sudo apt install g++ on Ubuntu), then run pip install -e . again.

Windows Installation (Beta)

  1. Install Microsoft C++ Build Tools
  2. Install Anaconda
  3. Clone or Download this repository.
  4. Open anaconda shell (not windows cmd, but anaconda shell) and cd into the downloaded folder of Xerus.
  5. Run the following commands:

       conda create -n Xerus python==3.8 anaconda
       conda activate Xerus
       pip install -e .

  6. Proceed to configure the settings as described below.


Once all packages have been installed successfully, the configuration file at Xerus/settings/config.conf needs to be set correctly:


  • [mongodb]
  • host: defaults to localhost
    • Enter the IP address of the MongoDB server.
  • user: username (if necessary)
    • If MongoDB is configured to use authentication, please provide the username here
  • password: password for that user (if necessary)

  • [mp]

  • apikey: Materials project API key.
    • Please provide your materials project API key here


  • Extra config:
  • [gsasii]

    • instr_params: Path to the .instprm file (GSASII)
    • Xerus by default comes with an .instprm file obtained from fitting NIST Si on our XRD machine (Rigaku MiniFlex 600). It is recommended, although probably not necessary, that you follow the GSASII tutorial to obtain an .instprm file for your own machine.

    • testxrd:

    • By default, Xerus uses one of the Examples data files to test the CIFs obtained from the repositories. Feel free to change this to any pattern of your liking.
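As a rough aid, the settings listed above can be sanity-checked with a few lines of Python before running the tests. This is only a sketch under assumptions: it assumes config.conf is an INI-style file that the standard configparser module can read, that the keys are named exactly as listed above, and that host, apikey, instr_params and testxrd are the ones that must be filled in. The helper name missing_settings and the example paths are hypothetical and not part of Xerus.

```python
import configparser

# Sections and keys as listed above; which ones are mandatory is an assumption.
REQUIRED = {
    "mongodb": ["host"],
    "mp": ["apikey"],
    "gsasii": ["instr_params", "testxrd"],
}


def missing_settings(text):
    """Return 'section.key' names that are absent or empty in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            # fallback="" also covers the case where the whole section is missing
            value = parser.get(section, key, fallback="").strip()
            if not value:
                missing.append(f"{section}.{key}")
    return missing


# Placeholder content for illustration; the paths below are not real files.
example = """
[mongodb]
host = localhost
user =
password =

[mp]
apikey =

[gsasii]
instr_params = /path/to/machine.instprm
testxrd = /path/to/test_pattern
"""

print(missing_settings(example))
```

For a real check, the file content could be passed in with something like missing_settings(open("Xerus/settings/config.conf").read()).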

Testing installation

If all the above steps were completed successfully (pip install, MongoDB running [locally] and API key set), please run the following:

    cd tests
    pytest -vvv

If all tests pass, Xerus should be ready for use.

:warning: This process might take a while.


  • To learn how to use Xerus, please follow the Notebook, located at Examples/Examples.ipynb

:warning: Make sure that before running the examples, you have started the MongoDB server.
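A quick way to check that a server is listening before opening the notebook is to try a plain TCP connection. The sketch below uses only the standard library and assumes MongoDB's default port 27017 on localhost; adjust the host and port to match your config.conf. The function name is made up for this example, and a successful connection only shows that something is listening on that port, not that authentication will succeed.

```python
import socket


def mongodb_is_reachable(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


print("MongoDB reachable:", mongodb_is_reachable())
```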

Streamlit Interface (BETA)

As of release 1.1b we are providing a beta Streamlit interface that lets you use XERUS (and its features) interactively. Although not as flexible as using Xerus through Jupyter, it provides a zero-code alternative, and it can even be hosted on a central server so that other users can access it directly from their browser.

To start it, after installation run: streamlit run app/app.py


If you use Xerus please also cite the following papers:

  • If you use only phase matching:
  • Baptista de Castro, P., Terashima, K., Esparza Echevarria, M.G., Takeya, H. and Takano, Y. (2022), XERUS: An Open-Source Tool for Quick XRD Phase Identification and Refinement Automation. Adv. Theory Simul. 2100588. https://doi.org/10.1002/adts.202100588
  • Toby, B. H., & Von Dreele, R. B. (2013). "GSAS-II: the genesis of a modern open-source all purpose crystallography software package". Journal of Applied Crystallography, 46(2), 544-549. doi:10.1107/S0021889813003531

  • If you use blackbox method for refinement please also cite:

  • Ozaki, Y., Suzuki, Y., Hawai, T. et al. Automated crystal structure analysis based on blackbox optimisation. npj Comput Mater 6, 75 (2020). https://doi.org/10.1038/s41524-020-0330-9


The code is licensed under an MIT license. The data used for benchmarking is licensed under CC4.0 and was taken from

Szymanski, N. J., Bartel, C. J., Zeng, Y., Tu, Q., & Ceder, G. (2021). Probabilistic Deep Learning Approach to Automate the Interpretation of Multi-phase Diffraction Spectra. Chemistry of Materials.


Evaluate and Implement adding temperature restriction to COD

opened on 2022-05-30 09:02:38 by pedrobcst

Currently, one of the main issues when querying the COD is the lack of control over which structure we obtain. As discussed in the paper, one of the main sources of misclassification is a distorted low-temperature structure (usually coming from the COD) being matched instead of the room-temperature one.

In this situation, one way to avoid this is to implement an extra filter in the OPTIMADE querier of COD (_cod_celltemp) to restrict structures to around room temperature only (maybe 293 ± 5 K?).

This info seems to not be available on the COD REST API, therefore the Optimade querier should become the main one.

To do this, evaluate:

  • [ ] Is there any change in the total number of structures if we use _cod_celltemp as a filter?
  • [ ] What will be the impact on the benchmark / examples of Xerus?
  • [ ] In the first case, if there are a lot of structures with no _cod_celltemp, an option might be to do the filtering post-query and keep the structures that have no celltemp
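The post-query option from the last item could look roughly like the sketch below. It is an assumption-heavy illustration: entries are represented as plain dicts carrying an optional _cod_celltemp value, the ids are invented, and values that cannot be parsed as a number are treated the same as missing ones.

```python
ROOM_TEMPERATURE_K = 293.0
TOLERANCE_K = 5.0


def keep_room_temperature(entries, center=ROOM_TEMPERATURE_K, tolerance=TOLERANCE_K):
    """Keep entries near room temperature, plus entries with no usable celltemp."""
    kept = []
    for entry in entries:
        raw = entry.get("_cod_celltemp")
        try:
            temperature = float(raw)
        except (TypeError, ValueError):
            # Missing or unparsable temperature: keep the structure.
            kept.append(entry)
            continue
        if abs(temperature - center) <= tolerance:
            kept.append(entry)
    return kept


# Hypothetical query results, for illustration only.
results = [
    {"id": "a", "_cod_celltemp": 293},
    {"id": "b", "_cod_celltemp": 100},
    {"id": "c"},
    {"id": "d", "_cod_celltemp": "295.5"},
]
print([entry["id"] for entry in keep_room_temperature(results)])
```

Comparing the number of entries before and after such a filter on a few chemical spaces would also answer the first item of the checklist.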

Fix dummy not being added when a certain element combination returns no data

opened on 2022-05-10 07:19:00 by pedrobcst

As of the latest version, the "dummy" entry is no longer being added. This entry is created in the database when no structures exist for a given element combination in any of the database providers, so that the same element combination is not continuously re-queried. The problem probably appeared after the testing method changed. Fix this.
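The intended behaviour can be illustrated with a toy in-memory cache. This is not the Xerus code: the real cache lives in MongoDB, and the class name, the marker value and the provider function below are all hypothetical.

```python
DUMMY_ID = "dummy"  # hypothetical marker value, not taken from the Xerus code


class StructureCache:
    """Toy in-memory stand-in for the structure cache."""

    def __init__(self, query_providers):
        self._query_providers = query_providers  # callable: elements -> list of dicts
        self._entries = {}

    def get(self, elements):
        key = "-".join(sorted(elements))
        if key not in self._entries:
            structures = list(self._query_providers(elements))
            # Store a dummy entry when nothing was found, so the key still exists
            # and the providers are not asked again for this combination.
            self._entries[key] = structures or [{"id": DUMMY_ID}]
        return [s for s in self._entries[key] if s["id"] != DUMMY_ID]


calls = []


def fake_providers(elements):
    """Pretend provider that never finds anything and records each query."""
    calls.append(tuple(elements))
    return []


cache = StructureCache(fake_providers)
cache.get(["Ho", "B"])
cache.get(["B", "Ho"])  # same element combination, served from the cache
print(len(calls))  # 1
```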

Change streamlit from input to forms

opened on 2022-04-25 06:06:23 by pedrobcst

Change the input parameters to use st.form so that the app does not reload on every change.

Deprecate 'tcif.py' and move testing to not be a script

opened on 2022-04-19 01:27:39 by pedrobcst

As of PR #24, we do not do test refinements anymore. Since things have become much more stable and there are no more errors that break the run and require rerunning the whole script, the purpose of tcif.py can be moved elsewhere.

TODO:

  • [ ] Move testing to be a simple function that does the 'loading' internally
  • [ ] Remove tcif.py

Change CI to be OS independent

opened on 2022-04-18 10:16:51 by pedrobcst

With the possible new release (1.1b), we might support all OSes. In light of this, it might be necessary to update the CI to test on all OSes.

This (hypothetically) might work [basically move to conda for environment management in CI]:

  • [ ] Create an environment file for conda that installs all Xerus dependencies
  • [ ] Create a new environment using this file, so there is no need to use apt-get to install gfortran etc. (it will come with conda)
  • [ ] Set the tests to run on windows-latest / macos-latest

Allow to use first run simulations for subsequent runs of a given element space

opened on 2022-03-28 04:44:22 by pedrobcst

Currently, every time the analyze function is run (even with the same parameters), Xerus will re-simulate the patterns and re-query the databases. This is time consuming and slows down the sometimes necessary iterative process of hyperparameter tuning (i.e., g, delta, n_runs, provider settings and so on). In light of this problem, the following changes are needed:

  • [ ] After the first run, keep track of the already simulated patterns / correlations. (should be almost there..)
  • [ ] If these simulations already exist, use them instead of re-simulating / re-querying
  • [ ] Keep a 'safe' information of simulations / structures that can always be accessed to be used for filtering purposes
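One simple way to picture the first two items is memoization: the expensive simulation is cached per structure, while the cheap, hyperparameter-dependent step runs every time. The sketch below uses functools.lru_cache for this. It is only an in-memory illustration with placeholder functions and invented ids (simulate_pattern and analyze here are not the Xerus functions), and keeping results across sessions would need persistent storage instead.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def simulate_pattern(structure_id):
    """Placeholder for an expensive pattern simulation of one structure."""
    # A fake 'pattern': deterministic numbers derived from the id.
    return tuple(float(ord(ch) % 7) for ch in structure_id)


def analyze(structure_ids, n_runs):
    """Toy analysis: rank the candidates and keep the top n_runs."""
    scores = {sid: sum(simulate_pattern(sid)) for sid in structure_ids}
    return sorted(scores, key=scores.get, reverse=True)[:n_runs]


candidates = ("cod-1", "mp-2", "oqmd-3")  # invented ids
analyze(candidates, n_runs=1)
analyze(candidates, n_runs=2)  # different hyperparameter, no new simulations
print(simulate_pattern.cache_info())
```

The cache statistics show that each structure was simulated once, even though the analysis ran twice with different settings.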


Materials Project Update 2022-12-08 05:44:40

This release updates pymatgen and switches the Materials Project querier to its latest API, which is the database that will receive continuous updates from the MP team going forward.

v1.1b 2022-04-25 02:04:56

This is a pre-release version of v1.1b that introduces partial support for Windows and adds the beta Streamlit interface.

It mostly adds GSASII Windows binaries, which seem to work straight out of the box.

Phase identification (the main purpose of Xerus) is working. Optimization is not working and might not be supported. (Windows issues with multiprocessing..?)

It also officially adds the Streamlit BETA interface. This interface allows Xerus to be used through an easy-to-use interface. It is not as flexible as Jupyter, but it provides the option of hosting it on a server so that users can use it directly through their browser.

version 1.1a 2022-04-15 10:35:59

This release changes how the querying works. For already existing users, it will be necessary to make the 'id' field of your mongo unique manually.

Supposedly, it should allow concurrent users to share the same Xerus installation. This is mostly to support the possible scenario of one server running the Streamlit interface (to be finished soon) with many researchers analyzing their data on it concurrently, possibly with overlapping chemical spaces.

Also, makes testing faster.

Please see #26 for detailed changes.

What's Changed

  • Fix optimade requests by @pedrobcst in https://github.com/pedrobcst/Xerus/pull/25
  • New testing by @pedrobcst in https://github.com/pedrobcst/Xerus/pull/24
  • Added support for multiple uses querying at same time (experimental..) by @pedrobcst in https://github.com/pedrobcst/Xerus/pull/26

Full Changelog: https://github.com/pedrobcst/Xerus/compare/v1.1...v1.1a

version 1.1 2022-03-25 01:01:04

This is the release of v1.1 of Xerus. A few things have changed, most notably (thanks to ml-evs):

  • Added self pip installation, allowing the path hacks to be removed and the package to be used from anywhere
  • Started adding CI
  • Added a way to load old search results
  • Added support for structure queries through OPTIMADE (which is now the default for OQMD)
  • Restricted the volume size of structures coming from COD
  • Phased out AFLOW

revision 1 2022-01-18 06:55:15

What's Changed

Changed the way we treat peak position shifts, from lattice constants + zero-point error to lattice constants + sample displacement. Updated the benchmarks and examples to reflect this change.

2021-12-10 01:52:33

Released first open version

Pedro B. C

phd student @ University of Tsukuba / National Institute for Materials Science (NIMS)


xray-diffraction materials-informatics materials-science