The Python implementation of the MMTF API, decoder and encoder.

rcsb, updated 2022-07-07 21:24:10


The macromolecular transmission format (MMTF) is a binary encoding of biological structures.

This repository holds the Python 2 and 3 compatible API, encoding and decoding libraries.

The MMTF Python API is available from pip: `pip install mmtf-python`

Quick getting started.

1) Get the data for a PDB structure and print the number of chains:
```python
from mmtf import fetch

# Get the data for 4CUP
decoded_data = fetch("4CUP")
print("PDB Code: "+str(decoded_data.structure_id)+" has "+str(decoded_data.num_chains)+" chains")
```

2) Show the charge information for the first group:
```python
print("Group name: "+str(decoded_data.group_list[0]["groupName"])+" has the following atomic charges: "+",".join([str(x) for x in decoded_data.group_list[0]["formalChargeList"]]))
```

3) Show how many bioassemblies it has:
```python
print("PDB Code: "+str(decoded_data.structure_id)+" has "+str(len(decoded_data.bio_assembly))+" bioassemblies")
```


Please add to README how to run tests

opened on 2022-09-30 03:21:51 by yurivict

Output `mmtf` files use 64-bit floats, which violates the MMTF specification.

opened on 2021-05-07 05:58:30 by zacharyrs

The specification defines the float type as 32-bit. Python floats are 64-bit, so when they are packed per the template they are written to the output file as 64-bit floats. Other parsers (e.g. mmtf-java) try to load these as 32-bit floats and hence fail. We can overcome this easily by updating the `msgpack.packb` call to include `use_single_float=True`.
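To make the difference concrete: the msgpack format specification encodes a float either as "float 32" (marker byte `0xca` followed by 4 big-endian bytes) or "float 64" (marker `0xcb` followed by 8 bytes), and `use_single_float=True` makes `msgpack.packb` choose the former. The sketch below hand-encodes one value both ways with the standard-library `struct` module, so it runs without msgpack installed. `pack_msgpack_float` is a hypothetical helper for illustration, not part of mmtf-python or msgpack.

```python
import struct


def pack_msgpack_float(value: float, single: bool) -> bytes:
    """Hypothetical helper: encode one float using msgpack's float layouts.

    single=True mirrors use_single_float=True (float 32, marker 0xca);
    single=False mirrors msgpack's default (float 64, marker 0xcb).
    """
    if single:
        return b"\xca" + struct.pack(">f", value)  # 1 marker byte + 4 data bytes
    return b"\xcb" + struct.pack(">d", value)  # 1 marker byte + 8 data bytes


coord = 1.5
print(pack_msgpack_float(coord, single=True).hex())   # ca3fc00000
print(pack_msgpack_float(coord, single=False).hex())  # cb3ff8000000000000
```

A reader that only accepts the 5-byte `0xca` form for a given field will reject the 9-byte `0xcb` form, which matches the failure described above.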

However, it seems mmtf-java also violates the standard: it uses doubles (64-bit floats) for the ncsOperatorList, so even with the above change it still can't parse the output. Given that mmtf-java is used for the RCSB files, we can assume they won't shift to 32-bit floats, since that would break their parsing for even more files.

Additionally, the msgpack-python implementation does not support selecting doubles for only one field. Instead, you have to pack the biological assemblies list separately and then combine the two, as in the snippet below.

Code for packing separately:
```python
import msgpack

# The MMTF standard expects every float as 32-bit, hence use_single_float.
# Note: encode_data() has been modified so it no longer includes bioAssemblyList.
main = msgpack.packb(self.encode_data(), use_bin_type=True, use_single_float=True)
# Assemblies need to be 64-bit for Java compatibility.
assemblies = msgpack.packb(
    {"bioAssemblyList": self.bio_assembly},
    use_bin_type=True,
    use_single_float=False,
)
# A msgpack map with 16 to 65535 entries starts with three bytes: the map 16
# marker \xde followed by the entry count as a 2-byte big-endian integer.
# Our `main` map has 30-something entries, so it uses this header.
assert main[0] == 0xDE, "expected a map 16 header"
count = int.from_bytes(main[1:3], "big")
# Build a new header with the count increased by one for bioAssemblyList.
new_map_length = b"\xde" + (count + 1).to_bytes(2, "big")
# Strip the three-byte header from `main`.
main = main[3:]
# `assemblies` has a single entry, so it is a fixmap with a one-byte header; strip it.
assemblies = assemblies[1:]
# Finally put it all back together.
new_data = new_map_length + main + assemblies
```

For reference, I have raised this issue in the mmtf-java repo too:


v1.1.3 2022-07-06 02:43:51

  • upgraded msgpack to >=1.0.0
  • removed unused ipython dependency
  • default entity description to empty string
  • initialize title to None

Bug fix release 2018-05-21 18:43:31


  • msgpack updated to the new 0.5.6 version

Bug fix release 2018-02-26 20:09:25


  • Add option to get reduced files from API #35

Bug fix release 2018-01-03 09:49:02

v1.0.10 - 2018-01-03 @kain88-de

Fixed - Don't leak open file handles #32

Bug fix release 2017-06-28 19:48:05

Bug fix by: @jonwedell

Bug fix release 2017-06-02 17:56:10

  • Resolve #22

GitHub repository of the RCSB Protein Data Bank
