Pyzstd module provides classes and functions for compressing and decompressing data, using Facebook's `Zstandard <http://www.zstd.net>`_ (or zstd as short name) algorithm.
The API style is similar to Python's bz2/lzma/zlib modules.
See `this note <https://pyzstd.readthedocs.io/en/latest/#build-pyzstd>`_.
CLI: ``python -m pyzstd --help``
Documentation: https://pyzstd.readthedocs.io/en/latest
GitHub: https://github.com/animalize/pyzstd
0.15.4 (Feb 24, 2023)
Update bundled zstd source code to `v1.5.4 <https://github.com/facebook/zstd/releases/tag/v1.5.4>`_. v1.5.3 is a non-public release.
``pyproject.toml`` build mechanism (PEP-517). Note that specifying build options in the old way may be invalid, see `doc <https://pyzstd.readthedocs.io/en/latest/#build-pyzstd>`_.
0.15.3 (Aug 3, 2022)
Fix ``ZstdError`` objects not being picklable.
0.15.2 (Jan 22, 2022)
Update bundled zstd source code from v1.5.1 to `v1.5.2 <https://github.com/facebook/zstd/releases/tag/v1.5.2>`_.
0.15.1 (Dec 25, 2021)
Update bundled zstd source code to `v1.5.1 <https://github.com/facebook/zstd/releases/tag/v1.5.1>`_.
Fix ``ZstdFile.write()`` / ``train_dict()`` / ``finalize_dict()`` possibly using the wrong length for some buffer protocol objects, see `this issue <https://github.com/animalize/pyzstd/issues/4>`_.
* Setting ``CParameter.nbWorkers`` to ``1`` now means "1-thread multi-threaded mode", rather than "single-threaded mode".
* If the underlying zstd library doesn't support multi-threaded compression, it no longer automatically falls back to "single-threaded mode"; a ``ZstdError`` exception is raised instead.
`zstd_support_multithread <https://pyzstd.readthedocs.io/en/latest/#zstd_support_multithread>`_.
``--avx2``, see `this note <https://pyzstd.readthedocs.io/en/latest/#build-pyzstd>`_.
0.15.0 (May 18, 2021)
Update bundled zstd source code to `v1.5.0 <https://github.com/facebook/zstd/releases/tag/v1.5.0>`_.
0.14.4 (Mar 24, 2021)
0.14.3 (Mar 4, 2021)
Update bundled zstd source code from v1.4.8 to `v1.4.9 <https://github.com/facebook/zstd/releases/tag/v1.4.9>`_.
0.14.2 (Feb 24, 2021)
`compress_stream() <https://pyzstd.readthedocs.io/en/latest/#compress_stream>`_, `decompress_stream() <https://pyzstd.readthedocs.io/en/latest/#decompress_stream>`_.
0.14.1 (Dec 19, 2020)
Update bundled zstd source code to `v1.4.8 <https://github.com/facebook/zstd/releases/tag/v1.4.8>`_.
* v1.4.6 is a non-public release for Linux kernel.
* v1.4.8 is a hotfix for `v1.4.7 <https://github.com/facebook/zstd/releases/tag/v1.4.7>`_.
0.13.0 (Nov 7, 2020)
``ZstdDecompressor`` class: it now has the same API and behavior as the BZ2Decompressor / LZMADecompressor classes in the Python standard library; it stops after a frame is decompressed.
``EndlessZstdDecompressor`` class: it accepts multiple concatenated frames. It is renamed from the previous ``ZstdDecompressor`` class, but ``.at_frame_edge`` is ``True`` when both the input and output streams are at a frame edge.
Rename ``zstd_open()`` function to ``open()``, consistent with the Python standard library.
``decompress()`` function:
* ~9% faster when: there is one frame, and the decompressed size was recorded in the frame header.
* raises ``ZstdError`` when input **or** output data is not at a frame edge. Previously, it only raised when output data was not at a frame edge.
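The "stops after a frame" behavior described for 0.13.0 can be illustrated with the standard library class it is modeled on. This is a minimal sketch that uses ``bz2.BZ2Decompressor`` as a stand-in; it does not call pyzstd, and the assumption (taken from the note above) is only that ``ZstdDecompressor`` behaves analogously for zstd frames.

```python
import bz2

# Two independently compressed streams, concatenated back to back.
first = bz2.compress(b"first frame")
second = bz2.compress(b"second frame")
data = first + second

# A BZ2Decompressor stops at the end of the first stream.
d1 = bz2.BZ2Decompressor()
out = d1.decompress(data)
stopped_at_end = d1.eof      # True once the first stream is complete
leftover = d1.unused_data    # bytes after the first stream, left to the caller

# Handling concatenated streams means starting a fresh decompressor on the
# leftover bytes, which is the kind of work an "endless" decompressor
# would take over from the caller.
parts = [out]
rest = leftover
while rest:
    d = bz2.BZ2Decompressor()
    parts.append(d.decompress(rest))
    rest = d.unused_data
result = b"".join(parts)
```

The single-stream decompressor is the simpler building block; accepting concatenated input is layered on top of it.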
0.12.5 (Oct 12, 2020)
No longer use `Argument Clinic <https://docs.python.org/3/howto/clinic.html>`_, now supports Python 3.5+, previously 3.7+.
0.12.4 (Oct 7, 2020)
It seems the API is stable.
0.2.4 (Sep 2, 2020)
The first version uploaded to PyPI.
Includes zstd `v1.4.5 <https://github.com/facebook/zstd/releases/tag/v1.4.5>`_ source code.
Find project collaborator(s) to release new versions in my absence.
About 2~3 versions of `zstd` are released every year, and a new version of `pyzstd` needs to be released each time.
I recently changed the status of the project from Beta to Stable. As I said in https://github.com/animalize/pyzstd/pull/3#issuecomment-825365829, there is basically no need for other maintenance work:
I used to spend time checking such details, and manually triggering exceptions to see if they can be handled correctly. Once the development of the pyzstd module is completed, almost no maintenance is needed. Basically just update the zstd source code, and use new API in major version updates.
Other precautions have been written in tech memo. If you are interested, I can explain more, such as what I have tried.
Please ensure that:
This module was originally written for Python stdlib:
https://github.com/animalize/cpython/pull/8/files
A script was then used to convert the code to this `pyzstd` module.
After Oct-20-2020, all development was transferred to this module, which no longer uses CPython's internal feature, Argument Clinic. It now only uses CPython's public API for C extensions.
In mid-March 2021, the code seemed stable, so a CFFI implementation was added.
After exploring some API/implementation changes, the conclusion was always "now is better". So in Jan 2023, the Development Status was changed from Beta to Stable. It has exceeded its stdlib brothers a lot.
Comparison with the `zstandard`/`zstd` modules: https://github.com/animalize/pyzstd/discussions/19#discussioncomment-4702814
Some links:
[Feature Request] Add zstd module to stdlib, on Python issue tracker. https://bugs.python.org/issue37095
A discussion about adding zstd to the Python standard library, on the Python-Ideas mailing list. https://mail.python.org/archives/list/python-ideas@python.org/thread/VQIFA7WTNRAOYZGTVP4WZC2CD36KYIVY/
Include zstd library source code, without any changes.
Zstd lib source code is in the `zstd/` folder; if someone wants to upgrade/downgrade the bundled zstd lib, just replace this folder.
The code supports zstd v1.4.0+ (released in Apr 2019).
Only use zstd's "stable" API, don't use "experimental" API.
This means `#define ZSTD_STATIC_LINKING_ONLY` is not used.
When statically linking to zstd lib, use the `ZSTD_MULTITHREAD` build macro (in `setup.py`) to enable multi-threaded compression. MT is enabled by default in zstd v1.5.0+, but `pyzstd` still defines it for zstd v1.4.x.
No other zstd macros are defined.
See this note: https://pyzstd.readthedocs.io/en/latest/#build-pyzstd
The API is similar to Python's bz2/lzma/zlib module.
Try to make all major functionality provided by zstd's "stable" API usable.
🔴 If "skippable frame" is used more, related API may be added. (Unlikely. It's not difficult to implement "skippable frame" functions on the user side.)
🟢 When the `ZSTD_c_stableInBuffer` parameter is moved from "experimental" API to "stable" API, it can be used to speed up `.FLUSH_FRAME` compression if (`.last_mode == .FLUSH_FRAME`).
No plan to use `ZSTD_c_stableOutBuffer`, because it raises an error when the output buffer is not large enough.
(likely)
🟢 When the `ZSTD_getFrameHeader()` function is moved from "experimental" API to "stable" API, more items can be added to the `get_frame_info()` function.
(likely)
🟠 When the `ZSTD_d_refMultipleDDicts` parameter is moved from "experimental" API to "stable" API, the `zstd_dict` parameter may accept a tuple that contains multiple dictionaries.
(not very likely, few people use it, and it makes the API a bit more complex. This functionality can be implemented via the `get_frame_info()` function and dispatching to different decompressors.)
🟢 When `ZDICT_finalizeDictionary()` supports training a dict (no custom dict), the first arg can be `None`:
`finalize_dict(zstd_dict, samples, dict_size, level)`
Compared to `train_dict(samples, dict_size)`, it can specify the level.
(likely)
🟢 ~~Use multi-phase init when it matures, then pyzstd module can support CPython sub-interpreters.~~ (implemented in 0.15.4, supports subclassing well.)
Depends on the progress of CPython:
- Subinterpreters for Python, https://lwn.net/Articles/820424/
- The `METH_METHOD` flag mentioned in PEP 573 can be used with more flags; otherwise subclassing has to be disabled for ZstdDict/Compressor/Decompressor. Maybe need to wait until at least 3.11.
PEP 489 -- Multi-phase extension module initialization
PEP 573 -- Module State Access from C Extension Methods
🟢 If the minimum version is 3.6:
- use f-strings. Their performance is a bit better than `%`. Currently string formatting is only used for exception messages, so it's not a big problem.
- remove `#include "stdint.h"`, `#include "pythread.h"` in `_zstdmodule.c`.
- try to add the `-fvisibility=hidden` compile option, it reduces the .so size by ~12 KiB. See commit https://github.com/animalize/pyzstd/commit/ab21add1e8d9b93e90eb49e62811159846934178
- remove this `try...except` in `__init__.py`, and related code in unit-test:
```python
try:
    from os import PathLike
except ImportError:
    # For Python 3.5
    class PathLike:
        pass
```
🟢 If the minimum version is 3.7:
- consider using `METH_FASTCALL`
- remove `#define Py_UNREACHABLE() assert(0)`
- remove this code in `ZstdFile.read1()`:
```python
if size < 0:
    size = _32_KiB
```
🟢 If the minimum version is 3.8:
- use the `:=` operator in ZstdFile, it's a bit faster.
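As a rough illustration of the `:=` item above, here is a minimal sketch of a chunked read loop written both ways. The function names are hypothetical and this is not the actual `ZstdFile` code; it only shows the shape of the rewrite on a plain `io.BytesIO` object.

```python
import io


def read_chunks_classic(fp, size):
    """Read loop written without the := operator."""
    chunks = []
    while True:
        chunk = fp.read(size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


def read_chunks_walrus(fp, size):
    """Same loop using the := operator (requires Python 3.8+)."""
    chunks = []
    while chunk := fp.read(size):
        chunks.append(chunk)
    return chunks


data = b"abcdefghij"
classic = read_chunks_classic(io.BytesIO(data), 4)
walrus = read_chunks_walrus(io.BytesIO(data), 4)
```

Both functions return the same chunks; the second form drops the explicit `break` and the extra assignment line.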
🟡 `ZstdDict.__init__(self, dict_content, is_raw=False)`
When `dict_content` is a normal dictionary and `is_raw` is set to True, the dictionary is NOT treated as a raw dictionary.
Very rare cases. If it has the magic number, it's probably a normal dict.
🟡 ~~When dynamically linking to zstd lib, `compressionLevel_values.default` may be wrong, it uses the value of the `ZSTD_CLEVEL_DEFAULT` macro from `zstd.h`.~~
Very rare cases. Very few people modify `ZSTD_CLEVEL_DEFAULT` when building zstd lib.
Fixed when zstd_version >= 1.5 and pyzstd_version >= 0.15
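The magic-number remark in the `ZstdDict` item above can be sketched as a user-side check. This is a hypothetical helper, not part of pyzstd; it assumes the dictionary magic number `0xEC30A437` (the `ZSTD_MAGIC_DICTIONARY` value in `zstd.h`), stored little-endian in the first four bytes.

```python
import struct

# Assumption: value of ZSTD_MAGIC_DICTIONARY from zstd.h.
ZSTD_MAGIC_DICTIONARY = 0xEC30A437


def has_dict_magic(dict_content: bytes) -> bool:
    """Return True if the content starts with the zstd dictionary magic number."""
    if len(dict_content) < 4:
        return False
    # The magic number is stored as a 4-byte little-endian integer.
    (magic,) = struct.unpack_from("<I", dict_content)
    return magic == ZSTD_MAGIC_DICTIONARY


# Synthetic inputs for illustration only; neither is a usable dictionary.
normal_like = struct.pack("<I", ZSTD_MAGIC_DICTIONARY) + b"\x00" * 16
raw_like = b"just some sample bytes"
```

A caller could use such a check to warn when `is_raw=True` is passed for content that looks like a normal dictionary.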