<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Emma's Blog - compression</title><link>https://emmatyping.dev/</link><description/><atom:link href="https://emmatyping.dev/feeds/compression/rss.xml" rel="self"/><lastBuildDate>Tue, 11 Nov 2025 00:00:00 -0800</lastBuildDate><item><title>Decompression is up to 30% faster in CPython 3.15</title><link>https://emmatyping.dev/decompression-is-up-to-30-faster-in-cpython-315.html</link><description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;compression.zstd&lt;/code&gt; is the fastest Python Zstandard binding as of Python 3.15. Changes to the code managing output
buffers have led to a 25-30% performance uplift for Zstandard decompression and a 10-15% performance uplift for &lt;code&gt;zlib&lt;/code&gt;
for data at least 1 MiB in size. This has broad implications for e.g. faster wheel installations with pip and many
other use cases.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Motivation&lt;/h2&gt;
&lt;p&gt;Since &lt;a href="https://peps.python.org/pep-0784/"&gt;landing Zstandard support in CPython&lt;/a&gt;, I have wanted to explore
the performance of CPython's compression modules to ensure they were well-optimized. Furthermore, the maintainer of
&lt;a href="https://github.com/Rogdham/pyzstd/"&gt;pyzstd&lt;/a&gt; and &lt;a href="https://github.com/Rogdham/backports.zstd"&gt;backports.zstd&lt;/a&gt; (a backport of
&lt;code&gt;compression.zstd&lt;/code&gt; to Python versions before 3.14) benchmarked the new &lt;code&gt;compression.zstd&lt;/code&gt; module against third-party Zstandard
Python bindings such as &lt;a href="https://github.com/Rogdham/pyzstd/"&gt;pyzstd&lt;/a&gt;,
&lt;a href="https://github.com/indygreg/python-zstandard"&gt;zstandard&lt;/a&gt;, and &lt;a href="https://github.com/sergey-dryabzhinsky/python-zstd"&gt;zstd&lt;/a&gt;,
and found the standard library was slower than most other bindings!&lt;/p&gt;
&lt;p&gt;Let's take a closer look at &lt;a href="https://github.com/Rogdham/zstd-benchmark/blob/master/results/2025-09-22_linux.md"&gt;the benchmarks&lt;/a&gt;
and how to read them:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figures give timing comparison. For example, +42% means that the library needs 42% more time than stdlib/backports.zstd.
The reference time column indicates an average time for a single run.&lt;/p&gt;
&lt;p&gt;Emoji scale: ❤️‍🩹 -25% 🟥 -15% 🔴 -5% ⚪ +5% 🟢 +15% 🟩 +25% 💚&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Okay, so hopefully we don't see a lot of red, since red means the other library took less time than the standard library (stdlib) reference, i.e. the stdlib is slower...&lt;/p&gt;
&lt;blockquote&gt;
&lt;h2&gt;CPython 3.14.0rc3&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;stdlib&lt;/th&gt;
&lt;th&gt;pyzstd&lt;/th&gt;
&lt;th&gt;zstandard&lt;/th&gt;
&lt;th&gt;zstd&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ - 3.81%&lt;/td&gt;
&lt;td&gt;⚪ - 1.17%&lt;/td&gt;
&lt;td&gt;🟢 + 5.86%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ + 1.91%&lt;/td&gt;
&lt;td&gt;🟢 + 6.18%&lt;/td&gt;
&lt;td&gt;🟢 + 9.83%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟢 + 6.33%&lt;/td&gt;
&lt;td&gt;🟢 + 7.67%&lt;/td&gt;
&lt;td&gt;🟢 +12.92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 3&lt;/td&gt;
&lt;td&gt;7ms&lt;/td&gt;
&lt;td&gt;⚪ + 0.60%&lt;/td&gt;
&lt;td&gt;🔴 - 7.37%&lt;/td&gt;
&lt;td&gt;🟢 +12.08%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 10&lt;/td&gt;
&lt;td&gt;27ms&lt;/td&gt;
&lt;td&gt;🟢 +10.39%&lt;/td&gt;
&lt;td&gt;⚪ + 3.39%&lt;/td&gt;
&lt;td&gt;🟢 +12.46%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 17&lt;/td&gt;
&lt;td&gt;174ms&lt;/td&gt;
&lt;td&gt;⚪ - 2.48%&lt;/td&gt;
&lt;td&gt;⚪ - 3.91%&lt;/td&gt;
&lt;td&gt;⚪ + 0.08%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1G level 3&lt;/td&gt;
&lt;td&gt;6.03s&lt;/td&gt;
&lt;td&gt;🟩 +16.17%&lt;/td&gt;
&lt;td&gt;⚪ - 2.94%&lt;/td&gt;
&lt;td&gt;⚪ + 2.25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟥 -15.14%&lt;/td&gt;
&lt;td&gt;🔴 - 8.53%&lt;/td&gt;
&lt;td&gt;⚪ - 2.37%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟥 -15.41%&lt;/td&gt;
&lt;td&gt;🔴 - 9.22%&lt;/td&gt;
&lt;td&gt;⚪ - 3.35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🔴 -11.16%&lt;/td&gt;
&lt;td&gt;🔴 - 7.09%&lt;/td&gt;
&lt;td&gt;⚪ + 2.07%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 3&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;🔴 - 6.88%&lt;/td&gt;
&lt;td&gt;⚪ - 4.03%&lt;/td&gt;
&lt;td&gt;💚 +26.88%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 10&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;🔴 - 6.69%&lt;/td&gt;
&lt;td&gt;⚪ - 4.86%&lt;/td&gt;
&lt;td&gt;💚 +25.63%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 17&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;🔴 - 7.99%&lt;/td&gt;
&lt;td&gt;⚪ - 4.96%&lt;/td&gt;
&lt;td&gt;💚 +25.58%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 3&lt;/td&gt;
&lt;td&gt;1.49s&lt;/td&gt;
&lt;td&gt;🟥 -19.41%&lt;/td&gt;
&lt;td&gt;🟥 -17.58%&lt;/td&gt;
&lt;td&gt;🟢 + 6.98%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 10&lt;/td&gt;
&lt;td&gt;1.62s&lt;/td&gt;
&lt;td&gt;❤️‍🩹 -27.65%&lt;/td&gt;
&lt;td&gt;❤️‍🩹 -26.48%&lt;/td&gt;
&lt;td&gt;🔴 - 6.92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 17&lt;/td&gt;
&lt;td&gt;1.67s&lt;/td&gt;
&lt;td&gt;🟥 -24.01%&lt;/td&gt;
&lt;td&gt;🟥 -23.04%&lt;/td&gt;
&lt;td&gt;⚪ - 4.43%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
&lt;p&gt;Ouch. 10-25% slower is quite unfortunate! A silver lining is that most of the performance difference is in decompression,
so that narrows the area that is in need of optimization.&lt;/p&gt;
&lt;p&gt;After sitting down and thinking about it for a while, I came up with a few theories as to why &lt;code&gt;compression.zstd&lt;/code&gt; would
be slower than pyzstd and zstandard. My thinking focused on differences in implementation that I knew
existed between the various bindings. First, both pyzstd and zstandard build against their own copies of libzstd (the C
library implementing Zstandard compression and decompression). Meanwhile, CPython builds against the
system-installed libzstd, which is older on my system. Maybe there is a performance improvement in the newer libzstd
versions? Second, most of the performance difference is in decompression speed. Perhaps the implementation of
&lt;code&gt;compression.zstd.decompress()&lt;/code&gt; is inefficient? It uses multiple decompression instances to handle multi-frame input
where pyzstd uses one, so perhaps that's the issue? Finally, maybe the handling of output buffers is slow? When
decompressing data, CPython needs to provide an output buffer (a location in memory to write to) to store the
uncompressed data. If the creation/allocation of that output buffer is slow, it could bottleneck the decompressor.&lt;/p&gt;
&lt;h2&gt;Premature Optimizations&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;These optimizations didn't work, so if you'd like to skip to the optimizations which worked, please move to the next
section!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I decided to tackle these one at a time. First, I built pyzstd and zstandard against the system libzstd. Unfortunately,
after re-running the benchmark, this yielded zero performance difference. Darn.&lt;/p&gt;
&lt;p&gt;Next, I was pretty confident that &lt;code&gt;compression.zstd.decompress()&lt;/code&gt; was at least partially responsible for the worse
performance. The &lt;a href="https://github.com/python/cpython/blob/95f6e1275b1c9de550d978cb2b4351cc4ed24fe4/Lib/compression/zstd/__init__.py#L152-L172"&gt;current &lt;code&gt;decompress()&lt;/code&gt; implementation&lt;/a&gt;
is written in Python; it creates multiple decompression contexts and joins the results together. Surely that had to
lead to some performance degradation? I ended up re-implementing the &lt;code&gt;decompress()&lt;/code&gt; function in C using a single
decompression context to see if my theory was correct. To my chagrin, there was no performance uplift, and it may have
even performed &lt;em&gt;worse&lt;/em&gt;! For the curious, you can see &lt;a href="https://github.com/emmatyping/cpython/tree/zstd-decompress-in-c"&gt;my hacked together branch here&lt;/a&gt;.
It goes to show that you can never be sure about performance bottlenecks from reading the code alone!&lt;/p&gt;
&lt;h2&gt;Properly Profiling CPython&lt;/h2&gt;
&lt;p&gt;With my first two attempts at optimizing Zstandard decompression in CPython unsuccessful, I realized that I should do
what I probably should have done from the beginning: profile the code! I decided to use the
&lt;a href="https://docs.python.org/3/howto/perf_profiling.html"&gt;standard library support for the perf profiler&lt;/a&gt;, as it would
allow me to see both native/C frames such as inside libzstd or the bindings module &lt;code&gt;_zstd&lt;/code&gt;, as well as Python frames.&lt;/p&gt;
&lt;p&gt;So I went ahead and compiled CPython &lt;a href="https://docs.python.org/3/howto/perf_profiling.html#how-to-obtain-the-best-results"&gt;with some flags to improve perf data&lt;/a&gt;
and ran a simple script which called &lt;code&gt;compression.zstd.decompress()&lt;/code&gt; on a variety of data sizes. I highly recommend
reading the Python documentation about perf support for more details, but essentially what I ran was:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;# in a cpython checkout&lt;/span&gt;
./configure&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--enable-optimizations&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--with-lto&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;CFLAGS&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer&amp;quot;&lt;/span&gt;
make&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-j&lt;span style="color: #ff7b72"&gt;$(&lt;/span&gt;nproc&lt;span style="color: #ff7b72"&gt;)&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;cd&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;../compression-benchmarks
perf&lt;span style="color: #6e7681"&gt; &lt;/span&gt;record&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-F&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;9999&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-g&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-o&lt;span style="color: #6e7681"&gt; &lt;/span&gt;perf.data&lt;span style="color: #6e7681"&gt; &lt;/span&gt;../cpython/python&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-X&lt;span style="color: #6e7681"&gt; &lt;/span&gt;perf&lt;span style="color: #6e7681"&gt; &lt;/span&gt;profile_zstd.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
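&lt;p&gt;The profiling script itself is not reproduced here, so the following is only a hypothetical stand-in for a script like &lt;code&gt;profile_zstd.py&lt;/code&gt;. The payload sizes, the helper names &lt;code&gt;make_payload&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt;, and the fallback to &lt;code&gt;zlib&lt;/code&gt; on Pythons without &lt;code&gt;compression.zstd&lt;/code&gt; are all assumptions made for illustration:&lt;/p&gt;

```python
import os

try:
    from compression import zstd as codec  # standard library in Python 3.14+
except ImportError:
    import zlib as codec  # assumption: fallback so the sketch runs on older Pythons

# Hypothetical payload sizes; the sizes used for the real profile are not stated.
SIZES = [1024, 1024 * 1024, 8 * 1024 * 1024]


def make_payload(size):
    """Build test data by repeating a random 4 KiB chunk until it is `size` bytes long."""
    chunk = os.urandom(4096)
    repeats = size // len(chunk) + 1
    return (chunk * repeats)[:size]


def run(rounds=5):
    """Decompress each payload `rounds` times and return the decompressed sizes seen."""
    seen = []
    for size in SIZES:
        compressed = codec.compress(make_payload(size))
        for _ in range(rounds):
            seen.append(len(codec.decompress(compressed)))
    return seen


if __name__ == "__main__":
    run()
```

&lt;p&gt;Compressing once outside the inner loop keeps the samples that perf collects concentrated in the decompression path, which is the part under investigation.&lt;/p&gt;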

&lt;p&gt;After analyzing the profile with &lt;code&gt;perf report --stdio -n -g&lt;/code&gt;, I noticed a significant bottleneck in the output buffer
management code! Let's take a brief detour to discuss what the output buffer management code does and why it was the
decompression bottleneck.&lt;/p&gt;
&lt;h2&gt;(Fast) Buffer Handling is Hard&lt;/h2&gt;
&lt;p&gt;When decompressing data, you feed the decompressor (libzstd in our case) a buffer (&lt;code&gt;bytes&lt;/code&gt; in Python) that is then
decompressed and needs to be written to a new buffer. Since this all happens in C, basically we need to allocate some
memory for libzstd to write the decompressed data into. But how much memory? Well, in many cases, we don't know! So we
need to dynamically resize the output buffer as it is filled up.&lt;/p&gt;
&lt;p&gt;This is actually a pretty challenging problem because there are several constraints and considerations to be made. The
buffer management needs to be fast for a variety of output buffer sizes. If you allocate too much memory up front,
you'll waste time allocating unused memory and slow down decompressing small amounts of data. On the other hand, if you
don't allocate enough, you'll have to make a lot of calls to the allocator, which will also slow things down as each
allocation has overhead and leads to fragmenting the output data. The memory should not grow exponentially for large
outputs, otherwise you could run out of memory for tasks that would normally fit into memory. Finally, each output from
the decompressor can vary in size, given that it may need to buffer data internally.&lt;/p&gt;
&lt;p&gt;Because of the complexity in managing an output buffer, there is code shared across compression modules in CPython to
manage the buffer. This code lives in
&lt;a href="https://github.com/python/cpython/blob/404425575c68bef9d2f042710fc713134d04c23f/Include/internal/pycore_blocks_output_buffer.h"&gt;pycore_blocks_output_buffer.h&lt;/a&gt;.
The code was &lt;a href="https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231"&gt;modified four years ago&lt;/a&gt;
to use an implementation which writes to a series of &lt;code&gt;bytes&lt;/code&gt; objects stored in a &lt;code&gt;list&lt;/code&gt; to hold the output of
decompress calls. When finished, the bytes objects get concatenated together in &lt;code&gt;_BlocksOutputBuffer_Finish&lt;/code&gt;,
returning the final &lt;code&gt;bytes&lt;/code&gt; object containing the decompressed data. When profiling Zstandard decompression, I found
that more than 50% (!) of decompression time was spent in &lt;code&gt;_BlocksOutputBuffer_Finish&lt;/code&gt;! This seemed inordinately
long; ideally this function should just be a few &lt;code&gt;memcpy&lt;/code&gt;s. With this knowledge in hand, I tried to think of how
best to optimize the output buffer code.&lt;/p&gt;
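&lt;p&gt;To make the list-of-blocks design concrete, here is a rough pure-Python analogy, not the C implementation. It uses &lt;code&gt;zlib&lt;/code&gt; so it runs on any Python 3, the block sizes in &lt;code&gt;BLOCK_SIZES&lt;/code&gt; are hypothetical values chosen for illustration, and the function name &lt;code&gt;decompress_in_blocks&lt;/code&gt; is made up. The final &lt;code&gt;join&lt;/code&gt; plays the role that &lt;code&gt;_BlocksOutputBuffer_Finish&lt;/code&gt; plays in C:&lt;/p&gt;

```python
import zlib

# Hypothetical block sizes for illustration only; not CPython's actual table.
BLOCK_SIZES = [32 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024]


def decompress_in_blocks(data):
    """Decompress zlib data into a list of blocks, then join them once at the end."""
    decomp = zlib.decompressobj()
    blocks = []
    pending = data
    while not decomp.eof:
        # Pick the next block size; reuse the largest once the table runs out.
        size = BLOCK_SIZES[min(len(blocks), len(BLOCK_SIZES) - 1)]
        # max_length caps how much output a single call may produce.
        block = decomp.decompress(pending, size)
        pending = decomp.unconsumed_tail
        if not block and not pending:
            break  # truncated input: nothing more can be produced
        blocks.append(block)
    # Analogue of the "finish" step: every block is copied into one bytes object.
    return b"".join(blocks), len(blocks)
```

&lt;p&gt;Calling &lt;code&gt;decompress_in_blocks(zlib.compress(payload))&lt;/code&gt; returns the original payload together with the number of blocks that were filled. The copy in the last line touches every decompressed byte a second time, which is the kind of work the profile attributed to the finish step.&lt;/p&gt;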
&lt;h2&gt;Sometimes Timing Works Out&lt;/h2&gt;
&lt;p&gt;Right around the time that I was working on this, &lt;a href="https://peps.python.org/pep-0782/"&gt;PEP 782&lt;/a&gt; was accepted. This PEP
introduces a new &lt;code&gt;PyBytesWriter&lt;/code&gt; API to CPython which makes it easier to incrementally build up &lt;code&gt;bytes&lt;/code&gt; data in a safe
and performant way at the Python C API level. It seemed like a natural fit for what the blocks output buffer code was
doing, so I wanted to experiment with using it for the output buffer code. After modifying
&lt;code&gt;pycore_blocks_output_buffer.h&lt;/code&gt; to use &lt;code&gt;PyBytesWriter&lt;/code&gt;, I re-ran the original benchmark to see if we had closed the
performance gap:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: this benchmark was run on my local machine and the wall times are not comparable to the previous benchmark.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;stdlib&lt;/th&gt;
&lt;th&gt;zstandard&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;💚 +61.02%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;💚 +57.77%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;💚 +364.86%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 3&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;💚 +40.02%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 10&lt;/td&gt;
&lt;td&gt;32ms&lt;/td&gt;
&lt;td&gt;⚪ - 0.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 17&lt;/td&gt;
&lt;td&gt;126ms&lt;/td&gt;
&lt;td&gt;🟩 +15.93%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1G level 3&lt;/td&gt;
&lt;td&gt;4.47s&lt;/td&gt;
&lt;td&gt;💚 +48.69%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ + 4.67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ + 4.79%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟢 + 5.38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 3&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;💚 +50.23%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 10&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;💚 +41.94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 17&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;💚 +47.37%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 3&lt;/td&gt;
&lt;td&gt;1.80s&lt;/td&gt;
&lt;td&gt;🟢 +12.87%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 10&lt;/td&gt;
&lt;td&gt;1.77s&lt;/td&gt;
&lt;td&gt;🟢 +12.54%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 17&lt;/td&gt;
&lt;td&gt;1.80s&lt;/td&gt;
&lt;td&gt;🟢 + 8.76%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
&lt;p&gt;WOW! Not only have we closed the gap, but &lt;code&gt;compression.zstd&lt;/code&gt; is now &lt;em&gt;faster&lt;/em&gt; than the popular third-party zstandard module.&lt;/p&gt;
&lt;h2&gt;Validating Our Results&lt;/h2&gt;
&lt;p&gt;To validate the speedup, I decided to write my own minimal benchmark suite for comparing
revisions of the standard library code. It uses &lt;a href="https://pyperf.readthedocs.io/en/latest/"&gt;&lt;code&gt;pyperf&lt;/code&gt;&lt;/a&gt;,
a benchmarking toolkit used in the venerable &lt;a href="https://github.com/python/pyperformance"&gt;pyperformance benchmark suite&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So I went ahead and wrote up a &lt;a href="https://github.com/emmatyping/compression-benchmarks/blob/fab8806f3af89b369e40e77be291dd37f3223b7c/bench_zstd.py"&gt;benchmark for zstd&lt;/a&gt;
which tests compression and decompression using default parameters for sizes 1 KiB, 1 MiB, and 1 GiB. I ran these
benchmarks on main and on my branch, which uses &lt;code&gt;PyBytesWriter&lt;/code&gt;.&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.03&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: 
#e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.03&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.01&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.92&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.02&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.89&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.02&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.01&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.72&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.67&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;sec&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.02&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.40&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.38&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.01&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;734&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;546&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.34&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;790&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;634&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;

&lt;span style="color: #e6edf3"&gt;Geometric&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;mean&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For input sizes of at least 1 MiB, that's 25-30% faster decompression! In hindsight, this actually makes sense if you
consider that libzstd's decompression implementation is exceptionally fast.
&lt;a href="https://github.com/inikep/lzbench"&gt;lzbench&lt;/a&gt;, a popular compression library benchmark, found that libzstd can
decompress data at greater than 1 GiB/s. This is much faster than bz2, lzma, or zlib, the other compression modules in
the standard library. One of the motivations for adding Zstandard to CPython was its performance. So it is not too
surprising that the output buffer code would be a bottleneck, given that the existing compression libraries don't write
as quickly to the output buffer. It also explains why compression isn't faster after changing the output buffer
code: compression is very CPU intensive, so far more time is spent in the compressor than in writing to the output
buffer. The same reasoning covers the non-existent speedup for decompressing 1 KiB of data: the first 32 KiB block that
is allocated is plenty to store all of the output data, so essentially all of the time is spent in the decompressor.&lt;/p&gt;
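&lt;p&gt;To make the 1 KiB case concrete, here is a toy model of a block-based output buffer that counts how many blocks get allocated for a given output size. It is a sketch, not CPython's implementation: the 32 KiB first block comes from the text above, while the &lt;code&gt;blocks_needed&lt;/code&gt; helper and its doubling growth schedule are assumptions made purely for illustration.&lt;/p&gt;

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def blocks_needed(output_size, first_block=32 * KIB, growth=2):
    """Count the blocks a block-based output buffer allocates.

    Assumption: every new block is `growth` times the size of the
    previous one. The real growth schedule may differ.
    """
    blocks = 1
    block_size = first_block
    capacity = first_block
    # Keep adding blocks until the output fits in the total capacity.
    while output_size > capacity:
        block_size *= growth
        capacity += block_size
        blocks += 1
    return blocks


for label, size in [("1 KiB", KIB), ("1 MiB", MIB), ("1 GiB", GIB)]:
    print(f"{label}: {blocks_needed(size)} block(s)")
```

&lt;p&gt;Under those assumptions a 1 KiB output never needs a second block, so there is no buffer management left to speed up, while larger outputs go through several rounds of growth.&lt;/p&gt;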
&lt;p&gt;One final validation I wished to do was to check the performance of &lt;code&gt;zlib&lt;/code&gt;, to ensure that the change did not regress
performance for other standard library compression modules. I wrote
&lt;a href="https://github.com/emmatyping/compression-benchmarks/blob/fab8806f3af89b369e40e77be291dd37f3223b7c/bench_zlib.py"&gt;a similar benchmark for zlib&lt;/a&gt;
to the one I wrote for zstd, and found that there was also a performance increase with the output buffer change!&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;13.5&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.1&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: 
#e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;13.4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.0&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.00&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;11.4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.0&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;11.3&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.0&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;sec&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.00&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.42&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;1.39&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.02&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.29&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;1.17&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.36&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;1.17&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;sec&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.17&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;

&lt;span style="color: #e6edf3"&gt;Benchmark&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;hidden&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;because&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;not&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;significant&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;

&lt;span style="color: #e6edf3"&gt;Geometric&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;mean&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.05&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;10-15% faster decompression on data of at least 1 MiB for zlib is pretty significant, especially when you consider that
zlib is used by pip to unpack files in almost every wheel package Python users install.&lt;/p&gt;
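&lt;p&gt;One way to get a feel for how much output buffer handling matters is the existing &lt;code&gt;bufsize&lt;/code&gt; argument of &lt;code&gt;zlib.decompress&lt;/code&gt;, which sets the initial size of the output buffer. The sketch below compares the default against a buffer sized to the known output. The payload and the &lt;code&gt;timed_decompress&lt;/code&gt; helper are made up for illustration, and the timings will vary by machine and Python version, so treat it as an experiment rather than a benchmark.&lt;/p&gt;

```python
import time
import zlib

# 1 MiB of repeating, highly compressible data (illustrative payload).
payload = bytes(range(256)) * 4096
compressed = zlib.compress(payload)


def timed_decompress(bufsize):
    """Decompress once with the given initial output buffer size."""
    start = time.perf_counter()
    result = zlib.decompress(compressed, zlib.MAX_WBITS, bufsize)
    return result, time.perf_counter() - start


default_out, default_time = timed_decompress(zlib.DEF_BUF_SIZE)
hinted_out, hinted_time = timed_decompress(len(payload))

print(f"default bufsize: {default_time * 1e6:.0f} us")
print(f"exact bufsize:   {hinted_time * 1e6:.0f} us")
```

&lt;p&gt;Both calls return the same bytes; only the amount of resizing done along the way differs. A single run like this is noisy, so a tool such as pyperf is a better fit for real measurements.&lt;/p&gt;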
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;With the improvements to output buffer handling, I was able to improve the performance of not only &lt;code&gt;compression.zstd&lt;/code&gt;,
but also the decompression code in all of the compression modules. After stumbling over a few optimization ideas, I definitely
learned my lesson to profile code before jumping to conclusions! You won't know what is a real bottleneck unless you
can test it! Just having a benchmark is not enough!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/python/cpython/issues/139877"&gt;The original issue I opened&lt;/a&gt; goes into a bit more detail about the
process of benchmarking the compression modules, and &lt;a href="https://github.com/python/cpython/commit/f262297d525e87906c5e4ab28e80284189641c9e"&gt;the commit with the improvement&lt;/a&gt;
has the diff of changes to adopt &lt;code&gt;PyBytesWriter&lt;/code&gt;. One thing I'm proud of is that not only did the change improve
performance, it also simplified the implementation of the output buffer code and removed 60 lines of code in the
process!&lt;/p&gt;
&lt;p&gt;I did some more profiling of zlib to see if there were any more performance gains to be made, but the profile I
gathered seems to indicate that 95+% of the time is spent in zlib's inflate implementation (with the rest in the
CPython VM), so there is little if any room for further optimization in CPython's bindings for zlib. I think this
is good, as it indicates Python users are getting the best performance they can in 3.15!&lt;/p&gt;
&lt;p&gt;Going forward, I am planning on profiling compression code more, but the vast majority of the time spent
there will probably be in the compressor, since compression is so CPU intensive. Finally, I want to investigate
optimizations related to providing more information about the final size of the output data. In some cases the output
buffer is initialized to a small size and dynamically resized as output is produced, but ideally users would be able
to provide more information about their workflow up front and see a performance improvement from it. I have a lot of other ideas
related to compression I'd like to work on; check out &lt;a href="https://notes.emmatyping.dev/share/ossTODO"&gt;my OSS TODO list&lt;/a&gt;
for all of the random ideas I want to work on in the future!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Tue, 11 Nov 2025 00:00:00 -0800</pubDate><guid>tag:emmatyping.dev,2025-11-11:/decompression-is-up-to-30-faster-in-cpython-315.html</guid><category>misc</category><category>python</category><category>compression</category><category>zstd</category></item><item><title>Finding a miscompilation in Rust/LLVM</title><link>https://emmatyping.dev/finding-a-miscompilation-in-rustllvm.html</link><description>&lt;p&gt;Among my friends I have a reputation for &lt;del&gt;causing&lt;/del&gt; stumbling across esoteric error messages. Whether that is &lt;code&gt;SSL read: I/O error: Success&lt;/code&gt; (caused by a layered SSH connection hangup on Windows), or that time I tried installing NixOS on my laptop and &lt;code&gt;os-prober&lt;/code&gt; failed to start (this was several years ago, so I am sure it is no longer an issue). I attribute these oddities to my curiosity, particularly around trying things that may or may not work and seeing if they do. Recently, I was trying to complete an item from &lt;a href="https://notes.emmatyping.dev/share/ossTODO"&gt;my OSS TODO list&lt;/a&gt; when I came across a bug that stumped me for several days. Turns out sometimes even compilers have bugs...&lt;/p&gt;
&lt;p&gt;My goal was to build CPython with Rust implementations of common compression libraries to see if the Rust libraries could be supported. &lt;strong&gt;C&lt;/strong&gt;Python relies on &lt;strong&gt;C&lt;/strong&gt; code to do many performance-sensitive activities such as &lt;a href="https://docs.python.org/3.14/library/math.html"&gt;&lt;code&gt;math&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.python.org/3.14/library/compression.html"&gt;&lt;code&gt;compression&lt;/code&gt;&lt;/a&gt;. I had recently read about the &lt;a href="https://trifectatech.org/"&gt;Trifecta Tech Foundation&lt;/a&gt;'s initiative to re-write popular compression libraries in Rust. As of September 2025, they have pure-Rust re-implementations of &lt;a href="https://github.com/trifectatechfoundation/zlib-rs"&gt;zlib&lt;/a&gt; (the library used for zip and gzip files) and &lt;a href="https://github.com/trifectatechfoundation/libbzip2-rs"&gt;bzip2&lt;/a&gt; available for use.&lt;/p&gt;
&lt;p&gt;These Rust libraries not only bring increased memory safety, they're also &lt;a href="https://trifectatechfoundation.github.io/zlib-rs-bench/"&gt;as fast or faster than their C counterparts&lt;/a&gt;. Additionally, zlib-rs is widely deployed in Firefox, to the point that it may have &lt;a href="https://github.com/trifectatechfoundation/zlib-rs/issues/306"&gt;tripped over a CPU hardware bug(!)&lt;/a&gt;. So I had confidence that at least zlib-rs would work out of the box.&lt;/p&gt;
&lt;p&gt;To add support for these libraries to CPython, I made &lt;a href="https://github.com/emmatyping/cpython/tree/build-with-rust-compression-libs"&gt;a branch with changes to the autoconf script&lt;/a&gt; to search for the Rust libraries through &lt;code&gt;pkg-config&lt;/code&gt;. I built &lt;a href="https://github.com/trifectatechfoundation/zlib-rs/tree/main/libz-rs-sys-cdylib"&gt;zlib-rs's C library&lt;/a&gt; with &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt; for maximum speed, and then pointed CPython's build process to the built zlib_rs library. Everything built just fine. Next, I wanted to run the CPython zlib test suite to verify zlib-rs was working correctly. I mostly did this to make sure I had built things properly; I had no doubts the tests would pass.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of test failures. The test_wbits and test_combine_no_iv tests in test_zlib failed." src="https://emmatyping.dev/static/zlib_test_failure.png" /&gt;&lt;/p&gt;
&lt;p&gt;And yet. I was shocked! zlib-rs is used in Firefox, cargo, and many other widely used tools and applications. Hard to believe it would have a glaring bug that would be surfaced by CPython's test suite. At first I assumed I had somehow made a mistake when building. I realized I had used my system zlib header when building, so maybe there was some weirdness with symbol compatibility?? No, re-building CPython pointing to the zlib-rs include directory didn't fix it.
I tried running &lt;code&gt;cargo test&lt;/code&gt; in the zlib-rs directory to make sure there wasn't something wrong I could catch there. No failures occurred.&lt;/p&gt;
&lt;p&gt;At this point I was convinced it was probably a bug with how I was building things, or a bug in the cdylib (Rust lingo for "C library") wrapping zlib-rs, since the Rust tests passed but the tests in CPython failed. To make my testing simpler, I captured the state of the &lt;a href="https://github.com/python/cpython/blob/c50d794c7bb81f31d1b977e63d0faba0b926a168/Lib/test/test_zlib.py#L169-L174"&gt;&lt;code&gt;test_zlib.test_combine_no_iv&lt;/code&gt; test&lt;/a&gt; using PDB and wrote a C program which does the same thing as the test, with deterministic inputs:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #8b949e; font-weight: bold; font-style: italic"&gt;#include&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;
&lt;span style="color: #8b949e; font-weight: bold; font-style: italic"&gt;#include&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;
&lt;span style="color: #8b949e; font-weight: bold; font-style: italic"&gt;#include&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;&amp;quot;zlib.h&amp;quot;&lt;/span&gt;

&lt;span style="color: #ff7b72"&gt;int&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;main&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;()&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;{&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;unsigned&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;char&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;a[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;{&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x88&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x15&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xce&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x5e&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x3b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x8d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x35&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xdb&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xfa&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x8e&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xa7&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x73&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x66&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x83&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x1b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xde&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x0f&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x86&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xeb&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xe5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x42&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x44&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xad&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x62&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xff&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x11&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;};&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_a&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;a,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;unsigned&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;char&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;b[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;{&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x31&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb8&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xce&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x94&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x4d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x2b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb9&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x7e&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x81&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x7f&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x40&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xbf&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x3d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xa5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xa5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf9&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xdf&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x53&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x68&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc4&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf6&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xbe&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x06&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x7d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc7&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xdc&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x5b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x84&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xce&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xeb&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x87&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x62&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x60&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xe3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x05&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x59&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x15&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc4&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x2d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x78&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc8&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x14&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x38&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x87&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x39&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x58&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x95&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x07&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd9&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xac&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x04&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;};&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_b&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;b,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;unsigned&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;char&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;buff[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;96&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;];&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;memcpy(buff,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;a,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;memcpy(buff&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;b,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;buff,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;96&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_combine&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32_combine(chk_a,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_b,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;printf(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;chk (%u) = chk_combine (%u)? %s&lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\n&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_combine,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;==&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_combine&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;?&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;True&amp;quot;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;False&amp;quot;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;return&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This program also failed. Hm, okay, so at least it wasn't an issue with CPython. Since the existing Rust tests passed, I then translated the above test into Rust and added it to the zlib-rs test suite. If it failed there, I could debug the issue more easily.&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;diff --git a/zlib-rs/src/crc32/combine.rs b/zlib-rs/src/crc32/combine.rs&lt;/span&gt;
&lt;span style="color: #79c0ff; font-weight: bold"&gt;index 40e3745..65c0143 100644&lt;/span&gt;
&lt;span style="color: #ffa198; background-color: #490202"&gt;--- a/zlib-rs/src/crc32/combine.rs&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+++ b/zlib-rs/src/crc32/combine.rs&lt;/span&gt;
&lt;span style="color: #79c0ff"&gt;@@ -66,6 +66,26 @@ mod test {&lt;/span&gt;

&lt;span style="color: #6e7681"&gt; &lt;/span&gt;   use crate::crc32;

&lt;span style="color: #56d364; background-color: #0f5323"&gt;+    #[test]&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+    fn test_crc32_combine_no_iv() {&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+        for _ in 0..1000 {&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let a: &amp;amp;[u8] = &amp;amp;[0x88, 0x64, 0x15, 0xce, 0x5e, 0x3b, 0x8d, 0x35, 0xdb, 0xd2, 0xb5, 0xfa, 0x8e, 0xa7, 0x73, 0x10, 0x66, 0x83, 0x1b, 0xd1, 0xde, 0x0f, 0x25, 0x86, 0xeb, 0xe5, 0x42, 0x44, 0xad, 0x62, 0xff, 0x11];&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let b: &amp;amp;[u8] = &amp;amp;[0x31, 0xb8, 0xce, 0x94, 0x4d, 0x2b, 0xb9, 0x7e, 0xd5, 0x81, 0x7f, 0xc2, 0x40, 0xbf, 0x3d, 0xa5, 0x25, 0xa5, 0xf9, 0xdf, 0x53, 0x68, 0xc4, 0xf6, 0xbe, 0x06, 0x7d, 0xf3, 0xc7, 0xdc, 0x5b, 0x84, 0xce, 0xd2, 0xb2, 0xeb, 0x87, 0x62, 0x60, 0xe3, 0x10, 0x05, 0x64, 0x59, 0x15, 0xc4, 0x2d, 0x78, 0xc8, 0xf3, 0x14, 0x38, 0x87, 0x39, 0xb3, 0x58, 0xb5, 0x95, 0x07, 0x25, 0xd9, 0xc1, 0xac, 0x04];&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let both: &amp;amp;[u8] = &amp;amp;[0x88, 0x64, 0x15, 0xce, 0x5e, 0x3b, 0x8d, 0x35, 0xdb, 0xd2, 0xb5, 0xfa, 0x8e, 0xa7, 0x73, 0x10, 0x66, 0x83, 0x1b, 0xd1, 0xde, 0x0f, 0x25, 0x86, 0xeb, 0xe5, 0x42, 0x44, 0xad, 0x62, 0xff, 0x11, 0x31, 0xb8, 0xce, 0x94, 0x4d, 0x2b, 0xb9, 0x7e, 0xd5, 0x81, 0x7f, 0xc2, 0x40, 0xbf, 0x3d, 0xa5, 0x25, 0xa5, 0xf9, 0xdf, 0x53, 0x68, 0xc4, 0xf6, 0xbe, 0x06, 0x7d, 0xf3, 0xc7, 0xdc, 0x5b, 0x84, 0xce, 0xd2, 0xb2, 0xeb, 0x87, 0x62, 0x60, 0xe3, 0x10, 0x05, 0x64, 0x59, 0x15, 0xc4, 0x2d, 0x78, 0xc8, 0xf3, 0x14, 0x38, 0x87, 0x39, 0xb3, 0x58, 0xb5, 0x95, 0x07, 0x25, 0xd9, 0xc1, 0xac, 0x04];&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let chk_a = crc32(0, &amp;amp;a);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(chk_a, 101488544);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let chk_b = crc32(0, &amp;amp;b);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(chk_b, 2995985109);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let combined = crc32_combine(chk_a, chk_b, 64);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(combined, 2546675245);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let chk_both = crc32(0, &amp;amp;both);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(chk_both, 3010918023);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(combined, chk_both);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+        }&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+    }&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+&lt;/span&gt;
&lt;span style="color: #6e7681"&gt; &lt;/span&gt;   #[test]
&lt;span style="color: #6e7681"&gt; &lt;/span&gt;   fn test_crc32_combine() {
&lt;span style="color: #6e7681"&gt; &lt;/span&gt;       ::quickcheck::quickcheck(test as fn(_) -&amp;gt; _);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;cargo test&lt;/code&gt; passed! I was at my wits' end! How could the C code fail but the Rust code succeed??&lt;/p&gt;
&lt;p&gt;I felt I had enough information to report the issue to zlib-rs. Let me interrupt this story to mention that I really want to thank Folkert de Vries (maintainer of zlib-rs) for help debugging this. They were extremely friendly and helpful in figuring out what was going wrong. Folkert responded to my issue saying that my sample C program worked for them!
Why would my machine be any different? I was running in WSL at the time, so maybe that could cause weirdness? I decided to write up a Containerfile to ensure I had a clean environment:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;ubuntu:24.04&lt;/span&gt;

&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;update&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;install&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;build-essential&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;curl&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;git&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;pkg-config&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;libssl-dev

&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;curl&lt;span style="color: #6e7681"&gt; &lt;/span&gt;https://sh.rustup.rs&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-sSf&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;|&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;bash&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-s&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y
&lt;span style="color: #ff7b72"&gt;ENV&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;PATH&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;/root/.cargo/bin:${&lt;/span&gt;&lt;span style="color: #79c0ff"&gt;PATH&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;}&amp;quot;&lt;/span&gt;
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;curl&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-sSL&lt;span style="color: #6e7681"&gt; &lt;/span&gt;https://apt.llvm.org/llvm-snapshot.gpg.key&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;|&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-key&lt;span style="color: #6e7681"&gt; &lt;/span&gt;add&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;echo&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;deb http://apt.llvm.org/noble/ llvm-toolchain-noble-20 main&amp;quot;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&amp;gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/etc/apt/sources.list.d/llvm.list
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;update&lt;span style="color: #6e7681"&gt;  &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;upgrade&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;install&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y&lt;span style="color: #6e7681"&gt; &lt;/span&gt;clang-20
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cargo&lt;span style="color: #6e7681"&gt; &lt;/span&gt;install&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cargo-c
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;mkdir&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/scratch
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;git&lt;span style="color: #6e7681"&gt; &lt;/span&gt;clone&lt;span style="color: #6e7681"&gt; &lt;/span&gt;https://github.com/trifectatechfoundation/zlib-rs.git&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/scratch/zlib-rs
&lt;span style="color: #ff7b72"&gt;COPY&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;./test.c&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/scratch/zlib-rs/libz-rs-sys-cdylib/test.c
&lt;span style="color: #ff7b72"&gt;WORKDIR&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;/scratch/zlib-rs/libz-rs-sys-cdylib&lt;/span&gt;
&lt;span style="color: #ff7b72"&gt;ENV&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;RUSTFLAGS&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;-Ctarget-cpu=native&amp;quot;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;#&lt;span style="color: #6e7681"&gt; &lt;/span&gt;comment&lt;span style="color: #6e7681"&gt; &lt;/span&gt;this&lt;span style="color: #6e7681"&gt; &lt;/span&gt;out&lt;span style="color: #6e7681"&gt; &lt;/span&gt;to&lt;span style="color: #6e7681"&gt; &lt;/span&gt;fix&lt;span style="color: #6e7681"&gt; &lt;/span&gt;the&lt;span style="color: #6e7681"&gt; &lt;/span&gt;bug
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cargo&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cbuild&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--release
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;clang-20&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-o&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;test&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;test.c&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-I&lt;span style="color: #6e7681"&gt; &lt;/span&gt;./include/&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-static&lt;span style="color: #6e7681"&gt; &lt;/span&gt;./target/x86_64-unknown-linux-gnu/release/libz_rs.a
&lt;span style="color: #ff7b72"&gt;ENV&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;LD_LIBRARY_PATH&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;target/x86_64-unknown-linux-gnu/release/&amp;quot;&lt;/span&gt;
&lt;span style="color: #ff7b72"&gt;ENTRYPOINT&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;./test&amp;quot;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;While experimenting with setting up this container, I found a lead at last! If I compiled with &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt;, the program gave the wrong results. If I compiled &lt;em&gt;without&lt;/em&gt; using native code generation, the program worked correctly. Bizarre!!&lt;/p&gt;
&lt;p&gt;Backing up a bit, let me explain what &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt; actually does (if you know already, please skip to the next paragraph). Compilers like &lt;code&gt;rustc&lt;/code&gt; have feature flags for each target (aka OS + CPU architecture family) which allow them to optionally emit code that uses features of specific processors. For example, most x86 processors have &lt;code&gt;sse2&lt;/code&gt;, and ARM64 processors have NEON or SVE. Newer processors usually come with newer features which provide optimized implementations of some useful operation; for example, some x86 processors have dedicated instructions for SHA hashing. Since not all computers have every feature, these need to be opted into at compile time. In the case of &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt; I'm telling Rust "use all the features for my current processor." This is a way to eke out the most performance from a program. But in this case, it meant I had a bug on my hands! Folkert (maintainer of zlib-rs) suggested I try to narrow down exactly which instruction set extension was causing the issue. After a bit of binary searching, I found out it was &lt;code&gt;avx512vl&lt;/code&gt;. AVX is an extension to provide &lt;a href="https://en.wikipedia.org/wiki/Single_instruction,_multiple_data"&gt;SIMD&lt;/a&gt; and AVX512-VL is an extension which lets AVX-512 instructions operate on 128/256-bit wide SIMD registers in addition to the 512-bit wide ones. This made a lot of sense in some ways; after all, I have an AMD R9 9950X, and one of its features is AVX512 support! But how exactly did these AVX512 instructions get into the final binary?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;:&lt;br&gt; As pointed out in a message on Mastodon, AVX512-VL is actually 11 years old! It was first introduced in Intel AVX512 implementations. However, AVX512 support in Rust is relatively new.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So enabling AVX512 was the culprit for the bug in crc32 calculations. Skimming over the zlib-rs code, I was a bit surprised to find that it does not explicitly use AVX-512 &lt;em&gt;anywhere&lt;/em&gt;! In fact it uses the older SSE4.1 instruction set (presumably for maximum portability). So why was AVX512-VL causing these issues? Unfortunately, I don't know for sure. But I have a theory.&lt;/p&gt;
&lt;p&gt;Rust uses LLVM as its default backend (the bit of the compiler that emits instructions/binaries). LLVM probably realized it could use AVX512-VL instructions (available on my machine) to speed up the SSE4.1 code that zlib-rs is using. However, AVX512-VL is new enough that there was a bug in the compiler - a miscompilation - and the wrong code was emitted. I haven't found a smoking gun issue but &lt;a href="https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aclosed%20avx512vl"&gt;it is probably one of these&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am happy to report that this issue does not present itself with Rust 1.90+ or the latest release of zlib-rs. Many thanks again to Folkert for not only helping figure out the source of the issue, but also adding a mitigation to zlib-rs and cutting a new release to work around the miscompilation! Now the CPython test suite passes when linked against zlib-rs and I can continue my experiments...&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Tue, 14 Oct 2025 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2025-10-14:/finding-a-miscompilation-in-rustllvm.html</guid><category>misc</category><category>python</category><category>rust</category><category>compression</category></item></channel></rss>