<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Emma's Blog</title><link>https://emmatyping.dev/</link><description/><atom:link href="https://emmatyping.dev/feeds/all.rss.xml" rel="self"/><lastBuildDate>Tue, 11 Nov 2025 00:00:00 -0800</lastBuildDate><item><title>Decompression is up to 30% faster in CPython 3.15</title><link>https://emmatyping.dev/decompression-is-up-to-30-faster-in-cpython-315.html</link><description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;compression.zstd&lt;/code&gt; is the fastest Python Zstandard binding as of Python 3.15. Changes to the code managing output
buffers have led to a 25-30% performance uplift for Zstandard decompression and a 10-15% performance uplift for &lt;code&gt;zlib&lt;/code&gt;
for data at least 1 MiB in size. This has broad implications for e.g. faster wheel installations with pip and many
other use cases.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Motivation&lt;/h2&gt;
&lt;p&gt;Since &lt;a href="https://peps.python.org/pep-0784/"&gt;landing Zstandard support in CPython&lt;/a&gt;, I have wanted to explore
the performance of CPython's compression modules to ensure they are well-optimized. Furthermore, the maintainer of
&lt;a href="https://github.com/Rogdham/pyzstd/"&gt;pyzstd&lt;/a&gt; and &lt;a href="https://github.com/Rogdham/backports.zstd"&gt;backports.zstd&lt;/a&gt; (a backport of
&lt;code&gt;compression.zstd&lt;/code&gt; to Python versions before 3.14) benchmarked the new &lt;code&gt;compression.zstd&lt;/code&gt; module against 3rd party Zstandard
Python bindings such as &lt;a href="https://github.com/Rogdham/pyzstd/"&gt;pyzstd&lt;/a&gt;,
&lt;a href="https://github.com/indygreg/python-zstandard"&gt;zstandard&lt;/a&gt;, and &lt;a href="https://github.com/sergey-dryabzhinsky/python-zstd"&gt;zstd&lt;/a&gt;,
and found the standard library was slower than most other bindings!&lt;/p&gt;
&lt;p&gt;Let's take a closer look at &lt;a href="https://github.com/Rogdham/zstd-benchmark/blob/master/results/2025-09-22_linux.md"&gt;the benchmarks&lt;/a&gt;
and how to read them:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figures give timing comparison. For example, +42% means that the library needs 42% more time than stdlib/backports.zstd.
The reference time column indicates an average time for a single run.&lt;/p&gt;
&lt;p&gt;Emoji scale: ❤️‍🩹 -25% 🟥 -15% 🔴 -5% ⚪ +5% 🟢 +15% 🟩 +25% 💚&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Okay, so hopefully we don't see a lot of red, which would mean the standard library (stdlib) reference time is slower...&lt;/p&gt;
&lt;blockquote&gt;
&lt;h2&gt;CPython 3.14.0rc3&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;stdlib&lt;/th&gt;
&lt;th&gt;pyzstd&lt;/th&gt;
&lt;th&gt;zstandard&lt;/th&gt;
&lt;th&gt;zstd&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ - 3.81%&lt;/td&gt;
&lt;td&gt;⚪ - 1.17%&lt;/td&gt;
&lt;td&gt;🟢 + 5.86%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ + 1.91%&lt;/td&gt;
&lt;td&gt;🟢 + 6.18%&lt;/td&gt;
&lt;td&gt;🟢 + 9.83%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟢 + 6.33%&lt;/td&gt;
&lt;td&gt;🟢 + 7.67%&lt;/td&gt;
&lt;td&gt;🟢 +12.92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 3&lt;/td&gt;
&lt;td&gt;7ms&lt;/td&gt;
&lt;td&gt;⚪ + 0.60%&lt;/td&gt;
&lt;td&gt;🔴 - 7.37%&lt;/td&gt;
&lt;td&gt;🟢 +12.08%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 10&lt;/td&gt;
&lt;td&gt;27ms&lt;/td&gt;
&lt;td&gt;🟢 +10.39%&lt;/td&gt;
&lt;td&gt;⚪ + 3.39%&lt;/td&gt;
&lt;td&gt;🟢 +12.46%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 17&lt;/td&gt;
&lt;td&gt;174ms&lt;/td&gt;
&lt;td&gt;⚪ - 2.48%&lt;/td&gt;
&lt;td&gt;⚪ - 3.91%&lt;/td&gt;
&lt;td&gt;⚪ + 0.08%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1G level 3&lt;/td&gt;
&lt;td&gt;6.03s&lt;/td&gt;
&lt;td&gt;🟩 +16.17%&lt;/td&gt;
&lt;td&gt;⚪ - 2.94%&lt;/td&gt;
&lt;td&gt;⚪ + 2.25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟥 -15.14%&lt;/td&gt;
&lt;td&gt;🔴 - 8.53%&lt;/td&gt;
&lt;td&gt;⚪ - 2.37%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟥 -15.41%&lt;/td&gt;
&lt;td&gt;🔴 - 9.22%&lt;/td&gt;
&lt;td&gt;⚪ - 3.35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🔴 -11.16%&lt;/td&gt;
&lt;td&gt;🔴 - 7.09%&lt;/td&gt;
&lt;td&gt;⚪ + 2.07%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 3&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;🔴 - 6.88%&lt;/td&gt;
&lt;td&gt;⚪ - 4.03%&lt;/td&gt;
&lt;td&gt;💚 +26.88%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 10&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;🔴 - 6.69%&lt;/td&gt;
&lt;td&gt;⚪ - 4.86%&lt;/td&gt;
&lt;td&gt;💚 +25.63%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 17&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;🔴 - 7.99%&lt;/td&gt;
&lt;td&gt;⚪ - 4.96%&lt;/td&gt;
&lt;td&gt;💚 +25.58%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 3&lt;/td&gt;
&lt;td&gt;1.49s&lt;/td&gt;
&lt;td&gt;🟥 -19.41%&lt;/td&gt;
&lt;td&gt;🟥 -17.58%&lt;/td&gt;
&lt;td&gt;🟢 + 6.98%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 10&lt;/td&gt;
&lt;td&gt;1.62s&lt;/td&gt;
&lt;td&gt;❤️‍🩹 -27.65%&lt;/td&gt;
&lt;td&gt;❤️‍🩹 -26.48%&lt;/td&gt;
&lt;td&gt;🔴 - 6.92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 17&lt;/td&gt;
&lt;td&gt;1.67s&lt;/td&gt;
&lt;td&gt;🟥 -24.01%&lt;/td&gt;
&lt;td&gt;🟥 -23.04%&lt;/td&gt;
&lt;td&gt;⚪ - 4.43%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
&lt;p&gt;Ouch. 10-25% slower is quite unfortunate! A silver lining is that most of the performance difference is in decompression,
so that narrows the area that is in need of optimization.&lt;/p&gt;
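&lt;p&gt;To make the percentages concrete, here is a small sketch of the arithmetic described in the benchmark notes. The helper name &lt;code&gt;library_time&lt;/code&gt; is made up for this example; the inputs come from the "decompress 1G level 10" row above.&lt;/p&gt;

```python
def library_time(reference_seconds, percent_delta):
    """Estimate a library's time from the stdlib reference time and the table's percentage.

    A figure of +42 means the library needs 42% more time than the stdlib;
    a negative figure means it needs less time.
    """
    return reference_seconds * (1 + percent_delta / 100)


# decompress 1G level 10: the stdlib reference is 1.62s and pyzstd is listed at -27.65%
pyzstd_seconds = library_time(1.62, -27.65)
print(f"pyzstd: about {pyzstd_seconds:.2f}s versus 1.62s for the stdlib")
```

&lt;p&gt;Because the reference column is an average for a single run, the result is only an estimate of the other library's wall time.&lt;/p&gt;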
&lt;p&gt;After sitting down and thinking about it for a while, I came up with a few theories as to why &lt;code&gt;compression.zstd&lt;/code&gt; would
be slower compared to pyzstd and zstandard. My thinking was focused on noting differences in implementation I knew
existed between the various bindings. First, both pyzstd and zstandard build against their own copies of libzstd (the C
library implementing Zstandard compression and decompression). Meanwhile, CPython will build against the
system-installed libzstd, which is older on my system. Maybe there is a performance improvement in the newer libzstd
versions? Second, most of the performance difference is in decompression speed. Perhaps the implementation of
&lt;code&gt;compression.zstd.decompress()&lt;/code&gt; is inefficient? It uses multiple decompression instances to handle multi-frame input
where pyzstd uses one, so perhaps that's the issue? Finally, maybe the handling of output buffers is slow? When
decompressing data, CPython needs to provide an output buffer (location in memory to write to) to store the
uncompressed data. If the creation/allocation of that output buffer is slow it could bottleneck the decompressor.&lt;/p&gt;
&lt;h2&gt;Premature Optimizations&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;These optimizations didn't work, so if you'd like to skip to the optimizations which worked, please move to the next
section!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I decided to tackle these one at a time. First, I built pyzstd and zstandard against the system libzstd. Unfortunately,
after re-running the benchmark, this yielded zero performance difference. Darn.&lt;/p&gt;
&lt;p&gt;Next, I was pretty confident that &lt;code&gt;compression.zstd.decompress()&lt;/code&gt; was at least partially the culprit of the worse
performance. The &lt;a href="https://github.com/python/cpython/blob/95f6e1275b1c9de550d978cb2b4351cc4ed24fe4/Lib/compression/zstd/__init__.py#L152-L172"&gt;current &lt;code&gt;decompress()&lt;/code&gt; implementation&lt;/a&gt;
is written in Python and creates multiple decompression contexts and joins the results together. Surely that had to
lead to some performance degradation? I ended up re-implementing the &lt;code&gt;decompress()&lt;/code&gt; function in C using a single
decompression context to see if my theory was correct. To my chagrin, there was no performance uplift, and it may have
even performed &lt;em&gt;worse&lt;/em&gt;! For the curious, you can see &lt;a href="https://github.com/emmatyping/cpython/tree/zstd-decompress-in-c"&gt;my hacked together branch here&lt;/a&gt;.
It goes to show that you can never be sure about performance bottlenecks from reading the code alone!&lt;/p&gt;
&lt;h2&gt;Properly Profiling CPython&lt;/h2&gt;
&lt;p&gt;With my first two attempts at optimizing Zstandard decompression in CPython unsuccessful, I realized that I should do
what I probably should have done from the beginning: profile the code! I decided to use the
&lt;a href="https://docs.python.org/3/howto/perf_profiling.html"&gt;standard library support for the perf profiler&lt;/a&gt;, as it would
allow me to see both native/C frames such as inside libzstd or the bindings module &lt;code&gt;_zstd&lt;/code&gt;, as well as Python frames.&lt;/p&gt;
&lt;p&gt;So I went ahead and compiled CPython &lt;a href="https://docs.python.org/3/howto/perf_profiling.html#how-to-obtain-the-best-results"&gt;with some flags to improve perf data&lt;/a&gt;
and ran a simple script which called &lt;code&gt;compression.zstd.decompress()&lt;/code&gt; on a variety of data sizes. I highly recommend
reading the Python documentation about perf support for more details but essentially what I ran was:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;# in a cpython checkout&lt;/span&gt;
./configure&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--enable-optimizations&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--with-lto&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;CFLAGS&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer&amp;quot;&lt;/span&gt;
make&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-j&lt;span style="color: #ff7b72"&gt;$(&lt;/span&gt;nproc&lt;span style="color: #ff7b72"&gt;)&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;cd&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;../compression-benchmarks
perf&lt;span style="color: #6e7681"&gt; &lt;/span&gt;record&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-F&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;9999&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-g&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-o&lt;span style="color: #6e7681"&gt; &lt;/span&gt;perf.data&lt;span style="color: #6e7681"&gt; &lt;/span&gt;../cpython/python&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-X&lt;span style="color: #6e7681"&gt; &lt;/span&gt;perf&lt;span style="color: #6e7681"&gt; &lt;/span&gt;profile_zstd.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
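&lt;p&gt;The contents of &lt;code&gt;profile_zstd.py&lt;/code&gt; are not shown here, so the following is only a hypothetical stand-in for a script of that shape: it decompresses payloads of a couple of sizes in a loop so the profiler has something to sample. The payload sizes, round count, and the &lt;code&gt;zlib&lt;/code&gt; fallback for Pythons without &lt;code&gt;compression.zstd&lt;/code&gt; are assumptions made for the sketch.&lt;/p&gt;

```python
try:
    from compression import zstd as codec  # available in Python 3.14+
except ImportError:
    import zlib as codec  # assumption: fallback so the sketch runs on older Pythons

# Illustrative sizes only; a real profiling run may want larger inputs.
SIZES = {"1 KiB": 1024, "1 MiB": 1024 * 1024}


def make_payload(size):
    """Build size bytes of repetitive, compressible data."""
    pattern = bytes(range(256))
    repeats = size // len(pattern) + 1
    return (pattern * repeats)[:size]


def run(rounds=20):
    """Decompress each payload repeatedly so the profiler collects enough samples."""
    for label, size in SIZES.items():
        payload = make_payload(size)
        compressed = codec.compress(payload)
        for _ in range(rounds):
            restored = codec.decompress(compressed)
        assert restored == payload, label
        print(f"{label}: {len(compressed)} compressed bytes")


if __name__ == "__main__":
    run()
```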

&lt;p&gt;After analyzing the profile with &lt;code&gt;perf report --stdio -n -g&lt;/code&gt;, I noticed a significant bottleneck in the output buffer
management code! Let's take a brief detour to discuss what the output buffer management code does and why it was the
decompression bottleneck.&lt;/p&gt;
&lt;h2&gt;(Fast) Buffer Handling is Hard&lt;/h2&gt;
&lt;p&gt;When decompressing data, you feed the decompressor (libzstd in our case) a buffer (&lt;code&gt;bytes&lt;/code&gt; in Python) that is then
decompressed and needs to be written to a new buffer. Since this all happens in C, basically we need to allocate some
memory for libzstd to write the decompressed data into. But how much memory? Well, in many cases, we don't know! So we
need to dynamically resize the output buffer as it is filled up.&lt;/p&gt;
&lt;p&gt;This is actually a pretty challenging problem because there are several constraints and considerations to be made. The
buffer management needs to be fast for a variety of output buffer sizes. If you allocate too much memory up front,
you'll waste time allocating unused memory and slow down decompressing small amounts of data. On the other hand, if you
don't allocate enough, you'll have to make a lot of calls to the allocator, which will also slow things down as each
allocation has overhead and leads to fragmenting the output data. Memory use should also not grow exponentially for large
outputs; otherwise, you could run out of memory for tasks that would normally fit into memory. Finally, each output from
the decompressor can vary in size, given that it may need to buffer data internally.&lt;/p&gt;
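&lt;p&gt;The trade-off can be sketched with some back-of-the-envelope arithmetic. The snippet below is illustrative only: the function name, block sizes, and growth factors are made up for the example and are not the values CPython uses.&lt;/p&gt;

```python
def plan_blocks(output_size, first_block, factor, max_block):
    """List the block sizes allocated until output_size bytes fit."""
    sizes = []
    capacity = 0
    block = first_block
    while output_size > capacity:
        sizes.append(block)
        capacity += block
        # Grow the next block by factor, but never beyond max_block.
        block = min(block * factor, max_block)
    return sizes


MIB = 1024 * 1024
policies = {
    "fixed 4 KiB blocks": plan_blocks(MIB, 4096, 1, 4096),
    "doubling, capped at 1 MiB": plan_blocks(MIB, 4096, 2, MIB),
    "doubling, capped at 256 KiB": plan_blocks(MIB, 4096, 2, 256 * 1024),
}
for name, sizes in policies.items():
    unused = sum(sizes) - MIB
    print(f"{name}: {len(sizes)} allocations, {unused // 1024} KiB unused")
```

&lt;p&gt;Small fixed blocks waste nothing but need many allocations, aggressive doubling needs few allocations but can overshoot badly, and a capped schedule sits in between.&lt;/p&gt;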
&lt;p&gt;Because of the complexity in managing an output buffer, there is code shared across compression modules in CPython to
manage the buffer. This code lives in
&lt;a href="https://github.com/python/cpython/blob/404425575c68bef9d2f042710fc713134d04c23f/Include/internal/pycore_blocks_output_buffer.h"&gt;pycore_blocks_output_buffer.h&lt;/a&gt;.
The code was &lt;a href="https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231"&gt;modified four years ago&lt;/a&gt;
to use an implementation which writes to a series of &lt;code&gt;bytes&lt;/code&gt; objects stored in a &lt;code&gt;list&lt;/code&gt; to hold the output of
decompress calls. When finished, the bytes objects get concatenated together in &lt;code&gt;_BlocksOutputBuffer_Finish&lt;/code&gt;,
returning the final &lt;code&gt;bytes&lt;/code&gt; object containing the decompressed data. When profiling Zstandard decompression, I found
that more than 50% (!) of decompression time was spent in &lt;code&gt;_BlocksOutputBuffer_Finish&lt;/code&gt;! This seemed inordinately
long; ideally, this function should just be a few &lt;code&gt;memcpy&lt;/code&gt;s. So with this knowledge in hand, I tried to think of how
best to optimize the output buffer code.&lt;/p&gt;
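&lt;p&gt;As a rough mental model, the list-of-blocks design can be sketched in pure Python. This is a toy, not the CPython implementation: the class, its method names, the block sizes, and the &lt;code&gt;fake_decompress&lt;/code&gt; stand-in are all hypothetical, and the real code manages &lt;code&gt;bytes&lt;/code&gt; objects in C.&lt;/p&gt;

```python
class BlocksOutputBuffer:
    """Toy model of an output buffer built from a list of blocks."""

    def __init__(self, first_block_size=32, max_block_size=256):
        self.blocks = [bytearray(first_block_size)]
        self.used_in_last = 0  # bytes written into the newest block
        self.max_block_size = max_block_size

    def writable(self):
        """Return a view of the free space left in the newest block."""
        return memoryview(self.blocks[-1])[self.used_in_last:]

    def advance(self, count):
        """Record that count bytes were written into the newest block."""
        self.used_in_last += count

    def grow(self):
        """Append a new block once the newest one is full, doubling up to a cap."""
        next_size = min(len(self.blocks[-1]) * 2, self.max_block_size)
        self.blocks.append(bytearray(next_size))
        self.used_in_last = 0

    def finish(self):
        """Concatenate all blocks into one bytes object, trimming the unused tail."""
        full_blocks = self.blocks[:-1]
        last_block = self.blocks[-1][:self.used_in_last]
        return b"".join([*full_blocks, last_block])


def fake_decompress(source, out):
    """Stand-in for a decompressor: copy source into the buffer block by block."""
    position = 0
    while position != len(source):
        view = out.writable()
        if len(view) == 0:
            out.grow()
            continue
        chunk = source[position:position + len(view)]
        view[:len(chunk)] = chunk
        out.advance(len(chunk))
        position += len(chunk)
    return out.finish()


data = bytes(range(256)) * 5
buffer = BlocksOutputBuffer()
assert fake_decompress(data, buffer) == data
print(f"{len(data)} bytes spread over {len(buffer.blocks)} blocks before the final join")
```

&lt;p&gt;The final join in &lt;code&gt;finish&lt;/code&gt; copies every byte a second time, which is the step that corresponds to &lt;code&gt;_BlocksOutputBuffer_Finish&lt;/code&gt; in this model.&lt;/p&gt;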
&lt;h2&gt;Sometimes Timing Works Out&lt;/h2&gt;
&lt;p&gt;Right around the time that I was working on this, &lt;a href="https://peps.python.org/pep-0782/"&gt;PEP 782&lt;/a&gt; was accepted. This PEP
introduces a new &lt;code&gt;PyBytesWriter&lt;/code&gt; API to CPython which makes it easier to incrementally build up &lt;code&gt;bytes&lt;/code&gt; data in a safe
and performant way at the Python C API level. It seemed like a natural fit for what the blocks output buffer code was
doing, so I wanted to experiment with using it for the output buffer code. After modifying
&lt;code&gt;pycore_blocks_output_buffer.h&lt;/code&gt; to use &lt;code&gt;PyBytesWriter&lt;/code&gt;, I re-ran the original benchmark to see if we had closed the
performance gap:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: this benchmark was run on my local machine and the wall times are not comparable to the previous benchmark.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;stdlib&lt;/th&gt;
&lt;th&gt;zstandard&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;💚 +61.02%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;💚 +57.77%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;💚 +364.86%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 3&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;💚 +40.02%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 10&lt;/td&gt;
&lt;td&gt;32ms&lt;/td&gt;
&lt;td&gt;⚪ - 0.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1M level 17&lt;/td&gt;
&lt;td&gt;126ms&lt;/td&gt;
&lt;td&gt;🟩 +15.93%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress 1G level 3&lt;/td&gt;
&lt;td&gt;4.47s&lt;/td&gt;
&lt;td&gt;💚 +48.69%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 3&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ + 4.67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 10&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;⚪ + 4.79%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1k level 17&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;🟢 + 5.38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 3&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;💚 +50.23%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 10&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;💚 +41.94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1M level 17&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;💚 +47.37%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 3&lt;/td&gt;
&lt;td&gt;1.80s&lt;/td&gt;
&lt;td&gt;🟢 +12.87%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 10&lt;/td&gt;
&lt;td&gt;1.77s&lt;/td&gt;
&lt;td&gt;🟢 +12.54%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;decompress 1G level 17&lt;/td&gt;
&lt;td&gt;1.80s&lt;/td&gt;
&lt;td&gt;🟢 + 8.76%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
&lt;p&gt;WOW! Not only have we closed the gap, &lt;code&gt;compression.zstd&lt;/code&gt; is now &lt;em&gt;faster&lt;/em&gt; than the popular zstandard 3rd-party module.&lt;/p&gt;
&lt;h2&gt;Validating Our Results&lt;/h2&gt;
&lt;p&gt;To validate the speedup, I also decided to write my own minimal benchmark suite for comparing revisions of the
standard library code. It uses &lt;a href="https://pyperf.readthedocs.io/en/latest/"&gt;&lt;code&gt;pyperf&lt;/code&gt;&lt;/a&gt;,
a benchmarking toolkit used in the venerable &lt;a href="https://github.com/python/pyperformance"&gt;pyperformance benchmark suite&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So I went ahead and wrote up a &lt;a href="https://github.com/emmatyping/compression-benchmarks/blob/fab8806f3af89b369e40e77be291dd37f3223b7c/bench_zstd.py"&gt;benchmark for zstd&lt;/a&gt;
which tests compression and decompression using default parameters for sizes 1 KiB, 1 MiB, and 1 GiB. I ran these
benchmarks on main and my branch which uses &lt;code&gt;PyBytesWriter&lt;/code&gt;.&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.03&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: 
#e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.03&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.01&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.92&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.02&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.89&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.02&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.01&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.72&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;2.67&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;sec&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.02&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.40&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.38&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.01&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;734&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;546&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.34&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zstd.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;790&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter_zstd_3&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; 
&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;634&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;

&lt;span style="color: #e6edf3"&gt;Geometric&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;mean&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For input sizes of at least 1 MiB, that's 25-30% faster decompression! In hindsight, this makes sense if you
consider that libzstd's decompression implementation is exceptionally fast.
&lt;a href="https://github.com/inikep/lzbench"&gt;lzbench&lt;/a&gt;, a popular compression library benchmark, found that libzstd can
decompress data at greater than 1 GiB/s. This is much faster than bz2, lzma, or zlib, the other compression modules in
the standard library. One of the motivations for adding Zstandard to CPython was its performance. So it is not too
surprising that the output buffer code would be a bottleneck, given that the existing compression libraries don't write
as quickly to the output buffer. This also explains why compression isn't faster after changing the output buffer
code: compression is very CPU intensive, so more time is spent in the compressor than in writing to the output
buffer. It likewise explains why the speedup is non-existent for decompressing 1 KiB of data: the first 32 KiB block that
is allocated is plenty to store all of the output data, meaning all of the time is spent in the decompressor.&lt;/p&gt;
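&lt;p&gt;To make the "N.NNx faster" figures concrete, here is a small sketch that recomputes them from the mean timings printed above. It assumes the reported ratio is simply the old mean divided by the new mean (the numbers above are consistent with that), and the helper names &lt;code&gt;speedup&lt;/code&gt; and &lt;code&gt;time_saved&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
# Mean timings in seconds, copied from the benchmark output above: (before, after).
results = {
    "zstd.decompress(1M)": (734e-6, 546e-6),
    "zstd.decompress(1G)": (790e-3, 634e-3),
}


def speedup(before, after):
    """Ratio of old mean time to new mean time (assumed meaning of 'N.NNx faster')."""
    return before / after


def time_saved(before, after):
    """Fraction of the original wall-clock time that the change removes."""
    return 1 - after / before


for name, (before, after) in results.items():
    print(f"{name}: {speedup(before, after):.2f}x faster, "
          f"{time_saved(before, after):.1%} less time per call")
```

&lt;p&gt;Read this way, the 1.34x and 1.25x ratios are throughput gains; the time per call drops by roughly 26% for the 1 MiB case and roughly 20% for the 1 GiB case.&lt;/p&gt;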
&lt;p&gt;One final validation I wished to do was to check the performance of &lt;code&gt;zlib&lt;/code&gt;, to ensure that the change did not regress
performance for other standard library compression modules. I wrote
&lt;a href="https://github.com/emmatyping/compression-benchmarks/blob/fab8806f3af89b369e40e77be291dd37f3223b7c/bench_zlib.py"&gt;a benchmark for zlib&lt;/a&gt;
similar to the one I wrote for zstd, and found that there was also a performance increase with the output buffer change!&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;13.5&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.1&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: 
#e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;13.4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.0&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.00&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;11.4&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.0&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;11.3&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.0&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;sec&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.00&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.42&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;1.39&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;us&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.01&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;us&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.02&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;M)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.29&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;1.17&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;ms&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;ms&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;decompress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;G)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;Mean&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;std&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;dev&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;main&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.36&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;[&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;pybyteswriter&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span 
style="color: #a5d6ff"&gt;1.17&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;sec&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+-&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0.00&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;sec&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.17&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;

&lt;span style="color: #e6edf3"&gt;Benchmark&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;hidden&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;because&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;not&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;significant&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;)&lt;/span&gt;&lt;span style="color: #f85149"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;zlib.&lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;compress&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;K)&lt;/span&gt;

&lt;span style="color: #e6edf3"&gt;Geometric&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;mean&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;1.05&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;x&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;faster&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;10-15% faster decompression on data of at least 1 MiB for zlib is pretty significant, especially when you consider that
zlib is used by pip to unpack files in almost every wheel package Python users install.&lt;/p&gt;
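&lt;p&gt;If you want a rough feel for these numbers on your own machine, a standard-library-only timing sketch along these lines can help. This is not the linked benchmark script; the payload generator &lt;code&gt;make_payload&lt;/code&gt; and the repeat counts are assumptions chosen for illustration, and a proper comparison would run the same script under both interpreter builds.&lt;/p&gt;

```python
import os
import timeit
import zlib


def make_payload(size):
    """Build `size` bytes of moderately compressible data (hypothetical test input)."""
    chunk = os.urandom(512) + bytes(512)  # half random bytes, half zeros
    return (chunk * (size // len(chunk) + 1))[:size]


def time_decompress(size, repeat=5, number=20):
    """Return the best per-call time in seconds for zlib.decompress on `size` bytes."""
    raw = make_payload(size)
    compressed = zlib.compress(raw)
    # Sanity check the round trip before timing anything.
    assert zlib.decompress(compressed) == raw
    timings = timeit.repeat(lambda: zlib.decompress(compressed),
                            repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for label, size in (("1K", 1024), ("1M", 1024 * 1024)):
        per_call = time_decompress(size)
        print(f"zlib.decompress({label}): {per_call * 1e6:.1f} us per call")
```

&lt;p&gt;Taking the minimum of several repeats reduces noise from other processes, but it is still much cruder than a dedicated benchmarking harness.&lt;/p&gt;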
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;With the improvements to output buffer handling, I was able to improve the performance not only of &lt;code&gt;compression.zstd&lt;/code&gt;,
but of the decompression code in all of the compression modules. After stumbling over a few optimization ideas, I definitely
learned my lesson to profile code before jumping to conclusions! You won't know what is a real bottleneck unless you
can test it! Just having a benchmark is not enough!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/python/cpython/issues/139877"&gt;The original issue I opened&lt;/a&gt; goes into a bit more detail about the
process of benchmarking the compression modules, and &lt;a href="https://github.com/python/cpython/commit/f262297d525e87906c5e4ab28e80284189641c9e"&gt;the commit with the improvement&lt;/a&gt;
has the diff of changes to adopt &lt;code&gt;PyBytesWriter&lt;/code&gt;. One thing I'm proud of is that not only did the change improve
performance, it also simplified the implementation of the output buffer code and removed 60 lines of code in the
process!&lt;/p&gt;
&lt;p&gt;I did some more profiling of zlib to see if there were any more performance gains to be made, but the profile I
gathered seems to indicate that 95+% of the time is spent in zlib's inflate implementation (with the rest in the
CPython VM), so there is little if any room for further optimization in CPython's bindings for zlib. I think this
is good, as it indicates Python users are getting the best performance they can in 3.15!&lt;/p&gt;
&lt;p&gt;Going forward, I am planning on profiling compression code more, but the vast majority of the time spent
there will probably be in the compressor since compression is so CPU intensive. Finally, I want to investigate
optimizations related to providing more information about the final size of the output data. In some cases the output
buffer is initialized to a small value and dynamically resized as output is produced, but ideally users would be able
to provide more information about their workflow and see a performance improvement as a result. I have a lot of other ideas
related to compression I'd like to work on; check out &lt;a href="https://notes.emmatyping.dev/share/ossTODO"&gt;my OSS TODO list&lt;/a&gt;
for all of the random ideas I want to work on in the future!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Tue, 11 Nov 2025 00:00:00 -0800</pubDate><guid>tag:emmatyping.dev,2025-11-11:/decompression-is-up-to-30-faster-in-cpython-315.html</guid><category>misc</category><category>python</category><category>compression</category><category>zstd</category></item><item><title>Finding a miscompilation in Rust/LLVM</title><link>https://emmatyping.dev/finding-a-miscompilation-in-rustllvm.html</link><description>&lt;p&gt;Among my friends I have a reputation for &lt;del&gt;causing&lt;/del&gt; stumbling across esoteric error messages. Whether that is &lt;code&gt;SSL read: I/O error: Success&lt;/code&gt; (caused by a layered SSH connection hangup on Windows), or that time I tried installing NixOS on my laptop and &lt;code&gt;os-prober&lt;/code&gt; failed to start (this was several years ago, so I am sure it is no longer an issue). I attribute these oddities to my curiosity, particularly around trying things that may or may not work and seeing if they do. Recently, I was trying to complete an item from &lt;a href="https://notes.emmatyping.dev/share/ossTODO"&gt;my OSS TODO list&lt;/a&gt; when I came across a bug that stumped me for several days. Turns out sometimes even compilers have bugs...&lt;/p&gt;
&lt;p&gt;My goal was to build CPython with Rust implementations of common compression libraries to see if the Rust libraries could be supported. &lt;strong&gt;C&lt;/strong&gt;Python relies on &lt;strong&gt;C&lt;/strong&gt; code to do many performance-sensitive activities such as &lt;a href="https://docs.python.org/3.14/library/math.html"&gt;&lt;code&gt;math&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.python.org/3.14/library/compression.html"&gt;&lt;code&gt;compression&lt;/code&gt;&lt;/a&gt;. I had recently read about the &lt;a href="https://trifectatech.org/"&gt;Trifecta Tech Foundation&lt;/a&gt;'s initiative to re-write popular compression libraries in Rust. As of September 2025, they have pure-Rust re-implementations of &lt;a href="https://github.com/trifectatechfoundation/zlib-rs"&gt;zlib&lt;/a&gt; (the library used for zip and gzip files) and &lt;a href="https://github.com/trifectatechfoundation/libbzip2-rs"&gt;bzip2&lt;/a&gt; that are available for use.&lt;/p&gt;
&lt;p&gt;These Rust libraries not only bring increased memory safety, they're also &lt;a href="https://trifectatechfoundation.github.io/zlib-rs-bench/"&gt;as fast or faster than their C counterparts&lt;/a&gt;. Additionally, zlib-rs is widely deployed in Firefox, to the point that it may have &lt;a href="https://github.com/trifectatechfoundation/zlib-rs/issues/306"&gt;tripped over a CPU hardware bug(!)&lt;/a&gt;. So I had confidence that at least zlib-rs would work out of the box.&lt;/p&gt;
&lt;p&gt;To add support for these libraries to CPython, I made &lt;a href="https://github.com/emmatyping/cpython/tree/build-with-rust-compression-libs"&gt;a branch with changes to the autoconf script&lt;/a&gt; to search for the Rust libraries through &lt;code&gt;pkg-config&lt;/code&gt;. I built &lt;a href="https://github.com/trifectatechfoundation/zlib-rs/tree/main/libz-rs-sys-cdylib"&gt;zlib-rs's C library&lt;/a&gt; with &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt; for maximum speed, and then pointed CPython's build process to the built zlib_rs library. Everything built just fine. Next, I wanted to run the CPython zlib test suite to verify zlib-rs was working correctly. I mostly did this to make sure I had built things properly; I had no doubts the tests would pass.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of test failures. The test_wbits and test_combine_no_iv tests in test_zlib failed." src="https://emmatyping.dev/static/zlib_test_failure.png" /&gt;&lt;/p&gt;
&lt;p&gt;And yet. I was shocked! zlib-rs is used in Firefox, cargo, and many other widely used tools and applications. Hard to believe it would have a glaring bug that would be surfaced by CPython's test suite. At first I assumed I had somehow made a mistake when building. I realized I had used my system zlib header when building, so maybe there was some weirdness with symbol compatibility?? No, re-building CPython pointing to the zlib-rs include directory didn't fix it.
I tried running &lt;code&gt;cargo test&lt;/code&gt; in the zlib-rs directory to make sure there wasn't something wrong I could catch there. No failures occurred.&lt;/p&gt;
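&lt;p&gt;When swapping in a different zlib implementation, one quick sanity check is to ask the interpreter which zlib it was compiled against and which one it actually loaded. The sketch below uses the &lt;code&gt;zlib.ZLIB_VERSION&lt;/code&gt; and &lt;code&gt;zlib.ZLIB_RUNTIME_VERSION&lt;/code&gt; constants from the standard library; the helper names are hypothetical, and what version string a zlib-rs build reports is not something this post states, so treat the output as a hint rather than proof.&lt;/p&gt;

```python
import zlib


def zlib_build_info():
    """Report the zlib version seen at compile time and the one loaded at runtime."""
    return {
        "compiled_against": zlib.ZLIB_VERSION,
        "loaded_at_runtime": zlib.ZLIB_RUNTIME_VERSION,
    }


def round_trips(data):
    """Check that compressing and then decompressing returns the original bytes."""
    return zlib.decompress(zlib.compress(data)) == data


info = zlib_build_info()
print(info)
if info["compiled_against"] != info["loaded_at_runtime"]:
    # A mismatch suggests the headers and the shared library came from different builds.
    print("note: header and runtime library versions differ")
print("round trip ok:", round_trips(b"hello zlib" * 1000))
```

&lt;p&gt;A passing round trip here would not have caught the failures above, which only showed up in specific test cases, but it does rule out the most basic build mix-ups.&lt;/p&gt;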
&lt;p&gt;At this point I was convinced it was probably a bug with how I was building things, or a bug in the cdylib (Rust lingo for "C library") wrapping zlib-rs, since the Rust tests passed but the tests in CPython failed. To make my testing simpler, I captured the state of the &lt;a href="https://github.com/python/cpython/blob/c50d794c7bb81f31d1b977e63d0faba0b926a168/Lib/test/test_zlib.py#L169-L174"&gt;&lt;code&gt;test_zlib.test_combine_no_iv&lt;/code&gt; test&lt;/a&gt; using PDB and wrote a C program which does the same thing as the test, with deterministic inputs:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #8b949e; font-weight: bold; font-style: italic"&gt;#include&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;
&lt;span style="color: #8b949e; font-weight: bold; font-style: italic"&gt;#include&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;
&lt;span style="color: #8b949e; font-weight: bold; font-style: italic"&gt;#include&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;&amp;quot;zlib.h&amp;quot;&lt;/span&gt;

&lt;span style="color: #ff7b72"&gt;int&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #d2a8ff; font-weight: bold"&gt;main&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;()&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;{&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;unsigned&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;char&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;a[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;{&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x88&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x15&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xce&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x5e&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x3b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x8d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x35&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xdb&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xfa&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x8e&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xa7&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x73&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x66&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x83&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x1b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xde&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x0f&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x86&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xeb&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xe5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x42&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x44&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xad&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x62&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xff&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x11&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;};&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_a&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;a,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;unsigned&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;char&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;b[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;{&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x31&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb8&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xce&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x94&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x4d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x2b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb9&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x7e&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x81&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x7f&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x40&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xbf&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x3d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xa5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xa5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf9&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xdf&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x53&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x68&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc4&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf6&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xbe&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x06&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x7d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc7&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xdc&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x5b&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x84&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xce&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb2&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xeb&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x87&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x62&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x60&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xe3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x10&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x05&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x59&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x15&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc4&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x2d&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x78&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc8&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xf3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x14&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x38&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x87&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x39&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb3&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x58&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;                        &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xb5&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x95&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x07&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x25&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xd9&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xc1&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0xac&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0x04&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;};&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_b&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;b,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;unsigned&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;char&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;buff[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;96&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;];&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;memcpy(buff,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;a,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;memcpy(buff&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;+&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;32&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;b,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;buff,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;96&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;uInt&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_combine&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;crc32_combine(chk_a,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_b,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;64&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;printf(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;chk (%u) = chk_combine (%u)? %s&lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\n&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_combine,&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;==&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;chk_combine&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;?&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;True&amp;quot;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;:&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;False&amp;quot;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #ff7b72"&gt;return&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;);&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This program also failed. Hm, okay, not an issue with CPython at least. I then translated the above test into Rust and added it to the zlib-rs test suite, since the existing Rust tests passed. If it failed there, I could debug the issue more easily.&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #79c0ff; font-weight: bold"&gt;diff --git a/zlib-rs/src/crc32/combine.rs b/zlib-rs/src/crc32/combine.rs&lt;/span&gt;
&lt;span style="color: #79c0ff; font-weight: bold"&gt;index 40e3745..65c0143 100644&lt;/span&gt;
&lt;span style="color: #ffa198; background-color: #490202"&gt;--- a/zlib-rs/src/crc32/combine.rs&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+++ b/zlib-rs/src/crc32/combine.rs&lt;/span&gt;
&lt;span style="color: #79c0ff"&gt;@@ -66,6 +66,26 @@ mod test {&lt;/span&gt;

&lt;span style="color: #6e7681"&gt; &lt;/span&gt;   use crate::crc32;

&lt;span style="color: #56d364; background-color: #0f5323"&gt;+    #[test]&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+    fn test_crc32_combine_no_iv() {&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+        for _ in 0..1000 {&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let a: &amp;amp;[u8] = &amp;amp;[0x88, 0x64, 0x15, 0xce, 0x5e, 0x3b, 0x8d, 0x35, 0xdb, 0xd2, 0xb5, 0xfa, 0x8e, 0xa7, 0x73, 0x10, 0x66, 0x83, 0x1b, 0xd1, 0xde, 0x0f, 0x25, 0x86, 0xeb, 0xe5, 0x42, 0x44, 0xad, 0x62, 0xff, 0x11];&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let b: &amp;amp;[u8] = &amp;amp;[0x31, 0xb8, 0xce, 0x94, 0x4d, 0x2b, 0xb9, 0x7e, 0xd5, 0x81, 0x7f, 0xc2, 0x40, 0xbf, 0x3d, 0xa5, 0x25, 0xa5, 0xf9, 0xdf, 0x53, 0x68, 0xc4, 0xf6, 0xbe, 0x06, 0x7d, 0xf3, 0xc7, 0xdc, 0x5b, 0x84, 0xce, 0xd2, 0xb2, 0xeb, 0x87, 0x62, 0x60, 0xe3, 0x10, 0x05, 0x64, 0x59, 0x15, 0xc4, 0x2d, 0x78, 0xc8, 0xf3, 0x14, 0x38, 0x87, 0x39, 0xb3, 0x58, 0xb5, 0x95, 0x07, 0x25, 0xd9, 0xc1, 0xac, 0x04];&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let both: &amp;amp;[u8] = &amp;amp;[0x88, 0x64, 0x15, 0xce, 0x5e, 0x3b, 0x8d, 0x35, 0xdb, 0xd2, 0xb5, 0xfa, 0x8e, 0xa7, 0x73, 0x10, 0x66, 0x83, 0x1b, 0xd1, 0xde, 0x0f, 0x25, 0x86, 0xeb, 0xe5, 0x42, 0x44, 0xad, 0x62, 0xff, 0x11, 0x31, 0xb8, 0xce, 0x94, 0x4d, 0x2b, 0xb9, 0x7e, 0xd5, 0x81, 0x7f, 0xc2, 0x40, 0xbf, 0x3d, 0xa5, 0x25, 0xa5, 0xf9, 0xdf, 0x53, 0x68, 0xc4, 0xf6, 0xbe, 0x06, 0x7d, 0xf3, 0xc7, 0xdc, 0x5b, 0x84, 0xce, 0xd2, 0xb2, 0xeb, 0x87, 0x62, 0x60, 0xe3, 0x10, 0x05, 0x64, 0x59, 0x15, 0xc4, 0x2d, 0x78, 0xc8, 0xf3, 0x14, 0x38, 0x87, 0x39, 0xb3, 0x58, 0xb5, 0x95, 0x07, 0x25, 0xd9, 0xc1, 0xac, 0x04];&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let chk_a = crc32(0, &amp;amp;a);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(chk_a, 101488544);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let chk_b = crc32(0, &amp;amp;b);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(chk_b, 2995985109);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let combined = crc32_combine(chk_a, chk_b, 64);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(combined, 2546675245);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            let chk_both = crc32(0, &amp;amp;both);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(chk_both, 3010918023);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+            assert_eq!(combined, chk_both);&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+        }&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+    }&lt;/span&gt;
&lt;span style="color: #56d364; background-color: #0f5323"&gt;+&lt;/span&gt;
&lt;span style="color: #6e7681"&gt; &lt;/span&gt;   #[test]
&lt;span style="color: #6e7681"&gt; &lt;/span&gt;   fn test_crc32_combine() {
&lt;span style="color: #6e7681"&gt; &lt;/span&gt;       ::quickcheck::quickcheck(test as fn(_) -&amp;gt; _);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;cargo test&lt;/code&gt; passed! I was at my wits' end! How could the C code fail but the Rust code succeed??&lt;/p&gt;
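&lt;p&gt;As an aside, the identity that both the C program and the Rust test check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: &lt;code&gt;crc32_combine_slow&lt;/code&gt; is a hypothetical helper, not the zlib or zlib-rs implementation, and it takes time linear in the length of the second buffer because it pushes zero bytes through &lt;code&gt;zlib.crc32&lt;/code&gt;. The input bytes are arbitrary placeholders, not the buffers from the test above.&lt;/p&gt;

```python
import zlib

MASK = 0xFFFFFFFF


def crc32_combine_slow(crc_a, crc_b, len_b):
    """Combine crc32(a) and crc32(b) into crc32(a + b) without seeing a or b.

    Hypothetical helper for illustration only: it runs in O(len_b) time,
    whereas real implementations avoid this linear-time loop.
    """
    # Undo the initial/final inversion so that crc_a's raw register value is
    # advanced past len_b zero bytes.
    shifted = zlib.crc32(bytes(len_b), crc_a ^ MASK) ^ MASK
    # CRC-32 is linear over GF(2), so b's contribution is simply XORed in.
    return shifted ^ crc_b


# Placeholder inputs (assumption: any two byte strings work here).
a = bytes(range(32))
b = bytes(range(64, 128))

combined = crc32_combine_slow(zlib.crc32(a), zlib.crc32(b), len(b))
print(combined == zlib.crc32(a + b))
```

&lt;p&gt;Because CRC-32 is linear over GF(2), the combined value matches the checksum of the concatenated buffer, so the sketch prints &lt;code&gt;True&lt;/code&gt;. A correct &lt;code&gt;crc32_combine&lt;/code&gt; has to satisfy the same equality, which is exactly what the failing C program was reporting as violated.&lt;/p&gt;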
&lt;p&gt;I felt I had enough information to report the issue to zlib-rs. Let me interrupt this story to mention that I really want to thank Folkert de Vries (maintainer of zlib-rs) for help debugging this. They were extremely friendly and helpful in figuring out what was going wrong. Folkert responded to my issue saying that my sample C program worked for them!
Why would my machine be any different? I was running in WSL at the time, so maybe that could cause weirdness? I decided to write up a Containerfile to ensure I had a clean environment:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;ubuntu:24.04&lt;/span&gt;

&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;update&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;install&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;build-essential&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;curl&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;git&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;pkg-config&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;\&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;        &lt;/span&gt;libssl-dev

&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;curl&lt;span style="color: #6e7681"&gt; &lt;/span&gt;https://sh.rustup.rs&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-sSf&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;|&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;bash&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-s&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y
&lt;span style="color: #ff7b72"&gt;ENV&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;PATH&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;/root/.cargo/bin:${&lt;/span&gt;&lt;span style="color: #79c0ff"&gt;PATH&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;}&amp;quot;&lt;/span&gt;
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;curl&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-sSL&lt;span style="color: #6e7681"&gt; &lt;/span&gt;https://apt.llvm.org/llvm-snapshot.gpg.key&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;|&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-key&lt;span style="color: #6e7681"&gt; &lt;/span&gt;add&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;echo&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;deb http://apt.llvm.org/noble/ llvm-toolchain-noble-20 main&amp;quot;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&amp;gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/etc/apt/sources.list.d/llvm.list
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;update&lt;span style="color: #6e7681"&gt;  &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;upgrade&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;apt-get&lt;span style="color: #6e7681"&gt; &lt;/span&gt;install&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-y&lt;span style="color: #6e7681"&gt; &lt;/span&gt;clang-20
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cargo&lt;span style="color: #6e7681"&gt; &lt;/span&gt;install&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cargo-c
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;mkdir&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/scratch
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;git&lt;span style="color: #6e7681"&gt; &lt;/span&gt;clone&lt;span style="color: #6e7681"&gt; &lt;/span&gt;https://github.com/trifectatechfoundation/zlib-rs.git&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/scratch/zlib-rs
&lt;span style="color: #ff7b72"&gt;COPY&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;./test.c&lt;span style="color: #6e7681"&gt; &lt;/span&gt;/scratch/zlib-rs/libz-rs-sys-cdylib/test.c
&lt;span style="color: #ff7b72"&gt;WORKDIR&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;/scratch/zlib-rs/libz-rs-sys-cdylib&lt;/span&gt;
&lt;span style="color: #ff7b72"&gt;ENV&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;RUSTFLAGS&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;-Ctarget-cpu=native&amp;quot;&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;#&lt;span style="color: #6e7681"&gt; &lt;/span&gt;comment&lt;span style="color: #6e7681"&gt; &lt;/span&gt;this&lt;span style="color: #6e7681"&gt; &lt;/span&gt;out&lt;span style="color: #6e7681"&gt; &lt;/span&gt;to&lt;span style="color: #6e7681"&gt; &lt;/span&gt;fix&lt;span style="color: #6e7681"&gt; &lt;/span&gt;the&lt;span style="color: #6e7681"&gt; &lt;/span&gt;bug
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cargo&lt;span style="color: #6e7681"&gt; &lt;/span&gt;cbuild&lt;span style="color: #6e7681"&gt; &lt;/span&gt;--release
&lt;span style="color: #ff7b72"&gt;RUN&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;clang-20&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-o&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;test&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;test.c&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-I&lt;span style="color: #6e7681"&gt; &lt;/span&gt;./include/&lt;span style="color: #6e7681"&gt; &lt;/span&gt;-static&lt;span style="color: #6e7681"&gt; &lt;/span&gt;./target/x86_64-unknown-linux-gnu/release/libz_rs.a
&lt;span style="color: #ff7b72"&gt;ENV&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #79c0ff"&gt;LD_LIBRARY_PATH&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;target/x86_64-unknown-linux-gnu/release/&amp;quot;&lt;/span&gt;
&lt;span style="color: #ff7b72"&gt;ENTRYPOINT&lt;/span&gt;&lt;span style="color: #6e7681"&gt; &lt;/span&gt;&lt;span style="color: #e6edf3"&gt;[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;./test&amp;quot;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;While experimenting with setting up this container, I found a lead at last! If I compiled with &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt;, the program gave the wrong results. If I compiled &lt;em&gt;without&lt;/em&gt; using native code generation, the program worked correctly. Bizarre!!&lt;/p&gt;
&lt;p&gt;Backing up a bit, let me explain what &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt; actually does (if you know already, please skip to the next paragraph). Compilers like &lt;code&gt;rustc&lt;/code&gt; have feature flags for each target (aka OS + CPU architecture family) which allow them to optionally emit code that uses features of processors. For example, most x86 processors have &lt;code&gt;sse2&lt;/code&gt;, and ARM64 processors have NEON or SVE. Newer processors usually come with newer features which provide optimized implementations of some useful thing; for example, some x86 processors have optimized implementations of SHA hashing. Since not all computers have every feature, these need to be opted into at compile time. In the case of &lt;code&gt;RUSTFLAGS="-Ctarget-cpu=native"&lt;/code&gt; I'm telling Rust "use all the features for my current processor." This is a way to eke out the most performance from a program. But in this case, it meant I had a bug on my hands! Folkert (maintainer of zlib-rs) suggested I try to narrow down exactly which instruction set extension was causing the issue. After a bit of binary searching, I found out it was &lt;code&gt;avx512vl&lt;/code&gt;. AVX is an extension to provide &lt;a href="https://en.wikipedia.org/wiki/Single_instruction,_multiple_data"&gt;SIMD&lt;/a&gt; and AVX512-VL is an extension which lets AVX-512 instructions operate on 128- and 256-bit wide vectors, not just 512-bit wide ones. This made a lot of sense in some ways: after all, I have an AMD R9 9950X, and one of its features is AVX512 support! But how exactly did these AVX512 instructions get into the final binary?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;:&lt;br&gt; As pointed out in a message on Mastodon, AVX512-VL is actually 11 years old! It was first introduced in Intel AVX512 implementations. However, AVX512 support in Rust is relatively new.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So enabling AVX512 was the culprit for the bug in crc32 calculations. Skimming over the zlib-rs code, I was a bit surprised to find that it does not explicitly use AVX-512 &lt;em&gt;anywhere&lt;/em&gt;! In fact it uses the older SSE4.1 instruction set (presumably for maximum portability). So why was AVX512-VL causing these issues? Unfortunately, I don't know for sure. But I have a theory.&lt;/p&gt;
&lt;p&gt;Rust uses LLVM as its default backend (the bit of the compiler that emits instructions/binaries). LLVM probably realized it could use AVX512-VL instructions (available on my machine) to speed up the SSE4.1 code that zlib-rs is using. However, AVX512-VL is new enough that there was a bug in the compiler - a miscompilation - and the wrong code was emitted. I haven't found a smoking gun issue but &lt;a href="https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aclosed%20avx512vl"&gt;it is probably one of these&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am happy to report that this issue does not present itself with Rust 1.90+ or the latest release of zlib-rs. Many thanks again to Folkert for not only helping figure out the source of the issue, but also adding a mitigation to zlib-rs and cutting a new release to work around the miscompilation! Now the CPython test suite passes when linked against zlib-rs and I can continue my experiments...&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Tue, 14 Oct 2025 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2025-10-14:/finding-a-miscompilation-in-rustllvm.html</guid><category>misc</category><category>python</category><category>rust</category><category>compression</category></item><item><title>Revamping my blog... again</title><link>https://emmatyping.dev/revamping-my-blog-again.html</link><description>&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Well, I've succumbed to the ever-present urge to completely change one's blog setup. It all started when
I wanted to add my blog to the &lt;a href="https://github.com/cosimameyer/awesome-pyladies-blogs/"&gt;Awesome PyLadies' blogs repo&lt;/a&gt;. As part of the configuration you can add your blog's RSS feed (structured information about a blog's contents). But the configuration says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;if you wish to have your blog posts being promoted by the Mastodon bot; the RSS feed should be for Python-related posts&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My previous blog generator was &lt;a href="https://www.getzola.org/"&gt;zola&lt;/a&gt;, which worked really well and was easy to set up! However, zola does not support per-tag (or "taxonomy" as zola calls them) feeds. I considered contributing support for this to zola, but I figured I'd look around at other static site generators and see what they support. My blog content is just a bunch of Markdown files after all, so it should be easy to move to another static site generator!&lt;/p&gt;
&lt;h2&gt;Yak shaving, for fun and profit&lt;/h2&gt;
&lt;p&gt;I came across &lt;a href="https://getpelican.com/"&gt;Pelican&lt;/a&gt;, which was really appealing for a few reasons. First, it supported per-tag RSS feeds. But also, it is written in Python and I felt like it would be fitting since I am a Pythonista. So I decided I would try to port my blog to Pelican. As you may be able to tell by looking at the footer, I did so successfully :)&lt;/p&gt;
&lt;p&gt;Setting up Pelican is actually super easy. I installed pelican with markdown support by running &lt;code&gt;uv tool install pelican[markdown]&lt;/code&gt; and ran &lt;code&gt;pelican-quickstart&lt;/code&gt; to set up a project. After answering a few prompts, I had a full project set up and could copy over the Markdown files used to write this blog. After changing the metadata from zola's format to Pelican's, I had a blog generated... with no theme.&lt;/p&gt;
&lt;p&gt;Oh... I needed to see what themes were available. Fortunately, Pelican makes this easy: the &lt;a href="https://pelicanthemes.com/"&gt;pelicanthemes.com&lt;/a&gt; website lists a number of community-authored themes. Unfortunately, I didn't see any themes I loved.&lt;/p&gt;
&lt;h2&gt;Introducing pelican-theme-terminimal&lt;/h2&gt;
&lt;p&gt;So, I did the only natural thing to do and ported the &lt;a href="https://github.com/pawroman/zola-theme-terminimal"&gt;zola theme I was using&lt;/a&gt; to Pelican. Fortunately, this wasn't actually too bad. Zola uses &lt;a href="https://keats.github.io/tera/"&gt;Tera&lt;/a&gt; for its templates, which is based on Jinja2, which is what Pelican uses. So for the most part I could minimally update the variables used and get the theme ported over easily. The layout between the two is slightly different so I had to restructure how things are designed, but overall it was pretty easy and enjoyable.&lt;/p&gt;
&lt;p&gt;You can check out &lt;a href="https://github.com/emmatyping/pelican-theme-terminimal/"&gt;the theme's code here&lt;/a&gt;. I
don't plan on working on the theme a &lt;em&gt;ton&lt;/em&gt; more, mostly just to add features or customizations I want, but it is open source if anyone else wants to use it or submit patches.&lt;/p&gt;
&lt;p&gt;The top priorities I have to work on are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Links to RSS feeds&lt;/li&gt;
&lt;li&gt;Mastodon verification&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So yeah, my blog is now running on Pelican and Python 🎉&lt;/p&gt;
&lt;p&gt;I have a few ideas to blog about over the next week or two so check back soon, or subscribe to
&lt;a href="/feeds/all.rss.xml"&gt;my RSS feed&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Sun, 07 Sep 2025 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2025-09-07:/revamping-my-blog-again.html</guid><category>misc</category><category>meta</category><category>python</category></item><item><title>Introducing the real me!</title><link>https://emmatyping.dev/introducing-the-real-me.html</link><description>&lt;p&gt;I'm really excited for the opportunity to introduce my true self. A while back
I began exploring my gender expression, and today marks an important step in my
journey. So let me re-introduce myself, the real me this time.&lt;/p&gt;
&lt;p&gt;Hi! 👋 I'm Emma, a trans woman. I use she/her pronouns.&lt;/p&gt;
&lt;p&gt;I'm the same person, just more... me! So all the stuff in
&lt;a href="/pages/about.html"&gt;About Me&lt;/a&gt; is still true! Expect more posts about Python,
packaging, and whatever other hobby projects I have going on.&lt;/p&gt;
&lt;p&gt;Finally, I'd like to thank my wife and everyone who has supported me on this
journey so far. I feel so lucky to be surrounded by so many supportive friends
and family members. I absolutely could not have gotten this far without them.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Sat, 23 Nov 2024 00:00:00 -0800</pubDate><guid>tag:emmatyping.dev,2024-11-23:/introducing-the-real-me.html</guid><category>misc</category><category>meta</category></item><item><title>New Rust crate: generational-arena-dom</title><link>https://emmatyping.dev/new-rust-crate-generational-arena-dom.html</link><description>&lt;p&gt;I've just published a new crate someone may find interesting. I recently had a take-home assessment where I used the &lt;a href="https://servo.org/"&gt;Servo project's&lt;/a&gt; &lt;code&gt;html5ever&lt;/code&gt; HTML parser crate. &lt;a href="https://github.com/servo/html5ever"&gt;&lt;code&gt;html5ever&lt;/code&gt;&lt;/a&gt; is the main crate that servo uses to parse HTML content on the web. This crate is very customizable, and you have to bring your own implementation of the DOM (which means you can handle memory management however you want!). They provide an example implementation of the DOM that uses &lt;code&gt;Rc&amp;lt;RefCell&amp;lt;T&amp;gt;&amp;gt;&lt;/code&gt;s all over, which is a bit annoying to use with Rust's borrow checking model. This becomes particularly frustrating when dealing with frequent DOM mutations, as was the case in my project.&lt;/p&gt;
&lt;p&gt;Fortunately, I recalled the benefits of using arenas for ergonomic memory management when working with trees. I also had recently read about &lt;a href="https://verdagon.dev/blog/hybrid-generational-memory"&gt;Vale's generational arena usage&lt;/a&gt;, and I was inspired to build a DOM implementation based on a generational arena design. Generational arenas are nice because they don't suffer from &lt;a href="https://en.wikipedia.org/wiki/ABA_problem"&gt;the ABA problem&lt;/a&gt;: a stale token for a deleted node can never be mistaken for a newer node that reuses the same slot, so it is safe to add, update, and delete nodes in the DOM. I ended up coming across the &lt;a href="https://crates.io/crates/generational-indextree"&gt;generational-indextree&lt;/a&gt; crate, which uses tokens instead of references to refer to tree members, which made working with mutable elements of the DOM much easier!&lt;/p&gt;
&lt;p&gt;Anyway, I encourage anyone who is interested to learn more to check out the project &lt;a href="https://github.com/emmatyping/generational-arena-dom"&gt;on my github&lt;/a&gt; or &lt;a href="https://crates.io/crates/generational-arena-dom"&gt;on the project page on crates.io&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Sat, 08 Jul 2023 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2023-07-08:/new-rust-crate-generational-arena-dom.html</guid><category>misc</category><category>html</category><category>rust</category></item><item><title>Using multiprocessing and sqlite3 together</title><link>https://emmatyping.dev/using-multiprocessing-and-sqlite3-together.html</link><description>&lt;blockquote&gt;
&lt;p&gt;Note from the author: this is a pseudo TIL, but I hadn't seen it written down anywhere, hopefully someone finds it useful!
Jump to "Solution" below if you don't care about the background.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Background&lt;/h1&gt;
&lt;h3&gt;Generating Data&lt;/h3&gt;
&lt;p&gt;I recently started working on a reinforcement learning project, and I needed to generate a lot of training data. The project involves quantum compilers, so the data I generate is quantum circuits. For those unfamiliar, quantum circuits are just sequences of unitary matrices laid out in a particular order. I chose to store the circuit as a sequence of unitary gate names. The output of the data generation is the unitaries (numpy arrays) that are the result of multiplying the matrices in the circuit together.&lt;/p&gt;
&lt;p&gt;I ended up wanting to generate somewhere in the region of a few hundred billion matrices, each of them very small. I knew off the bat that this would require a fair bit of time, and I wanted to take advantage of the 32 core server I own. Since I was using Python to generate this data, I used the multiprocessing module. Sadly I cannot yet take advantage of &lt;a href="https://martinheinz.dev/blog/97"&gt;Python multithreading coming in 3.12&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Disk Space Woes&lt;/h3&gt;
&lt;p&gt;For saving the generated matrices, I started off by doing the simplest thing, just using plain-old &lt;code&gt;np.savetxt&lt;/code&gt; to save the (pretty tiny) matrices to disk in each process after computing the product of the matrices in the quantum circuit. This... was problematic. I quickly ran into a disk out of space error. Normally this means I need to clear out space on whatever VM I am using, but there was one problem -- I still had hundreds of gigabytes of space left on disk!&lt;/p&gt;
&lt;p&gt;I quickly deduced the error actually was caused by hitting the limit on entries in a directory, dang it &lt;a href="https://cifs.com/"&gt;CIFS&lt;/a&gt;! I briefly tried to make my own schema to split the unitaries into more directories to avoid this limit but I ended up hitting more file system limits. It was clear just writing files to disk wouldn't scale to the size of dataset I needed to generate.&lt;/p&gt;
&lt;h3&gt;Choosing a Database&lt;/h3&gt;
&lt;p&gt;Of course, dealing with so many small files, a database was the right solution to this problem. Why didn't I start with a database to begin with? Partially because I wanted to make it easy to load individual unitaries (&lt;code&gt;np.loadtxt&lt;/code&gt; is an incredibly handy API). Also, I was just hacking this data generation script together.&lt;/p&gt;
&lt;p&gt;I had one issue with switching to a database: I wanted something simple and lightweight, I didn't need anything fancy like postgres or the like. Sqlite is the obvious choice but sqlite does not by default support concurrent &lt;em&gt;writes&lt;/em&gt;, which is exactly what I wanted to do!&lt;/p&gt;
&lt;h1&gt;Solution&lt;/h1&gt;
&lt;p&gt;So how can one achieve concurrent writes in Python using sqlite3? Sqlite by default uses a rollback journal to maintain consistency. You can change the configuration so that sqlite uses &lt;a href="https://www.sqlite.org/wal.html"&gt;a &lt;em&gt;write-ahead&lt;/em&gt; log (WAL) mode&lt;/a&gt; instead, which lets readers carry on while a write is in progress and makes many small writes from separate processes much cheaper (writes are still applied one at a time under the hood). You can enable WAL mode in Python by setting the &lt;a href="https://www.sqlite.org/pragma.html#pragma_journal_mode"&gt;journal mode&lt;/a&gt;:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;# assume some Cursor object `cursor`&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;cursor&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;execute(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;#39;PRAGMA journal_mode = WAL&amp;#39;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;However, I started getting exceptions part way through. Some processes calculating the unitaries were being told the database was locked, even though it should not be (these processes should be writing to the WAL, which I want to always be available). Therefore I also set &lt;a href="https://www.sqlite.org/pragma.html#pragma_synchronous"&gt;the sqlite pragma &lt;code&gt;synchronous&lt;/code&gt;&lt;/a&gt; to &lt;code&gt;OFF&lt;/code&gt;, which means that sqlite hands data to the operating system and carries on without waiting for it to be synced to disk. Note this is &lt;strong&gt;dangerous&lt;/strong&gt; because your database could become corrupted if the operating system crashes or the server loses power before the data reaches the disk (a crash of the Python process alone should not corrupt it). This is acceptable to me because I can always regenerate the database and either of these is very unlikely to occur while I run these data generation tasks. This can be done in Python like so:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #8b949e; font-style: italic"&gt;# assume some Cursor object `cursor`&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;cursor&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;execute(&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;#39;PRAGMA synchronous = OFF&amp;#39;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
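
&lt;p&gt;To make that concrete, here is a minimal sketch of how the two pragmas might fit into a multiprocessing script. The table name, the function names, and the 30 second &lt;code&gt;timeout&lt;/code&gt; are assumptions for illustration, not the original data generation script; the timeout is an extra that asks a blocked writer to wait for the lock rather than fail straight away.&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #e6edf3"&gt;import multiprocessing
import sqlite3


def open_connection(db_path):
    # timeout: wait up to 30 s for a lock instead of raising immediately
    conn = sqlite3.connect(db_path, timeout=30.0)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = OFF")  # fast, but unsafe on power loss
    return conn


def init_db(db_path):
    # hypothetical schema: gate-name sequence plus the serialized unitary
    conn = open_connection(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS unitaries (circuit TEXT, data BLOB)")
    conn.commit()
    conn.close()


def write_rows(job):
    # each worker process opens its own connection; connections are not shared
    db_path, worker_id, count = job
    conn = open_connection(db_path)
    for i in range(count):
        conn.execute(
            "INSERT INTO unitaries VALUES (?, ?)",
            (f"worker{worker_id}-circuit{i}", b"placeholder bytes"),
        )
    conn.commit()
    conn.close()
    return count


def generate(db_path, workers=4, rows_per_worker=100):
    # returns the total number of rows the workers reported writing
    init_db(db_path)
    jobs = [(db_path, w, rows_per_worker) for w in range(workers)]
    with multiprocessing.Pool(workers) as pool:
        return sum(pool.map(write_rows, jobs))
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In a script, &lt;code&gt;generate("unitaries.db")&lt;/code&gt; would be called from under an &lt;code&gt;if __name__ == "__main__":&lt;/code&gt; guard so that worker processes can import the module safely.&lt;/p&gt;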

&lt;p&gt;In summary, by enabling the WAL and turning some sync'ing off, I was able to get multi-processed Python code to concurrently write to a sqlite database. This also gave a nice speed bump since sqlite is optimized for writing many small amounts of data to disk, a nice bonus!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Fri, 19 May 2023 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2023-05-19:/using-multiprocessing-and-sqlite3-together.html</guid><category>misc</category><category>sql</category><category>python</category><category>multiprocessing</category></item><item><title>Rust CLI tools apt repo</title><link>https://emmatyping.dev/rust-cli-tools-apt-repo.html</link><description>&lt;blockquote&gt;
&lt;p&gt;tl;dr
Go to &lt;a href="http://apt.cli.rs"&gt;http://apt.cli.rs&lt;/a&gt; and follow the instructions to add the apt repo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I guess it's only fair to start my new blogging kick by catching up on a project I'd worked on several months ago which is pretty much "done." I really like several Rust tools, such as &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;&lt;code&gt;ripgrep&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://github.com/sharkdp/hyperfine"&gt;&lt;code&gt;hyperfine&lt;/code&gt;&lt;/a&gt;, and &lt;a href="https://github.com/sharkdp/fd"&gt;&lt;code&gt;fd&lt;/code&gt;&lt;/a&gt;. I wanted an easy way to install them on Debian-based systems that may not have them in the official repos, so I needed to create my own apt repo. I was originally considering making a PPA, or Personal Package Archive, since I mainly use Ubuntu, but I decided I wanted these tools available on Debian as well, so my only option was an apt repo.&lt;/p&gt;
&lt;p&gt;It turns out that apt repos are really simple, you just need to serve a directory with e.g. nginx. It took me a while to find a tool I liked using for making the apt repo, however. I started with trying &lt;code&gt;reprepro&lt;/code&gt;, but I found it was more annoying to use than I wanted, so I ended up using &lt;a href="https://www.aptly.info/"&gt;&lt;code&gt;aptly&lt;/code&gt;&lt;/a&gt;, a newer apt repo management tool.&lt;/p&gt;
&lt;p&gt;When setting up the repo, my criteria for inclusion were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the tool must be able to build &lt;code&gt;*.deb&lt;/code&gt; packages. This is the package format for Debian/Ubuntu so definitely required&lt;/li&gt;
&lt;li&gt;the tool must build those packages. I wanted to use the official/same binaries as those in the Github release&lt;/li&gt;
&lt;li&gt;the binary packages must be statically musl-linked. Since various versions of Debian/Ubuntu use different versions of glibc, you can get version compatibility errors if you don't statically link. Statically linking to musl is &lt;em&gt;much&lt;/em&gt; easier than glibc, so this seemed to be the best route to avoid compatibility errors.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With these criteria in hand, I then set up the apt repo with the CLI tools that fit and wrote a script to try updating all of the packages, pulling from the official Github releases. I have subscribed to releases on Github from all the repos that are currently included in the apt repo so all I have to do is run the script when a new release comes out from one of the repos. I may end up setting up a cronjob that runs this update script, but I'm weighing whether it is worth any potential risks...&lt;/p&gt;
&lt;p&gt;Anyway, if you are interested in using this apt repo, head on over to &lt;a href="http://apt.cli.rs"&gt;http://apt.cli.rs&lt;/a&gt;. I've written up the commands you need to get started there.&lt;/p&gt;
&lt;p&gt;If you have suggestions for Rust CLI tools that should be included, please &lt;a href="https://github.com/emmatyping/apt.cli.rs/issues/new"&gt;open an issue on the github&lt;/a&gt;! If a package doesn't build musl-linked Debian packages, consider opening an issue on that project to add it. However, please &lt;em&gt;do not&lt;/em&gt; spam maintainers asking for these packages to be built. I unfortunately do not have time to contribute to the packaging of every great Rust CLI tool, but I can add them to the apt repo if they are packaged already!&lt;/p&gt;
&lt;p&gt;Anyway, hopefully someone else finds it useful!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Sat, 26 Nov 2022 00:00:00 -0800</pubDate><guid>tag:emmatyping.dev,2022-11-26:/rust-cli-tools-apt-repo.html</guid><category>misc</category><category>rust</category><category>packaging</category></item><item><title>Revamping my blog</title><link>https://emmatyping.dev/revamping-my-blog.html</link><description>&lt;p&gt;Well, I've decided I want to blog a bit more, in part thanks to &lt;a href="https://simonwillison.net/2022/Nov/6/what-to-blog-about/"&gt;Simon Willison's post&lt;/a&gt; suggesting keeping a blog is a good idea.&lt;/p&gt;
&lt;p&gt;I've changed the UI a bit and I am now publishing via a Github action (I now just need to write a markdown file and push to a git repo!). Making it easier to blog will hopefully mean I blog a bit more.&lt;/p&gt;
&lt;p&gt;One slight issue I ran into is I would push a commit, then my website would go down. The custom domain setting in Github pages would be reset to empty. At first, I thought this was a bug, how could this get reset so frequently! Turns out Github ties the custom domain to a file named &lt;code&gt;CNAME&lt;/code&gt; in the root of the &lt;code&gt;gh-pages&lt;/code&gt; branch (or whatever you configure to publish in the settings). I've now set that as a static asset to be included in the root so I won't run into this problem, but rather annoying behavior, my git repo shouldn't be used to store configuration!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Fri, 25 Nov 2022 00:00:00 -0800</pubDate><guid>tag:emmatyping.dev,2022-11-25:/revamping-my-blog.html</guid><category>misc</category><category>meta</category></item><item><title>PyBay 2019!</title><link>https://emmatyping.dev/pybay-2019.html</link><description>&lt;p&gt;I will be tabling at PyBay 2019 on August 18th! Hope to see you there :)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Sun, 04 Aug 2019 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2019-08-04:/pybay-2019.html</guid><category>misc</category><category>pybay</category><category>python</category></item><item><title>Stupid solutions for stupid problems</title><link>https://emmatyping.dev/stupid-solutions-for-stupid-problems.html</link><description>&lt;p&gt;A while back, a friend of mine was working on a coding challenge for some internship. The prompt said to create a dictionary with string keys and string values in your favorite programming language, without using the built in dictionary type.&lt;/p&gt;
&lt;p&gt;This got me thinking of how I would solve this, and this terrible monster is what I came up with:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;class&lt;/span&gt; &lt;span style="color: #f0883e; font-weight: bold"&gt;DumbDict&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;
    &lt;span style="color: #d2a8ff; font-weight: bold"&gt;__getitem__&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;getattr&lt;/span&gt;
    &lt;span style="color: #d2a8ff; font-weight: bold"&gt;__setitem__&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;setattr&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is a very silly solution. I don't feel bad about it. Here it is in action:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;d&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;DumbDict()&lt;/span&gt;
&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;d[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;#39;hi&amp;#39;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #a5d6ff"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;
&lt;span style="color: #ff7b72; font-weight: bold"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;d[&lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;#39;hi&amp;#39;&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;]&lt;/span&gt;
&lt;span style="color: #a5d6ff"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;How does it work? Well, in Python &lt;code&gt;__getitem__&lt;/code&gt; is the protocol for subscription access. When I write &lt;code&gt;d['hi']&lt;/code&gt; Python internally calls &lt;code&gt;d.__getitem__('hi')&lt;/code&gt;. So what's the deal with the &lt;code&gt;getattr&lt;/code&gt; call then?&lt;/p&gt;
&lt;p&gt;In Python, everything is an object. By default, objects internally map attribute names to values. In my dictionary implementation, I take advantage of this to use Python's internals to create my dictionary.&lt;/p&gt;
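&lt;p&gt;As a rough sketch of that idea, here is the same trick with the delegation written out as ordinary methods. The class name &lt;code&gt;ExplicitDumbDict&lt;/code&gt; is made up for this illustration, and &lt;code&gt;vars()&lt;/code&gt; is just one way to peek at where the values end up.&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #e6edf3"&gt;class ExplicitDumbDict:
    # same idea as DumbDict, with the forwarding spelled out by hand
    def __getitem__(self, key):
        # d[key] reads the attribute named key
        return getattr(self, key)

    def __setitem__(self, key, value):
        # d[key] = value stores value as an attribute named key
        setattr(self, key, value)


d = ExplicitDumbDict()
d["hi"] = "test"
print(d["hi"])   # test
print(vars(d))   # {'hi': 'test'}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
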
&lt;p&gt;The main pro of this dictionary is that it is fast. It is close to (within 10% of) the builtin dictionary type (there is a bit of overhead with function calls).&lt;/p&gt;
&lt;p&gt;There are quite a few cons, the first of which several of the more knowledgeable among you have already realized. I'm totally cheating here. Well, maybe. Or I'm not. Technically, classes use something called &lt;code&gt;types.MappingProxyType&lt;/code&gt; to manage attribute mappings. This is however an implementation detail, so it's up to personal opinion whether I'm using a built in dictionary or not. Anyway, I think it's pretty cool. And remember I did say it was stupid...&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Mon, 04 Jun 2018 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2018-06-04:/stupid-solutions-for-stupid-problems.html</guid><category>misc</category><category>silly</category><category>python</category></item><item><title>PyBay 2017</title><link>https://emmatyping.dev/pybay-2017.html</link><description>&lt;p&gt;The video of the panel at PyBay 2017 I was on was posted!&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/XkCyrLN5r2M" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;

&lt;p&gt;It was a pleasure to be a part of the panel, and a great experience to go to PyBay 2017. The SF Python community was kind and welcoming. I hope to go to their regular meetings if I can find time.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Fri, 01 Sep 2017 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2017-09-01:/pybay-2017.html</guid><category>misc</category><category>pybay</category><category>python</category></item><item><title>Typycal - Generate type stubs from runtime type information</title><link>https://emmatyping.dev/typycal-generate-type-stubs-from-runtime-type-information.html</link><description>&lt;p&gt;EDIT: This project stalled and is no longer being worked on. You can find the sources &lt;a href="https://github.com/emmatyping/typycal"&gt;on my github&lt;/a&gt;. The original blog is below.&lt;/p&gt;
&lt;p&gt;Note: This blog post assumes basic familiarity with typing&lt;/p&gt;
&lt;p&gt;I am a collaborator on the &lt;a href="http://mypy-lang.org"&gt;mypy&lt;/a&gt; project, which implements a static type checker for Python. Static typing is useful for many reasons, such as making refactoring and other code maintenance easier. At PyCon this year, I heard from many developers using mypy who found it useful in understanding their code and finding type errors. Andreas Dewes gave a nice summary and analysis of typing in Python in a &lt;a href="https://www.slideshare.net/japh44/type-annotations-in-python-whats-whys-and-wows"&gt;Europython talk&lt;/a&gt;. If you are just starting with typing or are not convinced it is worth it, I highly recommend reading the slides from that talk.&lt;/p&gt;
&lt;p&gt;At the moment, several companies and open source projects have moved to using static typing, such as Dropbox, Google, and &lt;a href="https://zulip.org/"&gt;Zulip&lt;/a&gt;, an open source chat application. One of the design goals of static typing in Python is to be optional and gradual. However, even with these goals, annotating code, especially for large code bases, can be a significant time drain. With this in mind, I was interested in trying to make it easier for people to adopt typing in their Python code. Some work has been put into making a more advanced type inference tool that would be able to generate useful type information such as Google's &lt;a href="https://github.com/google/pytype"&gt;pytype&lt;/a&gt; project and mypy's stubgen script. Google and Facebook both have their own closed source runtime type inferencers. However, pytype can be inaccurate and crash on valid code (which is bad if you want to adopt static typing). In addition it only runs on Python 2 (though it can check Python 3 code). Stubgen is rather incomplete, and will likely never be able to infer the most complex cases. These are useful tools, but I believe that runtime introspection can do better.&lt;/p&gt;
&lt;p&gt;These solutions seemed overly complex when all the type information needed is right in front of us: in the running code itself! At the time I was thinking about this problem, I coincidentally had been reading up on &lt;a href="https://github.com/Microsoft/Pyjion"&gt;Pyjion&lt;/a&gt; and &lt;a href="https://www.python.org/dev/peps/pep-0523/"&gt;PEP 523: Adding a frame evaluation API to CPython&lt;/a&gt; when I realized I could use the new frame evaluation API to do runtime type introspection!&lt;/p&gt;
&lt;h2&gt;So what is this frame evaluation API?&lt;/h2&gt;
&lt;p&gt;The frame evaluation API was introduced mainly for JIT (just-in-time) compilers and debuggers (PyCharm uses it). However, I wanted to use it to analyze the types in frames. You may be asking yourself: what is a frame? A frame is a data structure that Python uses to describe a scope and hold information about that scope. The simplest frame to understand is the one created for a function call:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;def&lt;/span&gt; &lt;span style="color: #d2a8ff; font-weight: bold"&gt;test&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;():&lt;/span&gt;
    &lt;span style="color: #ff7b72; font-weight: bold"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Modules, functions, generators, and comprehensions have their own scope, so they get their own frame. So when I call &lt;code&gt;os.path.abspath(path)&lt;/code&gt;, I am creating a new frame. The frame data is represented by a C struct in CPython (if you aren't familiar with C, just think of it like a Python class with some attributes). It contains information about the function called, such as its name (&lt;code&gt;abspath&lt;/code&gt; in our example), its file path, and most importantly its locals. Locals is a symbol table (a mapping of names to values) for the scope of the frame. For example, in &lt;code&gt;abspath&lt;/code&gt;, the &lt;code&gt;path&lt;/code&gt; argument is a local. Consider the following:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;def&lt;/span&gt; &lt;span style="color: #d2a8ff; font-weight: bold"&gt;hello&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(name):&lt;/span&gt;
    &lt;span style="color: #e6edf3"&gt;msg&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #a5d6ff"&gt;&amp;quot;Hello %s!&amp;quot;&lt;/span&gt;
    &lt;span style="color: #e6edf3"&gt;print(msg&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;%&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;name)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;msg&lt;/code&gt; are locals in the &lt;code&gt;hello&lt;/code&gt; frame.&lt;/p&gt;
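&lt;p&gt;To make this concrete, here is a small sketch that peeks at the frame of a running function from plain Python. It uses &lt;code&gt;sys._getframe&lt;/code&gt;, which is a CPython implementation detail and has nothing to do with typycal itself; the &lt;code&gt;describe_frame&lt;/code&gt; helper is a hypothetical name I made up for illustration.&lt;/p&gt;

```python
import sys


def describe_frame(frame):
    """Return the name, file name, and local names/types of a frame."""
    code = frame.f_code
    # Record only the type names so the result does not hold on to the values.
    local_types = {
        key: type(value).__name__ for key, value in frame.f_locals.items()
    }
    return code.co_name, code.co_filename, local_types


def hello(name):
    msg = "Hello %s!"
    # sys._getframe(0) is the frame of the function currently executing.
    return describe_frame(sys._getframe(0))


func_name, file_name, local_types = hello("world")
print(func_name)    # hello
print(local_types)  # {'name': 'str', 'msg': 'str'}
```

&lt;p&gt;The argument &lt;code&gt;name&lt;/code&gt; shows up next to the ordinary local &lt;code&gt;msg&lt;/code&gt;, which is what makes frames a convenient place to read argument types from.&lt;/p&gt;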
&lt;p&gt;The frame evaluation API allows us to inspect the values of &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;msg&lt;/code&gt;. The important property of locals is that a function's arguments are among them. We can get the type of these objects and log the argument types of the frame. Then we can execute the frame (run the Python code) and capture the return value (and its type) as well. Thus the full signature of the function is available for each call. Here is roughly how the code works, as a Python equivalent of the C code in typycal:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;def&lt;/span&gt; &lt;span style="color: #d2a8ff; font-weight: bold"&gt;typycal_evalframe&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;(frame:&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;frameobject,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;exc:&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;int):&lt;/span&gt;
&lt;span style="color: #6e7681"&gt;    &lt;/span&gt;&lt;span style="color: #a5d6ff"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span style="color: #a5d6ff"&gt;    frame is the current frame to be executed&lt;/span&gt;
&lt;span style="color: #a5d6ff"&gt;    exc indicates whether an exception has been thrown calling the frame.&lt;/span&gt;
&lt;span style="color: #a5d6ff"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span style="color: #ff7b72"&gt;if&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;exc:&lt;/span&gt;
        &lt;span style="color: #ff7b72"&gt;return&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;_PyEval_EvalFrameDefault(frame,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;exc)&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# execute the frame as normal, this function is part of Python&amp;#39;s private C API&lt;/span&gt;
    &lt;span style="color: #ff7b72"&gt;else&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;
        &lt;span style="color: #e6edf3"&gt;code&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;frame&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;f_code&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# this is the bytecode of the frame (stuff to be executed), and some other useful info&lt;/span&gt;
        &lt;span style="color: #e6edf3"&gt;name&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;code&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;co_name&lt;/span&gt;
        &lt;span style="color: #e6edf3"&gt;file_name&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;code&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;co_filename&lt;/span&gt;
        &lt;span style="color: #ff7b72"&gt;if&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;whitelisted(file_name,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;name):&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# ignore stdlib and generators/comprehensions&lt;/span&gt;
            &lt;span style="color: #e6edf3"&gt;locals&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;frame&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;f_locals&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# a dict of locals. names are keys, objects are values&lt;/span&gt;
            &lt;span style="color: #e6edf3"&gt;argc&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;code&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;co_argcount&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# number of arguments passed to the function&lt;/span&gt;
            &lt;span style="color: #e6edf3"&gt;ret&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;_PyEval_EvalFrameDefault(frame,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;exc)&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# run the frame, store the return value for analysis&lt;/span&gt;
            &lt;span style="color: #e6edf3"&gt;serialize_types(file_name,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;name,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;locals,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;argc,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;ret)&lt;/span&gt;
            &lt;span style="color: #ff7b72"&gt;return&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;ret&lt;/span&gt;
        &lt;span style="color: #ff7b72"&gt;else&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;:&lt;/span&gt;
            &lt;span style="color: #ff7b72"&gt;return&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;_PyEval_EvalFrameDefault(frame,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;exc)&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# don&amp;#39;t want to analyze this frame, so execute it as normal&lt;/span&gt;

&lt;span style="color: #ff7b72"&gt;def&lt;/span&gt; &lt;span style="color: #d2a8ff; font-weight: bold"&gt;hook&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;():&lt;/span&gt;
    &lt;span style="color: #e6edf3"&gt;thread_state&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;thread_state_get()&lt;/span&gt; &lt;span style="color: #8b949e; font-style: italic"&gt;# Needed to tell Python to run our frame evaluation function, instead of the default&lt;/span&gt;
    &lt;span style="color: #e6edf3"&gt;thread_state&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;interp&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;eval_frame&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;typycal_evalframe&lt;/span&gt;  &lt;span style="color: #8b949e; font-style: italic"&gt;# assign our function to be called when a frame needs to be evaluated&lt;/span&gt;

&lt;span style="color: #ff7b72"&gt;def&lt;/span&gt; &lt;span style="color: #d2a8ff; font-weight: bold"&gt;unhook&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;():&lt;/span&gt;
    &lt;span style="color: #e6edf3"&gt;thread_state&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;thread_state_get()&lt;/span&gt;
    &lt;span style="color: #e6edf3"&gt;thread_state&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;interp&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;eval_frame&lt;/span&gt; &lt;span style="color: #ff7b72; font-weight: bold"&gt;=&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;_PyEval_EvalFrameDefault(frame,&lt;/span&gt; &lt;span style="color: #e6edf3"&gt;exc)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
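&lt;p&gt;The real hook can only be installed from native code, so the pseudocode above is not runnable as written. If you want to experiment with the same idea from pure Python, a rough analogue is &lt;code&gt;sys.setprofile&lt;/code&gt;, which reports call and return events for every Python function. This is a different mechanism from the PEP 523 hook and is not how typycal works; &lt;code&gt;record_types&lt;/code&gt; and &lt;code&gt;observed&lt;/code&gt; are hypothetical names for this sketch.&lt;/p&gt;

```python
import sys

observed = []  # one (function name, argument types, return type) entry per call


def record_types(frame, event, arg):
    """Profile callback: log argument and return types of Python calls."""
    # On a "return" event, arg is the value being returned
    # (None is also passed when the frame exits because of an exception).
    if event != "return":
        return
    code = frame.f_code
    # The first co_argcount names in co_varnames are the positional parameters.
    arg_names = code.co_varnames[:code.co_argcount]
    # Caveat: the parameters are read at return time, so a function that
    # rebinds a parameter is reported with the type of the new value.
    frame_locals = frame.f_locals
    arg_types = {
        name: type(frame_locals[name]).__name__
        for name in arg_names
        if name in frame_locals
    }
    observed.append((code.co_name, arg_types, type(arg).__name__))


def add(a, b):
    return a + b


sys.setprofile(record_types)  # hook
try:
    add(1, 2)
finally:
    sys.setprofile(None)      # unhook

print(observed)
```

&lt;p&gt;Unlike the frame evaluation hook, this callback runs as Python code for every event, so I would expect it to be considerably slower on a large program.&lt;/p&gt;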

&lt;p&gt;The &lt;code&gt;serialize_types&lt;/code&gt; function just takes these Python objects and generates PEP 484-compliant type data, which is written to a file. You can call the hook from your code:&lt;/p&gt;
&lt;div class="codehilite" style="background: #0d1117"&gt;&lt;pre style="line-height: 125%;"&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style="color: #ff7b72"&gt;import&lt;/span&gt; &lt;span style="color: #ff7b72"&gt;typycal&lt;/span&gt;
&lt;span style="color: #e6edf3"&gt;typycal&lt;/span&gt;&lt;span style="color: #ff7b72; font-weight: bold"&gt;.&lt;/span&gt;&lt;span style="color: #e6edf3"&gt;hook()&lt;/span&gt;
&lt;span style="color: #ff7b72; font-weight: bold"&gt;...&lt;/span&gt; &lt;span style="color: #8b949e; font-style: italic"&gt;# your code here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The hook will be put in place and typycal will be able to introspect frames, no decorators needed! We now have a means to serialize the types seen in the current frame! But how do we get from there to *.pyi files?&lt;/p&gt;
&lt;h2&gt;Building *.pyi files&lt;/h2&gt;
&lt;p&gt;With this information logged, one can then run a Python program (part of typycal) to analyze the signatures and generate stub files. The hope is that companies and individuals will be able to run typycal on their source via unit tests or other uses, and come out with stubs that are a basis for typing their entire code base. Currently typycal is implemented in C++, both to interop with CPython well and to be fast. It is rather unoptimized, and causes the execution of 12 million frames to slow down by roughly 5x, which is obviously not ideal. Most of the slowness comes from checking whether a frame should be analyzed. To avoid burdening I/O and to keep the serialized data as small as possible, we want to exclude the standard library, which can add millions of frames to a medium-sized code base. I have a few optimizations in mind, including reducing the time spent on I/O.&lt;/p&gt;
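&lt;p&gt;As a rough illustration of the analysis step, here is a minimal sketch that merges logged signatures into stub source text. The record format (a tuple of function name, argument types, and return type) and the choice to merge differing observations into a &lt;code&gt;Union&lt;/code&gt; are assumptions made for this example, not a description of typycal's actual log format or output.&lt;/p&gt;

```python
from collections import defaultdict


def merge(type_names):
    """Collapse the observed type names into a single annotation string."""
    unique = sorted(set(type_names))
    if len(unique) == 1:
        return unique[0]
    return "Union[" + ", ".join(unique) + "]"


def build_stub(records):
    """Turn (name, arg_types, return_type) records into *.pyi source text."""
    args = defaultdict(lambda: defaultdict(list))  # function -> parameter -> types
    returns = defaultdict(list)                    # function -> return types
    for name, arg_types, return_type in records:
        for param, type_name in arg_types.items():
            args[name][param].append(type_name)
        returns[name].append(return_type)
    lines = []
    for name in returns:
        params = ", ".join(f"{p}: {merge(t)}" for p, t in args[name].items())
        lines.append(f"def {name}({params}) -> {merge(returns[name])}: ...")
    body = "\n".join(lines)
    # Only import Union when some signature actually needs it.
    if "Union[" in body:
        body = "from typing import Union\n\n" + body
    return body


records = [
    ("hello", {"name": "str"}, "None"),
    ("double", {"x": "int"}, "int"),
    ("double", {"x": "str"}, "str"),
]
print(build_stub(records))
# from typing import Union
#
# def hello(name: str) -> None: ...
# def double(x: Union[int, str]) -> Union[int, str]: ...
```

&lt;p&gt;Merging every observation into one signature loses the link between argument and return types (here, that &lt;code&gt;double&lt;/code&gt; returns the type it was given), so a real tool might prefer overloads or a type variable in such cases.&lt;/p&gt;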
&lt;p&gt;Since getting type information for all types gets very complex, typycal for now handles &lt;code&gt;int&lt;/code&gt;, &lt;code&gt;str&lt;/code&gt;, &lt;code&gt;tuple&lt;/code&gt;, &lt;code&gt;list&lt;/code&gt;, &lt;code&gt;callable&lt;/code&gt;s, and &lt;code&gt;None&lt;/code&gt;. &lt;code&gt;dict&lt;/code&gt; should be relatively straightforward too. All other types will likely be &lt;code&gt;Any&lt;/code&gt;'d since they may not be safe to put in a stub. This will likely change with time.&lt;/p&gt;
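&lt;p&gt;A mapping along these lines could be sketched in Python as follows. The &lt;code&gt;annotation_for&lt;/code&gt; function and the exact strings it produces (for example &lt;code&gt;List[Any]&lt;/code&gt; for mixed lists) are assumptions for illustration, not the rules typycal actually applies.&lt;/p&gt;

```python
def annotation_for(value):
    """Map a runtime value to a PEP 484 annotation string."""
    if value is None:
        return "None"
    simple = {int: "int", str: "str"}
    # Exact type match only, so subclasses such as bool fall through to Any.
    if type(value) in simple:
        return simple[type(value)]
    if type(value) is list:
        inner = sorted({annotation_for(item) for item in value})
        if len(inner) == 1:
            return "List[" + inner[0] + "]"
        return "List[Any]"  # empty or mixed lists: element type unknown
    if type(value) is tuple:
        if not value:
            return "Tuple[()]"
        return "Tuple[" + ", ".join(annotation_for(item) for item in value) + "]"
    if callable(value):
        return "Callable[..., Any]"
    return "Any"  # everything else is not (yet) considered safe to put in a stub


print(annotation_for([1, 2, 3]))   # List[int]
print(annotation_for((1, "a")))    # Tuple[int, str]
print(annotation_for(1.5))         # Any
```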
&lt;h2&gt;The future&lt;/h2&gt;
&lt;p&gt;I plan on spending the rest of the summer improving typycal until it works well enough that I can release it publicly. If you are interested in getting a look at the library to play with, feel free to contact me, and I will consider sharing it. I will also be discussing this project and static typing in general at PyBay on August 11, and would be happy to answer questions in person.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Emma Smith</dc:creator><pubDate>Wed, 26 Jul 2017 00:00:00 -0700</pubDate><guid>tag:emmatyping.dev,2017-07-26:/typycal-generate-type-stubs-from-runtime-type-information.html</guid><category>misc</category><category>typing</category><category>python</category></item></channel></rss>