Architecture¶
This page covers the main technical decisions behind postgast and why we
made them.
Why ctypes (not Cython, Rust, or C extensions)¶
postgast binds to libpg_query using Python’s built-in ctypes
module rather than Cython, PyO3/Rust, or a hand-written CPython C extension.
This was a deliberate choice. Here’s the reasoning.
Pure-Python packaging¶
With ctypes the only compiled artifact is the vendored libpg_query
shared library itself. Everything above it (struct definitions, function
signatures, error handling, protobuf deserialization) is plain Python. This
means:
- No compiler toolchain at install time. Users never need Cython, a Rust
  toolchain, or a C compiler to install postgast. pip install postgast
  delivers a pre-built wheel containing the shared library and pure-Python
  code.
- Simpler CI matrix. Wheels are built by compiling a single C library
  (libpg_query) per platform. There’s no second compilation step for a
  Python extension module, which removes an entire class of
  ABI-compatibility issues (limited API, stable ABI, per-interpreter
  builds, etc.).
- Easier debugging. Every line between the C boundary and the public API is
  inspectable Python. You can step through native.py with a normal
  debugger, no mixed C/Python stack frames to deal with.
Minimal dependency footprint¶
The only runtime dependency is protobuf. There’s no build-time dependency
on Cython or setuptools-rust, no transitive dependency on cffi, and
no compiled glue code. Fewer moving parts means fewer ways the install can
break.
BSD licensing¶
pglast, the most established libpg_query wrapper for Python, is
licensed under GPLv3. That makes it unusable in many commercial and
permissively licensed projects. By keeping the binding layer to ctypes
(stdlib) plus protobuf (BSD-compatible), postgast can ship under the
BSD 2-Clause license with no copyleft obligations.
Trade-offs¶
ctypes isn’t free of downsides:
- No compile-time type checking at the C boundary. If the libpg_query
  struct layout changes between versions, the ctypes bindings break
  silently at runtime rather than failing to compile. We mitigate this by
  pinning to a specific libpg_query version and testing across platforms
  in CI.
- Per-call overhead. Each ctypes call has slightly more overhead than a
  direct C extension call. In practice this is negligible because the real
  work happens inside libpg_query (parsing a full PostgreSQL grammar) and
  the ctypes marshalling cost is dwarfed by the parser itself.
- Manual struct definitions. The ctypes Structure classes in native.py
  must mirror the C structs exactly. It’s a small amount of code
  (~200 lines) maintained by hand. A Cython .pxd or Rust bindgen would
  generate these, but at the cost of the toolchain complexity described
  above.
On balance, the simplicity, portability, and licensing benefits outweigh the minor ergonomic costs.
How the binding layer works¶
All C interop lives in a single internal module, native.py. It does three
things:
1. Loads the shared library. Checks for a vendored copy bundled in the
   wheel first, then falls back to ctypes.util.find_library for
   system-installed libraries.
2. Defines ctypes Structure classes that mirror every libpg_query result
   type (PgQueryParseResult, PgQueryNormalizeResult, etc.).
3. Declares function signatures (argtypes/restype) for each public C
   function so that calls are type-checked at the Python level.
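The three steps above might look roughly like the sketch below. This is not postgast’s actual native.py: the helper names (find_library_path, declare_signatures, load), the candidate file names, and the vendored_dir argument are assumptions made for illustration. The struct fields are intended to follow libpg_query’s public pg_query.h header and should be checked against the pinned version. Only two functions are declared here.

```python
import ctypes
import ctypes.util
from pathlib import Path
from typing import Optional


class PgQueryError(ctypes.Structure):
    # Intended to mirror the PgQueryError struct in pg_query.h.
    _fields_ = [
        ("message", ctypes.c_char_p),
        ("funcname", ctypes.c_char_p),
        ("filename", ctypes.c_char_p),
        ("lineno", ctypes.c_int),
        ("cursorpos", ctypes.c_int),
        ("context", ctypes.c_char_p),
    ]


class PgQueryProtobuf(ctypes.Structure):
    _fields_ = [
        ("len", ctypes.c_size_t),
        # Untyped pointer: the payload is binary, not a NUL-terminated string.
        ("data", ctypes.c_void_p),
    ]


class PgQueryProtobufParseResult(ctypes.Structure):
    _fields_ = [
        ("parse_tree", PgQueryProtobuf),
        ("stderr_buffer", ctypes.c_char_p),
        ("error", ctypes.POINTER(PgQueryError)),
    ]


def find_library_path(vendored_dir: Path) -> Optional[str]:
    """Prefer a vendored copy, then fall back to a system-wide install."""
    # Hypothetical file names; the real wheel layout may differ.
    for name in ("libpg_query.so", "libpg_query.dylib", "pg_query.dll"):
        candidate = vendored_dir / name
        if candidate.exists():
            return str(candidate)
    return ctypes.util.find_library("pg_query")


def declare_signatures(lib) -> None:
    """Attach argtypes/restype so ctypes checks and converts arguments."""
    lib.pg_query_parse_protobuf.argtypes = [ctypes.c_char_p]
    lib.pg_query_parse_protobuf.restype = PgQueryProtobufParseResult
    lib.pg_query_free_protobuf_parse_result.argtypes = [PgQueryProtobufParseResult]
    lib.pg_query_free_protobuf_parse_result.restype = None


def load(vendored_dir: Path) -> ctypes.CDLL:
    """Load the shared library and declare the signatures used above."""
    path = find_library_path(vendored_dir)
    if path is None:
        raise OSError("libpg_query shared library not found")
    lib = ctypes.CDLL(path)
    declare_signatures(lib)
    return lib
```

Declaring restype matters most here: without it ctypes assumes a C int return value, so a function that returns a struct by value would be misread.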
Higher-level modules (parse.py, deparse.py, normalize.py, …)
import native.lib and follow a consistent pattern:
# Pseudocode (actual code is in each module)
result = native.lib.pg_query_parse_protobuf(sql.encode())
try:
    check_error(result)        # raise PgQueryError if result.error is set
    payload = extract(result)  # read return value
finally:
    native.lib.pg_query_free_protobuf_parse_result(result)
The finally block ensures the C-allocated memory is always freed, even
when an error is raised.
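To see that guarantee in isolation, here is a small sketch that swaps the real library for an in-memory stand-in. FakeLib, FakeResult, and call_with_cleanup are invented for illustration and are not part of postgast. The PgQueryError class here is a plain Python exception standing in for the real one. The stand-in counts calls to free so both the success path and the error path can be checked.

```python
from dataclasses import dataclass
from typing import Optional


class PgQueryError(Exception):
    """Stand-in for the exception raised when the library reports an error."""


@dataclass
class FakeResult:
    payload: bytes = b""
    error: Optional[str] = None


class FakeLib:
    """Hypothetical replacement for native.lib that records free() calls."""

    def __init__(self) -> None:
        self.freed = 0

    def parse(self, sql: bytes) -> FakeResult:
        # Report an error for blank input, otherwise echo a fake payload.
        if not sql.strip():
            return FakeResult(error="empty query")
        return FakeResult(payload=sql.upper())

    def free(self, result: FakeResult) -> None:
        self.freed += 1


def call_with_cleanup(lib: FakeLib, sql: str) -> bytes:
    result = lib.parse(sql.encode())
    try:
        if result.error is not None:
            raise PgQueryError(result.error)
        # The return value is evaluated before the finally block runs.
        return result.payload
    finally:
        lib.free(result)  # runs on both the success and the error path
```

With the real library, the payload has to be copied into a Python object inside the try block, because the C memory it came from is gone once the free function has run.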
Protobuf deserialization¶
libpg_query returns parse trees as serialized Protocol Buffer messages.
postgast deserializes them using the official protobuf library into
generated Python message classes (pg_query_pb2). This avoids writing a
custom deserializer and tracks the upstream .proto schema exactly.
Binary payloads are read with ctypes.string_at(data, length) rather than
c_char_p because protobuf data can contain embedded null bytes that
c_char_p would silently truncate.
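The difference is easy to reproduce without libpg_query. The sketch below uses an arbitrary byte string with an embedded null byte as a stand-in for a serialized protobuf message. The function name read_both_ways is made up for this example.

```python
import ctypes


def read_both_ways(payload: bytes):
    """Read the same memory as a C string and as a sized binary buffer."""
    # create_string_buffer copies the bytes and appends a terminating NUL.
    buf = ctypes.create_string_buffer(payload)
    addr = ctypes.addressof(buf)

    # c_char_p treats the memory as a C string and stops at the first NUL.
    as_c_string = ctypes.c_char_p(addr).value

    # string_at copies exactly the requested number of bytes.
    as_sized_read = ctypes.string_at(addr, len(payload))
    return as_c_string, as_sized_read


payload = b"\x08\x01\x00\x12\x03abc"  # 8 bytes, NUL at index 2
truncated, full = read_both_ways(payload)
print(len(truncated), len(full))  # 2 8
```

The truncated read raises no error, so a decoder downstream would simply receive a shorter, likely invalid message.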
Alternatives considered¶
Cython¶
Cython would give compile-time type safety at the C boundary and marginally lower call overhead. But it requires a C compiler at wheel-build time to build the extension module itself, and it introduces a Cython build dependency. For a thin binding layer (~200 lines of struct definitions and function signatures), the added build complexity isn’t justified.
Rust (PyO3 / maturin)¶
A Rust extension via PyO3 would provide memory safety guarantees and strong
typing. But libpg_query is a C library, so the Rust layer would still
call C via FFI. Adding Rust introduces a second toolchain (cargo), a
maturin build backend, and complicates cross-compilation. The binding
layer is too thin to benefit from Rust’s strengths.
CFFI¶
CFFI is a popular alternative to ctypes that offers an ABI mode (similar
to ctypes) and an API mode (generates a C extension). ABI mode provides
no real advantage over ctypes for this use case, and API mode
reintroduces the C compiler requirement. Staying with ctypes avoids
adding cffi as a dependency.
Hand-written CPython C extension¶
A C extension would be the fastest option, but it ties the code to CPython internals, requires careful reference counting, and complicates building wheels for multiple Python versions. The performance difference is immaterial for this library’s workload.