User Guide¶
Parsing¶
Parse a SQL query into a protobuf AST:
import postgast
tree = postgast.parse("SELECT id, name FROM users WHERE active = true")
# tree.stmts[0] contains the first (and only) statement
The returned ParseResult is a protobuf message. Each statement in
tree.stmts is a RawStmt wrapping a Node oneof.
Deparsing¶
Convert a parse tree back into SQL text:
sql = postgast.deparse(tree)
# "SELECT id, name FROM users WHERE active = true"
The deparsed SQL is canonicalized by libpg_query and may differ from the original in whitespace, casing, or parenthesization while remaining semantically equivalent.
Normalization¶
Replace literal constants with positional placeholders:
postgast.normalize("SELECT * FROM users WHERE id = 42 AND name = 'alice'")
# => "SELECT * FROM users WHERE id = $1 AND name = $2"
This is useful for grouping structurally equivalent queries.
Fingerprinting¶
Compute a structural hash that identifies equivalent queries regardless of literal values:
fp = postgast.fingerprint("SELECT * FROM users WHERE id = 42")
fp.fingerprint # uint64 hash
fp.hex # hex string representation
Splitting¶
Split a multi-statement SQL string into individual statements:
postgast.split("SELECT 1; SELECT 2;")
# => ["SELECT 1", "SELECT 2"]
Two methods are available via the method parameter:
"parser"(default) — uses the full PostgreSQL parser for accuracy"scanner"— faster, tolerates invalid SQL
postgast.split("SELECT 1; SELECT 2;", method="scanner")
Scanning¶
Tokenize a SQL string:
result = postgast.scan("SELECT 1")
for token in result.tokens:
print(token.token, token.keyword_kind, token.start, token.end)
Tree Walking¶
Walk all nodes in a parse tree with depth-first traversal:
for field_name, node in postgast.walk(tree):
print(type(node).__name__, field_name)
Or use the visitor pattern:
class TableCollector(postgast.Visitor):
def __init__(self):
self.tables = []
def visit_RangeVar(self, node):
self.tables.append(node.relname)
collector = TableCollector()
collector.visit(tree)
print(collector.tables)
AST Helpers¶
Extract common information from parse trees:
tree = postgast.parse("SELECT u.id, u.name FROM users u JOIN orders o ON u.id = o.user_id")
postgast.extract_tables(tree) # ["users", "orders"]
postgast.extract_columns(tree) # ["u.id", "u.name", "u.id", "o.user_id"]
postgast.extract_functions(tree) # []
Find specific node types:
from postgast.pg_query_pb2 import RangeVar
for node in postgast.find_nodes(tree, RangeVar):
print(node.relname, node.alias.aliasname if node.alias.aliasname else "")
DDL Helpers¶
Generate DROP statements from CREATE DDL:
postgast.to_drop("CREATE FUNCTION add(a int, b int) RETURNS int AS $$ SELECT a + b $$ LANGUAGE sql")
# => "DROP FUNCTION add(int, int)"
Rewrite CREATE to CREATE OR REPLACE:
postgast.ensure_or_replace("CREATE FUNCTION add(a int, b int) RETURNS int AS $$ SELECT a + b $$ LANGUAGE sql")
# => "CREATE OR REPLACE FUNCTION ..."
Error Handling¶
All functions raise PgQueryError on invalid SQL:
try:
postgast.parse("SELECT FROM")
except postgast.PgQueryError as e:
print(e.message) # Error message
print(e.cursorpos) # Position in the SQL string