Skip to content

pgpool ≥ 4.3.2 synthesizes 25P02 on driver health-check probes inside aborted savepoints, breaking ORMs that need to call ROLLBACK TO SAVEPOINT #161

@raivil

Description

@raivil

Reporting a behavior change introduced in 4.3.2 by commit eea522eb ("Fix not to abort session while in failed transaction") that traps standard ORM/driver connection-liveness probes and breaks application-level savepoint recovery. I'd appreciate guidance on whether this is intended, a documented limitation, or something that could be relaxed.

Environment

  • pgpool-II 4.7.x (also reproduces on 4.4.3; not present on 4.2.8)
  • Backend: PostgreSQL 18 (also reproduces against 14, 15, 16, 17)
  • Clustering mode: streaming_replication
  • Backend driver: libpq via the Ruby pg gem 1.5.x (same class of issue affects any client that issues a no-op connection-liveness probe)
  • Application: Rails 8.0.5 ActiveRecord. AR's PostgreSQLAdapter#active? issues query(";") as its liveness check.

Symptom

Inside an outer transaction, when an INSERT raises a unique-constraint violation inside a SAVEPOINT, AR's transaction manager runs connection.active? before attempting ROLLBACK TO SAVEPOINT. active? sends ";" to pgpool. On 4.2.8 pgpool forwards ";", PG returns EmptyQueryResponse (no error, even in aborted-txn state), active? returns true, AR proceeds to issue ROLLBACK TO SAVEPOINT, and everything works.

On 4.3.2+ pgpool's check_transaction_state_and_abort() (src/protocol/pool_proto_modules.c:413) sees the backend in 'E' state and the parsed Query is not a COMMIT/ROLLBACK, so pgpool synthesizes:

ERROR:  25P02: current transaction is aborted, commands ignored until end of transaction block

active? interprets the PG::Error as "connection dead" and returns false. AR's SavepointTransaction#rollback is gated on active? and skips emitting ROLLBACK TO SAVEPOINT. The next real query (a follow-up SELECT in the outer transaction) then receives a real 25P02 from PG because the savepoint was never rolled back. From the application's POV, a routine "insert-or-find" pattern raises PG::InFailedSqlTransaction instead of recovering, and the entire request fails.

Why I think this is at least worth discussing as a bug

PostgreSQL's wire protocol explicitly tolerates empty/whitespace-only queries in aborted transactionsEmptyQueryResponse, no error, even while 'E'. pgpool's synthesis path elevates this to a hard 25P02, which is a user-visible divergence from the upstream protocol. Any client driver using a ;-only or empty-string ping for liveness — and there are several common ones — sees the connection as broken while it's actually fine. The original commit message frames this as a session-abort fix, which it is for the relcache case, but the implementation generalized further than the bug it was fixing required.

Minimal repro

A bare libpq client demonstrates the divergence without any ORM in the picture:

# Against pgpool 4.2.8 — no error on the ";" probe, ROLLBACK TO SAVEPOINT works
# Against pgpool 4.3.2+ — ";" returns 25P02, would break a real ORM mid-recovery
PGPASSWORD=postgres psql -h <pgpool-host> -p <pgpool-port> -U postgres -d postgres <<'SQL'
\set VERBOSITY verbose
CREATE TEMP TABLE t (id int UNIQUE);
INSERT INTO t VALUES (1);
BEGIN;
SAVEPOINT sp1;
INSERT INTO t VALUES (1);   -- 23505 from PG
;                            -- empty/no-op probe (what driver active? sends)
ROLLBACK TO SAVEPOINT sp1;
SELECT 'recovered' AS status;
ROLLBACK;
SQL

On 4.2.8: the ; line is silently absorbed, ROLLBACK TO SAVEPOINT sp1 succeeds, the final SELECT returns recovered.

On 4.3.2+: the ; line gets ERROR: 25P02 synthesized by pgpool. ROLLBACK TO SAVEPOINT then works (since b7b5bae8 added it to the exempt list in 4.3.4), but a driver that branched on the ; probe's error has already given up by then.

Possible fixes (in increasing order of intrusiveness)

  1. Forward empty/whitespace-only/comment-only queries even in 'E' state. Match PG's EmptyQueryResponse semantics. This is the smallest behavior change and aligns pgpool with PG's wire protocol contract. The relcache-lookup case the original commit fixed isn't triggered by empty queries.
  2. Add an exempt-list entry for "parsed to empty statement list" alongside COMMIT/ROLLBACK/ROLLBACK TO. Same effect as (1), implemented near is_commit_or_rollback_query.
  3. Introduce a forward_empty_query_in_failed_transaction = on/off GUC (or a broader aborted_transaction_filter = strict|lenient), defaulting to current behavior for backward compatibility. Lets sites opt into PG-faithful behavior when they have drivers that rely on it.

Option 1 strikes me as the most defensible — it restores PG protocol fidelity for a specific message shape — but I'd defer to your judgment on which is appropriate.

What I'm working around in the meantime

For now we've changed the application call site to use save! + outer rescue so that AR's transaction-unwind path (which doesn't call active?) issues the ROLLBACK TO SAVEPOINT directly. That works, but it's a workaround in every application code path that uses an ORM with create-style fallback logic, against every model with a unique constraint, in any environment running pgpool 4.3.2+. That's a wide blast radius for what is, from the application's POV, a behavior change in the proxy.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions