Reporting a behavior change introduced in 4.3.2 by commit eea522eb ("Fix not to abort session while in failed transaction") that traps standard ORM/driver connection-liveness probes and breaks application-level savepoint recovery. I'd appreciate guidance on whether this is intended, a documented limitation, or something that could be relaxed.
Environment
- pgpool-II 4.7.x (also reproduces on 4.4.3; not present on 4.2.8)
- Backend: PostgreSQL 18 (also reproduces against 14, 15, 16, 17)
- Clustering mode:
streaming_replication
- Backend driver: libpq via the Ruby
pg gem 1.5.x (same class of issue affects any client that issues a no-op connection-liveness probe)
- Application: Rails 8.0.5 ActiveRecord. AR's
PostgreSQLAdapter#active? issues query(";") as its liveness check.
Symptom
Inside an outer transaction, when an INSERT raises a unique-constraint violation inside a SAVEPOINT, AR's transaction manager runs connection.active? before attempting ROLLBACK TO SAVEPOINT. active? sends ";" to pgpool. On 4.2.8 pgpool forwards ";", PG returns EmptyQueryResponse (no error, even in aborted-txn state), active? returns true, AR proceeds to issue ROLLBACK TO SAVEPOINT, and everything works.
On 4.3.2+ pgpool's check_transaction_state_and_abort() (src/protocol/pool_proto_modules.c:413) sees the backend in 'E' state and the parsed Query is not a COMMIT/ROLLBACK, so pgpool synthesizes:
ERROR: 25P02: current transaction is aborted, commands ignored until end of transaction block
active? interprets the PG::Error as "connection dead" and returns false. AR's SavepointTransaction#rollback is gated on active? and skips emitting ROLLBACK TO SAVEPOINT. The next real query (a follow-up SELECT in the outer transaction) then receives a real 25P02 from PG because the savepoint was never rolled back. From the application's POV, a routine "insert-or-find" pattern raises PG::InFailedSqlTransaction instead of recovering, and the entire request fails.
Why I think this is at least worth discussing as a bug
PostgreSQL's wire protocol explicitly tolerates empty/whitespace-only queries in aborted transactions — EmptyQueryResponse, no error, even while 'E'. pgpool's synthesis path elevates this to a hard 25P02, which is a user-visible divergence from the upstream protocol. Any client driver using a ;-only or empty-string ping for liveness — and there are several common ones — sees the connection as broken while it's actually fine. The original commit message frames this as a session-abort fix, which it is for the relcache case, but the implementation generalized further than the bug it was fixing required.
Minimal repro
A bare libpq client demonstrates the divergence without any ORM in the picture:
# Against pgpool 4.2.8 — no error on the ";" probe, ROLLBACK TO SAVEPOINT works
# Against pgpool 4.3.2+ — ";" returns 25P02, would break a real ORM mid-recovery
PGPASSWORD=postgres psql -h <pgpool-host> -p <pgpool-port> -U postgres -d postgres <<'SQL'
\set VERBOSITY verbose
CREATE TEMP TABLE t (id int UNIQUE);
INSERT INTO t VALUES (1);
BEGIN;
SAVEPOINT sp1;
INSERT INTO t VALUES (1); -- 23505 from PG
; -- empty/no-op probe (what driver active? sends)
ROLLBACK TO SAVEPOINT sp1;
SELECT 'recovered' AS status;
ROLLBACK;
SQL
On 4.2.8: the ; line is silently absorbed, ROLLBACK TO SAVEPOINT sp1 succeeds, the final SELECT returns recovered.
On 4.3.2+: the ; line gets ERROR: 25P02 synthesized by pgpool. ROLLBACK TO SAVEPOINT then works (since b7b5bae8 added it to the exempt list in 4.3.4), but a driver that branched on the ; probe's error has already given up by then.
Possible fixes (in increasing order of intrusiveness)
- Forward empty/whitespace-only/comment-only queries even in
'E' state. Match PG's EmptyQueryResponse semantics. This is the smallest behavior change and aligns pgpool with PG's wire protocol contract. The relcache-lookup case the original commit fixed isn't triggered by empty queries.
- Add an exempt-list entry for "parsed to empty statement list" alongside COMMIT/ROLLBACK/ROLLBACK TO. Same effect as (1), implemented near
is_commit_or_rollback_query.
- Introduce a
forward_empty_query_in_failed_transaction = on/off GUC (or a broader aborted_transaction_filter = strict|lenient), defaulting to current behavior for backward compatibility. Lets sites opt into PG-faithful behavior when they have drivers that rely on it.
Option 1 strikes me as the most defensible — it restores PG protocol fidelity for a specific message shape — but I'd defer to your judgment on which is appropriate.
What I'm working around in the meantime
For now we've changed the application call site to use save! + outer rescue so that AR's transaction-unwind path (which doesn't call active?) issues the ROLLBACK TO SAVEPOINT directly. That works, but it's a workaround in every application code path that uses an ORM with create-style fallback logic, against every model with a unique constraint, in any environment running pgpool 4.3.2+. That's a wide blast radius for what is, from the application's POV, a behavior change in the proxy.
Reporting a behavior change introduced in 4.3.2 by commit
eea522eb("Fix not to abort session while in failed transaction") that traps standard ORM/driver connection-liveness probes and breaks application-level savepoint recovery. I'd appreciate guidance on whether this is intended, a documented limitation, or something that could be relaxed.Environment
streaming_replicationpggem 1.5.x (same class of issue affects any client that issues a no-op connection-liveness probe)PostgreSQLAdapter#active?issuesquery(";")as its liveness check.Symptom
Inside an outer transaction, when an INSERT raises a unique-constraint violation inside a SAVEPOINT, AR's transaction manager runs
connection.active?before attemptingROLLBACK TO SAVEPOINT.active?sends";"to pgpool. On 4.2.8 pgpool forwards";", PG returnsEmptyQueryResponse(no error, even in aborted-txn state),active?returnstrue, AR proceeds to issueROLLBACK TO SAVEPOINT, and everything works.On 4.3.2+ pgpool's
check_transaction_state_and_abort()(src/protocol/pool_proto_modules.c:413) sees the backend in'E'state and the parsed Query is not a COMMIT/ROLLBACK, so pgpool synthesizes:active?interprets thePG::Erroras "connection dead" and returnsfalse. AR'sSavepointTransaction#rollbackis gated onactive?and skips emittingROLLBACK TO SAVEPOINT. The next real query (a follow-up SELECT in the outer transaction) then receives a real25P02from PG because the savepoint was never rolled back. From the application's POV, a routine "insert-or-find" pattern raisesPG::InFailedSqlTransactioninstead of recovering, and the entire request fails.Why I think this is at least worth discussing as a bug
PostgreSQL's wire protocol explicitly tolerates empty/whitespace-only queries in aborted transactions —
EmptyQueryResponse, no error, even while'E'. pgpool's synthesis path elevates this to a hard 25P02, which is a user-visible divergence from the upstream protocol. Any client driver using a;-only or empty-string ping for liveness — and there are several common ones — sees the connection as broken while it's actually fine. The original commit message frames this as a session-abort fix, which it is for the relcache case, but the implementation generalized further than the bug it was fixing required.Minimal repro
A bare libpq client demonstrates the divergence without any ORM in the picture:
On 4.2.8: the
;line is silently absorbed,ROLLBACK TO SAVEPOINT sp1succeeds, the final SELECT returnsrecovered.On 4.3.2+: the
;line getsERROR: 25P02synthesized by pgpool.ROLLBACK TO SAVEPOINTthen works (sinceb7b5bae8added it to the exempt list in 4.3.4), but a driver that branched on the;probe's error has already given up by then.Possible fixes (in increasing order of intrusiveness)
'E'state. Match PG'sEmptyQueryResponsesemantics. This is the smallest behavior change and aligns pgpool with PG's wire protocol contract. The relcache-lookup case the original commit fixed isn't triggered by empty queries.is_commit_or_rollback_query.forward_empty_query_in_failed_transaction = on/offGUC (or a broaderaborted_transaction_filter = strict|lenient), defaulting to current behavior for backward compatibility. Lets sites opt into PG-faithful behavior when they have drivers that rely on it.Option 1 strikes me as the most defensible — it restores PG protocol fidelity for a specific message shape — but I'd defer to your judgment on which is appropriate.
What I'm working around in the meantime
For now we've changed the application call site to use
save!+ outer rescue so that AR's transaction-unwind path (which doesn't callactive?) issues theROLLBACK TO SAVEPOINTdirectly. That works, but it's a workaround in every application code path that uses an ORM withcreate-style fallback logic, against every model with a unique constraint, in any environment running pgpool 4.3.2+. That's a wide blast radius for what is, from the application's POV, a behavior change in the proxy.