feat: PHPStan extension inferring SQL return shapes #98

Draft
marcreichel wants to merge 5 commits into 4.0.x from phpstan-sql-return-ext

marcreichel commented May 20, 2026

Summary

Adds a custom PHPStan extension that analyses the literal SQL string passed to the connection's query methods and narrows their return type to a constant array shape derived from the SELECT list. Also ships a custom rule that flags placeholder/parameter count mismatches.

The extension is shipped inside this package and auto-registered for downstream consumers via extra.phpstan.includes (picked up by phpstan/extension-installer). Consumers of artemeon/database get the improved types for free.

What's covered

  • Connection::getPArray() → list<array{...}>
  • Connection::getPRow() → array{}|array{...}
  • Connection::getGenerator() → Generator<int, array{...}>
  • PlaceholderCountRule: errors when literal ? count ≠ literal $params array length on getPArray, getPRow, getGenerator, _pQuery
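
The placeholder check in the last bullet can be sketched in plain PHP. This is a minimal illustration, not the rule's actual implementation: `countPlaceholders()` and `checkPlaceholderCount()` are hypothetical helpers, the error wording is invented, and the scanner only skips quoted segments (it does not handle SQL comments or backslash escapes).

```php
<?php

declare(strict_types=1);

/**
 * Counts `?` placeholders in a SQL string, skipping anything inside
 * single quotes, double quotes, or backticks.
 * Hypothetical helper for illustration only.
 */
function countPlaceholders(string $sql): int
{
    $count = 0;
    $quote = null; // the quote character currently open, or null
    $length = strlen($sql);

    for ($i = 0; $i < $length; $i++) {
        $char = $sql[$i];

        if ($quote !== null) {
            if ($char === $quote) {
                if ($i + 1 < $length && $sql[$i + 1] === $quote) {
                    $i++; // doubled quote ('' or "") is an escaped quote
                } else {
                    $quote = null; // closing quote
                }
            }

            continue;
        }

        if ($char === "'" || $char === '"' || $char === '`') {
            $quote = $char;
        } elseif ($char === '?') {
            $count++;
        }
    }

    return $count;
}

/**
 * Returns an error message when the counts differ, or null when they agree.
 * The message text is an assumption, not the rule's real output.
 *
 * @param list<mixed> $params
 */
function checkPlaceholderCount(string $sql, array $params): ?string
{
    $expected = countPlaceholders($sql);
    $actual = count($params);

    if ($expected === $actual) {
        return null;
    }

    return sprintf(
        'Query has %d placeholder(s) but %d parameter(s) were passed.',
        $expected,
        $actual,
    );
}
```

In the real rule both the SQL and the `$params` array must be literals at analysis time; the sketch simply takes them as runtime values.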

Design choices

  • Schema-agnostic: the extension only infers row shape (keys) from the SQL; all values are typed mixed. No DB connection or schema dump required.
  • Parser: phpmyadmin/sql-parser (added to require so analysers downstream don't need to add it).
  • Fallbacks: SELECT * → array<string, mixed>. Non-literal SQL, INSERT/UPDATE/DELETE, UNION, CTEs, and unparseable SQL silently keep the declared return type.
  • Identifier handling: aliases preferred; otherwise the column name as written (backticks/double-quotes stripped, case preserved). JOIN column collisions: last occurrence wins.
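
The identifier rules above (alias first, quotes stripped, case preserved, last collision wins) can be sketched as follows. This is a simplified illustration that assumes the SELECT list has already been parsed into column/alias pairs with any table qualifier removed; `unquoteIdentifier()` and `deriveRowShape()` are hypothetical names, not the extension's API.

```php
<?php

declare(strict_types=1);

/**
 * Strips one pair of surrounding backticks or double quotes.
 * Case is left untouched.
 */
function unquoteIdentifier(string $identifier): string
{
    $identifier = trim($identifier);

    if (strlen($identifier) >= 2) {
        $first = $identifier[0];
        $last = $identifier[strlen($identifier) - 1];

        if (($first === '`' || $first === '"') && $first === $last) {
            return substr($identifier, 1, -1);
        }
    }

    return $identifier;
}

/**
 * Builds a key => type map for one result row.
 *
 * @param list<array{column: string, alias: ?string}> $selectList
 * @return array<string, string>
 */
function deriveRowShape(array $selectList): array
{
    $shape = [];

    foreach ($selectList as $item) {
        // Prefer the alias; otherwise use the column name as written.
        $key = unquoteIdentifier($item['alias'] ?? $item['column']);

        // Schema-agnostic: every value is typed mixed. Re-assigning an
        // existing key means the last occurrence wins on collisions.
        $shape[$key] = 'mixed';
    }

    return $shape;
}
```

For a list such as `` `id` ``, `name AS userName`, `id`, the sketch yields the keys `id` and `userName`, with the second `id` overwriting the first.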

Tests

  • tests/PHPStan/SqlReturnShapeAnalyserTest.php: Pest tests on the analyser (8 cases).
  • tests/PHPStan/PlaceholderCountRuleTest.php: PHPStan RuleTestCase.
  • tests/PHPStan/data/return-types.php: assertType() fixture verifying the inferred return types end-to-end (run via tests/PHPStan/phpstan-fixtures.neon).

Baseline

phpstan-baseline.neon was regenerated. The newly-surfaced entries are real findings where existing test code accesses keys on getPRow() results without first checking the empty-row case — left in the baseline for incremental cleanup.

🤖 Generated with Claude Code

marcreichel and others added 5 commits May 20, 2026 03:58
Adds a custom PHPStan extension that parses the literal SQL string passed
to `Connection::getPArray()`, `getPRow()`, and `getGenerator()` and narrows
the return type to a constant array shape derived from the SELECT list.
Also adds a `PlaceholderCountRule` that flags mismatches between literal
`?` placeholders and the literal `$params` array length.

The extension is shipped inside this package and auto-registered for
downstream consumers via `extra.phpstan.includes`, picked up by
phpstan/extension-installer. Schema-agnostic: column shapes are inferred,
all values typed as `mixed`. SELECT *, UNION, non-literal SQL, and
non-SELECT statements gracefully fall back to the declared return type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-applied Pint formatting (blank lines before continue/return,
@internal annotation on RuleTestCase, blank line between import groups).
Removed two stale Connection.php baseline entries that PHPStan on CI
doesn't report, which were failing the baseline-match check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously getPRow returned `array{k1?: mixed, k2?: mixed}` (each key
independently optional). After narrowing away the empty case with
`if ($row === []) return null;`, PHPStan left every key still optional,
making downstream calls that need all keys present (e.g. Token::fromRow)
fail typing.

Now returns `array{}|array{k1: mixed, k2: mixed, ...}` so the empty
check narrows cleanly to the all-required shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
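
The effect of the union shape described in the commit above can be illustrated with a small stand-alone example. `fetchUserRow()` is a hypothetical stand-in for a `getPRow()` call whose SQL selects `id` and `name`, and `describeUser()` stands in for a consumer such as `Token::fromRow` that needs every key present.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for getPRow() with a two-column SELECT list.
 *
 * @return array{}|array{id: mixed, name: mixed}
 */
function fetchUserRow(bool $found): array
{
    return $found ? ['id' => 1, 'name' => 'alice'] : [];
}

/**
 * Requires both keys to be present.
 *
 * @param array{id: mixed, name: mixed} $row
 */
function describeUser(array $row): string
{
    return implode(',', array_keys($row));
}

function loadUser(bool $found): ?string
{
    $row = fetchUserRow($found);

    if ($row === []) {
        return null;
    }

    // With the union shape, ruling out array{} leaves the all-required
    // shape, so this call matches the parameter type. With
    // array{id?: mixed, name?: mixed} both keys would still be optional.
    return describeUser($row);
}
```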
`SELECT DISTINCT(col) FROM t` is parsed as DISTINCT modifier + the
expression `(col)`, so the key fell back to the verbatim text
`(col)` instead of `col`. Strip matched outer parentheses (validated
to be balanced) and unquote the inner identifier before using it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
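
The parenthesis handling from the `DISTINCT(col)` commit above might look roughly like this. It is a sketch under assumptions: `stripOuterParentheses()` and `distinctKey()` are hypothetical names, and the scan does not account for parentheses inside string literals.

```php
<?php

declare(strict_types=1);

/**
 * Removes outer parentheses only when the first "(" is closed by the
 * final ")", so "(a) + (b)" is left untouched.
 */
function stripOuterParentheses(string $expression): string
{
    $expression = trim($expression);

    while (
        strlen($expression) >= 2
        && $expression[0] === '('
        && $expression[strlen($expression) - 1] === ')'
    ) {
        $depth = 0;
        $last = strlen($expression) - 1;

        for ($i = 0; $i <= $last; $i++) {
            if ($expression[$i] === '(') {
                $depth++;
            } elseif ($expression[$i] === ')') {
                $depth--;

                // The opening parenthesis closed before the end: the
                // outer pair does not wrap the whole expression.
                if ($depth === 0 && $i < $last) {
                    return $expression;
                }
            }
        }

        if ($depth !== 0) {
            return $expression; // unbalanced input, leave as written
        }

        $expression = trim(substr($expression, 1, -1));
    }

    return $expression;
}

/** Derives the array key for an expression such as "(col)" or "(`col`)". */
function distinctKey(string $expression): string
{
    // Strip the wrapping parentheses, then any backticks or double quotes.
    return trim(stripOuterParentheses($expression), '`"');
}
```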
Adds a regression test verifying that GROUP BY queries with aggregates
(COUNT, MAX, MIN, SUM), with and without HAVING clauses, yield the
expected key list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>