
feat(snowflake): add read-only Snowflake datasource gem #292

Merged
bexchauveto merged 22 commits into main from feat/snowflake-datasource on May 7, 2026

Conversation

Member

@bexchauveto bexchauveto commented Apr 28, 2026

Summary

Introduces forest_admin_datasource_snowflake — a native Forest Admin datasource backed by Snowflake via ODBC, with no ActiveRecord involvement.

Read-only first pass. Covers list/aggregate query translation, bulk schema introspection, connection pooling, retry-on-disconnect, foreign-key auto-discovery, primary-key resolution (incl. composite keys with a strict operator override), and optional per-session statement timeout. Wired into build.yml lint/test matrices with a unixodbc-dev apt step and into .releaserc.js for the release pipeline.

Highlights

Schema introspection

  • Each datasource instance is locked to a single Snowflake schema. Schema= from the connection string wins; otherwise the datasource snapshots CURRENT_SCHEMA() once at boot. If neither yields a value, boot raises with a clear "set Schema=" message instead of letting cross-schema collisions happen.
  • A single bulk INFORMATION_SCHEMA.COLUMNS query introspects every column in the schema in one round-trip. Combined with conn.tables, SHOW PRIMARY KEYS IN SCHEMA, and SHOW IMPORTED KEYS IN SCHEMA, total introspection cost is 4 queries regardless of table count (was ~2N+3). Observed agent boot dropped from ~63s to ~5s on a 7-table schema.
  • Primary key resolution: operator primary_keys: override → Snowflake-declared PK (composite preserved, ordered by key_sequence) → column literally named id → first column. The override accepts a String or Array<String> for composite keys; a mistyped column name raises ForestAdminDatasourceSnowflake::Error at boot rather than silently falling back (see the sketch after this list).
  • FK auto-discovery via SHOW IMPORTED KEYS IN SCHEMA runs unconditionally; discover_relations exposes each FK as a ManyToOne relation. Cross-datasource FKs (Snowflake → Postgres) still need to be wired manually with add_many_to_one_relation / add_one_to_many_relation — Snowflake has no concept of an FK to another database.
  • Introspection failures (column / PK / FK) emit a [forest_admin_datasource_snowflake] prefixed warning to stderr and cache the empty result so the same broken query isn't re-issued on every collection lookup.
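For illustration, a datasource constructed with the primary-key override might look like the sketch below. The keyword names (conn_str, primary_keys) and the String-or-Array override shape come from this PR; the connection-string values are placeholders.

```ruby
# Hedged sketch of the primary_keys: override described above; all values are placeholders.
datasource = ForestAdminDatasourceSnowflake::Datasource.new(
  conn_str: 'Driver=SnowflakeDSIIDriver;Server=myaccount.snowflakecomputing.com;' \
            'Database=ANALYTICS;Schema=PUBLIC;Warehouse=WH_FOREST;Uid=FOREST_RO;Pwd=change-me',
  primary_keys: {
    'EVENTS' => 'EVENT_UUID',                 # single-column override
    'ORDER_ITEMS' => %w[ORDER_ID PRODUCT_ID]  # composite key, order preserved
  }
)
# An override column that matches nothing raises ForestAdminDatasourceSnowflake::Error at boot.
```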

Type translation

  • Forest types are derived from the Snowflake-native DATA_TYPE returned by INFORMATION_SCHEMA.COLUMNS. The mapping covers the full canonical Snowflake set — NUMBER/DECIMAL/INT/BIGINT/SMALLINT/TINYINT/BYTEINT, FLOAT/FLOAT4/FLOAT8/DOUBLE/REAL, VARCHAR/CHAR/TEXT/STRING, BOOLEAN, DATE, TIME, DATETIME/TIMESTAMP/TIMESTAMP_NTZ/LTZ/TZ, VARIANT/OBJECT/ARRAY, BINARY/VARBINARY, GEOGRAPHY/GEOMETRY/VECTOR. Unknown types fall back to String.
  • ODBC::Date / ODBC::TimeStamp / ODBC::Time values are coerced to native Ruby Date/Time/String (otherwise they JSON-serialize as {}); see the coercion sketch after this list.
  • VARIANT / OBJECT / ARRAY columns are JSON-parsed at projection time so the UI receives structured data.
  • Session-level TIMEZONE='UTC' so all three TIMESTAMP variants serialize consistently.
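A minimal sketch of the projection-time coercions described above, assuming the ruby-odbc accessor names (year/month/day/hour/minute/second); the helper itself is illustrative, not the gem's actual code:

```ruby
require 'date'
require 'json'

# Hedged sketch: coerce ODBC temporal structs to Ruby Date/Time values and parse
# VARIANT/OBJECT/ARRAY payloads as JSON. Assumes the ruby-odbc gem is loaded so
# ODBC::Date / ODBC::TimeStamp / ODBC::Time exist.
def coerce_value(value, json_column: false)
  case value
  when ::ODBC::Date      then Date.new(value.year, value.month, value.day)
  when ::ODBC::TimeStamp then Time.utc(value.year, value.month, value.day,
                                       value.hour, value.minute, value.second)
  when ::ODBC::Time      then format('%02d:%02d:%02d', value.hour, value.minute, value.second)
  when String            then json_column ? JSON.parse(value) : value
  else value
  end
end
```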

SQL translation correctness

  • EQUAL / NOT_EQUAL with a nil value emit IS NULL / IS NOT NULL (previously NULL was bound into field = ?, which silently matches nothing under SQL three-valued logic).
  • IN / NOT_IN with nil in the list split into (field IS NULL OR field IN (...)) and (field IS NOT NULL AND field NOT IN (...)) (previously nil entries were silently dropped or zeroed out all matches); see the sketch after this list.
  • IN / NOT_IN with an empty list emit 1=0 / 1=1.
  • COUNT with a blank/missing field falls through to COUNT(*); SUM / AVG / MIN / MAX raise rather than emitting SUM().
  • Projection containing only relation paths (customer:name) defaults to SELECT * instead of SELECT FROM ....
  • Malformed connection-string options (e.g. Driver=Snowflake;SomeFlag;Server=X) raise a clear Malformed connection string option 'SomeFlag': expected 'key=value' instead of ArgumentError: wrong array length.
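A simplified sketch of the nil-aware IN / NOT_IN splitting referenced above; the shape mirrors the snippets quoted later in this PR, but the exact signature and bind handling are illustrative:

```ruby
# Hedged sketch: quoted_field is an already-quoted column, binds collects SQL parameters.
def translate_in(quoted_field, list, binds, negate: false)
  return (negate ? '1=1' : '1=0') if list.empty?

  non_nils = list.compact
  null_term = "#{quoted_field} IS #{negate ? 'NOT ' : ''}NULL"
  return null_term if non_nils.empty? # the list contained only nils

  binds.concat(non_nils)
  placeholders = (['?'] * non_nils.size).join(', ')
  in_term = "#{quoted_field} #{negate ? 'NOT IN' : 'IN'} (#{placeholders})"
  return in_term if non_nils.size == list.size # no nils: plain parameterised IN

  "(#{null_term}#{negate ? ' AND ' : ' OR '}#{in_term})"
end
```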

Authentication

  • The datasource doesn't interpret connection-string options. Anything the Snowflake ODBC driver accepts works pass-through, including the production-friendly key-pair / JWT flow (AUTHENTICATOR=SNOWFLAKE_JWT;PRIV_KEY_FILE=/path/to/key.p8), EXTERNALBROWSER SSO, and OAUTH.
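For reference, a pass-through connection string using the key-pair / JWT flow could look like this; the attribute names are standard Snowflake ODBC options and every value is a placeholder:

```ruby
# Illustrative only; the gem forwards the string to the driver untouched.
conn_str = [
  'Driver=SnowflakeDSIIDriver',
  'Server=myaccount.eu-west-1.snowflakecomputing.com',
  'Database=ANALYTICS',
  'Schema=PUBLIC',
  'Warehouse=WH_FOREST',
  'Uid=FOREST_READONLY',
  'AUTHENTICATOR=SNOWFLAKE_JWT',
  'PRIV_KEY_FILE=/etc/forest/snowflake_rsa_key.p8'
].join(';')
```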

Operational

  • connection_pool-backed pool, sized via pool_size: (default 5) and pool_timeout: (default 5s).
  • with_connection retries the block once on connection-lost ODBC errors and cycles the pool to drop stale handles; retry patterns include Snowflake auth-token expiry (390114). A sketch of the pattern follows this list.
  • Optional statement_timeout: (in seconds) issues ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS on each new connection.
  • All ODBC statements are closed via begin/ensure on every introspection and execution path.
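A minimal sketch of the retry-once-and-cycle behaviour described above. CONNECTION_LOST_PATTERNS and the @pool ivar are assumptions about the gem's internals; ConnectionPool#reload comes from the connection_pool gem.

```ruby
# Hedged sketch, not the gem's exact implementation.
def with_connection(retried: false, &block)
  @pool.with(&block)
rescue ::ODBC::Error => e
  raise if retried || CONNECTION_LOST_PATTERNS.none? { |pattern| e.message.match?(pattern) }

  # Drop stale handles; the next checkout re-authenticates with the conn_str credentials.
  @pool.reload { |conn| conn.disconnect rescue nil }
  with_connection(retried: true, &block)
end
```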

Read-only enforcement

  • Every column is emitted with is_read_only: true, so Forest's schema emitter computes collection-level isReadOnly: true automatically and the UI hides create/edit/delete affordances.
  • create / update / delete raise ForestException with an explicit read-only message as a defence-in-depth guard.

Coverage

110 specs across:

  • Parser::Column — Snowflake-native type mapping, operator sets per Forest type
  • Utils::Identifier — case-preserving quoting
  • Utils::Query — condition tree translation, IN/LIKE/ILIKE, sort/page, aggregation, nil semantics
  • Datasource — introspection, pool, session settings, retry, FK discovery, schema resolution, primary-key resolution, malformed conn_str
  • Collection — schema introspection, list/aggregate, JSON parsing, write guards

rubocop is pinned (1.86.1, rubocop-performance 1.26.1, rubocop-rspec 3.9.0) in both Gemfile and Gemfile-test so contributors and CI run the same cops.

Caveats / follow-ups

  • No real-Snowflake integration test in CI — coverage is fully mocked at the ODBC boundary. A follow-up could add an env-gated smoke test (skipped unless SNOWFLAKE_* env vars are present).
  • Read-only by design for v1. Write support is a future direction; the read-only guards in Collection are explicit so a later subclass can override safely.
  • Cross-datasource FKs (e.g. Snowflake BILLING_USAGE.CUSTOMER_ID → Postgres customers.id) cannot be auto-discovered — Snowflake only knows about its own metadata.
  • One schema per datasource instance. To expose tables from multiple Snowflake schemas, instantiate one datasource per schema.

Test plan

  • cd packages/forest_admin_datasource_snowflake && BUNDLE_GEMFILE=Gemfile-test bundle install && BUNDLE_GEMFILE=Gemfile-test bundle exec rspec passes (110 examples, 0 failures)
  • bundle exec rubocop packages/forest_admin_datasource_snowflake passes (clean)
  • bundle exec rubocop repo-wide passes (still clean after .rubocop.yml exclusion additions)
  • Live Forest UI smoke test against a Snowflake account: list, count, filter, aggregate, FK auto-discovery on intra-Snowflake relations, type rendering for VARIANT/OBJECT/ARRAY/BINARY/TIMESTAMP variants, composite-PK record URLs

Note

Add read-only Snowflake datasource gem backed by ODBC

  • Introduces a new forest_admin_datasource_snowflake gem that connects to Snowflake via ODBC and exposes tables as read-only Forest Admin collections.
  • Datasource manages a connection pool, introspects INFORMATION_SCHEMA.COLUMNS and SHOW PRIMARY KEYS for schema discovery, and auto-discovers ManyToOne relations via SHOW IMPORTED KEYS.
  • Collection maps Snowflake types to Forest Admin field types, coerces ODBC temporal values to Ruby types, and parses VARIANT columns as JSON; create, update, and delete always raise a read-only error.
  • Utils::Query translates Forest Admin condition trees into parameterized SQL with support for COUNT/SUM/AVG/MIN/MAX aggregations, ORDER BY, and LIMIT/OFFSET.
  • Transient connection loss triggers a one-time automatic pool reset and retry; session is always initialized with TIMEZONE=UTC and optional schema/statement-timeout settings.
  • CI, release pipeline, and RuboCop config are updated to include the new package.

Changes since #292 opened

  • Added automatic schema resolution to ForestAdminDatasourceSnowflake::Datasource when Schema parameter is not provided in connection string [a305194]
  • Fixed COUNT aggregation handling for empty field values in ForestAdminDatasourceSnowflake::Utils::Query [a305194]

Macroscope summarized 1c4a27f.


qltysh Bot commented Apr 28, 2026

9 new issues

  • qlty / Structure: Function with high complexity (count = 5): discover_relations (6 occurrences)
  • qlty / Structure: Function with many parameters (count = 4): aggregate (3 occurrences)

execute_to_hashes(sql, binds, projection.to_a)
end

def aggregate(_caller, filter, aggregation, limit = nil)

Function with many parameters (count = 4): aggregate [qlty:function-parameters]

source.add_field(relation_name, relation)
end
rescue ::ODBC::Error => e
warn "[forest_admin_datasource_snowflake] FK introspection skipped: #{e.message}"

Function with high complexity (count = 5): discover_relations [qlty:function-complexity]

result += equality + orderables if %w[Date Dateonly Time Number].include?(type)
result += equality + orderables + strings if type == 'String'

result

Function with high complexity (count = 7): operators_for_column_type [qlty:function-complexity]

ConditionTreeBranch = ForestAdminDatasourceToolkit::Components::Query::ConditionTree::Nodes::ConditionTreeBranch
ConditionTreeLeaf = ForestAdminDatasourceToolkit::Components::Query::ConditionTree::Nodes::ConditionTreeLeaf

def initialize(collection, projection: nil, filter: nil, aggregation: nil, limit: nil)

Function with many parameters (count = 5): initialize [qlty:function-parameters]

sql << " OFFSET #{Integer(offset)}" if offset
end

[sql, @binds]

Function with high complexity (count = 12): to_sql [qlty:function-complexity]

translate_leaf(node)
else
raise ForestAdminDatasourceSnowflake::Error, "Unsupported condition tree node: #{node.class}"
end

Function with high complexity (count = 8): build_where_clause [qlty:function-complexity]

@bexchauveto bexchauveto force-pushed the feat/snowflake-datasource branch from 1ab8be1 to 465613a Compare April 28, 2026 15:50
@bexchauveto bexchauveto force-pushed the feat/snowflake-datasource branch from 465613a to e6253b4 Compare April 28, 2026 15:54
"#{op}(#{q(field)})"
else
raise ForestAdminDatasourceSnowflake::Error, "Unsupported aggregation operation: #{op}"
end

Function with high complexity (count = 7): build_aggregation_expression [qlty:function-complexity]

Member

matthv commented Apr 29, 2026

The PR is missing changes to .releaserc.js. The new forest_admin_datasource_snowflake package needs to be wired into 3 sections so it gets published on the next release:

  • prepareCmd
  • successCmd
  • assets

Without this, the gem won't be published to RubyGems even after the PR is merged.


native_types = @datasource.fetch_snowflake_native_types(@table_name)
pk_name = (rows.find { |r| r[3].to_s.casecmp('id').zero? } || rows.first)&.dig(3)

Member


What is the default order returned by conn.columns when no column is named “id”? Couldn't rows.first lead to unexpected behavior?

Member Author

@bexchauveto bexchauveto May 4, 2026


The order is not guaranteed by the ODBC spec. The Snowflake ODBC driver happens to return columns in ORDINAL_POSITION order (the CREATE TABLE definition order), and INFORMATION_SCHEMA confirms that. So rows.first resolves to whatever the operator declared first in the DDL.

That works for tables where id is first by convention, but it's a real foot-gun: a table like (created_at TIMESTAMP, id NUMBER) would silently designate created_at as the PK. Composite primary keys also collapse to a single column with no warning.

The PK resolution flow, as of this commit, is:

  1. Operator override — primary_keys: { 'TABLE_NAME' => 'COLUMN_NAME' } constructor option, case-insensitive table lookup
  2. Snowflake-declared PK — SHOW PRIMARY KEYS IN SCHEMA (lowest key_sequence wins for composite PKs)
  3. id column (case-insensitive)
  4. First column (last resort)

end

def apply_session_settings(conn)
run_session_statement(conn, "ALTER SESSION SET TIMEZONE = 'UTC'")
Member


I might be mistaken (I've only recently started working with Snowflake), but I've noticed a synchronization issue around schema configuration.
In initialize, the schema can be defined via the Ruby schema: option. However, Snowflake also allows users to pass it directly within the connection string (conn_str) using SCHEMA=....

Currently, these two methods are out of sync:

  • visible_tables relies on @schema_override.
  • fetch_snowflake_native_types and Parser::Relation.discover both rely on the session's CURRENT_SCHEMA().

The issue: If a user only sets the Ruby schema: option, the session remains unaware of it. As a result:

  • JSON columns are incorrectly rendered as String (the Snowflake ODBC driver returns VARIANT/OBJECT/ARRAY as SQL_VARCHAR unless the native type lookup corrects them — but that lookup itself queries the wrong schema).
  • Foreign keys are not discovered.
  • No error is raised, which makes this hard to debug.

Proposed fix: add a USE SCHEMA statement in apply_session_settings when @schema_override is set, as sketched below.
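A sketch of that fix, reusing the method names visible in the quoted diff (apply_session_settings, run_session_statement):

```ruby
def apply_session_settings(conn)
  run_session_statement(conn, "ALTER SESSION SET TIMEZONE = 'UTC'")
  # Keep the session schema in sync with the Ruby-level option.
  run_session_statement(conn, %(USE SCHEMA "#{@schema_override}")) if @schema_override
end
```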

attr_reader :pool

def initialize(conn_str:,
pool_size: DEFAULT_POOL_SIZE, pool_timeout: DEFAULT_POOL_TIMEOUT,

Function with many parameters (count = 5): initialize [qlty:function-parameters]

Comment on lines +100 to +101
rows.group_by { |r| r[3].to_s.upcase }
.transform_values { |table_rows| table_rows.min_by { |r| r[5].to_i }[4].to_s }
Member


Composite PKs (PRIMARY KEY (a, b)) get truncated to their first column here

Comment on lines +86 to +89
if declared_pk
match = column_names.find { |name| name.to_s.casecmp(declared_pk.to_s).zero? }
return match if match
end
Member


If the user mistypes the column name in primary_keys:, we just silently fall back to id/first column. They'd have no idea their override was ignored. IMO this should raise: it's a config typo, not a discovery thing where warn makes sense.

Proposed fix, sketched below: raise ForestAdminDatasourceSnowflake::Error, "primary_keys override '#{declared_pk}' does not match any column on table '#{@table_name}' (available: #{column_names.join(', ')})"
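Applied to the quoted guard (lines +86 to +89), the change would look roughly like this; the surrounding method is elided:

```ruby
if declared_pk
  match = column_names.find { |name| name.to_s.casecmp(declared_pk.to_s).zero? }
  return match if match

  raise ForestAdminDatasourceSnowflake::Error,
        "primary_keys override '#{declared_pk}' does not match any column on table " \
        "'#{@table_name}' (available: #{column_names.join(', ')})"
end
```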

@bexchauveto bexchauveto force-pushed the feat/snowflake-datasource branch from 1b30a64 to f1eca3e Compare May 5, 2026 16:03
@bexchauveto bexchauveto force-pushed the feat/snowflake-datasource branch from b7e99ef to fd8808e Compare May 6, 2026 08:11
in_term = "#{quoted_field} #{negate ? "NOT IN" : "IN"} (#{placeholders})"
return in_term if non_nils.size == list.size

"(#{null_term}#{negate ? " AND " : " OR "}#{in_term})"

Function with high complexity (count = 8): translate_in [qlty:function-complexity]

end
end
rescue ::ODBC::Error
nil
Member


This rescues silently to nil.
If column introspection fails, collections end up with zero fields and Forest crashes later, which IMO makes the actual cause hard to trace.
Could we mirror what discover_relations does on line 205?

.transform_values { |table_rows| table_rows.sort_by { |r| r[5].to_i }.map { |r| r[4].to_s } }
end
rescue ::ODBC::Error
nil
Member


Same silent rescue, same fix as above (fetch_snowflake_columns); a sketch of the pattern follows.
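For illustration, the warn-then-nil pattern applied to fetch_snowflake_primary_keys (the same shape fits fetch_snowflake_columns); the method body is elided and the exact wording is an assumption:

```ruby
def fetch_snowflake_primary_keys
  with_connection do |conn|
    # ... SHOW PRIMARY KEYS IN SCHEMA, grouped by table and ordered by key_sequence ...
  end
rescue ::ODBC::Error => e
  warn "[forest_admin_datasource_snowflake] PK introspection skipped: #{e.message}"
  nil
end
```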

bexchauveto added 10 commits May 6, 2026 16:15
Introduces forest_admin_datasource_snowflake — a native Forest Admin
datasource backed by Snowflake via ODBC, with no ActiveRecord involvement.
Read-only first pass: list/aggregate query translation, schema introspection,
connection pooling, retry-on-disconnect, optional FK auto-discovery, and
optional per-session statement timeout.

Type translation:
- ODBC SQL types → Forest types via Parser::Column
- Snowflake-native types (VARIANT/OBJECT/ARRAY/BINARY/VARBINARY) detected
  via INFORMATION_SCHEMA.COLUMNS and override the ODBC mapping
- ODBC::Date / ODBC::TimeStamp / ODBC::Time coerced to native Ruby
  Date/Time so JSON serialization works (otherwise they emit `{}`)
- JSON-typed columns parsed into Ruby objects
- Session-level TIMEZONE='UTC' so TIMESTAMP_NTZ/LTZ/TZ all serialize
  consistently as UTC

CI integration: added the package to lint/test matrices in build.yml
with a unixODBC apt step. 77 specs cover the introspection, query
translator, identifier quoting, type mapping, and connection retry paths.
Previously when Aggregation.field was nil, build_aggregation_expression
emitted invalid SQL like SUM() because Identifier.quote(nil) returns an
empty string. Raise an explicit ForestAdminDatasourceSnowflake::Error so
the failure surfaces at translation time instead of as a Snowflake
syntax error at execution. COUNT keeps its existing nil-as-* fallback.
visible_tables and Collection#fetch_fields called stmt.drop after
stmt.fetch_all without an ensure block, so a fetch failure would leak the
statement handle. Wrap the fetch in begin/ensure to match the pattern
already used by fetch_snowflake_native_types and run_session_statement.
…ipeline

Adds the new package to the three .releaserc.js sections so semantic-release
will: bump its VERSION constant on release (prepareCmd), build and push the
gem to RubyGems (successCmd), and commit the version bump back to the repo
(@semantic-release/git assets). Without this the gem stays at 0.0.x and
never reaches RubyGems.
Snowflake's session token has a finite TTL; once it expires the driver
returns "08001 (390114) Authentication token has expired." This message
didn't match any of CONNECTION_LOST_PATTERNS, so the gem propagated the
error to Forest instead of cycling the pool. Cycling forces a fresh
drvconnect, which re-authenticates with the credentials in conn_str.

Adds matchers for the exact Snowflake string and a generic "token
expired" fallback so similar phrasings from other drivers also recover.
… override

Snowflake doesn't expose primary key information through ODBC's standard
column metadata, so the previous fallback ("id" column or first column)
was unreliable for tables that didn't fit the convention — a table like
(created_at TIMESTAMP, id NUMBER) silently designated created_at as PK,
and composite primary keys collapsed to the first declared column.

Datasource#primary_key_for now resolves in this order:
1. Operator-supplied primary_keys: { 'TABLE' => 'COLUMN' } override
   (case-insensitive table lookup)
2. Snowflake-declared primary key from SHOW PRIMARY KEYS IN SCHEMA
   (lowest key_sequence wins for composite PKs)
3. Case-insensitive 'id' column
4. First column (last resort)

The SHOW PRIMARY KEYS query result is cached for the lifetime of the
datasource so we only pay one round-trip even with many tables. Permission
errors on the query are silently swallowed, matching the FK introspection
pattern.
The Ruby `schema:` constructor option only filtered the local table list
through @schema_override. Introspection queries that depend on the session
schema were unaffected:

- fetch_snowflake_native_types: WHERE TABLE_SCHEMA = CURRENT_SCHEMA()
- Parser::Relation.discover:    SHOW IMPORTED KEYS IN SCHEMA
- fetch_snowflake_primary_keys: SHOW PRIMARY KEYS IN SCHEMA

When the operator only set the Ruby option (not Schema= in conn_str), the
session stayed on whatever the connection string defaulted to. The three
queries above ran against the wrong schema and silently returned empty
results: VARIANT/OBJECT/ARRAY columns rendered as String, FK auto-discovery
found nothing, primary keys fell back to the heuristic — all without a
single error.

apply_session_settings now runs USE SCHEMA "<override>" when the Ruby
option is set, keeping the session and the local filter in sync.
…ary_keys

Three knobs were removed because each was either redundant with the
connection string or made the datasource diverge from the agent layer:

- schema:               sourced from Schema= in conn_str only (case-insensitive).
                        When set, USE SCHEMA "<schema>" runs on every new
                        connection so session, table-list filter, and
                        introspection (CURRENT_SCHEMA / SHOW ... IN SCHEMA)
                        all target the same object. Two ways to set the same
                        thing was a sync hazard.

- introspect_relations: gone. FK auto-discovery via SHOW IMPORTED KEYS IN
                        SCHEMA is now unconditional; permission failures are
                        still swallowed silently. Snowflake-defined FKs are
                        free metadata — picking them up should not need an
                        opt-in.

- tables:               gone. Collection filtering belongs at the agent level
                        via add_datasource(ds, include: [...]) or exclude:.
                        That keeps the filtering pattern consistent with every
                        other Forest data source. The data source itself now
                        exposes every readable user-schema table.

Remaining surface: conn_str, pool_size, pool_timeout, statement_timeout,
primary_keys.
- Collection#@primary_keys: initialised and appended to but never read.
  Forest finds primary keys via Schema.primary_keys(collection), which
  iterates fields with is_primary_key: true — the local array was a
  redundant copy.

- Datasource#@conn_str: stored but never accessed after initialize. The
  connection string is captured in the ConnectionPool block closure
  through the local parameter, so the ivar served no purpose.

- Zeitwerk inflector rule mapping odbc -> ODBC: no lib/.../odbc.rb file
  exists in the gem for it to apply to. Leftover scaffolding.
`primary_key_for` returned a single column name and silently dropped the
rest when a Snowflake table declared `PRIMARY KEY (a, b)`. Forest
supports composite PKs natively (each ColumnSchema carries its own
`is_primary_key` flag, and `Schema.primary_keys(collection)` reconstructs
the list by iterating fields), so flattening to one column lost
information.

Renamed to `primary_keys_for(table_name)` returning Array<String>:
- the operator-supplied override accepts either a String or Array
  (`primary_keys: { 'orders' => %w[order_id product_id] }`) and is
  wrapped via Array() for the single-column case;
- Snowflake-declared PKs are read via `SHOW PRIMARY KEYS IN SCHEMA`
  and ordered by `key_sequence` to match the DDL declaration order;
- the legacy 'id' / first-column fallback returns `[fallback]` so the
  return type is consistent.

`Collection#fetch_fields` now sets `is_primary_key` per-field via
`pk_names.include?(column_name)`. Two new collection-level specs cover
composite PKs from both Snowflake declarations and operator overrides.
A typo in the operator-supplied primary_keys override used to silently
fall back to the 'id' / first-column heuristic, leaving the operator
unaware that their override was ignored. Surface the misconfiguration
explicitly so it can be fixed at boot.
Snowflake metadata calls each take 1-5s on the warehouse, and the prior
implementation issued two of them per table (`conn.columns()` plus an
`INFORMATION_SCHEMA.COLUMNS` filter), serialized on a single pool slot.
On a 7-table schema this took ~63s of agent boot.

Replace the per-table queries with a single memoized
`INFORMATION_SCHEMA.COLUMNS` query that returns every column for the
whole schema (filtered by `TABLE_SCHEMA = ?` when `Schema=` is set).
Each Collection now reads its slice from the pre-fetched result.
Total introspection round-trips drop from ~2N+3 to 4 regardless of
table count; observed agent boot drops from ~63s to ~5s.

Source the Forest type from the Snowflake-native DATA_TYPE only, so
the ODBC type-code fallback (and the `conn.columns()` round-trip that
feeds it) becomes unnecessary. Extend the native-type table to cover
every canonical Snowflake type.
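An illustrative shape for that single bulk query; the exact projection, filter, and memoization in the gem may differ:

```ruby
# Hedged sketch of the one-round-trip column introspection described above.
BULK_COLUMNS_SQL = <<~SQL.freeze
  SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = ?
  ORDER BY TABLE_NAME, ORDINAL_POSITION
SQL
```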
`fetch_snowflake_columns` and `fetch_snowflake_primary_keys` rescued
`ODBC::Error` and returned `{}`, which `||=` then cached as a truthy
value — so a transient failure during boot (warehouse momentarily
unavailable, expired auth token after with_connection's single retry,
brief permission glitch) locked the datasource into empty metadata for
its entire lifetime. Every collection ended up with no columns and no
primary keys until the agent process restarted.

Return `nil` on error so the `||=` memoization only caches successful
results. Public accessors still surface `[]` to callers, so the
external contract is unchanged on permanent failures (permission
denied, etc.) — they just re-attempt at the cost of one extra
metadata query each.
A leaf condition `field EQUAL nil` previously emitted `field = ?` with
NULL bound to the parameter. Under SQL three-valued logic, `field =
NULL` evaluates to UNKNOWN — never TRUE — so the filter silently
returned zero rows instead of the rows where the field is actually
NULL. NOT_EQUAL had the symmetric bug.

Route both operators through a `translate_equality` helper that emits
`field IS NULL` / `field IS NOT NULL` when the value is nil and binds
nothing; non-nil values keep producing parameterised `=` / `<>` as
before.
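A hedged sketch of what such a translate_equality helper could look like; the signature is assumed, and only the nil-versus-bound-parameter split is the point:

```ruby
def translate_equality(quoted_field, value, binds, negate: false)
  return "#{quoted_field} IS #{negate ? 'NOT ' : ''}NULL" if value.nil?

  binds << value
  "#{quoted_field} #{negate ? '<>' : '='} ?"
end
```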
…epeat queries

The previous fix returned nil from the fetch methods on ODBC::Error and
relied on `||=` to skip memoizing nil — which meant every subsequent
`primary_keys_for` / `snowflake_columns_for` call re-issued the failing
metadata query against Snowflake, burning compute on errors that are
overwhelmingly permanent (permission denied, missing role on the schema,
etc.).

Switch the memoizer to `defined?(@snowflake_*)` so the ivar is read once
and cached afterwards, regardless of whether the fetch succeeded (Hash)
or errored (nil). The public accessors keep surfacing `[]` on nil so the
external contract is unchanged.

`with_connection` already retries once on connection-lost ODBC patterns,
so anything reaching this rescue is the kind of permanent failure where
caching the negative result is correct. A genuinely transient error that
slips past with_connection's retry would now require an agent restart to
recover — accepted trade-off vs. uncapped warehouse spend.
…umns

`@projection.columns` already drops fields containing `:` (relation
paths), so a projection that names only relations — e.g.
`Projection.new(%w[customer:name product:title])` — yielded an empty
`cols` array and produced `SELECT  FROM "TABLE"` which Snowflake
rejects at parse time.

Treat empty-after-filter the same as a missing projection and fall
through to `SELECT *`. The previous nil guard only handled the case
where `@projection` itself was nil.
…h the user's intent

`IN (NULL, ...)` and `NOT IN (NULL, ...)` both rely on `field = NULL`
or `field <> NULL`, which evaluate to UNKNOWN under SQL three-valued
logic. The previous translator silently bound `nil` as a placeholder,
so:

- `Operators::IN [1, 2, nil]` excluded rows where the field was actually
  NULL (the user listed nil expecting them to match).
- `Operators::NOT_IN [1, 2, nil]` returned zero rows altogether — the
  trailing `field <> NULL` poisons the whole AND chain to UNKNOWN.

Strip nils before binding and add an `IS [NOT] NULL` term joined with
`OR` (IN) or `AND` (NOT_IN). When the list is only nils, collapse to a
bare `IS [NOT] NULL`. Lists without nils keep their existing parameterised
`IN (?, ?, ...)` form.
…sions

Lock the lint toolchain to rubocop 1.86.1, rubocop-performance 1.26.1,
and rubocop-rspec 3.9.0 in both Gemfile (local dev) and Gemfile-test
(CI) so contributors and the build matrix run identical cops.
…wing them

`fetch_snowflake_columns` and `fetch_snowflake_primary_keys` rescued
`ODBC::Error` and returned nil with no breadcrumb — when the introspection
silently failed (permission denied, expired auth token after retry, etc.),
collections came up with zero fields and Forest crashed downstream with
no signal pointing at the actual cause.

Mirror the `discover_relations` pattern: log a `[forest_admin_datasource_snowflake]`
prefixed warning to stderr with the ODBC message before returning nil.
The cached-failure semantics are unchanged; the operator just gets a
traceable signal of why the schema is empty.
@bexchauveto bexchauveto force-pushed the feat/snowflake-datasource branch from ce3f555 to 1c4a27f Compare May 6, 2026 14:15
`build_aggregation_expression`'s COUNT branch checked truthiness of
`aggregation.field`, which let an empty string slip through and emit
`COUNT("")` — invalid SQL Snowflake rejects at parse time. SUM/AVG/MIN/MAX
were already strict via `field.nil? || field.to_s.empty?`.

Hoist the blank-field check to a single `blank_field` variable so all five
operators share the same definition: COUNT collapses to `COUNT(*)` (the
natural meaning of "count without specifying a field"); SUM/AVG/MIN/MAX
keep raising.
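A hedged sketch of that shared blank-field check; operation casing, error wording, and the q() quoting helper are assumptions:

```ruby
def build_aggregation_expression(aggregation)
  op = aggregation.operation.to_s.upcase
  field = aggregation.field
  blank_field = field.nil? || field.to_s.empty?

  case op
  when 'COUNT'
    blank_field ? 'COUNT(*)' : "COUNT(#{q(field)})"
  when 'SUM', 'AVG', 'MIN', 'MAX'
    raise ForestAdminDatasourceSnowflake::Error, "#{op} aggregation requires a field" if blank_field

    "#{op}(#{q(field)})"
  else
    raise ForestAdminDatasourceSnowflake::Error, "Unsupported aggregation operation: #{op}"
  end
end
```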
…SCHEMA() snapshot

Without `Schema=` in the connection string, `visible_tables` returned
tables from every readable user schema and the bulk INFORMATION_SCHEMA
query keyed columns by table name only — so two same-named tables in
different schemas would silently collide, with one's metadata
overwriting the other's. Forest collection names are unqualified, so
there's no good way to surface "the same table from two schemas" as
two distinct collections anyway.

Lock each datasource instance to a single schema:

- if `Schema=` is in the connection string, use it (existing behavior).
- if not, snapshot `SELECT CURRENT_SCHEMA()` once at boot and use that
  resolved value as the active schema for the rest of the instance's
  lifetime. `USE SCHEMA "<resolved>"` then runs on every subsequent
  pool connection so all introspection paths stay aligned.
- if `CURRENT_SCHEMA()` returns nil (the role has no default), raise
  `ForestAdminDatasourceSnowflake::Error` at boot with a clear
  "set Schema=<name>" hint instead of letting collisions happen.

Operators wanting tables from multiple schemas instantiate one
datasource per schema — explicit beats implicit.
…tions

`parse_conn_str` used `to_h { |o| o.split('=', 2) }`, which crashed
with a cryptic `ArgumentError: wrong array length` when an option had
no `=` (e.g. `"Driver=Snowflake;SomeFlag;Server=X"` — `"SomeFlag".split('=', 2)`
returns a 1-element array). Detect the bad pair and raise
`ForestAdminDatasourceSnowflake::Error` naming the offending option.
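A sketch of the fixed parsing; the method name comes from this commit, the return shape is an assumption:

```ruby
def parse_conn_str(conn_str)
  conn_str.split(';').reject(&:empty?).to_h do |option|
    key, value = option.split('=', 2)
    if value.nil?
      raise ForestAdminDatasourceSnowflake::Error,
            "Malformed connection string option '#{option}': expected 'key=value'"
    end

    [key, value]
  end
end
```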
@bexchauveto bexchauveto merged commit bc79f94 into main May 7, 2026
44 checks passed
@bexchauveto bexchauveto deleted the feat/snowflake-datasource branch May 7, 2026 08:23
forest-bot added a commit that referenced this pull request May 7, 2026
# [1.29.0](v1.28.2...v1.29.0) (2026-05-07)

### Features

* **snowflake:** add read-only Snowflake datasource gem ([#292](#292)) ([bc79f94](bc79f94))
@forest-bot
Member

🎉 This PR is included in version 1.29.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
