docs: bring README up to date with v0.6.0 / v0.6.1 merges #783
anandgupta42 wants to merge 1 commit into main from
Conversation
The README's changelog stopped at v0.5.11 and the warehouse / provider lists were missing additions shipped in the v0.5.14–v0.6.1 range. This catches them up:

- Changelog: added v0.6.1, v0.6.0, v0.5.21, v0.5.20, v0.5.18, v0.5.16, v0.5.14; trimmed v0.5.5 / v0.5.3 / v0.5.1 to keep the section length steady.
- Key Features: new "Cross-Warehouse Data Parity" entry for the `data_diff` tool and `/data-parity` skill shipped in v0.6.0 (#493, #705).
- Supported Warehouses: added Microsoft Fabric (v0.6.0, via T-SQL dialect + `tedious` / Entra ID auth in the `data_diff` MSSQL/Fabric path).
- Works with Any LLM: added Databricks AI Gateway (v0.6.0, #649), Snowflake Cortex (v0.5.6), and LM Studio (v0.5.7) — the last two were also missing.

No code changes; README only.
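For readers unfamiliar with the Fabric path mentioned above: it rides on `tedious` with Entra ID authentication. Below is a minimal sketch of what such a connection can look like using the public `tedious` API; the server name, database, and query are invented placeholders, and this is not the repo's actual `data_diff` adapter code.

```ts
import { Connection, Request } from "tedious";

// Server and database are invented placeholders for a Fabric SQL endpoint.
const connection = new Connection({
  server: "example.datawarehouse.fabric.microsoft.com",
  authentication: {
    // Entra ID via the DefaultAzureCredential chain
    // (environment variables, managed identity, Azure CLI, ...).
    type: "azure-active-directory-default",
    options: {},
  },
  options: {
    database: "example_warehouse",
    encrypt: true, // Fabric endpoints require TLS
  },
});

connection.connect((err) => {
  if (err) {
    console.error("connect failed:", err.message);
    return;
  }
  const request = new Request(
    "SELECT COUNT(*) AS n FROM dbo.example_table",
    (err) => {
      if (err) console.error("query failed:", err.message);
      connection.close();
    },
  );
  request.on("row", (columns) => console.log("rows:", columns[0].value));
  connection.execSql(request);
});
```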
Claude Code Review
This repository is configured for manual code reviews. Comment `@claude review` to trigger a review and subscribe this PR to future pushes, or `@claude review once` for a one-time review.
📝 Walkthrough
README documentation is updated with a new cross-warehouse data parity feature section, extended warehouse support (Microsoft Fabric, Oracle), expanded LLM provider integrations (Databricks AI Gateway, Snowflake Cortex, others), and refreshed changelog entries for recent releases.

Changes: Documentation Updates
Estimated Code Review Effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
README.md (1)
126-126: ⚡ Quick win — Clarify the "12 warehouses" scope vs overall supported warehouse list.
At Line 126, data parity is described as “across 12 warehouses,” while Line 157 lists 13 supported warehouses overall. A short qualifier like “12 SQL warehouses for parity” (or naming exclusions) would prevent confusion.
Also applies to: 157-157
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 126, The README's summary sentence "Row-level and column-level diffing across 12 warehouses..." conflicts with the full supported list (13 warehouses); update that sentence to clarify scope by specifying "12 SQL warehouses for parity" or similar and reference the full supported list, e.g., change the phrase to "Row-level and column-level diffing across 12 SQL warehouses for parity (see full list of 13 supported warehouses below)" or explicitly note which warehouse is excluded from parity; edit the sentence containing `data_diff` and `/data-parity` to add this qualifier so the earlier summary and the later supported-warehouses list are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 126: README currently claims "partitioned execution for 100M+ row tables"
while the tool docs in packages/opencode/src/altimate/tools/data-diff.ts
recommend partitioning for ">10M rows"; update one or the other so both are
consistent: either change the README phrasing to match the tool doc (e.g.,
"recommended for >10M rows") or update the tool doc to justify and match "100M+
rows" and add a brief citation to benchmark context; reference the strings
"100M+ row tables" in README and the threshold text ">10M rows" in data-diff.ts
(and mention `data_diff`/`/data-parity` in the doc update) when making the
change.
---
Nitpick comments:
In `@README.md`:
- Line 126: The README's summary sentence "Row-level and column-level diffing
across 12 warehouses..." conflicts with the full supported list (13 warehouses);
update that sentence to clarify scope by specifying "12 SQL warehouses for
parity" or similar and reference the full supported list, e.g., change the
phrase to "Row-level and column-level diffing across 12 SQL warehouses for
parity (see full list of 13 supported warehouses below)" or explicitly note
which warehouse is excluded from parity; edit the sentence containing
`data_diff` and `/data-parity` to add this qualifier so the earlier summary and
the later supported-warehouses list are consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Transpile SQL between Snowflake, BigQuery, Databricks, Redshift, PostgreSQL, MySQL, SQL Server, and DuckDB.

### Cross-Warehouse Data Parity
Row-level and column-level diffing across 12 warehouses (including cross-dialect pairs like Postgres ↔ Snowflake or Databricks ↔ Fabric) via the `data_diff` tool and `/data-parity` skill. Five algorithms — `auto`, `joindiff`, `hashdiff`, `profile`, and `cascade` — partitioned execution for 100M+ row tables, and a `profile`-only mode for PII / PHI / PCI environments.
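Since the review comments below all orbit this entry, here is a quick illustration of the core idea behind the `hashdiff` algorithm it names: hash buckets of rows on each side and drill into only the buckets whose digests disagree. This is a toy, self-contained sketch; the bucketing key, hash choice, and row canonicalization are invented for illustration and are not the project's implementation.

```ts
import { createHash } from "node:crypto";

type Row = { id: number; [col: string]: unknown };

// Toy canonical row hash; a real tool must normalize types, NULLs,
// and collations per warehouse dialect before hashing.
const hashRow = (row: Row): string =>
  createHash("md5").update(JSON.stringify(row)).digest("hex");

// One digest per bucket; rows are sorted by id first so the digest
// does not depend on scan order.
function bucketDigests(rows: Row[], buckets: number): Map<number, string> {
  const grouped = new Map<number, Row[]>();
  for (const row of rows) {
    const b = row.id % buckets; // toy partitioning key
    let list = grouped.get(b);
    if (!list) grouped.set(b, (list = []));
    list.push(row);
  }
  const digests = new Map<number, string>();
  for (const [b, rs] of grouped) {
    rs.sort((x, y) => x.id - y.id);
    const h = createHash("md5");
    for (const r of rs) h.update(hashRow(r));
    digests.set(b, h.digest("hex"));
  }
  return digests;
}

// Only buckets whose digests differ need row-level comparison.
function mismatchedBuckets(a: Row[], b: Row[], buckets = 16): number[] {
  const da = bucketDigests(a, buckets);
  const db = bucketDigests(b, buckets);
  return [...new Set([...da.keys(), ...db.keys()])]
    .filter((k) => da.get(k) !== db.get(k))
    .sort((x, y) => x - y);
}

// One altered row on the right side flags exactly one bucket to drill into.
const left = [{ id: 1, v: "a" }, { id: 2, v: "b" }, { id: 17, v: "c" }];
const right = [{ id: 1, v: "a" }, { id: 2, v: "B" }, { id: 17, v: "c" }];
console.log(mismatchedBuckets(left, right)); // [2]
```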
Align scale claim with tool docs to avoid overstating capability.
At Line 126, README says partitioned execution is for “100M+ row tables,” but the tool description in packages/opencode/src/altimate/tools/data-diff.ts frames partitioning as recommended for “>10M rows.” Please make these thresholds consistent (or cite benchmark context if 100M+ is intentional).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` at line 126, README currently claims "partitioned execution for
100M+ row tables" while the tool docs in
packages/opencode/src/altimate/tools/data-diff.ts recommend partitioning for
">10M rows"; update one or the other so both are consistent: either change the
README phrasing to match the tool doc (e.g., "recommended for >10M rows") or
update the tool doc to justify and match "100M+ rows" and add a brief citation
to benchmark context; reference the strings "100M+ row tables" in README and the
threshold text ">10M rows" in data-diff.ts (and mention
`data_diff`/`/data-parity` in the doc update) when making the change.
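As an aside on the drift the reviewers flagged: one hypothetical way to keep such a threshold consistent between the tool description and README-facing docs (a pattern sketch, not something this repo is known to do) is to derive both strings from a single exported constant.

```ts
// Hypothetical pattern: names invented for illustration, not repo code.
export const PARTITION_THRESHOLD_ROWS = 10_000_000;

// The tool description and any generated docs read the same constant,
// so a claim like "10M+ rows" cannot silently diverge from the code.
export const partitionHint =
  `Partitioned execution is recommended for tables over ` +
  `${PARTITION_THRESHOLD_ROWS / 1_000_000}M rows.`;
```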
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="README.md">
<violation number="1" location="README.md:126">
P3: The threshold "100M+ row tables" appears inconsistent with the tool implementation in `data-diff.ts`, which recommends partitioned execution for >10M rows. Consider aligning this claim with the actual tool threshold to avoid overstating the capability.</violation>
</file>
P3: The threshold "100M+ row tables" appears inconsistent with the tool implementation in data-diff.ts, which recommends partitioned execution for >10M rows. Consider aligning this claim with the actual tool threshold to avoid overstating the capability.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At README.md, line 126:
<comment>The threshold "100M+ row tables" appears inconsistent with the tool implementation in `data-diff.ts`, which recommends partitioned execution for >10M rows. Consider aligning this claim with the actual tool threshold to avoid overstating the capability.</comment>
<file context>
@@ -122,6 +122,9 @@ Credit analysis, expensive query detection, warehouse right-sizing, unused resou
Transpile SQL between Snowflake, BigQuery, Databricks, Redshift, PostgreSQL, MySQL, SQL Server, and DuckDB.
+### Cross-Warehouse Data Parity
+Row-level and column-level diffing across 12 warehouses (including cross-dialect pairs like Postgres ↔ Snowflake or Databricks ↔ Fabric) via the `data_diff` tool and `/data-parity` skill. Five algorithms — `auto`, `joindiff`, `hashdiff`, `profile`, and `cascade` — partitioned execution for 100M+ row tables, and a `profile`-only mode for PII / PHI / PCI environments.
+
### PII Detection & Safety
</file context>
Suggested change:
- Row-level and column-level diffing across 12 warehouses (including cross-dialect pairs like Postgres ↔ Snowflake or Databricks ↔ Fabric) via the `data_diff` tool and `/data-parity` skill. Five algorithms — `auto`, `joindiff`, `hashdiff`, `profile`, and `cascade` — partitioned execution for 100M+ row tables, and a `profile`-only mode for PII / PHI / PCI environments.
+ Row-level and column-level diffing across 12 warehouses (including cross-dialect pairs like Postgres ↔ Snowflake or Databricks ↔ Fabric) via the `data_diff` tool and `/data-parity` skill. Five algorithms — `auto`, `joindiff`, `hashdiff`, `profile`, and `cascade` — partitioned execution for 10M+ row tables, and a `profile`-only mode for PII / PHI / PCI environments.
What does this PR do?
Updates `README.md` so it reflects what actually shipped between v0.5.12 and v0.6.1 on `main`. The Changelog section had stalled at v0.5.11 (March 2026), and the warehouse / LLM provider lists were missing additions from the same period.

Changelog section — added one-liners for the missing releases, trimmed three lower-signal v0.5.x entries (`v0.5.5`, `v0.5.3`, `v0.5.1`) to keep the section length steady:

- … (`INFORMATION_SCHEMA` columns, multi-region), advisory `anti-slop` workflow
- … (`data_diff` + `/data-parity`), Microsoft Fabric / MSSQL, Databricks AI Gateway, Bedrock custom-endpoints guide
- … `sql_explain` trace list pagination

Key Features — new "Cross-Warehouse Data Parity" entry for the `data_diff` tool and `/data-parity` skill (v0.6.0). This is a major user-visible capability that wasn't surfaced anywhere on the landing README.
Works with Any LLM — added Databricks AI Gateway (v0.6.0, #649), Snowflake Cortex (v0.5.6), and LM Studio (v0.5.7). The latter two were already supported but had been missed in earlier README sweeps.
No code changes — this is README only.
Type of change
- Documentation update (`README.md`)

Issue for this PR
N/A — documentation refresh; no specific tracking issue. The changes are derived directly from `CHANGELOG.md` entries for v0.5.14 → v0.6.1 already merged to `main`.

How did you verify your code works?
- … `CHANGELOG.md`.
- `data_diff` tool registered (feat: data-parity skill — TypeScript orchestrator, ClickHouse driver, partition support #493) and Fabric/MSSQL adapter shipped (feat: add MSSQL/Fabric support to data-parity skill #705).
- … `*.cloud.databricks.com` / `*.azuredatabricks.net` / `*.gcp.databricks.com`.
- 5 successful, 4 cached, 1 cache miss (altimate-code; passes).
- … `packages/opencode/src/`.

Checklist
Summary by cubic
Updates `README.md` to reflect v0.5.14–v0.6.1: adds the Cross-Warehouse Data Parity feature (`data_diff`, `/data-parity`), updates supported warehouses and LLM providers, and extends the changelog through v0.6.1.
Adds Microsoft Fabric and the Databricks AI Gateway, Snowflake Cortex, and LM Studio; trims a few lower-signal v0.5.x changelog entries to keep the section concise.
Written for commit ea643a7.
Summary by CodeRabbit
New Features
Documentation