feat: pluggable per-project Slack notifications with KPI comparison#103
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Warning Review limit reached
Next review available in: 15 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughIntroduces a pluggable per-project Slack notification provider framework: a ChangesSlack Notification Provider Framework
Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant ExportPipeline
participant MCPGatewaySlackProvider
participant MLflow
participant SlackAPI
ExportPipeline->>MCPGatewaySlackProvider: notify(context)
MCPGatewaySlackProvider->>MCPGatewaySlackProvider: format_message(context)
MCPGatewaySlackProvider->>MCPGatewaySlackProvider: _load_current_kpis(context)
MCPGatewaySlackProvider->>MLflow: _load_previous_kpis_from_mlflow()
MLflow-->>MCPGatewaySlackProvider: previous run metrics
MCPGatewaySlackProvider->>MCPGatewaySlackProvider: build_comparison_table / _skip_notification
MCPGatewaySlackProvider->>SlackAPI: search_channel_message(anchor)
SlackAPI-->>MCPGatewaySlackProvider: existing message or none
MCPGatewaySlackProvider->>SlackAPI: send_message(formatted message)
SlackAPI-->>MCPGatewaySlackProvider: success/failure
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/test fournos mcp_gateway smoke |
🟢 Execution of
|
🟢 Submission of
|
|
/test fournos mcp_gateway smoke |
1 similar comment
|
/test fournos mcp_gateway smoke |
🔴 Submission of
|
| DEFAULT_SLACK_TOKEN_FILE = "topsail-bot.slack-token" | ||
| DEFAULT_SECRET_ENV_KEYS = ( | ||
| "PSAP_FORGE_NOTIFICATIONS_SECRET_PATH", | ||
| "PSAP_FORGE_JUMP_CI_SECRET_PATH", |
There was a problem hiding this comment.
you shouldn't need that (the jump-ci is deprecated, it was before Fournos got ready)
|
/test fournos mcp_gateway smoke |
1 similar comment
|
/test fournos mcp_gateway smoke |
🟢 Execution of
|
🟢 Submission of
|
|
/test fournos mcp_gateway smoke |
🟢 Execution of
|
🟢 Submission of
|
|
/test fournos mcp_gateway smoke |
1 similar comment
|
/test fournos mcp_gateway smoke |
🟢 Execution of
|
🟢 Submission of
|
|
/test fournos mcp_gateway smoke |
3 similar comments
|
/test fournos mcp_gateway smoke |
|
/test fournos mcp_gateway smoke |
|
/test fournos mcp_gateway smoke |
🟢 Execution of
|
🟢 Submission of
|
🟢 Execution of
|
🟢 Submission of
|
Add a pluggable notification provider system allowing each FORGE project to define its own Slack notification behavior. Implement MCPGatewaySlackProvider with structured messages including KPI comparison against previous MLflow runs. - Add SlackNotificationProvider ABC and NotificationContext dataclass - Auto-discover provider via config.yaml (notifications.slack.provider_module) - Parameterize Slack API helpers to accept arbitrary channel_id - MCP Gateway provider: header, metadata, KPI table, artifact links - Reuse get_ocpci_link from core notification system for link generation Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Read duration from 000__ci_metadata/test_duration.yaml instead of
status dict (which is the caliper export result, not test metadata)
- Parse nested test labels format {"version": "1", "labels": {...}}
- Skip CI artifact links when not in OpenShift CI (avoid broken URLs)
- Add MLflow run link from caliper export status when available
- Guard against status being non-dict in link extraction
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The jump-ci secret path is deprecated (pre-Fournos), only the forge notifications secret path is needed. Co-authored-by: Cursor <cursoragent@cursor.com>
…esolution - Create projects/core/notifications/helpers.py with framework-wide utilities (get_test_artifacts_root, read_test_duration, get_label_value, find_kpis_jsonl, extract_mlflow_url, KPI table formatting) - Use env.BASE_ARTIFACT_DIR to correctly resolve test artifacts root instead of the export phase's ARTIFACT_DIR - Parse nested __test_labels__.yaml format (version/labels structure) - Skip CI links when not in OpenShift CI, show MLflow link from status Co-authored-by: Cursor <cursoragent@cursor.com>
In Fournos, each step runs as a separate container with its own ARTIFACT_DIR and BASE_ARTIFACT_DIR. The correct root containing all test artifacts is caliper.export.from, which the export entrypoint sets to the shared /workspace/artifacts volume. Co-authored-by: Cursor <cursoragent@cursor.com>
MLflow stores metrics from metrics.json (e.g. requests_per_second, p95_ms, failure_rate) while kpis.jsonl uses prefixed IDs (mcp_gw_requests_per_second, mcp_gw_p95_ms, etc). Add mapping to translate between the two when querying previous run data. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Replace manual env-var/path secret lookup in provider.py with vault.get_vault_content_path for the Slack token. - Replace broken OCP CI links with MLflow run URL in notifications. - Add run name parsing and smart comparison matching: same load config, same version type (SHA vs release), and same preset. - Skip duplicate notifications when version/SHA is unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
9dad6a8 to
8ab7b36
Compare
|
/test fournos mcp_gateway demo |
🟢 Execution of
|
🟢 Submission of
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
projects/core/notifications/helpers.py (2)
141-147: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winUnguarded chained
.get()can raise if a nested key is explicitlyNone.If
context.status["caliper_artifacts_export"]or its"backends"key is present but explicitlyNone(rather than omitted),backends.get("mlflow", {})/.get("backends", {})will raiseAttributeErrorsince.get()is called onNone. This propagates out offormat_message()uncaught within this function.🛡️ Proposed fix
- backends = context.status.get("caliper_artifacts_export", {}).get("backends", {}) - mlflow_info = backends.get("mlflow", {}) + backends = (context.status.get("caliper_artifacts_export") or {}).get("backends") or {} + mlflow_info = backends.get("mlflow") or {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/core/notifications/helpers.py` around lines 141 - 147, The chained lookups in extract_mlflow_url can fail when caliper_artifacts_export or backends is explicitly None. Update extract_mlflow_url to guard each nested value from context.status before calling .get(), defaulting non-dict values to {} so mlflow_info lookup stays safe. Keep the same return behavior for run_url and experiment_url, but make the nested access in extract_mlflow_url resilient to None.
76-81: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winNon-deterministic "first match" selection from glob results.
read_test_duration(candidates[0]),get_label_value(labels_glob[0]), andfind_kpis_jsonl(for f in artifact_dir.glob(...): return f) all pick an arbitrary match when multiple candidate files exist under the artifact tree. Python'spathlib.Path.glob()returns paths in unspecified filesystem order, so with multi-step Fournos pipelines (prepare/test/export each with their own subdirs, as noted inget_test_artifacts_root's docstring), this can non-deterministically surface duration/label/KPI data from the wrong step or a stale run.♻️ Suggested fix (deterministic ordering)
- candidates = list(root.glob("**/000__ci_metadata/test_duration.yaml")) + candidates = sorted(root.glob("**/000__ci_metadata/test_duration.yaml")) if not candidates: return "" timing_file = candidates[0]Apply the analogous
sorted(...)fix toget_label_valueandfind_kpis_jsonl.Also applies to: 104-109, 125-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/core/notifications/helpers.py` around lines 76 - 81, The helper functions are choosing an arbitrary file when multiple matches exist, which makes `read_test_duration`, `get_label_value`, and `find_kpis_jsonl` non-deterministic. Update the glob-based selection in these helpers to use a deterministic order by sorting the matches before picking the first one, and keep the same selection pattern consistent across the `read_test_duration`, `get_label_value`, and `find_kpis_jsonl` logic.projects/core/notifications/provider.py (1)
114-125: 🩺 Stability & Availability | 🔵 Trivial | ⚖️ Poor tradeoffTOCTOU risk on thread-anchor creation.
search_channel_messagethensend_message(create anchor) is not atomic; if two notifications for the same anchor race (e.g., concurrent CI jobs finishing near-simultaneously, combined with Slack search indexing lag), both can fail to find an existing anchor and each posts a duplicate channel header message. This same pattern is duplicated in the MCP Gateway provider override. Not blocking, but worth being aware of as usage scales.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/core/notifications/provider.py` around lines 114 - 125, The thread-anchor creation flow in provider.py is subject to a TOCTOU race because search_channel_message and then send_message are not atomic, so concurrent notifications can create duplicate anchor messages. Update the anchor-creation logic around slack_api.search_channel_message and slack_api.send_message to use a safer single-creator pattern (or tolerate duplicates by rechecking/handling conflicts after send), and apply the same fix in the MCP Gateway provider override where this pattern is duplicated.projects/agentic_tools/ci_base.py (1)
186-197: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winConsider resolving the notification provider lazily, only when needed.
_resolve_notification_provider()runs in the group'smain()callback for every subcommand invocation (test, prepare, export, etc.), not just when a notification will actually be sent. Ifprovider_moduleis configured, this triggers a dynamic import/instantiation — and any resolution failure logs a warning — on every CI phase, even ones unrelated to notifications.Moving this call into the export/notification path (e.g., inside
caliper_export_entrypointorsend_notification) would scope the work and any failure warnings to where they're relevant.♻️ Suggested approach
- app.init_vaults(ctx.invoked_subcommand) - - ctx.obj.notification_provider = app._resolve_notification_provider() + app.init_vaults(ctx.invoked_subcommand) + + ctx.obj.resolve_notification_provider = app._resolve_notification_providerThen call
ctx.obj.resolve_notification_provider()from withincaliper_export_entrypointright before sending notifications.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/agentic_tools/ci_base.py` around lines 186 - 197, The notification provider is being resolved eagerly in the group-level main() callback, which causes unnecessary imports and warning logs for every CI subcommand. Move the _resolve_notification_provider() call out of main() and resolve it lazily only in the notification/export flow, such as within caliper_export_entrypoint or send_notification, using ctx.obj.resolve_notification_provider() right before a notification is actually sent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@projects/core/notifications/provider.py`:
- Around line 91-134: The base notify() flow in Provider should honor the
post-format skip flag instead of forcing subclasses to duplicate the whole
dispatch path. After format_message(context) in notify(), check context.extra
for _skip_notification and return True early when it is set, so
MCPGatewaySlackProvider.notify() no longer needs its custom override. Keep the
existing behavior around should_notify, token/client setup, dry_run, and
slack_api.send_message intact while adding the skip hook in Provider.notify().
In `@projects/mcp_gateway/orchestration/notifications.py`:
- Line 201: The KPI comparison in notifications is not bound to the current
MLflow run, so `_load_previous_kpis_from_mlflow()` can pick the newest unrelated
run when jobs overlap. Update the call site in `notifications.py` to pass the
current run identity from `NotificationContext` or artifact metadata, and change
`_load_previous_kpis_from_mlflow` to locate that exact run first before looking
for older matching runs. Keep the duplicate-suppression and comparison logic
tied to the resolved run so it uses the correct version/SHA.
- Around line 120-136: The duplicate-run suppression in notify() is checked too
late because get_slack_token() runs before format_message() sets
context.extra["_skip_notification"]. Move the duplicate-skip path earlier in
notify() so format_message() (and the _skip_notification check) happens before
any Slack credential or channel lookup, and keep should_notify() as the first
gate in this flow.
- Around line 76-86: The version parsing in the notification path returns a dict
even when a malformed vsha-* or versioned name cannot be parsed, which conflicts
with the documented behavior. Update the parsing logic in the function that
builds the {"config", "is_sha", "version"} result so that if the expected regex
match fails for either vsha-... or the non-SHA version format, it returns None
instead of a dict with version=None. Keep the valid-path behavior unchanged for
successfully parsed names.
---
Nitpick comments:
In `@projects/agentic_tools/ci_base.py`:
- Around line 186-197: The notification provider is being resolved eagerly in
the group-level main() callback, which causes unnecessary imports and warning
logs for every CI subcommand. Move the _resolve_notification_provider() call out
of main() and resolve it lazily only in the notification/export flow, such as
within caliper_export_entrypoint or send_notification, using
ctx.obj.resolve_notification_provider() right before a notification is actually
sent.
In `@projects/core/notifications/helpers.py`:
- Around line 141-147: The chained lookups in extract_mlflow_url can fail when
caliper_artifacts_export or backends is explicitly None. Update
extract_mlflow_url to guard each nested value from context.status before calling
.get(), defaulting non-dict values to {} so mlflow_info lookup stays safe. Keep
the same return behavior for run_url and experiment_url, but make the nested
access in extract_mlflow_url resilient to None.
- Around line 76-81: The helper functions are choosing an arbitrary file when
multiple matches exist, which makes `read_test_duration`, `get_label_value`, and
`find_kpis_jsonl` non-deterministic. Update the glob-based selection in these
helpers to use a deterministic order by sorting the matches before picking the
first one, and keep the same selection pattern consistent across the
`read_test_duration`, `get_label_value`, and `find_kpis_jsonl` logic.
In `@projects/core/notifications/provider.py`:
- Around line 114-125: The thread-anchor creation flow in provider.py is subject
to a TOCTOU race because search_channel_message and then send_message are not
atomic, so concurrent notifications can create duplicate anchor messages. Update
the anchor-creation logic around slack_api.search_channel_message and
slack_api.send_message to use a safer single-creator pattern (or tolerate
duplicates by rechecking/handling conflicts after send), and apply the same fix
in the MCP Gateway provider override where this pattern is duplicated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 50cbac82-fc19-4d82-bb22-3260e5474e0a
📒 Files selected for processing (7)
projects/agentic_tools/ci_base.pyprojects/core/library/export.pyprojects/core/notifications/helpers.pyprojects/core/notifications/provider.pyprojects/core/notifications/slack/api.pyprojects/mcp_gateway/orchestration/config.yamlprojects/mcp_gateway/orchestration/notifications.py
| def notify(self, context: NotificationContext, *, dry_run: bool = False) -> bool: | ||
| """Execute the full notification flow. | ||
|
|
||
| Returns True on success, False on failure. | ||
| """ | ||
| if not self.should_notify(context): | ||
| logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__) | ||
| return True | ||
|
|
||
| token = self.get_slack_token() | ||
| if not token: | ||
| logger.warning("Provider %s: no Slack token available", type(self).__name__) | ||
| return False | ||
|
|
||
| channel_id = self.get_channel_id() | ||
| message = self.format_message(context) | ||
| anchor = self.get_thread_anchor(context) | ||
|
|
||
| client = slack_api.init_client(token) | ||
| if not client: | ||
| logger.error("Provider %s: failed to init Slack client", type(self).__name__) | ||
| return False | ||
|
|
||
| channel_msg_ts, _ = slack_api.search_channel_message(client, anchor, channel_id=channel_id) | ||
|
|
||
| if not channel_msg_ts: | ||
| channel_message = f"🧵 {anchor}" | ||
| if dry_run: | ||
| logger.info("Would post channel message: %s", channel_message) | ||
| else: | ||
| channel_msg_ts, ok = slack_api.send_message( | ||
| client, message=channel_message, channel_id=channel_id | ||
| ) | ||
| if not ok: | ||
| return False | ||
|
|
||
| if dry_run: | ||
| logger.info("Would post thread message:\n%s", message) | ||
| return True | ||
|
|
||
| _, ok = slack_api.send_message( | ||
| client, message=message, main_ts=channel_msg_ts, channel_id=channel_id | ||
| ) | ||
| return ok |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Base notify() doesn't support post-format skip, forcing full duplication downstream.
MCPGatewaySlackProvider.notify() (see projects/mcp_gateway/orchestration/notifications.py) overrides this entire ~40-line method just to add one check — if context.extra.get("_skip_notification"): return True — after format_message(). Since format_message() is where _skip_notification gets set (via _format_kpi_table), the base class should support this hook natively instead of forcing subclasses to copy the whole dispatch flow.
♻️ Proposed fix
channel_id = self.get_channel_id()
message = self.format_message(context)
+ if context.extra.get("_skip_notification"):
+ logger.info("Provider %s: skip_notification flag set, skipping", type(self).__name__)
+ return True
anchor = self.get_thread_anchor(context)Downstream MCPGatewaySlackProvider.notify() could then be removed entirely.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def notify(self, context: NotificationContext, *, dry_run: bool = False) -> bool: | |
| """Execute the full notification flow. | |
| Returns True on success, False on failure. | |
| """ | |
| if not self.should_notify(context): | |
| logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__) | |
| return True | |
| token = self.get_slack_token() | |
| if not token: | |
| logger.warning("Provider %s: no Slack token available", type(self).__name__) | |
| return False | |
| channel_id = self.get_channel_id() | |
| message = self.format_message(context) | |
| anchor = self.get_thread_anchor(context) | |
| client = slack_api.init_client(token) | |
| if not client: | |
| logger.error("Provider %s: failed to init Slack client", type(self).__name__) | |
| return False | |
| channel_msg_ts, _ = slack_api.search_channel_message(client, anchor, channel_id=channel_id) | |
| if not channel_msg_ts: | |
| channel_message = f"🧵 {anchor}" | |
| if dry_run: | |
| logger.info("Would post channel message: %s", channel_message) | |
| else: | |
| channel_msg_ts, ok = slack_api.send_message( | |
| client, message=channel_message, channel_id=channel_id | |
| ) | |
| if not ok: | |
| return False | |
| if dry_run: | |
| logger.info("Would post thread message:\n%s", message) | |
| return True | |
| _, ok = slack_api.send_message( | |
| client, message=message, main_ts=channel_msg_ts, channel_id=channel_id | |
| ) | |
| return ok | |
| def notify(self, context: NotificationContext, *, dry_run: bool = False) -> bool: | |
| """Execute the full notification flow. | |
| Returns True on success, False on failure. | |
| """ | |
| if not self.should_notify(context): | |
| logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__) | |
| return True | |
| token = self.get_slack_token() | |
| if not token: | |
| logger.warning("Provider %s: no Slack token available", type(self).__name__) | |
| return False | |
| channel_id = self.get_channel_id() | |
| message = self.format_message(context) | |
| if context.extra.get("_skip_notification"): | |
| logger.info("Provider %s: skip_notification flag set, skipping", type(self).__name__) | |
| return True | |
| anchor = self.get_thread_anchor(context) | |
| client = slack_api.init_client(token) | |
| if not client: | |
| logger.error("Provider %s: failed to init Slack client", type(self).__name__) | |
| return False | |
| channel_msg_ts, _ = slack_api.search_channel_message(client, anchor, channel_id=channel_id) | |
| if not channel_msg_ts: | |
| channel_message = f"🧵 {anchor}" | |
| if dry_run: | |
| logger.info("Would post channel message: %s", channel_message) | |
| else: | |
| channel_msg_ts, ok = slack_api.send_message( | |
| client, message=channel_message, channel_id=channel_id | |
| ) | |
| if not ok: | |
| return False | |
| if dry_run: | |
| logger.info("Would post thread message:\n%s", message) | |
| return True | |
| _, ok = slack_api.send_message( | |
| client, message=message, main_ts=channel_msg_ts, channel_id=channel_id | |
| ) | |
| return ok |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@projects/core/notifications/provider.py` around lines 91 - 134, The base
notify() flow in Provider should honor the post-format skip flag instead of
forcing subclasses to duplicate the whole dispatch path. After
format_message(context) in notify(), check context.extra for _skip_notification
and return True early when it is set, so MCPGatewaySlackProvider.notify() no
longer needs its custom override. Keep the existing behavior around
should_notify, token/client setup, dry_run, and slack_api.send_message intact
while adding the skip hook in Provider.notify().
| if rest.startswith("vsha-"): | ||
| is_sha = True | ||
| sha_match = re.match(r"vsha-([a-f0-9]+)-\d{8}-\d{6}$", rest) | ||
| if sha_match: | ||
| version = sha_match.group(1) | ||
| else: | ||
| version_match = re.match(r"v?(.+?)-\d{8}-\d{6}$", rest) | ||
| if version_match: | ||
| version = version_match.group(1) | ||
|
|
||
| return {"config": config, "is_sha": is_sha, "version": version} |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Return None when the version cannot be parsed.
The docstring says unparseable names return None, but malformed vsha-... names currently return a dict with version=None. That can later make two malformed runs compare equal and suppress a notification.
Proposed fix
else:
version_match = re.match(r"v?(.+?)-\d{8}-\d{6}$", rest)
if version_match:
version = version_match.group(1)
+ if version is None:
+ return None
+
return {"config": config, "is_sha": is_sha, "version": version}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if rest.startswith("vsha-"): | |
| is_sha = True | |
| sha_match = re.match(r"vsha-([a-f0-9]+)-\d{8}-\d{6}$", rest) | |
| if sha_match: | |
| version = sha_match.group(1) | |
| else: | |
| version_match = re.match(r"v?(.+?)-\d{8}-\d{6}$", rest) | |
| if version_match: | |
| version = version_match.group(1) | |
| return {"config": config, "is_sha": is_sha, "version": version} | |
| if rest.startswith("vsha-"): | |
| is_sha = True | |
| sha_match = re.match(r"vsha-([a-f0-9]+)-\d{8}-\d{6}$", rest) | |
| if sha_match: | |
| version = sha_match.group(1) | |
| else: | |
| version_match = re.match(r"v?(.+?)-\d{8}-\d{6}$", rest) | |
| if version_match: | |
| version = version_match.group(1) | |
| if version is None: | |
| return None | |
| return {"config": config, "is_sha": is_sha, "version": version} |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@projects/mcp_gateway/orchestration/notifications.py` around lines 76 - 86,
The version parsing in the notification path returns a dict even when a
malformed vsha-* or versioned name cannot be parsed, which conflicts with the
documented behavior. Update the parsing logic in the function that builds the
{"config", "is_sha", "version"} result so that if the expected regex match fails
for either vsha-... or the non-SHA version format, it returns None instead of a
dict with version=None. Keep the valid-path behavior unchanged for successfully
parsed names.
| def notify(self, context: NotificationContext, *, dry_run: bool = False) -> bool: | ||
| """Override to skip notification for duplicate version/SHA runs.""" | ||
| if not self.should_notify(context): | ||
| logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__) | ||
| return True | ||
|
|
||
| token = self.get_slack_token() | ||
| if not token: | ||
| logger.warning("Provider %s: no Slack token available", type(self).__name__) | ||
| return False | ||
|
|
||
| channel_id = self.get_channel_id() | ||
| message = self.format_message(context) | ||
|
|
||
| if context.extra.get("_skip_notification"): | ||
| logger.info("Skipping notification: duplicate version/SHA already notified") | ||
| return True |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Evaluate duplicate suppression before requiring Slack credentials.
format_message() is what sets context.extra["_skip_notification"], but the current order returns False on a missing Slack token before that skip can be detected. Duplicate runs should be skipped without needing Slack token/channel resolution.
Proposed fix
if not self.should_notify(context):
logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__)
return True
+ message = self.format_message(context)
+
+ if context.extra.get("_skip_notification"):
+ logger.info("Skipping notification: duplicate version/SHA already notified")
+ return True
+
token = self.get_slack_token()
if not token:
logger.warning("Provider %s: no Slack token available", type(self).__name__)
return False
channel_id = self.get_channel_id()
- message = self.format_message(context)
-
- if context.extra.get("_skip_notification"):
- logger.info("Skipping notification: duplicate version/SHA already notified")
- return True📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def notify(self, context: NotificationContext, *, dry_run: bool = False) -> bool: | |
| """Override to skip notification for duplicate version/SHA runs.""" | |
| if not self.should_notify(context): | |
| logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__) | |
| return True | |
| token = self.get_slack_token() | |
| if not token: | |
| logger.warning("Provider %s: no Slack token available", type(self).__name__) | |
| return False | |
| channel_id = self.get_channel_id() | |
| message = self.format_message(context) | |
| if context.extra.get("_skip_notification"): | |
| logger.info("Skipping notification: duplicate version/SHA already notified") | |
| return True | |
| def notify(self, context: NotificationContext, *, dry_run: bool = False) -> bool: | |
| """Override to skip notification for duplicate version/SHA runs.""" | |
| if not self.should_notify(context): | |
| logger.info("Provider %s: should_notify returned False, skipping", type(self).__name__) | |
| return True | |
| message = self.format_message(context) | |
| if context.extra.get("_skip_notification"): | |
| logger.info("Skipping notification: duplicate version/SHA already notified") | |
| return True | |
| token = self.get_slack_token() | |
| if not token: | |
| logger.warning("Provider %s: no Slack token available", type(self).__name__) | |
| return False | |
| channel_id = self.get_channel_id() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@projects/mcp_gateway/orchestration/notifications.py` around lines 120 - 136,
The duplicate-run suppression in notify() is checked too late because
get_slack_token() runs before format_message() sets
context.extra["_skip_notification"]. Move the duplicate-skip path earlier in
notify() so format_message() (and the _skip_notification check) happens before
any Slack credential or channel lookup, and keep should_notify() as the first
gate in this flow.
| if not current_kpis: | ||
| return "" | ||
|
|
||
| previous_kpis, previous_run_name, skip = _load_previous_kpis_from_mlflow() |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Bind the comparison to this notification’s MLflow run.
_load_previous_kpis_from_mlflow() has no context, so it treats the newest MLflow run as “current”. Parallel or delayed jobs can make runs[0] belong to another run, causing the KPI comparison and duplicate suppression to target the wrong version/SHA. Pass the current run identity from NotificationContext/artifact metadata and find that run explicitly before scanning older matches.
Also applies to: 276-340
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@projects/mcp_gateway/orchestration/notifications.py` at line 201, The KPI
comparison in notifications is not bound to the current MLflow run, so
`_load_previous_kpis_from_mlflow()` can pick the newest unrelated run when jobs
overlap. Update the call site in `notifications.py` to pass the current run
identity from `NotificationContext` or artifact metadata, and change
`_load_previous_kpis_from_mlflow` to locate that exact run first before looking
for older matching runs. Keep the duplicate-suppression and comparison logic
tied to the resolved run so it uses the correct version/SHA.
- Add _skip_notification check in base SlackNotificationProvider.notify() so subclasses don't need to duplicate the entire dispatch flow. - Remove notify() override from MCPGatewaySlackProvider. - Accept current_run_id in _load_previous_kpis_from_mlflow() and locate the current run by ID to avoid races with parallel/delayed jobs. Co-authored-by: Cursor <cursoragent@cursor.com>
|
/test fournos mcp_gateway demo |
🟢 Execution of
|
🟢 Submission of
|
| # --------------------------------------------------------------------------- | ||
| # KPI comparison table formatting | ||
| # --------------------------------------------------------------------------- | ||
|
|
There was a problem hiding this comment.
in another PR, this will have to move to the caliper project (as helper functions just like that)
| # --------------------------------------------------------------------------- | ||
| # KPI file discovery | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| def find_kpis_jsonl(artifact_dir: Path) -> Path | None: | ||
| """Find kpis.jsonl in artifact directory tree.""" | ||
| direct = artifact_dir / "kpis.jsonl" | ||
| if direct.exists(): | ||
| return direct | ||
|
|
||
| for f in artifact_dir.glob("**/kpis.jsonl"): | ||
| return f | ||
| return None |
There was a problem hiding this comment.
same, will have to move to caliper
| TARGET_KPIS = [ | ||
| ("mcp_gw_requests_per_second", "RPS", "req/s"), | ||
| ("mcp_gw_p95_ms", "P95 latency", "ms"), | ||
| ("mcp_gw_p99_ms", "P99 latency", "ms"), | ||
| ("mcp_gw_failure_rate", "Failure rate", "%"), | ||
| ] |
There was a problem hiding this comment.
FYI, this file will break when I merge the PR refining the KPI structure
|
thanks Avi, let's merge this. There will be some work to do:
|
Add a pluggable notification provider system allowing each FORGE project to define its own Slack notification behavior. Implement MCPGatewaySlackProvider with structured messages including KPI comparison against previous MLflow runs.
Summary by CodeRabbit
New Features
Bug Fixes