[llm-d] Work on the KPI generation by kpouget · Pull Request #100 · openshift-psap/forge

kpouget · 2026-06-30T17:32:34Z

Summary by CodeRabbit

New Features
- Added MLflow artifact import from a web UI URL, including improved download reporting and result summaries.
- Introduced KPI auto-discovery with KPI CSV export and an HTML KPI summary report for GuideLLM.
- Enhanced postprocessing to generate richer step logs, KPI/AI-eval exports (with per-entry artifacts), and include postprocess status details in export notifications.
- Expanded replot with URL-based downloading and a --keep-download option.
Bug Fixes
- Improved CLI error messaging (including missing artifact-import output directory), with clearer handling for TLS/cert, connection/auth, timeouts, and filesystem failures.

openshift-ci · 2026-06-30T17:32:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tosokin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-30T17:33:08Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR adds MLflow web-UI URL-based artifact import to the Caliper CLI, introduces a KPI decorator/discovery/catalog framework, refactors GuideLLM KPI computation and adds HTML/CSV KPI reporting, reworks postprocess orchestration into a class-based flow with step logging and AI-eval artifact export, changes replot's download path to shell out via CLI, and renames a shared CI decorator.

Changes

Caliper and GuideLLM pipeline changes

Layer / File(s)	Summary
MLflow import flow `projects/caliper/cli/main.py`	Adds MLflow URL parsing/validation, connection setup (secrets, TLS), client/experiment init, timeout-bound artifact download, result formatting, new `--from-mlflow-url` option, and Click `MissingParameter` handling.
KPI decorators and discovery `projects/caliper/engine/kpi/decorators.py`, `projects/caliper/engine/kpi/__init__.py`	Adds decorator classes for KPI metadata/formatting/2D axes, label extraction utilities, KPI function discovery, and catalog building, exported via package `__init__`.
GuideLLM KPI computation `projects/guidellm/postprocess/guidellm/parsing/kpis.py`	Replaces hardcoded KPI mappings with decorated module-level KPI functions, dynamic catalog/computation via discovery, and per-record label/metadata extraction.
KPI CSV export support `projects/guidellm/postprocess/guidellm/csv_export/*`	Adds `KPICsvRow`/`KPICsvSchema` models, `KPICsvExporter`, list/model export helpers, and package exports.
KPI report generation and plugin wiring `projects/guidellm/postprocess/guidellm/plotting/kpi_report.py`, `projects/guidellm/postprocess/guidellm/plugin.py`, `projects/guidellm/postprocess/guidellm/visualize-groups.yaml`	Adds HTML KPI report generation, registers `report_kpi_summary` in `PLOT_REGISTRY`, adds `get_ai_eval_artifact_files` hook, and updates visualization groups.

Estimated code review effort: 4 (Complex) | ~75 minutes

Postprocess orchestration, step logging, and AI-eval export

Layer / File(s)	Summary
AI evaluation artifact export `projects/caliper/engine/model.py`, `projects/caliper/engine/ai_eval.py`	Adds `get_ai_eval_artifact_files` plugin hook and reworks AI-eval export to copy per-test-entry artifacts and record exported entry metadata.
KPI CSV export configuration `projects/caliper/orchestration/postprocess_config.py`	Adds `CaliperOrchestrationKpiCsvExportSection` and wires it into KPI orchestration config with defaults.
Step-scoped logging and command banners `projects/caliper/orchestration/step_logging.py`	Introduces thread-local step log handlers, indexed step logging context managers, cleanup, and reproducible command-banner logging per step.
Postprocess orchestrator refactor `projects/caliper/orchestration/postprocess.py`	Refactors KPI generate/export/AI-eval export, adds KPI CSV export and analyze stub with command logging, and replaces prior function flow with `CaliperPostprocessOrchestrator`.
Postprocess status notification formatting `projects/caliper/orchestration/notification.py`	Adds `PostprocessStepResult`/`PostprocessResult` dataclasses and functions to format Markdown notifications and parse raw status data.

Estimated code review effort: 4 (Complex) | ~60 minutes

Replot MLflow download refactor

Layer / File(s)	Summary
CLI-based MLflow artifact download `projects/caliper/orchestration/replot.py`	Replaces direct MLflow SDK download with a subprocess call to the CLI `artifacts import` command, diffing files and centralizing error handling.
Conditional download and postprocess skipping `projects/caliper/orchestration/replot.py`	Adds skip-download logic when no URL is provided or directory already has content, and refines cleanup status.
Replot CLI options and keep-directory precedence `projects/core/library/replot.py`	Adds `--url` and `--keep-download` CLI options, removes AWS vault secret handling, and gives CLI keep-flag precedence over config.

Estimated code review effort: 3 (Moderate) | ~30 minutes

CI entrypoint decorator rename

Layer / File(s)	Summary
safe_ci_entrypoint decorator rename `projects/core/library/postprocess.py`, `projects/core/agentic/on_failure/__init__.py`, `projects/core/ci_entrypoint/fournos_resolve.py`, `docs/orchestration/ci.md`	Renames `@ci_lib.safe_ci_command` to `@ci_lib.safe_ci_entrypoint` across entrypoints and docs.
Export notification MLflow link wiring `projects/core/library/export.py`	Adds MLflow URL helpers for step/file links and integrates postprocess status parsing/notification into step log export.

Estimated code review effort: 2 (Simple) | ~15 minutes

KServe deployment capture

Layer / File(s)	Summary
Related deployments capture task `projects/kserve/toolbox/capture_llmisvc_state/main.py`	Adds a task that runs `oc get deployments` for a service label and writes the result to a JSON artifact file.

Estimated code review effort: 1 (Trivial) | ~5 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is related to the main theme of the PR and accurately indicates KPI-generation work, even though it is broad.
Docstring Coverage	✅ Passed	Docstring coverage is 88.74% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

kpouget · 2026-06-30T17:34:42Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

psap-forge-bot · 2026-06-30T17:48:20Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 2 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 36 seconds`

Test Description

This test performs a smoke validation of the llm_d project by deploying the Qwen3-0.6B model using the approximate-prefix-cache deployment profile. It verifies inference functionality with a short benchmark workload (256 prompt tokens, 128 output tokens at 1 req/sec) and a default completion request.

✅ 04 Post-Cleanup `8 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-06-30T17:48:32Z

🟢 Submission of llm_d smoke finished after 13 minutes, 7 seconds 🟢

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

kpouget · 2026-06-30T18:48:37Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

psap-forge-bot · 2026-06-30T19:01:12Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 6 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 36 seconds`

Test Description

This test validates the llm_d project by executing a smoke check and short benchmark using the Qwen/Qwen3-0.6B model deployed with the approximate-prefix-cache profile and internal gateway routing.

✅ 04 Post-Cleanup `8 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-06-30T19:01:35Z

🟢 Submission of llm_d smoke finished after 12 minutes, 10 seconds 🟢

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

kpouget · 2026-06-30T20:34:54Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

projects/caliper/engine/kpi/decorators.py (1)

166-173: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Drop the hardcoded KPI name prefixes from discovery.

This engine-level helper only finds decorated functions whose names start with guidellm_, matbench_, or kpi_, so a correctly decorated KPI silently disappears if it uses any other name. Prefer filtering by decorator metadata plus obj.__module__ == module.__name__ instead of encoding downstream plugin prefixes here.

♻️ Possible direction

-    for name, obj in inspect.getmembers(module, inspect.isfunction):
-        if (
-            name.startswith(("guidellm_", "matbench_", "kpi_"))
-            and hasattr(obj, "_kpi_help")
-            and hasattr(obj, "_kpi_unit")
-            and hasattr(obj, "_kpi_higher_is_better")
-        ):
-            kpi_functions[name] = obj
+    for name, obj in inspect.getmembers(module, inspect.isfunction):
+        if (
+            obj.__module__ == module.__name__
+            and hasattr(obj, "_kpi_help")
+            and hasattr(obj, "_kpi_unit")
+            and hasattr(obj, "_kpi_higher_is_better")
+        ):
+            kpi_functions[name] = obj

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/decorators.py` around lines 166 - 173, Remove the
hardcoded name-prefix check from the KPI discovery loop in the module-scanning
helper so decorated KPIs are found regardless of function name. Update the logic
in the inspect.getmembers-based scan to rely on the decorator metadata fields
(_kpi_help, _kpi_unit, _kpi_higher_is_better) and ensure the function belongs to
the current module via obj.__module__ == module.__name__ before adding it to
kpi_functions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/cli/main.py`:
- Around line 209-214: The secrets-based tracking URI is only applied inside the
verbose logging path, so `MLFLOW_TRACKING_URI` and
`MLFLOW_TRACKING_INSECURE_TLS` from `--mlflow-secrets` should be processed
regardless of `-v`. Update the flow in `main()` so the connection settings
(`final_tracking_uri`, `final_insecure_tls`) are populated before any
verbose-only exit/logic, and make sure the secrets handling branch that reads
`MLFLOW_TRACKING_URI` and `MLFLOW_TRACKING_INSECURE_TLS` is not gated by
`click.echo` or `verbose` state.
- Around line 58-60: The endpoint construction in main.py is stripping the path
from parsed URLs, which breaks path-prefixed MLflow tracking links. Update the
endpoint logic around the parsed.scheme/parsed.netloc handling to preserve any
non-empty parsed.path (while still removing fragment/query as needed) so URLs
like https://host/mlflow/#/... continue to point to the correct tracking URI.
- Around line 1037-1041: The --timeout option in main.py currently allows 0 or
negative values, which can disable the alarm and remove the download timeout.
Update the click.option definition for --timeout to enforce a minimum of 1,
using the existing timeout option in the CLI entrypoint so invalid values are
rejected before reaching the alarm setup.

In `@projects/guidellm/postprocess/guidellm/parsing/kpis.py`:
- Around line 137-145: The fallback in guidellm_throughput_curve() is
synthesizing fake 2D KPI points from request_rate when throughput_curve is
missing, which makes GuideLLMKpiHandler.compute_kpis() and the HTML report treat
invented data as real measurements. Remove the scalar-derived fallback in the
KPI parsing path (including the duplicate logic around the other referenced
section) and return an empty list whenever unified_record.metrics does not
contain throughput_curve, keeping only the real curve parsing from the existing
tuple data.

In `@projects/guidellm/postprocess/guidellm/plotting/kpi_report.py`:
- Around line 30-40: The KPI report is incorrectly treating a multi-record input
as a single record by reading only records[0] in the label/KPI assembly path.
Update GuideLLM KPI report generation to either explicitly aggregate/group the
records before building labels and values, or raise an error when more than one
record is passed; make the change in the code path that uses first_record,
LABEL_EXTRACTOR.extract, and all_labels so the output cannot silently ignore
extra records.
- Line 133: The HTML generation in kpi_report.py is interpolating untrusted
benchmark fields directly into the report, so escape every dynamic value before
rendering. Update the code around the report_title interpolation and the other
HTML builders that use labels, KPI descriptions, and test metadata so they all
pass through the same escaping helper before being inserted into f-strings. Use
the existing report rendering functions and any shared formatting helpers in
kpi_report.py to apply this consistently across all affected sections.
- Around line 64-80: The KPI rendering in kpi_report is re-deriving precision
instead of honoring the decorator metadata, so update the report generation and
2D rendering paths to use the format strings already attached to the KPI
function (_kpi_format, _kpi_x_format, and _kpi_y_format). In the kpi_record
assembly and the HTML/value formatting logic that consumes it, preserve and
apply those declared formats from the KPI function rather than forcing decimal
places, including the 2D-specific branches for x and y values.

---

Nitpick comments:
In `@projects/caliper/engine/kpi/decorators.py`:
- Around line 166-173: Remove the hardcoded name-prefix check from the KPI
discovery loop in the module-scanning helper so decorated KPIs are found
regardless of function name. Update the logic in the inspect.getmembers-based
scan to rely on the decorator metadata fields (_kpi_help, _kpi_unit,
_kpi_higher_is_better) and ensure the function belongs to the current module via
obj.__module__ == module.__name__ before adding it to kpi_functions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f5ba352-2848-4e1b-9740-70a85402604b

📥 Commits

Reviewing files that changed from the base of the PR and between 0fb7938 and 93d092d.

📒 Files selected for processing (7)

projects/caliper/cli/main.py
projects/caliper/engine/kpi/__init__.py
projects/caliper/engine/kpi/decorators.py
projects/guidellm/postprocess/guidellm/parsing/kpis.py
projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
projects/guidellm/postprocess/guidellm/plugin.py
projects/guidellm/postprocess/guidellm/visualize-groups.yaml

psap-forge-bot · 2026-06-30T20:58:48Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 5 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 33 seconds`

Test Description

This test performs a smoke validation for the llm_d project to verify the approximate-prefix-cache deployment profile using the lightweight Qwen/Qwen3-0.6B model. It specifically evaluates basic inference under a short benchmark configuration with 256 prompt tokens, 128 output tokens, and a concurrency rate of 1.

✅ 04 Post-Cleanup `8 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-06-30T20:59:11Z

🟢 Submission of llm_d smoke finished after 23 minutes, 48 seconds 🟢

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

kpouget · 2026-07-01T08:09:31Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full #

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

projects/caliper/orchestration/step_logging.py (1)
1-422: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Run ruff format on this file before merge.

The current formatting is already failing both Ruff format checks in CI.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/step_logging.py` around lines 1 - 422, The
file’s formatting is not Ruff-compliant, so run ruff format on step_logging.py
and keep the resulting style changes only. Focus on preserving the existing
behavior in functions like _get_next_step_index, step_logging_indexed,
step_logging, and the log_*_command helpers while aligning spacing, wrapping,
and line breaks with Ruff’s formatter so CI passes.
Source: Pipeline failures

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/orchestration/postprocess_config.py`:
- Around line 81-84: The csv_export output field currently allows null in
postprocess_config, which later breaks _run_kpi_csv_export() when it combines
output_dir with postprocess_config.kpi.csv_export.output. Update the
CsvExport/PostprocessConfig field definition so csv_export.output is required to
be a non-null string, keeping the default of "kpis.csv" and removing any None
allowance from the type or validation.

In `@projects/caliper/orchestration/postprocess.py`:
- Around line 583-600: The HTML report generation in the postprocess flow is
using the optional visualize path instead of the resolved visualize output
directory, so the final pages can be skipped when callers rely on
caliper.postprocess.visualize.output_dir. Update the report-writing block in
postprocess.py to use the resolved output directory chosen by run_visualize(),
and pass that resolved path into generate_caliper_reports_index and
generate_postprocess_status_report so both reports are always written to the
actual visualization output location.
- Around line 228-230: The CSV export path in postprocess output handling is not
sandboxed, so absolute paths or path traversal can escape the artifact
directory. Update the output path logic in the postprocess flow around
output_file creation to resolve the configured CSV export path against
output_dir and enforce that the final path stays within that directory, using
the existing postprocess_config.kpi.csv_export.output and output_file handling.
Reject or normalize any config value that would write outside the postprocess
output dir.
- Around line 503-510: The KPI CSV export step in postprocess.py is logging an
input path from postprocess_config.kpi.generate.output even when the KPI
generate step did not run or failed, so adjust the _run_kpi_csv_export call path
handling to only reference a real JSONL file. Use the existing identifiers
step_logging, kpi_csv_export, and postprocess_config.kpi.generate to determine
whether the generate output exists; if it does not, avoid emitting a
reproduction banner with a fake --input value and instead pass a valid created
path or skip logging that command.

In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_exporter.py`:
- Around line 49-53: The KPI CSV export flow in KpiCsvExporter currently treats
2D records as scalar data when include_2d_kpis is enabled, which silently
corrupts the export. Update the 2D handling branch in the exporter logic to fail
fast instead of appending those records to scalar_records, and surface a clear
error from the export path so callers know 2D KPI CSV export is not supported
yet. Keep the fix localized to the record-processing code that checks
record.get("is_2d", False) and the subsequent CSV conversion path.

In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_model.py`:
- Around line 205-208: The KPI CSV export header is using the column name
instead of the actual schema version value. Update the header generation in the
KPI CSV model so the Schema Version comment reads from the stored schema version
data rather than looking up the 'schema_version' entry in self.columns, and keep
the change localized to the header-building logic.

In `@projects/guidellm/postprocess/guidellm/parsing/kpis.py`:
- Around line 344-350: `performance_tier` is comparing
`record.metrics.get("total_tok/sec", 0)` directly, so string values can raise
TypeError and break `TestLabelExtractor.extract()` fallback behavior. Update the
KPI derivation in `compute_kpis()`/the `performance_tier` mapping to normalize
`total_tok/sec` to a numeric type before the threshold checks, and make sure any
access through `record.metrics` handles missing or non-numeric values
consistently.
- Around line 245-250: `guidellm_request_latency_max` is incorrectly annotated
as higher-is-better, which reverses the ranking semantics for this latency KPI.
Update the decorators on `guidellm_request_latency_max` in `kpis.py` to use the
lower-is-better equivalent instead of `@HigherBetter()`, while keeping the
existing `@Format("{:.4f}")` and `@KPIMetadata` metadata unchanged so consumers
interpret higher latency as worse.
- Around line 424-432: The KPI record built in compute_kpis is missing decorator
metadata such as help, so create_csv_row_from_kpi_record cannot populate the CSV
kpi_help column. Update the kpi_record assembly in compute_kpis to carry through
the existing decorator fields from kpi_func, especially the help text (and any
other relevant metadata already available on the KPI definition), so downstream
CSV export reads them instead of emitting empty values.

---

Outside diff comments:
In `@projects/caliper/orchestration/step_logging.py`:
- Around line 1-422: The file’s formatting is not Ruff-compliant, so run ruff
format on step_logging.py and keep the resulting style changes only. Focus on
preserving the existing behavior in functions like _get_next_step_index,
step_logging_indexed, step_logging, and the log_*_command helpers while aligning
spacing, wrapping, and line breaks with Ruff’s formatter so CI passes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69d184da-e489-49ef-9e9b-27f5dbfc9f81

📥 Commits

Reviewing files that changed from the base of the PR and between 93d092d and 73e4ca4.

📒 Files selected for processing (13)

projects/caliper/cli/main.py
projects/caliper/engine/kpi/__init__.py
projects/caliper/engine/kpi/decorators.py
projects/caliper/orchestration/postprocess.py
projects/caliper/orchestration/postprocess_config.py
projects/caliper/orchestration/step_logging.py
projects/guidellm/postprocess/guidellm/csv_export/__init__.py
projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_exporter.py
projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_model.py
projects/guidellm/postprocess/guidellm/parsing/kpis.py
projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
projects/guidellm/postprocess/guidellm/plugin.py
projects/guidellm/postprocess/guidellm/visualize-groups.yaml

✅ Files skipped from review due to trivial changes (1)

projects/caliper/engine/kpi/init.py

🚧 Files skipped from review as they are similar to previous changes (5)

projects/guidellm/postprocess/guidellm/plugin.py
projects/guidellm/postprocess/guidellm/visualize-groups.yaml
projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
projects/caliper/engine/kpi/decorators.py
projects/caliper/cli/main.py

coderabbitai · 2026-07-01T08:14:44Z

+    output: str | None = Field(
+        default="kpis.csv",
+        description="CSV filename or path; relative paths resolve under the post-processing artifact dir.",
+    )


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't allow csv_export.output to be null.

_run_kpi_csv_export() later does output_dir / postprocess_config.kpi.csv_export.output, so output: null passes validation here and then fails at runtime with TypeError as soon as the step is enabled.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/caliper/orchestration/postprocess_config.py` around lines 81 - 84, The csv_export output field currently allows null in postprocess_config, which later breaks _run_kpi_csv_export() when it combines output_dir with postprocess_config.kpi.csv_export.output. Update the CsvExport/PostprocessConfig field definition so csv_export.output is required to be a non-null string, keeping the default of "kpis.csv" and removing any None allowance from the type or validation.

coderabbitai · 2026-07-01T08:14:44Z

+        # Create output file path
+        output_file = output_dir / postprocess_config.kpi.csv_export.output
+        output_file.parent.mkdir(parents=True, exist_ok=True)


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Keep CSV exports inside the postprocess output dir.

output_dir / postprocess_config.kpi.csv_export.output is not a sandbox: an absolute value discards output_dir, and .. segments still escape it after resolution. A bad config can therefore write outside the artifact tree despite the config contract saying relative paths stay under the post-processing dir.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/caliper/orchestration/postprocess.py` around lines 228 - 230, The CSV export path in postprocess output handling is not sandboxed, so absolute paths or path traversal can escape the artifact directory. Update the output path logic in the postprocess flow around output_file creation to resolve the configured CSV export path against output_dir and enforce that the final path stays within that directory, using the existing postprocess_config.kpi.csv_export.output and output_file handling. Reject or normalize any config value that would write outside the postprocess output dir.

coderabbitai · 2026-07-01T08:14:44Z

+                # KPI CSV export (uses the same computed KPIs)
+                if postprocess_config.kpi.csv_export.enabled:
+                    with step_logging("caliper_kpi_csv_export", step_logs_dir):
+                        # Path to the JSONL file for reference in command logging
+                        kpi_jsonl_path = output_dir / postprocess_config.kpi.generate.output
+                        kpi_results["kpi_csv_export"] = _run_kpi_csv_export(
+                            postprocess_config, plugin, model, output_dir, kpi_jsonl_path
+                        )


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don't log a CSV-export input file that was never created.

This always points the reproduction banner at output_dir / postprocess_config.kpi.generate.output, even when kpi.generate was disabled or failed. The logged caliper kpi csv-export --input ... command is then not actually reproducible.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/caliper/orchestration/postprocess.py` around lines 503 - 510, The KPI CSV export step in postprocess.py is logging an input path from postprocess_config.kpi.generate.output even when the KPI generate step did not run or failed, so adjust the _run_kpi_csv_export call path handling to only reference a real JSONL file. Use the existing identifiers step_logging, kpi_csv_export, and postprocess_config.kpi.generate to determine whether the generate output exists; if it does not, avoid emitting a reproduction banner with a fake --input value and instead pass a valid created path or skip logging that command.

coderabbitai · 2026-07-01T08:14:44Z

+            if record.get("is_2d", False):
+                if self.include_2d_kpis:
+                    # TODO: Implement 2D KPI export
+                    scalar_records.append(record)
+                else:


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Fail fast when include_2d_kpis=True.

That branch currently appends raw 2D KPI records into a scalar CSV schema. The later conversion just stringifies the point list into one cell and drops the x/y metadata, so callers get a “successful” export with corrupted 2D data.

Proposed fix

if record.get("is_2d", False): if self.include_2d_kpis: - # TODO: Implement 2D KPI export - scalar_records.append(record) + raise NotImplementedError("2D KPI CSV export is not implemented yet") else: skipped_2d_count += 1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if record.get("is_2d", False):

if self.include_2d_kpis:

# TODO: Implement 2D KPI export

scalar_records.append(record)

else:

if record.get("is_2d", False):

if self.include_2d_kpis:

raise NotImplementedError("2D KPI CSV export is not implemented yet")

else:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_exporter.py` around lines 49 - 53, The KPI CSV export flow in KpiCsvExporter currently treats 2D records as scalar data when include_2d_kpis is enabled, which silently corrupts the export. Update the 2D handling branch in the exporter logic to fail fast instead of appending those records to scalar_records, and surface a clear error from the export path so callers know 2D KPI CSV export is not supported yet. Keep the fix localized to the record-processing code that checks record.get("is_2d", False) and the subsequent CSV conversion path.

coderabbitai · 2026-07-01T08:14:45Z

+        header_lines.append("# GuideLLM KPI Export")
+        header_lines.append(
+            f"# Schema Version: {self.columns[self.columns.index('schema_version')]}"
+        )


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Write the actual schema version in the header comment.

This currently emits # Schema Version: schema_version because it reads the column name from self.columns instead of the version value. The exported header is misleading as written.

Proposed fix

- header_lines.append( - f"# Schema Version: {self.columns[self.columns.index('schema_version')]}" - ) + header_lines.append("# Schema Version: 1")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

header_lines.append("# GuideLLM KPI Export")

header_lines.append(

f"# Schema Version: {self.columns[self.columns.index('schema_version')]}"

)

header_lines.append("# GuideLLM KPI Export")

header_lines.append("# Schema Version: 1")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_model.py` around lines 205 - 208, The KPI CSV export header is using the column name instead of the actual schema version value. Update the header generation in the KPI CSV model so the Schema Version comment reads from the stored schema version data rather than looking up the 'schema_version' entry in self.columns, and keep the change localized to the header-building logic.

coderabbitai · 2026-07-01T08:14:45Z

+            "performance_tier": (
+                "high"
+                if record.metrics.get("total_tok/sec", 0) > 1000
+                else "medium"
+                if record.metrics.get("total_tok/sec", 0) > 100
+                else "low"
+            ),


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Cast total_tok/sec before deriving performance_tier.

If record.metrics["total_tok/sec"] arrives as a string, these > comparisons raise TypeError. TestLabelExtractor.extract() then falls back to {}, which drops every extracted test-condition label for that record in both compute_kpis() and the HTML report.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@projects/guidellm/postprocess/guidellm/parsing/kpis.py` around lines 344 - 350, `performance_tier` is comparing `record.metrics.get("total_tok/sec", 0)` directly, so string values can raise TypeError and break `TestLabelExtractor.extract()` fallback behavior. Update the KPI derivation in `compute_kpis()`/the `performance_tier` mapping to normalize `total_tok/sec` to a numeric type before the threshold checks, and make sure any access through `record.metrics` handles missing or non-numeric values consistently.

psap-forge-bot · 2026-07-01T08:20:57Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `1 minute, 55 seconds`

✅ 02 Preflight `1 second`

❌ 02 Test `6 minutes, 38 seconds`

Test Description

This test validates the llm_d project by running a smoke evaluation of the Qwen/Qwen3-0.6B model deployed with the approximate-prefix-cache profile. It specifically tests basic inference stability and prefix caching behavior using a lightweight short benchmark configured for 256 prompt tokens, 128 output tokens, and a single concurrent request rate.

✅ 04 Post-Cleanup `9 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-07-01T08:21:10Z

🔴 Submission of llm_d smoke finished after 10 minutes, 54 seconds 🔴

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full #

kpouget · 2026-07-01T08:36:22Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

psap-forge-bot · 2026-07-01T08:48:58Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 19 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 34 seconds`

Test Description

This test executes a smoke test for the llm_d project using the Qwen/Qwen3-0.6B model deployed with the approximate-prefix-cache profile, running a short benchmark workload of 20 samples with 256 prompt tokens and 128 output tokens at a concurrent rate of 1.

✅ 04 Post-Cleanup `7 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-07-01T08:49:13Z

🟢 Submission of llm_d smoke succeeded after 11 minutes, 14 seconds 🟢

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

kpouget · 2026-07-01T09:02:56Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

psap-forge-bot · 2026-07-01T09:14:02Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `1 minute, 52 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 33 seconds`

Test Description

This test validates the llm_d project by applying the smoke and deployment-approximate-prefix-cache presets to evaluate the approximate prefix caching deployment profile. It verifies basic inference functionality and latency using a short benchmark workload (256 input/128 output tokens at rate 1) with the Qwen/Qwen3-0.6B model.

✅ 04 Post-Cleanup `7 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-07-01T09:14:18Z

🟢 Submission of llm_d smoke succeeded after 10 minutes, 40 seconds 🟢

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

…tification

…the generated files

…e steps

…he smoke test

…PI value, not 0

kpouget · 2026-07-03T10:10:15Z

/test fournos llm_d smoke
/cluster psap-mgmt
/pipeline forge-replot
/replot.url https://mlflow.apps.psap-automation.ibm.rhperfscale.org/#/experiments/231/runs/392dafd5908a4447bad9d3b427838bb5/artifacts/02__test/001__llmd_test?workspace=forge-llmd

psap-forge-bot · 2026-07-03T10:14:01Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides:
    caliper.replot.url: https://mlflow.apps.psap-automation.ibm.rhperfscale.org/#/experiments/231/runs/392dafd5908a4447bad9d3b427838bb5/artifacts/02__test/001__llmd_test?workspace=forge-llmd
  project: llm_d

Artifact Links

Test Logs

✅ 000 Replot `20 seconds`

🔄 001 Export-Artifacts

Post-processing Status ✅

✅ parse: success
✅ visualize: success
- 📄 report_00_guidellm_performance_analysis.html
- 📄 kpi_report.html
✅ kpi_generate: success
- 📄 kpis.json
✅ kpi_csv_export: success
- 📊 kpis.csv
⏭️ kpi_export: skipped

KPI export not yet implemented
✅ ai_eval_export: success
- 📁 AI Eval Directory
- 📄 ai_eval_payload.json

psap-forge-bot · 2026-07-03T10:14:19Z

🟢 Submission of llm_d smoke succeeded after 3 minutes, 15 seconds 🟢

/test fournos llm_d smoke
/cluster psap-mgmt
/pipeline forge-replot
/replot.url https://mlflow.apps.psap-automation.ibm.rhperfscale.org/#/experiments/231/runs/392dafd5908a4447bad9d3b427838bb5/artifacts/02__test/001__llmd_test?workspace=forge-llmd

kpouget · 2026-07-03T10:45:50Z

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

psap-forge-bot · 2026-07-03T10:50:31Z

🟢 Execution of `llm_d` smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

✅ 00 Pre-Cleanup `5 seconds`

❌ 01 Prepare `2 minutes, 22 seconds`

Failure Review 011 Cluster Deploy Operator Rhcl-Operator

011__cluster_deploy_operator__rhcl-operator

The cluster_deploy_operator task failed during the wait_for_csv_ready step because the rhcl-operator.v1.4.1 ClusterServiceVersion (CSV) transitioned to a Failed phase instead of reaching Succeeded. Although the subscription and install plan were successfully created and approved, the operator installation encountered an error that prevented completion, causing the deployment workflow to abort with a critical runtime exception.

✅ 04 Post-Cleanup `6 seconds`

🔄 05 Export-Artifacts

psap-forge-bot · 2026-07-03T10:50:45Z

🔴 Submission of llm_d smoke failed after 4 minutes, 13 seconds 🔴

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

openshift-ci · 2026-07-03T10:50:53Z

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/fournos	`0e5b508`	link	true	`/test fournos`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

kpouget force-pushed the llmd branch from fd7c2ab to c8d2109 Compare June 30, 2026 17:34

kpouget force-pushed the llmd branch from c8d2109 to 5a1e3f8 Compare June 30, 2026 18:24

kpouget force-pushed the llmd branch from 5a1e3f8 to 93d092d Compare June 30, 2026 20:33

coderabbitai Bot reviewed Jun 30, 2026

View reviewed changes

kpouget force-pushed the llmd branch 2 times, most recently from 73e4ca4 to b1bc7f7 Compare July 1, 2026 08:09

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

kpouget force-pushed the llmd branch from b1bc7f7 to 2913cf8 Compare July 1, 2026 08:34

kpouget force-pushed the llmd branch from 5082a3a to fadad76 Compare July 1, 2026 09:02

kpouget force-pushed the llmd branch from fadad76 to 2d13f6d Compare July 1, 2026 09:17

kpouget added 20 commits July 3, 2026 10:42

[caliper] orchestration: postprocess: restructure

fc79178

[caliper] orchestration: keep working

3897e18

[caliper] engine: ai_eval: rework

356073d

[core] library: replot: finish removing the AWS vault token

f9196dd

[core] library: replot: accept a URL to replot

10c3cc4

[caliper] orchestration: replot: finish removing MLFlow/AWS references

bcfbe84

[caliper] orchestration: replot: use 'caliper import'

c4add06

[caliper] orchestration: replot: finish the update

6463800

[projects] Finish using the @ci_lib.safe_ci_entrypoint annotation

4aa4443

[core] library: export: include the caliper-generated files in the no…

32b04be

…tification

[caliper] orchestration: notification: generate the notification for …

3398c46

…the generated files

[caliper] orchestration: postprocess: track the completion time of th…

50723a4

…e steps

[llm_d] orchestration: config: enable the csv_export

5dda20d

[projects] Use JSON instead of JSONL for kpi files

9ef33a0

[llm_d] orchestration: config: workloads: capture multiple rates in t…

f76330b

…he smoke test

[caliper] orchestration: postprocess: work on the kpi.json definition

828ce9d

[projects] postprocess: use None as default KPI value, not 0

036d333

[rhaiis] postprocess: kpis: s/run_id/run_path in the KPIs definition

7a30c8c

[skeleton] postprocess: default: parsing/kpis: use the KPI annotations

3170ad9

[guidellm] postprocess: guidellm: parsing/kpis: use None as default K…

cae3d60

…PI value, not 0

kpouget force-pushed the llmd branch from 17b0262 to 770ccf8 Compare July 3, 2026 09:55

[guidellm] postprocess: guidellm: parsing: enhance

0e5b508

kpouget force-pushed the llmd branch from 770ccf8 to 0e5b508 Compare July 3, 2026 10:08

Uh oh!

Conversation

kpouget commented Jun 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented Jun 30, 2026

Uh oh!

coderabbitai Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

kpouget commented Jun 30, 2026

Uh oh!

psap-forge-bot Bot commented Jun 30, 2026

🟢 Execution of llm_d smoke 🟢

✅ 00 Pre-Cleanup 5 seconds

✅ 01 Prepare 2 minutes, 2 seconds

✅ 02 Preflight 1 second

✅ 02 Test 6 minutes, 36 seconds

Test Description

✅ 04 Post-Cleanup 8 seconds

🔄 05 Export-Artifacts

Uh oh!

psap-forge-bot Bot commented Jun 30, 2026

Uh oh!

kpouget commented Jun 30, 2026

Uh oh!

psap-forge-bot Bot commented Jun 30, 2026

🟢 Execution of llm_d smoke 🟢

✅ 00 Pre-Cleanup 5 seconds

✅ 01 Prepare 2 minutes, 6 seconds

✅ 02 Preflight 1 second

✅ 02 Test 6 minutes, 36 seconds

Test Description

✅ 04 Post-Cleanup 8 seconds

🔄 05 Export-Artifacts

Uh oh!

psap-forge-bot Bot commented Jun 30, 2026

Uh oh!

kpouget commented Jun 30, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

psap-forge-bot Bot commented Jun 30, 2026

🟢 Execution of llm_d smoke 🟢

✅ 00 Pre-Cleanup 5 seconds

✅ 01 Prepare 2 minutes, 5 seconds

✅ 02 Preflight 1 second

✅ 02 Test 6 minutes, 33 seconds

Test Description

✅ 04 Post-Cleanup 8 seconds

🔄 05 Export-Artifacts

Uh oh!

psap-forge-bot Bot commented Jun 30, 2026

Uh oh!

kpouget commented Jul 1, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

kpouget commented Jun 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

🟢 Execution of `llm_d` smoke 🟢

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 2 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 36 seconds`

✅ 04 Post-Cleanup `8 seconds`

🟢 Execution of `llm_d` smoke 🟢

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 6 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 36 seconds`

✅ 04 Post-Cleanup `8 seconds`

🟢 Execution of `llm_d` smoke 🟢

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 5 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 33 seconds`

✅ 04 Post-Cleanup `8 seconds`

🟢 Execution of `llm_d` smoke 🟢

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `1 minute, 55 seconds`

✅ 02 Preflight `1 second`

❌ 02 Test `6 minutes, 38 seconds`

✅ 04 Post-Cleanup `9 seconds`

🟢 Execution of `llm_d` smoke 🟢

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `2 minutes, 19 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 34 seconds`

✅ 04 Post-Cleanup `7 seconds`

🟢 Execution of `llm_d` smoke 🟢

✅ 00 Pre-Cleanup `5 seconds`

✅ 01 Prepare `1 minute, 52 seconds`

✅ 02 Preflight `1 second`

✅ 02 Test `6 minutes, 33 seconds`

✅ 04 Post-Cleanup `7 seconds`

🟢 Execution of `llm_d` smoke 🟢

✅ 000 Replot `20 seconds`