Skip to content

[llm-d] Work on the KPI generation#100

Open
kpouget wants to merge 27 commits into
openshift-psap:mainfrom
kpouget:llmd
Open

[llm-d] Work on the KPI generation#100
kpouget wants to merge 27 commits into
openshift-psap:mainfrom
kpouget:llmd

Conversation

@kpouget

@kpouget kpouget commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features
    • Added MLflow artifact import from a web UI URL, including improved download reporting and result summaries.
    • Introduced KPI auto-discovery with KPI CSV export and an HTML KPI summary report for GuideLLM.
    • Enhanced postprocessing to generate richer step logs, KPI/AI-eval exports (with per-entry artifacts), and include postprocess status details in export notifications.
    • Expanded replot with URL-based downloading and a --keep-download option.
  • Bug Fixes
    • Improved CLI error messaging (including missing artifact-import output directory), with clearer handling for TLS/cert, connection/auth, timeouts, and filesystem failures.

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tosokin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds MLflow web-UI URL-based artifact import to the Caliper CLI, introduces a KPI decorator/discovery/catalog framework, refactors GuideLLM KPI computation and adds HTML/CSV KPI reporting, reworks postprocess orchestration into a class-based flow with step logging and AI-eval artifact export, changes replot's download path to shell out via CLI, and renames a shared CI decorator.

Changes

Caliper and GuideLLM pipeline changes

Layer / File(s) Summary
MLflow import flow
projects/caliper/cli/main.py
Adds MLflow URL parsing/validation, connection setup (secrets, TLS), client/experiment init, timeout-bound artifact download, result formatting, new --from-mlflow-url option, and Click MissingParameter handling.
KPI decorators and discovery
projects/caliper/engine/kpi/decorators.py, projects/caliper/engine/kpi/__init__.py
Adds decorator classes for KPI metadata/formatting/2D axes, label extraction utilities, KPI function discovery, and catalog building, exported via package __init__.
GuideLLM KPI computation
projects/guidellm/postprocess/guidellm/parsing/kpis.py
Replaces hardcoded KPI mappings with decorated module-level KPI functions, dynamic catalog/computation via discovery, and per-record label/metadata extraction.
KPI CSV export support
projects/guidellm/postprocess/guidellm/csv_export/*
Adds KPICsvRow/KPICsvSchema models, KPICsvExporter, list/model export helpers, and package exports.
KPI report generation and plugin wiring
projects/guidellm/postprocess/guidellm/plotting/kpi_report.py, projects/guidellm/postprocess/guidellm/plugin.py, projects/guidellm/postprocess/guidellm/visualize-groups.yaml
Adds HTML KPI report generation, registers report_kpi_summary in PLOT_REGISTRY, adds get_ai_eval_artifact_files hook, and updates visualization groups.

Estimated code review effort: 4 (Complex) | ~75 minutes

Postprocess orchestration, step logging, and AI-eval export

Layer / File(s) Summary
AI evaluation artifact export
projects/caliper/engine/model.py, projects/caliper/engine/ai_eval.py
Adds get_ai_eval_artifact_files plugin hook and reworks AI-eval export to copy per-test-entry artifacts and record exported entry metadata.
KPI CSV export configuration
projects/caliper/orchestration/postprocess_config.py
Adds CaliperOrchestrationKpiCsvExportSection and wires it into KPI orchestration config with defaults.
Step-scoped logging and command banners
projects/caliper/orchestration/step_logging.py
Introduces thread-local step log handlers, indexed step logging context managers, cleanup, and reproducible command-banner logging per step.
Postprocess orchestrator refactor
projects/caliper/orchestration/postprocess.py
Refactors KPI generate/export/AI-eval export, adds KPI CSV export and analyze stub with command logging, and replaces prior function flow with CaliperPostprocessOrchestrator.
Postprocess status notification formatting
projects/caliper/orchestration/notification.py
Adds PostprocessStepResult/PostprocessResult dataclasses and functions to format Markdown notifications and parse raw status data.

Estimated code review effort: 4 (Complex) | ~60 minutes

Replot MLflow download refactor

Layer / File(s) Summary
CLI-based MLflow artifact download
projects/caliper/orchestration/replot.py
Replaces direct MLflow SDK download with a subprocess call to the CLI artifacts import command, diffing files and centralizing error handling.
Conditional download and postprocess skipping
projects/caliper/orchestration/replot.py
Adds skip-download logic when no URL is provided or directory already has content, and refines cleanup status.
Replot CLI options and keep-directory precedence
projects/core/library/replot.py
Adds --url and --keep-download CLI options, removes AWS vault secret handling, and gives CLI keep-flag precedence over config.

Estimated code review effort: 3 (Moderate) | ~30 minutes

CI entrypoint decorator rename

Layer / File(s) Summary
safe_ci_entrypoint decorator rename
projects/core/library/postprocess.py, projects/core/agentic/on_failure/__init__.py, projects/core/ci_entrypoint/fournos_resolve.py, docs/orchestration/ci.md
Renames @ci_lib.safe_ci_command to @ci_lib.safe_ci_entrypoint across entrypoints and docs.
Export notification MLflow link wiring
projects/core/library/export.py
Adds MLflow URL helpers for step/file links and integrates postprocess status parsing/notification into step log export.

Estimated code review effort: 2 (Simple) | ~15 minutes

KServe deployment capture

Layer / File(s) Summary
Related deployments capture task
projects/kserve/toolbox/capture_llmisvc_state/main.py
Adds a task that runs oc get deployments for a service label and writes the result to a JSON artifact file.

Estimated code review effort: 1 (Trivial) | ~5 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is related to the main theme of the PR and accurately indicates KPI-generation work, even though it is broad.
Docstring Coverage ✅ Passed Docstring coverage is 88.74% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@kpouget

kpouget commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

@psap-forge-bot

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 2 minutes, 2 seconds

02 Preflight 1 second

02 Test 6 minutes, 36 seconds

Test Description

This test performs a smoke validation of the llm_d project by deploying the Qwen3-0.6B model using the approximate-prefix-cache deployment profile. It verifies inference functionality with a short benchmark workload (256 prompt tokens, 128 output tokens at 1 req/sec) and a default completion request.

04 Post-Cleanup 8 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

Copy link
Copy Markdown

@kpouget

kpouget commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

@psap-forge-bot

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 2 minutes, 6 seconds

02 Preflight 1 second

02 Test 6 minutes, 36 seconds

Test Description

This test validates the llm_d project by executing a smoke check and short benchmark using the Qwen/Qwen3-0.6B model deployed with the approximate-prefix-cache profile and internal gateway routing.

04 Post-Cleanup 8 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

Copy link
Copy Markdown

@kpouget

kpouget commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
projects/caliper/engine/kpi/decorators.py (1)

166-173: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Drop the hardcoded KPI name prefixes from discovery.

This engine-level helper only finds decorated functions whose names start with guidellm_, matbench_, or kpi_, so a correctly decorated KPI silently disappears if it uses any other name. Prefer filtering by decorator metadata plus obj.__module__ == module.__name__ instead of encoding downstream plugin prefixes here.

♻️ Possible direction
-    for name, obj in inspect.getmembers(module, inspect.isfunction):
-        if (
-            name.startswith(("guidellm_", "matbench_", "kpi_"))
-            and hasattr(obj, "_kpi_help")
-            and hasattr(obj, "_kpi_unit")
-            and hasattr(obj, "_kpi_higher_is_better")
-        ):
-            kpi_functions[name] = obj
+    for name, obj in inspect.getmembers(module, inspect.isfunction):
+        if (
+            obj.__module__ == module.__name__
+            and hasattr(obj, "_kpi_help")
+            and hasattr(obj, "_kpi_unit")
+            and hasattr(obj, "_kpi_higher_is_better")
+        ):
+            kpi_functions[name] = obj
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/decorators.py` around lines 166 - 173, Remove the
hardcoded name-prefix check from the KPI discovery loop in the module-scanning
helper so decorated KPIs are found regardless of function name. Update the logic
in the inspect.getmembers-based scan to rely on the decorator metadata fields
(_kpi_help, _kpi_unit, _kpi_higher_is_better) and ensure the function belongs to
the current module via obj.__module__ == module.__name__ before adding it to
kpi_functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/cli/main.py`:
- Around line 209-214: The secrets-based tracking URI is only applied inside the
verbose logging path, so `MLFLOW_TRACKING_URI` and
`MLFLOW_TRACKING_INSECURE_TLS` from `--mlflow-secrets` should be processed
regardless of `-v`. Update the flow in `main()` so the connection settings
(`final_tracking_uri`, `final_insecure_tls`) are populated before any
verbose-only exit/logic, and make sure the secrets handling branch that reads
`MLFLOW_TRACKING_URI` and `MLFLOW_TRACKING_INSECURE_TLS` is not gated by
`click.echo` or `verbose` state.
- Around line 58-60: The endpoint construction in main.py is stripping the path
from parsed URLs, which breaks path-prefixed MLflow tracking links. Update the
endpoint logic around the parsed.scheme/parsed.netloc handling to preserve any
non-empty parsed.path (while still removing fragment/query as needed) so URLs
like https://host/mlflow/#/... continue to point to the correct tracking URI.
- Around line 1037-1041: The --timeout option in main.py currently allows 0 or
negative values, which can disable the alarm and remove the download timeout.
Update the click.option definition for --timeout to enforce a minimum of 1,
using the existing timeout option in the CLI entrypoint so invalid values are
rejected before reaching the alarm setup.

In `@projects/guidellm/postprocess/guidellm/parsing/kpis.py`:
- Around line 137-145: The fallback in guidellm_throughput_curve() is
synthesizing fake 2D KPI points from request_rate when throughput_curve is
missing, which makes GuideLLMKpiHandler.compute_kpis() and the HTML report treat
invented data as real measurements. Remove the scalar-derived fallback in the
KPI parsing path (including the duplicate logic around the other referenced
section) and return an empty list whenever unified_record.metrics does not
contain throughput_curve, keeping only the real curve parsing from the existing
tuple data.

In `@projects/guidellm/postprocess/guidellm/plotting/kpi_report.py`:
- Around line 30-40: The KPI report is incorrectly treating a multi-record input
as a single record by reading only records[0] in the label/KPI assembly path.
Update GuideLLM KPI report generation to either explicitly aggregate/group the
records before building labels and values, or raise an error when more than one
record is passed; make the change in the code path that uses first_record,
LABEL_EXTRACTOR.extract, and all_labels so the output cannot silently ignore
extra records.
- Line 133: The HTML generation in kpi_report.py is interpolating untrusted
benchmark fields directly into the report, so escape every dynamic value before
rendering. Update the code around the report_title interpolation and the other
HTML builders that use labels, KPI descriptions, and test metadata so they all
pass through the same escaping helper before being inserted into f-strings. Use
the existing report rendering functions and any shared formatting helpers in
kpi_report.py to apply this consistently across all affected sections.
- Around line 64-80: The KPI rendering in kpi_report is re-deriving precision
instead of honoring the decorator metadata, so update the report generation and
2D rendering paths to use the format strings already attached to the KPI
function (_kpi_format, _kpi_x_format, and _kpi_y_format). In the kpi_record
assembly and the HTML/value formatting logic that consumes it, preserve and
apply those declared formats from the KPI function rather than forcing decimal
places, including the 2D-specific branches for x and y values.

---

Nitpick comments:
In `@projects/caliper/engine/kpi/decorators.py`:
- Around line 166-173: Remove the hardcoded name-prefix check from the KPI
discovery loop in the module-scanning helper so decorated KPIs are found
regardless of function name. Update the logic in the inspect.getmembers-based
scan to rely on the decorator metadata fields (_kpi_help, _kpi_unit,
_kpi_higher_is_better) and ensure the function belongs to the current module via
obj.__module__ == module.__name__ before adding it to kpi_functions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f5ba352-2848-4e1b-9740-70a85402604b

📥 Commits

Reviewing files that changed from the base of the PR and between 0fb7938 and 93d092d.

📒 Files selected for processing (7)
  • projects/caliper/cli/main.py
  • projects/caliper/engine/kpi/__init__.py
  • projects/caliper/engine/kpi/decorators.py
  • projects/guidellm/postprocess/guidellm/parsing/kpis.py
  • projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
  • projects/guidellm/postprocess/guidellm/plugin.py
  • projects/guidellm/postprocess/guidellm/visualize-groups.yaml

Comment thread projects/caliper/cli/main.py Outdated
Comment thread projects/caliper/cli/main.py Outdated
Comment thread projects/caliper/cli/main.py
Comment thread projects/guidellm/postprocess/guidellm/parsing/kpis.py Outdated
Comment thread projects/guidellm/postprocess/guidellm/plotting/kpi_report.py Outdated
Comment thread projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
Comment thread projects/guidellm/postprocess/guidellm/plotting/kpi_report.py Outdated
@psap-forge-bot

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 2 minutes, 5 seconds

02 Preflight 1 second

02 Test 6 minutes, 33 seconds

Test Description

This test performs a smoke validation for the llm_d project to verify the approximate-prefix-cache deployment profile using the lightweight Qwen/Qwen3-0.6B model. It specifically evaluates basic inference under a short benchmark configuration with 256 prompt tokens, 128 output tokens, and a concurrency rate of 1.

04 Post-Cleanup 8 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

Copy link
Copy Markdown

@kpouget kpouget force-pushed the llmd branch 2 times, most recently from 73e4ca4 to b1bc7f7 Compare July 1, 2026 08:09
@kpouget

kpouget commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full #

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
projects/caliper/orchestration/step_logging.py (1)

1-422: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Run ruff format on this file before merge.

The current formatting is already failing both Ruff format checks in CI.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/step_logging.py` around lines 1 - 422, The
file’s formatting is not Ruff-compliant, so run ruff format on step_logging.py
and keep the resulting style changes only. Focus on preserving the existing
behavior in functions like _get_next_step_index, step_logging_indexed,
step_logging, and the log_*_command helpers while aligning spacing, wrapping,
and line breaks with Ruff’s formatter so CI passes.

Source: Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/orchestration/postprocess_config.py`:
- Around line 81-84: The csv_export output field currently allows null in
postprocess_config, which later breaks _run_kpi_csv_export() when it combines
output_dir with postprocess_config.kpi.csv_export.output. Update the
CsvExport/PostprocessConfig field definition so csv_export.output is required to
be a non-null string, keeping the default of "kpis.csv" and removing any None
allowance from the type or validation.

In `@projects/caliper/orchestration/postprocess.py`:
- Around line 583-600: The HTML report generation in the postprocess flow is
using the optional visualize path instead of the resolved visualize output
directory, so the final pages can be skipped when callers rely on
caliper.postprocess.visualize.output_dir. Update the report-writing block in
postprocess.py to use the resolved output directory chosen by run_visualize(),
and pass that resolved path into generate_caliper_reports_index and
generate_postprocess_status_report so both reports are always written to the
actual visualization output location.
- Around line 228-230: The CSV export path in postprocess output handling is not
sandboxed, so absolute paths or path traversal can escape the artifact
directory. Update the output path logic in the postprocess flow around
output_file creation to resolve the configured CSV export path against
output_dir and enforce that the final path stays within that directory, using
the existing postprocess_config.kpi.csv_export.output and output_file handling.
Reject or normalize any config value that would write outside the postprocess
output dir.
- Around line 503-510: The KPI CSV export step in postprocess.py is logging an
input path from postprocess_config.kpi.generate.output even when the KPI
generate step did not run or failed, so adjust the _run_kpi_csv_export call path
handling to only reference a real JSONL file. Use the existing identifiers
step_logging, kpi_csv_export, and postprocess_config.kpi.generate to determine
whether the generate output exists; if it does not, avoid emitting a
reproduction banner with a fake --input value and instead pass a valid created
path or skip logging that command.

In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_exporter.py`:
- Around line 49-53: The KPI CSV export flow in KpiCsvExporter currently treats
2D records as scalar data when include_2d_kpis is enabled, which silently
corrupts the export. Update the 2D handling branch in the exporter logic to fail
fast instead of appending those records to scalar_records, and surface a clear
error from the export path so callers know 2D KPI CSV export is not supported
yet. Keep the fix localized to the record-processing code that checks
record.get("is_2d", False) and the subsequent CSV conversion path.

In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_model.py`:
- Around line 205-208: The KPI CSV export header is using the column name
instead of the actual schema version value. Update the header generation in the
KPI CSV model so the Schema Version comment reads from the stored schema version
data rather than looking up the 'schema_version' entry in self.columns, and keep
the change localized to the header-building logic.

In `@projects/guidellm/postprocess/guidellm/parsing/kpis.py`:
- Around line 344-350: `performance_tier` is comparing
`record.metrics.get("total_tok/sec", 0)` directly, so string values can raise
TypeError and break `TestLabelExtractor.extract()` fallback behavior. Update the
KPI derivation in `compute_kpis()`/the `performance_tier` mapping to normalize
`total_tok/sec` to a numeric type before the threshold checks, and make sure any
access through `record.metrics` handles missing or non-numeric values
consistently.
- Around line 245-250: `guidellm_request_latency_max` is incorrectly annotated
as higher-is-better, which reverses the ranking semantics for this latency KPI.
Update the decorators on `guidellm_request_latency_max` in `kpis.py` to use the
lower-is-better equivalent instead of `@HigherBetter()`, while keeping the
existing `@Format("{:.4f}")` and `@KPIMetadata` metadata unchanged so consumers
interpret higher latency as worse.
- Around line 424-432: The KPI record built in compute_kpis is missing decorator
metadata such as help, so create_csv_row_from_kpi_record cannot populate the CSV
kpi_help column. Update the kpi_record assembly in compute_kpis to carry through
the existing decorator fields from kpi_func, especially the help text (and any
other relevant metadata already available on the KPI definition), so downstream
CSV export reads them instead of emitting empty values.

---

Outside diff comments:
In `@projects/caliper/orchestration/step_logging.py`:
- Around line 1-422: The file’s formatting is not Ruff-compliant, so run ruff
format on step_logging.py and keep the resulting style changes only. Focus on
preserving the existing behavior in functions like _get_next_step_index,
step_logging_indexed, step_logging, and the log_*_command helpers while aligning
spacing, wrapping, and line breaks with Ruff’s formatter so CI passes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69d184da-e489-49ef-9e9b-27f5dbfc9f81

📥 Commits

Reviewing files that changed from the base of the PR and between 93d092d and 73e4ca4.

📒 Files selected for processing (13)
  • projects/caliper/cli/main.py
  • projects/caliper/engine/kpi/__init__.py
  • projects/caliper/engine/kpi/decorators.py
  • projects/caliper/orchestration/postprocess.py
  • projects/caliper/orchestration/postprocess_config.py
  • projects/caliper/orchestration/step_logging.py
  • projects/guidellm/postprocess/guidellm/csv_export/__init__.py
  • projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_exporter.py
  • projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_model.py
  • projects/guidellm/postprocess/guidellm/parsing/kpis.py
  • projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
  • projects/guidellm/postprocess/guidellm/plugin.py
  • projects/guidellm/postprocess/guidellm/visualize-groups.yaml
✅ Files skipped from review due to trivial changes (1)
  • projects/caliper/engine/kpi/init.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • projects/guidellm/postprocess/guidellm/plugin.py
  • projects/guidellm/postprocess/guidellm/visualize-groups.yaml
  • projects/guidellm/postprocess/guidellm/plotting/kpi_report.py
  • projects/caliper/engine/kpi/decorators.py
  • projects/caliper/cli/main.py

Comment on lines +81 to +84
output: str | None = Field(
default="kpis.csv",
description="CSV filename or path; relative paths resolve under the post-processing artifact dir.",
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't allow csv_export.output to be null.

_run_kpi_csv_export() later does output_dir / postprocess_config.kpi.csv_export.output, so output: null passes validation here and then fails at runtime with TypeError as soon as the step is enabled.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess_config.py` around lines 81 - 84,
The csv_export output field currently allows null in postprocess_config, which
later breaks _run_kpi_csv_export() when it combines output_dir with
postprocess_config.kpi.csv_export.output. Update the CsvExport/PostprocessConfig
field definition so csv_export.output is required to be a non-null string,
keeping the default of "kpis.csv" and removing any None allowance from the type
or validation.

Comment on lines +228 to +230
# Create output file path
output_file = output_dir / postprocess_config.kpi.csv_export.output
output_file.parent.mkdir(parents=True, exist_ok=True)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Keep CSV exports inside the postprocess output dir.

output_dir / postprocess_config.kpi.csv_export.output is not a sandbox: an absolute value discards output_dir, and .. segments still escape it after resolution. A bad config can therefore write outside the artifact tree despite the config contract saying relative paths stay under the post-processing dir.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess.py` around lines 228 - 230, The
CSV export path in postprocess output handling is not sandboxed, so absolute
paths or path traversal can escape the artifact directory. Update the output
path logic in the postprocess flow around output_file creation to resolve the
configured CSV export path against output_dir and enforce that the final path
stays within that directory, using the existing
postprocess_config.kpi.csv_export.output and output_file handling. Reject or
normalize any config value that would write outside the postprocess output dir.

Comment on lines +503 to +510
# KPI CSV export (uses the same computed KPIs)
if postprocess_config.kpi.csv_export.enabled:
with step_logging("caliper_kpi_csv_export", step_logs_dir):
# Path to the JSONL file for reference in command logging
kpi_jsonl_path = output_dir / postprocess_config.kpi.generate.output
kpi_results["kpi_csv_export"] = _run_kpi_csv_export(
postprocess_config, plugin, model, output_dir, kpi_jsonl_path
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don't log a CSV-export input file that was never created.

This always points the reproduction banner at output_dir / postprocess_config.kpi.generate.output, even when kpi.generate was disabled or failed. The logged caliper kpi csv-export --input ... command is then not actually reproducible.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess.py` around lines 503 - 510, The
KPI CSV export step in postprocess.py is logging an input path from
postprocess_config.kpi.generate.output even when the KPI generate step did not
run or failed, so adjust the _run_kpi_csv_export call path handling to only
reference a real JSONL file. Use the existing identifiers step_logging,
kpi_csv_export, and postprocess_config.kpi.generate to determine whether the
generate output exists; if it does not, avoid emitting a reproduction banner
with a fake --input value and instead pass a valid created path or skip logging
that command.

Comment thread projects/caliper/orchestration/postprocess.py Outdated
Comment on lines +49 to +53
if record.get("is_2d", False):
if self.include_2d_kpis:
# TODO: Implement 2D KPI export
scalar_records.append(record)
else:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Fail fast when include_2d_kpis=True.

That branch currently appends raw 2D KPI records into a scalar CSV schema. The later conversion just stringifies the point list into one cell and drops the x/y metadata, so callers get a “successful” export with corrupted 2D data.

Proposed fix
             if record.get("is_2d", False):
                 if self.include_2d_kpis:
-                    # TODO: Implement 2D KPI export
-                    scalar_records.append(record)
+                    raise NotImplementedError("2D KPI CSV export is not implemented yet")
                 else:
                     skipped_2d_count += 1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if record.get("is_2d", False):
if self.include_2d_kpis:
# TODO: Implement 2D KPI export
scalar_records.append(record)
else:
if record.get("is_2d", False):
if self.include_2d_kpis:
raise NotImplementedError("2D KPI CSV export is not implemented yet")
else:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_exporter.py` around
lines 49 - 53, The KPI CSV export flow in KpiCsvExporter currently treats 2D
records as scalar data when include_2d_kpis is enabled, which silently corrupts
the export. Update the 2D handling branch in the exporter logic to fail fast
instead of appending those records to scalar_records, and surface a clear error
from the export path so callers know 2D KPI CSV export is not supported yet.
Keep the fix localized to the record-processing code that checks
record.get("is_2d", False) and the subsequent CSV conversion path.

Comment on lines +205 to +208
header_lines.append("# GuideLLM KPI Export")
header_lines.append(
f"# Schema Version: {self.columns[self.columns.index('schema_version')]}"
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Write the actual schema version in the header comment.

This currently emits # Schema Version: schema_version because it reads the column name from self.columns instead of the version value. The exported header is misleading as written.

Proposed fix
-        header_lines.append(
-            f"# Schema Version: {self.columns[self.columns.index('schema_version')]}"
-        )
+        header_lines.append("# Schema Version: 1")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
header_lines.append("# GuideLLM KPI Export")
header_lines.append(
f"# Schema Version: {self.columns[self.columns.index('schema_version')]}"
)
header_lines.append("# GuideLLM KPI Export")
header_lines.append("# Schema Version: 1")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/guidellm/postprocess/guidellm/csv_export/kpi_csv_model.py` around
lines 205 - 208, The KPI CSV export header is using the column name instead of
the actual schema version value. Update the header generation in the KPI CSV
model so the Schema Version comment reads from the stored schema version data
rather than looking up the 'schema_version' entry in self.columns, and keep the
change localized to the header-building logic.

Comment thread projects/guidellm/postprocess/guidellm/parsing/kpis.py Outdated
Comment on lines +344 to +350
"performance_tier": (
"high"
if record.metrics.get("total_tok/sec", 0) > 1000
else "medium"
if record.metrics.get("total_tok/sec", 0) > 100
else "low"
),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Cast total_tok/sec before deriving performance_tier.

If record.metrics["total_tok/sec"] arrives as a string, these > comparisons raise TypeError. TestLabelExtractor.extract() then falls back to {}, which drops every extracted test-condition label for that record in both compute_kpis() and the HTML report.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/guidellm/postprocess/guidellm/parsing/kpis.py` around lines 344 -
350, `performance_tier` is comparing `record.metrics.get("total_tok/sec", 0)`
directly, so string values can raise TypeError and break
`TestLabelExtractor.extract()` fallback behavior. Update the KPI derivation in
`compute_kpis()`/the `performance_tier` mapping to normalize `total_tok/sec` to
a numeric type before the threshold checks, and make sure any access through
`record.metrics` handles missing or non-numeric values consistently.

Comment thread projects/guidellm/postprocess/guidellm/parsing/kpis.py
@psap-forge-bot

psap-forge-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 1 minute, 55 seconds

02 Preflight 1 second

02 Test 6 minutes, 38 seconds

Test Description

This test validates the llm_d project by running a smoke evaluation of the Qwen/Qwen3-0.6B model deployed with the approximate-prefix-cache profile. It specifically tests basic inference stability and prefix caching behavior using a lightweight short benchmark configured for 256 prompt tokens, 128 output tokens, and a single concurrent request rate.

04 Post-Cleanup 9 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

psap-forge-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

@kpouget

kpouget commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

@psap-forge-bot

psap-forge-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 2 minutes, 19 seconds

02 Preflight 1 second

02 Test 6 minutes, 34 seconds

Test Description

This test executes a smoke test for the llm_d project using the Qwen/Qwen3-0.6B model deployed with the approximate-prefix-cache profile, running a short benchmark workload of 20 samples with 256 prompt tokens and 128 output tokens at a concurrent rate of 1.

04 Post-Cleanup 7 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

psap-forge-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

@kpouget

kpouget commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

@psap-forge-bot

psap-forge-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 1 minute, 52 seconds

02 Preflight 1 second

02 Test 6 minutes, 33 seconds

Test Description

This test validates the llm_d project by applying the smoke and deployment-approximate-prefix-cache presets to evaluate the approximate prefix caching deployment profile. It verifies basic inference functionality and latency using a short benchmark workload (256 input/128 output tokens at rate 1) with the Qwen/Qwen3-0.6B model.

04 Post-Cleanup 7 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

psap-forge-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

kpouget added 20 commits July 3, 2026 10:42
@kpouget

kpouget commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

@psap-forge-bot

psap-forge-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides:
    caliper.replot.url: https://mlflow.apps.psap-automation.ibm.rhperfscale.org/#/experiments/231/runs/392dafd5908a4447bad9d3b427838bb5/artifacts/02__test/001__llmd_test?workspace=forge-llmd
  project: llm_d

Artifact Links

Test Logs

000 Replot 20 seconds

🔄 001 Export-Artifacts

Post-processing Status ✅

@psap-forge-bot

psap-forge-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown
🟢 Submission of llm_d smoke succeeded after 3 minutes, 15 seconds 🟢
/test fournos llm_d smoke
/cluster psap-mgmt
/pipeline forge-replot
/replot.url https://mlflow.apps.psap-automation.ibm.rhperfscale.org/#/experiments/231/runs/392dafd5908a4447bad9d3b427838bb5/artifacts/02__test/001__llmd_test?workspace=forge-llmd

@kpouget

kpouget commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

/test fournos llm_d smoke
/cluster forge-smoke-testing
/pipeline forge-full

@psap-forge-bot

psap-forge-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

🟢 Execution of llm_d smoke 🟢

Execution Engine Configuration

forge:
  args:
  - smoke
  configOverrides: {}
  project: llm_d

Artifact Links

Test Logs

00 Pre-Cleanup 5 seconds

01 Prepare 2 minutes, 22 seconds

Failure Review 011 Cluster Deploy Operator Rhcl-Operator

011__cluster_deploy_operator__rhcl-operator

The cluster_deploy_operator task failed during the wait_for_csv_ready step because the rhcl-operator.v1.4.1 ClusterServiceVersion (CSV) transitioned to a Failed phase instead of reaching Succeeded. Although the subscription and install plan were successfully created and approved, the operator installation encountered an error that prevented completion, causing the deployment workflow to abort with a critical runtime exception.

04 Post-Cleanup 6 seconds

🔄 05 Export-Artifacts

@psap-forge-bot

psap-forge-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

@openshift-ci

openshift-ci Bot commented Jul 3, 2026

Copy link
Copy Markdown

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/fournos 0e5b508 link true /test fournos

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant