Skip to content

fix output for query cli#2017

Open
jperez999 wants to merge 5 commits into
NVIDIA:mainfrom
jperez999:clean-cli-skills
Open

fix output for query cli#2017
jperez999 wants to merge 5 commits into
NVIDIA:mainfrom
jperez999:clean-cli-skills

Conversation

@jperez999
Copy link
Copy Markdown
Collaborator

Description

cleans output of query sub command to only display source, page_number and text.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@jperez999 jperez999 self-assigned this May 12, 2026
@jperez999 jperez999 requested review from a team as code owners May 12, 2026 00:05
@jperez999 jperez999 requested a review from edknv May 12, 2026 00:05
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR filters the output of the query CLI sub-command to expose only text, source, and page_number, stripping internal fields like _distance from the JSON printed to the user.

  • sdk_workflow.py: a projection list-comprehension is added after retriever.query(), using .get() with defaults for each public field.
  • test_root_cli_workflow.py: the fixture is split into raw_hits (full VDB response) and public_hits (projected output), but the final assertions were not updated and still reference the now-undefined hits variable, causing a NameError at test runtime.

Confidence Score: 4/5

Safe to merge once the stale hits reference in the test assertions is corrected — the source change itself is correct.

The production code change is sound, but the test was left with two assertions referencing a deleted variable (hits), so the test suite will error out every time it runs until this is fixed.

nemo_retriever/tests/test_root_cli_workflow.py — lines 369-370 reference the undefined hits variable.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py Adds a projection step after retriever.query() to expose only text/source/page_number; uses .get() safely for all fields.
nemo_retriever/tests/test_root_cli_workflow.py Renames hits to raw_hits/public_hits but leaves two assertions referencing the deleted hits variable, causing a NameError at runtime.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as CLI (main.py)
    participant QD as query_documents()
    participant R as Retriever

    User->>CLI: retriever query "..."
    CLI->>QD: "query_documents(query, **opts)"
    QD->>R: "Retriever(**retriever_kwargs)"
    QD->>R: retriever.query(query)
    R-->>QD: raw_hits [text, source, page_number, _distance, ...]
    Note over QD: Project to {text, source, page_number} only
    QD-->>CLI: filtered_hits
    CLI-->>User: JSON output (no scores/internal fields)
Loading

Comments Outside Diff (1)

  1. nemo_retriever/tests/test_root_cli_workflow.py, line 369-370 (link)

    P1 Stale variable reference breaks the test. The variable hits was renamed to raw_hits / public_hits in this PR, but these two assertions still reference the old name. Running the test suite as-is will raise a NameError: name 'hits' is not defined at both lines, so the test always fails.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/tests/test_root_cli_workflow.py
    Line: 369-370
    
    Comment:
    Stale variable reference breaks the test. The variable `hits` was renamed to `raw_hits` / `public_hits` in this PR, but these two assertions still reference the old name. Running the test suite as-is will raise a `NameError: name 'hits' is not defined` at both lines, so the test always fails.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/tests/test_root_cli_workflow.py:369-370
Stale variable reference breaks the test. The variable `hits` was renamed to `raw_hits` / `public_hits` in this PR, but these two assertions still reference the old name. Running the test suite as-is will raise a `NameError: name 'hits' is not defined` at both lines, so the test always fails.

```suggestion
    assert json.loads(result.output) == public_hits
    assert result.output == json.dumps(public_hits, indent=2, sort_keys=True, default=str) + "\n"
```

Reviews (4): Last reviewed commit: "Merge branch 'main' into clean-cli-skill..." | Re-trigger Greptile

Comment thread nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py Outdated
jperez999 and others added 3 commits May 11, 2026 20:41
retriever = Retriever(top_k=top_k, vdb_kwargs={"uri": lancedb_uri, "table_name": table_name})
return retriever.query(query)
hits = retriever.query(query)
hits = [{"text": hit.get("text", ""), "source": hit.get("source", ""), "page_number": hit.get("page_number")} for hit in hits]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.

Suggested change
hits = [{"text": hit.get("text", ""), "source": hit.get("source", ""), "page_number": hit.get("page_number")} for hit in hits]
hits = [
{
"text": hit.get("text", ""),
"source": hit.get("source", ""),
"page_number": hit.get("page_number"),
}
for hit in hits
]
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py
Line: 85

Comment:
The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.

```suggestion
    hits = [
        {
            "text": hit.get("text", ""),
            "source": hit.get("source", ""),
            "page_number": hit.get("page_number"),
        }
        for hit in hits
    ]
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator

@randerzander randerzander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns are much cleaner, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants