fix output for query cli#2017
Conversation
Greptile SummaryThis PR filters the output of the
|
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py | Adds a projection step after retriever.query() to expose only text/source/page_number; uses .get() safely for all fields. |
| nemo_retriever/tests/test_root_cli_workflow.py | Renames hits to raw_hits/public_hits but leaves two assertions referencing the deleted hits variable, causing a NameError at runtime. |
Sequence Diagram
sequenceDiagram
participant User
participant CLI as CLI (main.py)
participant QD as query_documents()
participant R as Retriever
User->>CLI: retriever query "..."
CLI->>QD: "query_documents(query, **opts)"
QD->>R: "Retriever(**retriever_kwargs)"
QD->>R: retriever.query(query)
R-->>QD: raw_hits [text, source, page_number, _distance, ...]
Note over QD: Project to {text, source, page_number} only
QD-->>CLI: filtered_hits
CLI-->>User: JSON output (no scores/internal fields)
Comments Outside Diff (1)
-
nemo_retriever/tests/test_root_cli_workflow.py, line 369-370 (link)Stale variable reference breaks the test. The variable
hitswas renamed toraw_hits/public_hitsin this PR, but these two assertions still reference the old name. Running the test suite as-is will raise aNameError: name 'hits' is not definedat both lines, so the test always fails.Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/tests/test_root_cli_workflow.py Line: 369-370 Comment: Stale variable reference breaks the test. The variable `hits` was renamed to `raw_hits` / `public_hits` in this PR, but these two assertions still reference the old name. Running the test suite as-is will raise a `NameError: name 'hits' is not defined` at both lines, so the test always fails. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
nemo_retriever/tests/test_root_cli_workflow.py:369-370
Stale variable reference breaks the test. The variable `hits` was renamed to `raw_hits` / `public_hits` in this PR, but these two assertions still reference the old name. Running the test suite as-is will raise a `NameError: name 'hits' is not defined` at both lines, so the test always fails.
```suggestion
assert json.loads(result.output) == public_hits
assert result.output == json.dumps(public_hits, indent=2, sort_keys=True, default=str) + "\n"
```
Reviews (4): Last reviewed commit: "Merge branch 'main' into clean-cli-skill..." | Re-trigger Greptile
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…gest into clean-cli-skills
| retriever = Retriever(top_k=top_k, vdb_kwargs={"uri": lancedb_uri, "table_name": table_name}) | ||
| return retriever.query(query) | ||
| hits = retriever.query(query) | ||
| hits = [{"text": hit.get("text", ""), "source": hit.get("source", ""), "page_number": hit.get("page_number")} for hit in hits] |
There was a problem hiding this comment.
The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.
| hits = [{"text": hit.get("text", ""), "source": hit.get("source", ""), "page_number": hit.get("page_number")} for hit in hits] | |
| hits = [ | |
| { | |
| "text": hit.get("text", ""), | |
| "source": hit.get("source", ""), | |
| "page_number": hit.get("page_number"), | |
| } | |
| for hit in hits | |
| ] |
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py
Line: 85
Comment:
The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.
```suggestion
hits = [
{
"text": hit.get("text", ""),
"source": hit.get("source", ""),
"page_number": hit.get("page_number"),
}
for hit in hits
]
```
How can I resolve this? If you propose a fix, please make it concise.
randerzander
left a comment
There was a problem hiding this comment.
returns are much cleaner, thanks!
Description
cleans output of query sub command to only display source, page_number and text.
Checklist