Side-by-side eval harness for video-understanding models — retrieval, reasoning, and structured extraction — with an LLM-as-judge and cost-aware scoring. A Solutions-Architect reference scaffold.
pegasus clip video-understanding claude video-ai anthropic llm-as-judge twelve-labs eval-harness marengo
Updated May 30, 2026 - Python