Add `ReasoningParser` by junya-takayama · Pull Request #287 · sbintuitions/flexeval

junya-takayama · 2026-05-21T09:53:56Z

Add a mechanism to the VLLM and HuggingFaceLM classes to extract the reasoning content and store it in LMOutput.reasoning_text.

Example

flexeval_lm --eval_setup rakuda-v2-ja --language_model VLLM --language_model.model Qwen/Qwen3-4B --language_model.reasoning_parser UnifiedRegexReasoningParser --language_model.reasoning_parser.pattern="<think>(?P<reasoning_content>.*?)</think>(?P<content>.*)"

...
2026-05-21 21:04:48.236 | INFO     | flexeval.core.evaluate_chat_response:evaluate_chat_response:181 - Example of the output
2026-05-21 21:04:48.236 | INFO     | flexeval.core.evaluate_chat_response:evaluate_chat_response:182 - {
    "lm_output": {
        "text": "\n\n四国地方には以下の4つの都道府県があります。それぞれの県庁所在地を以下に列挙します。\n\n1. **香川県（かわかん）**  \n   - **県庁所在地**：香川市（かわかし）\n\n2. **岡山県（おうさんけん）**  \n   - **県庁所在地**：岡山市（おうさんし）\n\n3. **徳島県（とくしまけん）**  \n   - **県庁所在地**：徳島市（とくしまし）\n\n4. **広島県（ひろしまけん）**  \n   - **県庁所在地**：広島市（ひろしまし）\n\n※四国地方は日本の四つの地方の一つで、これらの4県が含まれます。",
        "raw_text": "<think>\nOkay, the user is asking for the four prefectures in the Shikoku region and their respective prefectural capitals. Let me start by recalling the four prefectures in Shikoku. I think they are Kagawa, Okayama, Tokushima, and Hiroshima. Wait, but I need to make sure. Let me think again. Shikoku is the fourth largest island in Japan, right? The four prefectures there are Kagawa, Okayama, Tokushima, and Hiroshima. Yes, that's correct.\n\nNow, the prefectural capitals. For Kagawa, I believe the capital is Kagawa City. Okayama's capital is Okayama City. Tokushima's capital is Tokushima City. And Hiroshima's capital is Hiroshima City. Let me double-check each one to be sure. Kagawa City is indeed the capital of Kagawa Prefecture. Okayama City is the capital of Okayama Prefecture. Tokushima City is the capital of Tokushima Prefecture. Hiroshima City is the capital of Hiroshima Prefecture. That seems right. \n\nWait, but sometimes people might confuse the capitals. For example, Hiroshima is a major city, but is it the capital? Yes, Hiroshima City is the capital of Hiroshima Prefecture. Okay, so the answer should be the four prefectures and their capitals as listed. I think that's all. Let me just confirm once more. Yes, those are the four prefectures in Shikoku, and their capitals are as mentioned. No mistakes there.\n</think>\n\n四国地方には以下の4つの都道府県があります。それぞれの県庁所在地を以下に列挙します。\n\n1. **香川県（かわかん）**  \n   - **県庁所在地**：香川市（かわかし）\n\n2. **岡山県（おうさんけん）**  \n   - **県庁所在地**：岡山市（おうさんし）\n\n3. **徳島県（とくしまけん）**  \n   - **県庁所在地**：徳島市（とくしまし）\n\n4. **広島県（ひろしまけん）**  \n   - **県庁所在地**：広島市（ひろしまし）\n\n※四国地方は日本の四つの地方の一つで、これらの4県が含まれます。",
        "reasoning_text": "\nOkay, the user is asking for the four prefectures in the Shikoku region and their respective prefectural capitals. Let me start by recalling the four prefectures in Shikoku. I think they are Kagawa, Okayama, Tokushima, and Hiroshima. Wait, but I need to make sure. Let me think again. Shikoku is the fourth largest island in Japan, right? The four prefectures there are Kagawa, Okayama, Tokushima, and Hiroshima. Yes, that's correct.\n\nNow, the prefectural capitals. For Kagawa, I believe the capital is Kagawa City. Okayama's capital is Okayama City. Tokushima's capital is Tokushima City. And Hiroshima's capital is Hiroshima City. Let me double-check each one to be sure. Kagawa City is indeed the capital of Kagawa Prefecture. Okayama City is the capital of Okayama Prefecture. Tokushima City is the capital of Tokushima Prefecture. Hiroshima City is the capital of Hiroshima Prefecture. That seems right. \n\nWait, but sometimes people might confuse the capitals. For example, Hiroshima is a major city, but is it the capital? Yes, Hiroshima City is the capital of Hiroshima Prefecture. Okay, so the answer should be the four prefectures and their capitals as listed. I think that's all. Let me just confirm once more. Yes, those are the four prefectures in Shikoku, and their capitals are as mentioned. No mistakes there.\n",
...
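
For readers unfamiliar with the pattern in the command above, here is a minimal sketch of how a regex with the `reasoning_content` and `content` named groups could split a raw output. The helper `split_reasoning`, the use of `re.DOTALL`, and the pass-through behavior when no `<think>` tags are present are assumptions made for illustration, not the implementation in this PR.

```python
import re
from typing import Optional, Tuple

# Pattern copied from the command above; the named groups are part of it.
PATTERN = r"<think>(?P<reasoning_content>.*?)</think>(?P<content>.*)"


def split_reasoning(raw_text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (reasoning_text, text) for one raw output."""
    # Assumption: DOTALL is set so that "." also matches newlines inside <think>.
    match = re.search(PATTERN, raw_text, flags=re.DOTALL)
    if match is None:
        # Assumption: without <think> tags the text is passed through unchanged.
        return None, raw_text
    return match.group("reasoning_content"), match.group("content")


reasoning, answer = split_reasoning("<think>\nLet me recall.\n</think>\n\nFinal answer")
print(repr(reasoning))  # '\nLet me recall.\n'
print(repr(answer))     # '\n\nFinal answer'
```

As in the log above, nothing is stripped in this sketch: the reasoning keeps its surrounding newlines and the answer keeps its leading blank lines.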

Kotaro-Aono · 2026-05-25T05:51:39Z

        ]
        lm_outputs = self._batch_complete_text(chat_messages_as_string, **kwargs)
+        if self.reasoning_parser:
+            for lm_output in lm_outputs:


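Don't we need a check here like the one in the tool-parsing code?

if lm_output.text is None: continue

junya-takayama · Jun 1, 2026

It made no sense to keep separate for loops for tool and reasoning parsing in the first place, so I merged them and put that check at the very top!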
https://github.com/sbintuitions/flexeval/pull/287/changes#diff-9c4eb04cb0e3beb9678e4067391a29b6bfb2c25b6f1cf1de4729ddb6dba932deR429-R431
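
The merged loop described in this thread might look roughly like the sketch below. `Output`, `postprocess`, `reasoning_step`, and `tool_step` are hypothetical names for illustration, and the real change is at the link above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Output:
    """Hypothetical stand-in for LMOutput; only the fields used here."""
    text: Optional[str]
    reasoning_text: Optional[str] = None
    tool_calls: Optional[list] = None


Step = Callable[[Output], None]


def postprocess(outputs: List[Output], steps: List[Step]) -> List[Output]:
    for out in outputs:
        # Single guard at the top: nothing to parse if no text was generated.
        if out.text is None:
            continue
        for step in steps:
            step(out)
    return outputs


def reasoning_step(out: Output) -> None:
    # Illustrative split on a literal </think> tag, not flexeval's parser.
    head, sep, tail = out.text.partition("</think>")
    if sep:
        out.reasoning_text = head.replace("<think>", "", 1)
        out.text = tail


def tool_step(out: Output) -> None:
    # Placeholder for tool parsing: only records that the step ran.
    out.tool_calls = []


outputs = postprocess(
    [Output(text=None), Output(text="<think>plan</think>answer")],
    [reasoning_step, tool_step],
)
```

With one loop, the `None` check is written once and covers every post-processing step that follows it.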

Kotaro-Aono · 2026-05-25T05:55:53Z

+                lm_output.text = reasoning.text
+                lm_output.reasoning_text = reasoning.reasoning_text
+
        if self.tool_parser:


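Is using the reasoning parser and tool_parser at the same time an expected use case?
If it is, I think this is a problem, because raw_text and other fields get overwritten further down.

junya-takayama · Jun 1, 2026

Yes, they can be used together.
The intent is that raw_text always holds the text as it was before any post-processing, so I fixed the code to behave that way.
(I botched a commit operation, so a bunch of unrelated changes got lumped into it...)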

https://github.com/sbintuitions/flexeval/pull/287/changes#diff-9c4eb04cb0e3beb9678e4067391a29b6bfb2c25b6f1cf1de4729ddb6dba932deR435-R437

base.py: https://github.com/sbintuitions/flexeval/pull/287/changes#diff-3f56c1a60f9931bd179b6a566c4391ce34db99296812dc1590020d26de1f69aaR240-R243
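
One way to picture the raw_text invariant stated in this thread is the sketch below, where raw_text is written once before any parser runs and no parser may replace it. The function `apply_parsers`, both toy parsers, and the `CALL ` line format are assumptions for illustration only and do not reflect flexeval's actual parsers.

```python
from typing import Any, Callable, Dict, List

Parser = Callable[[str], Dict[str, Any]]


def apply_parsers(generated: str, parsers: List[Parser]) -> Dict[str, Any]:
    # raw_text is written exactly once, before any parser runs.
    result: Dict[str, Any] = {"raw_text": generated, "text": generated}
    for parser in parsers:
        updates = dict(parser(result["text"]))
        updates.pop("raw_text", None)  # a parser may never overwrite raw_text
        result.update(updates)
    return result


def reasoning_parser(text: str) -> Dict[str, Any]:
    # Illustrative split on a literal </think> tag.
    head, sep, tail = text.partition("</think>")
    if not sep:
        return {}
    return {"reasoning_text": head.replace("<think>", "", 1), "text": tail}


def tool_parser(text: str) -> Dict[str, Any]:
    # Hypothetical format: a line starting with "CALL " names a tool.
    lines = text.splitlines()
    calls = [line[len("CALL "):] for line in lines if line.startswith("CALL ")]
    kept = "\n".join(line for line in lines if not line.startswith("CALL "))
    return {"tool_calls": calls, "text": kept}


generated = "<think>need weather</think>CALL get_weather\nChecking now."
result = apply_parsers(generated, [reasoning_parser, tool_parser])
print(result["raw_text"] == generated)  # True
print(result["text"])                   # Checking now.
```

Each parser works on the text left by the previous one, so the order of the parsers matters, while raw_text stays identical to the generated string.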

junya-takayama · 2026-06-01T06:22:00Z

@Kotaro-Aono Thanks for the review! I've addressed everything.

Kotaro-Aono · 2026-06-01T09:06:59Z

+            response = chat_lm.generate_chat_response([{"role": "user", "content": "test"}], max_new_tokens=1)
+        assert response.raw_text == raw_output
+        assert response.text is None
+        assert response.reasoning_text is raw_output


This line looks like the culprit, since:
reasoning_text == None
raw_output == "no think tags here"

Kotaro-Aono

LGTM!

junya-takayama force-pushed the add_reasoning_parser branch from 3827b57 to 9661b17 Compare May 21, 2026 09:57

junya-takayama marked this pull request as ready for review May 21, 2026 12:06

junya-takayama changed the title ~~[WIP] Add ReasoningParser~~ Add ReasoningParser May 21, 2026

junya-takayama requested a review from Kotaro-Aono May 25, 2026 05:42

Kotaro-Aono reviewed May 25, 2026


fix so that raw text is properly saved to raw_text

7dc3f8c

junya-takayama force-pushed the add_reasoning_parser branch from 6a5523e to 7dc3f8c Compare June 1, 2026 05:59

refactor

9acc5ad

junya-takayama force-pushed the add_reasoning_parser branch from 6a6fd48 to 9acc5ad Compare June 1, 2026 06:15

junya-takayama requested a review from Kotaro-Aono June 1, 2026 06:21

add mark

0baab63

Kotaro-Aono reviewed Jun 1, 2026


fix test

5d67a88

junya-takayama requested a review from Kotaro-Aono June 2, 2026 04:33

Kotaro-Aono approved these changes Jun 2, 2026


junya-takayama merged commit 9b0fe54 into main Jun 2, 2026
8 checks passed

junya-takayama deleted the add_reasoning_parser branch June 2, 2026 04:43

junya-takayama merged 4 commits into main from add_reasoning_parser
