Evaluating `story_scores` and `window_scores` incorrectly?

Hi!

I'm trying to reproduce the results of your paper and was wondering if you could explain why `story_scores` and `window_scores` are computed identically [here](https://github.com/HuthLab/semantic-decoding/blob/6b25dd5720ced1c4199cd9b9149061683b77ba6b/decoding/evaluate_predictions.py#L66):

```py
# get raw score and normalized score for each window
window_scores[(reference, mname)] = metric.score(ref = ref_windows, pred = pred_windows)
window_zscores[(reference, mname)] = (window_scores[(reference, mname)] - window_null_scores.mean(0)) / window_null_scores.std(0)

# get raw score and normalized score for the entire story
story_scores[(reference, mname)] = metric.score(ref = ref_windows, pred = pred_windows)
story_zscores[(reference, mname)] = (story_scores[(reference, mname)].mean() - story_null_scores.mean()) / story_null_scores.std()
```

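My understanding is that `story_scores` and `window_scores` should compute different things:
* `story_scores` - a single score for the whole story, all words at once
* `window_scores` - a list of scores, one score for each window of text

However, both are assigned the result of the same `metric.score(ref = ref_windows, pred = pred_windows)` call, so the raw values are identical. Only the z-score lines differ, where the story version calls `.mean()` before normalizing. Is this intended, or is it a mistake?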
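
To make the question concrete, here is a toy sketch of how I read the snippet. It is not the repo's code: the data are random, `raw` stands in for the output of `metric.score(...)`, and the shapes of the null arrays (one row per null run for windows, one value per null run for the story) are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not from the repo): 5 windows, 200 null runs.
n_windows, n_null = 5, 200
raw = rng.uniform(0.2, 0.4, size=n_windows)  # stands in for metric.score(...)
window_null_scores = rng.uniform(0.0, 0.2, size=(n_null, n_windows))
# Assumption: the story-level null is one value per null run.
story_null_scores = window_null_scores.mean(axis=1)

# As in the snippet, both names end up holding the same per-window array.
window_scores = raw
story_scores = raw

# Per-window z-scores: one value per window.
window_zscores = (window_scores - window_null_scores.mean(0)) / window_null_scores.std(0)

# Story z-score: the per-window scores are averaged first, giving one scalar.
story_zscore = (story_scores.mean() - story_null_scores.mean()) / story_null_scores.std()

print(window_zscores.shape)   # one entry per window
print(np.ndim(story_zscore))  # scalar
```

If this reading is right, the stored `story_scores` is a per-window array rather than a single whole-story score, and only `story_zscores` is reduced to one number.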

Also, does Table 1 of the paper report `story_scores` or `window_scores`?
