
Evaluating story_scores and window_scores incorrectly? #10

@eitanturok

Hi!

I'm trying to reproduce the results of your paper and was wondering if you could explain why story_scores and window_scores are computed identically here:

# get raw score and normalized score for each window
window_scores[(reference, mname)] = metric.score(ref = ref_windows, pred = pred_windows)
window_zscores[(reference, mname)] = (window_scores[(reference, mname)] - window_null_scores.mean(0)) / window_null_scores.std(0)

# get raw score and normalized score for the entire story
story_scores[(reference, mname)] = metric.score(ref = ref_windows, pred = pred_windows)
story_zscores[(reference, mname)] = (story_scores[(reference, mname)].mean() - story_null_scores.mean()) / story_null_scores.std()

My understanding is that story_scores and window_scores should compute different things:

  • story_scores - a single score for the whole story, computed over all words at once
  • window_scores - a list of scores, one for each window of text

However, both are computed by the same call, metric.score(ref = ref_windows, pred = pred_windows), so they hold identical values. Is this a mistake?
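
To make the distinction concrete, here is a minimal sketch of what I would expect the two quantities to look like. It uses a toy word-overlap metric rather than the repo's metrics; toy_score and split_windows are hypothetical helpers I made up for illustration, and the sentences are invented.

```python
import numpy as np

def toy_score(ref_words, pred_words):
    """Hypothetical stand-in metric (not from the repo):
    fraction of reference words that also appear in the prediction."""
    if not ref_words:
        return 0.0
    pred_set = set(pred_words)
    return sum(w in pred_set for w in ref_words) / len(ref_words)

def split_windows(words, size):
    """Split a word list into consecutive, non-overlapping windows."""
    return [words[i:i + size] for i in range(0, len(words), size)]

# Invented example: the prediction is the reference shifted by one word.
ref = "the cat sat on the mat and then slept".split()
pred = "cat sat on the mat and then slept soundly".split()

# Window-level: one score per window, so the result is an array.
ref_windows = split_windows(ref, 3)
pred_windows = split_windows(pred, 3)
window_scores = np.array(
    [toy_score(r, p) for r, p in zip(ref_windows, pred_windows)]
)

# Story-level: a single score computed over all words at once.
story_score = toy_score(ref, pred)

print(window_scores.shape, window_scores.mean(), story_score)
```

With this toy metric the story-level score differs from the mean of the window-level scores, because words that match across a window boundary count only at the story level. That is why I expected the two computations in the repo to differ.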

Also, does Table 1 of the paper report the story_scores or window_scores?
