Aspects data fixes #170

Open
saraburns1 wants to merge 6 commits into openedx:main from saraburns1:aspects_changes


saraburns1 commented Jun 12, 2026

Contributor
  1. Deprecate the unneeded performance model (the dataset that used it is changing).

  2. Remove 'emission_time' from the model (it was preventing the ReplacingMergeTree merge from working).

  3. Use the last response model to get more accurate data.

The original query took the first successful response for each actor and left joined all attempts. If an actor never had a successful attempt, their actor_id and number of attempts would be NULL. We then did a DISTINCT at the end, which kept only one record for each problem that never had a correct attempt instead of actually counting how many attempts were made.

The new query uses the last response for each actor, regardless of whether it is successful. This way, we get an accurate count of incorrect and correct responses, and all data is populated for each attempt.
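
To make the difference concrete, here is a minimal sketch of the two query shapes. It is not the actual model SQL: it runs on SQLite through Python's `sqlite3` module rather than ClickHouse, and the `responses` table, its columns, and the sample actors are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the problem response events.
conn.execute(
    "CREATE TABLE responses "
    "(actor_id TEXT, problem_id TEXT, attempt INTEGER, success INTEGER)"
)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?, ?)",
    [
        ("alice", "p1", 1, 0),
        ("alice", "p1", 2, 1),  # alice succeeds on her second attempt
        ("bob", "p1", 1, 0),
        ("bob", "p1", 2, 0),    # bob never succeeds
        ("carol", "p1", 1, 0),  # carol never succeeds
    ],
)

# Old shape: first success per actor, LEFT JOINed from all attempts, then DISTINCT.
OLD_QUERY = """
WITH first_success AS (
    SELECT actor_id, problem_id, MIN(attempt) AS attempts
    FROM responses
    WHERE success = 1
    GROUP BY actor_id, problem_id
)
SELECT DISTINCT events.problem_id, first_success.actor_id, first_success.attempts
FROM responses AS events
LEFT JOIN first_success
    ON events.actor_id = first_success.actor_id
    AND events.problem_id = first_success.problem_id
"""

# New shape: last response per actor, successful or not.
NEW_QUERY = """
WITH last_response AS (
    SELECT actor_id, problem_id, MAX(attempt) AS attempts
    FROM responses
    GROUP BY actor_id, problem_id
)
SELECT r.problem_id, r.actor_id, r.attempt AS attempts, r.success
FROM responses AS r
JOIN last_response AS l
    ON r.actor_id = l.actor_id
    AND r.problem_id = l.problem_id
    AND r.attempt = l.attempts
ORDER BY r.actor_id
"""

old_rows = conn.execute(OLD_QUERY).fetchall()
new_rows = conn.execute(NEW_QUERY).fetchall()

print(old_rows)  # bob and carol collapse into a single (p1, NULL, NULL) row
print(new_rows)  # one fully populated row per actor
```

In this sketch the old shape returns two rows for three actors, because every unsuccessful actor contributes the same NULL-filled row and DISTINCT folds them together. The new shape returns one row per actor with the attempt count and success flag filled in.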

part of
openedx/openedx-aspects#369
openedx/openedx-aspects#370
openedx/openedx-aspects#372

group by
org,
course_key,
emission_time,

Contributor


I'm trying to understand how this is causing the ReplacingMergeTree issues. Were there events with duplicate timestamps being incorrectly aggregated here?

Contributor Author


The timestamps were causing events to NOT be aggregated, but we need them to be. The MV is keyed on response, and the response_count should have been updated to the aggregate each time a new event came in. Instead, grouping by emission_time made the count always 1, and the MV would just replace the previous record with the same values and still a count of 1.
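
A rough Python simulation of that behavior, under stated assumptions: the event tuples and the `aggregate` helper are hypothetical, and the "replace" step only approximates how rows sharing a key get replaced, not ClickHouse's actual merge logic.

```python
from collections import Counter

# Hypothetical events: (org, course_key, response, emission_time)
events = [
    ("org1", "course-v1:demo", "A", "2026-01-01T10:00:00"),
    ("org1", "course-v1:demo", "A", "2026-01-01T10:05:00"),
    ("org1", "course-v1:demo", "A", "2026-01-01T10:09:00"),
    ("org1", "course-v1:demo", "B", "2026-01-01T10:07:00"),
]


def aggregate(events, include_emission_time):
    """Emulate GROUP BY + count(), then keep one row per response key."""
    counts = Counter()
    for org, course_key, response, emission_time in events:
        group_key = (org, course_key, response)
        if include_emission_time:
            # Each timestamp is unique, so every event lands in its own group.
            group_key += (emission_time,)
        counts[group_key] += 1

    # Approximate the replace step: rows sharing (org, course_key, response)
    # overwrite each other, so only one count per key survives.
    replaced = {}
    for group_key, response_count in counts.items():
        replaced[group_key[:3]] = response_count
    return replaced


with_time = aggregate(events, include_emission_time=True)
without_time = aggregate(events, include_emission_time=False)

print(with_time)     # every response_count is 1
print(without_time)  # response "A" is counted 3 times
```

With emission_time in the grouping, each group holds a single event, so replacing one row with another never changes the count. Dropping it lets the three "A" responses aggregate into one row.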

first_success.attempts as attempts,
first_success.actor_id as actor_id,
splitByChar('@', events.problem_id)[3] as block_id_short,
last_response.org as org,

Contributor


I've been out of this for a while. Can you write up a quick explanation of the fix?

Contributor Author


The original query took the first successful response for each actor and left joined all attempts. If an actor never had a successful attempt, their actor_id and number of attempts would be NULL. We then did a DISTINCT at the end, which kept only one record for each problem that never had a correct attempt instead of actually counting how many attempts were made.

The new query uses the last response for each actor, regardless of whether it is successful. This way, we get an accurate count of incorrect and correct responses, and all data is populated for each attempt.
