Aspects data fixes #170
    group by
        org,
        course_key,
        emission_time,
I'm trying to understand how this is causing the ReplacingMergeTree issues. Were events with duplicate timestamps being incorrectly aggregated here?
The timestamps were causing events NOT to be aggregated, but we need them to be. The MV is keyed on response, and response_count should have been updated to the aggregate each time a new event came in. Because emission_time was part of the group by, the count was always 1, so the MV would just replace the previous record with the same values and still a count of 1.
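The effect described above can be reproduced in a small sketch. This is not the actual Aspects model: it uses SQLite (via Python's standard library) as a stand-in for ClickHouse, and the `responses` table, its columns, and the sample rows are hypothetical. The final `dict(...)` step is only a rough model of "one row per key, the latest one wins".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified event table; not the real Aspects schema.
conn.execute(
    "create table responses ("
    "org text, course_key text, problem_id text, "
    "response text, emission_time text)"
)
conn.executemany(
    "insert into responses values (?, ?, ?, ?, ?)",
    [
        ("OrgX", "course-1", "p1", "A", "2023-01-01 10:00:00"),
        ("OrgX", "course-1", "p1", "A", "2023-01-01 10:05:00"),
        ("OrgX", "course-1", "p1", "A", "2023-01-01 10:09:00"),
    ],
)

# Grouping by emission_time puts every event in its own group,
# so each row carries a count of 1.
with_time = conn.execute(
    """
    select response, count(*) as response_count
    from responses
    group by org, course_key, problem_id, response, emission_time
    """
).fetchall()

# Without emission_time, identical responses collapse into one group.
without_time = conn.execute(
    """
    select response, count(*) as response_count
    from responses
    group by org, course_key, problem_id, response
    """
).fetchall()

# Rough model of replacement keyed on response: later rows overwrite
# earlier ones, so the surviving count is still 1.
replaced = dict(with_time)

print(with_time)
print(without_time)
print(replaced)
```

With the timestamp in the grouping, three identical responses produce three rows that each count 1, and keeping one row per response never yields 3.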
    first_success.attempts as attempts,
    first_success.actor_id as actor_id,
    splitByChar('@', events.problem_id)[3] as block_id_short,
    last_response.org as org,
I've been out of this for a while. Can you write up a quick explanation of the fix?
The original query took the first successful response for each actor and left joined all attempts, which meant that if an actor never had a successful attempt, their actor_id and number of attempts would be NULL. We then did a distinct at the end, which kept only one record for each problem that never had a correct attempt instead of actually counting how many attempts were made.
The new query uses the last response for each actor, regardless of whether it was successful. This way, we get an accurate count of incorrect and correct responses, and all data is populated for each attempt.
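A minimal sketch of the difference, again using SQLite through Python's standard library rather than ClickHouse. The `responses` table, its columns, and the three sample actors are hypothetical, and both queries are simplified approximations of the behaviour described above, not the real model SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per attempt; `attempts` is the attempt number.
conn.execute(
    "create table responses ("
    "actor_id text, problem_id text, attempts integer, success integer)"
)
conn.executemany(
    "insert into responses values (?, ?, ?, ?)",
    [
        ("alice", "p1", 1, 0),
        ("alice", "p1", 2, 1),  # alice succeeds on her second attempt
        ("bob", "p1", 1, 0),
        ("bob", "p1", 2, 0),    # bob never succeeds
        ("carol", "p1", 1, 0),  # carol never succeeds
    ],
)

# Old approach (approximation): left join onto the first success, then distinct.
# Actors without a success get NULLs and collapse into a single row.
old_rows = conn.execute(
    """
    with first_success as (
        select actor_id, problem_id, min(attempts) as attempts
        from responses
        where success = 1
        group by actor_id, problem_id
    )
    select distinct
        events.problem_id,
        first_success.actor_id as actor_id,
        first_success.attempts as attempts
    from responses as events
    left join first_success
        on events.problem_id = first_success.problem_id
        and events.actor_id = first_success.actor_id
    """
).fetchall()

# New approach (approximation): keep each actor's last response,
# successful or not, so every actor contributes one populated row.
new_rows = conn.execute(
    """
    select r.problem_id, r.actor_id, r.attempts, r.success
    from responses as r
    join (
        select actor_id, problem_id, max(attempts) as attempts
        from responses
        group by actor_id, problem_id
    ) as last_response
        on r.actor_id = last_response.actor_id
        and r.problem_id = last_response.problem_id
        and r.attempts = last_response.attempts
    """
).fetchall()

correct = sum(1 for row in new_rows if row[3] == 1)
incorrect = sum(1 for row in new_rows if row[3] == 0)
print(old_rows)
print(new_rows, correct, incorrect)
```

In the old shape, bob and carol are indistinguishable after the distinct, so their attempts cannot be counted. In the new shape, each actor has a fully populated row.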
Deprecate unneeded performance model (the dataset that used this is changing)
Remove 'emission_time' from model (it was preventing the ReplacingMergeTree replacement from working)
Use last response model to get more accurate data
Part of:
openedx/openedx-aspects#369
openedx/openedx-aspects#370
openedx/openedx-aspects#372