Conversation
…larly added overflow branch mimicking flow of INTSXP overflow setup
…=0 within outer for loop
…to Int64_Overflow_fix
jangorecki
left a comment
There was a problem hiding this comment.
I haven't looked closely and changes seem non trivial. It would be good someone look at it well.
Afaiu changes impact sonly int64 branches, if that's not the case then would be nice to see worst case benchmark so see the overhead of the change.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7719 +/- ##
==========================================
- Coverage 99.04% 99.04% -0.01%
==========================================
Files 87 87
Lines 17037 17130 +93
==========================================
+ Hits 16874 16966 +92
- Misses 163 164 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| for (int i=0; i<howMany; i++) { | ||
| _ans[my_low[i]] += my_gx[i]; // does not propagate INT64 for !narm | ||
| const int64_t a = _ans[my_low[i]]; | ||
| const int64_t c = my_gx[i]; |
There was a problem hiding this comment.
any particular reason for using a and c over a and b as in the int branch?
|
|
||
| # test for correct reponse for Datatable sum int64 overflow #7574 | ||
| DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L]) | ||
| test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.") |
There was a problem hiding this comment.
shouldnt the result be something like data.table(1L, 1.844674e+19) ?
| if (overflow) { | ||
| UNPROTECT(1); // discard the result with overflow | ||
| warning(_("The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")); | ||
| const int64_t *restrict gx = gather(x, &anyNA); |
There was a problem hiding this comment.
why do we need a 2nd gather call?
| ans = PROTECT(allocVector(REALSXP, ngrp)); | ||
| double *restrict ansp = REAL(ans); | ||
| memset(ansp, 0, ngrp*sizeof(double)); | ||
| if (!anyNA) { |
There was a problem hiding this comment.
and what are we doing in the case of anyNA? panic?
|
|
||
| 28. `rbindlist()` now avoids the crash when working with many non-UTF-8 column names, [#7452](https://github.com/Rdatatable/data.table/issues/7452). Thanks @aitap for the report and the fix. | ||
|
|
||
| 29. `gsum()` now handles correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix. |
There was a problem hiding this comment.
| 29. `gsum()` now handles correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix. | |
| 29. `gsum()` now correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix. |
| # test for correct reponse for Datatable sum int64 overflow #7574 | ||
| DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L]) | ||
| test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.") | ||
| rm(DT) |
There was a problem hiding this comment.
Im also missing some test cases e.g.:
sum(x, na.rm=TRUE) with NA + overflow
sum(x, na.rm=FALSE) with NA + overflow
|
|
||
| # test for correct reponse for Datatable sum int64 overflow #7574 | ||
| DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L]) | ||
| test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.") |
There was a problem hiding this comment.
The class of the result is wrong. As the warning says, it must be numeric, not integer64. Once you obtain the right answer, use dput(control = 'exact') to get an exact floating point literal.
| default: | ||
| error(_("Type '%s' is not supported by GForce %s. Either add the prefix %s or turn off GForce optimization using options(datatable.optimize=1)"), type2char(TYPEOF(x)), "sum (gsum)", "base::sum(.)"); | ||
| } | ||
| copyMostAttrib(x, ans); |
There was a problem hiding this comment.
This line (see copyMostAttrib) is responsible for copying the class attribute from x to ans. Somehow, the code will have to drop the integer64 class in case of an overflow. The easiest solution is to drop the class attribute altogether, although if x inherits from other classes as well, retaining them may be better.
Closes #7574
This PR should fix the bug with running the code discussed within the issue. Below is the returned output we made.
All changes were made to reflect responses from int32 overflow in code above. Please let us know what we can do to improve our work or anything to change within the contribution.