Skip to content

Int64 overflow fix#7719

Open
Will-78 wants to merge 8 commits intoRdatatable:masterfrom
Will-78:Int64_Overflow_fix
Open

Int64 overflow fix#7719
Will-78 wants to merge 8 commits intoRdatatable:masterfrom
Will-78:Int64_Overflow_fix

Conversation

@Will-78
Copy link
Copy Markdown

@Will-78 Will-78 commented Apr 22, 2026

Closes #7574

This PR should fix the bug with running the code discussed within the issue. Below is the returned output we made.

> DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
> DT[, sum(x), by=i]
       i                  V1
   <int>               <i64>
1:     1 4895412794951729152
Warning message:
In gsum(x) :
  The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.

All changes were made to reflect responses from int32 overflow in code above. Please let us know what we can do to improve our work or anything to change within the contribution.

Copy link
Copy Markdown
Member

@jangorecki jangorecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked closely and changes seem non trivial. It would be good someone look at it well.
Afaiu changes impact sonly int64 branches, if that's not the case then would be nice to see worst case benchmark so see the overhead of the change.

Comment thread inst/tests/tests.Rraw Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (8364344) to head (1950bfe).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7719      +/-   ##
==========================================
- Coverage   99.04%   99.04%   -0.01%     
==========================================
  Files          87       87              
  Lines       17037    17130      +93     
==========================================
+ Hits        16874    16966      +92     
- Misses        163      164       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread src/gsumm.c
for (int i=0; i<howMany; i++) {
_ans[my_low[i]] += my_gx[i]; // does not propagate INT64 for !narm
const int64_t a = _ans[my_low[i]];
const int64_t c = my_gx[i];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason for using a and c over a and b as in the int branch?

Comment thread inst/tests/tests.Rraw

# test for correct reponse for Datatable sum int64 overflow #7574
DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt the result be something like data.table(1L, 1.844674e+19) ?

Comment thread src/gsumm.c
if (overflow) {
UNPROTECT(1); // discard the result with overflow
warning(_("The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning."));
const int64_t *restrict gx = gather(x, &anyNA);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a 2nd gather call?

Comment thread src/gsumm.c
ans = PROTECT(allocVector(REALSXP, ngrp));
double *restrict ansp = REAL(ans);
memset(ansp, 0, ngrp*sizeof(double));
if (!anyNA) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what are we doing in the case of anyNA? panic?

Comment thread NEWS.md

28. `rbindlist()` now avoids the crash when working with many non-UTF-8 column names, [#7452](https://github.com/Rdatatable/data.table/issues/7452). Thanks @aitap for the report and the fix.

29. `gsum()` now handles correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
29. `gsum()` now handles correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix.
29. `gsum()` now correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix.

Comment thread inst/tests/tests.Rraw
# test for correct reponse for Datatable sum int64 overflow #7574
DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")
rm(DT)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im also missing some test cases e.g.:

sum(x, na.rm=TRUE) with NA + overflow
sum(x, na.rm=FALSE) with NA + overflow

Comment thread inst/tests/tests.Rraw

# test for correct reponse for Datatable sum int64 overflow #7574
DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class of the result is wrong. As the warning says, it must be numeric, not integer64. Once you obtain the right answer, use dput(control = 'exact') to get an exact floating point literal.

Comment thread src/gsumm.c
default:
error(_("Type '%s' is not supported by GForce %s. Either add the prefix %s or turn off GForce optimization using options(datatable.optimize=1)"), type2char(TYPEOF(x)), "sum (gsum)", "base::sum(.)");
}
copyMostAttrib(x, ans);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line (see copyMostAttrib) is responsible for copying the class attribute from x to ans. Somehow, the code will have to drop the integer64 class in case of an overflow. The easiest solution is to drop the class attribute altogether, although if x inherits from other classes as well, retaining them may be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GForce fails to do overflow checks for integer64

4 participants