Int64 overflow fix by Will-78 · Pull Request #7719 · Rdatatable/data.table

Will-78 · 2026-04-22T23:57:11Z

This PR should fix the bug triggered by the code discussed in issue #7574. Below is the output we now get.

> DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
> DT[, sum(x), by=i]
       i                  V1
   <int>               <i64>
1:     1 4895412794951729152
Warning message:
In gsum(x) :
  The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.

All changes mirror the handling of int32 overflow in the code above. Please let us know what we can improve or change in this contribution.

jangorecki

I haven't looked closely and the changes seem non-trivial. It would be good if someone looked at it closely.
Afaiu the changes impact only the int64 branches; if that's not the case, it would be nice to see a worst-case benchmark to see the overhead of the change.

codecov · 2026-04-23T03:41:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (8364344) to head (1950bfe).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7719      +/-   ##
==========================================
- Coverage   99.04%   99.04%   -0.01%     
==========================================
  Files          87       87              
  Lines       17037    17130      +93     
==========================================
+ Hits        16874    16966      +92     
- Misses        163      164       +1

ben-schwen · 2026-04-23T12:07:08Z

            for (int i=0; i<howMany; i++) {
-              _ans[my_low[i]] += my_gx[i]; // does not propagate INT64 for !narm
+              const int64_t a = _ans[my_low[i]];
+              const int64_t c = my_gx[i];


any particular reason for using a and c over a and b as in the int branch?
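
For context on the check under discussion, here is a minimal sketch of overflow-aware `int64_t` addition using the operand names `a` and `b` from the question. The helper `add_i64_checked` is hypothetical and is not the code in this PR; it only illustrates testing the operands before adding, since signed overflow is undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not taken from the PR: adds a and b and stores the
// result in *out. Returns true (leaving *out untouched) when the
// mathematical sum does not fit in int64_t, false otherwise.
static bool add_i64_checked(int64_t a, int64_t b, int64_t *out) {
  assert(out != NULL);
  // Compare against the limits before adding, so a + b is never evaluated
  // when it would overflow.
  if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
    return true;
  }
  *out = a + b;
  return false;
}
```

A grouped sum could call such a helper per element and set an `overflow` flag instead of storing a wrapped value.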

ben-schwen · 2026-04-23T12:19:03Z

+
+# test for correct reponse for Datatable sum int64 overflow #7574
+DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
+test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")


Shouldn't the result be something like data.table(1L, 1.844674e+19)?

ben-schwen · 2026-04-23T12:21:11Z

+    if (overflow) {
+      UNPROTECT(1); // discard the result with overflow
+      warning(_("The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning."));
+      const int64_t *restrict gx = gather(x, &anyNA);


why do we need a 2nd gather call?

ben-schwen · 2026-04-23T12:22:08Z

+      ans = PROTECT(allocVector(REALSXP, ngrp));
+      double *restrict ansp = REAL(ans);
+      memset(ansp, 0, ngrp*sizeof(double));
+      if (!anyNA) {


and what are we doing in the case of anyNA? panic?

ben-schwen · 2026-04-23T12:23:25Z


 28. `rbindlist()` now avoids the crash when working with many non-UTF-8 column names, [#7452](https://github.com/Rdatatable/data.table/issues/7452). Thanks @aitap for the report and the fix.

+29. `gsum()` now handles correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix.


Suggested change

29. `gsum()` now handles correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix.

29. `gsum()` now correctly handles integer64 overflow in data.table aggregations (e.g `DT = data.table(i = c(1L, 1L), x = lim.integer64()`), [#7574](https://github.com/Rdatatable/data.table/issues/7574). Thanks @MichaelChirico for reporting and @Will-78 for the fix.

ben-schwen · 2026-04-23T12:24:21Z

+# test for correct reponse for Datatable sum int64 overflow #7574
+DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
+test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")
+rm(DT)


I'm also missing some test cases, e.g.:

sum(x, na.rm=TRUE) with NA + overflow
sum(x, na.rm=FALSE) with NA + overflow
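
The two requested cases could behave as in the sketch below, which sums one group in `double` once an overflow has been detected. It assumes integer64 `NA` is stored as `INT64_MIN` (as bit64 does) and uses `NAN` as a stand-in for `NA_real_`. The names `sum_i64_as_double` and `NA_I64` are hypothetical, and the expected semantics are an assumption modelled on base R's `sum()`, not something confirmed in this thread.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: the integer64 NA sentinel is the minimum int64 value.
#define NA_I64 INT64_MIN

// Hypothetical single-group fallback: sums x[0..n-1] in double precision.
// narm=true skips NA elements; narm=false returns NAN as soon as an NA is
// seen (NAN stands in for R's NA_real_ in this standalone sketch).
static double sum_i64_as_double(const int64_t *x, size_t n, bool narm) {
  assert(x != NULL || n == 0);
  double acc = 0.0;
  for (size_t i = 0; i < n; i++) {
    if (x[i] == NA_I64) {
      if (narm) continue;
      return NAN;
    }
    acc += (double)x[i];
  }
  return acc;
}
```

With the group `{INT64_MAX, NA, INT64_MAX}`, the `narm=true` call yields 2^64 as a double, while the `narm=false` call yields the NA stand-in.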

aitap · 2026-04-23T12:14:56Z

+
+# test for correct reponse for Datatable sum int64 overflow #7574
+DT = data.table(i = c(1L, 1L), x = lim.integer64()[2L])
+test(2368.1, DT[, sum(x), by = i],data.table(i = 1L, V1 = as.integer64("4895412794951729152")),warning = "The sum of an integer_64 column for a group was more than type 'integer_64' can hold so the result has been coerced to 'numeric' automatically for convenience. Precision has been lost in the result. Consider using 'as.numeric' on the column beforehand to avoid this warning.")


The class of the result is wrong. As the warning says, it must be numeric, not integer64. Once you obtain the right answer, use dput(control = 'exact') to get an exact floating point literal.
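
A likely explanation for the value in the test, offered as an assumption rather than something stated in the thread: bit64 stores integer64 values in the 8-byte slots of a double vector, so a `numeric` result that keeps the `integer64` class is printed by reinterpreting its bits. The sketch below, which assumes IEEE 754 doubles and uses hypothetical helper names, shows that the double sum of two `INT64_MAX` values is 2^64 and that its bit pattern read as `int64_t` is exactly 4895412794951729152.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(int64_t), "double must be 8 bytes");

// Reinterprets the 8 bytes of a double as an int64_t without converting
// the numeric value.
static int64_t double_bits_as_i64(double d) {
  int64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Sum of two INT64_MAX values carried out in double precision.
// (double)INT64_MAX rounds to 2^63, so the sum is 2^64.
static double two_int64_max_as_double(void) {
  return (double)INT64_MAX + (double)INT64_MAX;
}
```

Under these assumptions, `double_bits_as_i64(two_int64_max_as_double())` matches the `V1` value shown in the PR description, which is consistent with the class attribute being the remaining problem.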

aitap · 2026-04-23T12:26:35Z

  default:
    error(_("Type '%s' is not supported by GForce %s. Either add the prefix %s or turn off GForce optimization using options(datatable.optimize=1)"), type2char(TYPEOF(x)), "sum (gsum)", "base::sum(.)");
  }
  copyMostAttrib(x, ans);


This line (see copyMostAttrib) is responsible for copying the class attribute from x to ans. Somehow, the code will have to drop the integer64 class in case of an overflow. The easiest solution is to drop the class attribute altogether, although if x inherits from other classes as well, retaining them may be better.

William Barnett and others added 7 commits April 20, 2026 15:26

Added logic checks mimmicking checks for overflow above REALSXP. Similarly added overflow branch mimicking flow of INTSXP overflow setup

c307ed8

Renamed variables within overflow logic to not confuse with for int b=0 within outer for loop

66bebcf

Merge branch 'Rdatatable:master' into Int64_Overflow_fix

664c52a

Added test for the int64 overflow, looking for warning message

c4083b0

Merge branch 'Int64_Overflow_fix' of github.com:Will-78/data.table into Int64_Overflow_fix

c8b0393

Added issue to tests.Rraw and addition to NEWS.md

5d9e20c

Changes to fix test made within tests.Rraw

15dbca6

Will-78 requested review from MichaelChirico and ben-schwen as code owners April 22, 2026 23:57

jangorecki reviewed Apr 23, 2026

Comment thread inst/tests/tests.Rraw Outdated

Added specifications to aggregation within test()

1950bfe

ben-schwen reviewed Apr 23, 2026

aitap reviewed Apr 23, 2026

Will-78 wants to merge 8 commits into Rdatatable:master from Will-78:Int64_Overflow_fix
