Consolidate Dataclass data update methods - use DIB for update only by XingY · Pull Request #7483 · LabKey/platform

XingY · 2026-03-10T03:06:33Z

Rationale

This PR consolidates DataClass data update operations to use the DataIteratorBuilder (DIB) pipeline exclusively — the same approach already used for SampleType updates.

Related Pull Requests

Changes

DataClass updates now go through the DIB pipeline exclusively.
LSID is rejected as a standalone update key with clear error messages.
The provisioned table lsid column is dropped via deferred upgrade.
Partition-based DIB logic is extracted and shared.
getAltKeysForUpdate is removed from the interface and API.
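A minimal sketch of the keying rules listed above, assuming the incoming columns arrive as a plain set of names. `UpdateKeyRules`, `resolveKey`, the `allowRowIdMerge` parameter, and the preference of RowId over Name for update are hypothetical; only the LSID error text is taken from the PR diff.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the update/merge keying rules described in this PR.
 * Names and the preference order are illustrative assumptions, not LabKey's API.
 */
final class UpdateKeyRules
{
    /** Either a chosen key column or an error message (exactly one is non-null). */
    record KeyChoice(String keyColumn, String error) {}

    static KeyChoice resolveKey(Set<String> columns, boolean isMerge, boolean allowRowIdMerge)
    {
        boolean hasRowId = columns.contains("RowId");
        boolean hasName = columns.contains("Name");

        // Merge rejects RowId unless the experimental flag allows the column (its values are then ignored).
        if (isMerge && hasRowId && !allowRowIdMerge)
            return new KeyChoice(null, "RowId is not accepted when merging."); // hypothetical message

        // Assumption: update prefers RowId; merge keys on Name. Any LSID column is simply ignored here.
        if (!isMerge && hasRowId)
            return new KeyChoice("RowId", null);
        if (hasName)
            return new KeyChoice("Name", null);

        // LSID on its own is no longer a valid key; message text taken from the PR diff.
        if (columns.contains("LSID"))
            return new KeyChoice(null, String.format(
                    "LSID is no longer accepted as a key for data %s. Specify a RowId or Name instead.",
                    isMerge ? "merge" : "update"));

        return new KeyChoice(null, "A RowId or Name column is required."); // hypothetical message
    }

    public static void main(String[] args)
    {
        System.out.println(resolveKey(Set.of("LSID", "Name", "Description"), false, false));
        System.out.println(resolveKey(Set.of("LSID", "Description"), true, false));
    }
}
```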

Copilot

Pull request overview

This PR consolidates DataClass and Sample update behavior around DataIteratorBuilder (DIB)-based updates, removing LSID as an update/merge key and cleaning up deprecated DataClass storage by dropping the provisioned lsid column via a module upgrade.

Changes:

Remove LSID-as-key support for sample/data update & merge; enforce RowId or Name-based keys and ignore LSID when provided alongside valid keys.
Introduce shared updateRowsUsingPartitionedDIB() logic in AbstractQueryUpdateService and route Sample/DataClass updates through it.
Remove provisioned DataClass lsid column (domain kind + upgrade code + DB scripts) and update integration tests accordingly.
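The overview mentions `updateRowsUsingPartitionedDIB()`, and the integration tests check that updating the same row twice across multiple partitions is detected. The sketch below assumes, without confirmation from the PR, that rows are partitioned by their set of column names (so each partition can feed one DataIterator with a uniform shape) and that a key appearing in more than one row is an error. `PartitionSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of partitioning update rows and detecting a key updated twice. */
final class PartitionSketch
{
    /** Groups rows by their (sorted) set of column names; partitions keep first-seen order. */
    static Map<Set<String>, List<Map<String, Object>>> partitionByColumns(List<Map<String, Object>> rows)
    {
        Map<Set<String>, List<Map<String, Object>>> partitions = new LinkedHashMap<>();
        for (Map<String, Object> row : rows)
            partitions.computeIfAbsent(new TreeSet<>(row.keySet()), k -> new ArrayList<>()).add(row);
        return partitions;
    }

    /** Returns the first key value that occurs in more than one row (in any partition), or null. */
    static Object findDuplicateKey(List<Map<String, Object>> rows, String keyColumn)
    {
        Set<Object> seen = new HashSet<>();
        for (Map<String, Object> row : rows)
        {
            Object key = row.get(keyColumn);
            if (key != null && !seen.add(key))
                return key;
        }
        return null;
    }

    public static void main(String[] args)
    {
        List<Map<String, Object>> rows = List.of(
                Map.<String, Object>of("RowId", 1, "Description", "first"),
                Map.<String, Object>of("RowId", 1, "Flag", "second")); // same RowId, different columns
        System.out.println(partitionByColumns(rows).size() + " partitions, duplicate key: " + findDuplicateKey(rows, "RowId"));
    }
}
```

Checking duplicates over all rows before partitioning is what lets a repeated key be caught even when its two rows land in different partitions.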

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.


File	Description
experiment/src/org/labkey/experiment/samples/AbstractExpFolderImporter.java	Stops setting the removed `UseLsidForUpdate` option during TSV import.
experiment/src/org/labkey/experiment/api/SampleTypeUpdateServiceDI.java	Routes updates through partitioned DIB; removes LSID-key paths and legacy update methods.
experiment/src/org/labkey/experiment/api/ExpDataImpl.java	Selects DataClass provisioned properties by RowId (not LSID) to support dropped provisioned LSID column.
experiment/src/org/labkey/experiment/api/ExpDataClassDataTableImpl.java	Enforces RowId/Name keys, blocks LSID-only keying, blocks RowId in merge unless the experimental RowId-merge feature flag is enabled, and uses partitioned DIB updates.
experiment/src/org/labkey/experiment/api/DataClassDomainKind.java	Removes `lsid` from provisioned DataClass base properties, indexes, and FKs.
experiment/src/org/labkey/experiment/ExperimentUpgradeCode.java	Adds deferred upgrade to drop provisioned DataClass `lsid` column (and related indices).
experiment/src/org/labkey/experiment/ExperimentModule.java	Bumps schema version to 26.005 and centralizes experimental flag constant usage.
experiment/src/org/labkey/experiment/ExpDataIterators.java	Updates derivation/update logic to use RowId/Name instead of LSID-for-update config; removes LSID update config usage.
experiment/src/client/test/integration/DataClassCrud.ispec.ts	Expands integration coverage for keying behavior (RowId/Name/LSID) and partitioned-duplicate detection.
experiment/resources/schemas/dbscripts/sqlserver/exp-26.004-26.005.sql	Executes Java upgrade to drop provisioned DataClass `lsid` column (SQL Server).
experiment/resources/schemas/dbscripts/postgresql/exp-26.004-26.005.sql	Executes Java upgrade to drop provisioned DataClass `lsid` column (PostgreSQL).
api/src/org/labkey/api/query/DefaultQueryUpdateService.java	Removes conversion-table hook and always uses `getDbTable()` for `convertTypes()` entrypoint.
api/src/org/labkey/api/query/AbstractQueryUpdateService.java	Adds `updateRowsUsingPartitionedDIB()` and small null-iterator safety in `_pump()`.
api/src/org/labkey/api/exp/query/ExpDataTable.java	Reorders enum constants (no functional behavior change expected).
api/src/org/labkey/api/exp/api/ExperimentService.java	Removes `UseLsidForUpdate` query option; adds shared experimental feature constant.


Copilot · 2026-03-10T03:15:16Z

+        // using update from file, verify update using rowId for data that doesn't exist on this datacalss should fail.
+        errorResp = await ExperimentCRUDUtils.importData(server, "RowId\tDescription\n" + row3RowId + "\tupdate\n", emptyDataClass, "UPDATE", topFolderOptions, editorUserOptions);
+        expect(errorResp.text).toContain('Data not found for [' + row3RowId + ']');


Typo in comment: "datacalss" should be "dataclass".

Copilot · 2026-03-10T03:15:17Z

+        // update date twice specifying the rowId across multiple partitions
+        await server.post('query', 'updateRows', {


Typo in comment: "update date" should be "update data".

Copilot · 2026-03-10T03:15:17Z

+        OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_MERGE, "Allow RowId to be accepted when merging samples or dataclass data",
                "If the incoming data includes a RowId column we will allow the column but ignore it's values.", false);


Grammar in this feature description: "ignore it's values" should be "ignore its values".

Copilot · 2026-03-10T03:15:17Z

                di = new SampleUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);
            else
-                di = new DataUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);
+                di = new DataUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);//


Remove the trailing "//" at the end of this line; it looks like leftover debugging text and adds noise to the code.

Copilot · 2026-03-10T03:15:18Z

            if (hasNext)
            {
-                String lsid = (String) get(_lsidCol);
+                String lsid = (String) get(_lsidCol); // why lsid?, insert or merge


Please remove the inline comment "// why lsid?, insert or merge". It reads like a note-to-self and doesn't clarify behavior for future readers.

Copilot · 2026-03-10T03:15:18Z

            for (Map<String, Object> dataRow : dataRows.values())
-                lsids.add((String) dataRow.get("lsid"));
+                lsids.add((String) dataRow.get("lsid")); // ?


Remove the "// ?" comment here. If there's an open question about whether LSIDs are always present in these rows, it should be resolved in code (e.g., null-check + clear error) rather than left as an inline question.

Copilot · 2026-03-10T03:15:18Z

+        // update date twice specifying the name across multiple partitions
        await server.post('query', 'updateRows', {


Typo in comment: "update date" should be "update data".

labkey-adam · 2026-03-10T03:19:51Z

@XingY there's an exp-26.004-26.005.sql script in 26.3 that should be merged to develop shortly.

XingY · 2026-03-10T16:56:49Z

@XingY there's an exp-26.004-26.005.sql script in 26.3 that should be merged to develop shortly.

OK. Thanks for the heads-up.

labkey-nicka · 2026-04-24T18:49:24Z

    @Override
    public Object get(int i)
    {
+        assert(i <= existingColIndex);


Provide a message when this fails so we have a bit more context.
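A minimal sketch of an assertion with a detail message. `BoundedRow` and `outOfRange` are hypothetical; only `existingColIndex` comes from the diff. In Java's `assert cond : message;` form, the message expression is evaluated only when the condition is false, and assertions are checked only when the JVM runs with `-ea`.

```java
/** Hypothetical row accessor mirroring the reviewed get(int) method. */
final class BoundedRow
{
    private final Object[] values;
    private final int existingColIndex; // last valid column index (name taken from the diff)

    BoundedRow(Object[] values, int existingColIndex)
    {
        this.values = values;
        this.existingColIndex = existingColIndex;
    }

    /** Builds the failure message; only evaluated when the assertion fails. */
    static String outOfRange(int i, int existingColIndex)
    {
        return "Column index " + i + " is past existingColIndex " + existingColIndex;
    }

    Object get(int i)
    {
        // The expression after ':' becomes the AssertionError's detail message.
        assert i <= existingColIndex : outOfRange(i, existingColIndex);
        return values[i];
    }
}
```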

labkey-nicka · 2026-04-24T18:50:51Z

+    final IntHashMap<String> lsids = new IntHashMap<>();
+    final DataIteratorContext _context;
+
+    public DataClassUpdateAddColumnsDataIterator(DataIterator in, @NotNull DataIteratorContext context, TableInfo target, Container container, long dataClassId, String keyColumnName)


Switch the first argument to be of type CachingDataIterator and remove the cast a couple of lines below. I recommend the same for SampleUpdateAddColumnsDataIterator.
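A minimal sketch of the suggested signature change, using hypothetical stand-ins (`RowSource`, `CachingRowSource`, `CountingSource`) rather than LabKey's `DataIterator` and `CachingDataIterator`. Declaring the narrower parameter type moves the check from a runtime cast to the compiler:

```java
import java.util.Objects;

/** Hypothetical stand-in for DataIterator. */
interface RowSource
{
    boolean next();
}

/** Hypothetical stand-in for CachingDataIterator: a source that can mark and rewind. */
interface CachingRowSource extends RowSource
{
    void mark();
    void reset();
}

/** In-memory source of `size` rows, used only for the example. */
final class CountingSource implements CachingRowSource
{
    private final int size;
    private int pos = -1;
    private int marked = -1;

    CountingSource(int size) { this.size = size; }
    public boolean next() { return ++pos < size; }
    public void mark() { marked = pos; }
    public void reset() { pos = marked; }
}

/** Before: accepts any RowSource and casts, so a wrong argument fails only at runtime. */
final class CastingWrapper
{
    private final CachingRowSource unwrapped;
    CastingWrapper(RowSource in) { this.unwrapped = (CachingRowSource) in; }
    boolean next() { unwrapped.mark(); return unwrapped.next(); }
}

/** After: the parameter type states the requirement, so no cast is needed and misuse does not compile. */
final class TypedWrapper
{
    private final CachingRowSource unwrapped;
    TypedWrapper(CachingRowSource in) { this.unwrapped = Objects.requireNonNull(in); }
    boolean next() { unwrapped.mark(); return unwrapped.next(); }
}
```

With `CastingWrapper`, passing a non-caching source compiles and only fails with a `ClassCastException` when the constructor runs; `TypedWrapper` rejects it at compile time.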

labkey-nicka · 2026-04-24T18:54:06Z

+ * Queries the LSID from exp.data based on the provided key (rowId or name) and dataClassId.
+ * The LSID is needed downstream for attachment handling.
+ */
+public class DataClassUpdateAddColumnsDataIterator extends WrapperDataIterator


nit: Feels like this could somehow extend a base class of ExistingRecordDataIterator so we could share a lot of the logic (same for SampleUpdateAddColumnsDataIterator). Not necessarily in scope for this PR but now that we have three implementations...
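A rough sketch of what a shared base might look like, assuming the three iterators differ mainly in how they query existing rows for a batch of keys (for DataClass data, the LSID by RowId or Name, which the doc comment says is needed for attachment handling). `ExistingRecordLookup` and `InMemoryLsidLookup` are hypothetical and are not the actual `ExistingRecordDataIterator` API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shared base: caches existing-record values by key and delegates the query to subclasses. */
abstract class ExistingRecordLookup<K, V>
{
    private final Map<K, V> cache = new HashMap<>();

    /** Subclass hook: return the existing values for the given keys (keys with no match are omitted). */
    protected abstract Map<K, V> fetchExisting(Collection<K> keys);

    /** Returns values for the requested keys, querying only keys not already cached. */
    final Map<K, V> lookup(Collection<K> keys)
    {
        List<K> missing = new ArrayList<>();
        for (K key : keys)
            if (!cache.containsKey(key))
                missing.add(key);
        if (!missing.isEmpty())
            cache.putAll(fetchExisting(missing));

        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys)
            if (cache.containsKey(key))
                result.put(key, cache.get(key));
        return result;
    }
}

/** Example subclass: an in-memory "table" standing in for a DataClass LSID query by RowId. */
final class InMemoryLsidLookup extends ExistingRecordLookup<Integer, String>
{
    private final Map<Integer, String> table;
    int fetchCalls = 0;

    InMemoryLsidLookup(Map<Integer, String> table) { this.table = table; }

    @Override
    protected Map<Integer, String> fetchExisting(Collection<Integer> keys)
    {
        fetchCalls++;
        Map<Integer, String> found = new HashMap<>();
        for (Integer key : keys)
            if (table.containsKey(key))
                found.put(key, table.get(key));
        return found;
    }
}
```

Keys that are not found are not cached, so they are re-queried on a later lookup; a real implementation might record misses as well.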

labkey-nicka · 2026-04-24T18:55:56Z

        _unwrapped.mark();  // unwrapped _delegate
        boolean ret = super.next();
-        if (ret)
+        if (!_context.getErrors().hasErrors() && ret)


nit: Switch the checks so the "cheaper" check is first.

if (ret && !_context.getErrors().hasErrors())
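In Java, `&&` evaluates its right operand only when the left one is `true`, so putting the local `ret` first skips the error lookup whenever the iterator is exhausted. A minimal sketch; `hasErrors()` here is a hypothetical stand-in that counts its calls, and the thread does not say how costly the real `_context.getErrors().hasErrors()` is:

```java
/** Hypothetical demonstration of ordering a cheap check before a costlier one. */
final class CheckOrder
{
    static int errorLookups = 0;

    /** Stand-in for _context.getErrors().hasErrors(); counts how often it is called. */
    static boolean hasErrors()
    {
        errorLookups++;
        return false;
    }

    /** && short-circuits: hasErrors() runs only when ret is true. */
    static boolean shouldProcess(boolean ret)
    {
        return ret && !hasErrors();
    }
}
```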

labkey-nicka · 2026-04-24T18:57:08Z

            else if ("Sample".equals(prefix) || ExpMaterial.DEFAULT_CPAS_TYPE.equals(prefix))
            {
-                String xarJobId = "." + XAR_JOB_ID_NAME_SUB; // XarJobId is more concise than XarFileId
+                String xarJobId = "." + XAR_JOB_ID_NAME_SUB + "."; // XarJobId is more concise than XarFileId


How did this ever work?

The old way is obviously wrong and generates an LSID that's not ideal, but it 'works' most of the time. The only time an actual error would arise is when a collision like the following happens: XarJob11.11:3 vs. XarJob1.111:3, which would both have been XarJob1111:3 in the old format. Otherwise, export/import works fine because the bad LSID still matches up between the sample/data and the experiment run.
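A minimal sketch of the collision described above. `oldFormat` and `newFormat` are hypothetical helpers that model only the "XarJob" text, the job id, and the remainder of the name, not LabKey's actual LSID construction:

```java
/** Hypothetical model of the job-id portion of the name, before and after the fix. */
final class LsidCollision
{
    /** Old behavior: no delimiter after the job id. */
    static String oldFormat(String jobId, String rest) { return "XarJob" + jobId + rest; }

    /** New behavior: a "." after the job id keeps the parts distinguishable, as long as the job id contains no ".". */
    static String newFormat(String jobId, String rest) { return "XarJob" + jobId + "." + rest; }

    public static void main(String[] args)
    {
        System.out.println(oldFormat("11", "11:3")); // XarJob1111:3
        System.out.println(oldFormat("1", "111:3"));  // XarJob1111:3 (collision)
        System.out.println(newFormat("11", "11:3")); // XarJob11.11:3
        System.out.println(newFormat("1", "111:3"));  // XarJob1.111:3
    }
}
```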

labkey-nicka · 2026-04-24T19:27:23Z

-                drop.remove("rowid");// keep rowid for audit log
+                String message = String.format("LSID is no longer accepted as a key for data %s. Specify a RowId or Name instead.", isMerge ? "merge" : "update");
+                context.getErrors().addRowError(new ValidationException(message, LSID.name()));
+                return null;


Should this also check for the following?

if (context.getConfigParameterBoolean(ExperimentService.QueryOptions.UseProvidedLsidForXarImport)) drop.remove(LSID.name());
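One reading of the suggestion, sketched with hypothetical names: when the `UseProvidedLsidForXarImport` option is set, keep the LSID column (remove it from the set of columns to drop) instead of rejecting it. `LsidKeyCheck`, `checkLsidKey`, and its parameters are assumptions; only the option name and the error text come from this thread.

```java
import java.util.Set;

/** Hypothetical sketch of the reviewer's suggestion; names are illustrative. */
final class LsidKeyCheck
{
    /**
     * Returns null when processing can continue, or an error message when an LSID-only key must be rejected.
     * When useProvidedLsidForXarImport is set, the LSID column is kept by removing it from the drop set.
     */
    static String checkLsidKey(Set<String> drop, boolean useProvidedLsidForXarImport, boolean isMerge)
    {
        if (useProvidedLsidForXarImport)
        {
            drop.remove("LSID"); // mirrors drop.remove(LSID.name()) in the suggestion
            return null;
        }
        return String.format("LSID is no longer accepted as a key for data %s. Specify a RowId or Name instead.",
                isMerge ? "merge" : "update");
    }
}
```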

labkey-nicka · 2026-04-24T19:29:57Z

+                    context.getErrors().addRowError(new ValidationException(String.format(DUPLICATE_COLUMN_IN_DATA_ERROR, ExpDataTable.Column.RowId.name())));
+                    return null;
+                }
+                step0.addNullColumn(Column.LSID.name(), JdbcType.VARCHAR);


Just curious, why is it necessary to add the null column for LSID here? Is it in preparation for DataClassUpdateAddColumnsDataIterator?

labkey-nicka · 2026-04-24T19:31:01Z

                step0.addColumn(nameCol, (Supplier<String>)() -> null);
            }

+            if (Boolean.TRUE.equals(context.getSelectIds()) && !columnNameMap.containsKey(RowId.name()))


Is this something we should support on the sample side as well?

labkey-nicka · 2026-04-24T19:40:15Z

                    .setKeyColumns(propertyKeyColumns)
                    .setDontUpdate(dontUpdate)
                    .setRemapSchemaColumns(((UpdateableTableInfo) _expTable).remapSchemaColumns())
+                    .setCommitRowsBeforeContinuing(shouldCommitRowsBeforeContinuing)


This is already true for the TableInsertDataIteratorBuilder configured for exp.data. If we always set this to true here, won't that be consistent and no worse for performance?

I think it may be needed for exp.data because downstream tasks rely on the newly generated or queried objectid/rowId. For the provisioned table, so far we have only needed this for reclassify. I do worry users will see performance degradation if we enable this broadly.

labkey-nicka · 2026-04-24T19:41:14Z

        OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_FROM_EXPANCESTORS, "SQL syntax: 'FROM EXPANCESTORS()'",
                "Support for querying lineage of experiment objects", false, true);
-        OptionalFeatureService.get().addExperimentalFeatureFlag(SampleTypeUpdateServiceDI.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_SAMPLE_MERGE, "Allow RowId to be accepted when merging samples",
+        OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_MERGE, "Allow RowId to be accepted when merging samples or dataclass data",


nit: "data class"

XingY added 3 commits March 9, 2026 19:23

Consolidate Dataclass data update methods - use DIB for update only

e661698

Enable upgrade script

09dd6b0

CRLF

b5e2ac2

XingY requested a review from Copilot March 10, 2026 03:07

Copilot started reviewing on behalf of XingY March 10, 2026 03:08

Copilot AI reviewed Mar 10, 2026


XingY added 21 commits March 10, 2026 11:16

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

dc19ef1

fix build

4d9ff01

merge from develop

a678378

fix merge

a68e8a8

fix merge

9d5aa75

fix update

4893660

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

4b5e0ef

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

d3aea2b

merge from develop

44f7811

fix attachment

a35f233

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

28142df

Fix alias in audit

04f83ff

don't check for fail fast for DataClassUpdateAddColumnsDataIterator

3446ba4

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

2a91cc6

Make sample consistent with data for failfast

312a13c

Fix reclassify

60898fa

fix folder import

988d5fd

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

22745ed

Remove altUpdateKeys from QueryInfo

e07508c

CC review

896c5cb

revert null values

53b876f

XingY mentioned this pull request Apr 24, 2026

Remove altUpdateKeys from QueryInfo LabKey/labkey-ui-components#1988

Open

XingY mentioned this pull request Apr 24, 2026

Drop LSID from provisioned dataclass tables LabKey/testAutomation#2965

Open

crlf

edca844

labkey-nicka assigned XingY Apr 24, 2026

XingY requested review from labkey-matthewb and labkey-nicka April 24, 2026 18:07

Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

3080753

labkey-nicka reviewed Apr 24, 2026


Merge remote-tracking branch 'origin/develop' into fb_sourceDIB

401896e

4 participants
