Consolidate Dataclass data update methods - use DIB for update only#7483
Consolidate Dataclass data update methods - use DIB for update only#7483
Conversation
There was a problem hiding this comment.
Pull request overview
This PR consolidates DataClass and Sample update behavior around DataIteratorBuilder (DIB)-based updates, removing LSID as an update/merge key and cleaning up deprecated DataClass storage by dropping the provisioned lsid column via a module upgrade.
Changes:
- Remove LSID-as-key support for sample/data update & merge; enforce RowId or Name-based keys and ignore LSID when provided alongside valid keys.
- Introduce shared
updateRowsUsingPartitionedDIB()logic inAbstractQueryUpdateServiceand route Sample/DataClass updates through it. - Remove provisioned DataClass
lsidcolumn (domain kind + upgrade code + DB scripts) and update integration tests accordingly.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| experiment/src/org/labkey/experiment/samples/AbstractExpFolderImporter.java | Stops setting the removed UseLsidForUpdate option during TSV import. |
| experiment/src/org/labkey/experiment/api/SampleTypeUpdateServiceDI.java | Routes updates through partitioned DIB; removes LSID-key paths and legacy update methods. |
| experiment/src/org/labkey/experiment/api/ExpDataImpl.java | Selects DataClass provisioned properties by RowId (not LSID) to support dropped provisioned LSID column. |
| experiment/src/org/labkey/experiment/api/ExpDataClassDataTableImpl.java | Enforces RowId/Name keys, blocks LSID-only keying, blocks RowId in merge (behind feature), and uses partitioned DIB updates. |
| experiment/src/org/labkey/experiment/api/DataClassDomainKind.java | Removes lsid from provisioned DataClass base properties, indexes, and FKs. |
| experiment/src/org/labkey/experiment/ExperimentUpgradeCode.java | Adds deferred upgrade to drop provisioned DataClass lsid column (and related indices). |
| experiment/src/org/labkey/experiment/ExperimentModule.java | Bumps schema version to 26.005 and centralizes experimental flag constant usage. |
| experiment/src/org/labkey/experiment/ExpDataIterators.java | Updates derivation/update logic to use RowId/Name instead of LSID-for-update config; removes LSID update config usage. |
| experiment/src/client/test/integration/DataClassCrud.ispec.ts | Expands integration coverage for keying behavior (RowId/Name/LSID) and partitioned-duplicate detection. |
| experiment/resources/schemas/dbscripts/sqlserver/exp-26.004-26.005.sql | Executes Java upgrade to drop provisioned DataClass lsid column (SQL Server). |
| experiment/resources/schemas/dbscripts/postgresql/exp-26.004-26.005.sql | Executes Java upgrade to drop provisioned DataClass lsid column (PostgreSQL). |
| api/src/org/labkey/api/query/DefaultQueryUpdateService.java | Removes conversion-table hook and always uses getDbTable() for convertTypes() entrypoint. |
| api/src/org/labkey/api/query/AbstractQueryUpdateService.java | Adds updateRowsUsingPartitionedDIB() and small null-iterator safety in _pump(). |
| api/src/org/labkey/api/exp/query/ExpDataTable.java | Reorders enum constants (no functional behavior change expected). |
| api/src/org/labkey/api/exp/api/ExperimentService.java | Removes UseLsidForUpdate query option; adds shared experimental feature constant. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // using update from file, verify update using rowId for data that doesn't exist on this datacalss should fail. | ||
| errorResp = await ExperimentCRUDUtils.importData(server, "RowId\tDescription\n" + row3RowId + "\tupdate\n", emptyDataClass, "UPDATE", topFolderOptions, editorUserOptions); | ||
| expect(errorResp.text).toContain('Data not found for [' + row3RowId + ']'); |
There was a problem hiding this comment.
Typo in comment: "datacalss" should be "dataclass".
| // update date twice specifying the rowId across multiple partitions | ||
| await server.post('query', 'updateRows', { |
There was a problem hiding this comment.
Typo in comment: "update date" should be "update data".
| OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_MERGE, "Allow RowId to be accepted when merging samples or dataclass data", | ||
| "If the incoming data includes a RowId column we will allow the column but ignore it's values.", false); |
There was a problem hiding this comment.
Grammar in this feature description: "ignore it's values" should be "ignore its values".
| di = new SampleUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents); | ||
| else | ||
| di = new DataUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents); | ||
| di = new DataUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);// |
There was a problem hiding this comment.
Remove the trailing "//" at the end of this line; it looks like leftover debugging text and adds noise to the code.
| if (hasNext) | ||
| { | ||
| String lsid = (String) get(_lsidCol); | ||
| String lsid = (String) get(_lsidCol); // why lsid?, insert or merge |
There was a problem hiding this comment.
Please remove the inline comment "// why lsid?, insert or merge". It reads like a note-to-self and doesn't clarify behavior for future readers.
| for (Map<String, Object> dataRow : dataRows.values()) | ||
| lsids.add((String) dataRow.get("lsid")); | ||
| lsids.add((String) dataRow.get("lsid")); // ? |
There was a problem hiding this comment.
Remove the "// ?" comment here. If there's an open question about whether LSIDs are always present in these rows, it should be resolved in code (e.g., null-check + clear error) rather than left as an inline question.
| // update date twice specifying the name across multiple partitions | ||
| await server.post('query', 'updateRows', { |
There was a problem hiding this comment.
Typo in comment: "update date" should be "update data".
|
@XingY there's an exp-26.004-26.005.sql script in 26.3 that should be merged to develop shortly. |
OK. Thanks for the heads-up. |
| @Override | ||
| public Object get(int i) | ||
| { | ||
| assert(i <= existingColIndex); |
There was a problem hiding this comment.
Provide a message when this fails so we have a bit more context.
| final IntHashMap<String> lsids = new IntHashMap<>(); | ||
| final DataIteratorContext _context; | ||
|
|
||
| public DataClassUpdateAddColumnsDataIterator(DataIterator in, @NotNull DataIteratorContext context, TableInfo target, Container container, long dataClassId, String keyColumnName) |
There was a problem hiding this comment.
Switch the first argument to be of type CachingDataIterator and remove the casting a couple lines below. I recommend the same for SampleUpdateAddColumnsDataIterator.
| * Queries the LSID from exp.data based on the provided key (rowId or name) and dataClassId. | ||
| * The LSID is needed downstream for attachment handling. | ||
| */ | ||
| public class DataClassUpdateAddColumnsDataIterator extends WrapperDataIterator |
There was a problem hiding this comment.
nit: Feels like this could somehow extend a base class of ExistingRecordDataIterator so we could share a lot of the logic (same for SampleUpdateAddColumnsDataIterator). Not necessarily in scope for this PR but now that we have three implementations...
| _unwrapped.mark(); // unwrapped _delegate | ||
| boolean ret = super.next(); | ||
| if (ret) | ||
| if (!_context.getErrors().hasErrors() && ret) |
There was a problem hiding this comment.
nit: Switch the checks so the "cheaper" check is first.
if (ret && !_context.getErrors().hasErrors())| else if ("Sample".equals(prefix) || ExpMaterial.DEFAULT_CPAS_TYPE.equals(prefix)) | ||
| { | ||
| String xarJobId = "." + XAR_JOB_ID_NAME_SUB; // XarJobId is more concise than XarFileId | ||
| String xarJobId = "." + XAR_JOB_ID_NAME_SUB + "."; // XarJobId is more concise than XarFileId |
There was a problem hiding this comment.
How did this ever work?
There was a problem hiding this comment.
The old way is obviously wrong and generates a LSID that's not ideal, but it 'works' most of the time. The only time actual error would arise is when the following collision happens: XarJob11.11:3 vs XarJob1.111:3, which would have both been XarJob1111:3 with the old format. Otherwise, export/import works fine because the bad LSID does match up between sample/data and experimentRun.
| drop.remove("rowid");// keep rowid for audit log | ||
| String message = String.format("LSID is no longer accepted as a key for data %s. Specify a RowId or Name instead.", isMerge ? "merge" : "update"); | ||
| context.getErrors().addRowError(new ValidationException(message, LSID.name())); | ||
| return null; |
There was a problem hiding this comment.
Should here also check for?
if (context.getConfigParameterBoolean(ExperimentService.QueryOptions.UseProvidedLsidForXarImport))
drop.remove(LSID.name());| context.getErrors().addRowError(new ValidationException(String.format(DUPLICATE_COLUMN_IN_DATA_ERROR, ExpDataTable.Column.RowId.name()))); | ||
| return null; | ||
| } | ||
| step0.addNullColumn(Column.LSID.name(), JdbcType.VARCHAR); |
There was a problem hiding this comment.
Just curious, why is it necessary to add the null column for LSID here? Is it in preparation for DataClassUpdateAddColumnsDataIterator?
| step0.addColumn(nameCol, (Supplier<String>)() -> null); | ||
| } | ||
|
|
||
| if (Boolean.TRUE.equals(context.getSelectIds()) && !columnNameMap.containsKey(RowId.name())) |
There was a problem hiding this comment.
Is this something we should support on the sample side as well?
| .setKeyColumns(propertyKeyColumns) | ||
| .setDontUpdate(dontUpdate) | ||
| .setRemapSchemaColumns(((UpdateableTableInfo) _expTable).remapSchemaColumns()) | ||
| .setCommitRowsBeforeContinuing(shouldCommitRowsBeforeContinuing) |
There was a problem hiding this comment.
This is already true for the TableInsertDataIteratorBuilder configured for exp.data. If we always set this to true here won't that be consistent and no worse for performance?
There was a problem hiding this comment.
I think maybe it's needed for exp.data because downstream task relies on objectid/rowId that's newly generated/queried. For the provisioned table, so far we only ever needed this for reclassify. I do worry user will see performance degradation if we enable this broadly.
| OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_FROM_EXPANCESTORS, "SQL syntax: 'FROM EXPANCESTORS()'", | ||
| "Support for querying lineage of experiment objects", false, true); | ||
| OptionalFeatureService.get().addExperimentalFeatureFlag(SampleTypeUpdateServiceDI.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_SAMPLE_MERGE, "Allow RowId to be accepted when merging samples", | ||
| OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_MERGE, "Allow RowId to be accepted when merging samples or dataclass data", |
Rationale
This PR consolidates DataClass data update operations to use the DataIteratorBuilder (DIB) pipeline exclusively — the same approach already used for SampleType updates.
Related Pull Requests
altUpdateKeysfromQueryInfolabkey-ui-components#1988Changes