Skip to content

UNOMI-139 UNOMI-880 UNOMI-878 UNOMI-884: Multi-tenant platform core#760

Open
sergehuber wants to merge 4 commits into
masterfrom
UNOMI-139-platform
Open

UNOMI-139 UNOMI-880 UNOMI-878 UNOMI-884: Multi-tenant platform core#760
sergehuber wants to merge 4 commits into
masterfrom
UNOMI-139-platform

Conversation

@sergehuber
Copy link
Copy Markdown
Contributor

@sergehuber sergehuber commented May 18, 2026

Stacked PR (merge order)

Role Branch
Base (merge into) UNOMI-888-import-export-javadoc
Head (this PR) UNOMI-139-platform

This PR is stacked: merge into UNOMI-888-import-export-javadoc first (PR #756, UNOMI-888), after that line is merged or rebased. Downstream PRs (UNOMI-904 V2 compat, dev shell, additional ITs, manual) will stack on UNOMI-139-platform.

For more information (each PR targets the branch below until the bottom merges): https://github.github.com/gh-stack/introduction/overview/


JIRA

Summary

Delivers the Unomi 3.1 platform core on top of UNOMI-888: multi-tenant execution, unified caching, cluster-aware scheduling, and V3 migration assets. Services, persistence, GraphQL, and REST are wired for V3 tenant resolution (API keys / roles).

Supersedes the platform scope of PR #757 (closed; monolith branch UNOMI-139-UNOMI-880-multitenancy). UNOMI-904 (V2 compatibility), dev shell commands, additional IT classes, and manual updates will follow in stacked PRs on this line.

UNOMI-139 Multi-tenancy

  • Tenant model and APIs (Tenant, TenantService, API keys, security/audit interfaces, quotas, lifecycle hooks) and tenantId on core types.
  • ExecutionContextManager and thread-local tenant context so services, persistence, and REST resolve the active tenant consistently.
  • Tenant-scoped persistence and definitions (Elasticsearch/OpenSearch), including system-tenant vs tenant-specific overrides.

UNOMI-880 Unified multi-tenant caching

  • MultiTypeCacheService / CacheableTypeConfig and AbstractMultiTypeCachingService for shared cache lifecycle, predefined item bundling, periodic refresh, and statistics.
  • Migrate DefinitionsServiceImpl and SegmentServiceImpl (and related paths) onto the unified cache; remove duplicated ad-hoc cache listeners where superseded.

UNOMI-878 Cluster-aware task scheduling

  • Replace legacy scheduler usage with SchedulerService (scheduled tasks, task executors, persistence-backed vs in-memory tasks, cluster coordination).
  • Integrate periodic work (cache refresh, cluster heartbeat, router scheduling, rule refresh paths) with the new scheduler.
  • Update dependents (e.g. Groovy actions service) to schedule work through the new API.

UNOMI-884 Migration to V3

  • Migration Groovy scripts and request bodies for 3.1.0 (tenant document IDs, system item ID fixes, tenant initialization, legacy queryBuilder updates).
  • Extended MigrationUtils and MigrationUtilsTest for the new steps aligned with multi-tenancy and index updates.

Tests

  • Platform integration tests updated (BaseIT, migration ITs, ScopeIT, and related suites).
  • Full integration test run verified locally with OpenSearch (./build.sh --integration-tests --use-opensearch).

Licence

@sergehuber sergehuber changed the title UNOMI-139/880/878/884: Multi-tenant platform core UNOMI-139 UNOMI-880 UNOMI-878 UNOMI-884: Multi-tenant platform core May 18, 2026
@sergehuber sergehuber requested a review from Copilot May 19, 2026 13:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@sergehuber sergehuber force-pushed the UNOMI-139-platform branch from 4c5828e to d2093d1 Compare June 2, 2026 17:47
@sergehuber sergehuber changed the base branch from UNOMI-888-import-export-javadoc to master June 2, 2026 17:51
@sergehuber sergehuber force-pushed the UNOMI-139-platform branch from d2093d1 to 6f3f768 Compare June 2, 2026 18:29
Log the full event chain once per collapsed stack pattern instead of on
every blocked send(), avoiding log storms during rule recursion loops.
Add no-arg constructor and setUpdated() so clients can deserialize POST
/cxs/eventcollector JSON (aligned with unomi-3-dev, without tracing fields).
@sergehuber sergehuber force-pushed the UNOMI-139-platform branch from 6f3f768 to c36e1fd Compare June 2, 2026 19:44
@sergehuber sergehuber closed this Jun 3, 2026
@sergehuber sergehuber deleted the UNOMI-139-platform branch June 3, 2026 05:46
@sergehuber sergehuber restored the UNOMI-139-platform branch June 3, 2026 05:54
@sergehuber sergehuber reopened this Jun 3, 2026
Increase retry budget from 10 to 20 for the property type removal poll.
ES consistently takes ~11s to propagate the deletion, exceeding the 10s limit.
Copy link
Copy Markdown
Contributor Author

@sergehuber sergehuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: UNOMI-139 Multi-tenant platform core

Scope: 292 files, +30 486 / −4 093 lines (multi-tenancy, unified caching, cluster scheduling, V3 migration).

# Severity Finding
C1 CRITICAL hasTenantAccess() ignores tenantId — any tenant key accesses all tenants
C2 CRITICAL Unknown X-Unomi-Tenant-Id header grants full system context to JAAS users
C3 CRITICAL Private API key committed in plaintext to migration script (permanently in git history)
C4 CRITICAL ApiKey.key serialised to ES and returned by REST with no @JsonIgnore
H1 HIGH executeAsSystem() finally block suppresses original exception on dual failure
H2 HIGH Migration step key uses itemType — multiple rollover indices silently skipped
I1 IMPORTANT SecurityService.SYSTEM_TENANT = "SYSTEM_TENANT" conflicts with all other constants ("system")
I2 IMPORTANT isPublicOperation() throws IndexOutOfBoundsException on empty/fragment-only documents
I3 IMPORTANT migrate-3.1.0-00-tenantDocumentIds.groovy conflicts with existing 00-fixProfileNbOfVisits on master
I4 IMPORTANT ScheduledTask implements Serializable without serialVersionUID
I5 IMPORTANT getCurrentRoles() is dead code calling AccessController removed in Java 21
I6 IMPORTANT isOperatingOnSystemTenant() stub always returns false
A1 Advisory @author tag in SchedulerServiceImpl (ASF policy)

All findings ≥ 80 % confidence. Inline suggestions attached. C1–C4 should be resolved before merge.


@Override
public boolean hasTenantAccess(String tenantId) {
if (hasRole(UnomiRoles.TENANT_ADMINISTRATOR)) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[C1 CRITICAL] Tenant isolation bypass — tenantId parameter is ignored

Every API-key-authenticated tenant receives TENANT_ADMINISTRATOR (see createSubject() line 323). This check then returns true for any tenantId when that role is held — Tenant A’s private key grants full read/write access to Tenant B’s data.

Fix:

@Override
public boolean hasTenantAccess(String tenantId) {
    if (hasRole(UnomiRoles.ADMINISTRATOR)) {
        return true; // system admin bypasses tenant boundary
    }
    String subjectTenantId = getCurrentSubjectTenantId();
    return tenantId != null && tenantId.equals(subjectTenantId);
}

executionContextManager.setCurrentContext(executionContextManager.createContext(tenantId));
} else {
LOG.warn("Invalid tenant ID provided in header: {}", tenantId);
executionContextManager.setCurrentContext(ExecutionContext.systemContext());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[C2 CRITICAL] JAAS user with unknown tenant ID receives full system privileges

When JAAS authentication succeeds but X-Unomi-Tenant-Id names a tenant that does not exist, this line grants systemContext() — which sets isSystem = true and bypasses every validateAccess() check. Any Karaf user can exploit this by passing a random non-existent tenant ID.

Safe fix: drop the context override and let the request continue under the JAAS user’s own subject-derived context.

Suggested change
executionContextManager.setCurrentContext(ExecutionContext.systemContext());
// Unknown tenant — do not grant system privileges; leave context derived from JAAS login

"lastModifiedBy": "system-migration-3.1.0",
"creationDate" : "${isoDate}",
"lastModificationDate" : "${isoDate}",
"key" : "503BAABB3A14AEB4B50ACF3C82982FBABECDBAEA83879CA8AECA016A6A9EEA85",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[C3 CRITICAL] Private API key credential hardcoded in source — committed to git history permanently

Both the public key (C606D77D… line 62) and this private key (503BAABB…) will bootstrap the system tenant on every migration run. Anyone with repository access has working credentials. Private keys must never be committed to source.

Replace both key values with keys generated at migration time:

import java.security.SecureRandom
SecureRandom rng = SecureRandom.getInstanceStrong()
byte[] pubBytes  = new byte[32]; rng.nextBytes(pubBytes)
byte[] privBytes = new byte[32]; rng.nextBytes(privBytes)
String generatedPublicKey  = pubBytes.collect  { String.format('%02X', it) }.join()
String generatedPrivateKey = privBytes.collect { String.format('%02X', it) }.join()
// use generatedPublicKey / generatedPrivateKey in the JSON body

The fixed itemId UUIDs on lines 56 and 67 compound the issue — an attacker who knows the key values also knows the exact Elasticsearch document IDs. Consider generating those at runtime too.

/**
* The API key value.
*/
private String key;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[C4 CRITICAL] Raw API key value serialised to Elasticsearch and exposed by the REST layer

This field has no @JsonIgnore. Since Tenant extends Item and is persisted to Elasticsearch, the entire List<ApiKey> — including raw key values — is stored in plaintext and returned by GET /tenants/{id} to any caller with the ADMINISTRATOR role. The @XmlTransient annotations on the derived getters suppress JAXB only, not Jackson.

Private keys are authentication credentials and should be stored as a secure hash, returned to the caller only once at creation time.

Minimal immediate fix (also add import com.fasterxml.jackson.annotation.JsonIgnore; to the class imports):

Suggested change
private String key;
@com.fasterxml.jackson.annotation.JsonIgnore
private String key;

} catch (Exception e) {
LOGGER.error("Error restoring previous context: {}", e.getMessage(), e);
// Still throw the error to ensure it's not silently ignored
throw new SecurityException("Failed to restore security context", e);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[H1 HIGH] Original operation exception suppressed when context restore also fails

If operation.get() throws E1 and setCurrentSubject() then throws E2 in the finally catch, this throw new SecurityException(…) replaces E1. The caller receives a generic restore-failure message and the actual root cause is silently lost (not even added via Throwable.addSuppressed()).

Remove the re-throw — the error is already logged, and rethrowing here swallows E1 every time both the operation and the restore fail together. Also update the stale comment on line 109.

Suggested change
throw new SecurityException("Failed to restore security context", e);
// Context restore failed — already logged above. Do not rethrow to avoid suppressing the original operation exception.

@@ -0,0 +1,179 @@
import org.apache.unomi.shell.migration.service.MigrationContext
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I3 IMPORTANT] Filename conflict — migrate-3.1.0-00-fixProfileNbOfVisits.groovy already exists on master

migrate-3.1.0-00-fixProfileNbOfVisits.groovy was added by UNOMI-922 (commit c0f145bb0) and is already in master. Both scripts share step priority 0. They both execute (the TreeSet orders them alphabetically by name when priority is equal, so fixProfileNbOfVisits runs first), but:

  1. migrate-3.1.0-05-fixSystemItemIds.groovy refers to “the 3.1.0-00 migration script” in the singular — now ambiguous.
  2. Future operators and developers cannot tell which 00 was intentional.

Please rename this file to migrate-3.1.0-01-tenantDocumentIds.groovy (or another unambiguous number that makes the ordering relative to fixSystemItemIds explicit).

*/
public class ScheduledTask extends Item implements Serializable {

public static final String ITEM_TYPE = "scheduledTask";
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I4 IMPORTANT] Serializable class without serialVersionUID — rolling-upgrade hazard

ScheduledTask implements Serializable and has ~25 fields (Map, Set, TimeUnit, Date, inner TaskStatus enum). Without a declared serialVersionUID the JVM computes one from the class structure. Adding or removing any field will invalidate persisted serialised blobs in a rolling cluster upgrade and throw InvalidClassException.

Suggested change
public static final String ITEM_TYPE = "scheduledTask";
private static final long serialVersionUID = 1L;
public static final String ITEM_TYPE = "scheduledTask";

});
}

private Set<String> getCurrentRoles() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I5 IMPORTANT] Dead code calling AccessController.getContext() — removed in Java 21

getCurrentRoles() calls AccessController.getContext() which was deprecated for removal in Java 17 and removed entirely in Java 21. A project-wide search confirms this method has zero call sites — it is unreachable dead code.

If ever invoked on a Java 21+ runtime it will throw NoSuchMethodError. Please delete the entire method (lines 155–166). If the functionality is ever needed it should be re-implemented using the securityService instance already available in this class.


@Override
public boolean isOperatingOnSystemTenant() {
return false;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I6 IMPORTANT] isOperatingOnSystemTenant() is a non-functional stub — always returns false

This method is part of the SecurityService API contract. Any caller that uses it to guard a system-only operation will silently never activate that guard.

Suggested change
return false;
return getCurrentSubjectTenantId().equals(SYSTEM_TENANT);

* - Operations that require subservices are queued during initialization
* - Operations are executed once all required services are available
* - Supports different operation types with appropriate handling
*
* @author dgaillard
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[A1 Advisory] @author tag — violates ASF attribution policy

The Apache Software Foundation asks projects not to use @author Javadoc tags; author attribution belongs in git log.

Suggested change
* @author dgaillard
*

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants