security: build keyspace DDL via SchemaBuilder/CqlIdentifier (follow-up to #2472)#2475
security: build keyspace DDL via SchemaBuilder/CqlIdentifier (follow-up to #2472)#2475erichare wants to merge 7 commits into
Conversation
📈 Unit Test Coverage Delta vs Main Branch
|
Unit Test Coverage Report
|
…up to #2472) PR #2472 closed the GHSA-p64p-96pw-mxwv injection vectors with input validation. This follow-up addresses the root cause for the keyspace-name sinks: replace the String.format CQL interpolation in CreateKeyspaceOperation and DropKeyspaceOperation with the driver's SchemaBuilder using CqlIdentifier.fromInternal(name). CqlIdentifier doubles embedded quote characters, so the keyspace identifier is well-formed regardless of upstream validation. Driver-gap finding for datacenter names: SchemaBuilder.withNetworkTopologyStrategy(Map) does NOT escape map keys (see OptionsUtils.extractOptionValue in java-driver-query-builder 4.19.0-preview1 — a hostile DC name with a single quote produces broken CQL). The DC-name allowlist applied by the resolver is therefore the actual security control for the replication map, not just defense-in-depth. The resolver comment is updated to flag this so the allowlist is not removed in a later cleanup. Changes: - CreateKeyspaceOperation: take (name, strategy, strategyOptions) and build the SimpleStatement via SchemaBuilder.createKeyspace(CqlIdentifier). - DropKeyspaceOperation: build via SchemaBuilder.dropKeyspace(CqlIdentifier). - CreateNamespaceKeyspaceCommandResolver: drop the dead getReplicationMap string-building; keep DC-name validation (now wired via validateStrategyOptions). Updated comment documents the driver-gap rationale. - New operation-level tests pin the SchemaBuilder behaviour: identifier escaping for keyspace names, fallback to SimpleStrategy, hyphenated DC name support, and a regression-pinning test that documents the driver's unescaped-map-key behaviour. - Existing resolver tests adjusted to the new record fields and existing injection-rejection tests retained.
333fd1c to
e9097e8
Compare
📉 Integration Test Coverage Delta vs Main Branch (dse69-it)
|
Integration Test Coverage Report (dse69-it)
|
📉 Integration Test Coverage Delta vs Main Branch (hcd-it)
|
Integration Test Coverage Report (hcd-it)
|
amorton
left a comment
There was a problem hiding this comment.
I am still not convinced the rules for the DC are actually correct. That needs to be validated in the C* code
We can do some validation on the dropKeyspace to check the identifier is a regex word char, but not to check the length
some more tweaks to bring the old collection code up to a better standard.
and the comments, um, need a human touch
amorton
left a comment
There was a problem hiding this comment.
some iteration, particularly around the santizing Vs checking name rules for deletion
| } | ||
|
|
||
| static CqlIdentifier keyspaceIdentifierForDrop(String name) { | ||
| if (name == null || !DROP_KEYSPACE_NAME.matcher(name).matches()) { |
There was a problem hiding this comment.
yes we should validate,but we cannot contrain the length because in HCD and C* the keyspac ename can be longer than astra. And it could be made in CQL .
So we need perhaps something on the nameing rules that is sanitizeInput() ? that will check the chars are from the allowed set of char, but not the length
it should prob throw the same or a similar exception inside sanitizeInput() as we do in checkRule()
| @Pattern(regexp = "(?i)(SimpleStrategy|NetworkTopologyStrategy)") | ||
| @JsonProperty("class") |
There was a problem hiding this comment.
the code in cassandra for getting the replication strategy is case sensitive, see AbstractReplicationStrategy.getClass().
SO we should either keep it that way here, OR we need to fix it later.
lets go with leaving this without the (?i) because we do not know how to capitalize, and tehcnically this can be any fully qualified path.
| public record Replication( | ||
| @NotNull() | ||
| @Pattern(regexp = "SimpleStrategy|NetworkTopologyStrategy") | ||
| @Pattern(regexp = "(?i)(SimpleStrategy|NetworkTopologyStrategy)") |
There was a problem hiding this comment.
same as keyspace, -1 on making case sensitive
| UNSUPPORTED_JSON_TYPE_FOR_TEXT_INDEX, | ||
| UNSUPPORTED_LIST_DEFINITION, | ||
| UNSUPPORTED_MAP_DEFINITION, | ||
| UNSUPPORTED_REPLICATION_DATA_CENTER_NAME, |
There was a problem hiding this comment.
I think this may actually be INVALID, for unsupported it is "we understand what you want to do, but you cannot do that" , e.g. see INVALID_USER_DEFINED_TYPE_NAME above
| start.withNetworkTopologyStrategy(strategyOptions != null ? strategyOptions : Map.of()); | ||
| } else { | ||
| int replicationFactor = | ||
| (strategyOptions != null) |
There was a problem hiding this comment.
ok, we marked the options as @NotNull in the command class, but this class does not know and needs to be defensive in it's inputs. So we shoud ensure that all of the values in the map or not null for this check and above.
| return DEFAULT_REPLICATION_MAP; | ||
| // The driver writes NetworkTopologyStrategy map keys as CQL string literals without escaping | ||
| // them. | ||
| private static final Pattern VALID_DATA_CENTER_NAME = Pattern.compile("[^']+"); |
There was a problem hiding this comment.
as discussed, we need a thought out reason why this is the correct pattern to sanitize with. The code to reference from C* will be the NetworkTopologyStrategy
for example, a CQL quoted identifier is delim't with " so should we care about that ?
is this a better idea ?
["'\u2018\u2019\u201C\u201D] That covers: " double, ' single, backtick, ' ' curly single, " " curly double.
There was a problem hiding this comment.
https://cassandra.apache.org/doc/stable/cassandra/developing/cql/ddl.html#create-keyspace
NTS option keys are data-center names in a CQL map, not CQL identifiers, and the driver emits them as raw single-quoted CQL string keys. So ASCII ' is the delimiter we must reject; "/backtick/curly quotes do not close the string literal in that position.
| import java.util.Map; | ||
| import java.util.regex.Pattern; | ||
|
|
||
| final class KeyspaceCommandResolverSupport { |
There was a problem hiding this comment.
unsure on why we need a new class here, and genally a "Support" class with static functions would need to be more widely usable. The pattern with the CreateKeyspace/Namespace is a base class.
We also now have logic spread out for the DDL operations for a Keyspace.
Consider renaming CreateNamespaceKeyspaceCommandResolver to KeyspaceDDLCommandResolver and move all the helper functions into there.
Sublass all the Create and Delete for keyspace / namespace off the same class
Summary
Follow-up to #2472 (GHSA-p64p-96pw-mxwv). After #2474 reverted #2472 with the reasoning "the actual fix is to use a query builder," this PR is the query-builder fix.
CreateKeyspaceOperationandDropKeyspaceOperationnow build theirSimpleStatementvia the driver'sSchemaBuilderwithCqlIdentifier.fromInternal(name).CqlIdentifierdoubles embedded quote characters, so the keyspace identifier stays well-formed regardless of upstream validation — noString.formatinterpolation remains.CreateKeyspaceOperationtakes(name, strategy, strategyOptions)directly; the manual replication-map string-building inCreateNamespaceKeyspaceCommandResolveris gone.NamingRules.KEYSPACE.checkRule— addressing the concern in Revert "security: validate keyspace and datacenter names to prevent CQL injection (GHSA-p64p-96pw-mxwv)" #2474 about blocking drops of keyspaces created outside our rules. TheSchemaBuilder+CqlIdentifierpath makes that resolver-level check unnecessary for safety.Driver-gap finding for datacenter names
While building this PR I verified — by both decompiling
OptionsUtils.extractOptionValueand running a live probe — that the driver'sSchemaBuilder.withNetworkTopologyStrategy(Map)does not escape map keys. Repro:The DC-name allowlist applied in
CreateNamespaceKeyspaceCommandResolver.validateStrategyOptions(...)is therefore the actual security control for the replication map, not defense-in-depth. The resolver comment is updated to flag this so the allowlist isn't accidentally removed in a later cleanup. A regression-pinning test (CreateKeyspaceOperationTest.driverDoesNotEscapeDataCenterMapKeys) documents the driver behaviour so we get a clear signal if a future driver upgrade starts escaping.Files changed
CreateKeyspaceOperation/DropKeyspaceOperation→SchemaBuilder+CqlIdentifierCreateKeyspaceCommandResolver/CreateNamespaceCommandResolver→ pass strategy + options to the operation; callvalidateStrategyOptionsCreateNamespaceKeyspaceCommandResolver→ dropgetReplicationMap; keep DC-name validation with an updated rationale commentSchemaException+errors.yaml→ re-addUNSUPPORTED_REPLICATION_DATA_CENTER_NAME(reverted in Revert "security: validate keyspace and datacenter names to prevent CQL injection (GHSA-p64p-96pw-mxwv)" #2474)CreateKeyspaceOperationTestandDropKeyspaceOperationTestexercise the SchemaBuilder behaviour directlyCreateKeyspaceCommandResolverTestexercises both the resolver fields and DC-name injection rejectionTest plan
./mvnw test -Dtest='CreateKeyspaceCommandResolverTest,DropKeyspaceCommandResolverTest,CreateKeyspaceOperationTest,DropKeyspaceOperationTest'→ 27/27 passing (JDK 21)./mvnw compile→ clean, including google-java-format check (JDK 21)