Performance improvements to CEL rule evaluation#460
Conversation
26bab39 to
232d6a5
Compare
| rules = validate_pb2.Rule() | ||
| rules.id = expression | ||
| rules.expression = expression | ||
| if "now" in rules.expression: |
There was a problem hiding this comment.
This is just a string search, correct? Could there be false positives? Do they matter?
There was a problem hiding this comment.
yeah, you're right. probably not a huge deal (since it's just false positives rather than a correctness issue), but moved over to a regex in f6b1c39. wydt?
There was a problem hiding this comment.
Do you have benchmarks on the difference? You'd need a rule that had now used for time and a rule with now used as a substring and compare.
There was a problem hiding this comment.
My gut feeling is that the string search is fine, but that's just a guess. Maybe there are a large number of rules with the substring "now" that aren't doing time comparisons.
There was a problem hiding this comment.
yeah, the regex ought to be slower; reverted in 6071c3f. I think some false-positives are fine.
Three improvements to hot-path validation, inspired by protovalidate-go. Guard `_validate_cel` when `self._cel` is empty (bufbuild/protovalidate-go#261) Adds an early return in `CelRules._validate_cel` when there are no CEL runners, skipping activation dict creation and `datetime.now()` entirely. Also guards the `_msg_to_cel` call in `MessageRules.validate()` so the message-to-CEL conversion is skipped when there are no CEL rules to run. Benchmark (required-only field, 0 CEL runners): -37%. Skip `now` computation when unused (bufbuild/protovalidate-go#289) Adds `_uses_now` to `CelRules`, set at compile time in `add_rule` by checking whether `"now"` appears in the expression string. Only `timestamp.gt_now`, `timestamp.lt_now`, `timestamp.within`, and custom expressions referencing `now` will call `datetime.datetime.now()`. Benchmarks: `int32.gt` (5 CEL runners) -25%, `timestamp.gt_now` -6%. Early-exit loop in `cel_unique` (bufbuild/protovalidate-go#289) Replaces `len(val) == len(set(val))` with a loop that returns `False` on the first duplicate, avoiding building the full set unnecessarily.
The simple `"now" in expression` substring check could produce false positives for identifiers like `renown` or `knows`, setting `_uses_now` unnecessarily. Replace with `_NOW_RE = re.compile(r"\bnow\b")` so only the standalone identifier is matched. The `\b` assertion matches at the boundary between word characters (`[a-zA-Z0-9_]`) and non-word characters. Since CEL identifiers are `[_a-zA-Z][_a-zA-Z0-9]*`, any occurrence of `now` as an identifier — including inside expressions like `timestamp(now)` — is always flanked by non-identifier characters, so this approach produces no false negatives.
232d6a5 to
f6b1c39
Compare
This reverts commit f6b1c39.
|
@jonbodner-buf going to merge since you approved the first commit and just reverted the second; thanks for taking a look! |
Three improvements to hot-path validation, inspired by protovalidate-go.
Guard
_validate_celwhenself._celis empty (bufbuild/protovalidate-go#261)Adds an early return in
CelRules._validate_celwhen there are no CEL runners, skipping activation dict creation anddatetime.now()entirely. Also guards the_msg_to_celcall inMessageRules.validate()so the message-to-CEL conversion is skipped when there are no CEL rules to run. Benchmark (required-only field, 0 CEL runners): -37%.Skip
nowcomputation when unused (bufbuild/protovalidate-go#289)Adds
_uses_nowtoCelRules, set at compile time inadd_ruleby checking whether"now"appears in the expression string. Onlytimestamp.gt_now,timestamp.lt_now,timestamp.within, and custom expressions referencingnowwill calldatetime.datetime.now(). Benchmarks: int32.gt (5 CEL runners) -25%, timestamp.gt_now -6%.Early-exit loop in
cel_unique(bufbuild/protovalidate-go#289)Replaces
len(val) == len(set(val))with a loop that returnsFalseon the first duplicate, avoiding building the full set unnecessarily.