Filter safety #250
Cherry-pick of 02acc13 limited to Part-1-compatible tests. Adds request_validation_test.cpp covering filter parameter validation from 3e33557 and wires it into tests/CMakeLists.txt. The remaining contents of 02acc13 (vector_storage_test.cpp, numeric_index_stress_test.cpp, tests/repo_filter.py, and the new TEST_F additions in filter_test.cpp) exercise Part-2 behavior (bitmap-only bucket state, unified float numeric encoding, upsert cleanup, deleteFilter meta sync) and are deferred to Part 2.
Records the four filter_pass commits skipped from the Part-1 split (546430d, b0e8425, e9cca02, 4cb445d), the hpp->cpp refactor (7743296) deferred to be bundled with the bucket layout change, and the Part-2 test files split out of 02acc13. Documents the Part-1 carry-forwards (Bucket count field, sortable_from_json int branch) that exist to keep filter_safety byte-compatible with master-built indexes and that Part 2 should remove.
Three Hypothesis tests from a46d0b8 (safe filter bitmap
deserialization) assert behavior that only exists after Part 2:
- Hypothesis2.SaturationCreatesBitmapOnlyEntries — expects
Bucket::add to route delta-0 inserts past MAX_SIZE into the
summary bitmap (546430d).
- Hypothesis4.DeserializeRejectsLegacyCountFormat — expects the
count-less deserializer to reject the legacy on-disk shape
(546430d).
- Hypothesis4.ReadSummaryBitmapRejectsLegacyCountFormat — expects
read_summary_bitmap to reject the same shape via an alignment
check; Part 1 intentionally removed that check because the count
field is still part of the layout.
Each test now calls GTEST_SKIP() with a message pointing at
docs/filter_part2_followups.md. Part 2 must remove these skips when
the underlying fixes land.
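The legacy-format tests above hinge on whether the serialized shape carries a count field. As a minimal sketch only, assume a hypothetical record made of a uint16 count followed by that many uint32 ids, stored in host byte order. The real Bucket layout is not shown in this PR text, and parse_counted_ids and looks_countless are illustrative names, not functions from the codebase. A counted reader can check the payload size against the count, while a count-less reader might rely on an alignment check that the counted shape can never satisfy:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// ASSUMPTION: hypothetical legacy-style record, [uint16 count][count x uint32 ids],
// host byte order. This is an illustration, not the real Bucket on-disk format.
inline std::optional<std::vector<std::uint32_t>>
parse_counted_ids(const std::uint8_t* data, std::size_t size) {
    std::uint16_t count = 0;
    if (data == nullptr || size < sizeof(count)) return std::nullopt;
    std::memcpy(&count, data, sizeof(count));

    // Payload-size check: the bytes after the header must match the count exactly.
    const std::size_t payload = size - sizeof(count);
    if (payload != static_cast<std::size_t>(count) * sizeof(std::uint32_t)) {
        return std::nullopt;
    }

    std::vector<std::uint32_t> ids(count);
    if (count != 0) std::memcpy(ids.data(), data + sizeof(count), payload);
    return ids;
}

// ASSUMPTION: a count-less reader that only accepts whole uint32 ids.
// Any counted record above has size 2 + 4 * count, so it never passes this check.
inline bool looks_countless(std::size_t size) {
    return size % sizeof(std::uint32_t) == 0;
}
```

Under this assumed layout a counted record is always 2 bytes past a multiple of 4, so an alignment check of this kind rejects every legacy record. That is consistent with Part 1 dropping the check while the count field remains in the layout.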
Move the implementations of CategoryIndex, NumericIndex, Bucket, and Filter from their respective headers into new translation units. The headers now expose only types, declarations, and the tiny inline accessors (sortable_from_float family, Bucket::get_value / is_full / is_empty). Behavior is unchanged; this is a build-time refactor.

Define NDD_FILTER_SOURCES once in the root CMakeLists.txt and pull it into both NDD_CORE_SOURCES (for the main binary) and the ndd_filter_test target so the implementations are linked in both places.

Add #include <thread> to settings.hpp. It uses std::thread::hardware_concurrency() but was relying on a transitive include from the old filter.hpp; the trimmed filter.hpp no longer pulls in <thread>, so the test build broke without this fix.

Verified: ndd_filter_test (42 pass, 7 skip, 0 fail) and ndd_request_validation_test (6 pass, 0 fail) match the pre-split results; ndd-avx2 builds clean.
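The include fix follows the usual rule that a header should include what it uses directly. A minimal sketch of that pattern, where default_worker_count is a hypothetical helper and not something taken from settings.hpp:

```cpp
#include <thread>  // needed directly for std::thread::hardware_concurrency()

// ASSUMPTION: illustrative helper, not part of settings.hpp.
// hardware_concurrency() may return 0 when the value cannot be determined,
// so fall back to a caller-supplied default in that case.
inline unsigned default_worker_count(unsigned fallback = 1) {
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```

With the include stated explicitly, the helper keeps compiling regardless of which other headers happen to pull in <thread>.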


Pull Request
Summary
Splits the work from filter_pass into the portion that is byte-compatible with filter indexes built by master and ships it as filter_safety. The portion that changes the on-disk bucket layout, the numeric sortable-key domain, and the upsert semantics (and therefore requires a reindex) is deliberately deferred to a follow-up PR ("Part 2"). The split is documented in docs/filter_bucket_format_followup.md.

Net effect on an existing deployment: drop in, restart, queries continue to return the same answers, plus the new $gt/$gte/$lt/$lte operators, filter input validation, defensive bitmap deserialization, and a batch of perf and code-hygiene work. No rebuild required.

Headline items in this PR:
- $gt, $gte, $lt, $lte.
- ":" inside filter keys / values.
- Defensive bitmap deserialization (readSafe + internal_validate + payload-size check). Bitmap byte format unchanged; valid master-built bitmaps still parse.
- addMany numeric inserts, bounded MDBX write transactions.
- OperationResult return type plumbed through filter call sites; unified add_filters_from_json.
- xcrun.
- ndd_request_validation_test covering the validation work. 42 pass, 7 skip (3 Part-2 regression alarms guarded by GTEST_SKIP, 4 benchmarks gated on ENDEE_BENCH_DB), 0 fail.
- docs/filter_bucket_format_followup.md enumerates exactly what Part 2 must do to remove the Part-1 carry-forwards (Bucket::count field, is_number_integer branch in sortable_from_json, etc.).

Type of Change
- $gt/$gte/$lt/$lte operators, filter parameter validation
- OperationResult plumbing, unified add_filters_from_json, batched inserts

Related Issue
Closes # N/A
Checklist
- ndd_filter_test + ndd_request_validation_test. Skips are intentional (3 Part-2 regression alarms with explanatory GTEST_SKIP messages, 4 benchmarks gated on ENDEE_BENCH_DB).
- tests/request_validation_test.cpp covers the new validation path.
- docs/filter.md expanded; docs/filter_bucket_format_followup.md lists everything deferred to Part 2 and the Part-1 carry-forwards Part 2 must remove.
- Bucket::is_empty() still checks only ids.empty(); Bucket::serialize/deserialize still write/read the count field; Filter::sortable_from_json still branches on is_number_integer() → int_to_sortable; store_vectors_batch does NOT take is_new_to_db. Indexes built on master remain readable byte-for-byte.
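To illustrate why the integer branch in sortable_from_json matters for byte compatibility, here is a hedged sketch of two common order-preserving key encodings. The function names are hypothetical and the bit tricks are the textbook ones; nothing here is confirmed to match the real sortable_from_float or int_to_sortable. It assumes 32-bit IEEE 754 floats and ignores NaN.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// ASSUMPTION: illustrative encoding, not the project's sortable_from_float.
// Maps a float to a uint32 whose unsigned order matches the numeric order.
inline std::uint32_t sketch_float_to_sortable(float f) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof(bits));
    // Negative values: flip every bit so larger magnitudes sort lower.
    // Non-negative values: set the sign bit so they sort above all negatives.
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// ASSUMPTION: illustrative encoding, not the project's int_to_sortable.
// Flipping the sign bit makes unsigned comparison agree with signed order.
inline std::uint32_t sketch_int_to_sortable(std::int32_t v) {
    return static_cast<std::uint32_t>(v) ^ 0x80000000u;
}
```

In this sketch the integer 1 and the float 1.0 produce different keys, so keys written through one branch are not comparable with keys written through the other. If the real encodings behave similarly, unifying the numeric domain changes stored keys, which would explain why that change is grouped with the work that requires a reindex.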