SOLR-13764: Introduce IntervalsQParserPlugin#4582
Conversation
…ns MatchNoDocsQuery
…tend, unordered_no_overlaps, not_within, within, at_least, no_intervals
…13764-intervals-query-parser.yml
Updated the section title and removed outdated content regarding query formats.
Updated the description to reference JSON DSL documentation.
…ucene, dismax, edismax parsers
…ns-job fix: merge split QueryRequest constructor calls to satisfy Spotless
There was a problem hiding this comment.
Pull request overview
Adds a new built-in intervals query parser to Solr that constructs Lucene IntervalQuery instances from a JSON-based DSL carried in the request body (under json_queries), with accompanying ref-guide documentation and tests (standalone + SolrCloud + JSON request handling).
Changes:
- Introduces
IntervalsQParserPluginand registers it as a standard query parser plugin. - Extends JSON request processing to accept/pass-through a new
json_queriestop-level key. - Adds ref-guide documentation and new tests covering standalone parsing, distributed execution, and JSON preservation.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| solr/solr-ref-guide/modules/query-guide/querying-nav.adoc | Adds navigation entry for the new Intervals query parser page. |
| solr/solr-ref-guide/modules/query-guide/pages/other-parsers.adoc | Adds a short overview section for the Intervals query parser. |
| solr/solr-ref-guide/modules/query-guide/pages/intervals-query-parser.adoc | New detailed documentation page for the Intervals query parser DSL. |
| solr/core/src/test/org/apache/solr/search/TestIntervalsQParserPlugin.java | New unit tests for interval rule parsing/execution. |
| solr/core/src/test/org/apache/solr/search/json/TestJsonRequest.java | Adds coverage ensuring json_queries is preserved in parsed JSON. |
| solr/core/src/test/org/apache/solr/search/DistributedQParserTest.java | New SolrCloud test ensuring intervals works in distributed search. |
| solr/core/src/java/org/apache/solr/search/QParserPlugin.java | Registers intervals as a standard plugin. |
| solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java | Core implementation of the JSON-to-Intervals DSL parser. |
| solr/core/src/java/org/apache/solr/request/json/RequestUtil.java | Allows json_queries as a recognized JSON request top-level key. |
| changelog/unreleased/SOLR-13764-intervals-query-parser.yml | Adds an unreleased changelog entry for the feature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Analyzer analyzer = resolveAnalyzer(params, field, "wildcard"); | ||
| String normalizedPattern = normalizeMultiTerm(field, pattern, analyzer); |
| Analyzer analyzer = resolveAnalyzer(params, field, "fuzzy"); | ||
| String normalizedTerm = normalizeMultiTerm(field, term, analyzer); | ||
|
|
||
| String fuzziness = getOptionalString(params, "fuzziness", "fuzzy"); | ||
| int maxEdits = resolveFuzziness(fuzziness, normalizedTerm); | ||
| int prefixLength = getInt(params, "prefix_length", 0, "fuzzy"); | ||
| boolean transpositions = getBoolean(params, "transpositions", true, "fuzzy"); | ||
|
|
| Object termsObj = params.get("terms"); | ||
| Object intervalsObj = params.get("intervals"); | ||
| if (termsObj == null && intervalsObj == null) { | ||
| throw new SolrException( | ||
| SolrException.ErrorCode.BAD_REQUEST, | ||
| "Rule 'phrase' requires either 'terms' (string array) or 'intervals' (rule array)"); | ||
| } |
| int maxExpansions = | ||
| getInt(params, "max_expansions", Intervals.DEFAULT_MAX_EXPANSIONS, "regexp"); | ||
| IntervalsSource source = Intervals.regexp(new BytesRef(pattern), maxExpansions); |
| int maxExpansions = | ||
| getInt(params, "max_expansions", Intervals.DEFAULT_MAX_EXPANSIONS, "range"); | ||
| BytesRef lowerTerm = lowerTermStr == null ? null : new BytesRef(lowerTermStr); | ||
| BytesRef upperTerm = upperTermStr == null ? null : new BytesRef(upperTermStr); | ||
| return Intervals.range(lowerTerm, upperTerm, includeLower, includeUpper, maxExpansions); |
|
|
||
| The Intervals Query Parser (`IntervalsQParserPlugin`) builds Lucene interval queries from a xref:json-request-api.adoc[JSON DSL] description. | ||
| Interval queries allow you to express positional constraints such as "these terms must appear within N positions of each other" or "this phrase must appear before that one". | ||
| See the https://lucene.apache.org/core/10_4_0/queries/org/apache/lucene/queries/intervals/package-summary.html[Lucene Intervals package documentation] for a detailed description of the underlying interval machinery. |
| |`query` | ||
| |Yes | ||
| |— | ||
| |The text to analyse and match. |
| String analyzerName = getOptionalString(params, "analyzer", ruleName); | ||
| if (analyzerName == null) { | ||
| return req.getSchema().getQueryAnalyzer(); | ||
| } |
| } else if ("json_queries".equals(key)) { | ||
| // passed through as a parsed object for use by SearchComponent.prepare() at subordinate | ||
| // nodes; not processed here | ||
| continue; |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
multiterm analyzer and QueryEqualityTest
|
The JSON DSL and distributed-query pass-through are substantial additions. One edge case worth testing explicitly is validation behavior when |
dsmiley
left a comment
There was a problem hiding this comment.
Glad to see Solr finally having a native intervals query parser?
Not sure I'm glad to see it require JSON... but I sympathize. It'd be nice if it could support QParser parsed Strings -> Query and then convert Query to Interval. Could come later.
| * Distributed search tests for the standard query parsers: {@code lucene}, {@code dismax}, {@code | ||
| * edismax}, and {@code intervals}. | ||
| */ | ||
| public class DistributedQParserTest extends SolrCloudTestCase { |
There was a problem hiding this comment.
Why have this... shouldn't this be only for intervals?
There was a problem hiding this comment.
And if the intent is testing distributed... well we have a whole test framework for that which is quite good: BaseDistributedSearchTestCase. Doesn't even require SolrCloud!
There was a problem hiding this comment.
afiak SolrCloudTestCase is rather faster and convenient. It's a little bit redundant for local-only QP, but there's a https://github.com/apache/solr/pull/4582#issuecomment-4861727946 regarding transferring new JSON request entries. I've decided to double check.
We cad drop it if you wish.
|
|
||
| [source,text] | ||
| ---- | ||
| q={!intervals json_query=myQuery df=title} |
There was a problem hiding this comment.
This looks abnormal... a normal query parser parses v. If v needs to refer to something else that's a string, it'd just be $paramName but I see here there's an attempt to retain the JSON structure, not using a string. If the paramName itself contains "json", maybe we can enhance Solr to reognize that to pass it as parsed JSON? This would mean the referenced json in question would be apart from json=. I wonder if there's precedent in Solr for this. I confess I don't use the Json query stuff regularly so I'm less familiar.
Just brainstorming a bit here, that's all.
There was a problem hiding this comment.
Pardon, just changed it to
q={!intervals df=title}$myQuery
&
json={
"json_queries": {
"myQuery": {
"match": { "query": "apache solr" }
}
}
}
Also, we might name parser as jintervals or json_intervals to give a clue where to look at.
| *** xref:json-combined-query-dsl.adoc[] | ||
| ** xref:searching-nested-documents.adoc[] | ||
| ** xref:block-join-query-parser.adoc[] | ||
| ** xref:intervals-query-parser.adoc[] |
There was a problem hiding this comment.
This is placed illogically, shoved between two join query parser pages
There was a problem hiding this comment.
moved
** xref:other-parsers.adoc[]
*** xref:intervals-query-parser.adoc[]
does it make sense ?
| public Query parse() { | ||
| String jsonQueryName = localParams.get(JSON_QUERY_PARAM); | ||
| if (jsonQueryName == null) { | ||
| return new MatchNoDocsQuery("No " + JSON_QUERY_PARAM + " parameter specified"); |
There was a problem hiding this comment.
sounds more like user error to me
There was a problem hiding this comment.
right. now it's more strict
| private Analyzer resolveAnalyzer(Map<String, Object> params, String field, String ruleName) { | ||
| String analyzerName = getOptionalString(params, "analyzer", ruleName); | ||
| if (analyzerName == null) { | ||
| return req.getSchema().getFieldTypeNoEx(field).getQueryAnalyzer(); |
There was a problem hiding this comment.
Why call getFieldTypeNoEx -- wouldn't we want to throw if not found?
| String analyzerName = getOptionalString(params, "analyzer", ruleName); | ||
| FieldType fieldType = | ||
| analyzerName == null | ||
| ? req.getSchema().getFieldTypeNoEx(field) |
There was a problem hiding this comment.
same -- why getFieldTypeNoEx
| return source; | ||
| } | ||
|
|
||
| private IntervalsSource parseWildcardRule(Map<String, Object> params, String topField) { |
There was a problem hiding this comment.
here and everywhere, "topField" is passed. Not sure what "top" means. Anyway, it would be more typed to use SchemaField. From the SchemaField, you can resolve the field type easily. Strings are essentially typeless and can lead to confusion/bugs of wrong parameters. And use of SchemaField could avoid repeated type lookups.
There was a problem hiding this comment.
topFiled - means the field we build itervals against, until we explicit override it. Exactly the same we have with spans "{!xmlparser df=v_ws}" ok. it's better for name it df. Thanks for typing suggestion. I try to apply it.
extract JSON_QUERIES_KEY constant and update documentation for new syntax
use typed SchemaField
| "intervals qparser should support all_of with nested any_of via df local param", | ||
| req( | ||
| "q", | ||
| "{!intervals df=title_t}$second_query", |
There was a problem hiding this comment.
I see the proposed query request structure as combination of Standard query and JSON query syntax. It could even work if the q parameter were moved into the json request as the query field, either using the local-params string format or the expanded form shown in this example from documentation. Since this is intended for use with the JSON syntax, it would be more consistent and convenient IMHO.
We could also use json query string for readability purpose across the tests.
https://issues.apache.org/jira/browse/SOLR-13764
This pull request adds support for the Intervals Query Parser in Solr, enabling users to construct Lucene interval queries using a JSON-based DSL. The changes include the core implementation, integration with the standard query parser plugins, documentation updates, and new tests to ensure distributed and JSON query support.
Intervals Query Parser Implementation and Integration:
IntervalsQParserPluginto the set of standard query parser plugins inQParserPlugin.java, allowing interval queries to be parsed and executed.RequestUtil.javato recognize and correctly handle the newjson_querieskey in JSON requests, ensuring it is passed through for use by query parsers.The syntax proposed
this QP refers to json.json_queries. I propose to introduce it since I expect more usages for pure json parsing.
The considerable alternatives are
or even
param_ref=$q1like herePropose yours syntax in comments.
Testing and Validation:
DistributedQParserTest.java, which verifies distributed search support for the newintervalsquery parser alongside existing parsers.TestJsonRequest.javato test the handling and preservation of thejson_querieskey in JSON requests.Documentation:
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.