[Feature][Integration][Java] Add Milvus vector store integration#663
[Feature][Integration][Java] Add Milvus vector store integration#663yanbinyang wants to merge 7 commits into
Conversation
105ca01 to
29978ea
Compare
Signed-off-by: YangYanbin <warlock.yyb@alibaba-inc.com>
29978ea to
2237ed7
Compare
wenjin272
left a comment
There was a problem hiding this comment.
Thanks for your contribution @yanbinyang. Overall looks good to me, and I left serveral minor comments.
I think we should also verify Milvus in the test case Mem0LongTermMemoryTest. This test is currently disabled due to a bug in Flink itself, which is expected to be fixed in the coming weeks. There are also workarounds available for local execution. You can add the test path for Milvus, and I will verify it locally.
|
|
||
| ### Concepts | ||
| * **Document**: Document is the abstraction that represents a piece of text and associated metadata. A document may also carry a pre-computed `embedding` vector and a `score` populated by query results. | ||
| * **Collection**: Collection is the abstraction that represents a set of documents. It corresponds to different concept for different vector store specification, like index in Elasticsearch/OpenSearch and collection in Chroma/Milvus. |
There was a problem hiding this comment.
It looks like there were some conflicts during the rebase. In a previous PR, we removed the Collection data structure and this line of description.
There was a problem hiding this comment.
It looks like there were some conflicts during the rebase. In a previous PR, we removed the
Collectiondata structure and this line of description.
Thanks for catching this. Removed the stale Collection concept line in ae1261a
| } | ||
|
|
||
| /** Returns the lazily-created Milvus client. */ | ||
| private MilvusClientV2 client() { |
There was a problem hiding this comment.
Since flink-agents supports asynchronous execution of actions, meaning that multiple threads can invoke the same vector store simultaneously, connection leaks caused by multi-threading may occur here.
I suggest adding DCL.
There was a problem hiding this comment.
Since flink-agents supports asynchronous execution of actions, meaning that multiple threads can invoke the same vector store simultaneously, connection leaks caused by multi-threading may occur here.
I suggest adding DCL.
Good point. Addressed in 56cc845 by making the Milvus client volatile and using double-checked locking for lazy initialization. I also synchronized close() so concurrent action execution will not create or leak multiple clients.
There was a problem hiding this comment.
It appears that this file is not being used in the CI. I'm wondering if we could also align the current Elasticsearch startup process in the CI with that of Milvus.
There was a problem hiding this comment.
It appears that this file is not being used in the CI. I'm wondering if we could also align the current Elasticsearch startup process in the CI with that of Milvus.
Addressed in 37c5436. The cross-language CI job now starts and stops Elasticsearch via tools/docker/elasticsearch/docker-compose.yml, aligned with the Milvus startup flow.
Signed-off-by: YangYanbin <warlock.yyb@alibaba-inc.com>
Signed-off-by: YangYanbin <warlock.yyb@alibaba-inc.com>
Thanks for the suggestion. Addressed in fa34054 by adding a Milvus path to I couldn't run this e2e test locally because it still depends on the existing Pemja/Flink issue workaround and a local Milvus/Mem0 environment. Could you please help verify the Milvus path locally? |
Removed unnecessary blank line in CI workflow.
Linked issue: #661
Purpose of change
This PR implements the Java version of Milvus-based Vector Store functionality for Flink Agents, following the design proposal in #143. This implementation enables RAG (Retrieval-Augmented Generation) capabilities by providing vector-based context retrieval.
integrations/vector-storesDocumentformatTests
API
Documentation
doc-neededdoc-not-neededdoc-included