Skip to content

[Feature][Integration][Java] Add Milvus vector store integration#663

Open
yanbinyang wants to merge 7 commits into
apache:mainfrom
yanbinyang:dev/milvus-vector-store
Open

[Feature][Integration][Java] Add Milvus vector store integration#663
yanbinyang wants to merge 7 commits into
apache:mainfrom
yanbinyang:dev/milvus-vector-store

Conversation

@yanbinyang
Copy link
Copy Markdown

Linked issue: #661

Purpose of change

This PR implements the Java version of Milvus-based Vector Store functionality for Flink Agents, following the design proposal in #143. This implementation enables RAG (Retrieval-Augmented Generation) capabilities by providing vector-based context retrieval.

  • Added Milvus vector store integration
  • Added Milvus vector store module under integrations/vector-stores
  • Supported Milvus connection and query configuration
  • Converted Milvus search results into the common Document format
  • Added tests for Milvus vector store behavior

Tests

API

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

@github-actions github-actions Bot added doc-needed Your PR changes impact docs. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels May 12, 2026
@yanbinyang yanbinyang force-pushed the dev/milvus-vector-store branch from 105ca01 to 29978ea Compare May 12, 2026 12:43
Signed-off-by: YangYanbin <warlock.yyb@alibaba-inc.com>
@yanbinyang yanbinyang force-pushed the dev/milvus-vector-store branch from 29978ea to 2237ed7 Compare May 12, 2026 13:15
@wenjin272 wenjin272 self-requested a review May 13, 2026 08:24
Copy link
Copy Markdown
Collaborator

@wenjin272 wenjin272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @yanbinyang. Overall looks good to me, and I left serveral minor comments.

I think we should also verify Milvus in the test case Mem0LongTermMemoryTest. This test is currently disabled due to a bug in Flink itself, which is expected to be fixed in the coming weeks. There are also workarounds available for local execution. You can add the test path for Milvus, and I will verify it locally.


### Concepts
* **Document**: Document is the abstraction that represents a piece of text and associated metadata. A document may also carry a pre-computed `embedding` vector and a `score` populated by query results.
* **Collection**: Collection is the abstraction that represents a set of documents. It corresponds to different concept for different vector store specification, like index in Elasticsearch/OpenSearch and collection in Chroma/Milvus.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there were some conflicts during the rebase. In a previous PR, we removed the Collection data structure and this line of description.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there were some conflicts during the rebase. In a previous PR, we removed the Collection data structure and this line of description.

Thanks for catching this. Removed the stale Collection concept line in ae1261a

}

/** Returns the lazily-created Milvus client. */
private MilvusClientV2 client() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since flink-agents supports asynchronous execution of actions, meaning that multiple threads can invoke the same vector store simultaneously, connection leaks caused by multi-threading may occur here.

I suggest adding DCL.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since flink-agents supports asynchronous execution of actions, meaning that multiple threads can invoke the same vector store simultaneously, connection leaks caused by multi-threading may occur here.

I suggest adding DCL.

Good point. Addressed in 56cc845 by making the Milvus client volatile and using double-checked locking for lazy initialization. I also synchronized close() so concurrent action execution will not create or leak multiple clients.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that this file is not being used in the CI. I'm wondering if we could also align the current Elasticsearch startup process in the CI with that of Milvus.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that this file is not being used in the CI. I'm wondering if we could also align the current Elasticsearch startup process in the CI with that of Milvus.

Addressed in 37c5436. The cross-language CI job now starts and stops Elasticsearch via tools/docker/elasticsearch/docker-compose.yml, aligned with the Milvus startup flow.

@yanbinyang
Copy link
Copy Markdown
Author

Thanks for your contribution @yanbinyang. Overall looks good to me, and I left serveral minor comments.

I think we should also verify Milvus in the test case Mem0LongTermMemoryTest. This test is currently disabled due to a bug in Flink itself, which is expected to be fixed in the coming weeks. There are also workarounds available for local execution. You can add the test path for Milvus, and I will verify it locally.

Thanks for the suggestion. Addressed in fa34054 by adding a Milvus path to Mem0LongTermMemoryTest: the test now parameterizes the Mem0 vector store over Elasticsearch and Milvus, and Mem0LongTermMemoryAgent declares a milvusLtmStore resource.

I couldn't run this e2e test locally because it still depends on the existing Pemja/Flink issue workaround and a local Milvus/Mem0 environment. Could you please help verify the Milvus path locally?

Removed unnecessary blank line in CI workflow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-needed Your PR changes impact docs. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants