Interfacing a Vector Database with ECL
This project is available as a student work experience opportunity with HPCC Systems. Curious about other projects we are offering? Take a look at our Ideas List.
Student work experience opportunities also exist for students who want to suggest their own project idea. Project suggestions must be relevant to HPCC Systems and of benefit to our open source community.
Find out about the HPCC Systems Summer Internship Program.
Project Description
Efficient data processing has become more crucial than ever for applications that involve large language models, generative AI, and semantic search. These applications rely on vector embeddings, numeric vector representations of data that carry the semantic information an AI system needs to understand content and to maintain a long-term memory it can draw on when executing complex tasks. A vector database indexes and stores vector embeddings for fast retrieval and similarity search, and offers capabilities such as CRUD operations, metadata filtering, and horizontal scaling.
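To make "similarity search" concrete, here is a minimal C++ sketch of the brute-force version of the operation that a vector database accelerates with indexes. It is not Milvus code and does not use the Milvus API. The function names `cosineSimilarity` and `topK` are hypothetical, chosen only for this illustration, and cosine similarity is assumed as the metric (a vector database may be configured with other metrics).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Cosine similarity between two embeddings of equal length.
// Returns 0 when either vector has zero magnitude.
double cosineSimilarity(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        dot += static_cast<double>(a[i]) * b[i];
        normA += static_cast<double>(a[i]) * a[i];
        normB += static_cast<double>(b[i]) * b[i];
    }
    if (normA == 0.0 || normB == 0.0)
        return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Brute-force top-k search (hypothetical helper): scores every stored
// vector against the query and returns the indices of the k most similar
// vectors, best match first. If k exceeds the number of stored vectors,
// all indices are returned.
std::vector<std::size_t> topK(const std::vector<std::vector<float>>& stored,
                              const std::vector<float>& query,
                              std::size_t k)
{
    typedef std::pair<double, std::size_t> Scored;
    std::vector<Scored> scored;
    scored.reserve(stored.size());
    for (std::size_t i = 0; i < stored.size(); ++i)
        scored.push_back(Scored(cosineSimilarity(stored[i], query), i));

    k = std::min(k, scored.size());
    // Only the first k positions need to be in sorted order.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const Scored& l, const Scored& r) { return l.first > r.first; });

    std::vector<std::size_t> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i)
        result.push_back(scored[i].second);
    return result;
}
```

This linear scan costs time proportional to the number of stored vectors for every query, which is why a dedicated vector database with indexing and horizontal scaling becomes attractive as collections grow.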
The goal of this project is to add vector database support to HPCC Systems by allowing Milvus database queries to be embedded in ECL code.
Completion of this project involves:
Investigating the API for calling Milvus from C++ and learning the ECL embed API.
Creating a simple wrapper that passes lists between the ECL embed API and the Milvus API, using the MongoDB plugin as an example.
Extending the simple wrapper to handle structured data.
Developing test cases for the plugin that exercise all Milvus functionality and ensure that all data types are passed in and returned correctly.
Developing test cases for the plugin that verify multi-threaded access from the ECL side, including performance and throughput measurements for examples that approximate real-world usage.
Delivering a complete GitHub project with code and documentation.
Producing a blog, a recorded presentation, and a poster artifact about your project (see examples from previous years here).
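The multi-threaded test item above could start from a small load harness like the following C++ sketch. It makes no Milvus or ECL calls: `runConcurrent` and `LoadResult` are hypothetical names invented for this illustration, and the `op` callback is an assumed stand-in for whatever single plugin call a real test would issue.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Outcome of one load run (hypothetical type for this sketch).
struct LoadResult
{
    std::size_t succeeded = 0;
    std::size_t failed = 0;
    double seconds = 0.0;
};

// Runs `op` callsPerThread times on each of `threads` worker threads and
// counts how many calls reported success. `op` stands in for one plugin
// call; it must be safe to call concurrently and must not throw, because
// an exception escaping a worker thread terminates the program.
LoadResult runConcurrent(unsigned threads, unsigned callsPerThread,
                         const std::function<bool()>& op)
{
    assert(op && "op must be callable");
    std::atomic<std::size_t> ok(0);
    std::atomic<std::size_t> bad(0);

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (unsigned t = 0; t < threads; ++t)
    {
        workers.emplace_back([&ok, &bad, &op, callsPerThread]() {
            for (unsigned i = 0; i < callsPerThread; ++i)
            {
                if (op())
                    ++ok;
                else
                    ++bad;
            }
        });
    }
    for (std::thread& w : workers)
        w.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    LoadResult result;
    result.succeeded = ok.load();
    result.failed = bad.load();
    result.seconds = elapsed.count();
    return result;
}
```

Dividing `succeeded` by `seconds` gives a rough calls-per-second figure. A real test would replace the callback with an actual query against a running Milvus instance and compare results across thread counts.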
By the midterm review, we would expect you to have:
Understood the ECL embed API and implemented a simple example that connects to a Milvus database.
Mentor | Jack Del Vecchio
Backup Mentor | TBD
Skills needed |
Deliverables | Midterm; End of project
Other resources |