This project is available as a student work experience opportunity with HPCC Systems this summer. Curious about other projects we are offering? Take a look at our Ideas List.
Find out about the HPCC Systems Summer Internship Program.
This project is no longer available
Project Description
There is a detailed description of the work in the JIRA issue TS1, which includes an attachment to the Open Source Text Search document. The JIRA issue also lists a series of sub-tasks describing the work.
There is a preliminary collection of ECL attributes drawn from several earlier proprietary text search applications. The intent is to provide a framework for building generally useful text search applications that support searching XML text documents.
The sub-projects are:
- Initial build version. Build the inversion datasets.
- Initial search version. Search the initial inversions.
- Regression tests. Regressions for search request parsing, inversion builds, and search resolution.
- Document add, replace, and delete. Attributes to maintain the inversion.
- Slice Rollup. Automation to roll up the incremental data.
- Wildcard processing. Alter the wildcard processing to work with large numbers of terms that match a pattern.
- Retrieval application. An application to retrieve documents from the search resolve hit lists.
- Equivalence terms. Language equivalence (like stemming) and ad hoc phrase equivalencing.
There is enough work here that a single intern is unlikely to complete all of the sub-projects in a single internship period.
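To make the terms used in the sub-project list concrete, here is a minimal sketch of an inversion build, a search over it, and wildcard expansion. It is written in Python purely for illustration; the project itself works with ECL attributes, and every name below (`tokenize`, `build_inversion`, `expand_wildcard`, `search_all`) is hypothetical rather than taken from the existing code. XML parsing is omitted, and the wildcard expansion is a naive linear scan over all terms, which is presumably the kind of approach that would not hold up with large numbers of matching terms.

```python
import fnmatch
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inversion(docs):
    """Build a toy inversion: term -> {doc_id: [word positions]}."""
    inversion = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            inversion[term].setdefault(doc_id, []).append(pos)
    return inversion


def expand_wildcard(inversion, pattern):
    """Return the sorted indexed terms matching a pattern using * and ?.

    Naive on purpose: it scans every term in the inversion.
    """
    return sorted(t for t in inversion if fnmatch.fnmatchcase(t, pattern))


def search_all(inversion, patterns):
    """Return sorted ids of documents that match every pattern (AND)."""
    result = None
    for pattern in patterns:
        matching_docs = set()
        for term in expand_wildcard(inversion, pattern.lower()):
            matching_docs.update(inversion[term])
        result = matching_docs if result is None else result & matching_docs
    return sorted(result or [])


# Hypothetical sample documents (plain text stands in for XML content).
docs = {
    1: "Open source text search",
    2: "Searching XML text documents",
    3: "Wildcard patterns match many terms",
}
inversion = build_inversion(docs)

print(expand_wildcard(inversion, "search*"))
print(search_all(inversion, ["text", "search*"]))
print(search_all(inversion, ["text", "xml"]))
```

Keeping word positions in the inversion, rather than only document ids, is one plausible design choice because it leaves room for phrase and proximity matching later; the sketch does not use them.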
Completion of this project involves:
- Code will be checked in weekly, and the commit will be pushed. The developer can decide whether to amend a single commit or to provide a sequence of weekly commits.
- Each sub-project will be done in sequence, and each sub-project will have a separate pull request.
- The attribute exports intended for application developers using the framework will be documented with Javadoc-style comments.
By the midterm review we would expect you to have completed:
- Initial build version: See https://track.hpccsystems.com/browse/TS-2
- Initial search version: See https://track.hpccsystems.com/browse/TS-3
- Regression tests: See https://track.hpccsystems.com/browse/TS-4
Mentor | John Holt
Backup Mentor | Roger Dev
Skills needed |
Deliverables |
Other resources |