...
Curious about other projects we are offering? Take a look at our Ideas List.
Description
There is a detailed description of the work in the Open Source Text Search document. JIRA issue TS-1 also details a series of sub-tasks describing the work.
There is a preliminary collection of ECL attributes that were drawn from several earlier proprietary text search applications. The intent is to provide a framework for building generally useful text search applications that search XML text documents.
The sub-projects are:
- Initial build version. Build the inversion datasets.
- Initial search version. Search the initial inversions.
- Regression tests. Regressions for search request parsing, inversion builds, and search resolution.
- Document add, replace, and delete. Attributes to maintain the inversion.
- Slice rollup. Automation to roll up the incremental data.
- Wildcard processing. Alter the wildcard processing to work with large numbers of terms that match a pattern.
- Retrieval application. An application to retrieve documents from the search resolve hit lists.
- Equivalence terms. Language equivalence (like stemming) and ad hoc phrase equivalencing.
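To make the first sub-projects concrete, the following is a minimal sketch of an inversion build, a search against it, and wildcard expansion over the term dictionary. It is written in Python purely for illustration; the framework itself consists of ECL attributes, and every name here (`tokenize`, `build_inversion`, `search_all`, `expand_wildcard`, the sample documents) is hypothetical and not taken from the project code.

```python
import fnmatch
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inversion(docs):
    """Map each term to {doc_id: [word positions]}.

    `docs` is a dict of doc_id -> XML string (an assumed input shape).
    """
    inversion = defaultdict(dict)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # Concatenate the text of all elements, ignoring the markup.
        text = " ".join(root.itertext())
        for pos, term in enumerate(tokenize(text)):
            inversion[term].setdefault(doc_id, []).append(pos)
    return inversion


def search_all(inversion, terms):
    """Return the sorted ids of documents containing every term (AND)."""
    hit_sets = [set(inversion.get(t.lower(), {})) for t in terms]
    return sorted(set.intersection(*hit_sets)) if hit_sets else []


def expand_wildcard(inversion, pattern):
    """Return the sorted dictionary terms matching a pattern like 'ind*'."""
    # A linear scan of the dictionary; this is the step that becomes
    # expensive when very many terms match the pattern.
    return sorted(t for t in inversion if fnmatch.fnmatchcase(t, pattern.lower()))


# Hypothetical sample documents.
docs = {
    1: "<doc><title>Text search</title><body>Build the inverted index first.</body></doc>",
    2: "<doc><title>Indexing</title><body>Search the index, then retrieve documents.</body></doc>",
}
inversion = build_inversion(docs)

print(search_all(inversion, ["search", "index"]))    # [1, 2]
print(search_all(inversion, ["inverted", "index"]))  # [1]
print(expand_wildcard(inversion, "ind*"))            # ['index', 'indexing']
```

The sketch keeps word positions in the inversion because phrase and proximity searches typically need them. It does not attempt incremental updates, slice rollup, or equivalence terms.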
There is enough work that it is unlikely that a single intern would be able to complete all of the sub-projects in a single period.
Completion of this project involves:
- Checked in code
- Documentation
- Test code
- Regression tests
Expected feature list: Code will be checked in weekly, and each commit will be pushed. The developer can determine whether to amend a single commit or to provide a sequence of weekly commits.
Each sub-project will be done in sequence, and each sub-project will have a separate pull request.
The attribute exports intended for use by application developers working with the framework will be documented using Javadoc-style comments.
By the midterm review we would expect you to have:
- Initial build version, JIRA issue TS-2
- Initial search version, JIRA issue TS-3
- Regression tests, JIRA issue TS-4
Mentor | John Holt |
Skills needed | |
Deliverables | |
Other resources | |