This project was completed by Farah Al Shanik, a PhD student studying Computer Science at Clemson University. Farah joined the HPCC Systems intern program in 2018.

There are many variants to take into account for this project, such as matching plural and singular forms, language variants, punctuation in acronyms, the use of initials, and alternative spellings, such as "color" with and without the "u".
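To make these variants concrete, here is a minimal sketch of folding different surface forms of a term onto one matching key. It is written in Python for illustration only, not in ECL, and it is not the project's implementation. The function name `normalize_term`, the `SPELLING_VARIANTS` table, and the deliberately naive plural rule are all assumptions.

```python
import re

# Hypothetical table of spelling variants; a real system would load a larger list.
SPELLING_VARIANTS = {"colour": "color"}

# Matches punctuated initialisms such as "u.s.a." or "u.s.a" (already lowercased).
INITIALISM = re.compile(r"[a-z](?:\.[a-z])+\.?")


def normalize_term(term: str) -> str:
    """Map a raw search token to a canonical key used for matching."""
    t = term.lower()

    # Collapse punctuated initialisms: "U.S.A." -> "usa".
    if INITIALISM.fullmatch(t):
        return t.replace(".", "")

    # Very naive plural folding (illustrative only; irregular plurals are ignored).
    if len(t) > 3 and t.endswith("ies"):
        t = t[:-3] + "y"
    elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]

    # Fold alternative spellings onto a single form.
    return SPELLING_VARIANTS.get(t, t)
```

With this sketch, `normalize_term("U.S.A.")` and `normalize_term("USA")` yield the same key, as do `normalize_term("Colours")` and `normalize_term("color")`.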

Find out about the HPCC Systems Summer Internship Program.

This project is no longer available

Project Description

There is a detailed description of the work in the JIRA issue TS1, which includes an attachment to the Open Source Text Search document. This JIRA issue also details a series of sub-tasks describing the work.

...

  1. Initial build version: See https://track.hpccsystems.com/browse/TS-2
  2. Initial search version: See https://track.hpccsystems.com/browse/TS-3
  3. Regression tests: See https://track.hpccsystems.com/browse/TS-4
Mentor

John Holt
Contact details

Backup Mentor: Roger Dev
Contact Details 

Skills needed
  • Ability to code in ECL.
  • Knowledge of regular expression parsing.
  • Ability to build and test the HPCC system (guidance will be provided).
  • Ability to write test code.
Deliverables
  • Checked-in code
  • Improve an algorithm to resolve initialisms with punctuation in the search request, and equivalence terms for state names
  • Include equivalences mined from Moby Thesaurus
  • Improve an algorithm to find the synonyms of the terms that appear in the search request.
  • Test cases demonstrating the correct behaviour and performance
  • Documentation
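The equivalence deliverables above can be illustrated with a small sketch of query expansion through equivalence groups. It is in Python rather than ECL, is not the project's implementation, and uses hypothetical sample data. The thesaurus line format (a head word followed by comma-separated synonyms) and the choice to treat every group as symmetric are simplifying assumptions.

```python
from collections import defaultdict


def parse_thesaurus_line(line):
    """Split one thesaurus line into terms (assumed format: head word, then synonyms)."""
    return [w.strip() for w in line.split(",") if w.strip()]


def build_equivalences(groups):
    """Map every term to the full set of terms in its group (symmetric by assumption)."""
    index = defaultdict(set)
    for group in groups:
        members = {term.lower() for term in group}
        for member in members:
            index[member] |= members
    return index


def expand_query(terms, index):
    """Return, for each query term, the sorted list of its equivalents."""
    expanded = []
    for term in terms:
        key = term.lower()
        # Terms without a known group expand only to themselves.
        expanded.append(sorted(index.get(key, {key})))
    return expanded


# Hypothetical sample data, for illustration only.
STATE_GROUPS = [["South Carolina", "SC"], ["Georgia", "GA"]]
THESAURUS_LINES = ["car,automobile,auto"]

groups = STATE_GROUPS + [parse_thesaurus_line(l) for l in THESAURUS_LINES]
index = build_equivalences(groups)
result = expand_query(["SC", "car", "engine"], index)
```

Here `result` pairs "SC" with "south carolina", expands "car" to its listed synonyms, and leaves "engine" unchanged because it has no equivalence group.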
Other resources