This project

...

was completed by Farah Al Shanik, a PhD student studying Computer Science at Clemson University. Farah joined the HPCC Systems intern program in 2018.

There are many variants to take into account for this project, such as matching plural and singular forms, language variants, punctuation in acronyms and initials, and alternative spellings such as color with and without the ‘u’.
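As a rough illustration of the kinds of variants listed above, the sketch below maps a few surface forms onto one search key. It is written in Python for brevity rather than ECL, and it is not the project's implementation: the function names, the tiny `SPELLING_VARIANTS` table, and the plural rule are all assumptions made for this example.

```python
import re

# Hypothetical, hand-written equivalence table (illustrative entries only).
SPELLING_VARIANTS = {"colour": "color", "flavour": "flavor"}


def strip_initialism_punctuation(token: str) -> str:
    """Collapse fully dotted initialisms such as 'U.S.A.' into 'USA'."""
    if re.fullmatch(r"(?:[A-Za-z]\.){2,}", token):
        return token.replace(".", "")
    return token


def naive_singular(token: str) -> str:
    """Very rough plural stripping; real stemming needs many more rules."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(token: str) -> str:
    """Map a search token to a single key shared by its simple variants."""
    token = strip_initialism_punctuation(token).lower()
    token = SPELLING_VARIANTS.get(token, token)
    token = naive_singular(token)
    # Look the spelling up again in case the plural form hid the variant.
    return SPELLING_VARIANTS.get(token, token)
```

With these assumed rules, `normalize("U.S.A.")` and `normalize("USA")` yield the same key, as do `normalize("Colours")` and `normalize("color")`. Irregular plurals and partially dotted forms are deliberately left unhandled here.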

Find out

...

...

Curious about other projects we are offering? Take a look at our Ideas List

Project Description

There is a detailed description of the work in the JIRA issue TS-1, which has the Open Source Text Search document attached. This JIRA issue also details a series of sub-tasks describing the work.

...

By the midterm review we would expect you to have completed:

  1. Initial build version: see https://track.hpccsystems.com/browse/TS-2
  2. Initial search version: see https://track.hpccsystems.com/browse/TS-3
  3. Regression tests: see https://track.hpccsystems.com/browse/TS-4
Mentor

John Holt
Contact details

Backup Mentor: Roger Dev
Contact Details 

Skills needed
  • Ability to code in ECL.
  • Knowledge of regular expression parsing.
  • Ability to build and test the HPCC system (guidance will be provided).
  • Ability to write test code.
Deliverables
  • Checked in code
  • Improve an algorithm to handle initialisms with punctuation in the search request and state name equivalence terms
  • Include equivalences mined from Moby Thesaurus
  • Improve an algorithm to find the synonyms of the terms that appear in the search request.
  • Test cases demonstrating the correct behaviour and performance
  • Documentation
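
The synonym and equivalence deliverables above can be pictured with a small sketch. It is written in Python rather than ECL, and it assumes a thesaurus supplied as comma-separated lines with the head word first. The function names and sample entries are hypothetical and do not come from the project code.

```python
from collections import defaultdict


def load_equivalences(lines):
    """Build a head word -> set of synonyms table from comma-separated lines."""
    table = defaultdict(set)
    for line in lines:
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        if not words:
            continue  # skip blank lines
        head, synonyms = words[0], words[1:]
        table[head].update(synonyms)
    return table


def expand_query(terms, table):
    """Return, for each search term, the term followed by its sorted synonyms."""
    expanded = []
    for term in terms:
        key = term.lower()
        expanded.append([key] + sorted(table.get(key, set()) - {key}))
    return expanded


# Illustrative entries only, not taken from the real thesaurus file.
sample = ["car,automobile,auto", "quick,fast,rapid"]
table = load_equivalences(sample)
result = expand_query(["Car", "engine"], table)
```

Here a term with no entry, such as "engine", is passed through unchanged, so the search still runs on the original word.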
Other resources