Create an NLP Dictionary for Tamil

This project was completed by a student accepted into the 2023 HPCC Systems Intern Program.

Project Description

Building digital human readers for Tamil first requires a dictionary. This project uses the Tamil dictionary from Wiktionary as its source.

Completing this project involves the following steps:

  • Download the Tamil dictionary from Wiktionary

  • Write an NLP++ parser to extract the vocabulary from the Wiktionary files into text files

  • Write an NLP++ parser to transform the text files into knowledge base files

  • Create Tamil test files for part-of-speech tagging

  • Write an NLP++ part-of-speech tagger

  • Run the tests using the NLP++ Plugin in ECL to show enhancements

  • Create an NLP++ repository for the Tamil dictionary and analyzers
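
The extraction step above can be illustrated with a minimal sketch. It is written in Python rather than NLP++, purely to show the idea, and it is not the project's parser. It assumes entries follow the English Wiktionary layout, where a level-2 heading names the language (`==Tamil==`) and a deeper heading names the part of speech (`===Noun===`). The function names, the `POS_HEADINGS` set, and the tab-separated output format are all hypothetical choices made for this example.

```python
import re

# Part-of-speech headings to keep. This set is an illustrative assumption,
# not a list taken from the project.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Postposition"}

# Matches wikitext headings such as "==Tamil==" or "===Noun===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


def extract_pos(title, wikitext, language="Tamil"):
    """Return sorted (title, pos) pairs found under the given language section."""
    in_language = False
    found = set()
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        # Skip non-heading lines and headings with unbalanced '=' markers.
        if not m or len(m.group(1)) != len(m.group(3)):
            continue
        level, name = len(m.group(1)), m.group(2)
        if level == 2:
            # A level-2 heading starts a new language section.
            in_language = (name == language)
        elif in_language and name in POS_HEADINGS:
            found.add((title, name.lower()))
    return sorted(found)


def to_lines(pairs):
    """Format (word, pos) pairs as tab-separated text lines."""
    return [f"{word}\t{pos}" for word, pos in pairs]


# Hypothetical sample entry; only the Tamil section should be picked up.
SAMPLE = """==Tamil==
===Noun===
{{ta-noun}}
# [[book]]
==Telugu==
===Verb===
"""

lines = to_lines(extract_pos("புத்தகம்", SAMPLE))
```

Lines in this word-plus-tag form are one plausible shape for the parts-of-speech text files that a later step would turn into knowledge base files. The actual file format used by the project is not specified here.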

Mentor

David Dehilster

Skills needed

  • Keen interest in natural language

  • Ability to learn and program in NLP++

  • Ability to create test cases

  • Ability to write test code in ECL using the NLP++ plugin to test the enhanced dictionary

Deliverables

Midterm

  • Parts-of-speech text files

End of project

  • A Tamil dictionary repository on the VisualText open-source GitHub, including the dictionary files and NLP++ analyzers

Other resources
