Create an NLP Dictionary for Tamil
If you are interested in this project, contact David de Hilster.
Find out about the HPCC Systems Summer Internship Program.
Project Description
Before digital human readers can be created for Tamil, a Tamil dictionary must first be established. This project will use the Tamil dictionary from Wiktionary.
Completion of this project involves:
Download the Tamil dictionary from Wiktionary
Write an NLP++ parser to extract the vocabulary from the Wiktionary files into text files
Write an NLP++ parser to transform the text files into knowledge base files
Create Tamil test files for part-of-speech tagging
Write an NLP++ part-of-speech tagger
Run the tests using the NLP++ Plugin in ECL to show the resulting improvements
Create an NLP++ repository for the Tamil dictionary and analyzers
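The extraction step is meant to be written in NLP++. Purely as an illustration of what that parser has to find, here is a minimal Python sketch (not NLP++). It assumes English Wiktionary wikitext, where each language section starts with a level-2 heading such as `==Tamil==` and parts of speech appear as lower-level headings such as `===Noun===` or `====Verb====`. The `POS_HEADERS` set and the function names are hypothetical; if the Tamil-language Wiktionary is used instead, the heading names would need to change.

```python
import re

# Hypothetical subset of English Wiktionary part-of-speech headings; extend as needed.
POS_HEADERS = {"Noun", "Proper noun", "Verb", "Adjective", "Adverb", "Pronoun",
               "Postposition", "Conjunction", "Interjection", "Numeral", "Particle"}

# Matches a wikitext heading line such as "==Tamil==" or "====Noun====",
# capturing the run of '=' signs (the level) and the heading text.
HEADING = re.compile(r"^(={2,6})[ \t]*([^=\n]+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def extract_tamil_entries(title, wikitext):
    """Return (word, pos) pairs found in the ==Tamil== section of one page."""
    entries = []
    in_tamil = False
    for match in HEADING.finditer(wikitext):
        level = len(match.group(1))
        name = match.group(2)
        if level == 2:
            # Every level-2 heading starts a new language section.
            in_tamil = name == "Tamil"
        elif in_tamil and name in POS_HEADERS:
            entries.append((title, name.lower()))
    return entries


def to_text_lines(entries):
    """Format (word, pos) pairs as tab-separated lines for a plain-text word list."""
    return [f"{word}\t{pos}" for word, pos in entries]
```

For example, a page titled `நீர்` whose `==Tamil==` section contains `====Noun====` and `====Verb====` yields the lines `நீர்\tnoun` and `நீர்\tverb`. A flat word/part-of-speech list like this is one simple input the knowledge-base step could consume.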
By the midterm review, we would expect you to have:
More details coming soon
Mentor:
Skills needed:
Deliverables (midterm):
Deliverables (end of project):
Other resources: