Create an NLP Dictionary for Tamil
This project was completed by a student accepted on to the 2023 HPCC Systems Intern Program.
Project Description
Creating digital human readers in Tamil first requires a dictionary. This project will use the Tamil dictionary from Wiktionary.
Completing this project involves the following steps:
Download the Tamil dictionary from Wiktionary
Write an NLP++ parser to extract the vocabulary from the Wiktionary files into text files
Write an NLP++ parser to transform the text files into knowledge base files
Create Tamil test files for part-of-speech tagging
Write an NLP++ part-of-speech tagger
Run the tests using the NLP++ Plugin in ECL to show enhancements
Create an NLP++ repository for the Tamil dictionary and analyzers
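The extraction and tagging steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the NLP++ analyzers the project calls for. It assumes entries use Wiktionary-style wikitext headings (`==Tamil==` for the language section, `===Noun===` for a part of speech); the actual dump layout may differ. The function names, the `POS_HEADINGS` set, and the sample entries are hypothetical and hand-written for this example.

```python
import re
from collections import defaultdict

# Assumed part-of-speech heading names; a real dump may use others.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Postposition"}

# Matches wikitext headings such as "==Tamil==" or "===Noun===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


def extract_pos(wikitext, language="Tamil"):
    """Return the sorted POS labels found under the given language section."""
    found = set()
    in_language = False
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        level, name = len(match.group(1)), match.group(2)
        if level == 2:
            # A level-2 heading opens a new language section.
            in_language = (name == language)
        elif in_language and name in POS_HEADINGS:
            found.add(name)
    return sorted(found)


def build_lexicon(entries):
    """Map each headword to the set of POS labels seen for it."""
    lexicon = defaultdict(set)
    for title, wikitext in entries:
        lexicon[title].update(extract_pos(wikitext))
    return lexicon


def to_lines(lexicon):
    """Render the lexicon as tab-separated 'word<TAB>POS,POS' lines."""
    return [f"{word}\t{','.join(sorted(tags))}" for word, tags in lexicon.items()]


def tag(tokens, lexicon):
    """Dictionary-lookup tagger: attach every known POS, else UNKNOWN."""
    return [(t, sorted(lexicon.get(t, ())) or ["UNKNOWN"]) for t in tokens]


# Hand-written sample entries (illustrative, not copied from a dump).
SAMPLE = [
    ("மரம்", "==Tamil==\n===Noun===\n# tree\n"),
    ("படி", "==Tamil==\n===Verb===\n# to read\n===Noun===\n# step\n"),
]

if __name__ == "__main__":
    lexicon = build_lexicon(SAMPLE)
    for line in to_lines(lexicon):
        print(line)
    print(tag(["படி", "மரம்"], lexicon))
```

A lookup tagger like this leaves ambiguous words with several candidate tags. Choosing between them from context is the job of the part-of-speech tagger listed in the steps above.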
Mentor |
Skills needed |
Deliverables | Midterm; End of project
Other resources |