Shyamaa Karthik - 2023 Poster Contest Resources
Shyamaa is a high school student at Saint Andrew's School and will be a senior next school year. In addition to coding and computer science, he enjoys going to the beach with his friends and playing basketball.
Poster Abstract
The more languages Natural Language Processing (NLP) supports, the more effective it becomes as a whole. My goal was therefore to expand the NLP++ dictionary to include Tamil, the fifth largest language in India, the most populous country in the world. This project was special to me because Tamil is the language my family speaks. Although I speak it fluently, I can't read or write it, so for the language research portion I worked closely with my father, who is from India and read and wrote Tamil for most of his schooling.
As a summer intern for HPCC Systems, I worked on creating the world's first and most advanced Tamil dictionary with parts of speech for NLP++. My plan was to use Tamil Wiktionary pages and leverage the existing English Wiktionary parser to create my own parser for Tamil. The project was research-heavy because it had never been done before, and there were more than a few roadblocks along the way. For example, when I was using Python to process the Tamil Wiktionary pages, my dad and I thought the pages were a good source. But when I was writing the NLP++ analyzers, I noticed that the pages didn't share a common format. When I reviewed them with my dad, we found that most of the parts of speech and definitions on the Wiktionary pages were nonsensical or incomprehensible. We had to look for a new source of Tamil words and parts of speech, and eventually found a tagging project that paired words with their correct parts of speech.
It showed how new natural language processing is that even Wiktionary wasn't a reliable source for the project, and how important it is to build on and expand NLP so that more and more people from across the world can be part of this new wave, with NLP++ as the medium.
The end result was the most thorough Tamil dictionary for NLP++ to date, but I hope more people will come along to build on it, expand it, and make it more complete, and that the same will be done for more languages.
Presentation
In this video recording, Shyamaa gives a tour and explanation of his poster content.
Processing the Tamil Wiktionary Pages into a NLP++ Dictionary