Jayanth C - 2023 Poster Contest Resources

Jayanth is currently in his fourth semester, pursuing a B.E. in Information Science and Engineering at RV College of Engineering, Bangalore, India.
His interests include reading, doing research, playing outdoor games and traveling.

Poster Abstract

Introduction:

This project develops an updated version of the English NLP Dictionary by parsing and analyzing the English Wiktionary files with NLP++ analyzers. The goal is to generate a comprehensive knowledge base and dictionary files. To achieve this, the project aims to use HPCC Systems and its ECL-based architecture to study and analyze the data in those files.

Objective:

The objective is to explore the potential of HPCC Systems for efficiently examining large-scale language resources and extracting relevant linguistic information to enhance the NLP Dictionary. The study will focus on optimizing the parsing and analysis process, ensuring accurate extraction of lexical data, and evaluating the performance of the updated dictionary.

Methodology:

Set up HPCC Systems, a scalable, open-source big data processing platform, and install and configure the components required to run ECL for data processing.

ECL Code Development:

Write ECL scripts that use the capabilities of HPCC Systems to efficiently process and analyze the generated knowledge base and dictionary files. ECL is a high-level declarative language designed for data manipulation and analysis within the HPCC Systems platform. The scripts will include the queries and transformations needed to explore and analyze the linguistic data effectively.

Data Exploration and Analysis:

Execute the ECL scripts to explore and analyze the data in the knowledge base and dictionary files. Perform linguistic analysis and computations based on the requirements of the updated version of the English NLP Dictionary.
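
As an illustration of the kind of exploratory query described above, here is a minimal sketch in Python. The project itself uses ECL, so this is not the project's code. The entry layout (a `word` field and a `pos` field) and the sample rows are assumptions, since the abstract does not describe the format of the generated dictionary files.

```python
from collections import Counter

# Hypothetical dictionary entries; the real files' layout is not given
# in the abstract, so these fields and rows are assumed for illustration.
entries = [
    {"word": "run", "pos": "verb"},
    {"word": "run", "pos": "noun"},
    {"word": "quick", "pos": "adjective"},
    {"word": "walk", "pos": "verb"},
]


def count_by_pos(rows):
    """Count how many entries carry each part-of-speech tag."""
    return Counter(row["pos"] for row in rows)


def words_with_multiple_pos(rows):
    """Return the words listed under more than one part of speech, sorted."""
    tags_by_word = {}
    for row in rows:
        tags_by_word.setdefault(row["word"], set()).add(row["pos"])
    return sorted(word for word, tags in tags_by_word.items() if len(tags) > 1)


pos_counts = count_by_pos(entries)
ambiguous = words_with_multiple_pos(entries)

print(pos_counts.most_common())
print(ambiguous)
```

In ECL, a grouped count of this kind would typically be written as a TABLE with COUNT(GROUP), which lets the platform distribute the aggregation across the cluster.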

Optimization and Performance Evaluation:

Fine-tune the ECL code and the overall process to optimize performance and scalability. Assess the efficiency and effectiveness of the implemented methodology by measuring factors such as processing time, resource utilization, and the quality of the extracted linguistic information. Iterate on the methodology, analyzers, and data processing steps based on the evaluation results and feedback, refining the processing techniques to enhance the accuracy, coverage, and overall quality of the English NLP Dictionary.

Presentation

In this video recording, Jayanth gives a tour and explanation of his poster content.

Scalable Analysis of English Dictionary Files on HPCC Systems

