Ilhan Gelle - 2024 Poster Contest Resources


Ilhan Gelle is a junior undergraduate student at The University of Texas at Arlington, pursuing a degree in Computer Science. Her interests include big data and software engineering. During her HPCC Systems Summer 2024 internship, she developed a comprehensive test suite for the HPCC Systems Parquet plugin, improving its robustness and reliability.

Poster Abstract

The goal of this project was to develop a comprehensive test suite for the HPCC Systems Parquet Plugin, addressing the critical need for robust validation in big data processing tools. The test suite verifies that the plugin meets high standards of performance, functionality, and reliability, so that HPCC Systems users can depend on Parquet's advantages over traditional file formats such as CSV and XML.

Efficient data processing is crucial for handling massive volumes of data. The Parquet format stores data by column, which typically yields better compression and faster analytical queries than text formats such as CSV and XML. To fully leverage these capabilities, the plugin itself must work correctly and reliably.
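To illustrate why a columnar layout compresses well, here is a minimal Python sketch (not code from the plugin or the test suite) that transposes row-oriented records into columns and dictionary-encodes one column. Dictionary encoding is one of the encodings defined by the Parquet format specification; real Parquet writers additionally pack the indices with a run-length/bit-packing scheme and apply a compression codec, which this sketch omits. The function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into one list per field (columnar layout)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}

def dictionary_encode(values: List[Any]) -> Tuple[List[Any], List[int]]:
    """Replace each value with an index into a table of its distinct values."""
    dictionary: List[Any] = []
    positions: Dict[Any, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices

def dictionary_decode(dictionary: List[Any], indices: List[int]) -> List[Any]:
    """Rebuild the original column from the dictionary and index list."""
    return [dictionary[i] for i in indices]

if __name__ == "__main__":
    rows = [{"id": i, "country": ["US", "CA", "MX"][i % 3]} for i in range(9)]
    columns = to_columns(rows)
    dictionary, indices = dictionary_encode(columns["country"])
    print(dictionary)  # ['US', 'CA', 'MX']
    print(indices)     # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```

Because a column holds values of one field, repeated values sit next to each other and collapse into a small dictionary plus compact indices; in a row layout the same values are interleaved with unrelated fields.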

This project provides a test suite that:

  1. Ensures data integrity and compatibility across various ECL and Arrow data types.

  2. Supports performance tuning by comparing different compression algorithms and file sizes.

  3. Handles real-world scenarios, including large datasets, concurrent operations, and schema evolution.

  4. Addresses edge cases and error conditions to maintain system stability.

  5. Facilitates continuous improvement to keep pace with evolving big data technologies.
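The data-integrity, compression, and edge-case goals listed above can be pictured as a round-trip harness: write each dataset, read it back, compare the result with the input, and record the stored size for each codec. The following Python sketch is hypothetical and is not the HPCC Systems test suite. JSON serialization stands in for a Parquet write and read, and the standard-library codecs zlib, bz2, and lzma stand in for Parquet codecs such as Snappy, GZIP, or ZSTD.

```python
import bz2
import json
import lzma
import zlib
from typing import Any, Callable, Dict, List, Tuple

Rows = List[Dict[str, Any]]

# Stand-in codecs from the standard library (hypothetical substitutes for
# Parquet codecs such as Snappy, GZIP, or ZSTD).
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

# Small datasets covering typical values and edge cases.
DATASETS: Dict[str, Rows] = {
    "empty": [],
    "nulls_and_unicode": [{"name": "Zoë", "score": None}, {"name": "", "score": 0}],
    "numeric": [{"id": i, "ratio": i / 7, "big": 2**63 - 1} for i in range(1000)],
}

def round_trip(rows: Rows, codec: str) -> Tuple[Rows, int]:
    """Serialize and compress rows, then decompress and parse them back.

    JSON stands in for a Parquet write/read. Returns the restored rows
    and the compressed size in bytes.
    """
    compress, decompress = CODECS[codec]
    payload = compress(json.dumps(rows).encode("utf-8"))
    restored = json.loads(decompress(payload).decode("utf-8"))
    return restored, len(payload)

def run_suite(datasets: Dict[str, Rows]) -> List[Dict[str, Any]]:
    """Round-trip every dataset through every codec and record the outcome."""
    results = []
    for name, rows in datasets.items():
        for codec in CODECS:
            restored, size = round_trip(rows, codec)
            results.append({"dataset": name, "codec": codec,
                            "bytes": size, "intact": restored == rows})
    return results

if __name__ == "__main__":
    for result in run_suite(DATASETS):
        print(result)
```

Keeping the datasets and codecs in tables makes it cheap to add a new type, edge case, or codec: every new entry is automatically checked against every existing one.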

The test suite also showcases the Parquet Plugin for HPCC Systems, which enables efficient data transfer between Parquet files and ECL datasets. The plugin supports reading and writing both regular and partitioned Parquet files, uses Apache Arrow for efficient data streaming, and offers several compression options.
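Partitioned Parquet datasets are commonly laid out as Hive-style `key=value` directories, a scheme Apache Arrow supports. As an assumption for illustration only (the source does not say which partitioning scheme the plugin uses), the Python sketch below groups rows into such paths and parses a path back into its partition values. The `part-0.parquet` file name and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

def hive_partition(rows: List[Row], keys: Sequence[str],
                   base: str = "out") -> Dict[str, List[Row]]:
    """Group rows by partition keys into Hive-style key=value paths."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        segments = [f"{key}={row[key]}" for key in keys]
        # Hypothetical file name; one file per partition directory.
        path = "/".join([base, *segments, "part-0.parquet"])
        # Partition values live in the path, so drop them from the stored row.
        groups[path].append({k: v for k, v in row.items() if k not in keys})
    return dict(groups)

def parse_hive_path(path: str) -> Dict[str, str]:
    """Recover partition values from a path; values come back as strings."""
    return dict(segment.split("=", 1) for segment in path.split("/") if "=" in segment)

if __name__ == "__main__":
    rows = [
        {"year": 2024, "region": "us", "value": 1},
        {"year": 2024, "region": "eu", "value": 2},
        {"year": 2023, "region": "us", "value": 3},
        {"year": 2024, "region": "us", "value": 4},
    ]
    for path, stored in hive_partition(rows, ["year", "region"]).items():
        print(path, stored, parse_hive_path(path))
```

Encoding the partition values in directory names lets a reader skip whole directories when a query filters on those keys, at the cost of losing the original value types (they come back as strings unless a schema is supplied).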

The HPCC Systems Parquet Plugin Test Suite is crucial for enhancing the reliability and efficiency of big data processing workflows, empowering HPCC Systems to leverage the Parquet format confidently and effectively.

Presentation

In this video recording, Ilhan gives a tour of her poster and explains its content.

Poster: Test suite for the Parquet Plugin
