Ilhan Gelle - 2024 Poster Contest Resources


Ilhan Gelle.jpg

Ilhan Gelle is a junior undergraduate student at The University of Texas at Arlington, pursuing a degree in Computer Science, with a keen interest in big data and software engineering. During her HPCC Systems Summer 2024 internship, she developed a comprehensive test suite for the Parquet plugin, improving its robustness and reliability.

 

Poster Abstract

The goal of this project was to develop a comprehensive test suite for the HPCC Systems Parquet Plugin, addressing the critical need for robust validation in big data processing tools. The test suite verifies that the Parquet plugin meets high standards of performance, functionality, and reliability, so that users can rely on it as an alternative to traditional file formats like CSV and XML.

Efficient data processing is crucial for handling massive volumes of data. The Parquet format, known for its columnar storage, offers better compression and faster queries compared to CSV and XML. To fully leverage Parquet's capabilities, the plugin must operate flawlessly.

This project provides a test suite that:

  1. Ensures data integrity and compatibility across various ECL and Arrow data types.

  2. Optimizes performance by evaluating different compression algorithms and file sizes.

  3. Handles real-world scenarios, including large datasets, concurrent operations, and schema evolution.

  4. Addresses edge cases and errors to maintain system stability.

  5. Facilitates continuous improvement to keep pace with evolving big data technologies.
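
The kind of check described in the list above can be illustrated with a small round-trip harness. This is only a sketch under stated assumptions, not the actual test suite: it uses Python standard-library codecs (zlib, bz2, lzma) and JSON serialization as stand-ins for the Parquet plugin and its compression options, and the names `CODECS`, `make_records`, `round_trip`, and `run_suite` are hypothetical. It shows the general pattern of writing data with each compression setting, reading it back, comparing it against the original, and recording the resulting size, including an empty-dataset edge case.

```python
import bz2
import json
import lzma
import zlib

# Assumption: stdlib codecs stand in for the compression options a real
# Parquet test would exercise. They are NOT the plugin's own codecs.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def make_records(n):
    """Build n sample rows mixing integer, string, float, boolean, and null values."""
    return [
        {"id": i, "name": f"row-{i}", "score": i / 4, "active": i % 2 == 0, "note": None}
        for i in range(n)
    ]


def round_trip(records, codec):
    """Serialize, compress, decompress, and deserialize the records.

    Returns the restored records and the compressed size in bytes.
    """
    compress, decompress = CODECS[codec]
    raw = json.dumps(records).encode("utf-8")
    packed = compress(raw)
    restored = json.loads(decompress(packed).decode("utf-8"))
    return restored, len(packed)


def run_suite(sizes=(0, 1, 1000)):
    """Run every codec against every dataset size, including the empty edge case."""
    results = {}
    for n in sizes:
        records = make_records(n)
        for codec in CODECS:
            restored, packed_size = round_trip(records, codec)
            results[(codec, n)] = {"ok": restored == records, "bytes": packed_size}
    return results


if __name__ == "__main__":
    for (codec, n), outcome in run_suite().items():
        print(f"{codec:5s} rows={n:5d} ok={outcome['ok']} bytes={outcome['bytes']}")
```

Keeping the codec table and the dataset sizes as plain data means a new compression option or a larger file size can be added without touching the comparison logic.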

The test suite also highlights the Parquet Plugin for HPCC Systems, which enables efficient data transfer between Parquet and ECL formats. The plugin supports reading and writing both regular and partitioned Parquet files, leveraging Apache Arrow for efficient data streaming and offering various compression options.

The HPCC Systems Parquet Plugin Test Suite is crucial for enhancing the reliability and efficiency of big data processing workflows, empowering HPCC Systems to leverage the Parquet format confidently and effectively.

Presentation

In this video recording, Ilhan gives a tour and explanation of her poster content.

Test suite for the Parquet Plugin:


Poster Competition- Ilhan Gelle.png

 

