Nivedha Sivakumar - 2023 Poster Contest Resources

Nivedha is a second-year student at Georgia State University, majoring in Computer Science. She worked on developing a test suite for Roxie on Kubernetes. Here is the link to her blog journal, where she kept track of her experience during her internship. Here is the link to her GitHub repository, where you can replicate her test methodology.

Poster Abstract

HPCC Systems is an open-source platform that offers businesses high-performance data processing and analytics. Two main components of the HPCC Systems platform work with big data: the Thor cluster is responsible for manipulating massive amounts of data, while the Roxie cluster supports high-performance data delivery applications using indexed data files. Each cluster environment has unique requirements for functional and performance testing. This has always been the case in the bare-metal world, and it now holds in the containerized world as well.

My project focuses on creating a test suite for Roxie designed to provide a more in-depth understanding of how different query, cluster, and infrastructure configurations affect the functionality and performance of Roxie in the cloud. Unlike bare metal, the cloud environment provides more options and flexibility to build and customize your cluster infrastructure. The primary goal of the test suite is to provide indications or guidelines as to which configuration will suit each future use case of Roxie. The test cases simulate different usage patterns and a wide range of complex queries. They can help identify potential problems and areas for improvement within the cluster, which is particularly useful for new releases of HPCC Systems or changes to the infrastructure.
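
One way to make "identify potential problems during new releases" concrete is to compare query latencies from a baseline run against a candidate run. The sketch below is illustrative only and is not taken from Nivedha's test suite: it assumes latencies (in milliseconds) have already been collected per query, uses a nearest-rank p95, and flags any query whose p95 grows by more than a hypothetical 10% tolerance. The query names in the usage example are made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies (ms)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def flag_regressions(baseline, candidate, pct=95, tolerance=0.10):
    """Return query names whose candidate latency at the given percentile
    exceeds the baseline by more than `tolerance` (0.10 means 10%).
    Queries missing from the candidate run are skipped."""
    regressions = []
    for query, base_samples in baseline.items():
        if query not in candidate:
            continue
        base = percentile(base_samples, pct)
        new = percentile(candidate[query], pct)
        if new > base * (1 + tolerance):
            regressions.append(query)
    return sorted(regressions)


# Hypothetical measurements from two runs (e.g. before and after an upgrade).
baseline = {"search_people": [12, 15, 14, 13, 40], "lookup_address": [8, 9, 8, 10, 9]}
candidate = {"search_people": [20, 22, 25, 21, 60], "lookup_address": [8, 9, 9, 10, 9]}
print(flag_regressions(baseline, candidate))  # ['search_people']
```

A percentile is used instead of a mean because tail latency is usually what users of a data delivery engine like Roxie notice first; the choice of p95 and 10% is an assumption to be tuned per use case.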

The test cases are intended to exercise ECL queries and massive datasets on a Roxie cluster running on AKS with different storage types, numbers of nodes, and cluster sizes, and to find the combination that delivers the intended functionality with high performance. Storage matters because it is central to modern computing and data management: it enables efficient data processing and supports data sharing, making it an essential foundation for businesses. Nodes matter because they are the building blocks of a functioning and reliable Kubernetes cluster. Cluster size matters because it is essential for keeping the computing environment stable and responsive for running applications.
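
Varying storage type, node count, and cluster size independently produces a test matrix, where every combination is one configuration to deploy and measure. The sketch below shows how such a matrix could be enumerated; the dimension values (storage names, counts, sizes) and the `TestConfig` and `build_test_matrix` names are hypothetical placeholders, not the configurations actually used in the project.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical dimension values for illustration only; they are not
# taken from the original test suite.
STORAGE_TYPES = ["azure-disk", "azure-files", "azure-blob"]
NODE_COUNTS = [1, 3, 5]
ROXIE_CLUSTER_SIZES = [1, 2, 4]


@dataclass(frozen=True)
class TestConfig:
    storage_type: str
    node_count: int
    cluster_size: int


def build_test_matrix(storage_types, node_counts, cluster_sizes):
    """Return one TestConfig per combination of the three dimensions."""
    return [
        TestConfig(s, n, c)
        for s, n, c in product(storage_types, node_counts, cluster_sizes)
    ]


if __name__ == "__main__":
    matrix = build_test_matrix(STORAGE_TYPES, NODE_COUNTS, ROXIE_CLUSTER_SIZES)
    print(len(matrix))  # 27 combinations (3 x 3 x 3)
    for config in matrix[:3]:
        print(config)
```

The matrix grows multiplicatively with each added dimension, which is one reason to keep the number of values per dimension small when each configuration requires provisioning a cloud cluster.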

At the start of my project, I used Docker Desktop as my main work environment to understand the basic setup and the workflow of loading data onto the Thor cluster and deploying queries on the Roxie cluster. Later, I moved to Azure Kubernetes Service (AKS), provisioned with Terraform, to obtain measurements closer to a real-world workload.

Presentation

In this video recording, Nivedha provides a tour and explanation of her poster content.

Test Suite for a Roxie Cluster on Kubernetes

