...
Thirteen students joined our intern program in 2024. They presented their projects to the team during the year, and 12 of them entered our 2024 Poster Contest, hosted at the virtual HPCC Systems Community Day Summit in October 2024.
...
Name | Project Title | Description | Mentor(s) | Resources |
---|---|---|---|---|
Charan Nagaraj Bachelor of Computer Science RV College of Engineering, India | Migrate and Improve Regression Testing in GitHub Actions | At HPCC Systems, we use two main test systems: Overnight Build and Test (OBT) and Smoketest. Regression testing of ECL bundles, initially handled by OBT, is now integrated into Continuous Integration (CI) using GitHub Actions, which automatically tests bundles when a pull request (PR) is raised. Additionally, I implemented automated testing of hyperlinks in our documentation files, also using GitHub Actions. This ensures that broken links are detected early, keeping the documentation accurate without requiring manual verification. | Attila Vamos | |
Eatesam Khan Masters in Computer Science California State University, USA | Create a New HPCC Command Line Tool | As part of my internship, I developed a command-line tool that simplifies interaction with HPCC Systems ESDL services, offering powerful features for describing and testing services. The describe command provides detailed information about available services, methods, and request-response structures, while the test command allows users to send test requests, supporting various formats like XML and JSON. Key options include setting authentication credentials and server details. A standout feature is dynamic tab auto-completion, which helps users input commands accurately and efficiently. | Terrence Asselin Tim Klemm | |
El Arbi Belfarsi PhD in Computer Science Kennesaw State University, USA | Update and Improve the Generation of Platform Artifacts for HPCC Systems Builds | This project focuses on transitioning HPCC Systems CI/CD workflow from Jenkins to GitHub Actions, automating platform artifact generation using Python. A Python script replaces an existing web service, handling tasks like fetching assets, extracting metadata, and saving data as JSON. The workflow automates setup of AWS credentials, Docker image management, and uploads to GitHub and AWS S3, with security provided by GitHub secrets. This project streamlines the build process, reduces manual effort, and improves automation, benefiting the HPCC Systems platform and the open-source community. | Michael Gardner Ming Wang | |
Elizabeth Lorti Bachelor of International Development, | Technology Marketing and Branding | For this year's Tech Summit, I coordinated communication with stakeholders, collected speaker bios and abstracts for uploads, and worked closely with the project management team. I also managed all social media channels and key event aspects. Leveraging two years of prior experience, including last year's Summit, I efficiently referenced past spreadsheets to streamline bio and content management. | Jessica Lorti | |
Gagana Premnath Masters in Computer Science Syracuse University, USA | Integration of HPCC Systems Terraform CI with GitHub Actions | This project integrates HPCC Systems Terraform-based infrastructure management with GitHub Actions to streamline the deployment of HPCC Systems clusters. Terraform modules - vnet, storage, aks, and HPCC Systems - are deployed sequentially using GitHub Actions workflows. Key steps include configuring Terraform, managing Azure authentication, handling data persistence, and securing sensitive information with GitHub Secrets. By automating deployments through GitHub Actions, the project ensures consistency, reduces manual intervention, and improves deployment efficiency, while fostering collaborative development and maintaining reliable, version-controlled infrastructure across environments. | Godji Fortil Ming Wang | |
Girikratna Premnath Bachelor of Tech Data Science | Integration of Power BI with HPCC Systems platform | My project established a connection between Power BI and HPCC Systems using WsSQL for SQL-based data retrieval. I automated SOAP requests from Power BI to HPCC Systems, enhancing data analytics and visualization workflows. Using a Bare Metal System on WSL, I handled the Power BI integration with M code/Power Query and successfully tested it on data samples of various sizes, ensuring smooth functionality. | Srinivasan Kothandam Aryaman Gautam | |
Harsh Raj Bachelor of Tech Data Science | Vehicle Build Contributory System | The goal of this project was to develop an end-to-end pipeline that automates data extraction using Python libraries such as Beautiful Soup and Selenium. Data transformation and cleaning were performed using HPCC Systems platform capabilities, and insights were visualized through tools like Power BI, creating a streamlined process from extraction to visualization. | Srinivasan Kothandam Aryaman Gautam | |
Ilhan Gelle Bachelor of Computer Science University of Texas, USA | Test Suite for the HPCC Systems Parquet Plugin | This project developed a comprehensive test suite for the HPCC Systems Parquet Plugin, crucial for ensuring performance, functionality, and reliability in big data workflows. The test suite validates data integrity across ECL and Arrow data types, evaluates compression algorithms and file sizes, and simulates real-world scenarios like large datasets and schema evolution. It addresses edge cases to maintain stability, enabling HPCC Systems to leverage the Parquet format’s columnar storage for faster queries and better compression compared to CSV and XML, ensuring efficient data processing and transfer. | Jack del Vecchio | |
Nisha Bagdwal Masters in Computer Science Information Technology Kennesaw State University, USA | Develop an Automated ECL Watch Test Suite | This project aims to develop a comprehensive automated test suite for the ECL Watch UI, a key component of the HPCC Systems platform for high-speed data engineering. The suite will validate functionality, usability, performance, and error handling, ensuring a seamless user experience. By simulating human interactions, the tests will verify navigation, interactive features, and data presentation. Developed using Java, Selenium, and Unix, the project will include robust documentation for future maintenance. This initiative enhances ECL Watch's reliability and contributes to overall system efficiency. | Attila Vamos Chris Lo | |
Rohith Surya Podugu Masters in Computer Science California State University, USA | Refactoring and Releasing PyHPCC | PyHPCC is a Python package and wrapper for HPCC Systems web services, initially developed as an internal tool at LexisNexis Risk Solutions to automate tasks on HPCC Systems. Since its introduction in 2022, interest in PyHPCC has grown across the organization and the broader community. In response to user feedback, we have enhanced its usability, maintenance, and documentation. We are now excited to announce that PyHPCC will be open-sourced, fostering collaboration within the HPCC Systems community. | Amila de Silva | |
Sabrina Harris Masters in Applied Data Science New College of Florida, FL, USA | HPCC Systems Machine Learning Tutorials | This project explores machine learning bundles in HPCC Systems, focusing on Gaussian Process Regression (GPR), Support Vector Machines (SVM), and the General Linear Model (GLM). It highlights the development of tutorials to help users apply these algorithms, including dataset selection, preprocessing, and coding. The project also identified and resolved a critical error in the SVM implementation, enhancing the robustness of these tools and supporting HPCC Systems educational and open-source goals. | Bob Foreman | |
Scarlett Huang High School Student at A. W. Dreyfoos School of the Arts West Palm Beach, FL, USA | Investigate Third-Party Environments (Google BigQuery) | This project integrates HPCC Systems with Google Cloud's BigQuery, utilizing two data transfer methods to streamline migration and analysis. The first method involves migrating large datasets from HPCC Systems to BigQuery via Google Cloud Storage, ensuring secure transfer and automated loading using the BigQuery Data Transfer Service. The second method leverages Google Cloud Pub/Sub for real-time data streaming in JSON format, facilitating continuous data flow for immediate processing. Both methods enhance HPCC Systems capabilities in managing big data efficiently and open opportunities for further integration. | Ming Wang Terrence Asselin | |
Shounak Joshi Bachelor of Computer Science University of Florida, USA | Investigate Third-Party Environments (Azure Synapse Analytics) | This project explores integrating Azure Synapse Analytics with HPCC Systems platform endpoints. Azure Synapse, a limitless analytics service, complements HPCC Systems functionality by offering improved visualization and diverse data analysis. The "Linked Service" feature facilitates connections to various data sources, enabling efficient data ingestion into the HPCC Systems Landing Zone. Users can then query data within Synapse SQL Pools, leveraging its powerful analytics capabilities to gain valuable insights. This project demonstrates the potential of third-party environments to enhance HPCC Systems capabilities. | Ming Wang Michael Gardner | |