Find out more about the HPCC Systems Summer Intern Program, including how to apply, and read this blog introducing the students and their projects.
Fifteen students joined our intern program in 2023. They presented their projects to the team during the year, and 14 of them entered our 2023 Poster Contest, held at the virtual HPCC Systems Community Day Summit in October 2023. Starting this year, we also welcomed two students from our LNRS office in Mumbai as honorary members of our internship program.
Meet the Class of 2023
Name | Project Title | Description | Mentor(s) | Resources |
---|---|---|---|---|
Aryaman Gautam, Bachelor of Tech, Data Science | HPCC Systems Local Deployment on a K3D Cluster | The goal of this project was to establish an initial setup for a local deployment of HPCC Systems on K3D. K3D is a lightweight wrapper that runs K3S (Rancher Labs' minimal Kubernetes distribution) in Docker, which makes it easy to create single-node and multi-node K3S clusters in Docker. | Xiaoming Wang, Godji Fortil, Chinmay Desai, Sidharth Ganesan | |
Boqiang Li, Ph.D. in Computer Science, Clemson University, USA | Convert the Generalized Neural Network (GNN) Bundle to Native TensorFlow 2.0 | Neural networks have emerged as a powerful tool for analyzing complex datasets such as images, video, and time-series data, outperforming classical methods. To leverage this potential, HPCC Systems offers the Generalized Neural Network (GNN) bundle, which combines the parallel processing capabilities of HPCC Systems with the robust neural network functionality of Keras and TensorFlow. This project upgraded the GNN bundle to use the native TensorFlow 2 interface. The upgraded GNN bundle demonstrated several significant advantages over its previous version. | Lili Xu, Roger Dev | View Poster |
Carlos Caceres, High School Student | Practical Application of Generative AI Technology | During this project, a generalized interface was created for HPCC Systems to access GPT and ChatGPT. Next, HPCC Systems was used to train a neural network model that classifies faces by emotion. The detected emotions are then processed by the interface to build a call to OpenAI’s API, which generates an appropriate response. | Lili Xu, Roger Dev | |
Davi Charvi, Bachelor of Tech, Data Science | Resume Analyzer in NLP++ | A resume analyzer applies various techniques to analyze the resumes a company receives and retrieve their main sections. This project used the NLP++ plugin to process resumes and extract their main headers and sections, such as skills, work experience, email, and education. | David de Hilster, Umesh Mahind, Nandhini Velu | |
Elizabeth Lorti, Bachelor of International Development | HPCC Systems Marketing and Branding | As a returning HPCC Systems intern who has worked year-round on maintaining our social media, this year I reviewed my own social media contributions and strategy to identify improvements. I also conducted interviews with stakeholders and recorded minutes to better understand and communicate the needs of the Technology Summit and Community Day stakeholders. | Jessica Lorti | |
Hiroki Sato, Master's in Computer Science | Automation of HPCC Systems Cloud Native Deployment to AWS with Terraform | This project used Terraform to explore deploying the containerized HPCC Systems application onto an AWS Elastic Kubernetes Service (EKS) cluster. During the internship, we developed an hpcc-aws-terraform module, which builds the necessary AWS infrastructure, such as a virtual private cloud (VPC), subnets, the required security group, the EKS cluster, and a node group. | Wayne Carty, Godson Fortil | |
Jessie Mao, High School Student | HPCC Systems Deployment with Various Helm Chart Configurations | This project provided two solutions for HPCC Systems deployments. The overrides solution keeps the default values.yaml file and uses additional files to modify it. Overrides are suited to making small changes to values.yaml and mainly concentrate on Roxie and Thor. The HPCC-lite solution, on the other hand, does not require a custom values.yaml file, so it can be combined with other files to create more scenarios. | Xiaoming Wang, Godson Fortil | |
Johnny Huang, Bachelor of Computer Science | Improve Error Handling and Reporting for Automated Test Systems | This project concentrated primarily on refining the GitHub Actions scripts, a vital tool for automated testing within the HPCC Systems environment. These scripts analyze the logs generated by tests and provide a granular breakdown of the executed tests. I also enhanced the scripts to improve the fault tolerance of our testing systems. These enhancements included logic to retry failed actions, which increases the system's resilience to transient issues, reduces test failures, and decreases the need for manual intervention. | Attila Vamos | |
K Dheemonth, Bachelor of Computer Science and Engineering | Sentiment Analysis in English | During my internship, we created a number of parsers and an analyzer using NLP++ (VisualText). To do this, we defined rules that assign sentiments in a generic way rather than relying on rules for specific cases. NLP++ was used to build parsers that assign different sentiments depending on the user, cricket terms, player and team interests, and team support. The second phase focused on assigning sentiments to emojis: emojis in the dictionary, a capability offered by NLP++, were used to assign sentiments to cricket tweets. | David de Hilster | |
Kruthika Pinnada, Bachelor of Computer Science and Engineering | Resume Analyzer | The Resume Analyzer project uses the NLP++ programming language to build a digital human reader that parses resume text the same way we humans do. The system builds on the “zoning” of a resume (done by a previous intern) and aims to analyze the text in depth and extract valuable information the way a human does. | David de Hilster | |
Logan Patterson, Master's in Data Science | Designing Test Algorithms for Causal Model Discovery Within the HPCC Systems Causality Framework | The discovery model testing algorithm was applied to four different algorithms, one of which was already implemented within Because: PC (Peter-Clark), GES (Greedy Equivalence Search), IGCI (Information Geometric Causal Inference), and RCC (Randomized Causation Coefficient). The models were compared with one another based on their performance on various datasets to determine the viability of both the testing algorithm and the models themselves. This algorithm will hopefully pave the way for easier integration and implementation of causal discovery algorithms in future development of the HPCC Systems Causality Framework. | Roger Dev, Lili Xu | |
Narayan Kandel, Ph.D. in Computer Science, Clemson University, USA | Enhancing Performance of Distributed Neural Networks with the GNN Bundle | Our work addresses the challenges of parallelizing neural network training, recognizing that in certain scenarios, superior results can be achieved by training on a single high-powered node (e.g., one with a GPU) or on a limited number of nodes. To enhance network performance and accuracy, we pursued two distinct approaches. First, we optimized neural network training by limiting the number of nodes used for training, thereby reducing communication overhead. Second, we investigated an alternative multi-node approach to training that varies the starting points across multiple nodes and averages the results. This technique shows promise, yielding improved predictions compared with a single-network setup. | Lili Xu, Roger Dev | |
Nivedha Sivakumar, Bachelor of Computer Science | Test Suite for a Roxie Cluster on Kubernetes | My project focuses on creating a test suite for Roxie that provides a more in-depth understanding of how different query, cluster, and infrastructure configurations affect the functionality and performance of Roxie in the cloud. Unlike bare metal, the cloud environment provides more options and flexibility to build and customize your cluster infrastructure. The primary goal of the test suite is to provide guidelines on which configuration suits each future Roxie use case. | Krishna Turlaphati, Attila Vamos | |
Noah Seligson, Bachelor of Computer Science | Convert Automated Test Systems from Python 2 to Python 3 | The main objective of this project is to convert the Python files in the Smoketest and OBT from Python 2 to Python 3. Tools such as 2to3 automate the conversion by rewriting code segments according to a built-in set of transformation rules (fixers). However, such a tool alone cannot guarantee a sound conversion, which is why manual review and testing are also a mandatory part of this project. Another principal goal is to clean up both the Python and Bash files of the testing systems by removing commented-out code that no longer serves a purpose, unused variables, and uncalled functions. | Attila Vamos | |
Ryan Rao, High School Student | HPCC Systems Storage Support with Container Storage Interface (CSI) | The goal of my project is to create the third storage lifecycle for the EFS implementation, giving users more permanent storage, and to complete as much of the FSx implementation as I can. This includes configuring all the necessary storage components: persistent volumes (PVs), persistent volume claims (PVCs), storage classes, EFS access points, the CSI driver, and so on. In addition, I will build the necessary Helm charts and improve existing code and documentation. | Xiaoming Wang, Godson Fortil | |
Sarah Nash, Master's in Data Science | Causal Discovery and Validation with Categorical Data | The HPCC Systems Causality Framework, “Because”, is a toolkit for multiple areas of causal analysis, including discovery and validation. The discovery algorithms previously implemented in the toolkit are mainly compatible with two data types: continuous numeric and discrete numeric. This project focused on expanding the discovery portion of the toolkit to also handle the remaining data type: categorical data. Through various tests, we identified the strengths and weaknesses of this particular model, as well as areas for improvement within the Causality toolkit. | Roger Dev, Lili Xu | |
Shyamaa Karthik, High School Student, Saint Andrew's School, Boca Raton, FL, USA | Processing the Tamil Wiktionary Pages into an NLP++ Dictionary | As a summer intern for HPCC Systems, I worked on creating the world’s first and most advanced Tamil dictionary with parts of speech for NLP++. My goal was to use Tamil Wiktionary pages and leverage the existing English Wiktionary parser to create my own parser for Tamil. My end result was the most thorough Tamil dictionary for NLP++ to date. I hope that more people will build on it and expand it to make it more complete, and that the same approach is carried over to more languages. | David de Hilster | |