HPCC Systems Intern Program - Class of 2021
Find out more about the HPCC Systems Summer Intern Program including how to apply and read this blog introducing the students and their projects.
12 students joined our intern program in 2021. Our students presented their projects to the team during the year and 9 of them entered our 2021 Poster Contest, held at our virtual HPCC Systems Community Day Summit in October 2021.
Due to COVID-19 all internships were completed remotely.
Meet the Class of 2021
Name | Project Title | Description | Mentor(s) | Resources |
---|---|---|---|---|
Achinthya Sreedhar | Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS) as a part of the Causality Project | Conditional Probability is a key enabling technology for Causal Inference. For real valued variables, calculating conditional probabilities is particularly challenging because they can take on an infinite set of values. With the increase in conditional dimensions, the data appears sparser and sparser making it difficult to derive accurate results. After looking at various ways of modelling conditional probabilities, we found that using RKHS kernel methods, it was possible to estimate the density and cumulative density of conditional probabilities with a single conditioning variable. | Roger Dev | |
Alexander Parra | Implement a PMML Processor | The aim of this project was to implement a Predictive Model Markup Language (PMML) Processor using ECL and provide a user-friendly interface. The converter works in both directions for simple (and multiple) Linear Regression machine learning models. The converter takes in a .pmml/.xml file and returns a .ecl file, containing the code needed to make predictions. Conversely, the converter also takes in a .ecl file and compiles it, turning it into a PMML model in the process. This work makes it easier for users to convert files and provides support for other algorithms, such as Logistic Regression, Random Forests, Neural Networks, etc. | Dan Camper | |
Amy Ma | Ingress Configuration | An Ingress is an object that allows access to Kubernetes services from outside the Kubernetes cluster. Ingress is made up of an Ingress object and the Ingress Controller. An Ingress Controller is the implementation of the Ingress. In this project, two Ingress implementations, HAProxy and Nginx, were examined in an Azure environment. These two Ingress controllers both use in-cluster Ingress solutions, where load balancing is performed by pods within the cluster. My work explores the different setups used to configure Ingress features through annotations and Kubernetes Ingress specifications. | Kevin Wang | |
Atreya Bain | Improvements on HSQL: A SQL-like language for HPCC Systems | Big Data has become an important field, and there is a steep learning curve to handling Big Data, especially in distributed systems. HSQL for HPCC Systems is a solution developed to help users get used to its architecture and the ECL (Enterprise Control Language) language with which it primarily operates. HSQL aims to provide a seamless interface for data science developers working with data. It is designed to work in conjunction with ECL, the primary programming language for HPCC Systems, and should prove to be easy to work with and robust for general purpose analysis. | Arjuna Chala | |
Carina Wang | Processing Student Images with Kubernetes on HPCC Systems Cloud Native Platform | In order to foster a safe learning environment, measures to bolster campus security have emerged as a top priority around the world. The developments from my internship will be applied to a tangible security system at American Heritage High School (AHS). Processing student images on the HPCC Systems Cloud Native Platform and evaluating the HPCC Systems Generalized Neural Network (GNN) bundle on cloud ultimately facilitated a model’s classification of an individual as “AHS student” or “Not an AHS student”. This will allow a person to receive confirmation from the robot that they are in the student database and retrieve information as part of a larger, interactive security feature. | David DeHilster | |
Christopher Connelly | Ingestion and Analysis of Collegiate Women's Basketball GPS Data in HPCC Systems and RealBI | In the past, NC State Strength and Conditioning has worked with HPCC Systems to create solutions for taking different data streams and bringing them together for a comprehensive analysis to improve athlete wellbeing and performance. Here you will see some solutions using HPCC Systems and RealBI to provide insight from data collected with the NC State Women's basketball team. You will see some differences from working with a Bare Metal environment to a Kubernetes environment. See how these solutions can help our understanding of this data to provide better service to these student athletes. | Raja Sundarrajan | |
Eleanor Carl | Continue Novel COVID-19 Tracker and Global Map Using HPCC Systems ECL Watch | HPCC Systems contains an active COVID-19 portal as a part of our web footprint. Connecting the major COVID-19 databases together with Airport Data provides a number of possible applications of this tool, such as data analysis, public safety applications, educational resources and use as a travel tool. As a travel tool, it could provide the ability to view COVID-19 data and metrics alongside a user's input itinerary. An interactive map can be color-coded by vaccine percentages and other data such as IATA codes, airport information, airport locations, confirmed cases, school closing data, contagion risk percentages, deaths, gathering restrictions and mask restrictions. All this data can be dynamically populated in the interactive map. | Arjuna Chala | |
Jefferson Mao | Toxicity Detection | Not only was the creation of the internet the largest technological breakthrough of the 20th century, it also happened to become a hidden double-edged sword. The internet has allowed us to access information and communicate at unprecedented levels, across the globe. Yet, this comes at an enormous cost. The human cost. Hidden behind computer screens, we enjoy a security blanket of anonymity, which emboldens some to say and do things that are labeled as disturbing in a public setting. By creating a Toxicity Detection Platform, I aim to curb this harassment and provide a healthier web environment for everyone. | Bob Foreman | |
Mara Hubelbank | Causal Inference in Machine Learning | The HPCC Systems platform is dedicated to research and development within the groundbreaking field of causal statistics, which seeks to understand and model the complex causal mechanisms of our everyday lives. This project focused on designing the interventional and counterfactual modules of this platform; these algorithms tease apart the structure of the input data to get to the how and why of the relationships which link them together. Moreover, this project demonstrates multiple use cases on synthetically generated data, identifies real-world datasets for exploration, and outlines areas for future extension of the platform extracted from cutting-edge causality research. | Roger Dev | |
Mayank Agarwal | Independence Testing with RCoT : Causal Validation and Discovery for HPCC System Causal Toolkit | The new science of Causality promises to open new frontiers in Data Science and Machine Learning, but requires an accurate model of the causal relationships between variables. This causal model takes the form of a Directed Acyclic Graph (DAG). Nature provides a few subtle cues to the structure of the causal model, the most important of which are the independencies or conditional independencies between variables. These independencies allow us to test a causal model to determine if it is consistent with the observed data, and in some cases to discover the causal model from data alone. | Roger Dev | |
Nikita Jha | Apply Docker Image Build and Kubernetes Security Principles | With cybersecurity attacks becoming more prevalent in the United States every year, organizations are constantly looking for ways to improve the security outlook of their platforms. Recently, HPCC Systems has begun transitioning to a cloud-native platform in which they use Docker containers managed by Kubernetes to store and manage data. With this new change, it is of utmost importance that HPCC Systems has a secure cloud environment since they are using it to manage secure data from other companies. | Michael Gardner | |
Roshan Bhandari | Use Azure Spot Instance with HPCC Systems for Cost Optimization | Minimizing the cost of setting up cloud infrastructure is very important for all companies. Azure Spot Instances can provide great cost savings for cloud infrastructure setup. Azure Spot Instances are unused computing resources (virtual machines) that Azure offers at a lower price than normal virtual machines. It is found that the price of these instances can be as much as 90% below that of a normal instance. The price can vary based on region and size. In this project, we try to analyze different aspects related to the use of Azure Spot Instance with HPCC Systems. | Godson Fortil | |
Profile of our intern program in 2021
12 students - 4 High School, 6 Undergraduates, 1 Masters and 1 Researcher
Global and inclusive program, with 3 students located in Asia (India) and 1 international student studying in the USA
2 returning students
All remote working
Spread of projects: 3 Cloud, 7 Machine Learning, 2 platform related
12 mentors involved including 2 academic mentors
HPCC Systems platform and cloud related projects
Use Azure Spot Instance with HPCC Systems for Cost Optimization
Apply Docker Image Build and Kubernetes Security Principles
Improvements on HSQL: A SQL-like language for HPCC Systems *
Ingress Configuration
Continue Novel COVID-19 Tracker and Global Map Using HPCC Systems ECL Watch
Machine learning related projects
Independence Testing with RCoT : Causal Validation and Discovery for HPCC System Causal Toolkit
Toxicity Detection *
Ingestion and Analysis of Collegiate Women's Basketball GPS Data in HPCC Systems and RealBI
Processing Student Images with Kubernetes on HPCC Systems Cloud Native Platform *
Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS) as a part of the Causality Project
Implement a PMML Processor
Causal Inference in Machine Learning
* Projects suggested by students themselves