HPCC Systems Intern Program - Class of 2022

Eleven students joined our intern program in 2022. They presented their projects to the team during the year, and nine of them entered our 2022 Poster Contest, held at our virtual HPCC Systems Community Day Summit in October 2022.

Meet the Class of 2022

Name

Project Title

Description

Mentor(s)

Resources

Amy Ma
High School Student
Stoneman Douglas High School, Florida

Document Data Patterns


Data Patterns was an existing feature of HPCC Systems, but it had never been formally documented. Usage information was spread across three separate files. The purpose of this project was to gather that information and consolidate it into a single book that users can access from the Documentation area of the HPCC Systems website.

Jim DeFabia

View Poster 

Blog Journal

Ananya Gupta
PhD Human-Centered Computing
Clemson University

Nepali Wiktionary Initiative and Translation


During the first initiative, we developed a parser and an analyzer, NeWiktionary, that use HPCC Systems to build a dictionary record structure from the Wiktionary data, creating a knowledge base for words. Ultimately, this dictionary will be used for NLP in Nepali using HPCC Systems. The second initiative focused on involving the community in building a better dictionary for the Nepali language.

David Dehilster

View Poster

Blog Journal

Arun Gaonkar
Masters of Computer Science
North Carolina State University

Applying Causality Toolkit to Real World Datasets


This project focuses on the analysis of causality and causation-based inference. The main aim of the research is to understand the causal relationships between the factors in a real-world dataset by applying the Causality Toolkit developed by HPCC Systems.

Using a CDC dataset containing details from a health survey, I analyzed diabetes and the effect of a few variables on the probability of diabetes. The Because module developed by HPCC Systems makes it possible to observe and analyze the cause and effect of each variable in the data.

Roger Dev

View Poster

Blog Journal

Arya Adesh
Bachelor of Computer Science and Engineering 
RVCE

Local Outlier Factor algorithm for Anomaly detection in ECL


 

Local Outlier Factor (LOF) is an unsupervised anomaly detection method that identifies anomalies without training. It is a density-based algorithm that assigns a degree of outlier-ness (called the local outlier factor) to each point in the dataset. LOF can find both global outliers, which are outlying with respect to all the points in the dataset, and local outliers, which are outlying with respect to their neighbors. Many other anomaly detection algorithms find global anomalies accurately, but they fail to identify local outliers because they assume the data is uniformly distributed. LOF makes no assumptions about the distribution, so it is well suited to unevenly distributed datasets.

Lili Xu

View Poster

Blog Journal

2022 Community Day Presentation
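
The Local Outlier Factor description above can be illustrated with a minimal sketch. It is written in Python rather than ECL and is not the intern's implementation: the function name `local_outlier_factors`, the sample points, and the choice of k = 2 are all hypothetical, and the brute-force approach assumes a small dataset with no duplicate points.

```python
import math

def local_outlier_factors(points, k=2):
    """Return one LOF score per point (brute force, O(n^2)).

    Illustrative sketch only: ties at the k-th distance are ignored,
    and duplicate points would cause a division by zero.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, j):
        # Reachability distance of point i from neighbor j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a dense cluster, a sparse cluster, and one extra point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),                   # dense cluster, spacing 1
    (100, 100), (100, 110), (110, 100), (110, 110),   # sparse cluster, spacing 10
    (3, 3),                                           # close to the dense cluster
]
scores = local_outlier_factors(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

In this sample, every point inside either cluster scores about 1, while (3, 3) scores roughly 3.2. Its nearest neighbor is closer than the spacing within the sparse cluster, so a single global distance threshold would not flag it. It stands out only relative to the density of its own neighborhood, which is the local-outlier case the description refers to.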

Elizabeth Lorti
Bachelor of International Development
King's College, London

Technology Marketing and Branding


This project required a different lens for understanding and evolving the HPCC Systems community. Social media and marketing strategies are crucial to ensuring that the company and platform increase engagement and expand their reach. The project focused on three areas: completing a competitive analysis of HPCC Systems against its competitors to better understand marketing strategies and how HPCC Systems can improve and stand out, updating and collecting collateral documents for review and re-branding, and implementing new social media strategies to improve engagement.

Jessica Lorti

View Poster

Blog Journal

Jack Del Vecchio
Bachelor of Computer Engineering
Miami of Ohio University

Interfacing MongoDB into ECL

This project provides details of the API that my plugin uses to communicate between MongoDB and HPCC Systems. The two systems have similar native data types, but there are some differences.

The plugin allows for a wide variety of commands to be sent to the MongoDB database. MongoDB uses documents wh