Implement a Preprocessing Bundle for HPCC Systems Machine Learning Library
This project was completed during the 2020 HPCC Systems Intern Program by Vannel Zeufack, Kennesaw State University, USA.
Find out about the HPCC Systems Summer Internship Program.
Resources Available to Learn More about this completed project:
Project Description
The student will produce a pre-processing bundle for the HPCC Systems Machine Learning Library that helps users perform the basic tasks of preparing their data for use with various ML algorithms. The aim is to create ECL tools for data preparation, packaged as a bundle and added to the HPCC Systems Machine Learning Library. Examples of tools to be added include:
One-hot encoding/decoding
Variable normalization and standardization
Scaling
Various sampling methods
Other important pre-processing tasks identified during the course of the project
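The list above names the transformations without defining them. The following is a minimal, language-neutral sketch in Python of what each one computes. It is not the bundle's ECL implementation or API. All function names are hypothetical. It assumes that "normalization" means min-max scaling into a fixed range and that "standardization" means z-score scaling, which is a common but not universal reading of those terms. Simple random sampling without replacement stands in as just one of the "various sampling methods".

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helper names for illustration only; not part of the HPCC Systems ML bundle.


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each categorical value to a 0/1 indicator vector.

    Returns the sorted category list (which fixes the column order) and one row per value.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is "hot"
        rows.append(row)
    return categories, rows


def one_hot_decode(categories: Sequence[str], rows: Sequence[Sequence[int]]) -> List[str]:
    """Invert one_hot_encode by picking the category whose indicator is 1."""
    return [categories[list(row).index(1)] for row in rows]


def min_max_scale(xs: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Linearly rescale values into [lo, hi]; a constant column maps to lo."""
    x_min, x_max = min(xs), max(xs)
    span = x_max - x_min
    if span == 0:
        return [lo for _ in xs]
    return [lo + (x - x_min) * (hi - lo) / span for x in xs]


def standardize(xs: Sequence[float]) -> List[float]:
    """Z-score standardization: subtract the mean, divide by the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0 for _ in xs]  # constant column: every value sits at the mean
    return [(x - mean) / std for x in xs]


def random_sample(rows: Sequence, k: int, seed: int = 42) -> list:
    """Simple random sampling without replacement, reproducible through the seed."""
    return random.Random(seed).sample(list(rows), k)
```

For example, `one_hot_encode(["red", "green", "red"])` returns the columns `["green", "red"]` and the rows `[[0, 1], [1, 0], [0, 1]]`, and `min_max_scale([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`. To keep the sketch short, each scaler computes its statistics (min, max, mean, standard deviation) from the same column it transforms. In practice these statistics are usually computed on training data and then reused unchanged on test data.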
The project is open to accepting other suggested tools that users of the HPCC Systems ML library may find useful.
Completion of this project involves:
Implementation of proposed pre-processing tools in ECL
Unit Testing
Code check-in on GitHub
Documentation
White Paper
By the mid-term review we would expect you to have:
Implemented at least 60% of the proposed tools
Mentor: TBD
Backup Mentor: TBD
Skills needed
Deliverables
Other resources