The proposal application period for the 2021 HPCC Systems Intern Program is now open.

The deadline for proposal applications is Friday 19th March 2021.

Discuss your ideas with the project mentor and send your final proposal to Lorraine Chapman.

This project was completed during the 2020 HPCC Systems Intern Program by Vannel Zeufack, Kennesaw State University, USA.

Find out about the HPCC Systems Summer Internship Program.

Resources Available to Learn More about this completed project:

Project Description

The student will produce a pre-processing bundle for the HPCC Systems Machine Learning Library that helps users perform the basic tasks of preparing their data for use with various ML algorithms. The tools will be written in ECL and added to the library in the form of a bundle. Some examples of tools to be added are:

  • One-hot encoding/decoding
  • Variable normalization and standardization
  • Scaling
  • Various sampling methods
  • Other important pre-processing tasks identified during the course of the project

The project is open to accepting other suggested tools that users of the HPCC Systems ML library may find useful.
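To make the listed tasks concrete, here is a minimal Python sketch of one-hot encoding/decoding, min-max scaling, and standardization. It is an illustration only: the project's tools are to be written in ECL, and the function names below (`one_hot_encode`, `one_hot_decode`, `min_max_scale`, `standardize`) are hypothetical, not part of the HPCC Systems Machine Learning Library.

```python
from statistics import mean, pstdev


def one_hot_encode(values):
    """Return (categories, rows): the sorted distinct categories and one 0/1 vector per value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one position is "hot" per value
        rows.append(row)
    return categories, rows


def one_hot_decode(categories, rows):
    """Map each 0/1 vector back to the category at its hot position."""
    return [categories[row.index(1)] for row in rows]


def min_max_scale(xs, lo=0.0, hi=1.0):
    """Linearly rescale xs so the minimum maps to lo and the maximum to hi."""
    x_min, x_max = min(xs), max(xs)
    span = x_max - x_min
    if span == 0:
        # Constant column: every value maps to the lower bound.
        return [lo for _ in xs]
    return [lo + (x - x_min) * (hi - lo) / span for x in xs]


def standardize(xs):
    """Rescale xs to zero mean and unit (population) standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        # Constant column: no spread to normalize.
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


if __name__ == "__main__":
    cats, rows = one_hot_encode(["red", "green", "red", "blue"])
    print(cats, rows)
    print(one_hot_decode(cats, rows))
    print(min_max_scale([2, 4, 6]))
    print(standardize([1, 2, 3]))
```

In a distributed setting, the statistics each function depends on (the category set, minimum and maximum, mean and standard deviation) would have to be computed across the whole dataset before any record is transformed. This sketch sidesteps that by working on in-memory lists.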

Completion of this project involves:

  • Implementation of proposed pre-processing tools in ECL
  • Unit Testing
  • Code check-in on GitHub
  • Documentation
  • White Paper

By the midterm review, we would expect you to have:

  • Implemented at least 60% of the proposed tools
Mentor

TBD
Contact Details

Backup Mentor: TBD
Contact Details 

Skills needed
  • Knowledge of ECL. Training manuals and online courses are available on the HPCC Systems website.
  • Knowledge of distributed computing techniques
  • Familiarity with the HPCC Systems Machine Learning Library
  • Familiarity with data pre-processing
  • Familiarity with GitHub
Deliverables
  • Midterm

    • Implement at least 60% of the proposed tools

  • End of project

    • Implement 100% of the proposed tools
    • Unit Testing
    • Code check-in on GitHub
    • Documentation
    • White Paper
Other resources