Vannel Zeufack - 2020 Poster Contest Resources



Vannel Zeufack is studying for a Master's in Computer Science at Kennesaw State University.

Vannel returned to our intern program in 2020 for a second year. He has a keen interest in machine learning algorithms; his project last year contributed anomaly detection algorithms to a machine learning bundle.

Poster Abstract

In this flourishing era of Artificial Intelligence, Machine Learning algorithms are having an increasingly large impact on our daily lives. They are extensively used to power applications in various domains, including self-driving cars, weather forecasting, marketing, robotics, anomaly detection and many more.

A machine learning project can be broadly divided into five main phases: data collection, data preprocessing, model selection and setup, inference, and evaluation. Of these phases, data preprocessing is widely regarded as the most time-consuming, and it could account for about 80% of the whole project.

As machine learning has shown its importance over the last ten years, HPCC Systems, the end-to-end data lake management solution, has kept up to date by providing a fully-fledged machine learning library. It contains a wide range of machine learning algorithms, both supervised and unsupervised.

However, the library currently lacks a data preprocessing package to help machine learning engineers speed up the data preprocessing phase of their projects and thereby enhance their productivity. Engineers still have to write many custom modules and functions.

To fill that gap, we implemented a Preprocessing Bundle for the HPCC Systems Machine Learning Library. The current version includes the following modules and functions:

  • LabelEncoder and OneHotEncoder: modules to process categorical features 

  • StandardScaler and MinMaxScaler: modules for scaling data 

  • MLNormalize: a function for normalizing data 

  • Split, StratifiedSplit, RandomSplit and StratifiedRandomSplit: functions for easily splitting datasets into training and test data 
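For readers unfamiliar with these operations, the following is a minimal Python sketch of what label encoding, one-hot encoding, standard scaling, min-max scaling and a stratified random split typically compute. It is not the bundle's ECL code or its API: every function name, parameter and default below is hypothetical and chosen only for illustration, and the bundle's own behaviour (for example, how categories are ordered or how split sizes are rounded) may differ.

```python
import random
from collections import defaultdict


def label_encode(values):
    """Map each distinct category to an integer (categories sorted alphabetically)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Turn each category into a 0/1 vector with a single 1 at its label code."""
    codes, categories = label_encode(values)
    return [[1 if code == i else 0 for i in range(len(categories))] for code in codes]


def standard_scale(xs):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = variance ** 0.5 or 1.0  # assumption: leave constant features unscaled
    return [(x - mean) / std for x in xs]


def min_max_scale(xs, low=0.0, high=1.0):
    """Rescale linearly so the minimum maps to `low` and the maximum to `high`."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # assumption: avoid division by zero on constant features
    return [low + (x - lo) * (high - low) / span for x in xs]


def stratified_random_split(rows, labels, train_fraction=0.8, seed=0):
    """Shuffle within each label group, then split each group by the same fraction,
    so that training and test sets keep roughly the original label proportions."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append((row, label))
    rng = random.Random(seed)
    train, test = [], []
    for group in groups.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


if __name__ == "__main__":
    colours = ["red", "green", "red", "blue"]
    print(label_encode(colours))
    print(one_hot_encode(colours))
    print(min_max_scale([2, 4, 6]))
    train, test = stratified_random_split(list(range(10)), ["a"] * 5 + ["b"] * 5)
    print(len(train), len(test))
```

In this sketch the scalers compute their statistics and apply them in one step. A real pipeline would normally compute the statistics on the training data only and reuse them on the test data, which is one reason such tools are often packaged as modules rather than single functions.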

The Preprocessing Bundle will be included in the HPCC Systems ML_CORE Library. It comes with a tutorial showing how its services can be used in an end-to-end machine learning project to speed up the data preprocessing phase.

Presentation

In this Video Recording, Vannel provides a tour and explanation of his poster content.

Poster Title: Preprocessing Bundle for HPCC Systems Machine Learning Library


