Sudershan K S - 2023 Poster Contest Resources

Sudershan is currently in his 6th semester, pursuing a Bachelor of Engineering in Computer Science and Engineering at RV College of Engineering in Bengaluru, India. He is very enthusiastic, has a positive attitude and approach, and is currently working on various R&D projects.

Poster Abstract

Introduction:

Multi-node computation, also known as distributed computing, is a paradigm that allows multiple interconnected nodes or machines to be used efficiently to perform complex computational tasks. Some core ECL functions, such as those implementing the Learning Tree algorithms of the Machine Learning bundle, are recursive in nature and therefore require a great deal of computation time. This project aims to optimize the processing of the ML Learning Tree algorithms in ECL by embedding Python libraries. Execution time and accuracy were the two parameters used to measure our approach against the existing Learning Tree ECL functions.

Objective:

The current ML bundle of the ECL language implements Learning Tree-based algorithms through the LOOP function, which is essentially recursive in nature and therefore, while accurate, suffers from high computational complexity. The proposed project describes a method of improving the efficiency of this ML bundle by decreasing cluster runtime, using the EMBED function to call the built-in machine learning libraries of the Python language. Accuracy is expected to remain roughly the same as that of the existing ECL bundle.
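
As a rough illustration of this EMBED-based approach (a minimal sketch, not the project's actual code), the following ECL fragment calls Scikit-Learn from an embedded Python function. It assumes the Python3 embed plugin and scikit-learn are installed on the cluster nodes, and the record layouts and attribute names (PointRec, PredRec, FitOneTree) are hypothetical.

IMPORT Python3;

// Hypothetical layouts for illustration only.
PointRec := RECORD
    REAL8 x;   // single feature
    REAL8 y;   // target value
END;

PredRec := RECORD
    UNSIGNED4 id;
    REAL8 pred;
END;

// Fit one decision tree inside an EMBED(Python3) block and return its
// predictions on the training points, instead of building the tree with
// a recursive ECL LOOP.
DATASET(PredRec) FitOneTree(DATASET(PointRec) pts) := EMBED(Python3)
  from sklearn.tree import DecisionTreeRegressor
  rows = [(r[0], r[1]) for r in pts]   # dataset rows arrive as tuple-like objects
  X = [[r[0]] for r in rows]
  y = [r[1] for r in rows]
  model = DecisionTreeRegressor().fit(X, y)
  return [(i, float(p)) for i, p in enumerate(model.predict(X))]
ENDEMBED;

trainData := DATASET([{1.0, 2.1}, {2.0, 3.9}, {3.0, 6.2}, {4.0, 8.1}], PointRec);
OUTPUT(FitOneTree(trainData));

Handing the tree construction to a compiled Scikit-Learn routine in this way replaces the recursive LOOP-based processing with a single embedded call.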

Methodology:

This project makes effective use of multi-node processing and multi-threading. First, we spray the dataset onto the nodes using delimited spraying. Through the EMBED function we use Python multithreading, allowing the user to specify the number of threads that perform the Learning Tree function on each node. The user can also specify the number of trees to be created for each dataset to minimize error. Regression and classification are handled through the DecisionTreeRegressor and DecisionTreeClassifier functions of the Scikit-Learn library, respectively. The final prediction is the mean of all tree predictions for regression data and the mode of the predictions for classification data; it is returned through EMBED along with the model used to make it. We tested the proposed solution with multiple classification and regression datasets on both single-node and multi-node systems and achieved significantly lower execution times while maintaining roughly the same accuracy as the Learning Tree ECL functions.
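
The per-node ensemble step described above might look roughly like the following sketch, which trains several regression trees on a Python thread pool and averages their predictions (for classification, DecisionTreeClassifier and the mode of the predictions would be used instead). The identifiers TrainEnsemble, PointRec and PredRec are illustrative rather than the author's actual code, the sketch again assumes the Python3 embed plugin and scikit-learn are available on every node, and the bootstrap sampling used here to make the trees differ is only one plausible choice; the abstract does not specify how the trees are varied.

IMPORT Python3;

// Same hypothetical layouts as in the sketch above.
PointRec := RECORD
    REAL8 x;
    REAL8 y;
END;

PredRec := RECORD
    UNSIGNED4 id;
    REAL8 pred;
END;

// Train numTrees regression trees on numThreads worker threads, then
// return the mean prediction per training point.
DATASET(PredRec) TrainEnsemble(DATASET(PointRec) pts,
                               UNSIGNED4 numTrees,
                               UNSIGNED4 numThreads) := EMBED(Python3)
  from concurrent.futures import ThreadPoolExecutor
  from sklearn.tree import DecisionTreeRegressor
  import random
  rows = [(r[0], r[1]) for r in pts]
  X = [[r[0]] for r in rows]
  y = [r[1] for r in rows]

  def fit_one(seed):
    # Each thread fits a tree on its own bootstrap sample of the local data.
    rnd = random.Random(seed)
    idx = [rnd.randrange(len(X)) for _ in range(len(X))]
    Xs = [X[i] for i in idx]
    ys = [y[i] for i in idx]
    return DecisionTreeRegressor(random_state=seed).fit(Xs, ys)

  with ThreadPoolExecutor(max_workers=numThreads) as pool:
    models = list(pool.map(fit_one, range(numTrees)))

  # Regression: the final prediction is the mean over all trees
  # (classification would take the mode of the predictions instead).
  preds = [m.predict(X) for m in models]
  means = [sum(col) / len(models) for col in zip(*preds)]
  return [(i, float(p)) for i, p in enumerate(means)]
ENDEMBED;

trainData := DATASET([{1.0, 2.1}, {2.0, 3.9}, {3.0, 6.2}, {4.0, 8.1}], PointRec);
OUTPUT(TrainEnsemble(trainData, 10, 4));

In the described design, each node would run a step like this on its own sprayed partition of the data, which is where the reduction in cluster runtime comes from.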

Presentation

In this Video Recording, Sudershan provides a tour and explanation of his poster content.

Performance optimization of Learning Trees modules in the ECL repository

