Mohammed Hamza - 2024 Poster Contest Resources
My name is Mohammed Hamza, and I am a second-year Computer Science student at RV University. I recently discovered a passion for applying machine learning to questions that grow out of my interests. From football player rating prediction to water quality prediction, I have worked on a range of problems that benefit me and my surroundings. I also recently won the HPCC Systems hackathon and have a good grasp of using ECL alongside my primary programming language, Python. With this competition, I intend to improve my understanding of HPCC Systems and its various bundles and libraries, which will help me solve real-world problems.
Poster Abstract
The rapid advancement of nanotechnology has expanded research into nanomaterials with diverse properties and applications, ranging from medicine and electronics to environmental protection. These materials, defined by their nanoscale dimensions, exhibit unique physical, chemical, and biological characteristics that distinguish them from their bulk counterparts. Vast amounts of unstructured data are available to the scientific community in the form of images, tables, and text within research articles, which makes extracting meaningful data challenging for researchers. Each class of nanomaterials has specific properties that suit it to particular industries. Nanomaterials data is available in various forms, with images being the most significant contributor. These images are crucial for identifying the type of nanomaterial.
We plan to use a publicly available, human-annotated dataset specifically curated for the classification of nanomaterials from scanning electron microscope (SEM) images. The dataset comprises 18,577 high-resolution SEM images, categorized into ten distinct classes based on their characteristics. These categories include porous sponges, patterned surfaces, particles, films coated surfaces, powders, tips, nanowires, biologicals, MEMS devices, and fibers. Each category encompasses unique structural details crucial for various applications, such as absorption, filtration, electronics, pharmaceuticals, and biotechnology.
To achieve efficient classification, we will employ HPCC ECL Machine Learning, making use of its robust data handling and parallel processing capabilities. The steps planned are:
Data Preprocessing:
We will use HPCC ECL scripts to preprocess the SEM image data, including image resizing, pixel value normalization, and data augmentation techniques such as rotation and flipping to enhance model robustness.
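The normalization and augmentation steps named above can be illustrated with a minimal sketch in plain Python. This is not HPCC ECL code and not the project's implementation. The function names (`normalize`, `flip_horizontal`, `rotate_90`, `augment`) and the representation of an image as nested lists of 8-bit grayscale values are assumptions made only for illustration. Resizing is left out because it depends on the interpolation method chosen.

```python
from typing import List

Image = List[List[float]]  # hypothetical representation: rows of grayscale pixel values


def normalize(image: Image, max_value: float = 255.0) -> Image:
    """Scale pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90(image: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image: Image) -> List[Image]:
    """Return the original image plus a flipped and a rotated variant."""
    return [image, flip_horizontal(image), rotate_90(image)]
```

Each source image yields three training samples here; a real pipeline would typically apply such transforms randomly and at more angles.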
Feature Extraction:
We will use the HPCC Systems Image Library for feature extraction. It supports image processing operations that capture features such as texture, shape, and structural patterns, which are necessary for classification.
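To make the idea of texture-style features concrete, here is a small sketch in plain Python. It does not use or reproduce the HPCC Systems Image Library. The two features shown (an intensity histogram and a mean horizontal gradient) and the function names are hypothetical stand-ins chosen for simplicity, and the input is assumed to be a grayscale image already scaled to [0, 1].

```python
from typing import List

Image = List[List[float]]  # hypothetical: rows of pixel values already scaled to [0, 1]


def intensity_histogram(image: Image, bins: int = 4) -> List[float]:
    """Fraction of pixels falling in each of `bins` equal-width intensity bins."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            index = min(int(pixel * bins), bins - 1)  # a pixel of 1.0 goes in the last bin
            counts[index] += 1
            total += 1
    return [count / total for count in counts]


def mean_horizontal_gradient(image: Image) -> float:
    """Average absolute difference between horizontally adjacent pixels (a crude texture cue)."""
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def extract_features(image: Image) -> List[float]:
    """Concatenate the histogram and the gradient into one feature vector."""
    return intensity_histogram(image) + [mean_horizontal_gradient(image)]
```

A fine-tuned CNN learns its own features, so hand-built descriptors like these would mainly serve as a baseline or as a sanity check on the data.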
Model Selection:
Model selection will be implemented using HPCC-ML. We will fine-tune a pretrained Convolutional Neural Network (CNN) model using the HPCC Machine Learning library. Pretrained models such as ResNet or VGG, integrated within HPCC Systems, will be adapted to the specific problem of classifying nanomaterials. The HPCC ML library uses the Keras API for these models, which are built into the HPCC nodes.
Training and Validation:
Training and validation will be monitored and executed using ECL Watch. The annotated dataset will be processed using HPCC Systems' distributed computing capabilities for efficient handling of large datasets.
Evaluation:
Evaluation will be conducted using ECL Workunits and ROXIE. Model performance will be assessed through standard metrics such as accuracy, precision, recall, and F1-score. ECL Workunits will help with batch processing and modeling, while ROXIE (Rapid Online XML Inquiry Engine) will allow real-time querying and validation with new data.
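The metrics listed above follow standard definitions, which can be sketched in plain Python as below. This is an illustrative sketch and not the project's ECL evaluation code. The function names are hypothetical, and macro averaging across classes is an assumption, since the abstract does not say how per-class scores will be combined.

```python
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: List[str], y_pred: List[str], label: str) -> Dict[str, float]:
    """Precision, recall and F1-score for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_average(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Unweighted mean of the per-class scores over all classes present in y_true."""
    labels = sorted(set(y_true))
    scores = [per_class_metrics(y_true, y_pred, label) for label in labels]
    return {key: sum(s[key] for s in scores) / len(scores) for key in ("precision", "recall", "f1")}
```

Macro averaging weights every class equally, which matters when the ten categories contain very different numbers of images.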
Our method, which uses HPCC ECL for machine learning, is designed to improve the speed and accuracy of identifying nanomaterials in SEM images. This will help accelerate nanotechnology research by offering a tool for organizing and retrieving data. HPCC Systems provides the scalability and efficiency needed to manage large datasets and intricate calculations with ease.
Presentation
In this Video Recording, Mohammed provides a tour and explanation of his poster content.
Classification of Nanomaterials: