...
Since January 2019, I have been surveying log analysis techniques in the literature for detecting abnormal activities on computing and networking systems, as part of my thesis supervisor’s research collaboration with LexisNexis Risk Solutions Group. During the course of my study, unsupervised anomaly detection attracted my attention because it has good potential to detect unknown cybersecurity threats. Hence, I was really excited to learn of this internship opportunity, which allowed me to learn various machine learning and big data analysis techniques and adapt them to implement an algorithm for this real-world problem.
...
Andre Felipe Santos Martins - Federal Institute of Espírito Santo (IFES) - Campus Serra
Infrastructure Analysis of Elementary Schools in Brazil Using HPCC Systems
The purpose of this project was to analyse the infrastructure statistics of elementary schools in Brazil in urban and rural areas. The goal was to investigate the basic infrastructure available for all students and school dependents such as water supply, electrical network, sewage network, internet access and the availability of ramps, handrails, signage and accessible toilets for people with special needs. The dataset used was public, covering data from 2015-2018.
...
The project consists of three modules: (1) Telematics system simulation, (2) Apache Kafka message system and (3) HPCC Systems analysis system implementation.
...
HPCC Data Analysis is an ECL program that processes data on the HPCC Systems Platform. This demo fetches message data from the Kafka message queue, parses the messages into the required format, removes unnecessary data, saves the results to datasets, analyzes these datasets, and sends the result to our client. There are
...
In summary, our project, built on HPCC Systems, implemented a data processing pipeline for the vehicle industry that demonstrated its value for the industry. The analysis results could also potentially reduce the damage from vehicle accidents caused by human behavior.
...
Text cleaning is becoming an essential step in text classification. Stop word removal is a crucial text cleaning technique that saves a large amount of space in text indexing. Many common words are domain-based: they differ from one domain to another and carry no value within a particular domain. Which words qualify depends on the document collection. For example, the word "protein" would be a stop word in a collection of articles on bioinformatics, but not in a collection describing political events. Eliminating these words reduces the size of the corpus and enhances the performance of text mining. In this project we used the text vectors bundle (CBOW) in HPCC Systems to find the domain-based common words. The idea behind using text vectors is the ability to map each unique token in the corpus to a vector of n dimensions. Text vectors map words into a high-dimensional vector space such that similar words are grouped together and the distances between words can reveal relationships. Using the vector representation of the words, we can find the center of the space. By computing the distance between each unique word in the corpus and that center, we can identify the domain-based common words as those with the shortest distance from the center. To test our methodology, we applied commonly used text classification methods, such as ClassificationForest, before and after eliminating the common words. Eliminating domain-based common words will enhance the performance of the classification methods.
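The center-and-distance step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's ECL implementation: it assumes the center is the arithmetic mean of all word vectors and that distance is Euclidean, neither of which the abstract specifies. The function names and the two-dimensional toy vectors are hypothetical; in practice the vectors would come from a trained CBOW model.

```python
import math


def centroid(vectors):
    """Arithmetic mean of a list of equal-length vectors (assumed definition of 'center')."""
    dim = len(vectors[0])
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_to_center(word_vectors, k):
    """Return the k words whose vectors lie nearest the center of the space."""
    center = centroid(list(word_vectors.values()))
    ranked = sorted(word_vectors, key=lambda w: euclidean(word_vectors[w], center))
    return ranked[:k]


# Hypothetical toy vectors, made up purely for illustration.
toy_vectors = {
    "protein": [0.1, 0.0],
    "cell": [0.0, 0.2],
    "kinase": [2.0, 1.5],
    "ribosome": [-1.8, 1.9],
    "genome": [-0.3, -3.2],
}

# Candidate domain-based common words: the two words nearest the center.
candidates = closest_to_center(toy_vectors, 2)
print(candidates)
```

In this toy data, "cell" and "protein" sit closest to the center, so they would be the candidates for removal. How many words to remove (the value of `k` here) is a tuning choice that the abstract does not state.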
...
Today, with the increasing availability of smartphones and other handheld recording devices, people are generating vast amounts of data in the form of digital audio. Yet, despite a move from analog equipment to digital plug-ins, many of the fundamental processes used in forensic sound analysis have remained relatively unchanged. This project aims to use HPCC Systems ECL, in tandem with TensorFlow’s machine learning libraries, to offer a modern solution to some of the problems presented in forensic sound analysis.
The aim of this project was to design a program that takes an audio file as input and outputs a description of where it was recorded, using a combination of machine learning techniques and convolution reverb. I am mainly interested in the forensic applications of this type of sound analysis system, so both the research conducted during this project and the program itself were explored from this perspective.
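For readers unfamiliar with convolution reverb, the core operation is the discrete convolution of a "dry" recording with the impulse response of a space. The following is a minimal sketch of that operation only, using made-up sample values and a hypothetical `convolve` helper. It does not show how the project combined convolution reverb with machine learning, which the abstract does not describe.

```python
def convolve(dry, impulse_response):
    """Direct-form discrete convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the impulse response.
            out[i + j] += sample * tap
    return out


# Hypothetical values: a short dry signal and a room response with one echo
# arriving two samples late at a quarter of the original amplitude.
dry_signal = [1.0, 0.0, 0.0, 0.5]
room_ir = [1.0, 0.0, 0.25]

wet_signal = convolve(dry_signal, room_ir)
print(wet_signal)  # [1.0, 0.0, 0.25, 0.5, 0.0, 0.125]
```

The output carries the acoustic signature of the impulse response, which is what makes the technique relevant when reasoning about where a recording was made. Real recordings would normally use an FFT-based convolution for speed; the nested loop is used here for clarity.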
Akshar Prasad - Rashtreeya Vidyalaya College of Engineering (RVCE) Bangalore
Machine Learning Techniques to Understand Partial and Implied Data Values for the Conversion of Natural Language to SQL Queries on HPCC Systems
...
Applying CNN models to transactional data to identify anomalies and fraud has proved quite promising, because the model itself takes care of most of the static and dynamic feature extraction, which makes detection easier than with random forests. In this project we aim to solve the problem outlined above by applying a CNN model for the dynamic detection of fraud in stored-value cards.
Sathvik K R - Rashtreeya Vidyalaya College of Engineering (RVCE) Bangalore
Octave Plugin for HPCC Systems
...