Poster Presentation Abstracts 2019


Robert Kennedy - Florida Atlantic University

 1st Place Winning Entry

GPU Accelerated Neural Networks on HPCC Systems Platform

The aim of this project was to research and develop GPU-accelerated deep learning algorithms on HPCC Systems, incorporating many of the points outlined in the Create HPCC Systems VM on Hyper-V project, which benefits other aspects of this project, both within the current proposed scope and in future work.

Training modern deep neural networks requires big data and large computational power. Though HPCC Systems excels at both of these, it is limited to using the CPU only. GPU acceleration has been shown to vastly improve deep learning training time. This project expands HPCC Systems by not only being its first GPU-accelerated library (to my knowledge) but also by greatly extending its deep neural network capabilities.

This project provides a software library (consisting of ECL and Python code) that gives HPCC Systems GPU-accelerated neural network training, as well as expanding and improving the existing deep learning framework. Additionally, it increases the number of configurations HPCC Systems can be deployed on by offering a Hyper-V image. This project's outcome also serves as a building block for future development of distributed configurations that were not previously available on HPCC Systems, such as model parallelism, as well as enabling deep learning with asynchronous algorithms; current implementations are synchronous and suited only to synchronous algorithms.

Alyssa Messner - Wayne State University 

 2nd Place Winning Entry

Exploring Co-occurring Mental Illness and Substance Abuse Disorder Using HPCC Systems

Substance use disorders and mental illness affect a large number of people in the United States. In 2017, an estimated 19.7 million individuals aged 12 or older had a substance use disorder (SUD) and 46.6 million individuals aged 18 or over had any mental illness (AMI) in the past year (SAMHSA, 2018).

The burden of both mental illness and substance use continues to grow, and overdose deaths are increasing (Pew Research Center, 2018). Anxiety and depression now top the list of problems teenagers see among their peers, with drug addiction and alcohol use listed third and fourth (Pew Research Center, 2019). Almost half of U.S. adults report that they have had a family member or close friend who was addicted to drugs (Pew Research Center, 2017), while suicide was the second leading cause of death in 2017 among people aged 10 to 34, with the total suicide rate increasing over time (NIMH, 2019).

The purpose of this study is to utilise data analytics to gain a deeper understanding of mental illness and substance abuse. Using the National Survey on Drug Use and Health and the HPCC Systems Machine Learning Library, this study will explore the relationship between co-occurring diagnoses of mental illness and substance use disorder as follows:

  • Instances of co-occurring substance abuse and mental illness - Percentage with a SUD that also have AMI and vice versa

  • Incidence of mental illness by type of substance used - Specific substances that have a strong relationship with mental illness.
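The first of the measures above can be sketched in a few lines of plain Python. This is an illustrative sketch only, with hypothetical boolean fields standing in for the survey's diagnostic variables, not the study's actual code:

```python
# Minimal sketch of the co-occurrence percentages described above:
# the share of respondents with a substance use disorder (SUD) who also
# have any mental illness (AMI), and vice versa.
# The 'sud'/'ami' field names are hypothetical stand-ins for survey variables.

def co_occurrence_rates(records):
    """records: iterable of dicts with boolean 'sud' and 'ami' flags."""
    sud = [r for r in records if r["sud"]]
    ami = [r for r in records if r["ami"]]
    both = [r for r in records if r["sud"] and r["ami"]]
    pct_ami_given_sud = 100.0 * len(both) / len(sud) if sud else 0.0
    pct_sud_given_ami = 100.0 * len(both) / len(ami) if ami else 0.0
    return pct_ami_given_sud, pct_sud_given_ami

# Toy data standing in for survey rows.
sample = [
    {"sud": True,  "ami": True},
    {"sud": True,  "ami": False},
    {"sud": False, "ami": True},
    {"sud": False, "ami": False},
]
print(co_occurrence_rates(sample))  # (50.0, 50.0)
```

In the actual study, the same aggregation would be expressed in ECL over the National Survey on Drug Use and Health dataset.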

Vannel Zeufack - Kennesaw State University 

 3rd Place Winning Entry

Unsupervised Log-based Anomaly Detection

Since January 2019, as part of my thesis supervisor's research collaboration with LexisNexis Risk Solutions Group, I have been surveying log analysis techniques in the literature for detecting abnormal activities on computing and networking systems. During the course of my study, unsupervised anomaly detection attracted my attention, as this technique has good potential to detect unknown cybersecurity threats. Hence, I was really excited to learn of this internship opportunity, which allowed me to learn various machine learning and big data analysis techniques and apply them to this real-world problem.

This project is mainly based on the following paper: Experience Report: System Log Analysis for Anomaly Detection by Shilin He et al., published in 2016 at the IEEE 27th International Symposium on Software Reliability Engineering.

Log Analysis can be divided into four main steps:

  • Log Collection: getting the raw logs

  • Log parsing: getting log templates from raw logs

  • Feature extraction: getting relevant log sequences that would be further fed to the machine learning algorithm

  • Anomaly Detection: running unsupervised learning algorithms on our extracted features. I intend to use two clustering approaches: K-Means and Hierarchical Agglomerative Clustering.

I used HPCC Systems Roxie Logs, downloadable from ECL Watch.
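The feature-extraction and anomaly-detection steps above can be sketched end to end in plain Python. This is an illustrative sketch under simplifying assumptions (toy event IDs, a tiny deterministic K-Means, and a "small clusters are anomalous" rule), not the project's implementation:

```python
# Sketch of the last two pipeline steps: parsed log sessions become
# event-count vectors, which are clustered with a tiny K-Means; members of
# very small clusters are flagged as anomalies. Event IDs and the
# small-cluster rule are illustrative assumptions.

def count_vector(session, event_ids):
    """Map a parsed log session (list of event IDs) to a count vector."""
    return [session.count(e) for e in event_ids]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def kmeans(points, k, iters=20):
    """Tiny deterministic K-Means: the first k points seed the centres."""
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = mean(members)
    return centres, assign

def small_cluster_members(points, assign, max_size):
    """Flag members of clusters no larger than max_size as anomalous."""
    sizes = {a: assign.count(a) for a in set(assign)}
    return [p for p, a in zip(points, assign) if sizes[a] <= max_size]

# Toy sessions: three similar sessions and one clearly different one.
event_ids = ["E1", "E2", "E3"]
sessions = [
    ["E1", "E2", "E1"], ["E1", "E2"], ["E1", "E1", "E2"],
    ["E3", "E3", "E3", "E3"],  # the odd one out
]
vectors = [count_vector(s, event_ids) for s in sessions]
centres, assign = kmeans(vectors, k=2)
print(small_cluster_members(vectors, assign, max_size=1))  # [[0, 0, 4]]
```

Hierarchical agglomerative clustering would slot into the same pipeline in place of `kmeans`.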

Yash Mishra - Clemson University
Automated provisioning and de-provisioning of HPCC Systems on Amazon Web Services (AWS)

Many commercial cloud resources are available for deploying high performance systems, but managing these resources can be tedious, and it can be expensive when resources run for longer than they are needed. Developers and researchers may also spend considerable time configuring clusters and environments manually. This project looks at how to automate, in a single step, provisioning clusters, executing jobs on those clusters, and deprovisioning the clusters when the jobs have completed.
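The single-step workflow described above can be sketched as a small orchestration skeleton. All function bodies here are hypothetical stand-ins for real AWS calls; the point of the sketch is the shape of the control flow, where a `try/finally` guarantees deprovisioning even if the job fails, so resources never outlive the work:

```python
# Orchestration sketch: provision, run, and always deprovision.
# The three helper bodies are placeholders, not real AWS API calls.

def provision_cluster(size):
    print(f"provisioning {size}-node cluster")
    return {"id": "cluster-1", "size": size}  # stand-in for AWS resources

def run_job(cluster, job):
    print(f"running {job} on {cluster['id']}")
    return "ok"

def deprovision_cluster(cluster):
    print(f"deprovisioning {cluster['id']}")

def run_once(size, job):
    cluster = provision_cluster(size)
    try:
        return run_job(cluster, job)
    finally:
        deprovision_cluster(cluster)  # runs on success *and* on failure

print(run_once(4, "thor-job.ecl"))
```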

Jack Fields - American Heritage School
American Heritage School Autonomous Security Robot

Digital threats and compromised data are growing at epic proportions, while safety concerns on school campuses and other public places are just as pressing. When it comes to safety, especially that of school-aged children, it is a disturbing reality that today they are simply not safe alone, online or outside. So what is the solution? By integrating HPCC Systems into our Autonomous Security Robot, we will be able to ingest data and apply advanced sorting techniques to develop a safety tool that can recognise potential risks on campus that might otherwise be missed by the human eye.

Elimar Rodrigues Da Macena - Federal Institute of Espírito Santo (IFES) - Campus Serra
An Exploratory Analysis of Crime Patterns in Sao Paulo State

Intelligence led policing (ILP) refers to technology-driven crime data analysis activities to support the design of effective crime prevention and prosecution strategies. This is a new approach to fighting crime that has been gaining strength due to the convergence of two technological streams: the digitization and release of public information related to the occurrence of crimes and the development of technological platforms that allow the proper handling of such information, such as the HPCC Systems platform.

In this context, an element that can be exploited through this approach and that can be of great potential value for the public safety of a city is the understanding of the criminal mind or, more specifically, studying the patterns of victims and places of preference of criminals. Based on such knowledge, it could be possible to develop more precise actions for crime prevention and repression, as well as the development of predictive models to estimate the probability of occurrence of crime toward a specific individual profile or geographical location.

Motivated by this context, the objective of this study is to use the HPCC platform to analyze crime patterns in the state of São Paulo in Brazil between the years of 2006 and 2017. For this purpose, a public database was used. The choice of the database of the state of São Paulo is justified by the fact that it is the most populous state in Brazil, with just over 45 million inhabitants, including areas with the highest crime rates in the world. These features provide a rich crime database to be exploited by a high performance computing platform such as HPCC Systems.

The database used has information about the person who was injured by the crime occurrence, having for example information related to the characteristics of a person, such as their age, gender, profession and educational level. The database also contains information regarding the type and the location of the crime.

Based on the type of information available in this database, the analysis conducted in this work focused on the creation of patterns around victim profiles, crime concentration, time of occurrence, as well as seeking trends among crime types.
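The kind of pattern aggregation described above can be illustrated with a short sketch. The record fields are hypothetical stand-ins for columns of the São Paulo database, and the toy rows are invented for illustration:

```python
# Sketch of crime-concentration aggregation: counting occurrences by
# crime type and by hour of day to surface where crime clusters.
# Field names and rows are illustrative, not from the real database.

from collections import Counter

records = [
    {"type": "robbery", "hour": 22}, {"type": "robbery", "hour": 23},
    {"type": "theft", "hour": 14}, {"type": "robbery", "hour": 22},
]

by_type = Counter(r["type"] for r in records)
by_hour = Counter(r["hour"] for r in records)
print(by_type.most_common(1))  # [('robbery', 3)]
print(by_hour.most_common(1))  # [(22, 2)]
```

On HPCC Systems the equivalent aggregation would be an ECL `TABLE`/`COUNT` over the full twelve-year dataset.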

Andre Felipe Santos Martins - Federal Institute of Espírito Santo (IFES) - Campus Serra
Infrastructure Analysis of Elementary Schools in Brazil Using HPCC Systems

The purpose of this project was to analyse the infrastructure statistics of elementary schools in Brazil in urban and rural areas. The goal was to investigate the basic infrastructure available for all students and school dependents such as water supply, electrical network, sewage network, internet access and the availability of ramps, handrails, signage and accessible toilets for people with special needs. The dataset used was public, covering data from 2015-2018.

Huafu Hu - Georgia State University
Build Big Data Processing Infrastructure with HPCC Systems for the Connected Car Industry

Vehicle traffic crashes increased by more than 25 percent from 2010 to 2016, according to national statistics, and the average auto insurance fee rose around 20 percent from 2008 to 2016, according to the 2018 National Association of Insurance Commissioners (NAIC). This project simulates a telematics system and performs a preliminary analysis to evaluate the utility of telematics data in studying driving habits. The results will benefit business applications in vehicle insurance risk assessment and offer initial observations on the value of the large amount of existing data in telematics systems. The study will also contribute to behavior analysis in both academia and industry.

The project consists of three modules: (1) Telematics system simulation, (2) Apache Kafka message system and (3) HPCC Systems analysis system implementation.

The telematics simulation consists of multiple instances on Google Cloud. The system generates more than 1 million trip records for 10,000 vehicles over a period of 180 days, with four to seven trips per day for each vehicle, so all timestamps fall within that 180-day window. The clients on the Google Cloud instances export these real-time data as JSON messages and continuously deliver them to the Kafka message queue.
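The trip generator described above can be sketched as follows. This is a minimal stand-in, with hypothetical message fields and a fixed random seed for reproducibility; the real simulation runs across multiple Google Cloud instances:

```python
# Sketch of the trip simulator: each vehicle produces four to seven trips
# per day, serialised as JSON messages ready to hand to a Kafka producer.
# Message fields are illustrative assumptions.

import json
import random

def simulate_trips(vehicle_id, days, rng):
    """Yield one JSON message per simulated trip for a single vehicle."""
    for day in range(days):
        for trip in range(rng.randint(4, 7)):  # 4-7 trips per day
            yield json.dumps({
                "vehicle_id": vehicle_id,
                "day": day,
                "trip": trip,
                "speed_kmh": round(rng.uniform(20, 120), 1),
            })

rng = random.Random(42)  # fixed seed so the sketch is reproducible
messages = list(simulate_trips("veh-0001", days=2, rng=rng))
print(len(messages))  # between 8 and 14 trips over two days
```

In the full system each message would be published to a Kafka topic instead of collected into a list.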

Apache Kafka is an open-source stream-processing software platform that delivers in-order, persistent, scalable messaging and enables you to build distributed applications. Kafka connects real-time data from the telematics demo to the HPCC Systems platform via Kafka Connect, and also provides Kafka Streams.

HPCC Data Analysis is an ECL program that processes data on the HPCC Systems platform. This demo fetches message data from the Kafka message queue, parses the messages into the required format, cleans the unnecessary