Watch recordings of all presentations made during our Community Day Summit, held in October 2021. Find out more about this event and read our blog review.
In addition to these presentations, we also held a Technical Poster Competition for students to present work they have completed on projects leveraging HPCC Systems. See our poster contest wiki to find out more about our 2021 Poster Contest participants and winners.
Plenary Sessions
The 8th annual HPCC Systems Community Virtual Summit began with keynotes from top industry leaders and technologists at organizations including Microsoft, BitPay and DataSeers:
Community Recognition & Poster Awards Ceremony
Trish McCall, Director Program Management & Lorraine Chapman, Consulting Business Analyst and HPCC Systems Intern Program Manager, LexisNexis Risk Solutions Group
Join us as we announce the recipients of the 2021 HPCC Systems Community Recognition and David Kan Ambassador Awards. Winners of the 2021 Poster Competition will also be unveiled.
Academia and Industry – BFFs (Best Friends Forever)
Moderator: Bahar Fardanian, Technology Evangelist, LexisNexis Risk Solutions Group
Guest Panelists:
Dawn Tatum, Director of CCSE Partnerships and Engagements, Kennesaw State University
Burcin Bozkaya, Director, Graduate Program in Data Science, New College of Florida
Geoffrey Machin, Metadata and Information Architect, Cirium
Jesse Shaw, Principal Data Scientist II, LexisNexis Risk Solutions
Industry and Academia have a long and prosperous partnership history, providing mutual benefit through project collaboration, real-world opportunities for students, and preparing the next generation of professionals for the workforce. To thrive, this partnership needs work from both sides: mutual respect, equal contribution, and alignment on goals and outcomes. This discussion focuses on the opportunities and challenges of this partnership. Our guest panel features industry and academic experts talking through their experience in establishing value, best practices, pitfalls, and methods to ensure a successful, long-lived relationship. This panel discussion is moderated by Bahar Fardanian, Technology Evangelist, LexisNexis Risk Solutions Group, who works closely with our community partners.
Wrap-up & Adjourn
Flavio Villanustre, LexisNexis Risk Solutions Group
Flavio closes the exciting day with a wrap-up and thank you to our Community.
Learn about the new features and enhancements in the latest HPCC Systems platform, including cloud native topics:
ROXIE Troubleshooting
Mark Kelly, LexisNexis Risk Solutions Group
ROXIE services on the cloud and troubleshooting: what changes are needed in the ROXIE code to run on the cloud native platform?
Hear from our ML experts on the latest machine learning libraries and algorithms available in HPCC Systems:
New Advancements to Logistic Regression and the ML Library
Lili Xu, LexisNexis Risk Solutions Group
Logistic Regression is one of the most important analytic tools in the social and natural sciences, with applications such as natural language processing and image recognition. One of our Machine Learning advancements is to renovate the current HPCC Systems Logistic Regression bundle and add the ability to handle both binary and multi-class prediction tasks. Another advancement is to improve the performance and remove the bottlenecks of the Preprocessing bundle. The improved version is more scalable and more efficient for Big Data preprocessing tasks.
The Causality Analytics Toolkit for HPCC Systems
Roger Dev, LexisNexis Risk Solutions Group
Causal Reasoning is at the heart of most human thought and action, yet has only recently been formalized as a mathematical and scientific field of study. It is hard to conceive of achieving a true AI without such a capability. Although the science of Causality has not advanced to the threshold of AI, it can unlock capabilities that are beyond the realm of statistical observation. Current Machine Learning methods assess observational patterns, and learn to replicate the results of patterns previously detected. They make no effort to disentangle true causal effects from observed correlation. They lack the ability to respond to changes in the scenarios that generated the data, or to predict the effect of new actions on the outcome. Causal Science provides a path toward a deeper understanding of our data. It defines mechanisms that can separate causal influences from spurious correlation and infer causal effects from observational data. As these techniques evolve, they stand to revolutionize our understanding and uses of data. Causality 2021 is an HPCC Systems research and development program. The goal is to increase our understanding of the latest causal algorithms, assess and challenge the current state of the art, and develop a Causality Toolkit for the HPCC Systems platform. This project encompasses all three levels of the “Ladder of Causality”: “Seeing”, “Doing”, and “Imagining”, as well as Causal Model Validation and Causal Discovery. This project includes work from three interns who joined the HPCC Systems Intern Program in 2021.
The Forecast of COVID-19 Spread Risk at The County Level
Murtadha Hssayeni, Florida Atlantic University
The early detection of the coronavirus disease 2019 (COVID-19) outbreak is important to save people's lives and restart the economy quickly and safely. People's social behavior, reflected in their mobility data, plays a major role in spreading the disease. Therefore, we used the daily mobility data aggregated at the county level alongside COVID-19 statistics and demographic information for short-term forecasting of COVID-19 outbreaks in the United States. The daily data are fed to a deep learning model based on Long Short-Term Memory (LSTM) to predict the accumulated number of COVID-19 cases in the next two weeks. A significant average correlation was achieved (r=0.83 (p=0.005)) between the model-predicted and actual accumulated cases in the interval from August 1, 2020 until January 22, 2021. The model predictions had r > 0.7 for 87% of the counties across the United States. A lower correlation was reported for the counties with total cases of <1,000 during the test interval. The average mean absolute error (MAE) was 605.4 and decreased with a decrease in the total number of cases during the testing interval. The model was able to capture the effect of government responses on COVID-19 cases. It was also able to capture the effect of age demographics on the COVID-19 spread. It showed that the average daily cases decreased with a decrease in the retiree percentage and increased with an increase in the young percentage. Lessons learned from this study can help not only with managing the COVID-19 pandemic but also with the early and effective management of possible future pandemics. The project used the HPCC Systems platform for collecting, hosting, and analyzing the data.
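For readers unfamiliar with the two evaluation metrics quoted in this abstract, the following is a minimal Python sketch of how a Pearson correlation (r) and a mean absolute error (MAE) between predicted and actual accumulated cases can be computed. The function names and the five sample values are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical accumulated case counts for one county (illustrative only).
actual = [100.0, 150.0, 230.0, 310.0, 420.0]
predicted = [110.0, 140.0, 250.0, 300.0, 400.0]

r = pearson_r(predicted, actual)
mae = mean_absolute_error(predicted, actual)
print(f"r = {r:.3f}, MAE = {mae:.1f}")
```

In a county-level evaluation such as the one described, metrics like these would be computed per county and then averaged. Because MAE is measured in cases, it naturally shrinks for counties with fewer total cases, which is consistent with the trend reported in the abstract.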
Learn about efficient and secure ways of handling your data and analytics, as well as cool tools and extensions:
Design Considerations for Migrating Your HPCC Systems Data Lake to the Cloud
Krishna Turlapathi & Michael Gardner, LexisNexis Risk Solutions Group
During this session, we share lessons learned and design best practices from our own cloud migration experience. The presentation begins with a simple installation of our cluster on Azure using the community Helm charts. During this demo we cover topics such as how the HPCC Systems platform differs between the Kubernetes cluster that we are deploying and the bare metal installations that community members are familiar with. We dive into Helm for HPCC Systems, the value of .yaml files, and a few different ways that the cluster can be configured, and explain how storage in the cloud compares to bare metal. Then learn about ROXIE and Thor usage in the cloud. Krishna covers some details about getting query lists, suspended queries, and doing package file deployments. Michael expands on basic security features that end users will want to enable in the cloud, including encryption in transit and at rest in a cloud environment such as Azure.
Terasort with HPCC Systems on Azure Kubernetes Service and High Performance Storage
Shrikrishna Khose & Steve Griffith, Microsoft
The speakers discuss challenges, AKS considerations and storage options, including a demo covering the setup and configuration of HPCC Systems on AKS with Blob NFS 3.0 and performing a Terasort.
Taming the Data Demon with the DataSeers HPCC Systems Appliance
Gurjot Bandasha & Adwait Joshi, DataSeers
The core of any data solution lies in data management. What is needed is a solution that will integrate and coordinate compliance, reconciliation, fraud monitoring, and visualization. Hear from the DataSeers experts how they are helping companies in the FinTech and Banking industry to manage money, fight fraud and maintain compliance using a solution built from the ground up leveraging HPCC Systems.
Hear success stories on how HPCC Systems is being used in the industry and academia in proven solutions:
Cooperative actions between University of São Paulo and LexisNexis Risk Solutions
Renato de Oliveira Moraes, University of São Paulo
Prof. Renato discusses the successful joint initiatives under way between the University of São Paulo (USP) and LexisNexis Risk Solutions in Brazil to leverage HPCC Systems for teaching and learning, research, and extension activities in academia, including recent machine learning projects.
Processing Student Image Data with Kubernetes and HPCC Systems GNN on the Cloud
Carina Wang, American Heritage School and HPCC Systems Intern 2021
In order to foster a safe learning environment, measures to bolster campus security have emerged as a top priority around the world. In this session, I will share how HPCC Systems was leveraged to process student images with Kubernetes running on the Cloud Native Platform while utilizing the Generalized Neural Network (GNN) bundle for image classification. The result is a trained model which can be implemented on the autonomous security robot we built to help campus security personnel identify visitors, students, and staff.
Athlete 360: Leveraging HPCC Systems and RealBI for Athlete Wellness and Performance
Christopher Connelly, North Carolina State University and HPCC Systems Intern 2021
A lot goes into an athlete being able to perform at their best when it matters most. Beyond the physical demands, factors from outside their sport also affect their wellbeing and readiness to perform. In team sports, there are many external variables that cannot be controlled, which makes the process of gauging the performance of individual athletes difficult. The better the understanding of what an athlete does and how their body responds, the better we can support them to be at their best. Within collegiate athletics, and sports in general, there is a struggle to interpret data from different streams together in a single report. Streamlined data collection can further aid our understanding of what an athlete does and how their body responds. This involves data from all aspects of an athlete’s day including wellness questionnaires, practice training loads, weight room training loads, and weight room assessments of strength, power, and fatigue. In the past we have shown the impact of using HPCC Systems with the NC State Men’s soccer team. Here you will see some solutions using HPCC Systems and RealBI to provide insight from data collected with the NC State Women's basketball team, as well as how this system can serve not only the Strength and Conditioning department, but the athletics department as a whole.
Our technical engineers explain how to complete specific tasks for configuring and using your HPCC Systems platform:
A Simple HPCC Systems Cloud Deployment for Open-Source Users
Xiaoming Wang & Godson Fortil, LexisNexis Risk Solutions Group
This talk will cover how to set up and implement a basic HPCC Systems cluster in the cloud using Azure Kubernetes. We will walk through the deployment configuration leveraging Terraform, GitOps/Flux2 and storage settings. Note: This talk is intended for the wider open source HPCC Systems community. It is advised to check with your organization for any specific security protocols.
All About the HPCC Systems Metrics Framework
Ken Rowland, LexisNexis Risk Solutions Group
This presentation is for anyone interested in HPCC Systems metrics. It covers a description of the metrics framework and how its components operate, a brief explanation of how HPCC Systems components are instrumented for metric collection, configuration using helm charts, and a discussion of how HPCC Systems is planning on using metrics in areas of cluster health and scaling.
HPCC Systems Logging in the Cloud and an Elastic Stack Solution
Greg Panagiotatos & Rodrigo Pastrana, LexisNexis Risk Solutions Group
As HPCC Systems continues its journey to the cloud, one major challenge faced is the ephemeral nature of log data and the accessibility of distributed application-level logs. This presentation discusses these challenges, the HPCC Systems logging architecture, and a simple Elastic Stack-based solution to the challenge. We'll demonstrate in detail the end-to-end solution, which includes Helm-based deployment, Kibana configuration, HPCC Systems log exploration, querying, and filtering. We'll also discuss an advanced topic that improves log data query performance by utilizing Elasticsearch Ingest Pipelines. Finally, we'll touch on other possible solutions such as Azure Log Analytics.
A Customized HPCC Systems Cloud Deployment for Your Company
Jon Burger, LexisNexis Risk Solutions Group
Jon discusses the approach and methods used in a custom Azure cloud deployment, based on our own company experience. This includes how to perform robust deployments of HPCC Systems environments on the cloud, with a focus on security, maintainability and supportability, and using the precepts of zero touch and zero trust.