HPCC Systems Community Day 2022
Watch recordings of all presentations made during our Community Day Summit held during October 2022. Find out more about this event and read our blog review of the event.
In addition to these presentations, we also held a Technical Poster Competition for students to present work they have completed on projects leveraging HPCC Systems. See our poster contest wiki to find out more about our 2022 Poster Contest participants and winners.
Resources
Presentation Tracks and Content
Plenary Sessions
The 9th annual HPCC Systems Community Virtual Summit began with keynotes from top industry leaders and technologists, including LexisNexis Risk Solutions speakers Flavio Villanustre, SVP Technology and CISO, and Gavin Halliday, Enterprise/Lead Architect. We also welcomed our community keynote speakers: Adwait Joshi, Chief Seer, DataSeers; and, from our HPCC Systems academic partner RV College of Engineering, Dr G. Shobha, Professor, and Rohan Maheshwari, CSE student.
Welcome and Opening Plenary
Flavio Villanustre, SVP Technology and CISO, and Gavin Halliday, Enterprise/Lead Architect, Adwait Joshi, Chief Seer, DataSeers, and our HPCC Systems academic partners from RV College of Engineering, Dr G. Shobha and Rohan Maheshwari, CSE student.
After Flavio's energizing welcome and reveal of the all new HPCC Systems "electric tech" reboot, he introduces our keynote speakers.
• Adwait shares how his company has a successful business that has seen huge growth in a very short time with the help of HPCC Systems.
• The HPCC Systems Centre of Excellence on Cognitive Intelligent Systems for Sustainable solutions was established by RV College of Engineering in June 2022. Hear from Dr Shobha on how her staff has leveraged HPCC Systems in the classroom for a number of applications in Big Data analysis. Rohan will then share one example of how he used HPCC Systems to investigate block data stored on the blockchain to gain insight into transactions and underlying user behavior to help find criminals utilizing the bitcoin network for illicit activities.
• Gavin wraps up the plenary with an overview of the features that have been added to the platform to improve productivity and help the journey to the cloud, and what changes are on the horizon.
Closing Plenary Sessions
This community keynote session covered how we are partnering with CodeDay on fun and exciting hackathons to attract students and hire talent. It was followed by the announcement of the 2022 community award and poster competition winners before Flavio Villanustre closed out the day’s event. We also welcomed our community keynote speaker Tyler Menezes, Executive Director, CodeDay.
LexisNexis Risk Solutions speakers include Bahar Fardanian, Technology Evangelist, Lorraine Chapman, Manager Business Analyst, Trish McCall, Sr Director Program Management, and Flavio Villanustre, SVP Technology & CISO
Cloud Deployment
Come learn about building and maintaining an optimal Cloud Native HPCC Systems platform:
HPCC Systems Platform Cloud Build and Deployment Pipeline
Xiaoming Wang, Godji Fortil & Michael Gardner, LexisNexis Risk Solutions
With the continued focus on cloud integration for the HPCC Systems platform, our team has made exciting progress in the way we build, develop, and deploy HPCC Systems platform clusters. Join us as we take you through a full development pipeline: from building the GitHub repository directly in Docker containers that are easily accessible to outside developers, to optimizing builds to take full advantage of ML, GNN and GNN+GPU stacks in cloud environments like Azure, and finally to deploying your builds onto your cloud infrastructure using HPCC Systems platform Terraform modules.
Cost savings on Cloud Native Systems Based on ECL Metrics
Shamser Ahmed, LexisNexis Risk Solutions
Finding opportunities for cost savings starts with understanding where the costs are being incurred and the size of these costs. The HPCC Systems cloud-native platform produces cost metrics that will be invaluable for developers looking to maximize cost savings. This presentation will discuss the cost information available through the platform, how it may be used to minimize cost, and the anticipated enhancements to this feature.
Advances in Containerized HPCC Systems Logging
Rodrigo Pastrana, LexisNexis Risk Solutions
Learn about the newly enhanced toolset designed to unlock the power of logging analytics for HPCC Systems containerized environments. Exciting new usability features allow HPCC Systems users and admins to plug-and-play powerful log processing platforms such as Elastic Stack and Azure Log Analytics.
Cloud Delivery
Hear more about delivering your data via ROXIE and Cloud Native:
Investigating ROXIE Queries
Krishna Turlaparthi, LexisNexis Risk Solutions
This talk will discuss ROXIE architecture and how to understand more about ROXIE query performance by looking at various fields in the ROXIE logs and some new stats that have recently been added. An example of collecting and analyzing stats for a completed query will be provided. Some interesting information will be presented about our own experiences with containers and an important ROXIE optimization about graph dependencies and how that affects when activities can start.
ROXIE Migration Tool
Harsh Desai, Sathish Kumar Seenivasan, Rajeev Rajvaidya, Pradheesh Moorthy, LexisNexis Risk Solutions
During a system or process migration on any infrastructure (cloud or bare metal), the most important step is testing. Numerous processes need to be repeated until the correct results are achieved. Testing and repeating these processes manually for a large project is tiresome and error prone. As a result, we have built the ROXIE Migration Tool (RMT) to assist with these tasks. It can be reused across any product and is not restricted to ROXIE migration; in essence, it can scale to any segment of regular ROXIE development. The tool not only reduces effort but also reduces errors, helping to save revenue. With the help of RMT, repeated tasks can be accomplished in a few minutes, helping to improve overall lead time.
Journey to the Cloud – What every ECL Developer Should Know & Common Misconceptions when using HPCC Systems Cloud Native
Bob Foreman & Hugo Watanuki, LexisNexis Risk Solutions
The transition from a traditional bare metal environment to a containerized or cloud based HPCC Systems platform should be transparent. Once the platform is set up, configured, and properly aligned for the types of work being done, it should be virtually the same as a bare metal installation. This could vary significantly depending on the developer’s current role in the product life cycle. This talk covers what every ECL developer should know to date, answers the common questions we have received from the community, and highlights some issues that you might have overlooked, like monitoring and cost control.
Developer's Corner
Talks will focus on ECL and Tips and Tricks using the HPCC Systems platform:
ECL Source Code Control with Git
Greg Panagiotatos, LexisNexis Risk Solutions
For many years, HPCC Systems has used the eclccserver to compile ECL code from GitHub repositories, using the Githook mechanism. The latest version of the platform contains enhancements to improve the capabilities and make it easy to use GitHub. The new multiple repository feature allows teams to develop code in their own independent repositories but have versioned dependencies between those repositories. Come learn how to use git with HPCC Systems and see the recent changes to the platform that significantly improve support for compiling from the Git repositories.
ECL Notebooks
Jim De Fabia, LexisNexis Risk Solutions
HPCC Systems now supports ECL Notebooks in VS-Code with the ECL Language Extension. ECL Notebooks are useful for interactive tutorials, demos, proofs-of-concept, and visualizations. ECL Notebooks allow you to create and share documents that contain narrative text and cells with “live” ECL code that other users can edit and run. ECL Notebooks are like Jupyter Notebooks, but they support ECL, and render and run inside of VS-Code (with the ECL extension). This presentation will demonstrate how to create and use ECL Notebooks.
ECLS Scanner Tool
Rahul Jain, LexisNexis Risk Solutions
What is the ECLS tool? The ECL Scanner tool is currently in Beta version, but it is more than just a working model. It’s about an idea with some work in progress already. The tool is developed in C# .NET and connects to ECL Git repositories to scan .ecl files for unwanted imports. It is ultimately an .exe file and hence can be installed on any local system. The goal of this presentation is to leverage this idea and current working methodology to help the HPCC Systems ECL developer community save on code clean up time.
Beyond HPCC Systems
Presenters will share and discuss tools and techniques that extend the power of HPCC Systems:
Game Changer: HPCC Systems to Power BI Automated Connectivity / Interfacing MongoDB into ECL
Lee Saunders, LexisNexis Risk Solutions / Jack Del Vecchio, Miami University
This session includes two great talks on how you can use HPCC Systems with different data sources. First up, hear from Lee Saunders on how his team is using Power BI. Then Jack Del Vecchio will talk about his summer internship project interfacing MongoDB with ECL. A session not to miss! Excerpt from Lee: Are you currently manually de-spraying CSV files to keep your Power BI reports up-to-date? Once upon a time our team was doing just that, spending 5 days a month updating reports; now it takes minutes. In this talk we will show you:
• A process to set up and publish ECL queries to generate a REST URL (JSON output).
• How to link your Power BI reports for direct data refresh.
• How gateways can allow a scheduled refresh of this data, removing the need for any manual intervention in report updates.
Tombolo and Real BI
Dan Camper & Yadhap Dahal, LexisNexis Risk Solutions
Tombolo is a data lake curation system, designed to work closely with the HPCC Systems platform. Tombolo provides tracking, analysis, and documentation for all resources in your data lake. Real BI is a dashboarding/visualization tool built to support both data and queries living in an HPCC Systems cluster. In this talk, Dan Camper will focus on recent developments in these tools, both created and maintained by the HPCC Systems Solutions Lab group. Yadhap Dahal will close the session with a deep dive demo on Tombolo.
Processing Radiology Reports with NLP
David de Hilster, LexisNexis Risk Solutions, Dr. Amy Apon, Ashton Williamson & Dhruvisha Patel, Clemson University
Radiology reports consist of observations of medical images by medical clinicians. In order to properly understand these reports, computers have to combine their knowledge of the human body with linguistic knowledge about how clinicians describe images of the body in order to understand the state of the body parts in the image. The ultimate goal is to standardize radiology descriptions so computers can detect what is normal and what is not from the medical dictations.
Machine Learning
Hear from our ML experts on the latest machine learning libraries and algorithms available in HPCC Systems:
Analysis on Medical Images / Implementation of Local Outlier Factor Algorithm for Anomaly Detection in HPCC Systems
Sarvesh Prabhu, Lambert High School / Arya Adesh, RVCE
If you are a machine learning fanatic, you won't want to miss this session including two very interesting talks from Sarvesh Prabhu and Arya Adesh who interned with HPCC Systems this summer.
PyHPCC
Amila De Silva, LexisNexis Risk Solutions
This talk will highlight the functionality and uses of PyHPCC, a Python package that enables users to interact with HPCC Systems. PyHPCC is relevant for technologists working with HPCC Systems and offers a convenient approach for beginners and community members when interacting with HPCC Systems. They can temporarily bypass the complexities of learning a new language like ECL and still work on HPCC Systems. Diversifying the capabilities of HPCC Systems using wrappers like PyHPCC will further promote the use and ease of use of HPCC Systems within RSG and externally.
Machine Learning and Analytics Update / Visual Analysis of Data Relationships
Roger Dev, LexisNexis Risk Solutions
In this talk I will discuss our latest developments in the Machine Learning and Analytics Library. Specifically:
• Gaussian Process Regression – A non-linear regression bundle, accelerated using Random Fourier Features. This is a flexible kernel-based regression method, with enhanced scalability.
• HPCC Systems Causality Toolkit – Our official release of the Causality Toolkit. Provides synthetic data generation and powerful probability analysis, as well as causal model analysis and causal inference.
Then, I will move into visual analysis of data relationships. The Joint Probability Space of a multi-variate dataset is a remarkably complicated object, encompassing everything that is knowable from the data alone. We use the powerful analytic and visualization capabilities of the HPCC Systems Causality Toolkit to examine the nature of probabilistic and causal relationships within datasets. We start by examining synthetic data with known relationships in order to recognize essential patterns. Then we move on to natural datasets to see how those patterns can be recognized, and how they differ from those observed in synthetic data.
ECL Workshops
A three-hour workshop, ECL Alive! - a Deep-Dive into Definitive HPCC Systems (Vol II), is for HPCC Systems Summit attendees who want to expand their knowledge of the HPCC Systems platform and ECL in three different phases. The workshop will take students through ECL expert Richard Taylor’s upcoming new book: Definitive HPCC Systems (Vol II): Data Transformation and Delivery
Definitive HPCC Systems (Vol II): Data Transformation and Delivery
Bob Foreman, Software Engineering Lead, LexisNexis Risk Solutions, and Hugo Watanuki, Senior Software Engineer, LexisNexis Risk Solutions
Definitive HPCC Systems (Vol II): Data Transformation and Delivery - This volume teaches the fundamentals of the ECL programming language used by the HPCC Systems Platform, allowing the reader to quickly get up to speed. ECL is a very terse and expressive language, specifically designed for working with huge amounts of data. Data ingest, transformation, manipulation, and delivery are the key topics along with code examples of specific ECL techniques to accomplish many relatively common tasks. Code examples and lesson materials will be included. Attendees can attend any one-hour session or the whole workshop. All three sessions will be recorded for playback.
Part 1 – ETL with ECL
Data Ingest, Profiling, Hygiene, Standardization, Exporting
Part 2 – Data Delivery with ROXIE
Distilling the Product, Indexing and Publishing
Part 3 – The ECL Cookbook
File Tips and Tricks, INDEX Tips, String and Date Tricks, and more!