Causal Discovery Algorithms

This project was completed by a student accepted on to the 2023 HPCC Systems Intern Program.

Project Description

Nature has left us with some very subtle signals within data to indicate the causal relationships that generated that data. These signals can be hard to detect and are prone to statistical error. While we do have mechanisms to construct causal models from data, at this point they tend to be brittle, expensive, and unreliable. This makes the HPCC Systems Platform a natural environment for causality research and application: far more data can be processed, and causal discovery algorithms parallelize well, leading to much faster causal analysis results.

A wide variety of causal discovery algorithms have been described and implemented to date, but they have yet to be thoroughly evaluated in HPCC Systems. This project will evaluate the available algorithms against real-world datasets of mixed data types, using open-source implementations. Algorithms will be evaluated for power, practicality, and applicability to different data types.

The work involves identifying candidate datasets, defining appropriate analytics, performing causal analysis, and publishing results. The student will design and perform tests, then document the results in a comparison of the various algorithms.

The successful candidate should have a background in mathematics, statistics, and machine learning, and preferably knowledge of causal science, causal algorithms, and causal analysis packages.

If you are interested in this project, please contact the mentor shown below.

More information about the HPCC Systems Causality Toolkit is available in our blog post, Causality Toolkit.

 
