...
Find out about the HPCC Systems Summer Internship Program.
Project Description
Nature has left us with some very subtle signals within data that indicate the causal relationships that generated that data. These signals can be hard to detect and are prone to statistical error. While we do have mechanisms to construct causal models from data, at this point they tend to be brittle, expensive, and unreliable. HPCC Systems can help overcome the challenge of identifying causal signals in data by allowing far more data to be processed efficiently by complex causal discovery algorithms.
A wide variety of causal discovery algorithms have been described and implemented to date. This project will evaluate the available algorithms against real-world datasets of mixed data types, using open-source implementations. Algorithms will be evaluated for power, practicality, and applicability to different data types.
The work involves identifying candidate datasets, defining appropriate analytics, performing causal analysis, and publishing results. The student will design tests, perform them, and document the results, comparing various in-house and publicly available algorithms. Assessment of algorithms will be both qualitative and quantitative, and will include run-time performance as well as accuracy.
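As an illustration of what the quantitative side of such an assessment might look like, here is a minimal Python sketch. It assumes that each algorithm can be wrapped as a function returning a set of directed edges, and that a reference graph is available for comparison (which may not hold for every real-world dataset). The names `score_edges` and `timed_run` are hypothetical and do not come from any HPCC Systems or third-party package.

```python
import time
from typing import Callable, Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect); hypothetical representation


def score_edges(true_edges: Iterable[Edge], found_edges: Iterable[Edge]) -> Dict[str, float]:
    """Compare discovered directed edges against a reference graph."""
    true_set, found_set = set(true_edges), set(found_edges)
    hits = len(true_set & found_set)  # edges with correct direction
    precision = hits / len(found_set) if found_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def timed_run(algorithm: Callable[[object], Iterable[Edge]], data: object) -> Tuple[Set[Edge], float]:
    """Run one discovery algorithm and return its edges and wall-clock seconds."""
    start = time.perf_counter()
    edges = set(algorithm(data))
    elapsed = time.perf_counter() - start
    return edges, elapsed


if __name__ == "__main__":
    # Toy reference graph and a stand-in "algorithm" (assumptions for illustration only).
    reference = {("A", "B"), ("B", "C"), ("A", "C")}
    stand_in = lambda _data: {("A", "B"), ("C", "B")}  # one correct edge, one reversed

    edges, seconds = timed_run(stand_in, None)
    print(score_edges(reference, edges), f"{seconds:.6f}s")
```

A reversed edge counts as a miss here, which is a deliberately strict choice; an evaluation could instead score the undirected skeleton separately from edge orientation.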
The successful candidate should have a background in mathematics and statistics and in machine learning, and preferably knowledge of causal science, causal algorithms, and causal analysis packages.
...