Sarah Nash - 2023 Poster Contest Resources

Sarah is a Master's student entering her second year of studying Applied Data Science at New College of Florida.

Poster Abstract

Causality is a growing field centered on detecting cause-and-effect relationships within observational data. It is a common saying that "correlation does not imply causation." However, correlation does imply causation, perhaps in a different way than we expected and between different variables than the ones we were originally observing. Causal discovery and validation methods focus on uncovering the causal relationships within large datasets, determining where cause-and-effect relationships appear between the variables. When a causal relationship generates our data, it leaves behind subtle hints marking its existence. By tuning in to these signals, we can discover causal relations from the data with the help of a number of different causal discovery methods. The HPCC Systems Causality Framework "Because" is a toolkit for multiple areas of causal analysis, including discovery and validation. The discovery algorithms previously implemented in the toolkit are mainly compatible with two data types: continuous numeric and discrete numeric. This project's focus is to expand the discovery portion of the toolkit to also handle the remaining data type: categorical data.

This task had multiple parts:

  • Creating a framework for generating categorical data

  • Implementing a particular model, the Uniform Channel Model, within the Causality toolkit

  • Testing with data generated from the Synthetic Data Generation subpackage of the Causality toolkit

  • Testing with real CDC data
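As an illustration only, and not the toolkit's actual interface, generating categorical data from a known cause-effect relationship can be sketched in Python. A cause X is drawn from a categorical distribution and an effect Y is drawn from a conditional probability table P(Y | X). The variable names, categories, and probabilities below are hypothetical. The sketch also estimates the mutual information between X and Y, one of the "hints" a causal relationship leaves behind, and compares it against a shuffled copy in which the dependence has been destroyed.

```python
import math
import random
from collections import Counter

# Hypothetical model X -> Y: X has three levels, Y has two.
P_X = {"a": 0.5, "b": 0.3, "c": 0.2}
P_Y_GIVEN_X = {
    "a": {"low": 0.8, "high": 0.2},
    "b": {"low": 0.5, "high": 0.5},
    "c": {"low": 0.1, "high": 0.9},
}

def draw(dist, rng):
    """Draw one category from a {category: probability} mapping."""
    categories = list(dist)
    weights = [dist[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]

def generate(n, seed=0):
    """Generate n (x, y) samples: X first, then Y from P(Y | X)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = draw(P_X, rng)
        y = draw(P_Y_GIVEN_X[x], rng)
        samples.append((x, y))
    return samples

def mutual_information(samples):
    """Plug-in estimate of I(X; Y) in bits from paired categorical samples."""
    n = len(samples)
    joint = Counter(samples)
    px = Counter(x for x, _ in samples)
    py = Counter(y for _, y in samples)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

data = generate(10_000)

# Shuffling Y independently of X breaks the causal link.
ys = [y for _, y in data]
random.Random(1).shuffle(ys)
shuffled = list(zip((x for x, _ in data), ys))

print(f"I(X; Y), generated data: {mutual_information(data):.3f} bits")
print(f"I(X; Y), shuffled data:  {mutual_information(shuffled):.3f} bits")
```

Mutual information only shows that X and Y are dependent; it is symmetric, so it cannot say whether X causes Y or Y causes X. Identifying the direction requires additional modeling assumptions, which is the role of a categorical discovery method such as the one implemented in this project.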

In all, we were able to determine strengths and weaknesses of this particular model through various tests, as well as areas for improvement within the Causality toolkit. By the end of this project, we had created a foundation for generating and testing categorical or binary data, expanded the toolkit's causal analysis capabilities, and made room for additional categorical discovery methods to be added in the future.

Presentation

In this Video Recording, Sarah provides a tour and explanation of her poster content.

Causal Discovery and Validation with Categorical Data

