Arya Adesh is studying for a Bachelor's degree in Computer Science and Engineering at RVCE in India. Arya suggested this project idea himself, submitting a proposal to carry out research in an area of interest to him. His proposal was accepted and he joined the 2022 HPCC Systems Intern Program to complete this research, contributing a new anomaly detection algorithm to the HPCC Systems Machine Learning Library as one of the deliverables.

As well as the resources included here, read Arya's intern blog journal, which includes a more in-depth look at his work.

Poster Abstract

Anomaly detection is the process of finding unexpected abnormalities in a dataset. Anomalies are rare occurrences that differ from the norm. An anomaly in a real-time dataset may indicate a critical incident such as bank fraud, data compromise, or infrastructure failure, so it is important to identify such anomalies for further action. Local Outlier Factor (LOF) is an unsupervised, density-based anomaly detection method that identifies anomalies without training. It assigns a degree of outlier-ness (called the local outlier factor) to each point in the dataset. LOF can find both global outliers (points that are outlying with respect to all the points in the dataset) and local outliers (points that are outlying with respect to their neighbors). Many other anomaly detection algorithms find global anomalies accurately, but they fail to identify local outliers because they assume the dataset has a uniform distribution. LOF is well suited to unevenly distributed datasets because it makes no assumptions about the distribution. Outliers are determined by comparing the density around each data point with the density around its neighboring points.
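
To make the density comparison in the abstract concrete, here is a minimal brute-force sketch of LOF in plain Python. It is an illustration only, not the implementation contributed to the HPCC Systems Machine Learning Library. The function name local_outlier_factors, the neighborhood size k, and the sample data are assumptions chosen for the example. As a simplification, each point uses exactly k neighbors even when several points tie at the k-th distance.

```python
import math


def local_outlier_factors(points, k):
    """Return one LOF score per point (brute force, O(n^2); illustrative only)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point (itself excluded) and its k-distance.
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_distance[j], dist[i][j]).
    lrd = []
    for i in range(n):
        total = sum(max(k_distance[j], dist[i][j]) for j in neighbors[i])
        lrd.append(k / total if total > 0 else math.inf)

    # LOF: average neighbor density divided by the point's own density.
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # duplicate-heavy neighborhood: treat as an inlier
        else:
            scores.append(sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]))
    return scores


# Hypothetical data: a dense cluster, a sparse cluster, and one point that
# sits just outside the dense cluster (a local outlier).
dense = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
sparse = [(10 + 2 * x, 10 + 2 * y) for x in range(5) for y in range(5)]
points = dense + sparse + [(1.0, 1.0)]

scores = local_outlier_factors(points, k=5)
most_anomalous = max(range(len(points)), key=lambda i: scores[i])
print(points[most_anomalous], round(scores[most_anomalous], 2))
```

In this sample, the point at (1.0, 1.0) is closer to its nearest neighbor than the points in the sparse cluster are to theirs, so a single global distance threshold would not separate it from them. LOF still gives it the highest score, because its neighbors in the dense cluster sit in a much denser region than it does. Points inside either cluster score close to 1.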

...