Kennesaw State University Hackathon 2018

HPCC Systems is a sponsor of this event as part of our academic program. It is hosted by the College of Computing and Software Engineering at KSU.

We also took part in this event in 2017. For more information about how it went, read our blog post: Fly on the wall - Our first Hackathon.

Date of the event

September 12 - 15, 2018

Location

Marietta Campus, J/Atrium Building

Cost

Free

Eligibility

This event is open to all KSU CCSE students (ACS, BASIT, IT, CS, SWE, and CGDD) who have passed their first few programming courses. Graduate students who have exempted all transitional courses or have passed at least three 5000 transition courses are also eligible.

Registration 

More information about the event

HPCC Systems Hackathon Team

Big Data Analytics on HPCC Systems

Are you interested in how analytics can identify trends which help businesses from a wide range of markets improve their decision making?

Join our team and learn how to use the open source HPCC Systems platform and ECL-ML machine learning libraries to build predictive models in the financial industry.

Challenge

You will be presented with two problems and can choose either one. Both are real-world problems, and you will be working with a real dataset.

Problem 1

You have been given a dataset of users on a banking platform. This dataset contains about 100 "bad actors", a.k.a. individuals on an OFAC sanctions list. Given the entire dataset and a full OFAC list, you will have to write an algorithm to detect the bad actors in the main dataset.
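
As a starting point for thinking about Problem 1, one possible approach is fuzzy name matching: normalize each name, then compare every user against every entry on the sanctions list. The sketch below is only an illustration in Python using the standard library; hackathon solutions must run on the HPCC Systems platform, so the same idea would need to be expressed in ECL. The function names, the 0.9 threshold, and the sample names are all made up for this example and are not taken from the hackathon data.

```python
import difflib
import re


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and sort the tokens
    so that "Doe, Jane" and "Jane Doe" produce the same key."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(sorted(cleaned.split()))


def find_sanctioned(users, sanctions, threshold=0.9):
    """Return (user, best_matching_listed_name, score) for every user whose
    normalized name is at least `threshold` similar to a listed name.
    The threshold is an arbitrary assumption and would need tuning."""
    normalized_list = [(s, normalize_name(s)) for s in sanctions]
    hits = []
    for user in users:
        key = normalize_name(user)
        best_name, best_score = None, 0.0
        for original, norm in normalized_list:
            score = difflib.SequenceMatcher(None, key, norm).ratio()
            if score > best_score:
                best_name, best_score = original, score
        if best_score >= threshold:
            hits.append((user, best_name, round(best_score, 2)))
    return hits


# Made-up sample data, for illustration only.
users = ["John Smith", "Doe, Jane", "Alice Brown"]
sanctions = ["Jane Doe", "Jon Smyth"]
print(find_sanctioned(users, sanctions))
```

Comparing every user with every listed name is quadratic, so on a large dataset you would likely want to narrow the candidates first (for example by a shared token or a phonetic key) and to use other fields such as date of birth or country to confirm a match.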

Problem 2

Merchant name cleaning and grouping is a fairly common challenge in the payments industry. You have been given a transactional dataset containing transactions performed at various merchants. The merchant names in this dataset are formatted inconsistently, for example ‘walmart’, ‘wal-mart’ and ‘wal mart’, even though they all refer to the same merchant. Using a combination of data cleaning techniques and a machine learning algorithm, you will be required to cluster the merchant names together in the best possible way. To make your machine learning algorithm more effective, you could use the other fields provided besides the merchant name.
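
To make the cleaning-then-grouping idea concrete, here is a minimal sketch in Python using only the standard library. It is not a machine learning solution and not the expected answer: it strips punctuation and spaces, then greedily groups names whose cleaned forms are similar. Hackathon solutions must run on the HPCC Systems platform, so this only illustrates the idea. The function names, the 0.85 threshold, and the sample names are assumptions made for this example.

```python
import difflib
import re


def clean_merchant(name):
    """Lowercase the name and keep only letters and digits,
    so 'wal-mart' and 'Wal Mart' both become 'walmart'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_merchants(names, threshold=0.85):
    """Greedy single-pass grouping: attach each name to the first existing
    group whose representative key is similar enough, else start a new group.
    The threshold is an arbitrary assumption and would need tuning."""
    groups = {}  # cleaned representative key -> list of raw names
    for raw in names:
        key = clean_merchant(raw)
        for rep in groups:
            if difflib.SequenceMatcher(None, key, rep).ratio() >= threshold:
                groups[rep].append(raw)
                break
        else:
            # No existing group was close enough.
            groups[key] = [raw]
    return groups


# Made-up sample data, for illustration only.
names = ["walmart", "wal-mart", "Wal Mart", "wallmart", "target"]
print(group_merchants(names))
```

A greedy pass like this depends on the order of the input and on the chosen threshold, which is one reason a proper clustering algorithm, combined with the other transaction fields, is likely to do better.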



Data Source

TBA

General Instructions

  1. Please follow all the guidelines specified by the data provider.

  2. Provide a brief design/tech document describing the proposed solution (2-3 pages max).

  3. Provide the bio of all participants and their roles on the project.

  4. Use the HPCC Systems platform for executing the solution.

Rating Criteria

  1. Innovation involved

  2. Understanding of Big Data patterns and its application to HPCC Systems.

  3. Team work and execution

  4. Presentation quality

Mentors available during the Hackathon

The following mentors will be on site for the duration of the event: Dan Camper, Arjuna Chala and Richard Taylor, as well as our community users from DataSeers, Adwait Joshi and Gurjot Bandasha.

The following mentors will be available to give assistance remotely: Roger Dev.

More information about our mentors and how to contact them

Slack Channel

https://join.slack.com/t/ksuccsehackathon/shared_invite/enQtNDI4MzUwMjExNzk5LTY5ODQ0ZTNjNzliZmMzYmVjZGM3ZGE3MTNkZWFkZWVkZDNhZTE1ODUyZDkxNDg3ZWM0MjdkM2NmOGNjZWMwNmM

What can I do to prepare?

We have many resources available for you to use to familiarise yourself with our technology and how to use it. If you are new to us, then take some time to find out about HPCC Systems and what we do. Find out how HPCC Systems works, what ECL is and take a look at what goes on in our community.

  • Watch a quick overview video about HPCC Systems

  • Download the HPCC Systems VM. Select the operating system you are using first and then check the VM download. Follow the installation guide instructions.

    • Note:  We will be using a cloud-based HPCC Systems cluster for the hackathon.

  • You can use your preferred editor to write code but we do have our own, a Windows-based ECL IDE which you can download. On the download page, select Gold and under Operating System, select Windows. Download both the ECL IDE and Client Tools.

  • VS Code is a good code editor if you don't use Windows.  Installation is slightly more complicated:

    1. Download and install VS Code from here if you don't already have it installed.

    2. Download the HPCC Systems Client Tools from here.

      1. Choose your operating system from the popup list.

      2. Choose the appropriate "Client Tools" option for your operating system.  Make sure only one checkbox on the entire page is selected.

      3. Download and install.

    3. Launch VS Code, then search for and install the extension named "ECL (Enterprise Control Language) support for Visual Studio Code".

  • Once you’re up and running, try out a few examples from the installation guide and tutorials.

  • Learn some ECL. This is the language used to write queries. It's easy to use, so try it for yourself. Read the documentation or take a training course.

  • Take a look at some video tutorials

  • Take a look at the information and training examples in this GitHub repository. In particular, please look at the Taxi_Tutorial where you will find the DataSeers contribution which provides examples showing some basic ECL functionality in action.

  • Take a look at our Machine Learning Documentation and Sources.

I have questions, who do I ask?
