This project was completed as an internship with HPCC Systems in 2016.

Curious about projects we are offering for future internships? Take a look at our Ideas List.

Find out about the HPCC Systems Summer Internship Program.

Deadline for machine learning project proposals - Friday March 25th 2016
Deadline for non-machine learning project proposals - Friday April 15th 2016

Project Description

Singular value decomposition (SVD) has many applications. For example, it can be applied in natural language processing for latent semantic analysis (LSA). LSA starts with a matrix whose rows represent words, whose columns represent documents, and whose elements count how often each word occurs in each document. It then applies SVD to this matrix and uses a subset of the most significant singular vectors, with their corresponding singular values, to map words and documents into a new space called the ‘latent semantic space’. In that space, words and documents are placed near each other according to their patterns of word co-occurrence, even if the words themselves never co-occurred in the training corpus.
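The LSA steps described above can be sketched on a single machine as follows. This is only an illustration in Python with NumPy, not the distributed ECL implementation this project is about. The toy corpus and the helper names (`term_document_matrix`, `lsa`, `cosine`) are hypothetical and chosen for the example.

```python
import numpy as np

# Hypothetical toy corpus (not project data): each string is one document.
# "cat" and "kitten" never appear in the same document, but both appear with "purr".
docs = [
    "cat purr",
    "kitten purr",
    "stock market",
    "share market",
]

def term_document_matrix(docs):
    """Return (vocab, A) where A[i, j] counts occurrences of word i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1
    return vocab, A

def lsa(A, k):
    """Truncated SVD: keep the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    word_vecs = U[:, :k] * s[:k]    # one row per word in the latent space
    doc_vecs = Vt[:k, :].T * s[:k]  # one row per document in the latent space
    return word_vecs, doc_vecs

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vocab, A = term_document_matrix(docs)
word_vecs, doc_vecs = lsa(A, k=2)
w = {term: word_vecs[i] for i, term in enumerate(vocab)}

print("cat vs kitten:", cosine(w["cat"], w["kitten"]))
print("cat vs stock: ", cosine(w["cat"], w["stock"]))
```

In this toy corpus, the raw count rows for "cat" and "kitten" are orthogonal because the two words share no document. After projection onto the two most significant singular directions their cosine similarity is close to 1, while "cat" and "stock" stay close to 0. The choice of k = 2 is an assumption made for this example; in practice k is tuned to the corpus.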

...

  • Written the ECL needed to process the text documents into a dataset of term vectors.
Mentor

John Holt
Contact Details

Backup Mentor: Edin Muharemagic
Contact Details  

Skills needed
  • Knowledge of ECL. Training manuals and online courses are available on the HPCC Systems website.
  • Knowledge of distributed computing techniques
Deliverables
  • Test code demonstrating the correctness and performance of the algorithm.
  • Supporting documentation.
Other resources