Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Project Description
SVD has many applications. For example, SVD could be applied to natural language processing for latent semantic analysis (LSA). LSA starts with a matrix whose rows represent words, columns represent documents, and matrix values (elements) are counts of the word in the document. It then applies SVD to the input matrix, and uses a subset of most significant singular vectors and corresponding singular values to map words and documents into a new space, called ‘latent semantic space’, where documents are placed near each other measured by co-occurrence of words, even if those words never co-occurred in the training corpus.
The LSA’s notion of term-document similarity can be applied to information retrieval, creating a system known as Latent Semantic Indexing (LSI). An LSI system calculates similarity several terms provided in a query have with
documents by creating k-dimensional query vector as a sum of k-dimensional vector representations of individual terms, and comparing it to the k-dimensional document vectors.

The implementation can be done completely in the ECL language and only a knowledge of ECL and distributed computing techniques is required. 

...

Mentor

John Holt
Contact details: Contact Details

Skills needed
  • Knowledge of ECL. Training manuals and online courses are available on the HPCC Systems website.
  • Knowledge of distributed computing techniques
Deliverables
  • Test code demonstrating the correctness and performance of the algorithm.
  • Supporting documentation.
Other resources