Project Description

The HPCC Systems platform currently supports embedding Python, Java, JavaScript, R, C++, MySQL and Cassandra code. The goal of this project is to support:

  • Additional embeddable languages (such as Scala, Haskell, Clojure, Julia, SAS, MATLAB etc.).
  • Additional database queries (such as MongoDB, Postgres, MariaDB etc.).
  • Any others not already implemented that you are familiar with and would like us to support.

We are also looking at using similar techniques to provide simple interfaces to external data stores such as Ceph, S3 and Kafka (though these interfaces may not look like embedded languages). This would allow data to be read from and written to the object store directly, avoiding the extra latency currently incurred by having to move the data into the internal distributed filesystem before processing. An interface to Kafka is already under development.

One of the challenges of this project is determining how an external key-value store should interact with a distributed Thor query, so that the external datastore behaves like a distributed file: each Thor node reads only its own portion of the data, or writes only its portion of a result. The HPCC Systems developers are actively discussing this, but it has not yet been resolved.

The HPCC platform supports hooks to add additional file formats, such as reading directly from archives or from git repositories, and these may be used as the basis for the S3 support.

Additional languages are added to the system via a “plugin” system. Existing plugins such as MySQL (available here) or Python (available here) show the sort of work required and can be used as examples. Each completed plugin is considered a new feature addition to the HPCC Platform.

Completion of this project involves:

  • Investigating the API for calling the target language from C/C++.
  • Creating a simple wrapper for scalar values between the ECL embed API and the target language API using one of the existing embed plugin implementations as an example.
  • Extending the simple wrapper to handle structured data.
  • In parallel with the above, developing test cases for the plugin that cover all data types, both passed in and returned, including multi-threaded access from the ECL side. This includes testing the performance and throughput of the system on some examples that approximate real-world usage.

By the GSoC midterm review we would expect you to have implemented a simple example that passes and returns scalar values (which is usually much simpler than passing and returning structures).

Mentor

Richard Chapman
Contact details: Richard.Chapman@lexisnexis.com

Backup Mentor: Jamie Noss
Contact details: James.Noss@lexisnexis.com

Skills needed
  • Ability to code in C++.
  • Ability to build and test the HPCC system (guidance will be provided).
  • Knowledge of the target system being integrated would be helpful, at least enough to write test cases and run them.
  • Ability to write test code. Knowledge of ECL is not a requirement, since it should be possible to re-use existing code with minimal changes for this purpose. Links are provided below to our ECL training documentation and online courses should you wish to become familiar with the ECL language.
Deliverables

Midterm

  • A simple example that passes and returns scalar values.

End of project

  • A plugin that supports interfacing to the target language from ECL, implementing the ECL embedded language API and calling the language being embedded via its C/C++ API (assuming it has one!).
  • Test cases demonstrating the correct behaviour and performance of the plugin.
  • Documentation of how datatypes and structures in ECL are mapped to the target language.
Other resources