Execute multiple workflow items in parallel
This project was completed during the 2020 HPCC Systems Intern Program by Nathan Halliday.
Find out about the HPCC Systems Summer Internship Program.
Resources Available to Learn More about this completed project:
Project Description
Currently the workflow engine executes one item at a time and waits for that item to complete before continuing. Some jobs would benefit significantly if separate persists or independent actions were executed in parallel. The workflow information already contains all the dependencies, as well as information about items that need to be executed sequentially. It may be sensible to support this only for roxie/thor initially, and then revisit hthor.
This may become even more significant in cloud environments, where it would be quite possible to spin up extra thors on demand and so allow multiple graphs to be processed in parallel.
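To illustrate how existing dependency information can reveal which items are independent, here is a minimal sketch in Python. It is not taken from the HPCC Systems code base: the item ids, the `dependencies` mapping and the `parallel_waves` helper are all hypothetical, and items that must run sequentially are assumed to be modelled as ordinary dependency edges.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: item id -> set of item ids it depends on.
dependencies = {
    1: set(),    # e.g. a persist
    2: set(),    # e.g. another, unrelated persist
    3: {1, 2},   # an action that needs both persists
    4: {3},      # must follow item 3
    5: set(),    # an independent action
}

def parallel_waves(deps):
    """Group items into waves; all items within a wave are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # pretend the whole wave has finished
    return waves

waves = parallel_waves(dependencies)
print(waves)
```

In this example items 1, 2 and 5 have no dependencies and could run at the same time, while items 3 and 4 have to wait. A real engine would not need to wait for a whole wave to finish; it could start an item as soon as its own dependencies are complete.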
If you are interested in this project, please contact Gavin Halliday.
Completion of this project involves:
Restructure the workflow engine to create a graph of tasks that can be used to track which tasks have been executed and which tasks should be executed next.
Ensure that there are no multithreading issues in the workflow engine (e.g. in the way persist information is calculated).
Check eclagent for any multithreading issues.
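The steps above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the design used in the project: `TaskGraph`, `run_workflow` and the `execute` callback are hypothetical names, the dependencies are assumed to be acyclic, and a single lock guarding the shared bookkeeping stands in for whatever thread-safety measures the real engine needs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TaskGraph:
    """Tracks which items have been executed and which become ready next."""

    def __init__(self, dependencies):
        self._lock = threading.Lock()  # guards all mutable state below
        self._total = len(dependencies)
        self._pending = {item: len(deps) for item, deps in dependencies.items()}
        self._dependents = {item: [] for item in dependencies}
        for item, deps in dependencies.items():
            for dep in deps:
                self._dependents[dep].append(item)
        self.completed = []

    def initial_ready(self):
        return [item for item, count in self._pending.items() if count == 0]

    def mark_done(self, item):
        """Record completion and return the items that have just become ready."""
        with self._lock:
            self.completed.append(item)
            newly_ready = []
            for dependent in self._dependents[item]:
                self._pending[dependent] -= 1
                if self._pending[dependent] == 0:
                    newly_ready.append(dependent)
            return newly_ready

    def all_done(self):
        with self._lock:
            return len(self.completed) == self._total

def run_workflow(dependencies, execute, max_workers=4):
    """Run every item, starting each one as soon as its dependencies finish."""
    if not dependencies:
        return []
    graph = TaskGraph(dependencies)
    finished = threading.Event()
    errors = []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        def run(item):
            try:
                execute(item)
            except Exception as exc:  # stop waiting if an item fails
                errors.append(exc)
                finished.set()
                return
            for ready_item in graph.mark_done(item):
                pool.submit(run, ready_item)
            if graph.all_done():
                finished.set()

        for item in graph.initial_ready():
            pool.submit(run, item)
        finished.wait()  # keep the pool open until nothing more will be submitted

    if errors:
        raise errors[0]
    return graph.completed

# Hypothetical workflow: item id -> set of item ids it depends on.
dependencies = {1: set(), 2: set(), 3: {1, 2}, 4: {3}, 5: set()}
order = run_workflow(dependencies, execute=lambda item: None)
print(order)
```

The completion order can differ between runs, but an item never appears before the items it depends on. Keeping all shared state behind one lock is the simplest way to avoid races in a sketch like this; a real engine may need finer-grained locking.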
By the mid term review we would expect you to have:
Mentor | Gavin Halliday (Backup Mentor: TBC)
Skills needed |
Deliverables | End of project
Other resources |