Update and improve the generation of package files for HPCC Systems builds

This project is available as a student work experience opportunity with HPCC Systems. Curious about other projects we are offering? Take a look at our Ideas List.

Student work experience opportunities also exist for students who want to suggest their own project idea. Project suggestions must be relevant to HPCC Systems and of benefit to our open source community. 

Find out about the HPCC Systems Summer Internship Program.

Project Description

The HPCC Systems platform sources are hosted on GitHub. Historically, Jenkins has been used to automate building and packaging the platform as part of a continuous integration and delivery (CI/CD) workflow. GitHub Actions now makes it easier to automate how GitHub projects are built, tested, and deployed. The current workflow depends on a web service that 1) receives a specific version of the HPCC Systems platform, 2) parses the package files associated with that version in a staging location, and 3) returns metadata about these packages in JSON format. To migrate the HPCC Systems CI/CD workflow from Jenkins to GitHub Actions, this web service needs to be replaced by a bash script that provides similar functionality. Below is a screenshot from part of the JSON file returned by a call to this web service.

The objective of this project is to develop an alternative and improved solution to this web service using bash scripts so it can be supported in a GitHub Actions workflow.

To see how the current implementation works, refer to https://github.com/xwang2713/HPCC_Build_Staging/

In particular, see this file:

https://github.com/xwang2713/HPCC_Build_Staging/blob/master/BuildStaging/stagingconfig.py

The build package file pattern: r'(?:(\d{1,2})\.)(?:(\d{1,2})\.)(\d{1,})(\-|\.|\~)(\d{1,2}|rc|closedown|beta|alpha|trunk)(\d{1,2})?'

A GitHub release example: https://github.com/hpcc-systems/HPCC-Platform/releases/tag/community_9.4.28-1

Original file pattern: a tree list file, tree-CE-9.4.20-bin, is provided in https://github.com/xwang2713/HPCC_Build_Staging/. This file lists CE-Candidate-9.4.20/bin, which contains two builds: 1) a candidate build, 9.4.20-rc1, and 2) a gold build, 9.4.20-1.
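As a rough illustration of what the build pattern above captures, the following Python sketch applies it to the two builds and the release tag mentioned here. The function name parse_version, the output keys, and the rule that a purely numeric suffix means a gold build are assumptions made for this example; they are not taken from stagingconfig.py.

```python
import re

# Build pattern quoted in the project description.
BUILD_PATTERN = re.compile(
    r'(?:(\d{1,2})\.)(?:(\d{1,2})\.)(\d{1,})(\-|\.|\~)'
    r'(\d{1,2}|rc|closedown|beta|alpha|trunk)(\d{1,2})?'
)


def parse_version(text):
    """Return the version parts found in text, or None if nothing matches.

    parse_version and its output keys are hypothetical names for this sketch.
    """
    match = BUILD_PATTERN.search(text)
    if match is None:
        return None
    major, minor, point, _separator, stage, sequence = match.groups()
    if stage.isdigit():
        # ASSUMPTION: a numeric suffix (e.g. "-1") marks a gold build.
        maturity, sequence = "gold", int(stage)
    else:
        maturity = stage
        sequence = int(sequence) if sequence else None
    return {
        "major": int(major),
        "minor": int(minor),
        "point": int(point),
        "maturity": maturity,
        "sequence": sequence,
    }


if __name__ == "__main__":
    for sample in ("9.4.20-rc1", "9.4.20-1", "community_9.4.28-1"):
        print(sample, parse_version(sample))
```

A bash implementation would need an equivalent expression in POSIX extended syntax, since constructs such as \d and non-capturing groups are not available in bash's [[ =~ ]] operator.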

To accomplish this, the following requirements must be considered:

  • The script needs to be capable of reading the released file names in GitHub using regular expressions.

  • The script needs to be capable of parsing the file names stored in a staging location and outputting metadata such as specific file names and OS distributions.

  • The script could be invoked either via command line or API (to be decided).

  • JSON is the suggested output format but improved alternatives are welcome.

  • The script should be automated to run whenever the packages are updated.

  • Packages can be stored either on AWS S3 buckets or Azure Storage Accounts.

  • Implementing the solution as a Docker image is recommended, i.e. packaging everything into a single Docker container that is easy to deploy and maintain.
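The parsing and JSON output requirements above could be prototyped along the following lines before being ported to bash. This is only a sketch: the file-name layout (name, version, distribution, architecture, extension), the sample file names, and the names FILE_RE, parse_package, and build_metadata are all assumptions made for illustration. The real layout should be taken from the tree list file and the GitHub release assets.

```python
import json
import re
from collections import defaultdict

# Version part of the build pattern from the project description,
# rewritten with non-capturing groups so it can be embedded below.
VERSION = (r'(?:\d{1,2}\.)(?:\d{1,2}\.)\d{1,}(?:\-|\.|\~)'
           r'(?:\d{1,2}|rc|closedown|beta|alpha|trunk)(?:\d{1,2})?')

# ASSUMPTION: file names look like <name>_<version><distro>_<arch>.<ext>.
FILE_RE = re.compile(
    r'^(?P<name>[A-Za-z0-9-]+)_(?P<version>' + VERSION + r')'
    r'(?P<distro>[a-z][a-z0-9]*)_(?P<arch>[A-Za-z0-9]+)\.(?P<type>deb|rpm)$'
)


def parse_package(file_name):
    """Split one file name into its parts, or return None if it does not fit."""
    match = FILE_RE.match(file_name)
    return match.groupdict() if match else None


def build_metadata(file_names):
    """Group parsed packages by version; keep unparsed names for review."""
    builds = defaultdict(list)
    skipped = []
    for file_name in file_names:
        info = parse_package(file_name)
        if info is None:
            skipped.append(file_name)
            continue
        version = info.pop("version")
        info["file"] = file_name
        builds[version].append(info)
    return {"builds": dict(builds), "skipped": skipped}


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the sketch.
    sample_files = [
        "hpccsystems-platform-community_9.4.20-rc1jammy_amd64.deb",
        "hpccsystems-platform-community_9.4.20-1jammy_amd64.deb",
        "README.txt",
    ]
    print(json.dumps(build_metadata(sample_files), indent=2))
```

Listing unmatched names separately, rather than dropping them silently, makes it easier to notice when a new package naming scheme appears in the staging location.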

Completion of this project involves:

  • Learning the general HPCC Systems build process

  • Learning CI tools like Jenkins and GitHub Actions

  • Development of a new and improved solution for the generation of package files

  • Creation of documentation including a GitHub repository for the project

  • A blog, a recorded presentation, and a poster artifact about your project (see examples from previous years here).

Mentor

Michael Gardner
Michael.Gardner@lexisnexisrisk.com 

Backup Mentor: 

Ming Wang
Xiaoming.Wang@lexisnexisrisk.com 

Skills needed 
  • General knowledge of Linux and web services development

  • Self-motivated to learn about new technologies, such as HPCC Systems, Jenkins, GitHub Actions and Azure.

  • Basic programming skills such as Unix shell (bash), WSDL, Python, etc.

  • Building Docker images

Important resources
